My home on the web - featuring my real-life persona!

Check spelling in HTML files

This is an interesting test - is anyone still listening/reading? It has been very quiet here while I was finishing my BS in Computer Science (look how I casually slipped that in!) and we had some huge projects at work - one increased my workload to handling 14 languages.

Anyway, I believe I am not the only one who translates a lot of HTML files. I use TagEditor and I like it (yes I do, I am not getting paid to say this) but one of its shortcomings is the spell checker. Like many, I have used workarounds to make sure my spelling is OK. Usually, I copy the text into Word to check the spelling. If I find a typo or want to change a sentence, I go back into TagEditor, make the change there to update the TM and then export another target file. And I always wished I could just check the spelling in the HTML file itself.

I know a lot of WYSIWYG HTML editors have a spell checker. I can also open the HTML file in Word and make changes directly in Word. Neither solution is recommended because more often than not, they mess with the HTML code. Sometimes, it is just the indentation of the code but often they add or change tags, which can cause problems on the client's site or even worse, corrupt the page.

The other problem is that you cannot see the file in its “natural habitat”, i.e. processed by a browser. Word tries to show you what the page may look like but it is IE exclusive. Most WYSIWYG editors also emulate/integrate one browser or another to display the page, but they usually fail, especially when the page uses CSS styles.

I have been looking for a spell checker that works directly in Firefox, my default browser. There are many spell checkers that work in input locations, for example text boxes and fields, but nothing worked in the static display text of a web page. I looked back, and the last time I was on the hunt was in December 2007. Apparently I didn’t run into web page issues since then because only today this problem came up again. Again, I did a quick search and lo and behold, there now is a bookmarklet/plugin that works! I am trying to track down the origins, and I believe it originated from Urbano’s Blog written by alex. On a lot of other sites I found an additional link using that JavaScript that you can just drag and drop onto your bookmark toolbar in FF: Spellcheck Anywhere

Using it couldn’t be easier. Open the HTML page you want to check in Firefox (and this should be the default when double-clicking an HTML page anyway), then click on the bookmarklet and there you go, spell checking in your browser window! You can change the language to whatever spell checker you have installed - just press Ctrl-A to highlight all text, then right-click on any text portion and in the context menu select the desired language from Languages. If you need more languages, you can download those for free from Firefox Dictionaries & Language Packs

There is one small problem though: while it looks like it allows you to edit the HTML page, the changes are not saved when you try to save the edited file. This is a little unfortunate because it would spare me the extra step of saving the target file in TagEditor again, but nevertheless, this will make my life so much easier!

OMG - It’s full of mistakes!

So, we just had one of our applications translated into Greek. It is a very big application with a total of 13,000 words in strings alone. Initially, we had about a month so time was not a big issue and the translation got started on May 8th. Of course, these things change and all of a sudden, we needed it not by the beginning of June but for a show on May 20th. That means 12 days for the translation, cleaning up the bilingual files, importing the strings, fixing truncations and other issues, testing functionality and compiling DLLs. Of course we made it!

Now I was waiting for feedback. Nothing at all from the guys who were at the show. No “good job”, no “shame on you” - nothing. After a week, I inquired and I got the reply back that there were “a big number of errors”. That sent a shiver down my spine. We don’t have many translations into Greek, only one other application so I don’t know this translator very well. We don’t have any Greek reference material, but I asked and he confirmed that he knew the subject matter. And I myself can of course not check anything in Greek.

Turns out, it wasn’t all that bad. We had issues for all languages because unless you are a printing press operator, you really can’t figure out some things. I remember asking our German guys questions and they had no clue either. Unfortunately, some terms that were wrong occurred 50 or even 100 times so yeah, it looks like a lot. Correcting all strings took me a couple of hours of manual copy/paste, which is not bad at all.

It just irks me that the only feedback I got was that there are a lot of errors (which wasn’t even true). He never acknowledged that we did the impossible by turning this around so fast and that it worked fine. Only the tester mentioned that this must have been the fastest turn-around we had for any language, but I am also getting a lot better at handling languages I know nothing about. The last translation of that kind we did was Russian - I am fine navigating through French, Spanish, Italian and Portuguese, but Russian and Greek are a whole different animal. If I see a truncation at runtime, I can’t just type in the text I see and search for it - I need a virtual keyboard and have to type in a keyword letter by letter to search for it. And I am amazed how nicely Trados and TagEditor handle the different character sets. I don’t think many people know what an ordeal it can be to have an application ready for non-Western character sets.

Ah well, believe it or not, I still love doing it - it’s a big girl puzzle and I am getting paid to solve it!

Fun with character encodings

What do ASCII, ANSI, Latin-1, Windows-1252, Unicode and UTF have in common?

They are a pain in the neck for translators - but also, they are ways to encode characters in files, even in plain text files that usually seem as “un-encoded” as possible. Most of the time, you don’t have a problem with it: you open a txt file and you don’t really know (or need to know) what character format it has. The only reason why most people even know about this is because of the “Bush hid the facts” (see below) trick in Notepad. I am not going into the history and details of the various formats; at the bottom are some links to other pages that deal with that if you want to learn more. I am merely looking at the consequences it can have for me during translation.

What I care more about is the fact that it can really break your neck during translation of string files. I run into that on and off and every time it happens, I learn a little bit more about it. I have wanted to write about it for quite a while, and since the whole thing came up again earlier this week, I think it is time now.

We have a little update tool for an application that is written in Java. Java programs usually have their strings in .properties files. Those files are traditionally read as ISO 8859-1 (aka Latin-1), an 8-bit encoding that covers Western European characters like ü Ü é or ñ but nothing beyond that, such as Greek or Russian. In practice, everything outside plain 7-bit ASCII gets converted into Unicode escape characters, sometimes referred to as Java escape characters. I think most of us have experienced other escape characters, for example the \n for a new line, \t for a tab. Unicode escape characters are a little more involved, using a \uHHHH notation, where HHHH is the hex index of the character in the Unicode character set. So, for example the ß in a Java properties file is encoded as \u00df. To convert those characters, I use Rainbow which is part of the Okapi Framework. It has a handy Encoding Conversion Utility that allows you to convert files from one encoding to another.
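To make the \uHHHH notation concrete, here is a minimal Python sketch of the conversion. It is not how Rainbow does it internally; the function name `to_java_escapes` is made up for illustration, and the sketch assumes that every character outside 7-bit ASCII should be escaped.

```python
def to_java_escapes(text: str) -> str:
    """Replace every non-ASCII character with a \\uHHHH escape (illustrative helper)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain ASCII (including existing sequences like \n) passes through unchanged.
            out.append(ch)
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04x}")
        else:
            # Characters beyond U+FFFF need two escapes (a UTF-16 surrogate pair).
            code -= 0x10000
            high = 0xD800 + (code >> 10)
            low = 0xDC00 + (code & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


print(to_java_escapes("Größe"))  # prints Gr\u00f6\u00dfe
```

The input here is already a decoded string. Everything that follows is about what happens when the file is decoded with the wrong encoding before this step.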

Sounds really easy, right? Right? Now what is this woman complaining about again? Well, it’s not that easy. The conversion tool is designed to work with 8-bit ASCII-based encodings. Now, so what IS the problem - it was just stated that Java properties files use an ASCII-based encoding? Well, TagEditor takes the ASCII file and when you “Save as Target” after translation, it converts the file into UTF-8. And that is still not the problem; the problem is that it uses a UTF-8 format without a BOM (Byte Order Mark). The BOM is an (invisible) byte sequence at the beginning of a file (three bytes for UTF-8, two for UTF-16) which basically tells a program “This is a Unicode file”. Without the BOM, some programs do not recognize the encoding of the file and assume ASCII - and that is the problem with Rainbow (and also with Passolo, a program that just got bought by SDL).
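Checking for the BOM yourself is simple. For UTF-8 the mark is the three bytes EF BB BF, which Python exposes as `codecs.BOM_UTF8`. A small sketch (the helper name `has_utf8_bom` is made up for illustration):

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw file content starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


with_bom = codecs.BOM_UTF8 + "Straße".encode("utf-8")
without_bom = "Straße".encode("utf-8")

print(has_utf8_bom(with_bom))     # True
print(has_utf8_bom(without_bom))  # False
```

For a real file, read it in binary mode (`open(path, "rb")`) so that nothing decodes or strips the first bytes before you look at them.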

If you try to convert the encoding of a BOMless Unicode file, it goes terribly wrong. As I mentioned, the correct conversion of ß will give you \u00df. Converting a BOMless file will “double escape” the extended characters, and you get \u00c3\u0178 - clearly not the same. The “double escape” is actually a good indicator that something went wrong: if you check your file and see that your extended characters are represented by two escape sequences each, you know the encoding was misread. Of course, that can be difficult to spot when dealing with languages like Greek, Russian or Asian languages, simply because every single character is escaped. I usually try to find a short string and count.
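The double escape is easy to reproduce. The following Python sketch takes the UTF-8 bytes of ß and decodes them once correctly and once as windows-1252, which is what a tool does when it misses the encoding. The helper `escape_non_ascii` is made up for illustration and only handles characters up to U+FFFF.

```python
def escape_non_ascii(text: str) -> str:
    """Escape every non-ASCII character as \\uHHHH (illustrative, BMP only)."""
    return "".join(ch if ord(ch) < 128 else f"\\u{ord(ch):04x}" for ch in text)


raw = "ß".encode("utf-8")      # two bytes: C3 9F

right = raw.decode("utf-8")    # one character: ß
wrong = raw.decode("cp1252")   # two characters: Ã and Ÿ

print(escape_non_ascii(right))  # prints \u00df
print(escape_non_ascii(wrong))  # prints \u00c3\u0178
```

Each byte of the UTF-8 sequence turns into its own character and therefore its own escape sequence, which is why counting escapes in a short string works as a quick check.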

Now, how do you know how a file is encoded? Right now, I use Notepad++ to check. It has a handy little Format menu that shows you which encoding is used and also allows you to convert from one encoding to another. Supported formats are Windows, UNIX, Mac, ANSI, UTF-8 w/o BOM, UTF-8 and UCS-2 Big and Little Endian. Surprisingly, Windows Notepad is one of the few programs that actually manages to decipher the Unicode encoding even without a BOM: just open the BOMless file in Windows Notepad and save it without change. Unfortunately, you usually just don’t know and usually it isn’t even an issue.
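If you prefer a script over an editor, a rough guess is possible with a few lines of Python. This is only a heuristic sketch, not what Notepad++ or Notepad do internally: it looks for a BOM first and then tests whether the bytes are valid UTF-8. The function name and the labels it returns are made up for illustration, and it cannot tell the various 8-bit encodings apart.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Very rough encoding guess based on the BOM and UTF-8 validity."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so most likely some 8-bit code page.
        return "8-bit (e.g. windows-1252 or Latin-1)"
    if all(byte < 128 for byte in data):
        return "ascii"
    return "utf-8 without BOM"


print(guess_encoding("Größe".encode("utf-8")))
print(guess_encoding("Größe".encode("cp1252")))
```

The check works because text with accented characters saved in an 8-bit code page almost never happens to be valid UTF-8, while the reverse mistake (reading UTF-8 as windows-1252) usually decodes without any error and just produces garbage.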

I actually happened to get to talk to Yves Savourel, who is working at ENLASO and with the Okapi Framework (and about a gazillion other things related to localization), and he has been very helpful. He explained a few things to me a little better.

    The issue:

  • a BOMless UTF-8 file is recognized as “windows-1252” encoding
  • a UTF-8 file uses two or more bytes to encode the extended characters
  • the application thinks each of those bytes is a separate character and converts each into a Unicode escape sequence

    The solution:

  • in Rainbow, manually force the encoding of the source file to UTF-8
  • in Rainbow, use the Add/Remove BOM utility to set the BOM properly
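Outside of Rainbow, the same fix can be sketched in a few lines of Python: decode the file explicitly as UTF-8 instead of letting anything guess, then write it back with a BOM. The function name `add_utf8_bom` and the file name in the demo are made up for illustration.

```python
import codecs
import os
import tempfile


def add_utf8_bom(path: str) -> None:
    """Rewrite a UTF-8 file so that it starts with exactly one BOM."""
    with open(path, "rb") as f:
        data = f.read()
    # "utf-8-sig" drops an existing BOM, so running this twice is harmless.
    # Decoding raises UnicodeDecodeError if the file is not UTF-8 at all.
    text = data.decode("utf-8-sig")
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + text.encode("utf-8"))


# Small demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "strings_de.properties")
    with open(path, "wb") as f:
        f.write("title=Größe\n".encode("utf-8"))
    add_utf8_bom(path)
    with open(path, "rb") as f:
        print(f.read().startswith(codecs.BOM_UTF8))  # True
```

Working in binary mode keeps the line endings exactly as they were, which matters when the file goes back to a developer on another platform.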

If you got through all this stuff, you may now wonder if you’ll ever run into this issue. It is also not just about BOM or no BOM; file encoding in general raises issues in other applications too. To be honest, I don’t know how often freelance translators are confronted with these types of files, but here are the situations where I keep my eyes peeled:

  • Java files (.properties)
    This was the most recent issue that triggered this post.
  • String export files (often XML files or even plain txt)
    I tend to get the strings for REALBasic applications in XML files, though I believe they are created by RegexBuddy.
  • Non-Windows files or Windows files that will be used on other OSs
    We run into this issue with txt files that were created on a Mac and that will be used in InstallShield-type applications, for example to display the license agreement or a readme file.
  • All files
    Haha, very funny - I know. What I mean is, I have experienced various issues with files if I have to process them through different applications in order to get CAT-translatable files, for example if we receive a weird string file that Trados doesn’t understand and where we need to find a manageable way to extract translatable text.

Anyway, maybe this will help someone else in the situation where the client comes back and claims the files are corrupt or so. Otherwise, I apologize for boring the heck out of you. You should have stopped reading my post a long time ago :-)

Some interesting links with related information:

Okapi Framework
Notepad++
Bush hid the facts hoax and Bush hid the facts on Wikipedia
Mojibake
How to Determine Text File Encoding
Cast of Characters: ASCII, ANSI, UTF-8 and all that

SDL Synergy follow-up

A short while back, I explained how I started to use SDL Synergy to manage my multilingual projects. Back then I realized that I needed to use the “packages” function in order to really utilize the functions. I was a little uncomfortable with this because I don’t really like to impose a new process on my freelancers, so I actually asked them what they think and how comfortable they are working with those files. Well, they said it is no big deal and one actually replied “Don’t worry, most of the projects I work on are more painful!” - I take that as a compliment. I have only used the packages with my two main translators though, but that is simply because nothing has come up for my “rare” languages.

Now it’s almost 3 weeks later, and I still like it. It really is a lot faster for me to process the files for translation and receiving back the return packages is also very easy. I am so happy that I don’t have to switch back and forth between the different translation memories anymore.

Does anyone else receive packages from clients? How do you like working with them?

Trados customer service (or customer service in general)

Since it just came up on the Trados mailing list again, I’d just like to post my rant here too.

A lot of people complain about how bad Trados is and even more, how horrible their customer service is. Now, don’t get me wrong, I have had my share of issues with Trados and their service, but in all honesty - it’s the same at almost every other company.

I think one big part of the Trados support issues is that most translators need and use Trados 8 to 10 hours a day and that many rely on Trados for their livelihood. At the same time, many people are not willing to pay for support (Trados PSMA costs 20% of the product cost). Has anyone ever looked up how much support people make? Last time I checked, even jobs labeled “Support Specialist” (i.e. not call center) earned well under $40,000 a year. If you know a lot about computers and if you have the required experience, do you want to listen to customers on a support hotline 8 hours a day for that money? Yeah, me neither…

Here are some of my past support experiences:

  • Comcast: their service is horrible and I dread having to call them. Every now and then you get someone who actually knows their stuff and the conversation is pleasant. Most times, they don’t know anything, read off solutions from a script and put you on hold forever.
  • Best Buy: the bulb of our fairly new projector blew after 90 days even though it was supposed to last 3000 hours on eco mode. Since projectors are fickle little things, we had actually purchased the extended support warranty for $180. We wrapped up the projector and went to the nearby Best Buy to get the bulb exchanged. Well, not so much. They said we had to go home and file the claim online. We could also call support; the number is also on the web. Say what? So we went back home and I called customer support. I talked to someone who had absolutely no clue and kept asking me for the screen size of my projection TV. I kept telling her it is a PROJECTOR not a REAR PROJECTION TV, and she gave up and connected me to someone else. That someone else told me that the bulb shouldn’t blow that fast (Duh!) and that I had better contact the manufacturer because it was still under their warranty. We ended up having to send the projector back to Optoma (UPS insured for $60) and “just” two weeks later, we had our device back. And that was service we actually had paid for.
  • Dell: I just had one brief encounter trying to change an order for a monitor after it all of a sudden showed “out of stock” with a delivery time of 4 weeks. My call was routed all over the world. One call I just had to hang up because I could not understand the person and apparently, he couldn’t understand me either because my request to transfer the call was ignored. One woman I talked to said she was from the Philippines, another one probably from India. They were trying to be helpful and they were nice, but their scripted answers are just a waste of time. If I have a problem bad enough that I can’t solve it, their script most probably can’t and I get the dreaded “You have to format your hard drive” answer.
  • KIA: my husband’s remote key was in really bad condition. The buttons were punched through and the door release didn’t work. So, for his birthday I took his car, went to the KIA dealer and asked if I could get a new remote key. The guy looked it up and said they had one key - costs around $80, but they couldn’t cut it. He sent me to a locksmith that they had a deal with to get it cut. After that I would have to come back to get it programmed. So I paid for the key, went to the locksmith who cut it for free and went back to the dealer. They sent out a guy (reeking of last night’s party) with a little computer who clearly didn’t know what he was doing. First, he couldn’t find the location to connect the computer to the car. Once he found that he started punching the keys while quietly cursing. After ten minutes, he said it didn’t work and that it was the car’s fault. I asked him what to do now, and he said I had to get the car repaired - and with that he turned around and left me standing there. I went back in trying to explain that this key was basically worthless to me now, since the remote opening function was what I was looking for. Now, since it was cut I couldn’t return it, and since they couldn’t tell me what was wrong, the repair cost was “undetermined”.
  • Airlines: I don’t believe I have to get into this - do I? I haven’t had a trouble-free flight for at least 5 years. In those 5 years, I have been stranded overnight in Manchester (New Hampshire), Dallas, and in Chicago. They have lost my luggage on the way to Germany; I have missed flights because of delays. I was on standby in Philadelphia even though I had a ticket, I spent hours sitting in a plane that wasn’t moving or was circling the tarmac, and of course I had plain old delays. And you know what? I don’t even travel much!

Now, I have to add some positive notes: the customer support at T-Mobile (my cell phone provider) and Medion (for my laptop) has been pretty good. T-Mobile has the friendliest support staff, and I am not talking about the usual fake support friendliness but about people who are cordial and who actually are able to crack a little joke every now and then.

In the end, I am happy that I don’t have many computer problems, and that I can solve most of my computer issues myself. At times, I actually enjoy doing so and playing computer surgeon, but I know it is not for everyone. But if that is the case, you may have to pay someone.