My home on the web - featuring my real-life persona!

Cold callers asking for call-back

Is this normal? I just received an unsolicited call from a translation agency that I have never done business with. The name appeared on the caller ID, so I didn’t pick up. The caller left a message asking me to call him back. Why would I want to call him back? Usually, cold callers just try again; they don’t ask you to call them.

I hope this doesn’t catch on, or next I will have Planned Parenthood, the local Police Department, the local Fire Department, Clean Water Action, the Juvenile Diabetes Research Foundation and whatnot calling and asking me to call them back. Now, don’t get me wrong - I would like to get such a message and then be able to decide whether to call back or not. And of course, if I DON’T call back, they should take the hint that I am not interested. But they don’t; they just keep calling.

In recent days, this procedure and the whole “charities begging for money thing” has actually turned me away from giving money to anyone. If you give something once, they will call you every other month and ask for more. And they will not take No for an answer.

Fun with character encodings

What do ASCII, ANSI, Latin-1, Windows-1252, Unicode and UTF have in common?

They are a pain in the neck for translators - but also, they are ways to encode characters in files, even in plain text files that usually seem as “un-encoded” as possible. Most of the time you don’t have a problem with it: you open a txt file and you don’t really know (or need to know) what encoding it has. The only reason most people even know about this is the “Bush hid the facts” trick in Notepad (see the links below). I am not going into the history and details of the various formats; at the bottom are some links to other pages that deal with that if you want to learn more. I am merely looking at the consequences encodings can have for me during translation.

What I care more about is the fact that it can really break your neck during translation of string files. I run into that on and off, and every time it happens I learn a little bit more about it. I have wanted to write about it for quite a while, and since the whole thing came up again earlier this week, I think it is time now.

We have a little update tool for an application that is written in Java. Java programs usually have their strings in .properties files. Those files are traditionally encoded in the 8-bit character set ISO 8859-1 (aka Latin-1). Strictly speaking, Latin-1 does include Western European characters like ü, Ü, é or ñ, but anything outside it cannot be stored directly, and in practice non-ASCII characters are commonly converted into Unicode escape sequences, sometimes referred to as Java escape characters. I think most of us have experienced other escape characters, for example \n for a new line or \t for a tab. Unicode escape characters are a little more involved, using a \uHHHH notation, where HHHH is the hexadecimal code point of the character in the Unicode character set. So, for example, the ß in a Java properties file gets encoded as \u00df. To convert those characters, I use Rainbow, which is part of the Okapi Framework. It has a handy Encoding Conversion Utility that allows you to convert files from one encoding to another.
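To make the \uHHHH notation a bit more concrete, here is a minimal sketch in Python. It is not what Rainbow does internally, just my own illustration; the function name to_java_escapes is made up for this example.

```python
def to_java_escapes(text: str) -> str:
    """Replace every non-ASCII character with \\uHHHH escape sequences."""
    parts = []
    for ch in text:
        if ord(ch) < 128:
            # Plain ASCII stays as it is.
            parts.append(ch)
            continue
        # Encode as UTF-16 so that characters beyond U+FFFF
        # come out as two escapes (a surrogate pair), as Java expects.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            parts.append("\\u{:02x}{:02x}".format(units[i], units[i + 1]))
    return "".join(parts)


print(to_java_escapes("Straße"))  # Stra\u00dfe
```

The important point is that the function works on characters that have already been decoded correctly. If the bytes of the file were read with the wrong encoding, the escapes will be wrong too.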

Sounds really easy, right? Right? Now what is this woman complaining about again? Well, it’s not that easy. The conversion tool is designed to work with 8-bit, ASCII-based encodings. So what IS the problem, given that Java properties files use exactly such an encoding? Well, TagEditor takes the file and, when you “Save as Target” after translation, converts it to UTF-8. And that is still not the problem. The problem is that it writes UTF-8 without a BOM (Byte Order Mark). The BOM is an (invisible) byte sequence at the beginning of a file (3 bytes for UTF-8, 2 bytes for UTF-16) which basically tells a program “This is a Unicode file”. Without the BOM, some programs do not recognize the encoding of the file and assume an 8-bit encoding such as Windows-1252 instead - and that is the problem with Rainbow (and also with Passolo, a program that just got bought by SDL).
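If you want to see for yourself whether a file starts with a BOM, a few lines of Python are enough. This is only a sketch; the helper name detect_bom is my own invention, and it only knows the UTF-8 and UTF-16 marks.

```python
import codecs

# Byte order marks this sketch knows about (UTF-32 is deliberately left out,
# so a UTF-32 LE file would be misreported as UTF-16 LE).
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF (3 bytes)
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),  # FF FE (2 bytes)
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),  # FE FF (2 bytes)
]


def detect_bom(data: bytes):
    """Return the name of the BOM at the start of data, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom(codecs.BOM_UTF8 + "ß".encode("utf-8")))  # UTF-8
print(detect_bom("ß".encode("utf-8")))                    # None
```

For a real file you would read the first few bytes in binary mode (open(path, "rb")) and pass them to the function.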

If you try to convert the encoding of a BOMless UTF-8 file, it goes terribly wrong. As I mentioned, the correct conversion of ß gives you \u00df. Converting a BOMless file will “double escape” the extended characters, and you get \u00c3\u0178 - clearly not the same. The “double escape” is actually a good indicator that something went wrong: if you check your file and see that each extended character is represented by two (or more) escape sequences, you know there is a problem. Of course, that can be difficult to spot in languages like Greek, Russian or the Asian languages, simply because every single character is escaped. I usually try to find a short string and count.
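The “double escape” is easy to reproduce. The following Python sketch (my own illustration, with a made-up helper name) takes the two UTF-8 bytes of ß, reads them once correctly and once as Windows-1252, and escapes both results.

```python
def escape_bmp(text: str) -> str:
    """Escape non-ASCII characters as \\uHHHH (enough for this demo,
    which only uses characters below U+FFFF)."""
    return "".join(ch if ord(ch) < 128 else "\\u%04x" % ord(ch) for ch in text)


raw = "ß".encode("utf-8")     # the two bytes C3 9F, no BOM

right = raw.decode("utf-8")   # read as UTF-8: one character, ß
wrong = raw.decode("cp1252")  # read as Windows-1252: two characters, Ã and Ÿ

print(escape_bmp(right))  # \u00df
print(escape_bmp(wrong))  # \u00c3\u0178
```

The bytes in the file are the same in both cases. Only the encoding assumed while reading differs, and that is enough to turn one escape sequence into two.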

Now, how do you know how a file is encoded? Right now, I use Notepad++ to check. It has a handy little Format menu that shows which encoding is used and also lets you convert from one encoding to another. The menu covers line-ending formats (Windows, UNIX, Mac) as well as the encodings ANSI, UTF-8 w/o BOM, UTF-8 and UCS-2 Big and Little Endian. Surprisingly, Windows Notepad is one of the few programs that actually manages to decipher UTF-8 even without a BOM: just open the BOMless file in Windows Notepad and save it without changes. Unfortunately, most of the time you simply don’t know the encoding, and most of the time it isn’t even an issue.
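If you would rather script the check, here is a rough heuristic in Python. It is a sketch under my own assumptions and not how Notepad++ or Notepad decide: bytes that decode cleanly as UTF-8 are very likely UTF-8, and everything else is simply assumed to be Windows-1252. The function name guess_encoding is hypothetical.

```python
def guess_encoding(data: bytes) -> str:
    """Very rough guess: 'ascii', 'utf-8', or 'cp1252' as a fallback."""
    try:
        data.decode("ascii")
        return "ascii"  # nothing above 0x7F, so any ASCII-based encoding works
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"  # valid UTF-8 byte sequences rarely happen by accident
    except UnicodeDecodeError:
        return "cp1252"  # assumption: a Western 8-bit file


print(guess_encoding("Strasse".encode("ascii")))  # ascii
print(guess_encoding("Straße".encode("utf-8")))   # utf-8
print(guess_encoding("Straße".encode("cp1252")))  # cp1252
```

The fallback is only a guess. A file in another 8-bit code page (Greek or Cyrillic, for example) would also be reported as cp1252, so treat the result as a hint and not as proof.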

I actually got to talk to Yves Savourel, who works at ENLASO and on the Okapi Framework (and about a gazillion other things related to localization), and he has been very helpful. He explained a few things to me a little better.

    The issue:

  • a BOMless UTF-8 file is recognized as “windows-1252” encoding
  • a UTF-8 file uses two or more bytes to encode the extended characters
  • the application thinks each of those bytes is a separate character and converts each into a Unicode escape sequence

    The solution:

  • in Rainbow, manually force the encoding of the source file to UTF-8
  • in Rainbow, use the Add/Remove BOM utility to set the BOM properly
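Outside of Rainbow, the same two steps can be sketched in a few lines of Python. This is not Rainbow’s code, only an illustration of the idea, and the function name add_utf8_bom is made up.

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Treat data as UTF-8 no matter what, and return it with exactly one BOM."""
    # "utf-8-sig" drops an existing BOM if there is one. Decoding is strict,
    # so bytes that are not valid UTF-8 raise an error instead of being mangled.
    text = data.decode("utf-8-sig")
    return codecs.BOM_UTF8 + text.encode("utf-8")


bomless = "Straße".encode("utf-8")
fixed = add_utf8_bom(bomless)
print(fixed[:3] == codecs.BOM_UTF8)  # True
print(add_utf8_bom(fixed) == fixed)  # True, running it twice adds no second BOM
```

Forcing the encoding is the step that matters. The BOM is then just the marker that lets other programs make the right choice on their own.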

If you got through all this stuff, you may now wonder if you’ll ever run into this issue. It is also not just about BOM or no BOM; file encoding in general raises issues in other applications too. To be honest, I don’t know how often freelance translators are confronted with these types of files, but here are the situations where I keep my eyes peeled:

  • Java files (.properties)
    This was the most recent issue that triggered this post.
  • String export files (often XML files or even plain txt)
    I tend to get the strings for REALBasic applications in XML files, though I believe they are created by RegexBuddy.
  • Non-Windows files or Windows files that will be used on other OSs
    We run into this issue with txt files that were created on a Mac and that will be used in InstallShield-type applications, for example to display the license agreement or a readme file.
  • All files
    Haha, very funny - I know. What I mean is that I have experienced various issues with files that I have to process through different applications in order to get CAT-translatable files, for example when we receive a weird string file that Trados doesn’t understand and we need to find a manageable way to extract the translatable text.

Anyway, maybe this will help someone else in the situation where the client comes back and claims the files are corrupt or so. Otherwise, I apologize for boring the heck out of you. You should have stopped reading my post a long time ago :-)

Some interesting links with related information:

Okapi Framework
Notepad++
Bush hid the facts hoax and Bush hid the facts on Wikipedia
Mojibake
How to Determine Text File Encoding
Cast of Characters: ASCII, ANSI, UTF-8 and all that

Localization: Engineering Software for a Global Market

Here is a PDF that I printed from a PowerPoint presentation. The slides were part of a presentation I gave in my CS350 class (Introduction to Software Engineering). I have also added it to the company intranet as a quick reference for new developers - apparently, localization is rarely touched in CS classes.

If you work in software localization, have a look at it and let me know what you think. Is there something I should add?

Localization: Engineering Software for a Global Market

If you would like to use this for anything, it would be nice to mention me as the creator. I have spent a lot of time creating this and you wouldn’t want to be credited for someone else’s work, would you?

SDL Synergy follow-up

A short while back, I explained how I started to use SDL Synergy to manage my multilingual projects. Back then I realized that I needed to use the “packages” function in order to really take advantage of it. I was a little uncomfortable with this because I don’t really like to impose a new process on my freelancers, so I actually asked them what they think and how comfortable they are working with those files. Well, they said it is no big deal and one actually replied “Don’t worry, most of the projects I work on are more painful!” - I take that as a compliment. I have only used the packages with my two main translators though, but that is simply because nothing has come up for my “rare” languages.

Now it’s almost 3 weeks later, and I still like it. It really is a lot faster for me to process the files for translation, and receiving the return packages is also very easy. I am so happy that I don’t have to switch back and forth between the different translation memories anymore.

Does anyone else receive packages from clients? How do you like working with them?

Translation agencies - Pt. 1

I’ll call this “Pt. 1” since I am sure there will be more.

Every now and then, I get sales calls from translation agencies trying to sell their services. I don’t like sales people, not the people per se but in their profession. As soon as they put on their “sales hat”, they are all the same and it doesn’t matter if they sell cars, insurance, electronics or anything else on commission.

Most phone calls start with them asking me about our process, and I explain to them how we handle translation here. It must be a substantial part of sales training to never accept “No” as an answer. They keep listing their top selling points and how they would benefit us, and I keep telling them that we are doing fine with our part in-house, part freelance setup. Everything I say is met with an answer that basically tries to tell me that they can do it better. This can go on forever, with the sales person “Ma’am”-ing me and me deconstructing their pro arguments because they simply do not apply to us.

I really like it when they tell me that this could save us so much money by eliminating the in-house translation department. Say what? I am the in-house translation department.

Another favorite is when they try to sell us “solutions”. This is usually some workflow management system which in all honesty sounds good, but has an exorbitant price that sometimes exceeds what we spend on translations per year in total for all languages.

And sometimes, for whatever reason, agencies mass mail several different people in-house and then I get the same sales mail 10 times because everyone forwards them to me.

At times, they were so annoying that I told them to not ever call me again. Sometimes I try to explain to them that they really really really are wasting their time. And if all else fails, I just let them send their expensive, fancy, glossy brochures and then I dodge their follow-up calls.

Now, right before the ATA Conference, we had another good one. Someone from an agency sent an email to my boss (who is a tech writer and not a translator), which he forwarded. The first email ended with the words:

Most of our clients are looking for ways to reduce the overall cost of translation while streamlining and improving internal processes. XXX is one of the world’s leading providers of translation and localization management solutions. I would appreciate the opportunity to talk about how we may be able to help you. I understand that you are busy, and guarantee that our discussion will not waste your time.

Wow, a guarantee to not waste our time, that sounds…still not interesting. We send a polite email declining the offer, explaining that we have our own department and handle translations and outsourcing internally. Well, apparently that was not a good enough decline, because now he moves on to trying to improve our workflow:

[Our solution] would be more focused around technology, rather than translation services. We partner with a number of in-house translation groups, providing technology to facilitate workflow management, translation memory management, etc.

This email we now follow up with the usual “As mentioned in the previous email, our current system meets our requirements” answer. So now he knows that we don’t need translation, and that we don’t need workflow management. Oh, this agency has more to offer, because now we are approached from a different angle:

I was thinking about our exchange and [your] technical documentation. XXX offers a software tool (plug in to MS Word), that allows our clients’ technical writers to actually reduce the amount of English content. It helps reduce content that requires translation decreasing translation costs, accelerating turn times, as well as lowering printing and shipping costs.

Our clients typically see ROI the first time that they use it.

[...] the tool is completely independent of services - concretely meaning that you can use the tool and change nothing with your current processes.

Awesome, my boss actually forwarded it to me with the words “Sounds like magic”. Not sure if there was another reply from our side, it would have been interesting to see what else this person had to offer.

A week later I am in the exhibition hall of the ATA Conference, and I actually see a booth of this translation agency. Curious as I am, I think that this is a great moment to have a look at these highly praised management tools without giving a sales person the hope of making a sale. I walk up to the person manning the stand - and she has no idea what I am talking about. I try to explain. She looks at me and asks something like “Do you mean Catalyst?” (which they apparently just acquired). Of course I don’t, but I take it as an indicator that this isn’t going anywhere and most certainly is not leading to a demo of the management tools.

Is there a conclusion to this story? Naaa, not really. I am getting ready for the long weekend - and this is a blog, what did you expect?
