Internationalization questions and the inner life of Ã³

**alweis** · Nov 17th, 2009, 07:32 PM

Well, it has taken me most of the day to figure out how to even ask these questions.

I am internationalizing one of my applications. For the moment, I am just interested in western European languages. I have integrated the Google Translator to assist my users in making their own translations. Translation of all the strings in the app can be stored and modified. I started the testing with Spanish because I am reasonably comfortable with it.

The translations are not too bad, and the speed is surprising even though the translation requests are made over the net. The first hundred or so translations went well and were properly displayed in my textbox, including characters such as "ñ" and "é". I thought I was home free until I tried the phrase "go to the next past due item". The following string was returned: "Ir al tema de los prÃ³ximos en situaciÃ³n de mora". Each "Ã³" should be "ó". I inspected the incoming packet with Wireshark and found what appeared to be a mix of single-byte and multibyte chars.

Code:

 00000807h: 49 72 20 61 6C 20 74 65 6D 61 20 64 65 20 6C 6F ; Ir al tema de lo
 00000817h: 73 20 70 72 C3 B3 78 69 6D 6F 73 20 65 6E 20 73 ; s prÃ³ximos en s
 00000827h: 69 74 75 61 63 69 C3 B3 6E 20 64 65 20 6D 6F 72 ; ituaciÃ³n de mor
 00000837h: 61                                              ; a

A little research taught me that the response is probably UTF-8 encoded, which VB apparently does not handle on its own. Then I discovered that English and Spanish use the same code page, so it should be possible to display the necessary characters using the Windows-1252 code page (which is close to, but not identical to, ISO-8859-1). My solution was to replace the multibyte sequences that are returned for the accented chars "á", "í", "ó" and "ú" with their single-byte equivalents. Although functional, this completely lacks elegance. Additionally, I will have to figure out what all the possible multibyte chars are for each of the western European languages.
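To illustrate what I think is going on, here is a minimal sketch in Python rather than VB, since it makes the byte handling easy to see. It assumes the response really is UTF-8 and that the garbling comes from reading each byte as a Windows-1252 character. The bytes are the ones for "próximos" from the packet dump above; the variable names are mine.

```python
# Bytes for "proximos" with the accented o, copied from the packet dump.
raw = bytes.fromhex("70 72 C3 B3 78 69 6D 6F 73")

# Assumption: the app treats every byte as one Windows-1252 character.
# That turns the two-byte sequence C3 B3 into two separate characters.
garbled = raw.decode("cp1252")

# Decoding the same bytes as UTF-8 treats C3 B3 as a single character.
correct = raw.decode("utf-8")

# A string that was already garbled this way can be turned back into the
# original bytes and decoded again, as long as every character in it
# maps back to a Windows-1252 byte.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled)
print(correct)
print(repaired == correct)
```

If this picture is right, the garbled and correct strings are the same bytes seen through two different decodings, which would suggest the fix belongs in the decoding step rather than in a per-character substitution table.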

My Questions

Have I got this figured out right?
Is there an alternative (better) method?
What have I missed here?
Is there a solution that does not involve determining in advance each multibyte char and creating a substitution table for them?
Does anyone have a list of known exceptions for western European languages?
