
Thread: Internationalization questions and the inner life of ó

  1. #1

    Thread Starter
    Junior Member
    Join Date
    May 2009
    Posts
    29

    Internationalization questions and the inner life of ó

    Well it has taken me most of the day to figure out how to even ask these questions. I am internationalizing one of my applications. For the moment, I am just interested in western European languages. I have integrated the Google Translator to assist my users in making their own translations. Translation of all the strings in the app can be stored and modified. I started the testing with Spanish because I am reasonably comfortable with it.

    The translations are not too bad and the speed is surprising even though the translation requests are made over the net. The first hundred or so translations went well and were properly displayed in my textbox; for example, "ñ" and "é". I thought I was home free until I tried the phrase "go to the next past due item". The following string was returned: "Ir al tema de los prÃ³ximos en situaciÃ³n de mora". "Ã³" should be "ó". I inspected the incoming packet with Wireshark and found what appeared to be mixed single-byte and multibyte chars.

    Code:
     00000807h: 49 72 20 61 6C 20 74 65 6D 61 20 64 65 20 6C 6F ; Ir al tema de lo
     00000817h: 73 20 70 72 C3 B3 78 69 6D 6F 73 20 65 6E 20 73 ; s próximos en s
     00000827h: 69 74 75 61 63 69 C3 B3 6E 20 64 65 20 6D 6F 72 ; ituación de mor
     00000837h: 61                                              ; a
    A little research showed that the reply is probably UTF-8 encoded, which VB apparently does not handle on its own. Then I discovered that English and Spanish use the same code page, so it should be possible to display the necessary characters using the Windows-1252 code page (a close relative of ISO-8859-1, though not identical to it). My solution was to replace the multibyte sequences that are returned for the accented chars "á", "í", "ó" and "ú" with their single-byte equivalents. Although functional, this completely lacks elegance. Additionally, I will have to figure out what all the possible multibyte chars are for each of the western European languages.
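
    To make the byte-level picture concrete, here is a minimal sketch of what seems to happen to the pair C3 B3 from the capture when it is decoded as ANSI rather than UTF-8. The procedure name is made up for illustration, and the result noted in the comments assumes the system ANSI code page is Windows-1252.

    ```vb
    ' Hypothetical demo: decode the two UTF-8 bytes of "ó" as ANSI text.
    Private Sub ShowMisdecodedBytes()
        Dim utf8(0 To 1) As Byte
        Dim s As String

        utf8(0) = &HC3   ' first byte of the UTF-8 sequence for "ó"
        utf8(1) = &HB3   ' second byte of the sequence

        ' StrConv with vbUnicode treats each byte as one character of the
        ' system ANSI code page, so the two-byte sequence is split in two.
        s = StrConv(utf8, vbUnicode)

        ' Assuming Windows-1252: Len(s) is 2 and the characters are
        ' U+00C3 and U+00B3, which display as "Ã³".
        Debug.Print Len(s)
        Debug.Print Hex$(AscW(Mid$(s, 1, 1))), Hex$(AscW(Mid$(s, 2, 1)))
    End Sub
    ```

    Plain ASCII bytes such as 49 72 ("Ir") are identical in UTF-8 and Windows-1252, which is why the stream looks like a mix of single-byte and multibyte characters.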

    My Questions
    1. Have I got this figured out right?
    2. Is there an alternative (better) method?
    3. What have I missed here?
    4. Is there a solution that does not involve determining in advance each multibyte char and creating a substitution table for them?
    5. Does anyone have a list of known exceptions for western European languages?
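
    One possible route that avoids a substitution table is to let Windows decode the UTF-8 bytes in a single call. The sketch below uses the Win32 MultiByteToWideChar API with the CP_UTF8 code page. The helper name Utf8ToString and the demo procedure are hypothetical, and it assumes the raw response is available as a dimensioned Byte array rather than as a VB String.

    ```vb
    Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
        ByVal CodePage As Long, ByVal dwFlags As Long, _
        ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
        ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

    Private Const CP_UTF8 As Long = 65001

    ' Hypothetical helper: convert UTF-8 bytes to a VB (UTF-16) string.
    ' Assumes utf8() has already been dimensioned and filled.
    Public Function Utf8ToString(utf8() As Byte) As String
        Dim cb As Long, cch As Long

        cb = UBound(utf8) - LBound(utf8) + 1

        ' First call: ask how many UTF-16 characters the result needs.
        cch = MultiByteToWideChar(CP_UTF8, 0, VarPtr(utf8(LBound(utf8))), cb, 0, 0)
        If cch = 0 Then Exit Function   ' empty input or conversion failure

        ' Second call: convert into a pre-sized string buffer.
        Utf8ToString = String$(cch, vbNullChar)
        MultiByteToWideChar CP_UTF8, 0, VarPtr(utf8(LBound(utf8))), cb, _
                            StrPtr(Utf8ToString), cch
    End Function

    Private Sub DemoUtf8ToString()
        Dim b(0 To 1) As Byte
        b(0) = &HC3: b(1) = &HB3
        Debug.Print AscW(Utf8ToString(b))   ' 243, i.e. U+00F3 ("ó")
    End Sub
    ```

    The standard VB6 TextBox is an ANSI control, so this should cover any character that exists in the system code page, which includes the western European accents on a Windows-1252 machine. Characters outside that code page would still not display there.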

  2. #2

    Thread Starter
    Junior Member
    Join Date
    May 2009
    Posts
    29

    Re: Internationalization questions and the inner life of ó

    Incompetent fingers
    Last edited by alweis; Nov 17th, 2009 at 09:20 PM. Reason: Incompetent fingers

  3. #3

    Thread Starter
    Junior Member
    Join Date
    May 2009
    Posts
    29

    Re: Internationalization questions and the inner life of ó

    After doing some more thinking I realized that I can ask the server for a specific code page with the Accept-Charset HTTP header. For example:

    Code:
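        ' HTTP header lines are separated by CRLF; without it the three
        ' headers below run together into a single malformed line.
        Private Const Headers = "Accept-Charset: windows-1252" & vbCrLf _
                        & "Referer: http://www.trackpro.org" & vbCrLf _
                        & "Accept: text/plain" & vbCrLf
    
        hOpen = InternetOpen(UserAgent, INTERNET_OPEN_TYPE_PRECONFIG,  vbNullString, vbNullString, 0)
        hOpenUrl = InternetOpenUrl(hOpen, Url, Headers, Len(Headers), INTERNET_FLAG_RELOAD, 0)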
    At least for Spanish this resolves the issue. I suppose the next step is to build the header from a variable: pick up the code page of the user's computer and insert it.
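
    A rough sketch of that next step, using the Win32 GetACP API to read the system ANSI code page. The function name BuildHeaders is hypothetical. The "windows-" prefix is only assumed to be a valid charset name for code pages 1250 through 1258, so anything else falls back to asking for UTF-8, and a server is free to ignore Accept-Charset altogether.

    ```vb
    Private Declare Function GetACP Lib "kernel32" () As Long

    ' Hypothetical helper: build the request headers for the current machine.
    Private Function BuildHeaders() As String
        Dim cp As Long
        Dim charset As String

        cp = GetACP()   ' system ANSI code page, e.g. 1252

        ' Assumption: only the 125x family maps cleanly to "windows-NNNN".
        If cp >= 1250 And cp <= 1258 Then
            charset = "windows-" & CStr(cp)
        Else
            charset = "utf-8"
        End If

        BuildHeaders = "Accept-Charset: " & charset & vbCrLf & _
                       "Referer: http://www.trackpro.org" & vbCrLf & _
                       "Accept: text/plain" & vbCrLf
    End Function
    ```

    The result would then be passed in place of the constant, with Len(BuildHeaders()) as the length argument. If the fallback branch is taken, the reply would need to be decoded from UTF-8 before display.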
