Thread: Perl's handling of foreign (Polish) characters?

  1. #1

    Thread Starter
    Hyperactive Member CaptainPinko's Avatar
    Join Date
    Jan 2001
    Location
    London, Ontario, Canada
    Posts
    332

    Perl's handling of foreign (Polish) characters?

    I am trying to write a script to anglicise Polish text by replacing accented characters with their English counterparts, but I am encountering some really odd behaviour.

    This code (coloured to look like Kate):
    Code:
    sub anglicise ($){
    	$_ = $_[0];
    	print  "ang: $_\n" ;
    	tr/?????Ó????????ó???/ACELNOSZZacelnoszz/ ;
    	print  "anged: $_\n" ;
    	return;
    }  #end sub anglicise ()
    print  "returned: " .anglicise ("?????ó???");
    print  "\n" ; exit(0);
    produces this:
    ang: ?????ó???
    anged: AzAzAzSzSCczSzSzSz
    returned:


    [edit]

    <rant>
    *sigh* this site doesn't seem to like foreign characters either... but then again this is not surprising since it doesn't even specify a character set:


    <meta http-equiv="MSThemeCompatible" content="Yes">
    That "Yes" part would have made me laugh if it weren't so sad.
    </rant>


    The garbled part of the tr// is the accented versions of ACELNOSZZ, the upper case followed by the lower case. The argument to anglicise() is just the lower-case letters. The string starting with "ang: " is all the lower-case letters, correctly displayed. The string starting with "anged: " is shown exactly as it actually appears, and strangely there is nothing after "returned: ".
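
    A likely explanation, sketched under the assumption that the script file is saved as UTF-8: without use utf8; Perl reads each accented letter in the tr// as two separate bytes, so the search list is twice as long as the replacement list and individual bytes get mapped instead of characters. The empty "returned: " would come from the bare return; which hands nothing back to the caller. A minimal version that declares the source encoding and returns the converted string could look like this (the $text variable and the binmode call are additions for illustration, not part of the original script):

```perl
use strict;
use warnings;
use utf8;                              # assumption: this file is saved as UTF-8
binmode(STDOUT, ':encoding(UTF-8)');   # encode characters as UTF-8 on output

sub anglicise {
    my ($text) = @_;
    # Under "use utf8" both lists hold 18 characters, so each accented
    # letter maps to exactly one plain letter.
    $text =~ tr/ĄĆĘŁŃÓŚŹŻąćęłńóśźż/ACELNOSZZacelnoszz/;
    return $text;                      # return the converted string explicitly
}

print "returned: ", anglicise("ąćęłńóśźż"), "\n";
```

    Run from a UTF-8 terminal, this should print "returned: acelnoszz".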
    "There are only two things that are infinite. The universe and human stupidity... and the universe I'm not sure about." - Einstein

    If you are programming in Java use www.NetBeans.org

  2. #2

    Thread Starter
    Hyperactive Member CaptainPinko's Avatar
    Join Date
    Jan 2001
    Location
    London, Ontario, Canada
    Posts
    332
    Since Polish is based on the Latin alphabet (as opposed to the large number of Slavic countries that use Cyrillic), uses few special characters, and the country is fairly central, I assumed (yeah, I know: assume makes an "a.ss" of "u" and "me") that it would be covered by the ISO 8859-1 character set.
    http://web.archive.org/web/200302072...tml#ISO-8859-2. Googling for ISO-8859-2, I found an image of the character set that lists all my characters.

    After finding the link http://gershwin.ens.fr/vdaniel/Doc-L...ml#DESCRIPTION I added:
    Code:
    use POSIX qw(locale_h);
    setlocale (LC_CTYPE, "pl_PL.utf8");
    to the top of my script. Perl doesn't complain about the syntax (surprising, since I have use strict; on and was expecting a complaint about LC_CTYPE not being declared), but nothing changed with regard to the output.
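
    On the setlocale() attempt above: use POSIX qw(locale_h); imports setlocale() together with the LC_* constants, which would explain why use strict; stays quiet. The call also fails silently, returning undef when the requested locale is not installed, and as far as I know LC_CTYPE does not change how tr// reads the letters typed into the script, so no change in output is to be expected. A small sketch for checking whether the locale was actually set (try_locale is a hypothetical helper name, not part of POSIX):

```perl
use strict;
use warnings;
use POSIX qw(locale_h);   # exports setlocale() and the LC_* constants

# Hypothetical helper: returns the locale name on success, undef on failure.
sub try_locale {
    my ($name) = @_;
    my $result = setlocale(LC_CTYPE, $name);
    return $result;
}

my $wanted = "pl_PL.utf8";
my $got    = try_locale($wanted);
if (defined $got) {
    print "LC_CTYPE set to $got\n";
} else {
    print "locale $wanted is not available on this system\n";
}
```

    Which branch runs depends on the locales installed on the machine.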

  3. #3
    Frenzied Member Jop's Avatar
    Join Date
    Mar 2000
    Location
    Amsterdam, the Netherlands
    Posts
    1,986
    Do you specify a charset in your HTML document itself? And have you configured the server to send out files as UTF-8?

    Try adding
    Code:
    AddDefaultCharset utf-8
    to your .htaccess file and I think that'll solve your problem.

    Good luck!
    Jop - validweb.nl

    Alcohol doesn't solve any problems, but then again, neither does milk.

  4. #4

    Thread Starter
    Hyperactive Member CaptainPinko's Avatar
    Join Date
    Jan 2001
    Location
    London, Ontario, Canada
    Posts
    332
    I'm still running this locally on my computer, calling Perl directly and passing the file as an argument (the next step). No server is involved here. Thanks for the response though.
