-
Aug 2nd, 2004, 03:47 PM
#1
Thread Starter
Hyperactive Member
Perl's handling of foreign (Polish) characters?
I am trying to write a script to anglicise Polish text by replacing accented characters with their English counterparts, but I am encountering some really odd behaviour.
This code (coloured to look like Kate):
Code:
sub anglicise ($){
$_ = $_[0];
print "ang: $_\n" ;
tr/?????Ó????????ó???/ACELNOSZZacelnoszz/ ;
print "anged: $_\n" ;
return;
} #end sub anglicise ()
print "returned: " .anglicise ("?????ó???");
print "\n" ; exit(0);
produces this:
ang: ?????ó???
anged: AzAzAzSzSCczSzSzSz
returned:
[edit]
<rant>
*sigh* this site doesn't seem to like foreign characters either... but then again this is not surprising since it doesn't even specify a character set:
<meta http-equiv="MSThemeCompatible" content="Yes">
That "Yes" part would make me laugh if it wasn't so sad.
</rant>
The garbled part of the tr// is the accented versions of ACELNOSZZ, the upper case followed by the lower case. The argument to anglicise() is just the lower-case letters. The string starting with "ang: " is all the lower-case letters, correctly displayed. The string starting with "anged: " is shown exactly as it actually appears, and strangely there is nothing after the "returned: ".
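A possible explanation, offered as a sketch rather than a confirmed diagnosis: the "anged: " line has 18 characters for 9 input letters, which is what you would expect if tr// were working on the two UTF-8 bytes of each letter instead of on whole characters. The empty "returned: " is consistent with the bare return;, which returns nothing. The version below assumes the script is saved as UTF-8 and the terminal expects UTF-8. The accented letters are a reconstruction of the garbled list from the description above, and copying the argument into a lexical instead of $_ is a choice made for this sketch, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # treat the literals in this file as UTF-8 characters, not bytes

# Assumption: the terminal expects UTF-8 output.
binmode(STDOUT, ':encoding(UTF-8)');

# Replace accented Polish letters with their unaccented counterparts.
sub anglicise {
    my ($text) = @_;    # work on a copy instead of clobbering $_
    $text =~ tr/ĄĆĘŁŃÓŚŹŻąćęłńóśźż/ACELNOSZZacelnoszz/;
    return $text;       # a bare "return;" would hand back nothing
}

print "returned: ", anglicise("ąćęłńóśźż"), "\n";
```

With both lists seen as 18 characters each, every accented letter maps to exactly one replacement, so the call above should print "returned: acelnoszz".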
"There are only two things that are infinite. The universe and human stupidity... and the universe I'm not sure about." - Einstein
If you are programming in Java use www.NetBeans.org
-
Aug 2nd, 2004, 09:50 PM
#2
Thread Starter
Hyperactive Member
Since Polish is based on a Latin alphabet (as opposed to a large number of Slavic countries using Cyrillic), uses few special characters and the country is fairly central, I assumed (yeah, I know: assume makes "a.ss" of "u" and "me") that it would be covered by the ISO 8859-1 character set. It turns out it is covered by ISO-8859-2 instead:
http://web.archive.org/web/200302072...tml#ISO-8859-2. And Googling for ISO-8859-2, I found this image of the character set that lists all my characters.
After finding the link http://gershwin.ens.fr/vdaniel/Doc-L...ml#DESCRIPTION
I added:
Code:
use POSIX qw(locale_h);
setlocale (LC_CTYPE, "pl_PL.utf8");
to the top of my script... and while Perl doesn't complain about the syntax (surprising, since I have use strict; on and was expecting a complaint about LC_CTYPE not being declared), nothing changed with regard to the output.
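On the strict question: use POSIX qw(locale_h); imports setlocale() and the LC_* constants, so LC_CTYPE is a declared name and strict has nothing to object to. setlocale() returns undef when the requested locale is not installed, so checking the return value at least shows whether the call took effect. A minimal sketch, with the variable name $locale made up for illustration; whether pl_PL.utf8 exists depends on the system, so no particular output is assumed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(locale_h);    # exports setlocale() and LC_CTYPE, LC_ALL, ...

# setlocale() returns the name of the locale now in effect, or undef on failure.
my $locale = setlocale(LC_CTYPE, "pl_PL.utf8");

if (defined $locale) {
    print "LC_CTYPE is now: $locale\n";
} else {
    print "pl_PL.utf8 is not available on this system\n";
}
```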
-
Aug 3rd, 2004, 07:12 AM
#3
Frenzied Member
Do you specify a charset from your html document itself? And have you configured the server to send out files as UTF-8?
Try adding
Code:
AddDefaultCharset utf-8
to your .htaccess file and I think that'll solve your problem.
Good luck!
Jop - validweb.nl
Alcohol doesn't solve any problems, but then again, neither does milk.
-
Aug 3rd, 2004, 02:16 PM
#4
Thread Starter
Hyperactive Member
I'm still running this locally on my computer, calling Perl directly and passing the file as an argument (the next step). No server is involved here. Thanks for the response though.