Perl's handling of foreign (Polish) characters?
I am trying to write a script to anglicise Polish text by replacing accented characters with their English counterparts, but I am encountering some really odd behaviour.
This code (coloured to look like Kate):
Code:
sub anglicise ($){
$_ = $_[0];
print "ang: $_\n" ;
tr/?????Ó????????ó???/ACELNOSZZacelnoszz/ ;
print "anged: $_\n" ;
return;
} #end sub anglicise ()
print "returned: " .anglicise ("?????ó???");
print "\n" ; exit(0);
produces this:
Quote:
ang: ?????ó???
anged: AzAzAzSzSCczSzSzSz
returned:
[edit]
<rant>
*sigh* this site doesn't seem to like foreign characters either... but then again this is not surprising since it doesn't even specify a character set:
Quote:
<meta http-equiv="MSThemeCompatible" content="Yes">
That "Yes" part would make me laugh if it weren't so sad.
</rant>
The garbled part of the tr// is the accented versions of ACELNOSZZ, upper case followed by lower case. The argument to anglicise() is just the lower-case letters. The string starting with "ang: " shows all the lower-case letters displayed correctly. The string starting with "anged: " is exactly how the output appears, and, strangely, nothing at all is printed after "returned: ".
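One likely explanation, assuming the script is saved as UTF-8 and does not say `use utf8`: Perl then reads each accented letter in the tr// as two bytes, so the two lists no longer line up and the translation runs byte by byte. Working through the byte values under that assumption reproduces the "anged:" line above exactly (for example, ą is the bytes C4 85, which map to "A" and "z"). The empty "returned:" would follow from the bare `return;`, which hands back nothing. A minimal sketch of a character-based version, not necessarily the only fix:

```perl
use strict;
use warnings;
use utf8;                              # treat literals in this file as characters, not bytes
binmode(STDOUT, ':encoding(UTF-8)');   # encode characters as UTF-8 on output

sub anglicise {
    my ($text) = @_;
    # Each accented letter is now a single character, so the lists pair up 1:1.
    $text =~ tr/ĄĆĘŁŃÓŚŹŻąćęłńóśźż/ACELNOSZZacelnoszz/;
    return $text;                      # return the translated string explicitly
}

print "returned: ", anglicise("ąćęłńóśźż"), "\n";   # prints "returned: acelnoszz"
```

If the text comes from a file or STDIN rather than from a literal in the script, it would also need decoding on input (for instance with an `:encoding(UTF-8)` layer on the input handle), otherwise the same byte-wise mismatch can reappear.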