Language decision script - always return english.


StrangerInBeijing
Mar 13th, 2007, 05:03 AM
Hi,
This is my first ever "real" PHP script, so excuse any stupid code.
Unfortunately the thing always returns English, even if I'm on a Chinese computer.
/* Get the language that should be used.
First check cookies, as this may be set from a previous session.
Then check browser language. Can only be one of the supported languages.
Default to English, and always set the cookie again. */
function get_user_lang()
{
$supported_languages = array('en-us' => 'English', 'zh-cn' => 'Chinese', 'ja-jp' => 'Japanese');
$lang = '';
if ( isset($_COOKIE["lang"]) )
$lang = strtolower( $_COOKIE["lang"] );
if ( array_key_exists( $lang,$supported_languages )== false && isset( $_SERVER["HTTP_ACCEPT_LANGUAGE"] ) )
$lang = strtolower( $_SERVER["HTTP_ACCEPT_LANGUAGE"] );
if ( array_key_exists( $lang,$supported_languages ) == false )
$lang = 'en-us';
setcookie( "lang", $lang, time()+31536000 );
return $lang;
}

The idea is that if the user has been here before, we use the same language he used then.
If not, we use the browser language.
Since we only support English, Japanese and Chinese, we fall back to English for any other language.

kows
Mar 13th, 2007, 10:25 AM
I looked and saw that the variable $_SERVER['HTTP_ACCEPT_LANGUAGE'] isn't in the format you need it to be in; you'll have to manipulate it to get the data you want. I have a default install of Firefox and my preferred languages were "English-US" and "English." Together these produced the string en-us,en;q=X (where X was some number). After adding more languages, the pattern is that the languages are separated by commas and every language after the first carries a semicolon followed by q, which I have to assume is some way of telling you what priority it has, because it gradually goes down for each language you have (the first language that had this, 'en', was .8, while the next, 'en-ca', was .6, 'zh-cn' was .4 and 'ja' was .2; these values all changed when adding or removing languages, too). Anyway, I used this script, with a modified version of your function, to find out what was happening:
<?php
function get_user_lang(){
$languages = array('en-us' => 'English', 'zh-cn' => 'Chinese', 'ja-jp' => 'Japanese');
$lang = (isset($_COOKIE['lang']) && isset($languages[$_COOKIE['lang']])) ? $_COOKIE['lang'] : ((isset($_SERVER['HTTP_ACCEPT_LANGUAGE'], $languages[$_SERVER['HTTP_ACCEPT_LANGUAGE']])) ? strtolower($_SERVER['HTTP_ACCEPT_LANGUAGE']) : 'en-us');
setcookie('lang', $lang, time() + 31536000);
return $lang;
}
echo '<pre>';
if(isset($_COOKIE['lang']))
echo 'cookie is set as ' . $_COOKIE['lang'] . "\n\n";
if(isset($_SERVER['HTTP_ACCEPT_LANGUAGE']))
echo 'browser language is ' . $_SERVER['HTTP_ACCEPT_LANGUAGE'] . "\n\n";
echo get_user_lang();
echo '</pre>';
?>
this output the following after I added some languages to my browser's list of appropriate languages:
cookie is set as en-us

browser language is en-us,en;q=0.8,en-ca;q=0.6,zh-cn;q=0.4,ja;q=0.2

en-us
as you can see, that string would never match anything in your original function. it would skip to using en-us by default because no key in your array matches the whole string. so, instead, we'll just need to grab the first language set for the browser (ie: the most preferred) by exploding it. (also, the japanese key in your array is ja-jp, but I couldn't find this selection in my list of languages; "ja" was the only thing under japanese. furthermore, there were many versions of chinese there, over five I believe, including just "zh" without a specific regional variant (the one you picked, zh-cn, is mainland China's chinese). I'm not sure why you picked them, but you may want to revise your array, because anyone picking any of the others would get english instead of chinese, unless they picked "zh-cn".)

here is a revised function that will use the 'lang' cookie if found, but otherwise will use the most preferred language in your browser's options if it is a supported language. note that in the headers I saw, the preferred language comes first and never has a semicolon in it, so we need only take the first element of the header after splitting it on commas. so, we can do this:
<?php
function get_user_lang(){
$languages = array('en-us' => 'English', 'zh-cn' => 'Chinese', 'ja-jp' => 'Japanese');
//find out whether or not the user has a cookie set telling us what language to use
$lang = (isset($_COOKIE['lang']) && isset($languages[$_COOKIE['lang']])) ? $_COOKIE['lang'] : '';
//language was not previously set, let's parse the browser's language
if($lang == '' && isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])){
list($preferred) = explode(",", strtolower($_SERVER['HTTP_ACCEPT_LANGUAGE']));
$preferred = trim($preferred);
//if the language is supported, use it; otherwise, use the default
$lang = (isset($languages[$preferred])) ? $preferred : 'en-us';
}
//no cookie and no Accept-Language header at all: fall back to the default
if($lang == '')
$lang = 'en-us';
setcookie('lang', $lang, time() + 31536000);
return $lang;
}
echo '<pre>';
if(isset($_COOKIE['lang']))
echo 'cookie is set as ' . $_COOKIE['lang'] . "\n\n";
if(isset($_SERVER['HTTP_ACCEPT_LANGUAGE']))
echo 'browser language is ' . $_SERVER['HTTP_ACCEPT_LANGUAGE'] . "\n\n";
echo get_user_lang();
echo '</pre>';
?>
hope that made sense to you @_@.

CornedBee
Mar 13th, 2007, 04:47 PM
The rules for matching accept-language are clearly defined in the RFC (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html).

In plain speech, it works like this:
The accept-language header is a series of language tags, separated by commas. Each entry consists of the main language, an optional variation (such as a country), and an optional preference (the q value). If the preference is omitted, it is defined to be 1.0. If the accept-language header is omitted completely, all languages are assumed to be equally acceptable.

Once you have split the header into the individual language tags, you sort them by their preference, descending. This gives you a list like (for kows's header):
en-us
en
en-ca
zh-cn
ja

Note that just because Firefox sends the languages in descending order, that doesn't mean all browsers do.
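
The split-and-sort step above could be sketched roughly like this. It is only a minimal sketch: parse_accept_language is a made-up name, it assumes PHP 5.3 or later for the closure, and it drops entries with q=0 because the RFC treats those as not acceptable.

```php
<?php
// Hypothetical helper (not from this thread): split an Accept-Language
// header into language tags, ordered by descending preference (q value).
function parse_accept_language($header)
{
    $entries = array();
    foreach (explode(',', strtolower($header)) as $position => $part) {
        $pieces = explode(';', $part);
        $tag = trim($pieces[0]);
        if ($tag === '') {
            continue; // skip empty elements, e.g. from a trailing comma
        }
        $q = 1.0; // a tag without an explicit preference counts as 1.0
        for ($i = 1; $i < count($pieces); $i++) {
            $param = trim($pieces[$i]);
            if (strpos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($q <= 0) {
            continue; // q=0 means "not acceptable"
        }
        $entries[] = array('tag' => $tag, 'q' => $q, 'pos' => $position);
    }
    // Sort by q descending; keep the header order for equal q values.
    usort($entries, function ($a, $b) {
        if ($a['q'] != $b['q']) {
            return ($a['q'] < $b['q']) ? 1 : -1;
        }
        return $a['pos'] - $b['pos'];
    });
    $tags = array();
    foreach ($entries as $entry) {
        $tags[] = $entry['tag'];
    }
    return $tags;
}

// kows's header from earlier in the thread
$sorted = parse_accept_language('en-us,en;q=0.8,en-ca;q=0.6,zh-cn;q=0.4,ja;q=0.2');
echo implode("\n", $sorted), "\n";
```

For kows's header this prints the five tags in the order listed above. Because the sort uses q and not the position in the header, a browser that sends its languages in a different order still ends up with the same list.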

When you have that descending list, you also need a list of available languages. Let's say you have the document in US English, British English, and a generic German version. In that case, your language list is
en-us
en-gb
de

This list doesn't usually have preferences attached, although it could, if for example some versions are computer-translated. However, that makes the matching algorithm absurdly complicated, so we'll treat all of them the same.

Now you walk the requested languages in order. The first one is
en-us
This is available, so you serve it.

Now in comes my request. My language headers are, already split and sorted:
de-at
en
de

The first is de-at. You don't have that. Your generic German version is not allowed to match my specific request for an Austrian version. This is very important, if only because so many systems (Microsoft's knowledge base app, for example) get it wrong. Microsoft happily serves me auto-translated (and therefore useless) generic German versions over the original en-us because they're too stupid to read an RFC.

OK, rant over.

After you've done the right thing and ignored my de-at, the next tag is en. Generic language requests are allowed to match specific available versions, so your server can do three things:
1) It can send back the US version.
2) It can send back the UK version.
3) It can respond with a 300 "Multiple Choices" status and present a page where I can choose between the UK and US versions.
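
The walk through the requested languages described above might look something like the following sketch. The function names are invented for illustration, and the tags are assumed to be lowercase and already sorted by preference. It returns every available version matching the first request that can be satisfied, so the caller can either serve the first one or offer a choice.

```php
<?php
// Hypothetical helper: a requested range matches an available tag if it is
// identical, or if it is a prefix of the tag followed by "-".
// So "en" matches "en-us", but "de-at" does not match "de".
function language_range_matches($range, $tag)
{
    return $range === $tag || strpos($tag, $range . '-') === 0;
}

// Hypothetical helper: walk the requested languages in order and return all
// available versions that match the first request that can be satisfied.
// An empty array means nothing matched at all.
function negotiate_language(array $requested, array $available)
{
    foreach ($requested as $range) {
        $matches = array();
        foreach ($available as $tag) {
            if (language_range_matches($range, $tag)) {
                $matches[] = $tag;
            }
        }
        if (count($matches) > 0) {
            return $matches;
        }
    }
    return array();
}

$available = array('en-us', 'en-gb', 'de');

// kows's request: the exact match en-us wins immediately.
print_r(negotiate_language(array('en-us', 'en', 'en-ca', 'zh-cn', 'ja'), $available));

// The de-at, en, de request: de-at is skipped, en matches both English versions.
print_r(negotiate_language(array('de-at', 'en', 'de'), $available));
```

With one match you just serve it. With several, taking the first element corresponds to options 1 and 2, and showing the whole list corresponds to option 3.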

Third scenario. A Canadian purist comes along with his browser set to this accept-language: "en-ca,fr-ca". This guy only wants Canadian versions, but he's fine with English or French - as long as it's the Canadian variant.
Your server is free to use either en-ca or fr-ca first. However, neither is matched. Your server has two choices.
1) Send a 406 "Not Acceptable" error to the client, which tells him that you anti-Canadian swine don't offer a Canadian version. However, you should also send a list of the languages you DO offer, linked in such a way that it overrides your server error.
2) Send any other version back (preferably one that matches a major-language subtag of his accept-language header). This is valid, but not recommended by the RFC.
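
Both choices for the no-match case can be sketched as below. All names are hypothetical, and the "?lang=" link parameter is just an assumed way of letting the user override the negotiation, for example by setting the lang cookie from it.

```php
<?php
// All names below are made up for illustration; tags are assumed lowercase.

// Primary (major) language subtag: "en" for "en-ca".
function primary_subtag($tag)
{
    $parts = explode('-', $tag);
    return $parts[0];
}

// Choice 2: pick an available version that shares a primary subtag with one
// of the requested languages, or the site default if none does.
function fallback_language(array $requested, array $available, $default)
{
    foreach ($requested as $range) {
        foreach ($available as $tag) {
            if (primary_subtag($range) === primary_subtag($tag)) {
                return $tag;
            }
        }
    }
    return $default;
}

// Choice 1: build the body of a 406 page that lists what is on offer.
// The caller would send the status itself, e.g. with http_response_code(406)
// (PHP 5.4+), before printing this body.
function not_acceptable_body(array $available)
{
    $lines = array('No version matches your language settings. Available versions:');
    foreach ($available as $tag) {
        $lines[] = '<a href="?lang=' . urlencode($tag) . '">' . htmlspecialchars($tag) . '</a>';
    }
    return implode("\n", $lines);
}

$requested = array('en-ca', 'fr-ca');
$available = array('en-us', 'en-gb', 'de');

echo fallback_language($requested, $available, 'en-us'), "\n";
echo not_acceptable_body($available), "\n";
```

For the Canadian request the fallback lands on en-us, because en-ca and en-us share the primary subtag en.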

I hope this clears things up. Now go and do the right thing.

StrangerInBeijing
Mar 13th, 2007, 05:40 PM
Thanks a lot guys. This really helped a lot to clear things up.

Where can I get a list of all possible languages?
For instance, I realize now that I want to go to the Chinese version of my site if the language is any form of Chinese. The same goes for Japanese. Any other language must go to English.
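
One rough way to get that behaviour is to look only at the part of the tag before the first "-". This is a sketch under assumptions: site_language_for is a made-up name, and the site versions are taken to be the three keys from my original array. Note that it deliberately ignores the strict matching rule explained above, since here any Chinese or Japanese variant is wanted.

```php
<?php
// Hypothetical helper: map any browser language tag to one of the three
// site versions by looking only at its primary subtag.
function site_language_for($tag, $default = 'en-us')
{
    // primary subtag => site version (keys from the original script)
    $site_languages = array(
        'zh' => 'zh-cn',
        'ja' => 'ja-jp',
        'en' => 'en-us',
    );
    $parts = explode('-', strtolower(trim($tag)));
    $primary = $parts[0];
    return isset($site_languages[$primary]) ? $site_languages[$primary] : $default;
}

echo site_language_for('zh-TW'), "\n"; // zh-cn
echo site_language_for('ja'), "\n";    // ja-jp
echo site_language_for('de-at'), "\n"; // en-us
```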

kows
Mar 13th, 2007, 09:03 PM
Quote: The rules for matching accept-language are clearly defined in the RFC (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html).

thanks for the simple write-up! I'd never done anything with languages and was on a short timeline for the post I made, so I quickly checked out the information at hand and went with it. it made a lot of sense though, and I probably wouldn't have taken the time to look through that RFC document unless I was doing my own project with multiple languages.