Jump to Navigation

We've moved! The new address is http://www.henriettes-herb.com - update your links and bookmarks!

Character encoding woes

Blog categories:

Just in case somebody else has problems with this ...

It's mysql 4.x, mysql 5, knoda, php, utf-8 and latin-1 (which is also called iso-8859-1). And what it really spells is trouble.

I've switched my database from paradox 8 (running under win4lin on suse 9.1) to mysql and knoda 0.8.2 (running on suse 10.1). Knoda rocks.

(If you want the latest version of knoda to run on Suse 10.1, don't try to install from source - go for the opensuse backports repository instead, and install both hk_classes and knoda from the supplied rpm files. Trust me on this, it'll save you hours of "bloody, it needs that too?" - and you'll end up with an install which still won't work properly.)

So I did one last transfer of the data from paradox to mysql 4.0.15 on suse, on the old machine. I used sqlyog for the transfer, which is nice - but my 10 MB database (on mysql) takes hours to transfer over. (... granted, it's 200 MB on paradox.)

Next I did a myqsldump on localhost (on the old machine) and ftp'd the file to henriettesherbal.com, after which I ran "mysql -u user -p database " to get the dump onto that mysql server. Which is version 4.1.14 or so. (The mysql client on dreamhost is 5.0.16, or so). The new data is not accessible by world + dog: this particular update isn't done yet, as in, the 700+ new pics that I have in the database aren't online yet.

Then I did another mysqldump, cos 4.1.14 does the character transformation from mysql 4.0 to 4.1 rather nicely.

That latter datadump landed on the laptop, which runs, wait for it, mysql server 5.0.18.

Mysql 4.x to mysql 5.0 troubles

There are problems when you convert things from mysql 4 to mysql 5. Or there are if you use fancy things like åäö, or ç, or ñ, in your text.

As in, while my php scripts showed the lot as åäö, or ç, and ñ, knoda showed pretty squares.

It's extremely difficult to update a database if you cannot use your local letters. You can try it out yourself even if you're working in English: just try to copy-paste a different square of your choice for each of "a" "e" "i" "o" and "u", when entering data. (Or "ä" "ö" "å" "ü" and, say, "á", which hits rather closer to home: those are all used frequently in my plant name database.) It's undoable, for all practical purposes.

A fix

So I dove into the documentation and checked what was set where and how (thanks to Kristian Köhntopp for this hint):

laptop, mysql 5.0.18:
mysql> show variables like "character_set_%";
| Variable_name | Value |
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_results | latin1 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | irrelevant |

dreamhost, mysql 4.1.16, and old machine, mysql 4.0.15:
same, but character_set_server was latin1.

So I ran this:
mysql> set character_set_server='latin1';
on the laptop, and knoda was all hunky dory. Woot!

System character encoding

There's more to it that all that. There's also the underlying system's character encoding. Like this:

laptop:~/files/> locale
LANG=en_US.UTF-8

(and the same for the rest of them)

and
dreamhost> locale
LANG=en_US

(and so on for the rest of them).

And, based on that, here's what the output from my php scripts looks like on henriettesherbal.com:

meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1":
... Deu.: Echter Fächer-Ahorn, Japanischer Ahorn. Suom.: japaninvaahtera. Sven.: japansk lönn. Fran.: érable du Japon.

meta http-equiv="Content-Type" content="text/html; charset=utf-8":
... Deu.: Echter F�cher-Ahorn, Japanischer Ahorn. Suom.: japaninvaahtera. Sven.: japansk l�nn. Fran.: �rable du Japon.

And this is what the same script looks like on localhost (on the laptop):

meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1":
... Deu.: Echter Fächer-Ahorn, Japanischer Ahorn. Suom.: japaninvaahtera. Sven.: japansk lönn. Fran.: érable du Japon.

meta http-equiv="Content-Type" content="text/html; charset=utf-8":
...Deu.: Echter Fächer-Ahorn, Japanischer Ahorn. Suom.: japaninvaahtera. Sven.: japansk lönn. Fran.: érable du Japon.

--

And that means that I'll have to have different charsets on my php scripts on localhost and online. Bleh for that. But anyway, things work. Woot!



Main menu 2