Posted by & filed under Databases.

ä, ö and ü: if these characters look familiar too you, then you have been plagued by an encoding-problem when dumping mysql-databases. You think, you are doing something wrong? So did I, but it looks like this is just a bug in the utf8-encoder of mysql.

After all these years that utf8 has been around, it is still hard. Maybe it is because Americans do not really care that much about utf8 and seem mostly happy with ASCII, maybe it is because getting encodings right actually is really hard. Whatever the reason, more often than not, umlaute turn into garbish, typographic dashes end up in illegible mumbo-jumbo when data is passed from one database to the other.

My most common problem: Dumping data from an ut8-database using mysqldump and then importing it into another utf8-database. What can possibly go wrong with that, when you make sure that you pass –default-character-set=utf8 into every command?

Everything: It looks like mysqldump simply has a bug and double-encodes utf8-values. Many many thanks to Peter Vandenbeele for finding the bug and explaining it lengthy why it happens. Passing –default-character-set=latin1 to mysqldump and then changing the SET NAMES to utf8 from latin1 does the deal: the utf8 Values are not encoded again and although it says latin1, in fact it really is utf8. Afterwards, importing works flawless and all umlaute are properly encoded. This is not entirely rational, but hey, what is in the end?!

  • http://twitter.com/kiilo_ kiilo

    what to do, if you have only a doubel encoded mysqldump?

    asks a
    dubbel

    • German

       If you have a double encoded to utf8 mysql dump text file (sql), just open it in text editor (if the size allows) save it as latin1, and re-open it. Voilá!

  • http://twitter.com/kiilo_ kiilo

    what to do if you just have that double encoded dump?

    asked by a dubbel

  • elpela

    Thanks,
    elpela