Some problems with MySQL and collation
by pstradomski at 2009-02-12 19:10
Just had a minor "WTF" moment with MySQL, related to collation. It turns out that in the default unicode collation (utf8_general_ci) accents are ignored (but not all of them). Yes, I should have checked in the documentation.
This is a problem for us, as UNIQUE indices do not allow two different words, because mysql consider them equal. This means we need to use utf8_bin — utf8_polish_ci won't work for the task.
Here are samples of some queries in different collations:
mysql> SET @@collation_connection = 'utf8_general_ci'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT "Może" = "Moze"; -- Polish Ż is considered equal to Z +------------------+ | "Może" = "Moze" | +------------------+ | 1 | +------------------+ 1 row in set (0.00 sec) mysql> SELECT "Ławka" = "Lawka"; -- But Ł is different from L +--------------------+ | "Ławka" = "Lawka" | +--------------------+ | 0 | +--------------------+ 1 row in set (0.00 sec) mysql> SET @@collation_connection = 'utf8_unicode_ci'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT "Może" = "Moze"; +------------------+ | "Może" = "Moze" | +------------------+ | 1 | +------------------+ 1 row in set (0.00 sec) mysql> SELECT "Ławka" = "Lawka"; +--------------------+ | "Ławka" = "Lawka" | +--------------------+ | 0 | +--------------------+ 1 row in set (0.00 sec) mysql> SET @@collation_connection = 'utf8_bin'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT "Ławka" = "Lawka"; +--------------------+ | "Ławka" = "Lawka" | +--------------------+ | 0 | +--------------------+ 1 row in set (0.00 sec) mysql> SELECT "Może" = "Moze"; +------------------+ | "Może" = "Moze" | +------------------+ | 0 | +------------------+ 1 row in set (0.00 sec) mysql> SET @@collation_connection = 'utf8_polish_ci'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT "Ławka" = "Lawka"; +--------------------+ | "Ławka" = "Lawka" | +--------------------+ | 0 | +--------------------+ 1 row in set (0.00 sec) mysql> SELECT "Może" = "Moze"; +------------------+ | "Może" = "Moze" | +------------------+ | 0 | +------------------+ 1 row in set (0.00 sec)
So now it's time to change most VARCHARS in about 100 tables.
Comments
4alt.pl (not registered) at 2010-03-09 22:41: Może komuś się to przyda problem z literą Ł mysql lub like '%Ł%'