Square wheel*

Some problems with MySQL and collation

by pstradomski at 2009-02-12 19:10

Just had a minor "WTF" moment with MySQL, related to collation. It turns out that in the default unicode collation (utf8_general_ci) accents are ignored (but not all of them). Yes, I should have checked in the documentation.

This is a problem for us, as UNIQUE indices do not allow two different words, because mysql consider them equal. This means we need to use utf8_binutf8_polish_ci won't work for the task.

Here are samples of some queries in different collations:

mysql> SET @@collation_connection = 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT "Może" = "Moze"; -- Polish Ż is considered equal to Z
+------------------+
| "Może" = "Moze"  |
+------------------+
|                1 |
+------------------+
1 row in set (0.00 sec)

mysql> SELECT "Ławka" = "Lawka"; -- But Ł is different from L
+--------------------+
| "Ławka" = "Lawka"  |
+--------------------+
|                  0 |
+--------------------+
1 row in set (0.00 sec)

mysql> SET @@collation_connection = 'utf8_unicode_ci';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT "Może" = "Moze";
+------------------+
| "Może" = "Moze"  |
+------------------+
|                1 |
+------------------+
1 row in set (0.00 sec)

mysql> SELECT "Ławka" = "Lawka";
+--------------------+
| "Ławka" = "Lawka"  |
+--------------------+
|                  0 |
+--------------------+
1 row in set (0.00 sec)

mysql> SET @@collation_connection = 'utf8_bin';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT "Ławka" = "Lawka";
+--------------------+
| "Ławka" = "Lawka"  |
+--------------------+
|                  0 |
+--------------------+
1 row in set (0.00 sec)

mysql> SELECT "Może" = "Moze";
+------------------+
| "Może" = "Moze"  |
+------------------+
|                0 |
+------------------+
1 row in set (0.00 sec)

mysql> SET @@collation_connection = 'utf8_polish_ci';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT "Ławka" = "Lawka";
+--------------------+
| "Ławka" = "Lawka"  |
+--------------------+
|                  0 |
+--------------------+
1 row in set (0.00 sec)

mysql> SELECT "Może" = "Moze";
+------------------+
| "Może" = "Moze"  |
+------------------+
|                0 |
+------------------+
1 row in set (0.00 sec)

So now it's time to change most VARCHARS in about 100 tables.

Comments

4alt.pl (not registered) at 2010-03-09 22:41: Może komuś się to przyda problem z literą Ł mysql lub like '%Ł%'

Add comment