KDb/Database Collation Issues: Difference between revisions
Created page with "." |
No edit summary |
||
Line 1: | Line 1: | ||
. | == Introduction == | ||
=== Types === | |||
Collation type is a rule used when comparing (or sorting) text strings in a database. Lets assume we have three database records in a ''name'' column of a table: | |||
Joe | |||
Adam | |||
joan | |||
* NOCASE collation means that a comparison like ''name > "adam"'' will return: | |||
Joe | |||
Adam | |||
joan | |||
: In other words, case-insensitive comparison is performed. | |||
* BINARY collation means that a comparison like ''name > "adam"'' will return: | |||
joan | |||
: Case-sensitive comparison is performed here, so neither "Joe" nor "Adam" are ''greater'' than "adam". | |||
=== Affected SQL Operators === | |||
Following SQL comparison operators are affected: | |||
< > <= >= <> != = == LIKE. | |||
=== Implications For Data Integrity === | |||
Example for MySQL: | |||
CREATE TABLE dictionary ( | |||
word varchar(255) COLLATE latin2_bin NOT NULL, | |||
); | |||
ALTER TABLE dictionary ADD UNIQUE INDEX (word); | |||
Having BINARY collation ''latin2_bin'' defined now it's possible to insert | |||
Bag | |||
bag | |||
while, with NOCASE collation, the unique index would not allow this. | |||
=== Problem with Non-latin1 Character Sets === | |||
Database backend needs to offer built-in support for Non-latin1 character collations. Kexi could not implement a workaround at the client side if a query has to be executed at the server side. | |||
== Collation in SQLite Backend == | |||
SQLite supports BINARY collation by default. | |||
'''Non-latin1 Character Sets:''' we can patch SQLite to add such a support. See [https://www.sqlite.org/datatype3.html#collation User-defined Collation Sequences]. | |||
== Collation in MySQL Backend == | |||
''By default, MySQL searches are not case sensitive.'' | |||
See [https://dev.mysql.com/doc/refman/8.4/en/case-sensitivity.html] and [https://dev.mysql.com/doc/refman/8.4/en/charset-mysql.html] | |||
To force BINARY collation, we will add a COLLATE clause, e.g.: | |||
SELECT * FROM table WHERE name>='abc' '''COLLATE latin1_bin''' | |||
Notes: '''latin1_bin''' collation name always works. Collation and character sets can be properly selected on db creation or connecting only for MySQL >= 4.1. See also [https://dev.mysql.com/doc/refman/8.4/en/charset-table.html Table Character Set and Collation] (mysql manual). So for newer MySQL versions use specific collations, not just latin1_bin. | |||
A good default for collation/character set seems to be ''utf8_general_ci/utf8''. | |||
TODO | |||
== Collation in PostgreSQL Backend == | |||
TODO | |||
== Implications for Query Designer == | |||
A good default is a #1 priority here. In most cases users expect NOCASE collation to be the default, so we're: | |||
* adding ''NOCASE'' flag to appropriate KDbBinaryExpr objects when SQL statement is generated/anaysed | |||
* supporting ''BINARY'' keyword in the Design View of the Query Designer | |||
== Competition == | |||
* [[Kexi/Migration/MDB#MS_Access_Annoyances|MS Access does not support case-sensitive queries]]. |
Latest revision as of 22:40, 5 October 2024
Introduction
Types
Collation type is a rule used when comparing (or sorting) text strings in a database. Lets assume we have three database records in a name column of a table:
Joe Adam joan
- NOCASE collation means that a comparison like name > "adam" will return:
Joe Adam joan
- In other words, case-insensitive comparison is performed.
- BINARY collation means that a comparison like name > "adam" will return:
joan
- Case-sensitive comparison is performed here, so neither "Joe" nor "Adam" are greater than "adam".
Affected SQL Operators
Following SQL comparison operators are affected:
< > <= >= <> != = == LIKE.
Implications For Data Integrity
Example for MySQL:
CREATE TABLE dictionary ( word varchar(255) COLLATE latin2_bin NOT NULL, ); ALTER TABLE dictionary ADD UNIQUE INDEX (word);
Having BINARY collation latin2_bin defined now it's possible to insert
Bag bag
while, with NOCASE collation, the unique index would not allow this.
Problem with Non-latin1 Character Sets
Database backend needs to offer built-in support for Non-latin1 character collations. Kexi could not implement a workaround at the client side if a query has to be executed at the server side.
Collation in SQLite Backend
SQLite supports BINARY collation by default.
Non-latin1 Character Sets: we can patch SQLite to add such a support. See User-defined Collation Sequences.
Collation in MySQL Backend
By default, MySQL searches are not case sensitive.
To force BINARY collation, we will add a COLLATE clause, e.g.:
SELECT * FROM table WHERE name>='abc' COLLATE latin1_bin
Notes: latin1_bin collation name always works. Collation and character sets can be properly selected on db creation or connecting only for MySQL >= 4.1. See also Table Character Set and Collation (mysql manual). So for newer MySQL versions use specific collations, not just latin1_bin.
A good default for collation/character set seems to be utf8_general_ci/utf8.
TODO
Collation in PostgreSQL Backend
TODO
Implications for Query Designer
A good default is a #1 priority here. In most cases users expect NOCASE collation to be the default, so we're:
- adding NOCASE flag to appropriate KDbBinaryExpr objects when SQL statement is generated/anaysed
- supporting BINARY keyword in the Design View of the Query Designer