KDb/Drivers/SQLite: Difference between revisions

From KDE Community Wiki
< KDb‎ | Drivers
No edit summary
Line 156: Line 156:


===Enabling collation selection in Kexi===
===Enabling collation selection in Kexi===
This is related task, far more complex than implementing default collation in Kexi.
See [https://bugs.kde.org/show_bug.cgi?id=289293 Wish #289293 - Allow Assigning/Creating Collating Sequences for Columns in SQLite].
See [https://bugs.kde.org/show_bug.cgi?id=289293 Wish #289293 - Allow Assigning/Creating Collating Sequences for Columns in SQLite].


Line 170: Line 171:


Identified major types of collations:
Identified major types of collations:
*BINARY - binary comparison, default in raw SQLite, default in Kexi < 2.4
*BINARY - binary comparison
*UNICODE, i.e. ROOT, i.e. (<nowiki>''</nowiki>) in SQLite - provided by the ICU extension, default in Kexi 2.4+
**default in raw SQLite, compares string data using memcmp(), so since Kexi assumes that PRAGMA encoding is "UTF-8", this is comparison of UTF-8 data, byte-by-byte
**In MySQL, equivalent of UTF-8 data byte-by-byte comparison is [http://dev.mysql.com/doc/refman/5.5/en/charset-binary-collations.html utf8_bin]
**default in Kexi < 2.4
*UNICODE, i.e. ROOT (common) for all locale-aware collation
**(<nowiki>''</nowiki>) in SQLite - provided by the ICU extension
**[http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html utf8_unicode_ci] in MySQL
**default in Kexi 2.4+
*
*


TODO: improve this list (other backends...)
TODO: improve this list (other backends...)
Links:
*[http://www.sqlite.org/datatype3.html#collation Standard collations in SQLite 3]
*[http://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html Character Sets and Collations That MySQL Supports]
*[http://dev.mysql.com/doc/refman/5.5/en/charset-collations.html Collation Issues in MySQL]
IDEA: Raw collations. User agrees to select backend-dependent collation (maybe even installation-dependent). Kexi marks this fact on its own, so this information is useful in the database migration tool.


===Links===
===Links===

Revision as of 22:20, 18 December 2011

Note

This page is maintained by Jstaniek


Requirements for libsqlite3 builds

Configure libsqlite3 build as follows:

mkdir builddir
cd builddir
REQUIRED_CPPFLAGS="-DSQLITE_ENABLE_COLUMN_METADATA=1"
REQUIRED_OPTIONS="--enable-load-extension"
RECOMMENDED_CPPFLAGS="-DSQLITE_ENABLE_FTS3=1 -DSQLITE_ENABLE_FTS3_PARENTHESIS=1 \
-DSQLITE_SECURE_DELETE=1 -DSQLITE_ENABLE_MEMORY_MANAGEMENT=1 \
-DSQLITE_ENABLE_RTREE=1 -DSQLITE_ENABLE_STAT2=1 \
-DSQLITE_ENABLE_UPDATE_DELETE_LIMIT=1 -DSQLITE_ENABLE_UNLOCK_NOTIFY=1 \
-DSQLITE_SOUNDEX=1"
RECOMMENDED_OPTIONS="--enable-threadsafe"
CPPFLAGS="$REQUIRED_CPPFLAGS $RECOMMENDED_CPPFLAGS" \
../sqlite-src-???/configure --prefix=.... $REQUIRED_OPTIONS $RECOMMENDED_OPTIONS
  • --enable-load-extension is a must to have support for setting up unicode collations.
  • Optional: append --enable-debug for debug version.
  • During the build, sqlite3 shell will be also build, the build options also affect the shell, e.g. .load command is available only if --enable-load-extension is present.
  • See also explanation in Deploying SQLite blog entry.

libsqlite3 status in distros

  • Debian: OK
  • Windows: OK
    • via KDE on Windows deployment system (handled by KDE packagers)
  • Mac OS X: ??

TODO... please update

Collations

By default Kexi forces ROOT collation, i.e. combined collations for all locales. Example test session in sqlite3 shell:

'''.load path/to/libkexidb_sqlite3_icu.so'''
-- original order: 3 locales mixed (polish, german, greek)
1|aaaa
2|ąąąą
3|β
4|żżżż
5|bbbb
6|ü
7|źźźź
8|ξ
9|α
10|u
-- binary collation, sqlite default
sqlite> select * from collation order by name; 
1|aaaa
5|bbbb
10|u
6|ü
2|ąąąą
7|źźźź
4|żżżż
9|α
3|β
8|ξ
-- ROOT collation
sqlite> SELECT icu_load_collation('', '');
sqlite> select * from test order by name COLLATE '';
1|aaaa
2|ąąąą
5|bbbb
10|u
6|ü
7|źźźź
4|żżżż
9|α
3|β
8|ξ
-- ROOT collation declared for table column as default collation
sqlite> create table test2 ( name text COLLATE '' );
sqlite> insert into test2 values ( "ć" );
sqlite> insert into test2 values ( "ą" );
sqlite> insert into test2 values ( "Ć" );
sqlite> insert into test2 values ( "a" );
sqlite> insert into test2 values ( "A" );
sqlite> insert into test2 values ( "C" );
-- using the default collation
sqlite> select * from test2 order by name;
a
A
ą
C
ć
Ć
-- forcing binary collation
sqlite> select * from test2 order by name COLLATE BINARY;
A
a
C
ą
Ć
ć
-- "A" > "a" for ROOT collation 
select * from test2 where name > "A"
A
C
ą
Ć
ć

Compatibility with 3rd-party software

SQLite has three simple collation sequence types built-in: NOCASE, RTRIM. Other collations have to be loaded.

The ICU-based collation sequences is provided via the SQLite extension API, so it works both in Kexi SQLite3 driver (which loads the needed collation automatically) and the sqlite3 shell (where .load command can be used). However any 3rd-party software that intends to use tables or queries with these collations have to load the plugin. This can be done either using load_extension() or SQL command or using [http://www.sqlite.org/c3ref/load_extension.html sqlite3_load_extension() C API.

sqlite3_enable_load_extension(1) have to be called before to enable extension loading. Certain tools have no extensions enabled in their usage of SQLite3 and altering of their code may be needed.

When user attempts to use tables or queries with custom collations the following error message appears (note that the message can be hidden by the 3rd-party program):

-- ISSUE 1: executing query on table with column that use custom collation, from the previous example: create table test2 ( name text COLLATE '' )
select * from test2;
-- result:
> Error: no such collation sequence: 
-- workaround is to force BINARY collation:
select * from test2 order by name COLLATE BINARY;
> OK

-- ISSUE 2: executing query that uses custom collation on table with standard collation: select * from test order by name COLLATE ''; > Error: no such collation sequence: -- workaround is to remove COLLATE part so the collation defaults to BINARY: select * from test order by name; > OK

Ensuring Kexi is built with capable version of libsqlite3

For improved modularity and to enable proper use of sqlite3 shell, custom collations are used in Kexi via extensions API of libsqlite3. The extensions API are not enabled by default for libsqlite3 builds because --enable-load-extension option has to be explicitly added to the ./configure command. Read libsqlite3 status in distros section to check status for particular OS distribution.

In order to ensure minimal, extension support is tested at configuration stage of Kexi build process.

How to deal with incapable libsqlite3 versions?

  1. Recommended solution: add needed build option(s) for the official distribution. Make Kexi >= 2.4.0 dependent on this version of renewed libsqlite3 as minimal version.
  2. Alternative solution: create custom package of libsqlite3 (and possibly libsqlite3-dev and sqlite3 shell). Use different names for the packages to enable side-by-side installation, e.g. libsqlite3ext, sqlite3ext, libsqlite3ext-dev, respectively. Make Kexi >= 2.4.0 package dependent on the custom package instead of canonical libsqlite3.

Please consider the alternative solution only as temporary. Switch to the recommended solution as soon as possible.

In any case consider adding also recommended build options in one go (the list is published in section Requirements for libsqlite3 builds. This is beneficial for all of us.

Backward compatibility in Kexi

See Bug 278411 - Add collation sequence for unicode characters to the SQLite driver and make it the default

See Compatibility with 3rd-party software section for two compatibility issues.

Original solution for ISSUE 1 (force BINARY collation) would have to be added to Kexi < 2.4 in order to enable Kexi < 2.4 to open databases created with Kexi 2.4+. This is becasue (unlike qieries) table definitions are stored by SQLite itself so COLLATE is a constraint. So better approach is to skip adding COLLATE '' for column definitions in CREATE TABLE statements and instead to add them at runtime in SQL generated for queries. There is no clear overhead in this solution. behaviour is exactly the same as in the original solution.

Solution for ISSUE 2 is not needed because Kexi does not storequery definitions in SQLite's SQL format but in KEXISQL format. COLLATE '' specifier is added at runtime in Kexi 2.4+ only. In Kexi < 2.4 the specifier is not added and the database defaults to BINARY collation, so there is not change in behaviour in Kexi < 2.4, what is expected.

Enabling collation selection in Kexi

This is related task, far more complex than implementing default collation in Kexi. See Wish #289293 - Allow Assigning/Creating Collating Sequences for Columns in SQLite.

This wish can be implemented in Kexi >= 2.5.

PROBLEM 1: collations support is backend-dependent. E.g. SQLite and MySQL backends have different support for collations.

PROBLEM 2: collations support is installation/database/connection-dependent. E.g.

  • Two MySQL servers can be installed with different settings for default collations, so newly created database can behave differently. Solution to this is to always specify collations explicitly.
  • Two existing MySQL databases can have different default collations.
  • Two connections to the same MySQL server can have different default collations.

TODO: more problems?

Identified major types of collations:

  • BINARY - binary comparison
    • default in raw SQLite, compares string data using memcmp(), so since Kexi assumes that PRAGMA encoding is "UTF-8", this is comparison of UTF-8 data, byte-by-byte
    • In MySQL, equivalent of UTF-8 data byte-by-byte comparison is utf8_bin
    • default in Kexi < 2.4
  • UNICODE, i.e. ROOT (common) for all locale-aware collation
    • ('') in SQLite - provided by the ICU extension
    • utf8_unicode_ci in MySQL
    • default in Kexi 2.4+

TODO: improve this list (other backends...)

Links:

IDEA: Raw collations. User agrees to select backend-dependent collation (maybe even installation-dependent). Kexi marks this fact on its own, so this information is useful in the database migration tool.

Links