Jump to content

KDb/Drivers/SQLite: Difference between revisions

From KDE Community Wiki
Jstaniek (talk | contribs)
Jstaniek (talk | contribs)
Line 36: Line 36:
By default Kexi forces ROOT collation, i.e. combined collations for all locales.
By default Kexi forces ROOT collation, i.e. combined collations for all locales.


==Collation tests==
===Collation tests===
Example test session in sqlite3 shell:
Example test session in sqlite3 shell:



Revision as of 23:11, 18 December 2011

Note

This page is maintained by Jstaniek


Requirements for libsqlite3 builds

Configure libsqlite3 build as follows:

mkdir builddir
cd builddir
REQUIRED_CPPFLAGS="-DSQLITE_ENABLE_COLUMN_METADATA=1"
REQUIRED_OPTIONS="--enable-load-extension"
RECOMMENDED_CPPFLAGS="-DSQLITE_ENABLE_FTS3=1 -DSQLITE_ENABLE_FTS3_PARENTHESIS=1 \
-DSQLITE_SECURE_DELETE=1 -DSQLITE_ENABLE_MEMORY_MANAGEMENT=1 \
-DSQLITE_ENABLE_RTREE=1 -DSQLITE_ENABLE_STAT2=1 \
-DSQLITE_ENABLE_UPDATE_DELETE_LIMIT=1 -DSQLITE_ENABLE_UNLOCK_NOTIFY=1 \
-DSQLITE_SOUNDEX=1"
RECOMMENDED_OPTIONS="--enable-threadsafe"
CPPFLAGS="$REQUIRED_CPPFLAGS $RECOMMENDED_CPPFLAGS" \
../sqlite-src-???/configure --prefix=.... $REQUIRED_OPTIONS $RECOMMENDED_OPTIONS
  • --enable-load-extension is a must to have support for setting up unicode collations.
  • Optional: append --enable-debug for debug version.
  • During the build, sqlite3 shell will be also build, the build options also affect the shell, e.g. .load command is available only if --enable-load-extension is present.
  • See also explanation in Deploying SQLite blog entry.

libsqlite3 status in distros

  • Debian: OK
  • Windows: OK
    • via KDE on Windows deployment system (handled by KDE packagers)
  • Mac OS X: ??

TODO... please update

Collations

By default Kexi forces ROOT collation, i.e. combined collations for all locales.

Collation tests

Example test session in sqlite3 shell:

'''.load path/to/libkexidb_sqlite3_icu.so'''
-- original order: text with 3 locales mixed (polish, german, greek)
1|aaaa
2|ąąąą
3|β
4|żżżż
5|bbbb
6|ü
7|źźźź
8|ξ
9|α
10|u
-- binary collation, raw sqlite's default
sqlite> select * from collation order by name; 
1|aaaa
5|bbbb
10|u
6|ü
2|ąąąą
7|źźźź
4|żżżż
9|α
3|β
8|ξ
-- ROOT collation, Kexi 2.4+ default
sqlite> SELECT icu_load_collation('', '');
sqlite> select * from test order by name COLLATE '';
1|aaaa
2|ąąąą
5|bbbb
10|u
6|ü
7|źźźź
4|żżżż
9|α
3|β
8|ξ
-- ROOT collation declared for table column as default collation
sqlite> create table test2 ( name text COLLATE '' );
sqlite> insert into test2 values ( "ć" );
sqlite> insert into test2 values ( "ą" );
sqlite> insert into test2 values ( "Ć" );
sqlite> insert into test2 values ( "a" );
sqlite> insert into test2 values ( "A" );
sqlite> insert into test2 values ( "C" );
-- using the default collation
sqlite> select * from test2 order by name;
a
A
ą
C
ć
Ć
-- forcing binary collation
sqlite> select * from test2 order by name COLLATE BINARY;
A
a
C
ą
Ć
ć
-- "A" > "a" for ROOT collation 
select * from test2 where name > "A"
A
C
ą
Ć
ć
-- upper() function with binary collation
select upper(name) from collation;
AAAA
ąąąą
β
żżżż
BBBB
ü
źźźź
ξ
α
U
-- upper() function with ROOT collation
-- NOTE: it's enough to load the ROOT collation, no need to use upper(name, '')
select upper(name) from collation;
AAAA
ĄĄĄĄ
Β
ŻŻŻŻ
BBBB
Ü
ŹŹŹŹ
Ξ
Α
U

Compatibility with 3rd-party software

SQLite has three simple collation sequence types built-in: NOCASE, RTRIM. Other collations have to be loaded.

The ICU-based collation sequences is provided via the SQLite extension API, so it works both in Kexi SQLite3 driver (which loads the needed collation automatically) and the sqlite3 shell (where .load command can be used). However any 3rd-party software that intends to use tables or queries with these collations have to load the plugin. This can be done either using load_extension() or SQL command or using [http://www.sqlite.org/c3ref/load_extension.html sqlite3_load_extension() C API.

sqlite3_enable_load_extension(1) have to be called before to enable extension loading. Certain tools have no extensions enabled in their usage of SQLite3 and altering of their code may be needed.

When user attempts to use tables or queries with custom collations the following error message appears (note that the message can be hidden by the 3rd-party program):

-- ISSUE 1: executing query on table with column that use custom collation, from the previous example: create table test2 ( name text COLLATE '' )
select * from test2;
-- result:
> Error: no such collation sequence: 
-- workaround is to force BINARY collation:
select * from test2 order by name COLLATE BINARY;
> OK

-- ISSUE 2: executing query that uses custom collation on table with standard collation: select * from test order by name COLLATE ''; > Error: no such collation sequence: -- workaround is to remove COLLATE part so the collation defaults to BINARY: select * from test order by name; > OK

Ensuring Kexi is built with capable version of libsqlite3

For improved modularity and to enable proper use of sqlite3 shell, custom collations are used in Kexi via extensions API of libsqlite3. The extensions API are not enabled by default for libsqlite3 builds because --enable-load-extension option has to be explicitly added to the ./configure command. Read libsqlite3 status in distros section to check status for particular OS distribution.

In order to ensure minimal, extension support is tested at configuration stage of Kexi build process.

How to deal with incapable libsqlite3 versions?

  1. Recommended solution: add needed build option(s) for the official distribution. Make Kexi >= 2.4.0 dependent on this version of renewed libsqlite3 as minimal version.
  2. Alternative solution: create custom package of libsqlite3 (and possibly libsqlite3-dev and sqlite3 shell). Use different names for the packages to enable side-by-side installation, e.g. libsqlite3ext, sqlite3ext, libsqlite3ext-dev, respectively. Make Kexi >= 2.4.0 package dependent on the custom package instead of canonical libsqlite3.

Please consider the alternative solution only as temporary. Switch to the recommended solution as soon as possible.

In any case consider adding also recommended build options in one go (the list is published in section Requirements for libsqlite3 builds. This is beneficial for all of us.

Backward compatibility in Kexi

See Bug 278411 - Add collation sequence for unicode characters to the SQLite driver and make it the default

See Compatibility with 3rd-party software section for two compatibility issues.

Original solution for ISSUE 1 (force BINARY collation) would have to be added to Kexi < 2.4 in order to enable Kexi < 2.4 to open databases created with Kexi 2.4+. This is becasue (unlike qieries) table definitions are stored by SQLite itself so COLLATE is a constraint. So better approach is to skip adding COLLATE '' for column definitions in CREATE TABLE statements and instead to add them at runtime in SQL generated for queries. There is no clear overhead in this solution. behaviour is exactly the same as in the original solution.

Solution for ISSUE 2 is not needed because Kexi does not storequery definitions in SQLite's SQL format but in KEXISQL format. COLLATE '' specifier is added at runtime in Kexi 2.4+ only. In Kexi < 2.4 the specifier is not added and the database defaults to BINARY collation, so there is not change in behaviour in Kexi < 2.4, what is expected.

Enabling collation selection in Kexi

This is related task, far more complex than implementing default collation in Kexi. See Wish #289293 - Allow Assigning/Creating Collating Sequences for Columns in SQLite.

This wish can be implemented in Kexi >= 2.5.

PROBLEM 1: collations support is backend-dependent. E.g. SQLite and MySQL backends have different support for collations.

PROBLEM 2: collations support is installation/database/connection-dependent. E.g.

  • Two MySQL servers can be installed with different settings for default collations, so newly created database can behave differently. Solution to this is to always specify collations explicitly.
  • Two existing MySQL databases can have different default collations.
  • Two connections to the same MySQL server can have different default collations.

TODO: more problems?

Identified major types of collations:

  • BINARY - binary comparison
    • default in raw SQLite, compares string data using memcmp(), so since Kexi assumes that PRAGMA encoding is "UTF-8", this is comparison of UTF-8 data, byte-by-byte
    • In MySQL, equivalent of UTF-8 data byte-by-byte comparison is utf8_bin. latin1_bin can be used under assumption that only latin1 text is used.
    • default in Kexi < 2.4
  • UNICODE, i.e. ROOT (common) for all locale-aware collation
    • ('') in SQLite - provided by the ICU extension
    • utf8_unicode_ci in MySQL
    • default in Kexi 2.4+

SQLite specific:

  • NOCASE - The same as binary, except the 26 upper case characters of ASCII are folded to their lower case equivalents before the comparison is performed. Note that only ASCII characters are case folded.
    • Better to hide it in Kexi, because to user it is not clear this collation is latin1 and usefullness of this collation the is limited to optimizated cases.
    • Instead for SQLite driver, extend the ICU extension to support NOCASE comparison for full unicode texts. But in MySQL this may be hard to reproduce.
  • RTRIM - The same as binary, except that trailing space characters are ignored.
    • Better to hide it in Kexi, because of its rare usefulness.
    • In MySQL: Trailing space handling in comparisons. Nonbinary strings have PADSPACE behavior for all collations, including _bin collations. Trailing spaces are insignificant in comparisons. For binary strings, all characters are significant in comparisons, including trailing spaces.
      • This is very different compared to SQLite, and thus to Kexi file driver. TODO: For the SQLite driver, alter the ICU extension to ignore trailing spaces when passing to ICU. This will fix behaviour for ROOT collation but not for BINARY collation (so TODO: alter sqlite code or add own implementation of BINARY).

TODO: improve this list (other backends...)

Links:

IDEA: Raw collations. User agrees to select backend-dependent collation (maybe even installation-dependent). Kexi marks this fact on its own, so this information is useful in the database migration tool.

Links