Digikam/GSoC2010/DatabaseChanges
Digikam GSoC 2010 Database Schema Changes
This page is meant to gather ideas and information and the schema changes needed for the three Digikam GSoC projects.
Requirements and Considerations
General
Whenever possible I would like to preserve upwards compatibility, that is digikam 0.10 - 1.3 should just work with the schema-updated database, even if not using new features.
Non-Destructive Editing
We will store the version history of a file in an XML format in its metadata. The same information need also be stored in the database.
One approach is to take the history, break it into individual steps, groups steps to changesets, and assign a changeset to two images (describing the development between them). This is a relatively complex design requiring at least three tables. Another problem appears when images in the line of history are deleted; then changesets need to be merged. We need a more simple approach that does not break when images in the line of history disappear.
The simplest approach is to store the history XML, starting from the original, for each picture. The XML could be the same as stored in the metadata, containing hints at intermediate versions. But one advantage we have in the database over metadata is keeping track of file identities over changes of filename, file size or unique hash (the database is simply updated in these cases). We must make use of and preserve this advantage for storing changes as well.
Assigning unique ids (UUIDs) to edited images, as already described, can help us here. We can keep a table of imageid, UUID, history XML where the XML only contains UUID hints at files. Whenever filename/size/uniqueHash change, the database is kept up to date.
For fast searching, I propose additionally a thin table about image relations. For each image, all images it relates to, as original or intermediate, are then available fast and this information can be used for search queries.
Face recognition
For face recognition, we need to assign extra information to a tag and to a tag assignment:
For a tag, we want to add information which person is represented, link with the identifier in the face recognition data collected with libface, and ideally refer to this person in the Nepomuk storage (which will, sooner or later but invariably, give full integration with Akonadi, thus addressbook, email, appointments etc.). I suggest a TagProperties table which can contain arbitrary information about a tag.
For a tag assignment, we want now to tag not the picture as a whole but only a certain region of the picture. We cannot yet know if this part will always be a rectangle, so I suggest not to limit ourselves in the schema in this regard. Instead, we can use SVG-style XML or other text to describe the region(s). Similarly to above, I suggest an ImageTagProperties table which can contain arbitrary information about a tag assignment.
Reverse Geocoding
This project can also make use of extra information on a tag, "giving tags a meaning" as it is termed in a long-standing bug report. This includes describing a tag named "Paris" as a City or linking it to the Nepomuk storage (where the Pimo ontology should allow it to be described as a city)
Suggestions
Image History
CREATE TABLE ImageHistory
(imageid INTEGER PRIMARY KEY,
uuid TEXT,
history TEXT,);
CREATE INDEX uuid_index ON ImageHistory (uuid);
CREATE TABLE ImageRelations
(subject INTEGER,
object INTEGER,
type INTEGER);
CREATE INDEX subject_relations_index ON ImageRelations (subject);
CREATE INDEX subject_relations_index ON ImageRelations (object);
I am pondering whether to use INTEGER or TEXT for the type attribute.
If we want a small and concise table, INTEGER seems better currently.
Tags
CREATE TABLE TagProperties
(tagid INTEGER,
property TEXT,
value TEXT
);
CREATE INDEX tagproperties_index ON TagProperties (tagid);
CREATE TABLE ImageTagProperties
(imageid INTEGER,
tagid INTEGER,
property TEXT,
value TEXT
);
CREATE INDEX imagetagproperties_index ON ImageTagProperties (imageid, tagid);
There is deliberately no UNIQUE constraint on (tagid), and (imageid, tagid) respectively, to allow multiple entries per tag or assignment.