KDE PIM/Akonadi Next/Store: Difference between revisions
Created and edited by Cmollekopf.
Revision as of 14:18, 8 December 2014
Store access
Access to the entities happens through a well defined interface that defines a property-map for each supported domain type. A property map could look like:
Event {
    startDate: QDateTime
    subject: QString
    ...
}
This property map can be freely extended with new properties for various features. It shouldn't adhere to any external specification and exists solely to define how to access the data.
Clients will map these properties to the values of their domain object implementations, and resources will map the properties to the values in their buffers.
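To make the mapping concrete, here is a minimal sketch of how a client-side domain object could read its values through a property map. It is only an illustration: `EVENT_PROPERTIES`, `read_property` and `EventAdaptor` are hypothetical names, and the Qt types from the example above are approximated by Python's `datetime` and `str`.

```python
from datetime import datetime

# Hypothetical property map for the Event domain type: property name -> type.
# QDateTime and QString from the text are approximated by datetime and str.
EVENT_PROPERTIES = {
    "startDate": datetime,
    "subject": str,
}


def read_property(property_map, buffer, name):
    """Return a property from a buffer, checked against the property map."""
    if name not in property_map:
        raise KeyError(f"unknown property: {name}")
    value = buffer.get(name)
    if value is not None and not isinstance(value, property_map[name]):
        raise TypeError(f"{name} should be {property_map[name].__name__}")
    return value


class EventAdaptor:
    """Client-side domain object that maps its attributes onto store properties."""

    def __init__(self, buffer):
        # The buffer stands in for whatever the store hands out (assumption).
        self._buffer = buffer

    @property
    def subject(self):
        return read_property(EVENT_PROPERTIES, self._buffer, "subject")

    @property
    def start_date(self):
        return read_property(EVENT_PROPERTIES, self._buffer, "startDate")
```

Extending the property map with a new property then only means adding one entry to the map and, where a client needs it, one accessor.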
Storage Model
The storage model is simple:
Entity {
    Id
    Revision {
        Revision-Id,
        Property*
    }+
}*
The store consists of entities that each have an id and a set of properties. Each entity is versioned.
An entity is uniquely identified by:
- Resource + Id
The additional revision identifies a specific instance/version of the entity.
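The model can be sketched as two small data classes. This is an illustration under assumptions, not the actual store layout: the class and method names are hypothetical, and numbering revisions with an incrementing integer is an assumption the text does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    revision_id: int
    properties: dict


@dataclass
class Entity:
    resource: str
    id: str
    revisions: list = field(default_factory=list)

    def key(self):
        """Resource + Id uniquely identifies the entity."""
        return (self.resource, self.id)

    def add_revision(self, properties):
        """Append a new version; incrementing integer ids are an assumption."""
        revision = Revision(len(self.revisions) + 1, dict(properties))
        self.revisions.append(revision)
        return revision.revision_id

    def at(self, revision_id):
        """Return one specific instance/version of the entity."""
        for revision in self.revisions:
            if revision.revision_id == revision_id:
                return revision
        raise KeyError(revision_id)

    def latest(self):
        """Return the newest revision (an entity has at least one)."""
        return self.revisions[-1]
```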
Uri Scheme:
akonadi://resource/id:revision
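A small sketch of parsing and formatting such URIs follows. The helper names are hypothetical, and two details are assumptions rather than part of the scheme as stated: the revision is treated as optional, and the revision part is assumed to contain no colon.

```python
from typing import NamedTuple, Optional

PREFIX = "akonadi://"


class EntityUri(NamedTuple):
    resource: str
    id: str
    revision: Optional[str]


def parse_uri(uri):
    """Split akonadi://resource/id[:revision] into its parts."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not an akonadi uri: {uri}")
    resource, slash, rest = uri[len(PREFIX):].partition("/")
    if not slash or not resource or not rest:
        raise ValueError(f"missing resource or id: {uri}")
    # Assumption: the last colon separates the id from the revision.
    entity_id, colon, revision = rest.rpartition(":")
    if not colon:
        return EntityUri(resource, rest, None)
    if not entity_id or not revision:
        raise ValueError(f"empty id or revision: {uri}")
    return EntityUri(resource, entity_id, revision)


def format_uri(parts):
    """Inverse of parse_uri."""
    uri = f"{PREFIX}{parts.resource}/{parts.id}"
    if parts.revision is not None:
        uri += f":{parts.revision}"
    return uri
```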
Store Entities
Each entity can be as normalized/denormalized as useful. It is not necessary to have a solution that fits everything.
Denormalized:
- priority is that the MIME message stays intact (signatures/encryption)
- could we still provide a streaming API for attachments?
Mail {
    id
    mimeMessage
}
Normalized:
- priority is that we can access individual members efficiently.
- we don't care about exact reproducibility of e.g. an iCal file
Event {
    id
    subject
    startDate
    attendees
    ...
}
Of course, any combination of the two can be used, including duplicating data into individual properties while keeping the complete struct intact. The question then becomes which copy is used for conflict resolution (perhaps this would cause more problems than it solves).
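As an illustration of the combined approach, the sketch below keeps the raw MIME message untouched and duplicates only the subject into its own property. `store_mail` is a hypothetical helper, and Python's standard `email` module stands in for whatever parser a resource would really use.

```python
from email import message_from_string


def store_mail(mail_id, mime_message):
    """Build a mail entity: raw message kept intact, subject duplicated."""
    parsed = message_from_string(mime_message)
    return {
        "id": mail_id,
        # Stored byte-for-byte so signatures/encryption stay valid.
        "mimeMessage": mime_message,
        # Duplicated for efficient access; the raw message stays the source.
        "subject": parsed["Subject"],
    }
```

Which copy wins when the two disagree is exactly the open conflict-resolution question; this sketch does not answer it.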
Optional Properties
For each domain type, we want to define a set of required and a set of optional properties. The required properties are the minimum bar for each resource, and are required in order for applications to work as expected. Optional properties may only be shown by the UI if actually supported by the backend.
However, we'd like to be able to support local-only storage for resources that don't support an optional property. The value of each object thus consists of:
[Resource buffer][Local-only buffer]
Each resource can freely define how the properties are split, though it should push as many as possible into the left side (the resource buffer) so they can be synchronized. Note that the resource is free to add more properties to its synchronized buffer even if the specification does not require them.
The advantage of this is that a resource only needs to specify a minimal set of properties, while everything else is taken care of by the local-only buffer. This is supposed to make it easier for resource implementors to get something working.
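The split can be sketched as follows. Both function names are hypothetical, and letting the resource buffer win when a property appears in both buffers is an assumption, not something the design prescribes.

```python
def split_properties(properties, synchronized):
    """Split properties into (resource_buffer, local_only_buffer).

    `synchronized` is the set of property names the resource supports.
    """
    resource_buffer = {k: v for k, v in properties.items() if k in synchronized}
    local_buffer = {k: v for k, v in properties.items() if k not in synchronized}
    return resource_buffer, local_buffer


def merged_view(resource_buffer, local_buffer):
    """Present both buffers as one property set to clients."""
    view = dict(local_buffer)
    # Assumption: the synchronized value takes precedence on overlap.
    view.update(resource_buffer)
    return view
```

A resource that only supports `subject` and `startDate` would, for example, call `split_properties(event, {"subject", "startDate"})`, and any other property would end up in the local-only buffer.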
Databases
By design we're interested in key-value stores or perhaps document databases. A fixed schema is not useful for this design, which makes SQL not very useful (it would just be a very slow key-value store). While document databases would allow for indexes on certain properties (which is something we need), we have not yet found any contenders that looked useful for this system.
Requirements
- Multi-thread and multi-process concurrency with a single writer and multiple readers.
  - This is required so we don't block clients while a resource is writing, and is deemed essential for performance and to reduce complexity.
- Reasonably fast so we can implement all necessary queries with sufficient performance.
- Can deal with large amounts of data.
- On-disk storage with ACID properties.
- Memory consumption is suitable for desktop systems (no in-memory stores).
Other useful properties:
- Is suitable to implement some indexes (the fewer tools we need the better)
- Support for transactions
- Small overhead in on-disk size
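To illustrate what implementing an index on top of a plain key-value store involves, here is a minimal sketch that maintains a secondary index next to the primary data. A Python dict stands in for the on-disk store, and `IndexedStore` is a hypothetical name. In a real store, both updates would presumably happen inside one write transaction so readers never see them out of sync.

```python
class IndexedStore:
    """Key-value store with one secondary index on a chosen property."""

    def __init__(self, indexed_property):
        self._prop = indexed_property
        self._values = {}  # primary: key -> properties
        self._index = {}   # secondary: property value -> set of keys

    def put(self, key, properties):
        self.remove(key)  # drop stale index entries of the old value
        self._values[key] = dict(properties)
        if self._prop in properties:
            self._index.setdefault(properties[self._prop], set()).add(key)

    def remove(self, key):
        old = self._values.pop(key, None)
        if old is not None and self._prop in old:
            keys = self._index[old[self._prop]]
            keys.discard(key)
            if not keys:
                del self._index[old[self._prop]]

    def get(self, key):
        return self._values.get(key)

    def lookup(self, value):
        """Return the sorted keys of all entries whose indexed property equals value."""
        return sorted(self._index.get(value, ()))
```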
Contenders
- LMDB
  - support for mmapped values
  - good read performance, OK write performance
  - fairly complex API
  - seems to have a high on-disk overhead (roughly 2x)
  - possible size limit of 4GB due to size_t? Otherwise it is supposed to perform well with large databases.
- rocksdb
  - => no multi-process support
- kyotocabinet
  - fast, low on-disk overhead, simple API
  - => no multi-process support
- hamsterdb
  - => no multi-process support
- sqlite4
  - not yet released
Useful Resources
- LMDB
  - Benchmarks: http://symas.com/mdb/microbench/
  - Tradeoffs: http://symas.com/is-lmdb-a-leveldb-killer/
  - Disk space benchmark: http://symas.com/mdb/ondisk/