Projects/Nepomuk/FileIndexing: Difference between revisions
Changed KDE to KDE apps and platform and corrected some typos
Revision as of 16:58, 10 September 2012
Nepomuk currently acts as the file indexer for the KDE platform and KDE applications. Even though we frequently tout that we are not just a file indexer, we need to index the files properly.
File indexing solutions
Strigi
Version 4.9 of the KDE platform and applications currently uses libstreamanalyzer (part of Strigi) to index files. Current problems with Strigi:
- Difficult to contribute to
- No documentation
- Unmaintained
- Does not reuse libraries
This page lists the current status of indexing different file types.
Roll our own?
File Formats
Below we list the different file formats and which of them are supported by each file indexing solution.
Images
- JPEG - Use exiv2 - Strigi also uses exiv2 - currently broken
- PNG - Strigi rolls its own - detects the application name, color depth and interlace mode as well
- GIF - there isn't much metadata
- EXIF
- TIFF
- BMP
- SVG - Strigi stores them as plain text
We could just use exiv2 and cover almost everything. Plus, the code would be very simple.
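To make the PNG entry above more concrete, here is a minimal sketch of reading the color depth and interlace mode from a PNG file's IHDR chunk, using only the Python standard library. It is an illustration of the file format, not how Strigi or exiv2 implement it; the function name `read_png_header` and the returned dictionary keys are hypothetical. The application name lives in a separate text chunk and is not covered here.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data: bytes) -> dict:
    """Parse the IHDR chunk, which is always the first chunk of a PNG file.

    Hypothetical helper for illustration only.
    """
    # 8 bytes signature + 4 bytes length + 4 bytes type + 13 bytes IHDR data
    if len(data) < 29 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")

    # Chunk layout: big-endian length, then a 4-byte chunk type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")

    # IHDR data: width, height, bit depth, color type,
    # compression method, filter method, interlace method.
    (width, height, bit_depth, color_type,
     _compression, _filter, interlace) = struct.unpack(">IIBBBBB", data[16:29])

    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,        # bits per sample
        "color_type": color_type,      # e.g. 2 = truecolor, 6 = truecolor + alpha
        "interlaced": interlace == 1,  # 1 = Adam7 interlacing, 0 = none
    }
```

In practice the bytes would come from something like `open(path, "rb").read(29)`, since only the first 29 bytes are needed for these fields.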
Videos
Audio
- MP3
Documents
- doc
- docx
- odf
- pdf
- epub
- mobi
- spreadsheet formats
- presentation formats
- lyx
- tex
- cbz - Comic books
- Archives
  - tar
  - gzip
  - whatever ..
- Emails
  - There was a bug report
- Text Files
  - Text files
  - Source Code
- ISO images
- Executable files