Projects/Nepomuk/FileIndexing: Difference between revisions

From KDE Community Wiki
Vhanda (talk | contribs)
Listed different file formats
 
Vhanda (talk | contribs)
No edit summary
 
Older revision:

Lists the current status of indexing different files -

* Images
** JPEG
** PNG
** GIF
** EXIF
** TIFF
** BMP
** SVG

* Videos

* Audio
** MP3

* Documents
** doc
** docx
** odf
** pdfs
** epub
** mobi
** spreadsheet formats
** Presentation Formats
** lyx
** tex
** cbz - Comic books

* Archives
** tar
** gzip
** whatever ..

* Emails
** There was a bug report

* Text Files
** Text files
** Source Code

* ISO images
* Executable Files

Newer revision:

This page attempts to catalogue the file formats Nepomuk supports and the formats that remain to be supported.

= Mime Types =

{| class="wikitable" style="text-align: center;"
! MimeType
! Status
! Plugin
! Comments

|-
| image/jpeg
| Testing
| Exiv2Extractor
| No Comments

|-
| image/png
| Testing
| Exiv2Extractor
| -

|-
| image/gif
| ?
| ?

|-
| image/exif

|-
| image/tiff

|-
| image/bmp

|-
| image/svg+xml

|-
| audio/mpeg
| Requires Polish
| Taglib Extractor

|-
| audio/mp4

|-
| audio/wav

|-
| audio/x-aiff

|-
| application/pdf
| Implemented - Requires Testing
| PopplerExtractor
| ---

|-
| Other Office Formats
| ?

|-
| Ebook Formats
| ?

|-
| Archives
| ?

|-
| video/mpeg
| Testing
| FFmpeg

|-
| video/x-msvideo
| Testing
| FFmpeg

|-
| Other video formats
| ?

|-
| text/plain
| Implemented
| Plain Text Extractor
| This should be extended to support other text files

|}
 
= Notes =
 
== Documents ==
 
=== Microsoft Formats ===
DOC (OLE 2 Compound Document) and Office Open XML - Strigi used a custom parser. What can we use? <br />
XLS - http://qt-project.org/wiki/Handling_Microsoft_Excel_file_format <br />
spreadsheet formats <br />
 
Maybe we can use some libreoffice or calligra libraries?
 
=== Open document formats ===
 
ODF - Strigi had its own built-in parser. What are our options?
 
=== Ebook formats ===
* epub - Strigi reuses their ODF parser for epub. We could use libepub
* mobi
* rtf
* lrf
 
Check out what Okular uses for all these files and use that.
 
=== Other ===
* lyx
* tex
* cbz - Comic books
 
== Archives ==
 
We just need to add the <tt>nfo:Archive</tt> type based on the mimetype. Is there anything else that we can add?
 
== Emails ==
* mbox format - How? Something from pim?

Latest revision as of 01:23, 6 November 2012

This page attempts to catalogue the file formats Nepomuk supports and the formats that remain to be supported.

Mime Types

{| class="wikitable" style="text-align: center;"
! MimeType !! Status !! Plugin !! Comments
|-
| image/jpeg || Testing || Exiv2Extractor || No Comments
|-
| image/png || Testing || Exiv2Extractor || -
|-
| image/gif || ? || ?
|-
| image/exif
|-
| image/tiff
|-
| image/bmp
|-
| image/svg+xml
|-
| audio/mpeg || Requires Polish || Taglib Extractor
|-
| audio/mp4
|-
| audio/wav
|-
| audio/x-aiff
|-
| application/pdf || Implemented - Requires Testing || PopplerExtractor || ---
|-
| Other Office Formats || ?
|-
| Ebook Formats || ?
|-
| Archives || ?
|-
| video/mpeg || Testing || FFmpeg
|-
| video/x-msvideo || Testing || FFmpeg
|-
| Other video formats || ?
|-
| text/plain || Implemented || Plain Text Extractor || This should be extended to support other text files
|}
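
The Plugin column above can be read as a dispatch table from MIME type to extractor. The following is a minimal sketch of that idea in Python. The plugin names and MIME types are copied from the table; the `EXTRACTORS` dictionary and the `extractor_for` helper are hypothetical names used only for illustration and are not part of Nepomuk.

```python
from typing import Optional

# MIME types and plugin names as listed in the table above.
# Rows without a plugin entry are deliberately left out.
EXTRACTORS = {
    "image/jpeg": "Exiv2Extractor",
    "image/png": "Exiv2Extractor",
    "audio/mpeg": "Taglib Extractor",
    "application/pdf": "PopplerExtractor",
    "video/mpeg": "FFmpeg",
    "video/x-msvideo": "FFmpeg",
    "text/plain": "Plain Text Extractor",
}


def extractor_for(mimetype: str) -> Optional[str]:
    """Return the plugin listed for a MIME type, or None if no plugin is listed.

    Hypothetical helper: the lookup is a plain dictionary access after
    normalising case and surrounding whitespace.
    """
    return EXTRACTORS.get(mimetype.strip().lower())
```

A lookup that returns `None` corresponds to a row whose Plugin cell is still empty or marked `?`.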

Notes

Documents

Microsoft Formats

DOC (OLE 2 Compound Document) and Office Open XML - Strigi used a custom parser. What can we use?

XLS - http://qt-project.org/wiki/Handling_Microsoft_Excel_file_format

spreadsheet formats

Maybe we can use some libreoffice or calligra libraries?

Open document formats

ODF - Strigi had its own built-in parser. What are our options?
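
One possible option, sketched here only as an illustration: an ODF file is a ZIP package whose `meta.xml` entry carries the document metadata, so a small extractor needs nothing beyond a ZIP reader and an XML parser. The sketch below uses Python's standard library as a stand-in for whatever library is eventually chosen. The function name `odf_metadata` and the choice of fields are assumptions, not an existing Nepomuk interface.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used inside an ODF meta.xml.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def odf_metadata(source):
    """Read a few Dublin Core fields from the meta.xml of an ODF package.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration; fields that are absent are skipped.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("meta.xml"))

    meta = root.find("office:meta", NS)
    if meta is None:
        return {}

    fields = {}
    for name in ("title", "creator", "date"):
        element = meta.find(f"dc:{name}", NS)
        if element is not None and element.text:
            fields[name] = element.text
    return fields
```

This reads metadata only. Extracting the document text for full-text indexing would additionally require walking `content.xml`, which is not shown here.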

Ebook formats

  • epub - Strigi reuses their ODF parser for epub. We could use libepub
  • mobi
  • rtf
  • lrf

Check out what Okular uses for all these files and use that.

Other

  • lyx
  • tex
  • cbz - Comic books

Archives

We just need to add the nfo:Archive type based on the mimetype. Is there anything else that we can add?
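
As a rough sketch of the idea, the type can be chosen by checking the MIME type against a set of known archive types. The set below is an assumption made for illustration (the page does not fix which MIME types count as archives), and `rdf_types_for` is a hypothetical helper, not Nepomuk code.

```python
# Assumed set of archive MIME types; extend as needed.
ARCHIVE_MIMETYPES = {
    "application/x-tar",
    "application/gzip",
    "application/x-gzip",
    "application/zip",
}


def rdf_types_for(mimetype):
    """Return the ontology types to attach to a file with the given MIME type.

    Every file gets nfo:FileDataObject; archives additionally get nfo:Archive.
    """
    types = ["nfo:FileDataObject"]
    if mimetype in ARCHIVE_MIMETYPES:
        types.append("nfo:Archive")
    return types
```

Because the decision depends only on the MIME type, the archive itself never has to be opened.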

Emails

  • mbox format - How? Something from pim?
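
To give a sense of what an mbox indexer would need to pull out, here is a minimal sketch that uses Python's standard `mailbox` module as a stand-in for whatever the PIM libraries provide. The function name `mbox_index_fields` and the selection of headers are assumptions for illustration only.

```python
import mailbox


def mbox_index_fields(path):
    """Collect a few header fields from every message in an mbox file.

    Hypothetical helper: returns one dictionary per message, with None
    for headers that a message does not carry.
    """
    # create=False makes a missing file an error instead of creating it.
    box = mailbox.mbox(path, create=False)
    try:
        return [
            {
                "from": message.get("From"),
                "to": message.get("To"),
                "subject": message.get("Subject"),
                "date": message.get("Date"),
            }
            for message in box
        ]
    finally:
        box.close()
```

Only headers are read here. Message bodies and attachments would need separate handling.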