Baloo
Latest revision as of 08:03, 5 May 2024
Baloo is the file indexing and file search framework for KDE Plasma, with a focus on providing a very small memory footprint along with extremely fast searching.
User documentation
User documentation for search types and document properties that Baloo indexes
Ways to communicate
- Mailing List: [email protected] (info page)
- IRC Channel: #kde-devel on Libera Chat
- Phabricator project: https://phabricator.kde.org/project/view/261
Top bugs and feature requests
Bugs: https://bugs.kde.org/buglist.cgi?bug_severity=critical&bug_severity=grave&bug_severity=major&bug_severity=crash&bug_severity=normal&bug_severity=minor&bug_status=UNCONFIRMED&bug_status=CONFIRMED&bug_status=ASSIGNED&bug_status=REOPENED&list_id=1629910&priority=VHI&priority=HI&product=frameworks-baloo&query_format=advanced
Feature requests: https://bugs.kde.org/buglist.cgi?bug_severity=wishlist&bug_status=UNCONFIRMED&bug_status=CONFIRMED&bug_status=ASSIGNED&bug_status=REOPENED&list_id=1629911&priority=VHI&priority=HI&product=frameworks-baloo&query_format=advanced
Indexing limitations
Baloo uses the file metadata extractors in KFileMetadata to get information about each file it indexes. This means that, for a file's content to be indexed:
- the file must have a recognizable MIME type
- KDE must have an extractor for that MIME type. Use the command-line utility kmimetypefinder5 to determine a file's MIME type.
  - Due to a glib bug, the MIME type of HTML files can change from text/html to application/x-extension-html. The KDE file metadata extractors don't recognize the latter. That bug has a workaround to reset the MIME types to the usual values.
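- KFileMetadata uses the aging utilities catdoc, xls2csv, and catppt to index the content of files in the Microsoft Office Word, Excel, and PowerPoint file formats (source), and these utilities have undocumented limitations (bug 438455).
  - catdoc home page: http://www.wagner.pp.ru/~vitus/software/catdoc/
  - Debian's bug list for catdoc: https://bugs.debian.org/cgi-bin/pkgreport.cgi?repeatmerged=no&src=catdoc; RedHat's bug list for catdoc: https://bugzilla.redhat.com/buglist.cgi?quicksearch=catdoc
- KFileMetadata does not index file names or file contents in ZIP archives.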
- KFileMetadata does not index the contents of Open Document Format files that are ZIP archives, nor does it index "flat" Open Document Format files that are complex XML files.
Other limitations:
- Baloo doesn't index text files (those whose MIME type is detected as "text/something") over 10 MB (source).
- The KFileMetadata extractor for text attempts to convert text to Unicode. If the file uses another encoding, such as ISO-8859-1, any file contents after the first character that is invalid in Unicode will not be indexed (bug 440537). You may find the -i option to the file command-line utility useful; it tries to infer the character set of a file, e.g. file -i path/to/myfile.txt. You can use the iconv command-line utility to report invalid encodings and convert encodings to UTF-8.
- If a file's modification time is January 1 1970 ("zero" in the Unix epoch) or earlier, Baloo will reindex it each time it starts (or each time you run balooctl check) (bug 456108), and balooshow will be confused about the file's "Mtime" if it is before January 1 1970. As a workaround, you can change the modification time to something after 1970, e.g. touch -m --date=2022-01-01 path/to/myfile.
- Some users report that Baloo doesn't properly index some files extracted from ZIP or JAR files. A workaround is to clear them from Baloo's index and then reindex them, with balooctl clear /path/to/file followed by balooctl index /path/to/file.
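The encoding check described in the text-encoding item above can be scripted. The following is a minimal sketch, assuming a POSIX shell and an iconv that exits with a non-zero status on invalid input (as the glibc version does). The function names is_valid_utf8 and to_utf8 are hypothetical helpers for illustration, not part of Baloo or KFileMetadata.

```shell
# is_valid_utf8 FILE
# Succeeds (exit status 0) only if iconv can decode all of FILE as UTF-8.
# Hypothetical helper; assumes iconv fails on the first invalid sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# to_utf8 CHARSET SRC DEST
# Writes a UTF-8 copy of SRC to DEST, assuming SRC is encoded in CHARSET.
# The caller must supply the right CHARSET; iconv cannot guess it.
to_utf8() {
    iconv -f "$1" -t UTF-8 < "$2" > "$3"
}
```

A typical use would be to run is_valid_utf8 on a file that is not being found by content search and, if the check fails, call to_utf8 with a charset inferred from the output of file -i. The source charset remains a guess, so it is worth inspecting the converted copy before replacing the original.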
Other Baloo pages here
Information may be obsolete.
Using Baloo
Baloo is not an application, but a daemon to index files. Applications can use the Baloo framework to provide file search results. For example, Dolphin's Content search can use Baloo.
KDE System Settings > File Search provides an intentionally limited number of settings. You can make additional adjustments in Baloo's configuration file.
There is a command-line search utility, baloosearch.
Plasma 6 versions
In some Linux distributions that ship Plasma 6, the latest command-line utilities may have a 6 appended, thus balooctl6, baloosearch6, and balooshow6.
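Because the suffix depends on the distribution, a script that calls these utilities may want to detect which name is installed. The following is a minimal sketch, assuming a POSIX shell; baloo_cmd is a hypothetical helper name, and the only thing it checks is whether a command of that name exists on PATH.

```shell
# baloo_cmd BASE
# Prints "BASE6" if that command is on PATH, otherwise "BASE" if that is.
# Returns a non-zero status if neither name is found.
# Note: POSIX sh has no local variables, so "candidate" is a global.
baloo_cmd() {
    for candidate in "${1}6" "$1"; do
        if command -v "$candidate" > /dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example use (commented out so that sourcing this file runs nothing):
# search=$(baloo_cmd baloosearch) || echo "no baloosearch command found" >&2
```

The sketch prefers the name with the 6 suffix, on the assumption that a system with both names installed wants the Plasma 6 tools; swap the order in the for loop if the opposite is wanted.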
balooctl
balooctl is a CLI command to perform certain operations on Baloo. Enter balooctl --help in a terminal app such as Konsole to list its available subcommands.
See also Baloo/Debugging.