
Digikam/GSoC2019/FacesManagementWorkflowImprovements

From KDE Community Wiki
Revision as of 14:44, 10 March 2019 by Smueller (Project tasks)

Introduction

Hello reader,
We begin with a little story, explaining how all the digiKam face recognition related features became a GSoC project.
It all began in early 2018, when the thread "Either face recognition screen is buggy or I still don't understand it - at least I can say that more convenient bulk change of face tags (no auto refresh/set faces via context menu) is necessary" took off. Eventually, the discussion gained momentum in early 2019, which convinced the maintainer of digiKam to refurbish these features earlier than originally planned. The post that triggered this change was written on 1 February 2019 and describes quite well what has to be polished and what has to be redesigned. If you read the post, you will notice that its content goes beyond the pure face management workflow.


The overall face detection, recognition and management workflow

Before going into the details, this article gives an overview of all involved parts, in the order in which they come into play.

  1. Face detection
    This is a group of algorithms that analyse the content of images and identify distinctive regions such as eyes, nose, mouth, etc. Most of them are based on OpenCV and work mostly fine in the background (except for some technical issues with the OpenCL graphics card acceleration used by OpenCV, which introduces instability, but that is another challenge). These algorithms generate the regions where a face can be found, typically rectangles. These areas are stored as digiKam-internal information in digiKam's core database. This information is not yet added to the metadata of the images, as that happens during the face recognition workflow explained further down.

  2. Face recognition
    There are four different methods, based on different algorithms and functional to varying degrees. The goal is to automatically recognize an untagged face in images, using the face tags previously registered in the database. The algorithms are complex; they are described here only briefly, and a more detailed description can be found on the wiki page for the GSoC face recognition project, Digikam/GSoC2019/AIFaceRecognition.

    1. Deep Neural Network (DNN) Dlib C++ Library
      digiKam already has an experimental neural network implementation for face recognition, which is a proof of concept rather than a production-ready feature. This DNN is based on the Dlib implementation in the OpenFace project.

    2. OpenCV - Local Binary Patterns Histograms (LBPH)
      This is the most complete implementation of a face recognition algorithm in digiKam, and also the oldest one. It is not perfect and requires at least six faces already tagged manually by the user to identify the same person in untagged images.

    3. OpenCV - Eigen Faces
      An alternative algorithm that uses the OpenCV backend. It was introduced to provide a different source of face recognition results, making it possible to validate the DNN approach.

    4. OpenCV - Fisher Face
      Another algorithm that uses the OpenCV backend. It was introduced for the same purpose as Eigen Faces.
      According to rumours, this one is not finalized and not all of its methods are implemented.

  3. The face workflow
    This is the actual subject of this article, for which the search for one or more students for GSoC 2019 is ongoing. No complex algorithms are involved here.
    This is where we switch from the backend (the digiKam core) to the frontend (the GUI). There are countless posts about what could be improved or what is missing. The goal is to address all of them and make the entire workflow flawless, so that it is widely accepted and enjoyed by users. Assisting and guiding the student(s) towards the desired outcome requires significant effort. Although this is related to coding, the maintainers would like to see the community take over here, to take workload off them and to enable us users to steer the process from a user perspective. The maintainers would only ensure the quality of the code.
    The overall face workflow will not change much; the changes are mainly under the hood, as mentioned in the section above. The process is:
    1. Detect
    2. Suggest faces
    3. User confirms / corrects

    However, there are many ways to achieve this, and that is where the hard work begins. The following sections try to guide the entire retrofit process by collecting, outlining and streamlining all suggestions, to ensure a consistent and intuitive face workflow.

    Note that, as far as I know, the content of person-related metadata fields is not taken into account when you search or filter a collection by keywords. Thus, to make names findable in digiKam, each name also has to be added to the keyword-related metadata fields.
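To illustrate what the LBPH method mentioned above works with, here is a minimal Python sketch of the basic idea: a 3×3 Local Binary Pattern code per pixel, a histogram of those codes, and nearest-neighbour matching against the histograms of manually tagged faces using a chi-square distance. This is not digiKam's or OpenCV's code; as far as I know, OpenCV's LBPH recognizer additionally uses a circular neighbourhood (radius and neighbours parameters) and concatenates the histograms of a grid of cells. The image is assumed to be a list of rows of grayscale integers, and all function names and the `tagged` dictionary are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Offsets of the 8 neighbours, clockwise starting at the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Sequence[Sequence[int]], r: int, c: int) -> int:
    """Return the 8-bit LBP code of pixel (r, c): bit i is set when
    neighbour i is greater than or equal to the centre pixel."""
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img: Sequence[Sequence[int]]) -> List[int]:
    """256-bin histogram of the LBP codes of all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


def chi_square(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Chi-square distance between two histograms (smaller = more similar)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def predict(hist: Sequence[float],
            tagged: Dict[str, List[Sequence[float]]]) -> Tuple[Optional[str], float]:
    """Nearest-neighbour prediction: `tagged` maps a person name to the
    histograms of that person's manually confirmed faces."""
    best_name, best_dist = None, float("inf")
    for name, hists in tagged.items():
        for h in hists:
            d = chi_square(hist, h)
            if d < best_dist:
                best_name, best_dist = name, d
    return best_name, best_dist
```

Usage: `lbp_code([[10, 20, 30], [40, 25, 10], [5, 60, 25]], 1, 1)` returns 180, because the neighbours 30, 25, 60 and 40 (bits 2, 4, 5 and 7) are not smaller than the centre value 25.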

Participation

This is a breakdown of how to participate in the Summer of Code program with KDE.

  1. AS A MAINTAINER
    As a maintainer, you are responsible for knowing the digiKam source code and for checking the student(s)' pull requests before they are merged into master. In addition, you are the contact person for the student(s)' questions regarding the code, and you ensure that the student(s)' documentation is satisfactory.

  2. AS A MENTORING USER
    If you wish, you are more than welcome to contact any of the current users, get an account on KDE, join the discussion here and on the mailing list, and begin contributing to this article.
    Volunteer users are:
    Stefan Müller (user coordinator)

  3. AS A STUDENT
    Typically, the student must review all related Bugzilla entries listed in the corresponding Bugzilla section of the project. If this project or Bugzilla does not provide enough guidance, the student(s) must identify the top-level entries to work on, with help from the listed mentors. The student is expected to work technically autonomously, i.e. to find answers to challenges independently of the maintainers' support. This does not mean that the maintainers cannot be reached by the student: guidance will be given at any time, but it should be limited to occasional situations so that the maintainers can follow up on their own work.
    Regardless of the communication channel mentioned above, the maintainers review and validate the code in the development branch before merging it into the master branch. Besides coding, a technical proposal must be submitted, listing:
    • the problem statement,
    • an outline of the code to be merged into the master branch,
    • the tests,
    • the overall project plan for this summer,
    • the documentation to write (mostly in code), etc.

Project tasks

All relevant bug reports can be found in

The following tries to group them into major tasks, to give the students detailed guidance on how to close the bug reports.

 
Under Construction
This is a new page, currently under construction!

The latest email conversation is not reflected yet.

  1. SEPARATION BETWEEN TAGS AND FACES (by Stefan Müller)
    Many players in the media business, such as Adobe, use the term "tag" for anything related to metadata, while others distinguish between the different types of metadata.
    All metadata records are stored in fields (see e.g. photometadata.org), which are also often called tags (of the metadata), so a tag is anything digiKam uses to filter or search for images, e.g. keywords, colour labels, star ratings, etc. This leaves too much room for interpretation, which leads to all these questions caused by confusion over the word "tag". To lower the entry hurdle into the world of tagging, I would suggest being consistent with the official wording, so that new users are not confused. That means that "tag" will be renamed to "keyword": the source selection pane on the left will be called Keywords, and the filter pane on the right will say Keywords Filter. The description should rather say something close to "digiKam deals with metadata, grouped in (tags of): keywords, labels, date and location".

  2. ENSURE THAT ALL RELEVANT METADATA FIELDS ARE FILLED (by Stefan Müller)
    At the end, as soon as a name is confirmed, digiKam writes the data (a name and an area) to the MP and MWG namespaces of the XMP records.
    More details about those namespaces can be found here:
    As Apple and Adobe write their information to the MWG namespace, I would say that MWG is the leading namespace, but inconsistencies between the namespaces may lead to unexpected behaviour of the applications that read them. In my understanding, this information should also be written to the IPTC Person structure, as mentioned in the IPTC Photo Metadata User Guide (Persons Depicted in the Image), but it is not. It needs to be clarified and documented why that does not happen, and it may need to be corrected.
    Link face region with face name properly
    To make images findable by a person's name, the name should also be written to the keyword fields of multiple namespaces; the IPTC Photo Metadata User Guide (Persons Depicted in the Image) recommends the caption and keywords. I cannot tell all relevant fields/namespaces. My research tells me that at least the following should be written:
    The following have a field but are ignored by digiKam. Why?
    The following should be excluded:
    • XMP xmp Tags, as ExifTool describes them as non-standard
    • XMP xmpMM Tags, as ExifTool describes them as undocumented
    • XMP pdf Tags, as ExifTool lists them as only for Adobe PDF
    I reckon there isn't any leading field, so a mismatch could lead to inconsistent search results, depending heavily on the application being used. It needs to be clarified and documented why that does not happen, and it may need to be corrected.
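The MWG and MP namespaces mentioned above describe the same face rectangle differently: as far as I understand the MWG Guidelines and Microsoft's People Tagging schema, MWG stores a normalized area whose x/y is the centre of the rectangle, while MP stores a normalized "left, top, width, height" string. Mixing these conventions up is one easy way to end up with inconsistent regions. The minimal sketch below converts a detected pixel rectangle into both forms. `FaceRect`, the helper functions and the dictionary layout are hypothetical; the code does not write any XMP, it only mirrors the property names (Name, Type, Area, PersonDisplayName, Rectangle).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FaceRect:
    """A detected face rectangle in pixels, top-left origin (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


def mwg_area(rect: FaceRect, img_w: int, img_h: int) -> Dict[str, object]:
    """MWG region area: normalized, with x/y at the CENTRE of the rectangle."""
    return {
        "x": (rect.x + rect.w / 2) / img_w,
        "y": (rect.y + rect.h / 2) / img_h,
        "w": rect.w / img_w,
        "h": rect.h / img_h,
        "unit": "normalized",
    }


def mp_rectangle(rect: FaceRect, img_w: int, img_h: int) -> str:
    """MP region rectangle: "left, top, width, height", normalized to 0..1."""
    values = (rect.x / img_w, rect.y / img_h, rect.w / img_w, rect.h / img_h)
    return ", ".join(f"{v:.6f}" for v in values)


def confirmed_face_regions(name: str, rect: FaceRect,
                           img_w: int, img_h: int) -> Dict[str, Dict[str, object]]:
    """Bundle both representations of one confirmed face (illustrative layout only)."""
    return {
        "mwg": {"Name": name, "Type": "Face", "Area": mwg_area(rect, img_w, img_h)},
        "mp": {"PersonDisplayName": name,
               "Rectangle": mp_rectangle(rect, img_w, img_h)},
    }
```

For a 200×100 face at (100, 50) in a 1000×500 image, the MWG area is centred at (0.2, 0.2), while the MP rectangle is "0.100000, 0.100000, 0.200000, 0.200000".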


If I'm correct so far, what is the source of the list in the people pane on the left? In my opinion there are three options:
  1. These are the keywords listed below the hierarchy level "persons" in the keyword list. If the user selects a name, digiKam filters the images based on the keywords and shows the face area as described in the person-related metadata field.
  2. digiKam reads the information given in the person-related metadata fields of each image in this particular case and uses this data to populate the people pane. That would put quite a workload on the CPU and isn't very likely.
  3. digiKam stores the information given in the person-related metadata fields of each image in the database recognition.db and, based on the information stored there, knows which images are to be shown. In this case, are the face thumbnails stored in this database as well, or are they derived from each image based on the region information?


In addition, I would like to see some changes regarding the thumbnails of unknown faces. These wishes are most likely discussed in other bug reports as well. For convenience, I have listed those created by woenx and my own again. As you can see, most wishes are still unresolved, and mine are mostly duplicates of existing ones; I list them anyway to highlight their necessity. I would like to be able to:

  1. stop the auto refresh of the thumbnails, to avoid accidentally confirming a wrong face. It is a pain in the arse to undo such accidents.
  2. sort them, at least by guessed face.
  3. sort items in any view by any property that can be used to filter them (this would be preferred).
  4. drag and drop selected faces onto a person name.
  5. assign a person name via the right-click menu, as is possible for tags.
  6. group similar faces in "Unknown" faces
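Grouping similar faces in "Unknown" (wish 6 above, and Wishes 392024 and 384396 in the list further down) could, for example, be done by clustering face descriptors. The minimal sketch below assumes that each unknown face already has a numeric embedding (e.g. from a recognizer such as the DNN); it uses a simple greedy "leader" clustering with cosine similarity, which is only one possible technique and not what digiKam currently does. `group_unknown_faces` and the threshold of 0.8 are hypothetical placeholders.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_unknown_faces(embeddings: Dict[str, Sequence[float]],
                        threshold: float = 0.8) -> List[List[str]]:
    """Greedy leader clustering: each face joins the first group whose leader
    is at least `threshold` similar, otherwise it starts a new group."""
    groups: List[List[str]] = []
    leaders: List[Sequence[float]] = []
    for face_id, emb in embeddings.items():
        for group, leader in zip(groups, leaders):
            if cosine(emb, leader) >= threshold:
                group.append(face_id)
                break
        else:
            # No existing group is similar enough: this face leads a new one.
            groups.append([face_id])
            leaders.append(emb)
    # Largest groups first, so the most frequent unknown person shows up on top.
    groups.sort(key=len, reverse=True)
    return groups
```

The result depends on the order of the faces, since each face joins the first matching group; agglomerative clustering would be more robust, at a higher computational cost.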


  1. Bugs
    1. Bug 392013: Metadata explorer does not show XMP face rectangles
    2. Bug 392017: Merging, renaming and removing face tags
    3. Bug 392009: Weird automatic subtag within "Unknown people" called "da"
    4. Bug 392008: Inconsistent behaviour of "People" Tag
  2. Wishes
    1. Wish 275671: Scan single image for faces
    2. Wish 392015: Show "Unknown" faces in a more visible and preeminent place in the "People" list
    3. Wish 392007: Face tags and regular tags are mixed together and cannot be told apart
    4. Wish 392016: Confirmed and unconfirmed faces look the same in a person's face list
    5. Wish 392020: No possible way of knowing which pictures within a regular tag have been face-tagged
    6. Wish 392022: Position of a face tag appears on top or bottom of the list, instead of being sorted alphabetically
    7. Wish 392023: Feature request: add "Ignored" group of faces
    8. Wish 392024: Feature request: group similar faces in "Unknown" faces
      Wish 384396: Wish: display faces sorted by similarity (pre-grouped) instead of album/time/..
    9. Wish 386291: only refresh found face list/pane upon user request
    10. Wish 254099: SCAN : refresh collection with a script in commandline