
GSoC/2019/StatusReports/ThanhTrungDinh: Difference between revisions

From KDE Community Wiki
 
(37 intermediate revisions by the same user not shown)
Line 10: Line 10:
** detect faces across various scales (e.g. big, small, etc.), with occlusion (e.g. sunglasses, scarf, mask etc.), with different orientations (e.g. up, down, left, right, side-face etc.)  
** detect faces across various scales (e.g. big, small, etc.), with occlusion (e.g. sunglasses, scarf, mask etc.), with different orientations (e.g. up, down, left, right, side-face etc.)  


'''Mentors''' : Maik Qualmann, Gilles Caulier, Stefan Müller, Marc Palaus


'''Mentors''' : Maik Qualmann, Gilles Caulier, Stefan Müller
<br>
[[File:intro_facerec2.gif]]
 
<br>


== Important Links ==
== Important Links ==
=== Proposal ===
=== Proposal ===
[https://drive.google.com/open?id=1PeWZIeR3JcgrN5QqbpkYSH8bcZsblMRJ Project Proposal]
[https://drive.google.com/open?id=1PeWZIeR3JcgrN5QqbpkYSH8bcZsblMRJ Project Proposal]
Line 21: Line 24:
[https://invent.kde.org/kde/digikam/tree/gsoc19-face-recognition gsoc19-face-recognition]
[https://invent.kde.org/kde/digikam/tree/gsoc19-face-recognition gsoc19-face-recognition]


=== Contribution ===
=== Important commits ===
* Face Recognition:
** [https://invent.kde.org/kde/digikam/commit/8d34bcc3fa84f2f8c9989ca2479ae3e244b9097f Label prediction with distance to group of similar faces]
** [https://invent.kde.org/kde/digikam/commit/cda731353d501733e9d565cff7fccd1faaf41200 Particular face alignment to improve recognition accuracy]
** [https://invent.kde.org/kde/digikam/commit/424c813236533484fabb012ad1eb909cc0994408 Use cosine similarity to improve recognition accuracy]
** [https://invent.kde.org/kde/digikam/commit/d2b38bfa5ccabe50700cac1cdb1353116a03ace5 Implementation of face recognition based on OpenFace neural network model with OpenCV DNN]
* Face Detection:
** [https://invent.kde.org/kde/digikam/commit/9e634a0df9712c4f86988c2661a94ac991e09b58 Face Detection reimplementation with SSD-MobileNet and OpenCV DNN]
** [https://invent.kde.org/kde/digikam/commit/48f1f6811f6c719a3c5b37c11775b97bf1bbd806 Improvements for SSD-MobileNet]
** [https://invent.kde.org/kde/digikam/commit/5868f104e3ef9f1ab236b888ed7735140e24f119 Restructure Face Detection module and implement YOLOv3 Face Detection with OpenCV DNN]
* Face workflow:
** [https://invent.kde.org/kde/digikam/commit/c1a295aa12df47b01dbfeea0ab1485839a1cf937 Improvements in face workflow and face detection]
** [https://invent.kde.org/kde/digikam/commit/0be666f51d9342259b4f7fdbbf04302f94180dce Clustering with k-means]
** [https://invent.kde.org/kde/digikam/commit/a2cc416487d1e1ece3724c820a025c3b21b4b0f7 Clustering with DBSCAN]
 
<br>
== Contacts ==
'''Email''': dinhthanhtrung1996@gmail.com
 
'''Github''': TrungDinhT
 
'''LinkedIn''': https://www.linkedin.com/in/thanhtrungdinh/
 
<br>


== Work report ==
== Work report ==
=== Bonding period (May 6 to May 27) ===
=== Bonding period (May 6 to May 27) ===
Generally, I familiarized myself with the current Deep Learning (DL) based approach for face recognition in digiKam. I picked up the work of Yingjie Liu (the student working on that topic in 2017) and studied his code, proposal, blog posts and status report in order to understand clearly what he did and what he left unfinished. His work led me to the FaceNet paper and a C++ implementation of the OpenFace face recognition library. Both seemed very promising for my work. In addition, Liu also reported the results of unit tests on his DL implementation. However, those tests were conducted externally, without using any digiKam preprocessing feature.
Generally, I familiarized myself with the current Deep Learning (DL) based approach for face recognition in digiKam and picked up the work of Yingjie Liu ([https://community.kde.org/GSoC/2017/StatusReports/YingjieLiu digiKam GSoC 2017]). His proposal, blog posts and status report led me to the [https://arxiv.org/abs/1503.03832 FaceNet paper] and [https://github.com/cmusatyalab/openface/issues OpenFace], an open-source implementation of a neural network (NN) model inspired by the FaceNet paper. Both seemed very promising for my work. In addition, Liu also implemented unit tests for his DL implementation and reported their results (benchmark and accuracy), which can be used as a reference for my work later.


For the rest of the bonding period, I decided to carefully read the FaceNet paper and also investigated other neural network models in order to select the right model to implement when the coding period begins. I also started coding a test program, so that I could benchmark the current DL implementation for face recognition in digiKam more precisely.
For the rest of the bonding period, I focused on carefully reading the FaceNet paper and looking for other promising NN models to implement when the coding period begins. I also prepared unit tests for face recognition, based on the current test programs and Liu's work.


My plan for next 2 weeks of coding period is:
My plan for next 2 weeks of coding period is to:
* Finish neural network model selection
* Finish NN model selection
* Finish test codes
* Finish unit tests
* Start to port current DL implementation to OpenCV DNN module
* Start to implement chosen NN model with OpenCV DNN


<br>
=== Coding period : Phase one  (May 28 to June 23) ===
=== Coding period : Phase one  (May 28 to June 23) ===
For this phase, my work mostly concentrated on building the "first and dirty" but working prototype of face recognition with OpenCV DNN. In addition, throughout that very first draft, points that need to be improved were revealed, so as to build a better and faster face recognition module.
<br>
<br>
<br>
===== May 28 to June 11 (Week 1 - 2) - Face Recognition got an 8x speed up =====
===== May 28 to June 11 (Week 1 - 2) - Face Recognition got an 8x speed up =====
<br>
<br>
I completed my plan for these 2 weeks. I eventually settled on using the OpenFace pretrained model and produced a first working draft implementation with OpenCV DNN. I also finished my test code and benchmarking for face recognition, and exhaustively tested the current implementation of face recognition in digiKam, comparing it with my new implementation using OpenCV DNN.
'''DONE'''
* NN model selection -> OpenFace pretrained model.
* First draft working implementation of face recognition with OpenCV DNN and OpenFace pretrained model.
* Unit test for evaluation and benchmarking new implementation (time and accuracy).
* Tested and compared new implementation with Liu's work basing on Dlib, for performance and accuracy.
'''TODO'''
* Improve performance (i.e. speed) and accuracy of new implementation.
* Investigate the effect of better face detection on face recognition accuracy.
<br>
The OpenFace model was chosen since it is appropriate for an application like digiKam. OpenFace is a pretrained model based on the FaceNet paper; it takes a cropped, aligned face as input and outputs a 128-D vector representing that face. The vector can later be used to compute similarity to pretagged faces, or be clustered into groups of similar faces. In photo management software such as digiKam, the user has an "open" face database, where an unlimited number of faces and people can be added. Therefore, face recognition should be flexible enough for extension while having decent accuracy. False positives, however, are not so critical, since they can always be corrected by the user. For all of the reasons above, OpenFace is a good choice.
 


The current implementation with Dlib achieves an astonishing accuracy on the [https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html orl database]. It reached above 98% accuracy for 112x92 images in the orl database, with only 20% of the images pre-tagged. However, it took on average 8s for each image, which is too much. The new implementation with OpenCV DNN didn't reach that accuracy, but runs much faster: it reached more than 80% accuracy with 20% of the images pretagged, and needed only about 1.3 s for each image.
Liu's implementation is based on a version of the OpenFace pretrained model, customized for compatibility with Dlib. It achieves an astonishing accuracy on the [https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html orl database], with above 98% accuracy for tag prediction when only 20% of the images are pretagged. However, its drawback is performance: face recognition on 112x92 images took on average 8s per face, and performance gets worse for bigger images.


I was able to identify some problems with new implementation as follow:
* '''Accuracy''': It was on prediction phase when euclidean distance was used as a metric to evaluate if a face is closer to another. There are clues that other types of distance (e.g. cosine similarity) which does not require normalized vector may give better results.
* '''Speed''': a file containing model to compute face landmarks is loaded every time a face needed to recognize. In addition, detection is conducted twice on face, one by OpenCV Haar Cascade face detection and other internally by Dlib. All of them are unnecessary and take a lot of time to finish.
* '''Modularity''': different neural network model requires different kind of preprocessing for input images.


My attempt at a new implementation with OpenCV DNN has shown great potential. While not able to reach the accuracy of Liu's work, it runs much faster. For my first draft, the processing time for each image is only about 1.3s, while an accuracy of above 80% with 20% of the images pretagged was achieved.


More details can be found in my [ blog post].


For the final 2 weeks of the first coding period, I intend to improve the accuracy and speed of the new implementation. I will also investigate the effect of a better face detection model on the accuracy of face recognition in digiKam. The scalability problem will be addressed later.
Indeed, regarding the potential of the implementation with OpenCV DNN, I was able to identify the following points for improvement:
* Accuracy:
** In the prediction phase, the euclidean distance between 128-D vectors was used as the metric to evaluate whether a face is "similar" to another face or not. However, other measures (e.g. cosine similarity) that are more suitable for unnormalized vectors may return better results.
** More suitable face alignment is promising for better recognition accuracy. The OpenFace github has a [https://github.com/cmusatyalab/openface/blob/master/openface/align_dlib.py python script] showing how faces should be aligned so that the output of the neural network achieves the best accuracy.
* Speed:
** The file containing the data to compute face landmarks (useful for aligning a face before recognition) is loaded every time a face is recognized. This can be easily eliminated by loading the data once and keeping it in memory. In addition, detection is conducted twice on each face, once by OpenCV Haar Cascade face detection and once internally by Dlib. All of this is unnecessary and takes a lot of time.
* Modularity:
** Different NN models require different kinds of preprocessing for input images (e.g. appropriate face alignment in the case of OpenFace). Therefore, an abstraction needs to be implemented to allow possible future use of other NN models.
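The load-once idea from the speed item above can be sketched in Python as follows. This is only an illustration, not digiKam code: `FaceRecognizer`, `fake_loader` and the file name are hypothetical, and the loader merely records how often it is called instead of reading a real landmark file.

```python
class FaceRecognizer:
    """Keeps model data in memory so it is read once, not once per face."""

    def __init__(self, loader, model_path):
        # Load in the constructor: the expensive read happens a single time.
        self._model = loader(model_path)

    def recognize(self, face):
        # Reuse the cached model for every face (placeholder computation).
        return (self._model["name"], len(face))


load_calls = []


def fake_loader(path):
    # Hypothetical stand-in for reading a landmark or NN model file from disk.
    load_calls.append(path)
    return {"name": "landmarks", "path": path}


recognizer = FaceRecognizer(fake_loader, "landmarks_model.dat")
results = [recognizer.recognize(face) for face in (["a"], ["b", "c"], ["d"])]
```

However many faces are processed, `load_calls` holds a single entry, which is the property the speed item asks for.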
<br>


===== June 11 to June 23 (Week 3 - 4) - 90% accuracy and 100x speed up on face recognition =====
<br>
'''DONE'''
* Speed up to 80ms of processing time for each face, which is 10x faster than the last implementation and 100x faster than current implementation.
* Accuracy improvement to 90% for 20% pretagged faces.
* Tests (with python) showed that better face detection improves accuracy up to 96% for 20% pretagged faces.
(''All tests were conducted on orl database'')
<br>
<br>
===== June 11 to June 23 (Week 3 - 4) - Face Recognition achieves above 90% and 100x speed up =====
'''TODO'''
* Abstraction layer for other NN models to be implemented.
* Implement new face detection.
* Optimize face prediction and investigate face clustering.
<br>
<br>


For the last 2 weeks, I have well completed my plan:
Following the analysis reported for the previous 2 weeks, I modified the implementation so that all data files needed for recognition are loaded at launch time (in constructors), which saves a lot of time. This gave a 10x increase in speed, since it turns out that loading data took 90% of the processing time.
* By having models loaded only once at the beginning and keeping only Haar Cascade face detection, average time needed is now reduced to around 80ms per image, which is 10x faster than last implementation and 100x faster than current implementation.
 
* Using cosine similarity instead of euclidean distance has proved an increase in recognition accuracy to 90% for 20% pre-tagged images, which opens a new direction for me.
 
* I also tested a new approach for face detection with OpenCV DNN and [https://github.com/weiliu89/caffe/tree/ssd#models SSD neural network model]. The script was on python, but has also proved an accuracy of above 96% for 20% pre-tagged images.
For accuracy improvement, cosine similarity was implemented. Since those 128-D vectors are not normalized, cosine distance works better than euclidean distance. Besides, I wrote python scripts to test a new face detection approach with the [https://github.com/weiliu89/caffe/tree/ssd#models SSD-MobileNet] NN model and OpenCV DNN. They showed a possible accuracy of above 96% with 20% of the images pretagged, which is extremely promising.
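A minimal sketch of why the two measures can disagree on unnormalized vectors is given here. It uses 3-D toy vectors instead of real 128-D embeddings, and the function names are illustrative, not taken from the digiKam code base.

```python
import math


def euclidean_distance(a, b):
    # Straight-line distance; sensitive to the length of the vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    # 1 - cos(angle between a and b); ignores the length of the vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the length
c = [3.0, 2.0, 1.0]  # same length as a, different direction

print(euclidean_distance(a, b), euclidean_distance(a, c))
print(cosine_distance(a, b), cosine_distance(a, c))
```

With euclidean distance, `c` looks closer to `a` than `b` does, although `b` points in exactly the same direction; cosine distance ranks `b` first because it only compares directions.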




More details can be found in my [ blog post].
Since new implementation of face recognition has achieved some promising results, I intend to concentrate on optimization for the next coding phases:
* Codes should be factored and restructured so as to allow possible implementations of different NN models with OpenCV DNN, which will require different preprocessing.
* Face detection with SSD-MobileNet should be implemented into facesengine and tested.
* Face prediction should be optimized. Indeed, the current way to predict a tag for a new face is to find the closest known face and assign that face's tag to the new face. Intuitively, this is not a good solution. As a result, a way to determine whether a face belongs to a group of faces should be investigated, alongside face clustering.
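The closest-face prediction described in the last item can be sketched as below. This is a toy illustration with 3-D vectors and hypothetical tags; `predict_tag` is not the name of any digiKam function, and cosine distance is assumed as the measure.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def predict_tag(embedding, tagged_faces):
    """Return (tag, distance) of the single closest pretagged face."""
    best_tag, best_dist = None, float("inf")
    for tag, known in tagged_faces:
        dist = cosine_distance(embedding, known)
        if dist < best_dist:
            best_tag, best_dist = tag, dist
    return best_tag, best_dist


# Hypothetical pretagged faces: (tag, embedding).
tagged = [
    ("alice", [1.0, 0.0, 0.0]),
    ("alice", [0.9, 0.1, 0.0]),
    ("bob", [0.0, 1.0, 0.0]),
]
tag, dist = predict_tag([0.8, 0.2, 0.0], tagged)
```

Because only one neighbour decides the result, a single mis-tagged or unusual face can flip the prediction, which is the weakness the item points out.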


Since the new implementation of face recognition has now achieved some promising results, I intend to concentrate on optimization for the next coding phases:
<br>
* '''Modularity''': This should be addressed firstly. I will factor and restructure codes so that it will be easier for changing to new neural network models, which require different preprocessing.
* '''Test and benchmark''': I am thinking of changing to new dataset, such as [http://vis-www.cs.umass.edu/lfw/ LFW], which can provide better possibility to evaluate the accuracy of new implementation. Therefore, test codes need to be rewritten for better modularity.
* '''Face detection optimization''': Implement neural network approach for face detection with OpenCV DNN.
* '''Face prediction optimization''': Investigate OpenCV FLANN and types of distance measure between vectors. All of these are expected to improve the accuracy of face recognition, as well as introduce the possibility of classifying unknown faces into groups of similar faces.


=== Coding period : Phase two  (June 24 to July 21) ===
=== Coding period : Phase two  (June 24 to July 21) ===
For this phase, I have finished some of the work on the checklist that I stated at the end of the previous phase. Having decided to optimize face recognition as much as possible first, I concentrated on '''Modularity''' and started to implement '''Face detection optimization'''. Although changing to a new dataset may help to better evaluate the implementation, I thought that it is more a matter of tuning and gathering statistical results, which should be left to the end of GSoC.
Based on the results and analysis of the last phase, I mostly concentrated on refactoring the face recognition code for this phase. This allowed for an abstraction layer where more NN models can be implemented while sharing a common interface, which leaves room for flexibility and extension in later development of face recognition. In addition, I also completed a working version of face detection with OpenCV DNN and SSD-MobileNet, which outperforms the current face detection.


<br>
<br>
===== June 24 to July 07 (Week 5 - 6) - Face Recognition is modularized for different neural network models =====
===== June 24 to July 07 (Week 5 - 6) - Face Recognition codes factoring for modularity =====
<br>
<br>
'''DONE'''
* Restructuring face recognition codes and isolating codes for preprocessing input of OpenFace NN model.
'''TODO'''
* Implement new face detection with OpenCV DNN and SSD-MobileNet model.


During these 2 weeks, I restructured the codes, so as to facilitate future development when we need to test different models of neural network for face recognition.  
<br>
Simplified UML for face recognition code factoring
[[File:UML_face_recognition_module.jpg]]
<br>
<br>
 
The motivation for restructuring the face recognition code stems from the fact that different NN models need different preprocessing techniques. Therefore, the code that loads an NN model together with its own preprocessing should be isolated. Restructuring face recognition during this phase will therefore significantly facilitate my work if I want to test different NN models later.
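The kind of abstraction described here can be sketched in Python as a common interface whose subclasses each carry their own preprocessing. All class names and the two toy "models" are hypothetical and only stand in for real NN models loaded with OpenCV DNN.

```python
from abc import ABC, abstractmethod


class FaceEmbedder(ABC):
    """Common interface; each NN model brings its own preprocessing."""

    @abstractmethod
    def preprocess(self, pixels):
        """Turn raw pixel values into the input the model expects."""

    @abstractmethod
    def forward(self, blob):
        """Run the model on preprocessed input and return an embedding."""

    def embed(self, pixels):
        # Callers only use this method, whatever the underlying model is.
        return self.forward(self.preprocess(pixels))


class ScaledToyModel(FaceEmbedder):
    # Hypothetical model that expects inputs scaled to [0, 1].
    def preprocess(self, pixels):
        return [p / 255.0 for p in pixels]

    def forward(self, blob):
        return [sum(blob) / len(blob)]


class CenteredToyModel(FaceEmbedder):
    # Hypothetical model that expects mean-subtracted inputs.
    def preprocess(self, pixels):
        mean = sum(pixels) / len(pixels)
        return [p - mean for p in pixels]

    def forward(self, blob):
        return [max(blob) - min(blob)]


pixels = [0, 255, 255, 0]
embeddings = {
    type(model).__name__: model.embed(pixels)
    for model in (ScaledToyModel(), CenteredToyModel())
}
```

Swapping one model for another then only means instantiating a different subclass; the calling code stays unchanged.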


The motivation for this stems from the fact that different models need different preprocessing techniques. Therefore, the code that loads the neural network model together with its own preprocessing techniques should be isolated. In addition, this allows the neural network model to be loaded only once, which significantly improves performance. Currently, the facesengine takes only 40 - 60ms to process and recognize a face.


Besides, I deleted the old code using dlib, since after the restructuring nothing uses dlib anymore. This is indeed a goal of my GSoC project this year, because it reduces maintenance effort and eliminates the dlib dependency, the compiler warnings and the complicated compiler rules needed when compiling dlib.
Besides, I deleted Liu's old code using dlib, since after the restructuring nothing uses dlib anymore. This is indeed a goal of my GSoC project this year, because it reduces maintenance effort and eliminates the dlib dependency, the compiler warnings and the complicated compiler rules needed when compiling dlib.


After discussing face recognition with my mentors and other digiKam contributors, we all agreed that face detection is one of the key factors to improve face recognition in digiKam. Indeed, the current face detection with OpenCV Haar cannot detect faces in photos with complicated lighting conditions, shadows or non-frontal faces. When faces cannot be detected, they cannot be recognized. Moreover, the bounding boxes created by OpenCV Haar are too "small" (i.e. they lose some details of the face). Consequently, this decreases the robustness of face recognition.


So for next week, my plan is:
After discussing face recognition with my mentors and other digiKam contributors, we all agreed that face detection is one of the key factors to improve face recognition in digiKam. Hence the next step of my work during this phase should tackle face detection.
* Study [https://arxiv.org/abs/1512.02325 SSD] (Single Shot Multibox Detector) with [https://arxiv.org/abs/1704.04861  MobileNet] model for embedded application
* Implement with OpenCV DNN module


<br>
<br>
===== July 08 to July 21 (Week 7 - 8) - Face Detection achieves 100% accuracy on orl test set =====
===== July 08 to July 21 (Week 7 - 8) - Face Detection achieves 100% accuracy on orl test set =====
<br>
<br>
'''DONE'''
* New face detection implementation with OpenCV DNN and SSD-MobileNet pretrained model.
'''TODO'''
* Improve face detection for not square (or not near-square) images.
* Improve face recognition with distance measure to face groups and implement face clustering.
<br>
At the moment of publishing, [https://arxiv.org/abs/1512.02325 SSD] (Single Shot Multibox Detector) is the state-of-the-art (SOTA) algorithm for single shot face detection, achieving results comparable with [https://arxiv.org/abs/1504.08083 Fast R-CNN] (current SOTA algorithm) and real-time performance at over 30 fps on CPU. SSD-MobileNet for OpenCV DNN is a pretrained model based on the SSD and [https://arxiv.org/abs/1704.04861  MobileNet] architectures and can be found in the corresponding [https://github.com/opencv/opencv/tree/master/samples/dnn/face_detector github folder] of OpenCV. The customized pretrained model is lightweight and fits OpenCV DNN especially well.
The new face detection with OpenCV DNN and SSD-MobileNet gave very good results. While running a little slower than the OpenCV Haar face detector, it maintains real-time detection at around 30 fps on CPU. On the orl test set, the OpenCV DNN face detector outperforms OpenCV Haar, with 100% of faces detected compared with around 90% for OpenCV Haar. It can also detect faces in photos with complicated lighting conditions, shadows or non-frontal faces. The generated bounding boxes are also better, since they come in various shapes and fit the detected faces exactly. A comparison of OpenCV DNN with OpenCV Haar can be found below:
<br>
[[File:Dnn haar compare.png|center]]
''Credit'': Vikas Gupta ([https://www.learnopencv.com/face-detection-opencv-dlib-and-deep-learning-c-python/ Face Detection – OpenCV, Dlib and Deep Learning ( C++ / Python )])
<br>
<br>
Despite its very good performance and accuracy, the OpenCV DNN face detector does not seem to work well with rectangular photos, where the width / height ratio is far from 1.0 (i.e. square). In such photos, faces are either not detected or detected at the wrong places. Therefore, the OpenCV DNN implementation must be studied further.
Besides, for the last phase of GSoC, I will concentrate my work on face recognition improvement and face clustering.
<br>
=== Coding period: Phase three (July 22 to August 26) ===
For this last phase of GSoC, I dedicated my work on optimizing the face detection and face recognition, while finishing my last TODOs on face clustering.
<br>
==== July 22 to August 11 (Week 9-11) - Face detection improvements, YOLOv3, k-means clustering ====
<br>
'''DONE'''
* Face detection improvements on non-square images.
* Study and implement new face detection based on OpenCV DNN with YOLOv3 pretrained model.
* Implement k-means clustering for faces.
'''TODO'''
* Try other algorithms for face clustering.
* Turn back to face recognition optimization.
<br>
Based on my observation (i.e. face detection did not work well on rectangular images with a w:h ratio much smaller or much bigger than 1.0), I found a way to significantly improve the accuracy of face detection. The idea is to resize the image while keeping its aspect ratio (scaling it as little as possible), then pad the image so that it becomes square and reaches the required dimension. Since faces in the image are only scaled but not deformed, they are easily detected.
In addition, I also discovered that SSD-MobileNet-based face detection did not work well with photos having many faces, low-resolution photos, or landscape photos with people that are too small in comparison to other objects (e.g. river, mountain, trees, etc.). All of those cases indicate that SSD-MobileNet cannot detect faces when the proportion of face_size/photo_size is too small. Hence, I studied another NN model, which is [https://pjreddie.com/media/files/papers/YOLOv3.pdf YOLOv3]. When implementing face detection with that model, I achieved better results than with SSD-MobileNet.
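The resize-then-pad idea can be sketched with plain geometry, without any image library. The function names, the centered padding and the target size of 300 are assumptions made for this illustration; they are not taken from the digiKam implementation.

```python
def letterbox_geometry(width, height, target):
    """Return (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom).

    The image is scaled so that its longer side equals `target` while the
    aspect ratio is kept, then padded to a target x target square.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = target - new_w
    pad_y = target - new_h
    left, top = pad_x // 2, pad_y // 2  # assumption: padding is centered
    return new_w, new_h, left, top, pad_x - left, pad_y - top


def box_to_original(box, width, height, target):
    """Map a box (x, y, w, h) found in the padded square back to the image."""
    new_w, new_h, left, top, _, _ = letterbox_geometry(width, height, target)
    sx, sy = width / new_w, height / new_h
    x, y, w, h = box
    return ((x - left) * sx, (y - top) * sy, w * sx, h * sy)


# A wide 1200x400 photo fed to a detector that expects a 300x300 input.
geometry = letterbox_geometry(1200, 400, 300)
face = box_to_original((150, 150, 30, 30), 1200, 400, 300)
```

A detection found in the padded square has to be shifted by the padding and rescaled before it can be drawn on the original photo, which is what `box_to_original` does.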
The image below shows a comparison of face detection with SSD-MobileNet and with YOLOv3
[[File:face_detection_comparison.jpg|center]]


For these 2 weeks, I have studied and implemented SSD neural network model.  
Even though it runs 10 times slower (i.e. 400 - 800ms for each image), it detects faces much more accurately. From my point of view, bounding boxes detected for faces (by face detection) take a significant amount of time if users want to modify them. Therefore, an accurate face detection algorithm should be preferred over a fast one. That's why I set YOLOv3 as default NN model to use for face detection over SSD-MobileNet.


SSD is one of the neural network models tested with OpenCV DNN models. Therefore, in the github of OpenCV there is a [https://github.com/opencv/opencv/tree/master/samples/dnn/face_detector folder] supporting face detection with SSD. The model can be downloaded with scripts from this folder. There are also examples on how to implement SSD with OpenCV DNN module.


Implementing SSD, I got a very good result on orl test set, since all faces are detected in comparing with around 90% in case of OpenCV Haar. It also increases the accuracy of face recognition to 96% for 20% of pretagged images. I have not measured the latency for face detection with OpenCV DNN yet, but it seems comparable with OpenCV Haar.  
Back to face recognition, I attempted to implement face clustering. The idea is to cluster unknown faces into groups, helping the user to tag faces more easily. However, the k-means implementation requires the number of clusters in advance, which is absurd in our case, as we need to cluster first to know how many clusters there are. As a result, better clustering algorithms should be studied later, as well as optimization of face recognition.


However, when testing on photos downloaded from the internet, the implementation did not work well. Actually, faces are not detected or detected at the wrong places. Therefore, SSD implementation with OpenCV DNN must be studied more.
==== August 12 to August 18 (Week 12) - Face recognition optimization and DBSCAN face clustering ====
<br>
'''DONE'''
* Face clustering with DBSCAN implemented.
* Distance measurement between a face and a group of faces implemented.
'''TODO'''
* Face recognition UI clean up.
* Documentation and final report
<br>


For the final coding phase of GSoC this year, I intend to work on:
The face recognition optimization concerning distance measurement between a face and a group of faces had been set aside for more than a month. Focusing on it this week, I finally implemented the distance measurement as the average of the distances to all the faces in the group. Face prediction is expected to be more robust now.
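A minimal sketch of prediction by average distance to a group follows. It uses 2-D toy vectors and hypothetical tags, assumes cosine distance as the underlying measure, and the function names are illustrative only.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distance_to_group(embedding, group):
    # Average of the distances to every face in the group.
    return sum(cosine_distance(embedding, member) for member in group) / len(group)


def predict_tag(embedding, groups):
    # Pick the tag whose group is closest on average.
    return min(groups, key=lambda tag: distance_to_group(embedding, groups[tag]))


# Hypothetical tagged groups; "bob" contains one outlier close to the query.
groups = {
    "alice": [[1.0, 0.1], [1.0, 0.2], [1.0, 0.0]],
    "bob": [[0.0, 1.0], [0.1, 1.0], [1.0, 0.12]],
}
query = [1.0, 0.12]
predicted = predict_tag(query, groups)
```

In this toy data the single closest face belongs to "bob" (the outlier), yet the average distance still selects "alice", which illustrates why the group-based measure is expected to be more robust.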
* '''Face detection optimization''': Improve SSD implementation with other photos.
* '''Face prediction optimization''': Investigate OpenCV FLANN and types of distance measure between vectors. In addition, unknown faces must be classified into groups of similar faces.
* '''Test and benchmark''': Changing to [http://vis-www.cs.umass.edu/lfw/ LFW] dataset. Test codes need to be rewritten for better modularity.


== Blog Posts ==


== Contacts ==
[https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf Density-Based Spatial Clustering of Applications with Noise] (DBSCAN) was implemented for face clustering. It was selected since, unlike k-means, it does not require the number of clusters. However, an eps value (determining when 2 faces can be considered similar) and minPts (the minimum number of faces per cluster) must be given as parameters of the algorithm. The algorithm works quite well, but it is extremely sensitive to the eps value. Indeed, human faces are quite similar to one another, so the "noise" is strong, and DBSCAN does not perform well in that situation.
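To make the role of eps and minPts concrete, here is a small pure-Python sketch of DBSCAN. It runs on 2-D toy points with euclidean distance rather than on real 128-D face embeddings, and it is not the digiKam implementation; the `eps` and `min_pts` arguments correspond to the eps and minPts parameters mentioned above.

```python
import math

NOISE = -1


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points, eps, min_pts, distance=euclidean):
    """Return one label per point: a cluster index, or NOISE."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if distance(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
too_large = dbscan(points, eps=100, min_pts=3)
too_small = dbscan(points, eps=0.5, min_pts=3)
```

With `eps=1.5` the toy data splits into two clusters plus one noise point; a much larger value merges everything into one cluster and a much smaller one marks every point as noise, which mirrors the sensitivity to eps described above.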
'''Email''': dinhthanhtrung1996@gmail.com
 
 
For the week of work submission, the remaining tasks are cleaning up the code and the face recognition UI to replace the old face recognition based on the LBPH algorithm, documenting the code and finishing the final report.
 
<br>


'''Github''': TrungDinhT
=== Future work ===
There are still many things to improve regarding digiKam face recognition performance and the accuracy of face clustering, as well as on the code base side. From my point of view, those improvements should be:
* The 128-D face embedding vector of each face should be computed only once, and recomputed only when the face is modified. This will save the user a lot of time otherwise spent waiting for the vector to be recomputed each time face recognition is performed.
* New face clustering algorithms should be implemented. A candidate is [https://ieeexplore.ieee.org/document/8359265 Density-Based Multiscale Analysis for Clustering in Strong Noise Settings With Varying Densities] (DBMAC/DBMAC-II).
* Dead and unused code of old face recognition algorithms (e.g. LBPH, FisherFace, EigenFace) should be cleaned up.

Latest revision as of 07:46, 4 September 2019

digiKam AI Face Recognition with OpenCV DNN module

digiKam is a KDE desktop application for photo management. For a long time, the digiKam team has put a lot of effort into developing the face engine, a feature that scans user photos and suggests face tags automatically based on faces pre-tagged by users. However, that functionality is currently deactivated in digiKam, as it is slow and not adequately accurate. Thus, this project aims to improve the performance and accuracy of facial recognition in digiKam by exploiting state-of-the-art neural network models in AI and machine learning, combined with the highly optimized OpenCV DNN module.

The project includes 2 main parts:

  • Improve face recognition: implementation with OpenCV DNN module
    • reduce processing time while keeping high accuracy
    • classify unknown faces into classes of similar faces
  • Improve face detection: implementation to be investigated
    • detect faces across various scales (e.g. big, small, etc.), with occlusion (e.g. sunglasses, scarf, mask etc.), with different orientations (e.g. up, down, left, right, side-face etc.)

Mentors : Maik Qualmann, Gilles Caulier, Stefan Müller, Marc Palaus



Important Links

Proposal

Project Proposal

Git dev branch

gsoc19-face-recognition

Important commits


Contacts

Email: dinhthanhtrung1996@gmail.com

Github: TrungDinhT

LinkedIn: https://www.linkedin.com/in/thanhtrungdinh/


Work report

Bonding period (May 6 to May 27)

Generally, I familiarized myself with the current Deep Learning (DL) based approach for face recognition in digiKam and picked up the work of Yingjie Liu (digiKam GSoC 2017). His proposal, blog posts and status report led me to the FaceNet paper and OpenFace, an open-source implementation of a neural network (NN) model inspired by the FaceNet paper. Both seemed very promising for my work. In addition, Liu also implemented unit tests for his DL implementation and reported their results (benchmark and accuracy), which can be used as a reference for my work later.

For the rest of the bonding period, I focused on carefully reading the FaceNet paper and looking for other promising NN models to implement when the coding period begins. I also prepared unit tests for face recognition, based on the current test programs and Liu's work.

My plan for next 2 weeks of coding period is to:

  • Finish NN model selection
  • Finish unit tests
  • Start to implement chosen NN model with OpenCV DNN


Coding period : Phase one (May 28 to June 23)

For this phase, my work mostly concentrated on building the "first and dirty" but working prototype of face recognition with OpenCV DNN. In addition, throughout that very first draft, points that need to be improved were revealed, so as to build a better and faster face recognition module.

May 28 to June 11 (Week 1 - 2) - Face Recognition got an 8x speed up


DONE

  • NN model selection -> OpenFace pretrained model.
  • First draft working implementation of face recognition with OpenCV DNN and OpenFace pretrained model.
  • Unit test for evaluation and benchmarking new implementation (time and accuracy).
  • Tested and compared new implementation with Liu's work basing on Dlib, for performance and accuracy.

TODO

  • Improve performance (i.e. speed) and accuracy of new implementation.
  • Investigate the effect of better face detection on face recognition accuracy.


The OpenFace model was chosen because it suits an application like digiKam. OpenFace is a pretrained model based on the FaceNet paper. It takes a cropped, aligned face as input and outputs a 128-D vector representing that face. The vector can later be used to compute similarity to pretagged faces, or be clustered into groups of similar faces. In photo management software such as digiKam, the user has an "open" face database, to which an unlimited number of faces and people can be added. Therefore, face recognition should be flexible enough to extend, while keeping a decent accuracy. False positives, however, are not so critical, since the user can always correct them. For all of the reasons above, OpenFace is a good choice.


Liu's implementation is based on a version of the OpenFace pretrained model, customized for compatibility with Dlib. It achieves an astonishing accuracy on the orl database: above 98% for tag prediction on faces with only 20% of pretagged images. Its drawback is speed. Face recognition on 112x92 images took 8s per face on average, and it gets slower on bigger images.


My attempt at a new implementation with OpenCV DNN has shown great potential. Although it does not reach the accuracy of Liu's work, it runs much faster. In my first draft, the processing time for each image is only about 1.3s, with an accuracy above 80% for 20% of pretagged images.


Looking more closely at the potential of the implementation with OpenCV DNN, I identified the points where changes should bring improvements:

  • Accuracy:
    • In the prediction phase, euclidean distance between 128-D vectors was used as the metric to decide whether a face is "similar" to another face. Other measures that are more suitable for unnormalized vectors (e.g. cosine similarity) may give better results.
    • More suitable face alignment is promising for better recognition accuracy. The OpenFace github repository has a python script showing how faces should be aligned so that the output of the neural network gives the best accuracy.
  • Speed:
    • The file containing the data used to compute face landmarks (useful for aligning a face before recognition) is loaded every time a face is recognized. This can easily be avoided by loading the data once and keeping it in memory. In addition, detection is run twice on each face, once by the OpenCV Haar Cascade face detector and once internally by Dlib. Both the repeated loading and the duplicate detection are unnecessary and take a lot of time.
  • Modularity:
    • Different NN models require different kinds of preprocessing for input images (e.g. appropriate face alignment in the case of OpenFace). Therefore, an abstraction needs to be implemented to allow other NN models to be used in the future.


June 11 to June 23 (Week 3 - 4) - 90% accuracy and 100x speed up on face recognition


DONE

  • Processing time reduced to 80ms for each face, which is 10x faster than the last implementation and 100x faster than the current implementation.
  • Accuracy improved to 90% for 20% pretagged faces.
  • Tests (with python) show that better face detection improves accuracy up to 96% for 20% pretagged faces.

(All tests were conducted on the orl database)

TODO

  • Abstraction layer for other NN models to be implemented.
  • Implement new face detection.
  • Optimize face prediction and investigate face clustering.


Following the analysis reported for the last 2 weeks, I modified the implementation so that all data files needed for recognition are loaded at launch time (in constructors), which saves a lot of time. This gives a 10x increase in speed, since loading data turned out to take 90% of the processing time.


To improve accuracy, cosine similarity was implemented. The 128-D vectors are not normalized, so cosine distance works better than euclidean distance. Besides, I wrote python scripts to test a new face detection approach with the SSD-MobileNet NN model and OpenCV DNN. They showed a possible accuracy of above 96% for 20% pretagged images, which is extremely promising.
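The difference between the two measures can be sketched as follows. This is a minimal illustration in plain Python, not the code used in digiKam: the function names and the toy 3-D vectors are assumptions made for the example, and real embeddings have 128 dimensions.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to the length of the vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values): same direction, different lengths.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]

print(euclidean_distance(a, b))  # clearly non-zero, because the lengths differ
print(cosine_distance(a, b))     # close to zero, because the directions match
```

With unnormalized vectors, two embeddings that point in the same direction can still be far apart in euclidean terms, which is the case the cosine measure is meant to handle.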


Since the new implementation of face recognition has achieved promising results, I intend to concentrate on optimization in the next coding phases:

  • The code should be refactored and restructured to allow implementations of different NN models with OpenCV DNN, which will require different preprocessing.
  • Face detection with SSD-MobileNet should be implemented in facesengine and tested.
  • Face prediction should be optimized. The current way to predict a tag for a new face is to find the closest known face and assign its tag to the new face. Intuitively, this is not a good solution, because a single close face decides the result. A way to determine whether a face belongs to a group of faces should therefore be investigated, along with face clustering.


Coding period : Phase two (June 24 to July 21)

Based on the results and analysis of the last phase, I mostly concentrated on refactoring the face recognition code in this phase. This produced an abstraction layer where more NN models can be implemented behind a common interface, which leaves room for flexibility and extension in later development of face recognition. In addition, I completed a working version of face detection with OpenCV DNN and SSD-MobileNet, which outperforms the current face detection.


June 24 to July 07 (Week 5 - 6) - Face Recognition codes factoring for modularity


DONE

  • Restructured the face recognition code and isolated the code that preprocesses input for the OpenFace NN model.

TODO

  • Implement new face detection with OpenCV DNN and SSD-MobileNet model.


Simplified UML for face recognition code factoring

The motivation for restructuring the face recognition code is that different NN models need different preprocessing techniques. Therefore, the code that loads an NN model should be isolated together with that model's own preprocessing. Restructuring face recognition during this phase will make my work significantly easier if I want to test different NN models later.
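The idea of isolating model-specific preprocessing behind a common interface can be sketched like this. It is only an illustration in Python, not the C++ class design shown in the UML diagram: every class and method name here is hypothetical, and the preprocessing body is a placeholder.

```python
from abc import ABC, abstractmethod


class FacePreprocessor(ABC):
    """Hypothetical interface: turns a cropped face into the input a model expects."""

    @abstractmethod
    def preprocess(self, face):
        raise NotImplementedError


class OpenFacePreprocessor(FacePreprocessor):
    # OpenFace expects an aligned face; 96x96 is the input size of its nn4 models.
    input_size = (96, 96)

    def preprocess(self, face):
        # Placeholder for landmark-based alignment and resizing.
        return {"pixels": face, "aligned": True, "size": self.input_size}


class FaceRecognizer:
    """Model-agnostic part: it only knows the preprocessor interface."""

    def __init__(self, preprocessor, forward):
        self._preprocessor = preprocessor
        self._forward = forward  # callable running the network on prepared input

    def embed(self, face):
        return self._forward(self._preprocessor.preprocess(face))
```

Supporting another NN model would then mean adding one more `FacePreprocessor` subclass and passing a different `forward` callable, without touching `FaceRecognizer`.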


I also deleted Liu's old dlib-based code, since nothing uses dlib anymore after the restructuring. This is in fact one goal of my GSoC project this year, because it reduces the maintenance effort and eliminates the dlib dependency, along with the compiler warnings and the complicated compiler rules needed to build dlib.


After discussing face recognition with my mentors and other digiKam contributors, we all agreed that face detection is one of the key factors for improving face recognition in digiKam. Hence the next step of my work during this phase is to tackle face detection.


July 08 to July 21 (Week 7 - 8) - Face Detection achieves 100% accuracy on orl test set


DONE

  • New face detection implementation with OpenCV DNN and SSD-MobileNet pretrained model.

TODO

  • Improve face detection for non-square (or not near-square) images.
  • Improve face recognition with distance measure to face groups and implement face clustering.


At the moment of its publication, SSD (Single Shot Multibox Detector) was the state-of-the-art (SOTA) algorithm for single shot face detection, achieving results comparable with Fast R-CNN (current SOTA algorithm) and real-time performance at over 30 fps on CPU. SSD-MobileNet for OpenCV DNN is a pretrained model based on the SSD and MobileNet architectures and can be found in the corresponding github folder of OpenCV. The customized pretrained model is lightweight and specifically fitted to OpenCV DNN.


The new face detection with OpenCV DNN and SSD-MobileNet gives very good results. While running a little slower than the OpenCV Haar face detector, it maintains real-time detection at around 30 fps on CPU. On the orl test set, the OpenCV DNN face detector outperforms OpenCV Haar, with 100% of faces detected compared with around 90% for OpenCV Haar. It can also detect faces in photos with complicated lighting conditions, shadows or non-frontal faces. The generated bounding boxes are also better, since they come in various shapes and fit the detected faces exactly. A comparison of OpenCV DNN with OpenCV Haar can be found below:


Credit: Vikas Gupta (Face Detection – OpenCV, Dlib and Deep Learning ( C++ / Python ))

Despite its very good performance and accuracy, the OpenCV DNN face detector does not work well with rectangular photos whose width / height ratio is far from 1.0 (i.e. far from square). In such photos, faces are either not detected or detected at the wrong places. Therefore, the OpenCV DNN implementation must be studied further.


For the last phase of GSoC, I will also concentrate my work on face recognition improvement and face clustering.


Coding period: Phase three (July 22 to August 26)

For this last phase of GSoC, I dedicated my work to optimizing face detection and face recognition, while finishing my last TODOs on face clustering.

July 22 to August 11 (Week 9-11) - Face detection improvements, YOLOv3, k-means clustering


DONE

  • Face detection improvements on non-square images.
  • Study and implement new face detection based on OpenCV DNN with YOLOv3 pretrained model.
  • Implement k-means clustering for faces.

TODO

  • Try other algorithms for face clustering.
  • Return to face recognition optimization.


Based on my observation (i.e. face detection did not work well on rectangular images with a w:h ratio much smaller or much bigger than 1.0), I found a way to improve the accuracy of face detection significantly. The idea is to resize the image while keeping its aspect ratio (choosing the scale so that the image is resized as little as possible), then pad the image so that it becomes square and reaches the required dimension. Since faces in the image are only scaled and not deformed, they are easily detected.
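The geometry of this resize-then-pad step can be sketched as below. This is a minimal illustration rather than the digiKam implementation: the function names, the centered padding and the target size of 300 used in the example are assumptions.

```python
def letterbox_geometry(width, height, target):
    """Compute the resize scale and padding that turn a width x height image
    into a target x target square without changing the aspect ratio."""
    scale = min(target / width, target / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = target - new_w
    pad_y = target - new_h
    left, top = pad_x // 2, pad_y // 2  # assumption: padding is centered
    return {
        "scale": scale,
        "size": (new_w, new_h),
        "padding": (left, top, pad_x - left, pad_y - top),  # left, top, right, bottom
    }


def box_to_original(box, geometry):
    """Map a detected box (x, y, w, h) in the padded square back to the original image."""
    x, y, w, h = box
    left, top, _, _ = geometry["padding"]
    scale = geometry["scale"]
    return ((x - left) / scale, (y - top) / scale, w / scale, h / scale)


geometry = letterbox_geometry(1200, 600, 300)
print(geometry["size"])     # resized image, aspect ratio 2:1 preserved
print(geometry["padding"])  # bands added above and below to reach 300x300
```

Boxes reported by the detector live in the padded square, so they have to be shifted by the padding and divided by the scale before being shown on the original photo, which is what `box_to_original` does.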


In addition, I discovered that SSD-MobileNet-based face detection did not work well with photos containing many faces, low-resolution photos, or landscape photos where people are too small compared with other objects (e.g. river, mountain, trees, etc.). All of those cases indicate that SSD-MobileNet cannot detect faces when the ratio face_size/photo_size is too small. Hence, I studied another NN model, YOLOv3. Face detection implemented with that model gave better results than SSD-MobileNet. The image below shows a comparison of face detection with SSD-MobileNet and with YOLOv3.



Even though YOLOv3 runs 10 times slower (i.e. 400 - 800ms for each image), it detects faces much more accurately. From my point of view, correcting the bounding boxes produced by face detection costs users a significant amount of time. Therefore, an accurate face detection algorithm should be preferred over a fast one. That's why I set YOLOv3, rather than SSD-MobileNet, as the default NN model for face detection.


Back on face recognition, I attempted to implement face clustering. The idea is to cluster unknown faces into groups, helping the user to tag faces more easily. However, the k-means implementation requires the number of clusters in advance, which is absurd in our case, as we need to cluster first to know how many clusters there are. As a result, better clustering algorithms should be studied later, as well as optimizations of face recognition.

August 12 to August 18 (Week 12) - Face recognition optimization and DBSCAN face clustering


DONE

  • Face clustering with DBSCAN implemented.
  • Distance measurement between a face and a group of faces implemented.

TODO

  • Face recognition UI clean up.
  • Documentation and final report


The optimization of face recognition through a distance measurement between a face and a group of faces had been set aside for more than a month. Focusing on it this week, I finally implemented the distance measurement as the average of the distances to all the faces in the group. Face prediction is expected to be more robust now.
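A minimal sketch of prediction with the average distance to each group is given below. It is an illustration only, not the digiKam code: the use of cosine distance here, the function names and the rejection `threshold` are assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distance_to_group(embedding, group):
    """Average distance from one embedding to every face of a tagged group."""
    return sum(cosine_distance(embedding, member) for member in group) / len(group)


def predict_label(embedding, groups, threshold):
    """Return (label, distance) for the closest group, or (None, distance)
    when even the closest group is farther than the hypothetical threshold."""
    best_label, best_distance = None, float("inf")
    for label, members in groups.items():
        if not members:
            continue  # an empty group cannot be compared against
        distance = distance_to_group(embedding, members)
        if distance < best_distance:
            best_label, best_distance = label, distance
    if best_distance > threshold:
        return None, best_distance
    return best_label, best_distance


# Toy 2-D "embeddings" (hypothetical values) for two tagged people.
groups = {
    "alice": [[1.0, 0.0], [0.9, 0.1]],
    "bob": [[0.0, 1.0], [0.1, 0.9]],
}
print(predict_label([1.0, 0.05], groups, threshold=0.3))
```

Compared with taking the tag of the single closest face, averaging over the group means that one unusual or mistagged face has less influence on the result.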


Density-Based Spatial Clustering of Applications with Noise (DBSCAN) was implemented for face clustering. It was selected because, unlike k-means, it does not require the number of clusters. However, an eps value (determining when 2 faces can be considered similar) and minPts (the minimum number of faces per cluster) must be given as parameters of the algorithm. The algorithm works quite well but is extremely sensitive to the eps value. Indeed, human faces are quite similar to one another, so the "noise" is strong, and DBSCAN does not cope well with that situation.
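For reference, the textbook DBSCAN procedure can be sketched in a few lines of plain Python. This is not the digiKam implementation: the function signature, the 2-D toy points and the use of `math.dist` as the distance are assumptions for the example (face embeddings would use their own distance).

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts, distance):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers.
    min_pts counts the point itself, as in the usual definition."""
    labels = [None] * len(points)  # None means "not visited yet"

    def region_query(i):
        return [j for j in range(len(points)) if distance(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
print(dbscan(points, eps=0.5, min_pts=2, distance=math.dist))
print(dbscan(points, eps=10.0, min_pts=2, distance=math.dist))
```

Running the same toy data with a small and a large eps shows the sensitivity described above: with the small value the two tight groups stay separate and the isolated point is noise, while with the large value everything is merged into a single cluster.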


For the week of work submission, the things left to do are cleaning up the code and the face recognition UI to replace the old face recognition based on the LBPH algorithm, documenting the code and finishing the final report.


Future work

There are still many things to improve in digiKam's face recognition performance and in the accuracy of face clustering, as well as in the code base. From my point of view, those improvements should be:

  • The 128-D face embedding vector of each face should be computed only once, and recomputed only when the face is modified. This will save users a lot of time otherwise spent waiting for the vector to be recomputed each time face recognition is performed.
  • New face clustering algorithms should be implemented. A candidate is Density-Based Multiscale Analysis for Clustering in Strong Noise Settings With Varying Densities (DBMAC/DBMAC-II).
  • Dead and unused code of the old face recognition algorithms (e.g. LBPH, FisherFace, EigenFace) should be cleaned up.
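The first item, computing each embedding only once, could look roughly like the sketch below. It is a hypothetical in-memory illustration: the class name, the use of a SHA-256 digest of the face data to detect modifications and the `compute_embedding` callable are all assumptions, and a real implementation would more likely store the vectors in the digiKam database.

```python
import hashlib


class EmbeddingCache:
    """Hypothetical cache: recompute an embedding only when the face data changes."""

    def __init__(self, compute_embedding):
        self._compute = compute_embedding  # expensive call, e.g. a network forward pass
        self._entries = {}                 # face_id -> (digest, embedding)

    def get(self, face_id, face_bytes):
        digest = hashlib.sha256(face_bytes).hexdigest()
        cached = self._entries.get(face_id)
        if cached is not None and cached[0] == digest:
            return cached[1]               # unchanged face: reuse the stored vector
        embedding = self._compute(face_bytes)
        self._entries[face_id] = (digest, embedding)
        return embedding
```

Repeated recognition runs would then pay the cost of the network only for new or modified faces.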