GSoC/2019/StatusReports/ThanhTrungDinh
digiKam AI Face Recognition with OpenCV DNN module
digiKam is a KDE desktop application for photo management. For a long time, the digiKam team has put a lot of effort into developing the face engine, a feature that scans user photos and suggests face tags automatically, based on faces pre-tagged by users. However, that functionality is currently deactivated in digiKam, as it is slow and not accurate enough. This project therefore aims to improve the performance and accuracy of face recognition in digiKam by exploiting state-of-the-art neural network models from AI and machine learning, combined with the highly optimized OpenCV DNN module.
The project includes 2 main parts:
- Improve face recognition: implementation with OpenCV DNN module
  - reduce processing time while keeping high accuracy
  - classify unknown faces into classes of similar faces
- Improve face detection: implementation to be investigated
  - detect faces across various scales (e.g. big, small), with occlusion (e.g. sunglasses, scarf, mask) and with different orientations (e.g. up, down, left, right, side-face)
Mentors : Maik Qualmann, Gilles Caulier, Stefan Müller
Important Links
Proposal
Git dev branch
Important commits
- Face Recognition:
- Face Detection:
- Face workflow:
Work report
Bonding period (May 6 to May 27)
Generally, I familiarized myself with the current Deep Learning (DL) based approach to face recognition in digiKam and picked up the work of Yingjie Liu (digiKam GSoC 2017). His proposal, blog posts and status report led me to the FaceNet paper and to OpenFace, an open-source implementation of a neural network (NN) model inspired by the FaceNet paper. Both looked very promising for my work. In addition, Liu implemented unit tests for his DL implementation and reported their results (benchmark and accuracy), which I can use as a reference later.
For the rest of the bonding period, I focused on reading the FaceNet paper carefully and on looking for other promising NN models to implement when the coding period begins. I also prepared unit tests for face recognition, based on the current test programs and Liu's work.
My plan for the next 2 weeks of the coding period is to:
- Finish NN model selection
- Finish unit tests
- Start to implement chosen NN model with OpenCV DNN
Coding period : Phase one (May 28 to June 23)
For this phase, my work mostly concentrated on building a "quick and dirty" but working prototype of face recognition with OpenCV DNN. That first draft also revealed the points that need to be improved in order to build a better and faster face recognition module.
May 28 to June 11 (Week 1 - 2) - Face Recognition got an 8x speed up
DONE
- NN model selection -> OpenFace pretrained model.
- First working draft of face recognition with OpenCV DNN and the OpenFace pretrained model.
- Unit test for evaluating and benchmarking the new implementation (time and accuracy).
- Tested and compared the new implementation with Liu's Dlib-based work, for performance and accuracy.
TODO
- Improve performance (i.e. speed) and accuracy of new implementation.
- Investigate the effect of better face detection on face recognition accuracy.
The OpenFace model was chosen because it suits an application like digiKam. OpenFace is a pretrained model based on the FaceNet paper; it takes a cropped, aligned face as input and outputs a 128-D vector representing that face. The vector can later be used to compute similarity to pre-tagged faces, or be clustered into groups of similar faces. In photo management software such as digiKam, the user has an "open" face database, to which an unlimited number of faces and people can be added. Face recognition should therefore be flexible enough to extend, while keeping decent accuracy. False positives are less critical, since the user can always correct them. For all of these reasons, OpenFace is a good choice.
Liu's implementation is based on a version of the OpenFace pretrained model customized for compatibility with Dlib. It achieves impressive accuracy on the orl database: above 98% for tag prediction with only 20% of the images pre-tagged. Its drawback is speed. Face recognition on 112x92 images took on average 8s per face, and performance gets worse with bigger images.
My attempt at a new implementation with OpenCV DNN has shown great potential. While it does not yet reach the accuracy of Liu's work, it runs much faster. For my first draft, the processing time for each image is only about 1.3s, with an accuracy of above 80% for 20% of pretagged images.
While evaluating the OpenCV DNN implementation, I identified several points whose resolution should improve:
- Accuracy:
  - In the prediction phase, Euclidean distance between 128-D vectors was used as the metric to decide whether a face is "similar" to another face. Other measures that are better suited to unnormalized vectors (e.g. cosine similarity) may give better results.
  - More suitable face alignment is promising for better recognition accuracy. The OpenFace GitHub repository has a Python script showing how faces should be aligned to get the best accuracy from the neural network output.
- Speed:
  - The file containing the data used to compute face landmarks (needed to align a face before recognition) is loaded every time a face is recognized. This can easily be eliminated by loading the data once and keeping it in memory. In addition, detection is run twice on each face, once by OpenCV Haar Cascade face detection and once internally by Dlib. Both the repeated loading and the duplicate detection are unnecessary and take a lot of time.
- Modularity:
  - Different NN models require different kinds of preprocessing for input images (e.g. appropriate face alignment in the case of OpenFace). Therefore, an abstraction needs to be implemented to allow possible future use of other NN models.
June 11 to June 23 (Week 3 - 4) - 90% accuracy and 100x speed up on face recognition
DONE
- Processing time reduced to 80ms per face, which is 10x faster than the previous draft and 100x faster than the current implementation.
- Accuracy improved to 90% for 20% pretagged faces.
- Tests (with Python) show that better face detection improves accuracy up to 96% for 20% pretagged faces.
(All tests were conducted on orl database)
TODO
- Abstraction layer for other NN models to be implemented
- Implement new face detection
- Optimize face prediction and investigate face clustering
Following the analysis reported for the last 2 weeks, I modified the implementation so that all data files needed for recognition are loaded at launch time (in constructors), which saves a lot of time. This gives a 10x increase in speed, since loading data turned out to take 90% of the processing time.
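The load-once change can be illustrated with a minimal sketch. All names here (`CountingLoader`, `recognize_reloading`, `Recognizer`) are hypothetical stand-ins, not digiKam code, and the loader simply returns placeholder data instead of reading real model files.

```python
class CountingLoader:
    """Hypothetical stand-in for reading model/landmark files from disk."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        # Placeholder payload; a real loader would parse files here.
        return {"landmarks": "landmark-data", "network": "nn-weights"}


def recognize_reloading(loader, faces):
    """First-draft behaviour: the data files are reloaded for every face."""
    results = []
    for face in faces:
        data = loader()  # expensive load repeated per face
        results.append((face, data["network"]))
    return results


class Recognizer:
    """Improved behaviour: data is loaded once, in the constructor."""

    def __init__(self, loader):
        self._data = loader()  # single expensive load

    def recognize(self, face):
        return (face, self._data["network"])


faces = ["face-1", "face-2", "face-3"]

reloading_loader = CountingLoader()
recognize_reloading(reloading_loader, faces)

cached_loader = CountingLoader()
recognizer = Recognizer(cached_loader)
cached_results = [recognizer.recognize(face) for face in faces]

print(reloading_loader.calls, cached_loader.calls)
```

If loading accounts for 90% of the per-face time, removing it leaves 10% of the original time, which is consistent with the 10x speed-up reported above.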
To improve accuracy, cosine similarity was implemented. Since those 128-D vectors are not normalized, cosine distance works better than Euclidean distance. I also wrote Python scripts to test a new face detection approach with the SSD-MobileNet NN model and OpenCV DNN. They showed a possible accuracy of above 96% for 20% pretagged images, which is very promising.
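A small sketch of why the choice of distance matters for unnormalized vectors. The 3-D vectors are made up for illustration (real embeddings are 128-D), and the function names are not taken from the digiKam code.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


reference = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]   # same direction as reference, twice the length
other = [3.0, 2.0, 1.0]    # different direction, same length

# Euclidean distance ranks `other` as closer than `scaled` ...
print(euclidean_distance(reference, scaled), euclidean_distance(reference, other))
# ... while cosine distance treats `scaled` as (numerically) identical.
print(cosine_distance(reference, scaled), cosine_distance(reference, other))
```

When all vectors have unit length the two measures rank neighbours identically, so the difference only shows up for unnormalized vectors like the ones in this example.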
Since the new implementation of face recognition has achieved some promising results, I intend to concentrate on optimization for the next coding phases:
- The code should be refactored and restructured to allow possible implementations of different NN models with OpenCV DNN, which will require different preprocessing.
- Face detection with SSD-MobileNet should be implemented in facesengine and tested.
- Face prediction should be optimized. The current way to predict a tag for a new face is to find the closest known face and assign that face's tag to the new face. This relies on a single neighbour, so one mislabeled or atypical face can decide the result. A way to determine whether a face belongs to a group of faces should therefore be investigated, along with face clustering.
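The difference between the current closest-face prediction and a group-based decision can be sketched as follows. The group rule used here (mean cosine distance to all faces of a tag, with a rejection threshold) is only one possible choice and an assumption of this sketch, not the method adopted in the project. The 2-D vectors, tag names and `max_distance` value are made up for illustration.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def predict_nearest_face(embedding, tagged):
    """Current approach: return the tag of the single closest face."""
    return min(tagged, key=lambda item: cosine_distance(embedding, item[1]))[0]


def predict_nearest_group(embedding, tagged, max_distance=0.5):
    """Assumed alternative: pick the tag with the smallest mean distance.

    Returns None (unknown face) if even the best group is farther away
    than max_distance, a hypothetical threshold.
    """
    groups = {}
    for tag, vector in tagged:
        groups.setdefault(tag, []).append(vector)
    best_tag, best_mean = None, float("inf")
    for tag, vectors in groups.items():
        mean = sum(cosine_distance(embedding, v) for v in vectors) / len(vectors)
        if mean < best_mean:
            best_tag, best_mean = tag, mean
    return best_tag if best_mean <= max_distance else None


tagged_faces = [
    ("alice", [1.0, 0.4]),
    ("alice", [1.0, 0.45]),
    ("alice", [1.0, 0.5]),
    ("bob", [1.0, 0.62]),   # atypical "bob" face lying next to alice's faces
    ("bob", [0.0, 1.0]),
    ("bob", [0.1, 1.0]),
]
query = [1.0, 0.6]

print(predict_nearest_face(query, tagged_faces))    # decided by one outlier
print(predict_nearest_group(query, tagged_faces))   # decided by whole groups
print(predict_nearest_group([-1.0, 0.0], tagged_faces))  # far from every group
```

In this toy data the single closest face is the atypical "bob" entry, while the group rule assigns the query to "alice" and rejects a vector that is far from both groups.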
Coding period : Phase two (June 24 to July 21)
Based on the results and analysis of the last phase, I mostly concentrated on refactoring the face recognition code during this phase. This produced an abstraction layer in which more NN models can be implemented while sharing a common interface, which leaves room for flexibility and extension in later development of face recognition. In addition, I completed a working version of face detection with OpenCV DNN and SSD-MobileNet, which outperforms the current face detection.
June 24 to July 07 (Week 5 - 6) - Face Recognition code refactoring for modularity
DONE
- Restructured the face recognition code and isolated the code that preprocesses input for the OpenFace NN model
TODO
- Implement new face detection with OpenCV DNN and SSD-MobileNet model
Simplified UML for face recognition code factoring
The motivation for restructuring the face recognition code is that different NN models need different preprocessing techniques. The code that loads an NN model together with its own preprocessing should therefore be isolated. Restructuring face recognition during this phase will make my work significantly easier if I want to test different NN models later.
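The idea of isolating each model's preprocessing behind a common interface can be sketched like this. The class names and the two toy "models" are hypothetical and only stand in for real networks; the actual digiKam classes are C++ and are not reproduced here.

```python
from abc import ABC, abstractmethod


class FaceEmbedder(ABC):
    """Hypothetical common interface shared by all NN models."""

    @abstractmethod
    def preprocess(self, face_pixels):
        """Model-specific preparation of the input."""

    @abstractmethod
    def forward(self, blob):
        """Run the (here: fake) network on the prepared input."""

    def embed(self, face_pixels):
        # Callers use only this method and never see model-specific steps.
        return self.forward(self.preprocess(face_pixels))


class ScaledInputModel(FaceEmbedder):
    """Toy model that expects pixel values scaled to [0, 1]."""

    def preprocess(self, face_pixels):
        return [p / 255.0 for p in face_pixels]

    def forward(self, blob):
        return [sum(blob) / len(blob)]  # placeholder for a forward pass


class MeanSubtractedModel(FaceEmbedder):
    """Toy model that expects mean-subtracted pixel values."""

    def __init__(self, mean):
        self.mean = mean

    def preprocess(self, face_pixels):
        return [p - self.mean for p in face_pixels]

    def forward(self, blob):
        return [max(blob), min(blob)]  # placeholder for a forward pass


models = [ScaledInputModel(), MeanSubtractedModel(mean=100)]
for model in models:
    print(type(model).__name__, model.embed([90, 110, 100]))
```

Swapping in another model then only requires a new subclass; the calling code that asks for an embedding stays unchanged.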
I also deleted Liu's old dlib-based code, since after the restructuring nothing uses dlib anymore. This is in fact a goal of my GSoC project this year: it reduces maintenance effort and eliminates the dlib dependency, along with the compiler warnings and complicated build rules that came with compiling dlib.
In discussions about face recognition with my mentors and other digiKam contributors, we all agreed that face detection is one of the key factors for improving face recognition in digiKam. Hence the next step of my work during this phase is to tackle face detection.
July 08 to July 21 (Week 7 - 8) - Face Detection achieves 100% accuracy on orl test set
DONE
- New face detection implementation with OpenCV DNN and SSD-MobileNet pretrained model
TODO
- Improve face detection for non-square (or not nearly square) images
- Improve face recognition with distance measure to face groups and implement face clustering
At the time of its publication, SSD (Single Shot MultiBox Detector) was the state-of-the-art (SOTA) algorithm for single-shot face detection, achieving results comparable to Fast R-CNN (the SOTA algorithm at that time) with real-time performance of over 30 fps on CPU. SSD-MobileNet for OpenCV DNN is a pretrained model based on the SSD and MobileNet architectures and can be found in the corresponding GitHub folder of OpenCV. The customized pretrained model is lightweight and well suited to OpenCV DNN.
The new face detection with OpenCV DNN and SSD-MobileNet gave very good results. While it runs a little slower than the OpenCV Haar face detector, it maintains real-time detection at around 30 fps on CPU. On the orl test set, the OpenCV DNN face detector outperforms OpenCV Haar, with 100% of faces detected compared with around 90% for OpenCV Haar. It can also detect faces in photos with difficult lighting, shadows or non-frontal faces. The generated bounding boxes are also better, since they vary in shape and fit the detected faces closely. A comparison of OpenCV DNN with OpenCV Haar can be found below:
Credit: Vikas Gupta (Face Detection – OpenCV, Dlib and Deep Learning ( C++ / Python ))
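As an illustration of how such variable-shaped bounding boxes are obtained, here is a minimal sketch that turns SSD-style detections into pixel boxes. It assumes the usual SSD output layout of one row per detection, `[image_id, class_id, confidence, x1, y1, x2, y2]`, with corner coordinates normalized to [0, 1]. The function name, the 0.5 confidence threshold and the sample rows are made up for this example and are not taken from the digiKam implementation.

```python
def detections_to_boxes(detections, image_width, image_height, min_confidence=0.5):
    """Convert normalized SSD-style rows into clamped pixel boxes.

    Returns a list of (confidence, (left, top, right, bottom)) tuples.
    """
    boxes = []
    for _image_id, _class_id, confidence, x1, y1, x2, y2 in detections:
        if confidence < min_confidence:
            continue  # drop weak detections
        # Scale to pixels and clamp to the image, since the network may
        # predict corners slightly outside the [0, 1] range.
        left = min(max(round(x1 * image_width), 0), image_width)
        top = min(max(round(y1 * image_height), 0), image_height)
        right = min(max(round(x2 * image_width), 0), image_width)
        bottom = min(max(round(y2 * image_height), 0), image_height)
        boxes.append((confidence, (left, top, right, bottom)))
    return boxes


# Made-up detections for a 200x160 image.
sample_rows = [
    [0, 1, 0.98, 0.25, 0.125, 0.5, 0.75],   # confident detection
    [0, 1, 0.30, 0.6, 0.2, 0.8, 0.7],       # below threshold, dropped
    [0, 1, 0.75, -0.05, 0.5, 0.25, 1.1],    # partly outside the image
]
sample_boxes = detections_to_boxes(sample_rows, image_width=200, image_height=160)
print(sample_boxes)
```

Because width and height are scaled independently, the resulting boxes are not forced to be square, unlike the fixed-aspect windows of a cascade detector.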
Despite its very good performance and accuracy, the OpenCV DNN face detector does not seem to work well with rectangular photos whose width / height ratio is far from 1.0 (i.e. far from square). In such photos, faces are either not detected or detected in the wrong places. The OpenCV DNN implementation therefore needs further study.
For the last phase of GSoC, I will also refocus my work on face recognition improvement and face clustering.
Contacts
Email: [email protected]
Github: TrungDinhT