
Mudit Jain GSOC 2015 Application: Face Detection

Mudit Jain edited this page Mar 27, 2015 · 1 revision

GSOC 2015 Application:

Personal Information:

Background:

I’m currently a third-year undergraduate student at the Birla Institute of Technology & Science, India. I am highly interested in computer vision and image processing. I have attached my resume for your kind consideration.

https://www.dropbox.com/home?preview=Mudit_resume.pdf

I am proficient in Python and C and have worked on a good number of projects in both languages. Some of the projects that I would like to mention are as follows:

  1. During my internship at Srujana Innovation Center, I implemented principal component analysis to generate and sort the eigenimages of a given data set. The implementation was done in Python, using the NumPy and SciPy libraries for matrix computations.

https://github.com/spark1729/srujana_ml/blob/master/Eigeneyes.py

  2. I implemented the 1553 military-grade communication protocol during my internship at the Defence Research and Development Organisation, India.

This semester I am working in collaboration with a graduate student at the MIT Media Lab on generating a reconstructed surface of the eye. Further details are given in my resume.

I have been using git for quite some time and am comfortable with it. My interest in image processing naturally drew me towards this library. Initially, it was for implementation of various algorithms in specific applications for my project but now I am keen on contributing to the library for the long run.

Ever since I was introduced to the world of open source programming by a friend, I haven’t looked back. Not only is it a good way to collaborate and share but it also improves understanding of various topics, on a personal level. I have been using scikit-image for the implementation of various techniques for my projects and am keen to contribute to the library.

Beyond the topics above, my interests include psychology, reading, and watching TV series.

Commitment:

This is to inform you that I have no other commitments during the summer and would be excited to contribute to the scikit-image library, given the opportunity. The idea for Face Detection is extremely interesting and I would like to work on the same as a GSOC'15 project under the scikit-image mentoring organisation.

Project Proposal Information

Proposal Title:

Face detection

Proposal Abstract:

Face detection using Haar features, in accordance with the following paper:

Viola-Jones paper

The framework will take an image as input and detect the faces present in it using Haar features, the integral image, and cascaded classifiers. The basic idea is to design a framework for face detection and to generate coefficients for the already existing cascaded classifiers by training on a dataset of registered images. Once implemented for faces, the same framework can then be used to detect other objects.

Proposal Detailed Description/Timeline (*):

Reference: https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/viola-cvpr-01.pdf I have thoroughly read the paper, and a brief explanation of the basic idea is given below.

Basic Idea:

The algorithm has four parts:

  1. Haar features
  2. Integral Image
  3. Adaboost
  4. Cascaded Classifiers

Haar Features:

The features employed by the detection framework universally involve sums of image pixels within rectangular areas. The figures in the Viola-Jones paper illustrate the classifiers used: the sum of pixels in the white portion is subtracted from the sum of pixels in the dark portion. I have started implementing the function that calculates the Haar features; developments can be seen in this pull request: https://github.com/scikit-image/scikit-image/pull/1444
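
As a minimal sketch of that dark-minus-white computation (assuming a horizontal two-rectangle feature; the function name and rectangle layout here are illustrative and do not reflect the API in the pull request):

```python
import numpy as np

def haar_two_rect(img, x, y, w, h):
    # Horizontal two-rectangle Haar feature with top-left corner (x, y):
    # the sum over the white (left) half is subtracted from the sum
    # over the dark (right) half. Names are illustrative only.
    white = img[y:y + h, x:x + w // 2].sum()
    dark = img[y:y + h, x + w // 2:x + w].sum()
    return dark - white

img = np.arange(16, dtype=float).reshape(4, 4)
value = haar_two_rect(img, 0, 0, 4, 4)
```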

Integral Image:

To reduce the computation involved in evaluating Haar features, the paper suggests using integral images to calculate the sums of pixels in the dark and white regions directly. The sum of all pixels in a rectangular area can be computed with a minimal number of lookups in the integral image. The integral image is already implemented in scikit-image, so we will use the integral_image function from skimage.transform.
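
A sketch of the four-lookup rectangle sum, using a NumPy cumulative sum in place of skimage.transform.integral_image (the zero-padded layout here is an illustrative choice that avoids boundary branching):

```python
import numpy as np

img = np.ones((6, 6))

# Zero-padded integral image: ii[r, c] holds the sum of img[:r, :c].
# skimage.transform.integral_image gives the unpadded equivalent.
ii = np.zeros((7, 7))
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] with just four lookups.
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

total = rect_sum(ii, 2, 2, 5, 5)  # a 3x3 block of ones
```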

Adaboost:

The face detection framework employs a variant of the learning algorithm AdaBoost both to select the best features and to train classifiers that use them. This algorithm constructs a “strong” classifier as a linear combination of weighted simple “weak” classifiers. A weak classifier is a single Haar feature evaluated inside the 24×24 pixel window. For this project we will use the cascaded structure of classifiers that has already been found and implemented; however, we will train new threshold values for the cascaded classifiers over a wide dataset of input images.

Link to the XML files: https://github.com/Itseez/opencv/tree/master/data/haarcascades
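
The weighted-vote structure of the strong classifier can be sketched as follows; the tuple layout and the toy scalar standing in for a 24×24 window are assumptions for illustration, not scikit-image API:

```python
def strong_classify(window, weak_clfs, threshold):
    # AdaBoost-style strong classifier: a weighted vote over weak
    # classifiers, each given as (feature_fn, polarity, theta, alpha).
    score = 0.0
    for feature_fn, polarity, theta, alpha in weak_clfs:
        if polarity * feature_fn(window) < polarity * theta:
            score += alpha  # this weak classifier fired
    return score >= threshold

# Toy "features" on a scalar stand-in for a window.
weak_clfs = [(lambda w: w, 1, 5.0, 0.6),    # fires when w < 5
             (lambda w: w, -1, 1.0, 0.4)]   # fires when w > 1
accepted = strong_classify(3.0, weak_clfs, 0.5)
rejected = strong_classify(7.0, weak_clfs, 0.5)
```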

Cascade Classifier:

The strong classifiers are arranged in a cascade in order of complexity, where each successive classifier is trained only on those selected samples which pass through the preceding classifiers. If at any stage in the cascade a classifier rejects the sub-window under inspection, no further processing is performed and the search continues with the next sub-window. The cascade therefore has the form of a degenerate tree. The implementation will use the threshold values of both the weak and strong classifiers from the XML data that we provide.
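
The early-rejection behaviour described above can be sketched as follows (toy stages on a scalar stand-in window; all names are illustrative):

```python
def cascade_accept(window, stages):
    # Each stage is a strong classifier returning True/False; a window
    # is accepted only if every stage in the cascade passes it.
    for stage in stages:
        if not stage(window):
            return False  # early rejection: no further stages run
    return True

# Toy stages of increasing strictness.
stages = [lambda w: w > 0, lambda w: w > 10, lambda w: w > 100]
kept = cascade_accept(500, stages)
dropped = cascade_accept(50, stages)
```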

A total of 5000 positive examples should be sufficient to train the individual stages of the cascaded classifier, but these examples should be chosen with care: the final detector will only detect faces similar to those in the training set, so the training set should represent the most common facial variation.

Implementation & Work Flow:

Calculating Haar Features:

The best place to start is implementing a utility function that calculates a Haar feature given the following parameters:

  1. Image indices x, y
  2. Type of Haar feature – one of the four feature types
  3. Scale factor of the Haar feature

I have already started working on the code for computing Haar features. My progress can be seen at the link below: https://github.com/spark1729/scikit-image/blob/haar_feature/skimage/transform/haar_feature.py

Haar features are evaluated inside a 24×24 pixel window which travels over the entire image.

Generating the XML data for the detection of faces:

For detecting faces we have to read an XML file which stores the cascaded classifiers and the threshold values for the strong and weak classifiers. To avoid being patent encumbered, we will train the cascaded classifiers over a given dataset to find the classifier thresholds and the weak-classifier coefficients for every stage of the cascade. We will, however, reuse the already existing set of cascaded classifiers linked above. This will generate the required XML data.

Reading the XML data:

This can be done using the SAX (Simple API for XML) or the DOM (Document Object Model) API in Python.
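
For instance, a DOM-based read with Python's built-in xml.dom.minidom might look like this; the tag and attribute names are invented for illustration and do not match the OpenCV schema exactly:

```python
from xml.dom import minidom

# A toy cascade description in the spirit of the OpenCV XML data.
doc = minidom.parseString(
    "<cascade>"
    "<stage threshold='0.8'><weak alpha='0.6'/><weak alpha='0.4'/></stage>"
    "</cascade>")

# Collect (stage threshold, weak-classifier weights) pairs.
stages = []
for stage in doc.getElementsByTagName("stage"):
    alphas = [float(w.getAttribute("alpha"))
              for w in stage.getElementsByTagName("weak")]
    stages.append((float(stage.getAttribute("threshold")), alphas))
```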

Sliding window function:

Once we have those values we can implement a function that takes a window size and a scale attribute as parameters and is responsible for sliding the window over the entire image.
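
A minimal sliding-window sketch (the parameter names and step size are illustrative):

```python
def sliding_windows(shape, size, step):
    # Yield the top-left corner of every size-by-size sub-window,
    # moved `step` pixels at a time over an image of the given shape.
    rows, cols = shape
    for r in range(0, rows - size + 1, step):
        for c in range(0, cols - size + 1, step):
            yield r, c

# All 24x24 windows of a 28x28 image, stepped 4 pixels at a time.
positions = list(sliding_windows((28, 28), 24, 4))
```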

Cascaded Classifiers:

Within the above function, another function will be called to evaluate the Haar features in the sub-window. This function will use the cascade-classifier data derived from the XML file to reject or accept a particular window. As mentioned in the theory, a window is accepted as containing a face only when it satisfies all the stages.

Data set preparation:

For training we will require data sets of positive (face) and negative (background) images. The data set will be finalized upon discussion with the mentors. The positive images have to be cropped and a positive description file has to be made; a program can be written to speed up the cropping and to generate the positive description file. Negative images, which do not contain the objects of interest, will also be collected.

Classifier Training:

Work on training the cascaded classifiers over a data set of images under constraints such as minimum hit rate, maximum false alarm rate, number of stages to be trained, and weight trimming, in accordance with link [4] in the references. Designing new strong classifiers is not required; our entire focus will be on training them.

A performance evaluation of the generated XML Data will be done in comparison to the available XML Data.

Resolution of the detector and the scale factor:

The base resolution of the detector is 24×24 pixels, but with a starting scale of, e.g., 2 the first detector sub-window will have a size of 48×48 pixels. The starting delta is also multiplied by the starting scale, and the result equals the step size used to move the detector sub-window through the input image.

Once the first pass is completed, both the size of the sub-window and the step size are multiplied by the scale increment. This procedure is repeated until the size of the detector sub-window exceeds the least dimension of the input image.

In this way, all the faces in the image can finally be detected.
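
The scale sweep described above might be sketched as follows (the 1.25 increment and parameter names are illustrative assumptions, not values from the paper):

```python
def detector_sizes(least_image_dim, base=24, start_scale=1.0, scale_inc=1.25):
    # Sub-window sizes swept by the detector: the 24x24 base resolution
    # is grown by scale_inc until it exceeds the image's least dimension.
    sizes, scale = [], start_scale
    while base * scale <= least_image_dim:
        sizes.append(int(base * scale))
        scale *= scale_inc
    return sizes

# For an image whose least dimension is 100 pixels:
sizes = detector_sizes(100)
```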

Timeline:

Community Bonding( April 27 - May 25 )

This time will be utilized to become more familiar with the Viola-Jones algorithm, to thoroughly understand the structure of the XML files to be generated, and to discuss the implementation with the mentors so that the project is not patent encumbered.

Week 1 ( May 25 to May 31)

Write a utility function that computes Haar features, including the extended Haar features. To reduce computation, the integral-image concept will be used via the integral_image function from skimage.transform. Documentation and tests will be written for this function.

Week 2 ( June 1 - June 7 )

Implement the initial version of the method which takes an image and a window size as its inputs, generates the integral image, and moves a window across the entire image, much like a mask in spatial filtering. Documentation and tests will be added simultaneously.

Week 3 & 4( June 8 - June 21 )

Another utility function will be implemented to read information such as the number of stages, the maximum number of weak classifiers, leaf values, internal nodes, and the threshold values for both strong and weak classifiers from the XML data, and pass it to the cascade classifier. Whether to implement the XML parsing with the SAX or the DOM API will be discussed with the mentors and then executed.

The cascade-classifier function will be implemented, taking the data from the XML file and the window location as input. It will then compute the Haar features, compare them with the threshold values, and accordingly accept or reject a window.

Week 5 & 6 ( June 22 - July 5 )

The core function for face detection will be designed and implemented, linking all the functions implemented up to this point. Scale-factor functionality will also be added. The final output of this function will be the coordinates of the boxes encompassing the faces.

Extensive tests will be written, documentation will be completed.

Week 7 ( July 6th - July 12 )

Data set preparation, as described in the implementation section above: finalize the data sets of positive (face) and negative (background) images with the mentors, crop the positive images, generate the positive description file, and collect negative images that do not contain the objects of interest.

Week 8 & 9 ( July 13 - July 26 )

Train the cascaded classifiers as described in the implementation section above, under constraints such as minimum hit rate, maximum false alarm rate, number of stages, and weight trimming, in accordance with link [4] in the references.

A performance evaluation of the generated XML data will be done in comparison with the available XML data.

Week 10 ( July 27 - August 2 )

Optimizing the previous code by finding performance bottlenecks with cProfile (with the visualization tool such as SnakeViz) and rewriting the portions in C.
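
A minimal cProfile session of this kind, with a stand-in for the detection routine, might look like:

```python
import cProfile
import io
import pstats

def detect_stub():
    # Stand-in for an expensive detection routine to be profiled.
    return sum(i * i for i in range(10000))

profiler = cProfile.Profile()
profiler.enable()
detect_stub()
profiler.disable()

# Print the five most expensive calls by cumulative time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```

The report pinpoints which functions dominate the running time, which is where rewriting in C (or Cython) pays off.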

An example for face detection will be added to the gallery, as mentioned in the contribution page for the library.

Week 11 ( August 3 - August 9 )

This week is a buffer in case deadlines are not met. If everything works out, documentation will be worked on, tests will be added, and further functionality can be discussed.

Week 12 & Buffer ( August 10 - August 24 ) **Pencils down deadline:**

Review the entire process. Finish the documentation for the entire project. Add tests. Fix potential bugs.

Resources that I have read through for the proposal:

  1. Viola-Jones paper: http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf

  2. Implementation of Viola-Jones: http://etd.dtu.dk/thesis/223656/ep08_93.pdf

  3. Wiki : http://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework

  4. Implementation of classifier training : http://note.sonots.com/SciSoftware/haartraining.html

  5. Terms related to classifier training: http://www.computer-vision-software.com/blog/2009/11/faq-opencv-haartraining/

  6. XML Parsing: http://www.tutorialspoint.com/python/python_xml_processing.htm

Link to a patch/code sample, preferably one you have submitted to your sub-org (*):

Presently, two of my pull requests are under review. One is regarding the implementation of a median filter for float images:

https://github.com/scikit-image/scikit-image/pull/1422

The other is regarding calculating Haar features given the location, size and type of the feature.

This is one of the main utility functions required for the Face Detection Algorithm.

https://github.com/scikit-image/scikit-image/pull/1444

Links to additional information

In the TARDIS library:

https://github.com/tardis-sn/tardis/pull/277

https://github.com/wkerzendorf/tardis/pull/11

Classification of Images using Various orders of Entropy :

I have been working alongside my professor on the topic of classification of images using various orders of entropy. I have been using MATLAB for the implementation purposes.
