Monday, 1 February 2016

Deep learning 06 - Classify car and non-car by convolutional neural network

    A convolutional neural network (CNN) is a powerful tool for object recognition tasks in computer vision. You can find good explanations of this technique in cs231n, the best free tutorial I could find through Google.

    Most of the famous CNN libraries (theano, caffe, torch, etc.) are hard to install on Windows; the exceptions I found are mxnet and tiny-cnn. mxnet supports CPU/GPU modes and distributed training, which makes it a nice tool for large-scale deep learning (although my laptop is not suited to large-scale training). The drawback, for me, is that it does not provide a good C++ API yet; instead it provides rich Python bindings. Python is a decent tool for prototyping and a nice environment for research, but it is not a good option for creating a standalone binary that runs on a machine without asking users to install a bunch of tools (anaconda, a virtual machine, etc.). This is why I chose tiny-cnn to train the binary classifier.

    Object classification is a difficult task. There are many variations you need to deal with, such as intra-class variation, different viewpoints, occlusion, background clutter, illumination variation, and deformation.

[Figures: intra-class variation, different viewpoints, background clutter, illumination variation]

    It is hard to solve all of these challenges at once (although CS231n claims that CNNs can handle all of the problems I mentioned above). Instead, we make some assumptions about the objects we want to classify. To create a successful image classifier, it is very important to state your assumptions before you write a single line of code. The following are my assumptions (preconditions) for this binary classifier.

Assumptions on the classifier

1 : This classifier is only able to classify car and non-car
2 : This classifier assumes good lighting conditions
3 : This classifier can deal with different viewpoints of cars
4 : This classifier does not rely on color information
5 : This classifier can deal with intra-class variation
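
Assumption 4 means the classifier only needs intensity values, so color images can be reduced to grayscale before training. The post does not say which conversion was used; the following is a minimal sketch that assumes an image is a list of rows of (r, g, b) tuples and uses the common ITU-R BT.601 luma weights. The function name `to_grayscale` is made up for illustration.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) pixels (0-255) into rows of gray values (0-255).

    The weights are the ITU-R BT.601 luma coefficients; this is an assumed
    choice, not necessarily the conversion used for the original classifier.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


# Tiny 2x2 example image: red, green, blue and white pixels.
rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
gray = to_grayscale(rgb)
print(gray)  # [[76, 150], [29, 255]]
```

Dropping color also shrinks the input to one channel, which keeps the network small.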

    With the assumptions made, we can start coding. The car images come from the Stanford AI Lab, and the non-car examples come from caltech101. I randomly picked 6000 cars and 6000 non-cars from these data sets and applied some augmentation to increase the size of the training set. I then used the classifier on 1000 car and 1000 non-car images (different images from the same data sets); the best accuracy is 1956/2000 (97.8%). Not bad, but there is still room to improve.
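
The selection and the accuracy figure can be sketched roughly as follows. This is only an illustration under assumptions: the file names are placeholders, the pool sizes are invented, and `pick_balanced` and `accuracy` are hypothetical helpers, not functions from tiny-cnn or from the original code.

```python
import random


def pick_balanced(cars, non_cars, n_train, n_test, seed=0):
    """Randomly pick disjoint train/test subsets with equal counts per class."""
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in ((1, cars), (0, non_cars)):
        # sample() never returns the same item twice, so the two slices
        # below cannot share an image.
        chosen = rng.sample(paths, n_train + n_test)
        train += [(p, label) for p in chosen[:n_train]]
        test += [(p, label) for p in chosen[n_train:]]
    return train, test


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Placeholder file names standing in for the real data sets.
cars = [f"car_{i}.jpg" for i in range(8000)]
non_cars = [f"other_{i}.jpg" for i in range(8000)]

train, test = pick_balanced(cars, non_cars, n_train=6000, n_test=1000)
print(len(train), len(test))  # 12000 2000
print(1956 / 2000)            # 0.978
```

Keeping the test images out of the training pool is what makes the 97.8% figure meaningful.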

    The code is located on GitHub. I do not intend to explain the details of the code (I can understand what I wrote even after several years), but I will summarize the key points I learned from this tiny classifier.

Tips for training a CNN with tiny-cnn

1 : Shuffle your training set; otherwise the accuracy stays stuck at 50% (chance level for two balanced classes).
2 : Initial weights have a big impact on the training results. You may get bad results several times because the initial weights are bad, especially when you are using adagrad as the optimizer. Remember to run the training process again if the accuracy is ridiculously low.
3 : Augment your data. A CNN is a resource-hungry machine learning algorithm (CPU, GPU, RAM, and samples), so try out different augmentation schemes (rotation, horizontal/vertical flip, illumination variation, shifting, etc.) and find out which ones help you get better results.
4 : Try different optimization algorithms and error functions. For this data set, mse and adagrad worked best for me.
5 : Try different batch sizes and alpha values (learning rate).
6 : Log your results.
7 : Start from a shallow network. A deeper network does not necessarily give better results, especially for a small data set.
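
Tips 1 and 3 can be sketched in a few lines. This is not the tiny-cnn code from the repository, just a plain Python illustration that assumes each image is a 2-D list of gray values; all function names are made up for the example.

```python
import random


def shuffle_together(images, labels, seed=None):
    """Shuffle images and labels with one permutation so pairs stay aligned."""
    order = list(range(len(images)))
    random.Random(seed).shuffle(order)
    return [images[i] for i in order], [labels[i] for i in order]


def flip_horizontal(image):
    """Mirror every row of a 2-D grayscale image (list of rows)."""
    return [row[::-1] for row in image]


def shift_right(image, dx, fill=0):
    """Shift the image dx pixels to the right (0 <= dx <= width), padding with fill."""
    return [[fill] * dx + row[:len(row) - dx] for row in image]


def augment(images, labels):
    """Return each image plus a flipped and a shifted copy, labels repeated to match."""
    out_images, out_labels = [], []
    for img, lab in zip(images, labels):
        for variant in (img, flip_horizontal(img), shift_right(img, 1)):
            out_images.append(variant)
            out_labels.append(lab)
    return out_images, out_labels


# Two tiny 2x3 "images": label 1 = car, label 0 = non-car.
images = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [0, 0, 0]]]
labels = [1, 0]

aug_images, aug_labels = augment(images, labels)
train_images, train_labels = shuffle_together(aug_images, aug_labels, seed=42)
print(len(train_images))  # 6
```

Augmenting first and shuffling afterwards matters: the augmented copies of one class would otherwise sit next to each other and undo the effect of tip 1.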