Monday, 7 August 2017

Performance of perceptual losses for super resolution

    Have you ever scratch your head when upscaling low resolution images? I do, because we all know the quality of the images after upscaling degrade. Thanks to the rise of machine learning in recent years, we are able to upscale single image with better results compare with traditional solutions(ex : bilinear, bicubic. You do not need to know what they are except they are apply widely in many products), we call this technique super resolution.

    This sound great, but how could we do it?I did not know it either until I study the tutorials of part2 of the marvelous Practical Deep learning for Coders, this course is fantastic to get your feet wet on deep learning.

    I will try my best to explain everything with minimal prerequisite knowledge on machine learning and computer vision, however, some knowledge of convolution neural network(cnn) is needed. The course of part1 is excellent if you want to learn cnn in depth. If you are in a hurry, pyimagesearch and medium has a short tutorial about cnn.

What is super resolution and how does it work

Q : What is super resolution

A :  Super resolution is a class of technique to enhance the resolution of images or videos.

Q : There are many softwares could help us upscale images, why do we need super resolution?

A : Traditional solutions of upscaling image apply interpolation algorithm on one image only(ex: bilinear or bicubic). In the contrast, super resolution exploit info from another source, either from contiguous frames, from the model trained by machine learning or different scale from one image.

Q : How does super resolution work

A : Super resolution I want to introduce today is based on Perceptual losses for Real-Time style Transfer and Super-Resolution.(please consult wiki if you want to study another type of super resolution).  The most interesting part of this solution is it treat super resolution as an image transformation problem(it is a process where an input image is transformed into an output image). This mean we may use the same technique to solve colorization, denoising, depth estimation, semantic segmentation and another tasks(It is not a problem if you do not know what they are).

Q : How do we transformed low resolution image to high resolution image?

A : A picture worth a thousand words.

    This network is composed by two components, image transformation network and a loss network. Image transformation network transform low resolution image into high resolution image, while loss network measuring the difference between predicted high resolution image and the true high resolution image

Q : What is the loss network anyway?Why do we use it to measure the loss?

A : Loss network is an image classification network train on imagenet (ex : vgg16, resnet, densenet). We use it to measure the loss because we want our network to better measure perceptual and semantic difference between images. The paper call the loss measure by this loss network perceptual loss.

Q : What makes the loss network able to generate better loss?

A : The loss network can generate better loss because the convolutional neural network trained for image classification have already learned to encode the perceptual and semantic information we want.

Q : The color of the image is different after upscale, how could I fixed it?

A : You could apply histogram matching as the paper mentioned, this should be able to deal with most of the cases.

Q : Any draw back of this algorithm?

A : Of course, nothing is perfect.

1 : Not all of the image work, they may look very ugly after upscale.
2 : The image maybe ice cream to your eyes, but it is not reconstructing the photo exactly but create details based on its training from example images.It is impossible to reconstruct the image with perfect results, because we have no way to retrieve the information did not exist from the beginning.
3 : Color of part of the images change after upscale, even histogram matching cannot fix it.

Q : What is histogram matching?

A : It is a way to make the color distribution of image A looks like image B.


    All of the experiments use same network architecture and train on 80000 images from imagenet, 2 epoch. From left to right are original image, image upscale 4x by bicubic, image upscale by super resolution by 4x.

    The results are not perfect, but this is not the end, super resolution is a hot research topic, every paper is a stepping stone for next algorithm, we will see more and more better, advance techniques pop out in the future.

Sharing trained model and codes

1 : Notebook to transform the imagenet data to training data
2 : Notebook to train and use the super resolution model
3 : Network model with transformation network and loss network, trained on 80000 images

    If you liked this article, please help others find it by clicking the little g+ icon below. Thanks a lot!

Wednesday, 19 July 2017

Neural style by Keras

    Today I want to write down how to implement the neural style of the paper A Neural Algorithm of Artistic Style by Keras learn from course. You can find the codes located at github.

    Before I begin to explain how to do it, I want to mentioned that generate artistic style by deep neural network is different with image classification, we need to learn new concepts and add them into our tool boxes, if you find it hard to understand at the first time you saw it, do not fear, I have the same feeling too. You can ask me the questions or go to fast ai forum.

    The paper present an algorithm to generate artistic style image by combine two image together using convolution neural network. Here are examples combine source images(bird, dog, building) with style images like starry , alice and tes_teach. From left to right is style image, source image, image combined by convolution neural network.

    Let us begin our journey of the implementation of the algorithm(I assume you know how to install Keras, tensorflow, numpy, cuda and other tools, I recommend using ubuntu16.04.x as your os, this could save you tons of headache when setup your deep learning toolbox).

Step 1 : Import file and modules

from PIL import Image

import os

import keras.backend as K
import vgg16_avg

from keras.models import Model
from keras.layers import *
from keras import metrics

from scipy.optimize import fmin_l_bfgs_b
from scipy.misc import imsave

Step 2 : Preprocess our input image

#the value of rn_mean is come from image net data set
rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)

#create image close to zero mean and convert rgb channel to bgr channel 
#since the vgg model need bgr channel. ::-1 invert the order of axis 0
preproc = lambda x: (x - rn_mean)[:,:,:,::-1]
#We need to undo the preprocessing before we save it to our hard disk
deproc = lambda x: x[:,:,:,::-1] + rn_mean

Step 3 : Read the source image and style image

    Source image is the image you want to apply style on it. Style image is the style you want to apply on the source image.

dpath= os.getcwd() + "/"

#I make the size of content image, style image, generated img
#have the same shape, but this is not mandatory
#since we do not use any full connection layer
def read_img(im_name, shp):
    style_img =
    if len(shp) > 0:
        style_img = style_img.resize((shp[2], shp[1]))
    style_arr = np.array(style_img)    
    #The image read by PIL is three dimensions, but the model
    #need a four dimensions tensor(first dim is batch size)
    style_arr = np.expand_dims(style_arr, 0)
    return preproc(style_arr)

content_img_name = "dog"
content_img_arr = read_img(dpath + "img/{}.png".format(content_img_name), [])
content_shp = content_img_arr.shape
style_img_arr = read_img(dpath + "img/starry.png", content_shp)

Step 4 : Load vgg16_avg

    Unlike doing image classification with pure sequential api of Keras, to build a neural style network, we need to use backend api of Keras.

content_base = K.variable(content_img_arr)
style_base = K.variable(style_img_arr)
gen_img = K.placeholder(content_shp)
batch = K.concatenate([content_base, style_base, gen_img], 0)

#Feed the batch into the vgg model, every time we call the model/layer to
#generate output, it will generate output of content_base, style_base,
#gen_img. Unlike content_base and style_base, gen_img is a placeholder,
#that means we will need to provide data to this placeholder later on
model = vgg16_avg.VGG16_Avg(input_tensor = batch, include_top=False)

#build a dict of model layers
outputs = { for l in model.layers}
#I prefer these 1~3 layers hierarchy as my style_layers, 
#you can try it out with different range
style_layers = [outputs['block{}_conv1'.format(i)] for i in range(1,4)]
content_layer = outputs['block4_conv2']

    If you find K.variable, K.placeholder very confuse, please check the document of TensorFlow and Keras backend api.

Step 5 : Create function to find loss and gradient

#gram matrix is a matrix collect the correlation of all of the vectors
#in a set. Check wiki( 
#for more details
def gram_matrix(x):
    #change height,width,depth to depth, height, width, it could be 2,1,0 too
    #maybe 2,0,1 is more efficient due to underlying memory layout
    features = K.permute_dimensions(x, (2,0,1))
    #batch flatten make features become 2D array
    features = K.batch_flatten(features)
    return, K.transpose(features)) / x.get_shape().num_elements()    

def style_loss(x, targ):
    return metrics.mse(gram_matrix(x), gram_matrix(targ))
content_loss = lambda base, gen: metrics.mse(gen, base)    

#l[1] is the output(activation) of style_base, l[2] is the
#output of gen_img loss of style image and gen_img. As the
#paper suggest, we add the loss of all convolution layers
loss = sum([style_loss(l[1], l[2]) for l in style_layers]) 

#content_layer[0] is the output of content_base,
#content_layer[2] is the output of gen_img
#loss of content image and gen_img
loss += content_loss(content_layer[0], content_layer[2]) / 10. 

#The loss need two variables but we only pass in one,
#because we only got one placeholder in the graph,
#the other variable already determine by K.variable
grad = K.gradients(loss, gen_img)
#We cannot call loss and grad directly, we need
#to create a function(convert it to symbolic definition)
#before we can feed it into the solver
fn = K.function([gen_img], [loss] + grad)

    You can adjust the weight of style loss and content loss by yourself until you think the image looks good enough. The function at the end only tells you that the concatenated list of loss and grads is the output that you want to - eventually - minimize. So, when you feed it to the solver bfgs, it will try to minimize the loss and will stop when the gradients are also zero (a minimum, hopefully not just a local one).

Step 6 : Create a helper class to separate loss and gradient

#fn will return loss and grad, but fmin_l_bfgs need to seperate them
#that is why we need a class to separate loss and gradient and store them
class Evaluator:
    def __init__(self, fn_, shp_):
        self.fn = fn_
        self.shp = shp_
    def loss(self, x):
        loss_, grads_ = self.fn([x.reshape(self.shp)])
        self.grads = grads_.flatten().astype(np.float64)
        return loss_.astype(np.float64)
    def grad(self, x):
        return np.copy(self.grads)
evaluator = Evaluator(fn, content_shp)

Step 7 : Generate a random noise image(white noise image mentioned by the paper)

#This is the real value of the placeholder--gen_img
rand_img = lambda shape: np.random.uniform(-2.5, 2.5, shape)/100

Step 8 : Minimize the loss of rand_img with the source image and style image

def solve_img(evalu, niter, x):
    for i in range(0, niter):
        x, min_val, info = fmin_l_bfgs_b(evalu.loss, x.flatten(), 
                                         fprime=evalu.grad, maxfun = 20)
        #value of PIL lie within -127 and 127
        x = np.clip(x, -127, 127)
        print(i, ',Current loss value:', min_val)
        x = x.reshape(content_shp)
        simg = deproc(x.copy())
        img_name = '{}_{}_neural_style_img_{}.png'.
                    format(dpath + "gen_img/", content_img_name, i)
        imsave(img_name, simg[0])
    return x

solve_img(evaluator, 10, rand_img(content_shp)/10.)

    You may ask, why using fmin_l_bfgs_b but not stochastic gradient descent? The answer is we can, but we have a better choice. Unlike image classification, we do not have a lot of batch to run, right now we only need to figure out the loss and gradient between three inputs, they are source image, style image and the random image, using fmin_l_bfgs_b is more than enough.


Tuesday, 30 May 2017

Create a better images downloader(Google, Bing and Yahoo) by Qt5

  I mentioned how to create a simple Bing image downloader in Download Bing images by Qt5, in this post I will explain how do I tackle the challenges I have encountered when I try to build a better image downloader app by Qt5, the skills I used are apply on QImageScraper version_1.0, if you want to know the details, please dive into the codes, they are too complicated to write down in this blog.

1 : Show all of the images searched by Bing

  To show all of the images searched by Bing, we need to make sure the page is scrolled to the bottom, unfortunately there is no way to check it with 100% accuracy if we are not scrolling the page manually, because the height of the scroll bar keep changing when you scrolling it, this make the program almost impossible to determine when should it stop to scroll the page.


  I give several solutions a try but none of them are optimal, I have no choice but seek a compromise. Rather than scrolling the page full auto, I adopt semi auto solution as Pic.1 shown.


2 : Not all of the images are downloable

  There are several reasons may cause this issue.

  1. The search engine(Bing, Yahoo, Google etc) fail to find direct link of the image.
  2. The server "think" you is not a real human(robot?)
  3. Network error
  4. There are no error happen, but the reply of the server take too long

  Although I cannot find a perfect solution for problem 2, but there are some tricks to alleviate it, let the flow chart(Pic.2) clear the miasma.



  Simply put, if error happen, I will try to download thumbnail, if even the thumbnail cannot download, I will try next image. After all, this solution is not too bad, let us see the results of download 893 smoke images search by Google.


  All of the images could be downloaded, 817 of them are big images, 76 of them are small images, not a perfect results but not bad either. Something I did not mentioned in Pic.2 and Pic.3 are 

  1. I always switch user agents
  2. I start next download with random period(0.5second~1.5second)
  Purpose of these "weird" operations is try to emulate behaviors of humans, this could lower down the risk of being treated as "robot" by the servers. I cannot find free, trustable proxies yet, else I would like to randomly connect to different proxies time to time too, please tell me where to find those proxies if you know, thanks.

3 : Type of the images are mismatch or did not specify in file extension

  Not all of the images have correct type(jpg, png, gif etc), I am very lucky that Qt5 provide us QImageReader, this class can determine the type of the image from contents rather than extension. With it we can change the suffix of the file into the real format, remove the files which are not images.

4 : QFile fail to rename/remove file

  QFile::rename and QFile::remove got some troubles on windows(works well on mac), this bug me a while, it cost me one day to find out QImageReader blocking the file

5 : Invalid file name 

  Not all of the file name are valid, it is extremely hard to find out a perfect way to determine the file name is valid or not, I only do some minimal process for this issue--Remove illegal characters and trimmed white spaces.

6 : Deploy app on major platforms

  One of the strong selling points Qt is the ability of cross-platform, to tell you the truth I can build the app and run it on windows, mac and linux without changing single line of codes, it work out of the box. Problem is, deploy the app on linux is not fun at all, it is a very complicated task, I will try deploy this image downloader after linuxqtdeploy become mature.


  In this blog post, I reviewed some problems I met when using Qt5 to develop an image downloader, this is by no means exhaustive but scrape the surface, if you want to know the details, every nitty-gritty, better dive into source codes.

Download Bing images by Qt5
Source codes of QImageScraper

Sunday, 14 May 2017

Download Bing images by Qt5

  Have you ever need more data for your image classifier?I do, but download images search by Google, Bing nor Flickr one by one are very time consuming, why not we write a small, simple images scraper to help us? Sounds like a good idea, as usual, before I start the task, I list out the requirements of this small app.

a : Cross platform, able to work under ubuntu and windows with one code base(no plan for mobiles since this is a tool design for machine learning)
b : Support regular expression, because I need to parse the html
c : Support high level api of networking
d : Have decent webEngine, it is very hard(impossible?) to scrape the images from those search engine without it
e : Support unicode
f : Easy to create ui, because I want instant feedback of the website, this could speed up development times
g : Ease to build, solving dependency problem of different 3rd libraries are not fun at all

  After search through my toolbox I find out Qt5 is almost ideal for my task. In this post I will use Bing as an example(Google and Yahoo images share the same tricks, processes of scraping these big 3 image search engine are very similar). If you ever try to study the source codes of the search results of Bing, you will find out they are very complicated, difficult to read(Maybe MS spend lots of time to prevent users scrape images). Are you afraid?Rest assured, the steps of scraping image from Bing is a little bit complicated but not impossible as long as you have nice tools to aid you :).

Step 1 : You need a decent, modern browser like firefox or chrome

  Why do we need a decent browser?Because they have a powerful feature--Inspect Element, this function can help you find out the contents(links, buttons etc) of the website. 


Step 2 : Click Inspect Element on interesting content

  Move your mouse to the contents you want to observe and click Inspect Element.


After that the browser should show you the codes of the interesting content.


The codes point by the browser may not something you want, if this is the case, look around the codes point by the browser as Pic3 show.

Step 3 : Create a simple prototype by Qt5

  We already know how to inspect the source codes of the web page, let us create a simple ui to help us. This ui do not need to be professional or beautiful, after it is just a prototype. The functions we need to create a Bing image scraper are

a : Scroll pages
b : Click see more images
c : Parse and get the links of images
d : Download images

  With the help of Qt Designer, I am able to "draw" the ui(Pic4) within 5 minutes(ignore parse_icon_link and next_page buttons for this tutorial).


Since Qt Designer do not QWebEngineView yet, I add it manually by codes

    ui->gridLayout->addWidget(web_view_, 4, 0, 1, 2);

  Pic5 is what it looks like when running.


Step 4 : Implement scroll function by js

    //get scroll position of the scroll bar and make it deeper
    auto const ypos = web_page_->scrollPosition().ry() + 10000;
    //scroll to deeper y position
    web_page_->runJavaScript(QString("window.scrollTo(0, %1)").arg(ypos));

Step 5 : Implement parse image link function 

  Before we can get the full link of the images, we need to scrape the links of the page.

    web_page_->gttoHtml([this](QString const &contents)
        QRegularExpression reg("(search\\?view=detailV2[^\"]*)");
        auto iter = reg.globalMatch(contents);
            QRegularExpressionMatch match =;            
            if(match.captured(1).right(20) != "ipm=vs#enterinsights"){
                QString url = QUrl("" + 
                url.replace("&amp", "&");

Step 6 : Simulate "See more image"

  This part is a little bit tricky, I tried to find the words "See more images" but find nothing, the reason is the source codes return by View Page Source(Pic 6) do not update.

Pic 6

  Solution is easy, use Inspect Element to replace View Page Source(sometimes it is easier to find the contents you want by View Page Source, both of them are valuable for web scraping).


Step 7 : Download images

  Overall our ultimate goal is download the image we want, let us finish last part of this prototype.

  First, we need to get the html text of the image page(the page with the link of image source).


  Second, download the image
    void experiment_bing::web_page_load_finished(bool ok)
          qDebug()<<"cannot load webpage";

        web_page_->toHtml([this](QString const &contents)
            QRegularExpression reg("src2=\"([^\"]*)");
            auto match = reg.match(contents);
               QNetworkRequest request(match.captured(1));
               QString const header = "msnbot-media/1.1 (+http://search."
               //without this header, some image cannot download
               request.setHeader(QNetworkRequest::UserAgentHeader, header);                      
               downloader_->append(request, ui->lineEditSaveAt->text());
               qDebug()<<"cannot capture img link";
               //this image should not download again

  Third, download next image

void experiment_bing::download_finished(size_t unique_id, QByteArray)


  These are the key points of scraping images of Bing search by QtWebEngine, The downloader I use in this post are come from qt_enhance, whole prototype are placed at mega. If you want to know more, visit following link

Create a better images downloader(Google, Bing and Yahoo) by Qt5


  Because Qt-bug 66099, this prototype do not work under windows 10, unfortunately this bug is rated as P2, that means we may need to wait a while before it can be fixed by Qt community.


  Qt5.9 Beta4 fixed Qt-bug 60669, it works on my laptop(win 10 64bits) and desktop(Ubuntu 16.04.1).

  There exist a better solution to scrape image link of Bing, I will mentioned it on the next post.

Sunday, 12 March 2017

Challenge dog vs cat fun competition of kaggle by dlib and mxnet

  Today I want to record down the experiences I learned from the dog vs cat fun competition of kaggle which I spend about two weeks on it(this is the first competition I took). My best rank in this competition is 67, this rank is close to top 5%(there are 1314 team).

  The first tool I give it a try is dlib, although this library lack a lot of features compare with another deep learning toolbox, I still like it very much, especially the fact that dlib can work as zero dependency library.

What I have learned

1 : Remember to record down the parameters you used

At first I write down the records in my header file, this make my codes become harder to read as times go on, I should save those records in excel like format from the beginning. The other thing I learn is, I should record the parameters even I am running out of times, I find out without the records, I become more panic when the dead line was closer.

2 : Feed pseudo labels into the mini-batch with naive way do not work

 I should split the data of mini-batch with some sort of ratio, like 2/3 truth labels, 1/3 pseudo labels.

3 : Leverage pretrained model is much easier to get good results

I do not use pre-trained model but train the network from scratch, this do not give me great results, especially when I cannot afford to train the image with bigger size since my gpu only got 2GB of rams, my score was 0.27468 with brand new model. To speed things up, I treat resnet34 of dlib as feature extractor, save the features extracted by resnet34 and train on new network, this push my score to 0.09627, a big improve.

4 : Ensemble and k-cross validation

To improve my score, I split the data set into 5 cross data set and ensemble the results by average them, this push my score to 0.06266.  I do not apply stacking because I learn this technique after the competition finished. Maybe I can get better results if I know this technique earlier.

5 : How to use dlib, keras and mxnet

I put the codes of dlib and mxnet on github, I removed all of the non-work solutions, that is why you do not see any codes related to keras. keras did not help me improve my score but mxnet did, what I have done with mxnet was finetune all of the resnet pretrained models and ensemble them with the results trained by dlib. This improve my score to 0.05051.

6 : Read the post at forums, it may give you useful info

I learned that the data set got some "errors"  in it, I removed those false images from the training data set.

7 : Fast ai course is awesome, I should view them earlier

If I have watched the videos before I take this competition, I believe I could perform better in this competition. The forum of this course is very helpful too, it is royal free and open.

8 : X-Crop validation may help you improve your score

I found this technique from PyImageSearch and dlib, but I do not have enough of times to try this technique out.

9 : Save settings in JSON format

Rather than hard code the parameters, save them in JSON file is better, because

a : Do not need to change the source codes frequently, this save compile time
b : Every models, experiments can have their own settings record, easier to reproduce
training result

Thursday, 30 June 2016

Speed up image hashing of opencv(img_hash) and introduce color moment hash

    In this post, I would like to show you two things.

1 : How could I accelerate the speed of the img_hash module(click me) from 1.5x~500x(roughly) from my last post(click me).

2 : A new image hash algorithms which works quite well under rotation attack.

Accelerate the speed of img_hash

    We only need one line to gain this huge performance gain, no more, no less.


    What I do is close the optimization of openCL(I would not discuss why this speed things up dramatically on my laptop, if you are interesting about it, I would open another topic to discuss this phenomenon). Let us measure the performance after the change. Codes located at here(click me).

    Following comparison do not list the results of PHash about Average hash, PHash and Color hash algorithms, because I cannot find these algorithms in PHash library.

Computation time

Comparison time

Computation of img_hash with and without opencl

    As the results show, computation time of img_hash outperform PHash after I switch off opencl support(on you computer, switch it on may help you gain better performance) on my laptop(y410p). Whatever, the comparison performance do not change much with or without opencl support.

Benchmark of Color Moment Hash

    In this section, I would like to introduce an image hash algorithm which works quite well under rotation attack and provide a much better test results than my last post(click me).  This algorithm is introduced by this paper(click me), the class ColorMomentHash of img_hash module implement this algorithm.

    My last post only use one image--lena.png to do the experiment under different attack, in this post I will use the data set from phash to do the test(use miscellaneous data set(click me) as original image, apply different attack on it). These 3D bar charts are generated by Qt data visualization, I do not upload it to github yet because the codes are quite messy, if you need the source codes, please send the request to my email(, I would send you a copy of the codes, but do not expect I would refine the codes any time soon.

    The name of the images are quite long, it do not looks good when I draw it on chart, so I rename them to shorter form(001~023). Following are the mapping of those images. You can download the mapping of new name and old name from mega(click me).

    Threshold of the tests of color moment hash is 8, if the L2-Norm of two hash greater than 8, we treat it as fail, and draw it with red bars.

Contrast attack

Contrast attack on color moment hash
    Param is the gamma value of gamma correction.

Resize attack

Resize attack on color moment hash

    Param is the aspect ratio of horizontal and vertical site.

Gaussion noise attack

Gaussian noise attack on color moment hash

    Param is the standard deviation of gaussion.

Salt and pepper noise attack

Salt and pepper noise attack on color moment hash
    Param is the threshold of pepper and salt.

Rotation attack

Rotation attack on color moment hash

    Param is the angle of rotation.

Gaussian blur attack

Gaussian blur attack on color moment hash

    Param is the standard deviation of 3x3 gaussian filter.

Jpeg compression attack

Jpeg compression attack on color moment hash

    Param is the quality factor of jpeg compression, 100 means no compress.

Watermark attack

Watermark attack on color moment hash
    Param is the strength of watermark, 1.0 means the mark is 100% opaque. Image 017 and image 023 perform very poor because they are gray scale image.

    From these experiment data, we can say color moment hash perform very well under various attack except gaussion noise, salt and pepper noise and contrast attack.

Overall results of different algorithms

   Apparently, there are too many data to show for all of the algorithms, to make things more intuitive, I create the charts to help you measure the performance of these algorithms under different attacks.Their threshold are same as the last post(click me).

Average algorithm performance
PHash algorithm performance
Marr Hildreth algorithm performance
Radial hash algorithm performance
BMH zero algorithm performance
BMH one algorithm performance

Color moment algorithm performance
Overall reults
    These are the results of all of the algorithms, from the Overall results chart, it is easy to see that every algorithms have their pros and cons, you need to pick the one suit for your database. If speed is crucial, then average hash maybe is your best choices, because it is the fastest algorithms compare with other and perform very well under different attacks except of rotation and salt and pepper noise.If you need rotation resistance, color moment hash is you only choice because other algorithms suck on rotation attack. You can find the codes of these test cases from here(click me).

Compare with PHash library

    As this post show, img_hash module possess five advantages over the PHash library(click me).

1 : Processing speed of this module outperform PHash.

2 : This module adopt the same license as opencv(click me), which means you can do anything with it as you like without charging.

3 : The codes are much more modern, easier to use, img_hash free you from memory management chores once and for all. A modern, good c++ library should not force their users take care the resources by themselves.

4 : Api of img_hash are consistent, much easier to use than PHash library. Do not believe it? Let us see some examples.

Case 1a : Compute Radial Hash by PHash library

Digest digests_0, digests_1;
digest_0.coeffs = 0;
digest_1.coeffs = 1;
ph_image_digest(img_0, 1.0, 1.0, digest_0);
ph_image_digest(img_1, 1.0, 1.0, digest_1);

double pcc = 0;
ph_crosscorr(digest_0, digest_1, pcc, 0.9);
//do something, remember to free your memory :(

Case 1b : Compare Radial Hash by img_hash

auto algo = RadialVarianceHash::create();
cv::Mat hash_0, hash_1;
algo->compute(img_0, hash_0);
algo->compute(img_1, hash_1);
double const value = algo->compare(hash_0, hash_1);
//do something
//you do not need to free anything by yourself

Case 2a : Compute Marr Hash by PHash library

int N = 0;
uint8_t *hash_0 = ph_mh_imagehash(img_0, N);
uint8_t *hash_1 = ph_mh_imagehash(img_1, N);
double const value = ph_hammingdistance2(hash_0 , 72, hash_1, 72);   
//do something, remember to free your memory :(

Case 2b : Compare Marr Hash by img_hash

auto algo = MarrHildrethHash::create();
cv::Mat hash_0, hash_1;
algo->compute(img_0, hash_0);
algo->compute(img_1, hash_1);
double const value = algo->compare(hash_0, hash_1);
//do something
//you do not need to free anything by yourself

Case 3a : Compute Block mean Hash by PHash library

BinHash *hash_0 = 0;
BinHash *hash_1 = 0;
ph_bmb_imagehash(imgs_0, 1, &hash_0);
ph_bmb_imagehash(imgs_1, 1, &hash_1);

double const value = ph_hammingdistance2(hash_0->hash,
//do something, remember to free your memory :(

Case 3b : Compare Block mean Hash by img_hash

auto algo = BlockMeanHash::create(0);
cv::Mat hash_0, hash_1;
algo->compute(img_0, hash_0);
algo->compute(img_1, hash_1);
double const value = algo->compare(hash_0, hash_1);
//do something
//you do not need to free anything by yourself

    As you can see, img_hash not only faster, this module also provide you cleaner, more concise way to write your codes, you never need to remember different ways to find out your hash and how to compare them anymore, because the api of img_hash are consistent.

5 : This module only depend on opencv_core and opencv_imgproc, that means you should be able to compile it at ease on every major platform without scratching your heads.

Next move

    Develop an application--Similar Vision to show the capability of img_hash. Functions of this app are find out similar images from image set(of course, it will leverage the power of img_hash module) and similar video clips from videos. 

Sunday, 19 June 2016

Introduction to image hash module of opencv

    Anyone using the defacto standard computer vision library--opencv, have you ever hope opencv provide us ready to use, image hash algorithms like average hash, perceptual hash, block mean hash, radial variance hash, marr hildreth hash like PHash does? PHash sound like a robust solution and run quite fast, but prefer PHash mean you need to add more dependencies into your project and open your source codes, open source is not a viable option in most of the commercial products. Do you, like me, do not want to add more dependencies into your codes? Have a royalty free, robust and high performance image hash algorithms for your project?Let us admit it, we do not like to solve dependencies issues related to programming, beyond that, many of the commercial project need to remain close source, it would be much better if opencv provide us an image hash module.

    If opencv do not have one, why not just create one for it?

1 :  The algorithms of image hash are not too complicated.
2 :  PHash library already implement many of image hash algorithms, we could port them to opencv and use it as golden model.
3 :  opencv is an open source computer vision library. If we ever found any bugs, missing features, poor performance, we can do something to make it better.

    The good news is I have implement all of the algorithms I mentioned above, refine the codes(ex : block mean hash able to process single channel image, codes become cleaner and safer), do some bug fixes(please check the pull request if you want to know the details), free you from memory management chores and open a pull request. The bad news is this pull request hasn't merged yet when I write this post, so you need to clone/pull it down and build by yourself. Fear not, this module only depend on the core and imgproc of opencv, it should be fairly easy to build(opencv is quite easy to build from the beginning :)).

    Following examples will show you how to use img_hash, you will find out it is much easier to use than PHash library because the api are more consistent + you do not need to manage the memory by yourself.

How to use it

#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/img_hash.hpp>
#include <opencv2/imgproc.hpp>

void computeHash(cv::Ptr<cv::img_hash::ImgHashBase> algo)
    cv::Mat const input = cv::imread("lena.png");
    cv::Mat const target = cv::imread("lena_blur.png");
    cv::Mat inHash; //hash of input image
    cv::Mat targetHash; //hash of target image

    //comupte hash of input and target
    algo->compute(input, inHash);
    algo->compute(target, targetHash);
    //Compare the similarity of inHash and targetHash
    //recommended thresholds are written in the header files
    //of every classes
    double const mismatch = algo->compare(inHash, targetHash);

int main()
    //disable opencl acceleration may boost up speed of img_hash
    //however, in this post I do not disable the optimization of opencl    

    //BlockMeanHash support mode 0 and mode 1, they associate to 
    //mode 1 and mode 2 of PHash library

    With these functions, we can measure the performance of our algorithms under different "attack", like resize, contrast, noise and rotation. Before we start the test, let me define the thresholds of "pass" and "fail".One thing to remember is, to make thing simple, I only use lena to show the results, different data set may need different thresholds/algorithms to get best results.


     After we determine our threshold, we could use our beloved lena to do the test :).


Resize attack

Resize attack

    Every algorithms(BMH mean block mean hash) work very well on different size and aspect ratio except of radial variance hash, this algorithms work on different size, but we need to keep the aspect ratio.

Contrast Attack

Contrast Attack

    Every algorithms works quite well under different contrast, although Radical variance hash, BMH zero and BMH one do not works well under very low contrast.

 Gaussian Noise Attack

Gaussian noise attack
    Very fortunate, every algorithms survive under the attack of gaussian nose.

Salt And Pepper Noise Attack

Salt and pepper noise attack
      As we can see, only Radical hash and BMH perform well under the attack of pepper and salt.

Rotation Attack

Rotation attack
    Apparently, all of the algorithms can not survive under rotation attack. But is this really matter?I guess not(do you always need to search the image after rotation by google?). If you really need to deal with rotation attack, I suggest you give BOVW(bag of visual words) a try, I use it to construct robust CBIR system before, the defects of robust BOVW based CBIR are long computation time, consume a lot of memory and much harder to scale to large data set(you will need to build up distributed system in that case).

    We have go through all of the tests, now let us measure the performance of hash computation time and comparison time of different algorithms(my laptop is Y410P, os is windows 10 64bits, compiler is vc2015 64bits with update 2 install).

    You can find all the details of different attacks at here(click me).

Computation Performance Test--img_hash vs PHash library

  I use different algorithms to compute the hash of 100 images from ukbench(ukbench03000.jpg~ukbench03099.jpg). The source codes of opencv comparison is located at here(check the function measure_computation_time and measure_comparison_time, I am using img_hash_1_0 when I am writing this post),  source codes of PHash performance test(version 0.94 since I am on windows) is located at here.

Computation performance test

Comparison performance test

    In most cases, img_hash is faster than PHash, but the speed of BMH zero and BMH one are slower than PHash version almost 30% or 40%.  The bottleneck is cv::resize(over 95% of times spend on it), to speed things up, we need a faster resize function.

Find similar image from ukbench

    The results looks good, but could it find similar images? Of course dude, let me show you how could we measure the hash values of our target from ukbench(for simplicity, I only pick 100 images from ukbench).


void find_target(cv::Ptr<cv::img_hash::ImgHashBase> algo, bool smaller)
    using namespace cv::img_hash;

    cv::Mat input = cv::imread("ukbench/ukbench03037.jpg");
    //not a good way to reuse the codes by calling
    //measure comparision time, please bear with me
    std::vector<cv::Mat> targets = measure_comparison_time(algo, "");

    double idealValue;
        idealValue = std::numeric_limits<double>::max();
        idealValue = std::numeric_limits<double>::min();
    size_t targetIndex = 0;
    cv::Mat inputHash;
    algo->compute(input, inputHash);
    for(size_t i = 0; i != targets.size(); ++i)
        double const value = algo->compare(inputHash, targets[i]);
            if(value < idealValue)
                idealValue = value;
                targetIndex = i;
            if(value > idealValue)
                idealValue = value;
                targetIndex = i;
    std::cout<<"mismatch value : "<<idealValue<<std::endl;
    cv::Mat result = cv::imread("ukbench/ukbench0" +
                                std::to_string(targetIndex + 3000) +
    cv::imshow("input", input);
    cv::imshow("found img " + std::to_string(targetIndex + 3000), result);

void find_target()
    using namespace cv::img_hash;

    find_target(RadialVarianceHash::create(), false);

    You will find out every algorithms give you back the same image you are looking for.


    Average hash and PHash are the fastest algorithms, but if you want a more robust one, pick BMH zero, BMH zero and BMH give similar resutls, but BMH one is slower since it need to spend more computation power. Hash comparision of Radial hash are much slower than other's, because it need to find out peak cross-correlation values from 40 combinations. If you want to know how to speed things up and know more about rotation invariant image hash algorithm, give this link(click me) a try.

    You can find the test cases at here. If you think this post helpful, please give my repositories(blogCodes2 and my img_hash of opencv_contrib) a star :). If you want to join the developments, please open a pull request, thanks.