Tuesday 15 September 2015

Deep learning 04--Compile mlpack-1.0.12 on Windows 8.1 with Visual Studio 2015 (64 bits)

    As far as I know, most C++ machine learning libraries are difficult to compile on Windows, and mlpack is one of them (this library implements sparse autoencoders and sparse coding, and I would like to contribute to it in the future). If you want to do large-scale machine learning in C++, Windows is not a good platform, since many libraries are hard to build there or cannot reach their maximum performance. However, your apps may still need to run on Windows, since it is the most popular desktop OS.

     After a tedious journey of making mlpack work on Windows, I want to write down the steps to compile it, so that I will never forget them.

   The steps to compile mlpack-1.0.12 are:
    1 : Visual Studio 2015 Community--this version fixes a VC++ bug that causes trouble when compiling Armadillo (there are workarounds, such as replacing () with [] and using pointers to access the data)
    • Visual Studio 2015 does not install the C++ compiler by default; you need to choose the custom install and select C++ yourself
    • After installing VS2015, run the following command in a command prompt:
      copy "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\mspdbsrv.exe" "C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE"
      This fixes a link issue (link.exe complains that MSPDB140.dll has the wrong version installed)
    2 : libxml2-2.9.2--deprecated; starting from mlpack 2.x you can compile mlpack without it
    • extract the source code (ex : c:/libxml2-2.9.2)
    • go to the folder c:/libxml2-2.9.2 and copy configure.ac to configure.in
    • go to the folder c:/libxml2-2.9.2/win32
    • open a command prompt and enter cscript configure.js compiler=msvc iconv=no zlib=no debug=no
    • enter the command "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" x86_amd64
    • enter the command nmake /f Makefile.msvc install
    3 : zlib-1.2.8--deprecated; starting from mlpack 2.x you can compile mlpack without it
    • extract the source code (ex : c:/zlib-1.2.8)
    • go to the folder c:/zlib-1.2.8
    • enter the command "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" x86_amd64
    • enter the command nmake -f win32/Makefile.msc AS=ml64 LOC="-DASMV -DASMINF -I." OBJA="inffasx64.obj gvmat64.obj inffas8664.obj"
    4 : libiconv-1.14--deprecated; starting from mlpack 2.x you can compile mlpack without it

        Download the zip file from SourceForge (you will need a SourceForge account to download it), then open the project in Visual Studio 2015 and compile it

    5 : CMake 3.4 or newer (CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS is supported starting from CMake 3.4)

    6 : install mingw-w64 (I use mingw-w64 5.1.0 in this post)

    7 : lapack3.5.0--I have heard that OpenBLAS or Intel MKL are faster than the reference BLAS that comes with LAPACK, but in this post I will use the BLAS that comes with lapack3.5.0
    • extract the source code (ex : c:/lapack3.5.0)
    • go to the folder c:/lapack3.5.0
    • add the required commands to CMakeLists.txt
    • open CMakeLists.txt with cmake-gui
    • set the native compilers for C, C++ and Fortran to "x86_64-w64-mingw32-gcc.exe", "x86_64-w64-mingw32-g++.exe" and "x86_64-w64-mingw32-gfortran.exe"
    • Disable BUILD_STATIC_LIBS and enable BUILD_SHARED_LIBS under the label BUILD
    • Set VCVARSAMD64 (under the label Ungrouped Entries) to "C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/vcvarsx86_amd64.bat"
    • Set CMAKE_GNUtoMS_VCVARS (under the label CMAKE) to "C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/vcvarsx86_amd64.bat"
    • Click Configure until all entries are white (no red entries left)
    • Click Generate
    • Open the vcproject files and build
    7.1 : build OpenBLAS
    • BLAS is good, but OpenBLAS is much faster; it is almost three times faster on my laptop (Y410P)
    • Download MSYS
    • Clone OpenBLAS (git clone git://github.com/xianyi/OpenBLAS.git)
    • Add the MSYS folder to the PATH environment variable (ex : C:\msys)
    • Open a command prompt
    • Go to the OpenBLAS folder (ex : C:\OpenBLAS)
    • Type mingw32-make
    • You will find the .a and .dll files under the folder C:\OpenBLAS
    8 : armadillo-5.600.2
    • extract the source code (ex : c:/armadillo-5.600.2)
    • go to the folder c:/armadillo-5.600.2
    • open CMakeLists.txt with Notepad and add these three lines:
      set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
      add_definitions(-DARMA_64BIT_WORD)
      add_definitions(-DNOMINMAX)
    • open CMakeLists.txt with cmake-gui
    • Set BLAS_LIBRARY (under the label Ungrouped Entries) (ex : "C:/Users/yyyy/Qt/3rdLibs/lapack/lapack-3.5.0/bin/vc2015_x86_amd64/release/libopenblas.dll.a")
    • Set LAPACK_LIBRARY (under the label Ungrouped Entries) (ex : "C:/Users/yyyy/Qt/3rdLibs/lapack/lapack-3.5.0/bin/vc2015_x86_amd64/release/liblapack.lib")
    • Click Configure until all entries are white (no red entries left)
    • Click Generate
    • Open the vcproject files and build
    9 : boost_1_59_0-msvc-14.0-64
    • Just download and unzip it; the community has already built the binaries for us
    10 : mlpack-1.0.12
    • extract the source code (ex : c:/mlpack-1.0.12)
    • go to the folder c:/mlpack-1.0.12
    • Specify the paths of the libraries and DLLs and set up some definitions; the details can be found here (lines 66~87)
    • Click Configure until all entries are white (no red entries left)
    • Click Generate
    • Open the vcproject files and build
    • If there are link errors, specify the paths of OpenBLAS, LAPACK and libxml2 in cmake-gui (under the label Ungrouped Entries; I do not know yet why the set command does not work), then configure and generate again
         Ok, after so much trouble, mlpack finally works. I will use it to solve one of the UFLDL exercises later on.
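    A note on the two definitions added to Armadillo's CMakeLists.txt in step 8: ARMA_64BIT_WORD makes Armadillo use 64-bit integers for its index type (uword), which is needed for matrices with more than about 4 billion elements, and NOMINMAX stops <windows.h> from defining min and max macros, which otherwise clash with std::min/std::max in template-heavy headers. The snippet below is only a minimal sketch of that second problem and the usual parenthesis workaround; it is not Armadillo code, and the helper name largest_index is hypothetical.

```cpp
// Sketch: why NOMINMAX matters on Windows (illustration only, not Armadillo code).
// <windows.h> defines function-like macros min() and max() unless NOMINMAX
// is defined before it is included; those macros break calls like std::max(a, b).
#ifndef NOMINMAX
#define NOMINMAX  // must be defined before <windows.h> is included
#endif
#ifdef _WIN32
#include <windows.h>
#endif

#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper. Writing (std::max)(...) with extra parentheses keeps a
// function-like max() macro from expanding, so this compiles even in a
// translation unit where the macro is still defined.
std::int64_t largest_index(std::int64_t a, std::int64_t b) {
    return (std::max)(a, b);
}
```

    Defining NOMINMAX once for the whole build (as step 8 does with add_definitions) is less error-prone than adding parentheses at every call site.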



    Sunday 13 September 2015

    Deep learning 03--Self-Taught Learning and Unsupervised Feature Learning with Shark

        Shark is a fast, modular, feature-rich open-source C++ machine learning library. Is it? At least it is not fast at all without the support of Atlas. Without Atlas, it took me almost 2.5 hours to train a sparse autoencoder for 200 iterations on about 29000 MNIST samples on Windows 8 64 bits. So why don't I compile Atlas and link Shark to it? The bad news is that Atlas is difficult to build under Windows and not well optimized for 64-bit Windows. My conclusion is: if you really want to use Shark for serious training, change your OS. I will use Ubuntu or another OS if I continue to use Shark, because without Atlas the performance is unacceptable. The other lesson I learned from Shark is: if you want your libraries/apps to be portable, never ever build them on top of libraries that are hard to compile.

        Even though Shark is quite slow without Atlas, it is still a modular, feature-rich machine learning library, and I am quite impressed by how well it splits different concepts into pieces and combines them. Thanks to its good architecture, you can try out different algorithms with a few lines of code, and the code is easy to read and elegant.

        I used Shark to solve one of the UFLDL exercises, a pleasant experience except for the speed (without Atlas). I used different autoencoders to learn the features, then fed them into a random forest to classify the handwritten digits 5~9. The beauty of Shark is that I only needed to make a few changes to try out these 5 algorithms (at first there were 8, but three of them were too damn slow or buggy).

    1 :  First, define the types of the autoencoders.
       
    using Autoencoder1 =
    shark::Autoencoder<shark::LogisticNeuron, shark::LogisticNeuron>;
    
    using Autoencoder2 =
    shark::TiedAutoencoder<shark::LogisticNeuron, shark::LogisticNeuron>;
    
    using Autoencoder3 =
    shark::Autoencoder<
    shark::DropoutNeuron<shark::LogisticNeuron>,
    shark::LogisticNeuron
    >;
    
    using Autoencoder4 =
    shark::TiedAutoencoder<
    shark::DropoutNeuron<shark::LogisticNeuron>,
    shark::LogisticNeuron
    >;
    

    2 :  The sparse autoencoder and the plain autoencoder use different cost functions, so I need to change the cost function from ErrorFunction to SparseAutoencoderError.

    3 :  The sparse autoencoder cannot get good results with IRpropPlusFull, so I use LBFGS instead.

    4 :  The last thing is that the initial bias values of the sparse autoencoder should be zero.

        Because of 2, 3 and 4, I decided to put the two training procedures into different functions.

     
    template<typename Optimizer, typename Error, typename Model>
    std::string optimize_params(std::string const &encoder_name,
                                size_t iterate,
                                Optimizer *optimizer,
                                Error *error,
                                Model *model)
    {
        using namespace shark;
    
        Timer timer;
        std::ostringstream str;
        for (size_t i = 0; i != iterate; ++i) {
            optimizer->step(*error);
            str<<i<<" Error: "<<optimizer->solution().value <<"\n";
        }
        str<<"Elapsed time: " <<timer.stop()<<"\n";
        str<<"Function evaluations: "<<error->evaluationCounter()<<"\n";
    
        exportFiltersToPGMGrid(encoder_name, model->encoderMatrix(), 28, 28);
        std::ofstream out(encoder_name);
        boost::archive::polymorphic_text_oarchive oa(out);
        model->write(oa);
    
        return str.str();
    }
    
    template<typename Model>
    std::string train_autoencoder(std::vector<shark::RealVector> const &unlabel_data,
                                  std::string const &encoder_name,
                                  Model *model)
    {
        using namespace shark;
    
        model->setStructure(unlabel_data[0].size(), 200);   
        initRandomUniform(*model, -0.1*std::sqrt(1.0/unlabel_data[0].size()),
                    0.1*std::sqrt(1.0/unlabel_data[0].size()));
        
    
        SquaredLoss<RealVector> loss;
        UnlabeledData<RealVector> const Samples = createDataFromRange(unlabel_data);
        RegressionDataset data(Samples, Samples);
    
        ErrorFunction error(data, model, &loss);
        // Add weight regularization
        const double lambda = 0.01; // Weight decay parameter
        TwoNormRegularizer regularizer(error.numberOfVariables());
        error.setRegularizer(lambda, &regularizer);
    
        //output some info of model, like number of params, input size etc
        output_model_state(*model);
    
        IRpropPlusFull optimizer;
        optimizer.init(error);    
        return optimize_params(encoder_name, 200, &optimizer, &error, model);
    }
    
    template<typename Model>
    std::string train_sparse_autoencoder(std::vector<shark::RealVector> const &unlabel_data,
                                         std::string const &encoder_name,
                                         Model *model)
    {
        using namespace shark;
    
        model->setStructure(unlabel_data[0].size(), 200);    
        if(std::is_same<Model, Autoencoder2>::value ||
                std::is_same<Model, Autoencoder4>::value){            
            initRandomUniform(*model, -0.1*std::sqrt(1.0/unlabel_data[0].size()),
                        0.1*std::sqrt(1.0/unlabel_data[0].size()));
        }else{
            initialize_ffnet(model);
        }
        
    
        SquaredLoss<RealVector> loss;
        UnlabeledData<RealVector> const Samples = createDataFromRange(unlabel_data);
        RegressionDataset data(Samples, Samples);
    
        const double Rho = 0.01; // Sparsity parameter
        const double Beta = 6.0; // Regularization parameter
        SparseAutoencoderError error(data, model, &loss, Rho, Beta);
        // Add weight regularization
        const double lambda = 0.01; // Weight decay parameter
        TwoNormRegularizer regularizer(error.numberOfVariables());
        error.setRegularizer(lambda, &regularizer);
    
        //output some info of model, like number of params, input size etc
        output_model_state(*model);
    
        LBFGS optimizer;
        optimizer.lineSearch().lineSearchType() = LineSearch::WolfeCubic;
        optimizer.init(error);    
        return optimize_params(encoder_name, 400, &optimizer, &error, model);
    }
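    The helper initialize_ffnet() used above is not shown in this post. To give a rough idea of what such a helper might do for point 4, here is a standalone sketch that follows the initialization from the UFLDL sparse autoencoder exercise: weights drawn uniformly between -r and r with r = sqrt(6) / sqrt(hidden + visible + 1), and all biases set to zero. It uses plain std::vector instead of Shark types, and the names AutoencoderParams and init_sparse_autoencoder are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Parameters of an untied 3-layer autoencoder, stored row-major.
struct AutoencoderParams {
    std::vector<double> W1;  // hidden x visible (encoder weights)
    std::vector<double> b1;  // hidden biases
    std::vector<double> W2;  // visible x hidden (decoder weights)
    std::vector<double> b2;  // visible (output) biases
};

// Hypothetical stand-in for initialize_ffnet(): UFLDL-style initialization.
AutoencoderParams init_sparse_autoencoder(std::size_t visible, std::size_t hidden,
                                          unsigned seed = 42) {
    assert(visible > 0 && hidden > 0);
    // Same range as UFLDL's initializeParameters.m
    const double r = std::sqrt(6.0) / std::sqrt(static_cast<double>(hidden + visible + 1));
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(-r, r);

    AutoencoderParams p;
    p.W1.resize(hidden * visible);
    p.W2.resize(visible * hidden);
    for (double &w : p.W1) w = dist(gen);
    for (double &w : p.W2) w = dist(gen);
    // Point 4: every bias starts at zero.
    p.b1.assign(hidden, 0.0);
    p.b2.assign(visible, 0.0);
    return p;
}
```

    For the MNIST setup above, init_sparse_autoencoder(28 * 28, 200) gives the same shape as setStructure(784, 200); copying these values into a Shark model is left out because it depends on Shark's parameter layout.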
    
    

        With the training functions in place, the training process becomes quite easy to write.


    void autoencoder_prediction(std::vector<shark::RealVector> const &train_data,
                                std::vector<unsigned int> const &train_label)
    {
        {
            Autoencoder1 model;
            train_autoencoder(train_data, "ls_ls.txt", &model);
            prediction("ls_ls.txt", "ls_ls_rtree.txt", train_data,
                       train_label, &model);
        }
    
        {
            Autoencoder2 model;
            train_autoencoder(train_data, "tied_ls_ls.txt", &model);
            prediction("tied_ls_ls.txt", "tied_ls_ls_rtree.txt", train_data,
                       train_label, &model);
        }
    
        //Autoencoder3 has a bug: the prediction gets stuck and never completes
        //I do not know whether it is caused by Shark 3.0 beta or by my own code
        /*
        {
            Autoencoder3 model;
            train_autoencoder(train_data, "dropls_ls.txt", &model);
            prediction("dropls_ls.txt", "dropls_ls_rtree.txt", train_data,
                       train_label, &model);
        }//*/
    
        {
            Autoencoder4 model;
            train_autoencoder(train_data, "tied_dropls_ls.txt", &model);
            prediction("tied_dropls_ls.txt", "tied_dropls_ls_rtree.txt", train_data,
                       train_label, &model);
        }
    }
    

        All of the algorithms use the same functions to train and predict; I only need to change the type of the autoencoder and the file names (the files save the training results).

        The prediction part is almost the same as the Shark example; the difference is that this example trains and tests on the MNIST data. The code is located on GitHub.

        Results of autoencoder

    Autoencoder1: iterate 200 times
    Random Forest on training set accuracy: 1
    Random Forest on test set accuracy: 0.978194

    Autoencoder2: iterate 200 times
    Random Forest on training set accuracy: 1
    Random Forest on test set accuracy: 0.9784

    Autoencoder4: iterate 200 times
    Random Forest on training set accuracy: 1
    Random Forest on test set accuracy: 0.918741


        Results of sparse autoencoder
    Autoencoder1: iterate 400 times
    Random Forest on training set accuracy: 1
    Random Forest on test set accuracy: 0.920798

    Autoencoder2: iterate 200 times
    Random Forest on training set accuracy: 1
    Random Forest on test set accuracy: 0.95721


        Visualization of the training results of the autoencoder
    Autoencoder1

    Autoencoder2

    Autoencoder4


        Visualization of the training results of the sparse autoencoder
    Autoencoder1
    Autoencoder2

         Okay, Shark is slow on Windows, so it is almost impossible to expect it to handle real-time image recognition tasks. Is it possible to develop real-time image recognition applications with deep learning algorithms without rewriting everything? Maybe Caffe can save my ass; according to Stack Overflow, CNNs perform better than deep belief networks on computer vision tasks. What if I really need a fast and stable autoencoder on Windows? Maybe mlpack could be an option (no guarantee it can be built on Windows). If I only wanted to do research, R or Python might be a better solution, since they are easier to install and provide good performance, but I need C++ to create real-world products, which is why a decent machine learning library is crucial for me.

    Wednesday 9 September 2015

    Deep learning 02--deep learning and sparse autoencoder

       Extracting meaningful features from objects of interest (ex : cat, dog, smoke, car, human face, bird and so on) is crucial, because the features can dominate the prediction results. But as you can see, finding good features in images is not an easy task and usually requires expert knowledge; you may need to study tons of papers before you can decide which kinds of features to feed into the machine learning algorithms. Deep learning gives us another option: rather than having humans extract the features, deep learning lets the computer figure out which features to use.

        The idea of deep learning can be explained by graph_00

       
    graph_00
        To tell you the truth, the first time I learned that some algorithms could extract features for us, I was very excited about it. Not to mention that many research results have already shown that deep learning is a very powerful beast in the field of image recognition. To leverage the power of deep learning, I began to study the UFLDL tutorial before using Shark (Shark is quite easy to compile on Unix, Linux, Mac and Windows, but can be very slow without Atlas) and Caffe (at the time of writing I do not know whether it is easier to compile, but its performance is awesome) to help me finish my tasks, because the tutorial helps me get a better sense of what those algorithms are doing.

        The first deep learning algorithm introduced by UFLDL is the sparse autoencoder. Its implementation details are quite daunting, even though it may be the easiest deep learning algorithm to understand.

        At first glance, the sparse autoencoder looks quite similar to a traditional neural network; the differences are

    1. It is trained layer by layer
    2. The dimension of the hidden layer is usually smaller than that of the input layer
    3. The sparse autoencoder adds a sparsity constraint on the hidden units (a plain autoencoder does not have this constraint)
    4. The purpose of the sparse autoencoder is to find out how to use fewer features to reconstruct the input
       An autoencoder (or sparse autoencoder) consists of three layers (graph_01): an input layer, a hidden layer and an output layer.

    graph_01
        The outputs of the hidden layer are the features we want to feed into other machine learning algorithms (ex : SVM, random forest, softmax regression and so on) for classification tasks. Sometimes we also call the output of the hidden layer the "encoder" and the output of the output layer the "decoder", because the hidden layer extracts the features from the input layer, and the output layer reconstructs the input from the output of the hidden layer.

        Training layer by layer means that we can use the output of the hidden layer as the input of the next autoencoder (which also contains three layers). This way we can build an efficient deep network.
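    To make the encoder/decoder wording and the layer-by-layer idea concrete, here is a minimal C++ sketch of the forward pass only (no training), assuming logistic (sigmoid) units as in UFLDL: the encoder computes h = sigmoid(W1 x + b1), the decoder computes sigmoid(W2 h + b2), and a stack simply feeds each encoder's output into the next one. The types Layer and Autoencoder and the function stacked_features are hypothetical names for illustration; they are not taken from my implementation or from any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// One fully connected layer with logistic units: a = sigmoid(W x + b).
struct Layer {
    std::size_t in;   // input dimension
    std::size_t out;  // output dimension
    Vec W;            // out x in weights, row-major
    Vec b;            // out biases
};

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

Vec forward(const Layer &layer, const Vec &x) {
    assert(x.size() == layer.in);
    assert(layer.W.size() == layer.out * layer.in && layer.b.size() == layer.out);
    Vec a(layer.out);
    for (std::size_t i = 0; i < layer.out; ++i) {
        double z = layer.b[i];
        for (std::size_t j = 0; j < layer.in; ++j) z += layer.W[i * layer.in + j] * x[j];
        a[i] = sigmoid(z);
    }
    return a;
}

// Three-layer autoencoder: input -> hidden (encoder) -> reconstruction (decoder).
struct Autoencoder {
    Layer encoder;
    Layer decoder;
    // Hidden-layer output: the features handed to SVM, random forest, softmax, ...
    Vec encode(const Vec &x) const { return forward(encoder, x); }
    // Output-layer output: the attempt to reconstruct x from those features.
    Vec reconstruct(const Vec &x) const { return forward(decoder, encode(x)); }
};

// Layer-by-layer stacking: each autoencoder (trained separately, not shown)
// encodes the features produced by the previous one.
Vec stacked_features(const std::vector<Autoencoder> &stack, Vec x) {
    for (const Autoencoder &ae : stack) x = ae.encode(x);
    return x;
}
```

    Only encode() is used when stacking; each decoder matters during training, where it defines the reconstruction error that the hidden features must keep low.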

        My implementation of the sparse autoencoder can be found here, together with the example and the image files. graph_02 shows the results trained by my implementation. Since mini-batches do not work well in this case, I need to use the full batch for gradient descent, and without an algorithm like L-BFGS, training can be quite slow. In the next post, I will use Shark to recognize the handwritten digits from MNIST.

       
    graph_02
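    For item 3 in the list above, UFLDL implements the sparsity constraint as a penalty beta * sum_j KL(rho || rho_hat_j) added to the cost, where rho_hat_j is the average activation of hidden unit j over the training set and rho is a small target value (the Shark code earlier on this page uses Rho = 0.01 and Beta = 6.0). Below is a minimal sketch assuming UFLDL's formulas; Shark's SparseAutoencoderError may differ in details, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// hidden[i][j]: activation of hidden unit j for sample i (logistic units, so in (0, 1)).
// Returns rho_hat_j, the mean activation of each hidden unit over the batch.
std::vector<double> mean_activation(const std::vector<std::vector<double>> &hidden) {
    assert(!hidden.empty());
    std::vector<double> rho_hat(hidden[0].size(), 0.0);
    for (const auto &a : hidden) {
        assert(a.size() == rho_hat.size());
        for (std::size_t j = 0; j < rho_hat.size(); ++j) rho_hat[j] += a[j];
    }
    for (double &r : rho_hat) r /= static_cast<double>(hidden.size());
    return rho_hat;
}

// KL divergence between Bernoulli(rho) and Bernoulli(rho_hat).
double kl(double rho, double rho_hat) {
    return rho * std::log(rho / rho_hat) +
           (1.0 - rho) * std::log((1.0 - rho) / (1.0 - rho_hat));
}

// Sparsity term added to the cost: beta * sum_j KL(rho || rho_hat_j).
double sparsity_penalty(const std::vector<double> &rho_hat, double rho, double beta) {
    double sum = 0.0;
    for (double r : rho_hat) sum += kl(rho, r);
    return beta * sum;
}

// Extra term UFLDL adds to W2^T * delta3 for hidden unit j during backpropagation,
// before multiplying by the sigmoid derivative.
double sparsity_delta(double rho_hat_j, double rho, double beta) {
    return beta * (-rho / rho_hat_j + (1.0 - rho) / (1.0 - rho_hat_j));
}
```

    The penalty is zero only when every hidden unit's mean activation equals rho, which is what pushes most hidden units to stay close to inactive.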

        Atlas is difficult to build on Windows and is not well optimized for 64-bit Windows; if you want to get maximum speed from Shark, you had better train on Unix/Linux.