Wednesday 27 November 2013

Misunderstanding of c++ 02 -- new and delete must be slower than malloc and free

    Many c programmers believe that new and delete must be slower than malloc and free, because they do too many jobs under the hood: new and delete have to deal with exceptions, new and delete must call the constructor and destructor, etc.

What kind of mistakes do they make?

1 : We can disable exceptions; today most major compilers (g++, clang++, visual c++) can do that for us.

2 : If the constructor and destructor are trivial, new and delete don't need to call them at all.

3 : The comparison is unfair: by default, new == malloc + constructor and delete == destructor + free (see the sketch below).


    For more details, you can check Stack Overflow.
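To make point 3 concrete, here is a rough sketch of the decomposition; the Widget type is hypothetical, error handling is omitted, and with exceptions enabled the real lowering also wraps the constructor call in exception-handling code:

#include <cstdlib>
#include <new>

struct Widget
{
    int value;
    Widget(int v) : value(v) {}
    ~Widget() {}
};

int main()
{
    // roughly what "Widget *p = new Widget(42);" does by default
    void *raw = std::malloc(sizeof(Widget)); // allocate raw storage
    Widget *p = new (raw) Widget(42);        // run the constructor in place

    // roughly what "delete p;" does by default
    p->~Widget();                            // run the destructor
    std::free(p);                            // release the storage

    return 0;
}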

  In theory, a c++ compiler should be able to make new perform as well as malloc and delete as well as free, but can the compiler actually do that?

Codes


c
#include <stdio.h>
#include <stdlib.h> // pulls in declaration of malloc, free


int main()
{
    char *a = (char*)malloc(1024);

    for(int i = 0; i != 1024; ++i){
        printf("%c", a[i]);
    }

    free(a);
    a = NULL;

    return 0;
}


c++
#include <cstdio>

int main()
{
    char *a = new char[1024];

    for(int i = 0; i != 1024; ++i){
        printf("%c", a[i]);
    }

    delete []a;

    return 0;
}

c assembly : clang -S -O3 -mllvm --x86-asm-syntax=intel mallocAndNew00.c

.section    __TEXT,__text,regular,pure_instructions
    .globl  _main
    .align  4, 0x90
_main:                                  ## @main
    .cfi_startproc
## BB#0:                                ## %entry
    push    RBP
Ltmp3:
    .cfi_def_cfa_offset 16
Ltmp4:
    .cfi_offset rbp, -16
    mov RBP, RSP
Ltmp5:
    .cfi_def_cfa_register rbp
    push    R14
    push    RBX
Ltmp6:
    .cfi_offset rbx, -32
Ltmp7:
    .cfi_offset r14, -24
    mov EDI, 1024
    call    _malloc
    mov R14, RAX
    mov EBX, 1
    xor EDI, EDI
    jmp LBB0_1
    .align  4, 0x90
LBB0_2:                                 ## %for.body.for.body_crit_edge
                                        ##   in Loop: Header=BB0_1 Depth=1
    movsx   EDI, BYTE PTR [R14 + RBX]
    inc RBX
LBB0_1:                                 ## %for.body
                                        ## =>This Inner Loop Header: Depth=1
    call    _putchar
    cmp EBX, 1024
    jne LBB0_2
## BB#3:                                ## %for.end
    mov RDI, R14
    call    _free
    xor EAX, EAX
    pop RBX
    pop R14
    pop RBP
    ret
    .cfi_endproc


.subsections_via_symbols


c++ assembly : clang++ -S -O3 -std=c++11 -mllvm --x86-asm-syntax=intel mallocAndNew00.cpp
 .section __TEXT,__text,regular,pure_instructions
 .globl _main
 .align 4, 0x90
_main:                                  ## @main
 .cfi_startproc
## BB#0:                                ## %entry
 push RBP
Ltmp3:
 .cfi_def_cfa_offset 16
Ltmp4:
 .cfi_offset rbp, -16
 mov RBP, RSP
Ltmp5:
 .cfi_def_cfa_register rbp
 push R14
 push RBX
Ltmp6:
 .cfi_offset rbx, -32
Ltmp7:
 .cfi_offset r14, -24
 mov EDI, 1024
 call __Znam
 mov R14, RAX
 mov EBX, 1
 xor EDI, EDI
 jmp LBB0_1
 .align 4, 0x90
LBB0_2:                                 ## %for.body.for.body_crit_edge
                                        ##   in Loop: Header=BB0_1 Depth=1
 movsx EDI, BYTE PTR [R14 + RBX]
 inc RBX
LBB0_1:                                 ## %for.body
                                        ## =>This Inner Loop Header: Depth=1
 call _putchar
 cmp EBX, 1024
 jne LBB0_2
## BB#3:                                ## %for.end
 test R14, R14
 je LBB0_5
## BB#4:                                ## %delete.notnull
 mov RDI, R14
 call __ZdaPv
LBB0_5:                                 ## %delete.end
 xor EAX, EAX
 pop RBX
 pop R14
 pop RBP
 ret
 .cfi_endproc


.subsections_via_symbols

  This time the c and c++ code differ a little: c++ calls the runtime symbols of new[] and delete[] (__Znam and __ZdaPv), while c calls malloc and free. c++ also checks whether the pointer is null before calling delete[], but c does not check it at all. If you want exactly the same behavior, you can call malloc and free in c++ too, but do make sure you know what you are doing before you choose malloc and free over new and delete.
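Conceptually, the delete[] expression above is lowered to the null check seen in the listing; a minimal sketch (only an illustration of the generated branch, not code you would normally write) looks like this:

#include <new>

int main()
{
    char *a = new char[1024];

    // conceptually, "delete [] a;" becomes the check below,
    // because deleting a null pointer must be a no-op
    if (a != nullptr) {          // the "test R14, R14 / je" pair in the listing
        operator delete[](a);    // the call to __ZdaPv; char has a trivial
                                 // destructor, so nothing else is needed
    }

    return 0;
}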

Codes can be downloaded from github.

Monday 25 November 2013

Misunderstanding of c++ 01 -- c++ must be slower than c because c++ always constructs and destructs objects

    Unlike c, c++ supports constructors and destructors, which are powerful tools for resource management. Many (or some?) c programmers claim that constructors and destructors could kill the performance, that they don't want to take on that burden, that they want to gain full control, and that this seems like an impossible mission in c++. Is this true? Absolutely not.

    What kind of mistakes do they make?

1 : We do not have to pay the cost of the constructor and destructor; it is not a must. In other words, if your struct or class satisfies the requirements of a trivial constructor and a trivial destructor, your program will not invoke any constructor or destructor.

A class or struct has a trivial constructor if

A constructor of a class A is trivial if all the following are true:
  • It is implicitly defined
  • A has no virtual functions and no virtual base classes
  • All the direct base classes of A have trivial constructors
  • The classes of all the nonstatic data members of A have trivial constructors
If any of the above are false, then the constructor is nontrivial.

A class or struct has a trivial destructor if

A destructor of a class A is trivial if all the following are true:
  • It is implicitly defined
  • All the direct base classes of A have trivial destructors
  • The classes of all the nonstatic data members of A have trivial destructors
If any of the above are false, then the destructor is nontrivial.


  All POD types have a trivial constructor and a trivial destructor.
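As a quick check, a minimal sketch using the standard <type_traits> header (compile with -std=c++11) lets the compiler itself confirm that the type is trivial:

#include <type_traits>

struct trivialStruct
{
    int a;
    float b;
    double c;
};

// both assertions hold, so no constructor or destructor
// call will be emitted for trivialStruct
static_assert(std::is_trivial<trivialStruct>::value,
              "trivialStruct is trivial");
static_assert(std::is_trivially_destructible<trivialStruct>::value,
              "trivialStruct has a trivial destructor");

int main()
{
    return 0;
}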

2 : You may think the constructor and destructor cost you something, but in truth they rarely do (compared with the equivalent c behavior).


  In theory, we don't have to pay the cost of the constructor and destructor; moreover, they rarely cost us anything. Theory is good, but let us fall back to reality: can the compiler get the job done? If you are not sure about it, measure, rather than guessing about performance.

  The following codes are compiled by clang 3.3 and clang++ 3.3 on mac OS X 10.8.5 (mac mini, intel cpu).


High level codes and associated assembly of case 1
 
 

c codes
#include <stdio.h>

struct trivialStruct
{

int a;
float b;
double c;

};

int main()
{
  struct trivialStruct A;

  /* print the members so the struct is actually used; they are uninitialized,
     the only point here is that no constructor or destructor call is emitted */
  printf("%d, %f, %f", A.a, A.b, A.c);

  return 0;
}

assembly generated by command "clang -S -O2 -mllvm --x86-asm-syntax=intel trivialConstructDestruct00.c"
 .section __TEXT,__text,regular,pure_instructions
 .globl _main
 .align 4, 0x90
_main:                                  ## @main
 .cfi_startproc
## BB#0:                                ## %entry
 push RBP
Ltmp2:
 .cfi_def_cfa_offset 16
Ltmp3:
 .cfi_offset rbp, -16
 mov RBP, RSP
Ltmp4:
 .cfi_def_cfa_register rbp
 lea RDI, QWORD PTR [RIP + L_.str]
 mov AL, 2
 call _printf
 xor EAX, EAX
 pop RBP
 ret
 .cfi_endproc

 .section __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
 .asciz  "%d, %f, %f"


.subsections_via_symbols


c++ codes
#include <cstdio>

struct trivialStruct
{

int a;
float b;
double c;

};

int main()
{
  trivialStruct A;

  // print the members so the struct is actually used; they are uninitialized,
  // the only point here is that no constructor or destructor call is emitted
  printf("%d, %f, %f", A.a, A.b, A.c);

  return 0;
}

assembly generated by command "clang++ -S -O2 -mllvm --x86-asm-syntax=intel trivialConstructDestruct00.cpp"
 .section __TEXT,__text,regular,pure_instructions
 .globl _main
 .align 4, 0x90
_main:                                  ## @main
 .cfi_startproc
## BB#0:                                ## %entry
 push RBP
Ltmp2:
 .cfi_def_cfa_offset 16
Ltmp3:
 .cfi_offset rbp, -16
 mov RBP, RSP
Ltmp4:
 .cfi_def_cfa_register rbp
 lea RDI, QWORD PTR [RIP + L_.str]
 mov AL, 2
 call _printf
 xor EAX, EAX
 pop RBP
 ret
 .cfi_endproc

 .section __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
 .asciz  "%d, %f, %f"


.subsections_via_symbols



  Apparently, the claim that c++ always has to pay the cost of the constructor and destructor is false; it is part of the misunderstanding caused by FUD.

High level codes and associated assembly of case 2

c

#include <stdio.h>
#include <stdlib.h>

struct trivialStruct
{

int *a;
float *b;
float *c;

};

void construct_trivial_struct(struct trivialStruct *data)
{
  data->a = (int*)malloc(sizeof(int));
  data->b = (float*)malloc(sizeof(float));
  data->c = (float*)malloc(sizeof(float));
  
  *data->a = 100;
  *data->b = 200;
  *data->c = 300;
}

void destruct_trivial_struct(struct trivialStruct *data)
{
  free(data->a);
  free(data->b);
  free(data->c);
  
  data->a = NULL;
  data->b = NULL;
  data->c = NULL;
}

int main()
{
  struct trivialStruct A;
  construct_trivial_struct(&A);
  printf("%d, %f, %f", *A.a, *A.b, *A.c);
  
  destruct_trivial_struct(&A);

  return 0;
}


assembly generated by command "clang -S -O2 -mllvm --x86-asm-syntax=intel trivialConstructDestruct00.c"
 .section __TEXT,__text,regular,pure_instructions
 .globl _construct_trivial_struct
 .align 4, 0x90
_construct_trivial_struct:              ## @construct_trivial_struct
 .cfi_startproc
## BB#0:                                ## %entry
 push RBP
Ltmp3:
 .cfi_def_cfa_offset 16
Ltmp4:
 .cfi_offset rbp, -16
 mov RBP, RSP
Ltmp5:
 .cfi_def_cfa_register rbp
 push R15
 push R14
 push RBX
 push RAX
Ltmp6:
 .cfi_offset rbx, -40
Ltmp7:
 .cfi_offset r14, -32
Ltmp8:
 .cfi_offset r15, -24
 mov RBX, RDI
 mov EDI, 4
 call _malloc
 mov R14, RAX
 mov QWORD PTR [RBX], R14
 mov EDI, 4
 call _malloc
 mov R15, RAX
 mov QWORD PTR [RBX + 8], R15
 mov EDI, 4
 call _malloc
 mov QWORD PTR [RBX + 16], RAX
 mov DWORD PTR [R14], 100
 mov DWORD PTR [R15], 1128792064
 mov DWORD PTR [RAX], 1133903872
 add RSP, 8
 pop RBX
 pop R14
 pop R15
 pop RBP
 ret
 .cfi_endproc

 .globl _destruct_trivial_struct
 .align 4, 0x90
_destruct_trivial_struct:               ## @destruct_trivial_struct
 .cfi_startproc
## BB#0:                                ## %entry
 push RBP
Ltmp12:
 .cfi_def_cfa_offset 16
Ltmp13:
 .cfi_offset rbp, -16
 mov RBP, RSP
Ltmp14:
 .cfi_def_cfa_register rbp
 push RBX
 push RAX
Ltmp15:
 .cfi_offset rbx, -24
 mov RBX, RDI
 mov RDI, QWORD PTR [RBX]
 call _free
 mov RDI, QWORD PTR [RBX + 8]
 call _free
 mov RDI, QWORD PTR [RBX + 16]
 call _free
 mov QWORD PTR [RBX + 16], 0
 mov QWORD PTR [RBX + 8], 0
 mov QWORD PTR [RBX], 0
 add RSP, 8
 pop RBX
 pop RBP
 ret
 .cfi_endproc

 .section __TEXT,__literal8,8byte_literals
 .align 3
LCPI2_0:
 .quad 4641240890982006784     ## double 200
LCPI2_1:
 .quad 4643985272004935680     ## double 300
 .section __TEXT,__text,regular,pure_instructions
 .globl _main
 .align 4, 0x90
_main:                                  ## @main
 .cfi_startproc
## BB#0:                                ## %entry
 push RBP
Ltmp18:
 .cfi_def_cfa_offset 16
Ltmp19:
 .cfi_offset rbp, -16
 mov RBP, RSP
Ltmp20:
 .cfi_def_cfa_register rbp
 lea RDI, QWORD PTR [RIP + L_.str]
 movsd XMM0, QWORD PTR [RIP + LCPI2_0]
 movsd XMM1, QWORD PTR [RIP + LCPI2_1]
 mov ESI, 100
 mov AL, 2
 call _printf
 xor EAX, EAX
 pop RBP
 ret
 .cfi_endproc

 .section __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
 .asciz  "%d, %f, %f"


.subsections_via_symbols
c++
#include <cstdio>
#include <cstdlib>

struct trivialStruct
{

trivialStruct();
~trivialStruct();

int *a;
float *b;
float *c;

};

trivialStruct::trivialStruct() : 
a((int*)malloc(sizeof(int))), 
b((float*)malloc(sizeof(float))), 
c((float*)malloc(sizeof(float)))
{
  *a = 100;
  *b = 200;
  *c = 300;
}

trivialStruct::~trivialStruct()
{
  free(a);
  free(b);
  free(c);
  
  a = nullptr;
  b = nullptr;
  c = nullptr;
}

int main()
{
  trivialStruct A;
  printf("%d, %f, %f", *A.a, *A.b, *A.c);

  return 0;
}

 .section __TEXT,__text,regular,pure_instructions
 .globl __ZN13trivialStructC1Ev
 .align 4, 0x90
__ZN13trivialStructC1Ev:                ## @_ZN13trivialStructC1Ev
 .cfi_startproc
## BB#0:                                ## %entry
 push RBP
Ltmp3:
 .cfi_def_cfa_offset 16
Ltmp4:
 .cfi_offset rbp, -16
 mov RBP, RSP
Ltmp5:
 .cfi_def_cfa_register rbp
 push R15
 push R14
 push RBX
 push RAX
Ltmp6:
 .cfi_offset rbx, -40
Ltmp7:
 .cfi_offset r14, -32
Ltmp8:
 .cfi_offset r15, -24
 mov RBX, RDI
 mov EDI, 4
 call _malloc
 mov R14, RAX
 mov QWORD PTR [RBX], R14
 mov EDI, 4
 call _malloc
 mov R15, RAX
 mov QWORD PTR [RBX + 8], R15
 mov EDI, 4
 call _malloc
 mov QWORD PTR [RBX + 16], RAX
 mov DWORD PTR [R14], 100
 mov DWORD PTR [R15], 1128792064
 mov DWORD PTR [RAX], 1133903872
 add RSP, 8
 pop RBX
 pop R14
 pop R15
 pop RBP
 ret
 .cfi_endproc

 .globl __ZN13trivialStructC2Ev
 .align 4, 0x90
__ZN13trivialStructC2Ev:                ## @_ZN13trivialStructC2Ev
 .cfi_startproc
## BB#0:                                ## %entry
 push RBP
Ltmp12:
 .cfi_def_cfa_offset 16
Ltmp13:
 .cfi_offset rbp, -16
 mov RBP, RSP
Ltmp14:
 .cfi_def_cfa_register rbp
 push R15
 push R14
 push RBX
 push RAX
Ltmp15:
 .cfi_offset rbx, -40
Ltmp16:
 .cfi_offset r14, -32
Ltmp17:
 .cfi_offset r15, -24
 mov RBX, RDI
 mov EDI, 4
 call _malloc
 mov R14, RAX
 mov QWORD PTR [RBX], R14
 mov EDI, 4
 call _malloc
 mov R15, RAX
 mov QWORD PTR [RBX + 8], R15
 mov EDI, 4
 call _malloc
 mov QWORD PTR [RBX + 16], RAX
 mov DWORD PTR [R14], 100
 mov DWORD PTR [R15], 1128792064
 mov DWORD PTR [RAX], 1133903872
 add RSP, 8
 pop RBX
 pop R14
 pop R15
 pop RBP
 ret
 .cfi_endproc

 .globl __ZN13trivialStructD1Ev
 .align 4, 0x90
__ZN13trivialStructD1Ev:                ## @_ZN13trivialStructD1Ev
 .cfi_startproc
## BB#0:                                ## %entry
 push RBP
Ltmp21:
 .cfi_def_cfa_offset 16
Ltmp22:
 .cfi_offset rbp, -16
 mov RBP, RSP
Ltmp23:
 .cfi_def_cfa_register rbp
 push RBX
 push RAX
Ltmp24:
 .cfi_offset rbx, -24
 mov RBX, RDI
 mov RDI, QWORD PTR [RBX]
 call _free
 mov RDI, QWORD PTR [RBX + 8]
 call _free
 mov RDI, QWORD PTR [RBX + 16]
 call _free
 mov QWORD PTR [RBX + 16], 0
 mov QWORD PTR [RBX + 8], 0
 mov QWORD PTR [RBX], 0
 add RSP, 8
 pop RBX
 pop RBP
 ret
 .cfi_endproc

 .globl __ZN13trivialStructD2Ev
 .align 4, 0x90
__ZN13trivialStructD2Ev:                ## @_ZN13trivialStructD2Ev
 .cfi_startproc
## BB#0:                                ## %entry
 push RBP
Ltmp28:
 .cfi_def_cfa_offset 16
Ltmp29:
 .cfi_offset rbp, -16
 mov RBP, RSP
Ltmp30:
 .cfi_def_cfa_register rbp
 push RBX
 push RAX
Ltmp31:
 .cfi_offset rbx, -24
 mov RBX, RDI
 mov RDI, QWORD PTR [RBX]
 call _free
 mov RDI, QWORD PTR [RBX + 8]
 call _free
 mov RDI, QWORD PTR [RBX + 16]
 call _free
 mov QWORD PTR [RBX + 16], 0
 mov QWORD PTR [RBX + 8], 0
 mov QWORD PTR [RBX], 0
 add RSP, 8
 pop RBX
 pop RBP
 ret
 .cfi_endproc

 .section __TEXT,__literal8,8byte_literals
 .align 3
LCPI4_0:
 .quad 4641240890982006784     ## double 200
LCPI4_1:
 .quad 4643985272004935680     ## double 300
 .section __TEXT,__text,regular,pure_instructions
 .globl _main
 .align 4, 0x90
_main:                                  ## @main
 .cfi_startproc
## BB#0:                                ## %entry
 push RBP
Ltmp34:
 .cfi_def_cfa_offset 16
Ltmp35:
 .cfi_offset rbp, -16
 mov RBP, RSP
Ltmp36:
 .cfi_def_cfa_register rbp
 lea RDI, QWORD PTR [RIP + L_.str]
 movsd XMM0, QWORD PTR [RIP + LCPI4_0]
 movsd XMM1, QWORD PTR [RIP + LCPI4_1]
 mov ESI, 100
 mov AL, 2
 call _printf
 xor EAX, EAX
 pop RBP
 ret
 .cfi_endproc

 .section __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
 .asciz  "%d, %f, %f"


.subsections_via_symbols

    We can see that the construction and destruction parts are the same as the assembly generated by c. Although the compiler generates two copies of the constructor and destructor for us (the complete-object and base-object versions), this does not mean the code will become fatter or slower, because the linker can remove the duplicated code for us; even if it is not removed (hard to believe this will happen with a modern linker), it will take some space in memory but will never be executed.

    One way to verify this is to separate the declaration and definition of the struct, generate the assembly with the same command and look at the code; you will find that the call sites only refer to one symbol of the constructor. The final step is to verify the size of the executable: if the linker had not removed the duplicated code, the executable should be fatter. The other solution is to download a disassembler and analyze the executable by yourself.

    In conclusion, most of the time we have to construct or destruct the object anyway, so constructors and destructors rarely cost us anything (compared with the equivalent c behavior). Even if you really don't want them, you can avoid them as you wish; nothing stops you from gaining full control in c++.

    Codes are available on github.

Saturday 23 November 2013

Deploy Qt apps to android phone on mac

    After some struggle, I figured out how to deploy an android application with Qt5 on my mac mini. Before you start, make sure you remove other versions of Qt from your mac, or it may deploy the wrong version of Qt onto the android device.

Step 1 : download the tools you need


    About the android NDK, make sure you don't download the legacy toolchains (I don't know what they are for yet); download the bigger one.

Step 2 : set up the android configurations

graph_00

  graph_00 shows you how to set up the locations after you unzip and install (this step is trivial; if you have any questions, please leave me a comment).

Step 3 : pick a device and give it a test

    My device is a GT-S6310; I use api 10 and armeabi to deploy the calculator example (QWidget) onto the device.

graph_01

    Connect the device by usb, click the run button and you will see the app running alive and kicking on the device. Do remember to enable the debugging options of your device. In my case, it is Settings->Developer options->USB debugging (enable this one); if you don't enable it, your mac can't detect the device (you can check whether the device is connected with "adb devices").

    I tried other examples like maroon, emitters, same and qml video effects; not all of them are deployable, and even when they are, not all of their functions work flawlessly. In my humble opinion, Qt5.2 beta1 is still far from productive on android. I hope that the release version of Qt5.2 will give users a much more stable library.


Friday 22 November 2013

Misunderstanding of c++ 00 -- Before you blame c++, please make sure you know c++

  Inside the C++ Object Model, Bjarne Stroustrup and the lessons of Scott Meyers keep telling us that the performance of c++ is at least as good as c. But there are still tons of c programmers who do not understand c++ and keep denigrating its performance with their FUD perceptions. Many forums host fights between c programmers and c++ programmers, over everything from performance to the complexity of the languages and so on.

  No language is perfect, and it is true that c++ has many defects, but most of the "defects" claimed by those c programmers are totally nonsense; they have no idea what they are talking about. It is natural for programmers to disagree with or dislike the features of a language, but before you denigrate any language, tool or feature you don't like, at least do some homework. c is good, but c++ is even better. In fact, many big, important pieces of software have shifted from c to c++, not from c++ to c. Here are other c++ projects which demonstrate the ability of c++, showing that c++ can work on embedded devices pretty well.




   Here are some immature criticisms that come from those FUD c programmers.


  • c++ is slower than c because the compiler does too many things for us in the background
  • c++ is slower than c because c++ always constructs and destructs objects
  • new and delete must be slower than malloc and free because they have to(?) deal with exceptions, constructors and destructors
  • Inheritance makes c++ fatter and slower than c
  • virtual is very slow and fat compared with a function pointer
  • templates will lead to code bloat
  • RAII will cause memory fragmentation
  • RAII does not suit resources like files, mutexes, threads and so on
  • macros are easier to use than templates
  • c is easier to read and maintain than c++
  • a higher level of abstraction equals slower code, so c++ must be slower than c
  • and so on (please tell me what you want to add to this list)
  Generally speaking, one of the philosophies of c++ is "you don't pay for what you don't use".

  It is a waste of time to show them, again and again, that their worries come from the fact that c programmers are not c++ programmers, so I decided to write down the experience and the proof on my blog.

   A war between different languages is almost meaningless, because too many programmers like to attack tools they don't know based on their own imagination, without research, proof or learning. As the father of c++ tells us, "Before dismissing C++ features, such as classes and templates, for demanding tasks, learn the basics of their use and try them out. Measure, rather than guessing about performance!".

   In this series of posts, I will try my best to explain why those reasons from c programmers are wrong and what the problems of those criticisms are. I am far from an expert in programming or c++; if you find anything doubtful or uncertain in my posts, please leave some comments.

  However, even though this series of posts spends a lot of time discussing the performance of c++, performance is not the only factor in choosing your tools; portability, maintainability, development time and the level of the programmers matter too, but that is another story to tell.

machine learning--04 : Normal equation

  Besides gradient descent, we can use the normal equation to find the optimal hypothesis function too.
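For a design matrix X (one training example per row, with x0 = 1) and a target vector y, the normal equation solves for the parameters directly:

equation 1 : \theta = (X^{T}X)^{-1}X^{T}y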


   As equation 1 shows, the normal equation is far easier to implement compared with gradient descent. If the matrix is singular, we can either decrease the number of features, or use SVD to find an approximation of the inverse matrix.


  The pros and cons of gradient descent vs normal equation.


Gradient Descent
   


  • Need to choose alpha
  • Needs many iterations
  • Works well even when n (the number of features) is large

Normal Equation

  • No need to choose alpha
  • No need to iterate
  • Need to compute the inverse matrix
  • Slow if n (the number of features) is very large

  The price of computing the inverse matrix is roughly O(n^3); this kind of complexity is unacceptable when n is big (10000 or more, depending on your machine).
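As a minimal sketch (following the cv::Mat_ style of the other posts; the function name is mine, and DECOMP_SVD asks openCV for a pseudo-inverse, which also covers the singular case), the normal equation fits in a few lines:

#include <opencv2/core/core.hpp>

/**
 *@brief solve theta = (X^T * X)^-1 * X^T * y with the normal equation
 *@param features input sequence (one example per row, first column = 1)
 *@param labels output sequence
 *@return theta, the parameters of the hypothesis
 */
template<typename T>
cv::Mat_<T> normal_equation(cv::Mat_<T> const &features,
cv::Mat_<T> const &labels)
{
    cv::Mat_<T> const trans = features.t();

    // DECOMP_SVD computes the pseudo-inverse when X^T * X is singular
    return cv::Mat_<T>((trans * features).inv(cv::DECOMP_SVD) * trans * labels);
}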

Saturday 16 November 2013

machine learning--03 : logistic regression

  According to the open course, logistic regression is the same as "classification"; like linear regression, it belongs to the supervised machine learning algorithms, but the targets of logistic regression are discrete, not continuous. The following are my memos of this open course.

Example

Email : Spam/Not Spam?
Tumor : Malignant/Benign

y ∈ {0, 1}
0 : "negative class" (Not Spam)
1 : "positive class" (Spam)

Question
    
1 : Why do we need logistic regression? Can't we just use linear regression to predict the results by setting some threshold?

A : Because the results are far from perfect. Assume we pick the threshold as 0.5

graph_00

The examples of graph_00 work perfectly, but the prediction is very easy to break if we add a new sample into the training set, as graph_01 shows.

graph_01



B : the hypothesis function of linear regression can be greater than 1 or less than 0.


2 : What is the model of logistic regression?

The course uses the sigmoid function as the model; the output range of the sigmoid function is (0, 1), which is a neat feature for classification.
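The hypothesis wraps the linear model inside the sigmoid function g:

h_{\theta}(x) = g(\theta^{T}x), \qquad g(z) = \frac{1}{1 + e^{-z}}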


  
graph_02


The output of the hypothesis can be interpreted as the estimated probability that y = 1 for the input x, that is, h_{\theta}(x) = P(y = 1 | x; \theta).


Monday 11 November 2013

machine learning--02 : Estimate the batch gradient descent is converge or not

    This is an exercise from the open course of machine learning; one of the purposes of this exercise is to measure whether batch gradient descent converges or not.
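For m training examples, the cost function (equation 1) of linear regression is:

equation 1 : J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^{2}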




    Equation 1 is the cost function. The procedure for measuring the convergence of gradient descent is to compare different learning rates, draw the graph and make sure the value of the cost function decreases on every iteration.

    First, I implement the cost function and batch gradient descent.


/**
 *@brief calculate the cost (squared error) of the linear regression
 * hypothesis for the given theta
 */
template<typename T>
T cost_function(cv::Mat_<T> const &features, 
cv::Mat_<T> const &labels, 
cv::Mat_<T> const &theta)
{
    cv::Mat_<T> const temp = features * theta - labels;
    cv::Mat_<T> results = temp.t() * temp;
    results /= 2 * labels.rows;

    return *results.template ptr<T>(0);
}

/**
 *@brief find the costs of each iteration
 *@param features input sequence
 *@param labels output sequence
 *@param alpha determines the step of each iteration; a smaller alpha
 * takes longer to iterate but has a higher chance to converge,
 * a larger alpha runs faster but has a higher chance to diverge.
 * Since this function has a fixed number of iterations, alpha
 * only affects accuracy.
 *@param iterate number of iterations
 *@return the costs of each iteration
 */
template<typename T>
cv::Mat_<T> batch_gradient_descent_cost(cv::Mat_<T> const &features, 
cv::Mat_<T> const &labels, 
T alpha = 0.07, size_t iterate = 1)
{
    cv::Mat_<T> theta = cv::Mat_<T>::zeros(features.cols, 1);
    cv::Mat_<T> results;
    T const ratio = alpha / features.rows;
    for(size_t i = 0; i != iterate; ++i){
        cv::Mat_<T> const data = ratio * 
                linear_regression(features, labels, theta);
        results.push_back(cost_function(features, labels, theta));
        theta -= data;
    }

    return results;
}

Draw the cost vs iteration graph by qwt.


/**
 * @brief draw and show the charts of cost function vs iteration
 * @param plot a 2-d plotting widget
 * @param features input(features)
 * @param labels output(target)
 * @param a the QApplication used to run the event loop
 */
template<typename T>
void cost_vs_iteration(simple2DPlot &plot, cv::Mat_<T> const &features, 
cv::Mat_<T> const &labels, QApplication &a)
{
    size_t const iterate_times = 50;
    T const ratio[] = {0.01, 0.03, 0.1, 0.3, 1, 1.2};
    QColor const colors[] = { {255, 0, 0}, {0, 255, 0}, {0, 0, 255}, 
                              {255, 255, 0}, {255, 128, 128}, 
                              {128, 0, 128}};
    std::vector<T> iterates(iterate_times);
    std::iota(std::begin(iterates), std::end(iterates), 1);
    for(size_t i = 0; i != sizeof(ratio) / sizeof(T); ++i){
        cv::Mat_<T> const costs = batch_gradient_descent_cost<T>(features, 
                                       labels, ratio[i], iterate_times);
        plot.insert_curve(std::begin(iterates), std::end(iterates), 
                          std::begin(costs));
        plot.get_curve(i).setTitle(
          QString::fromStdString(std::to_string(ratio[i])));
        plot.get_curve(i).setPen(colors[i]);
        plot.get_curve(i).setStyle(QwtPlotCurve::Lines);
    }

    plot.replot();
    plot.render_document("costs vs iteration", {300, 200});

    plot.resize(600, 400);
    plot.show();
    a.exec();

    plot.clear();
    cv::Mat_<T> const costs = 
    batch_gradient_descent_cost<T>(features, labels, 1.3, iterate_times);
    plot.insert_curve(std::begin(iterates), std::end(iterates), 
                      std::begin(costs));
    plot.get_curve(0).setTitle("1.3");
    plot.get_curve(0).setStyle(QwtPlotCurve::Lines);
    plot.render_document("diverge", {300, 200});

    plot.replot();
    plot.show();

    a.exec();
}


graph-00 : converge

graph-01 : diverge


 The implementation of simple2DPlot and the codes of the main project.

machine learning--01 : linear regression and batch gradient descent

  Linear regression is one of the most famous machine learning algorithms. This algorithm assumes the relationship between the input (features) and the output (targets) is linear, and it uses a hypothesis to predict the result.
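With x0 defined as 1, the hypothesis is a linear combination of the features:

equation 1 : h_{\theta}(x) = \theta^{T}x = \theta_{0}x_{0} + \theta_{1}x_{1} + \cdots + \theta_{n}x_{n}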




    Equation 1 is the hypothesis we use to predict the results; usually we define x0 as 1. To find out the parameters, we can use batch gradient descent.
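Batch gradient descent repeats the following update for every parameter j simultaneously until convergence:

\theta_{j} := \theta_{j} - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)x_{j}^{(i)}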



   Since the squared error produced by linear regression has only one minimum value (it is convex), we don't need to worry about the starting point. You can find more details on this website. Implementing this algorithm with the help of openCV is pretty simple (octave/matlab is even simpler).


/**
 *@brief compute the gradient term of linear regression
 *@param features input sequence
 *@param labels output sequence
 *@param theta the parameters we want to find
 *@return the unscaled gradient, features^T * (features * theta - labels)
 */
template<typename T>
cv::Mat_<T> linear_regression(cv::Mat_<T> const &features, 
cv::Mat_<T> const &labels, 
cv::Mat_<T> const &theta)
{
    cv::Mat_<T> result = features.clone();
    cv::transpose(result, result);
    result *= (features * theta - labels);

    return result;
}

/**
 *@brief batch gradient descent
 *@param features input sequence
 *@param labels output sequence
 *@param alpha determines the step of each iteration; a smaller alpha
 * takes longer to iterate but has a higher chance to converge,
 * a larger alpha runs faster but has a higher chance to diverge.
 * Since this function has a fixed number of iterations, alpha
 * only affects accuracy.
 *@param iterate number of iterations
 *@return theta, the parameters of batch gradient descent searching
 */
template<typename T>
cv::Mat_<T> batch_gradient_descent(cv::Mat_<T> const &features, 
cv::Mat_<T> const &labels, 
T alpha = 0.07, size_t iterate = 1)
{
    cv::Mat_<T> theta = cv::Mat_<T>::zeros(features.cols, 1);
    T const ratio = alpha / features.rows;
    for(size_t i = 0; i != iterate; ++i){
        cv::Mat_<T> const data = ratio * linear_regression
                                        (features, labels, theta);
        theta -= data;
    }

    return theta;
}


  If you can't accept the vectorized equation inside the function linear_regression, you can expand it as follows.
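Component-wise, the j-th entry of the vector returned by linear_regression is:

\sum_{i=1}^{m}\left(\theta^{T}x^{(i)} - y^{(i)}\right)x_{j}^{(i)}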



Source codes can be downloaded from github.

Thursday 7 November 2013

machine learning--00 : supervised vs unsupervised

  Machine learning is a very important and practical skill in computer vision. In order to make good use of the machine learning algorithms provided by openCV, I decided to study machine learning and record the things I learned from the open classroom of stanford.

  There are two major branches of machine learning algorithms: supervised and unsupervised learning.

Supervised learning : teach the program what the correct answers are; the input contains "features" and "labels".

Unsupervised learning : let the program learn by itself; the input contains "features" but no "labels".

Example of Supervised learning

 Height prediction : there are various numbers of boys between the ages (features) of two and eight, and we want to predict the heights (labels) at different ages.


graph-00 : ages(x-axis) vs height(y-axis) without predict

   On graph-00, I plot the ages vs heights with dots. The range of ages is continuous, not discrete (2.54 years old, 3.33 years old and so on), so how could we predict the height at age 4.44, which is not shown on this graph? This is the time when machine learning comes in: we give the program features (in this case, ages) and labels (in this case, heights), and use them to deduce an equation which can map arbitrary ages within 2~8 years old; we call this equation the hypothesis function and make it at least close to the results of the data set we have.


graph-01 : ages(x-axis) vs height(y-axis) with predict(blue line)
   On graph-01, I map the ages to heights by the equation deduced by batch gradient descent. Since the range of the labels is continuous, not discrete, this kind of supervised learning is also referred to as regression.

    The other example is image recognition; the simple OCR engine developed on this blog also leverages the power of supervised learning to predict the character in the image. Because the results are discrete in the OCR example, we also refer to that kind of supervised learning as classification.

Example of unsupervised learning

    k-means is one of the unsupervised machine learning algorithms: you give it a bunch of data without telling the program which label each data point belongs to, and let the program determine it; this is also called clustering.


 
graph_02 : before k-means clustering


graph_03 : after k-means clustering