Getting Started with AI: Study Notes for Andrew Ng's "Machine Learning" Course, Week 7

7. Support Vector Machines

7.1 Optimization objective

[Figure 7_1]
[Figure 7_2]
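
The slides above are images; for reference, the optimization objective they present (the standard form from the lecture) is:

$$
\min_{\theta}\; C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1\!\left(\theta^T x^{(i)}\right) + \left(1 - y^{(i)}\right) \text{cost}_0\!\left(\theta^T x^{(i)}\right) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2
$$

Here cost_1 and cost_0 are the hinge-like replacements for the two logistic-regression cost terms, and C plays the role of 1/λ (it scales the data-fit term instead of the regularization term). The hypothesis predicts y = 1 whenever θᵀx ≥ 0.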

7.2 Large Margin Intuition

[Figure 7_3]

The larger C is, the smaller the margin between the two classes becomes: a very large C makes the classifier sensitive to individual outliers, so the boundary shifts to accommodate them at the cost of margin (see the sketch below).
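
A minimal sketch of that effect, assuming scikit-learn (the data here is made up for illustration): for a linear SVM the geometric margin is 2/‖w‖, and with an outlier present, raising C shrinks it.

```python
import numpy as np
from sklearn.svm import SVC

# Two separable clusters plus a single outlier from the negative class.
X = np.array([[1.0, 1.0], [1.5, 1.0], [1.0, 1.5],
              [4.0, 4.0], [4.5, 4.0], [4.0, 4.5],
              [3.5, 1.5]])                # the outlier
y = np.array([0, 0, 0, 1, 1, 1, 0])

for C in (0.1, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    # Geometric margin of a linear SVM: 2 / ||w||.
    print(f"C={C:>6}: geometric margin = {2.0 / np.linalg.norm(w):.3f}")
```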

7.3 The Mathematics behind large margin classification

The lecture first explains the notation ||u|| (the length, or norm, of the vector u), and shows that the inner product of two vectors can be computed from vector lengths: uᵀv = p · ||u||, where p is the signed length of the projection of v onto u.

[Figure 7_4]
This property can be used when computing the SVM decision boundary: writing θᵀx⁽ⁱ⁾ as p⁽ⁱ⁾ · ||θ||, minimizing ||θ|| while still satisfying the constraints forces the projections p⁽ⁱ⁾ onto θ to be large, which is exactly a large margin (see the sketch after the figure below).

[Figure 7_5]
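
A quick numeric check of the projection identity above (illustrative values only):

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([2.0, 1.0])

# Signed length of the projection of v onto u, via the angle between them.
cos_angle = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
p = np.linalg.norm(v) * cos_angle

print(u @ v)                    # inner product u^T v: 10.0
print(p * np.linalg.norm(u))    # p * ||u||: also 10.0
```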

7.4 Kernels

7.4.1 Kernels I

The Gaussian kernel: landmarks l⁽¹⁾, l⁽²⁾, … are chosen, and each feature is defined as the similarity between x and a landmark.

[Figure 7_6]
[Figure 7_7]
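
The Gaussian kernel from the lecture, written out as a small Python function: f = similarity(x, l) = exp(−‖x − l‖² / (2σ²)), which equals 1 when x is at the landmark l and falls toward 0 as x moves away.

```python
import numpy as np

def gaussian_kernel(x, l, sigma):
    """Similarity between example x and landmark l."""
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
print(gaussian_kernel(x, np.array([1.0, 2.0]), sigma=1.0))  # 1.0 (x == l)
print(gaussian_kernel(x, np.array([5.0, 6.0]), sigma=1.0))  # ~= 0
```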

7.5 Kernels II

SVM parameters:

Large C: lower bias, higher variance.

Small C: higher bias, lower variance.

Large σ²: features vary more smoothly. Higher bias, lower variance.

Small σ²: features vary less smoothly. Lower bias, higher variance.

A sketch of tuning both parameters by cross-validation follows.
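
A minimal sketch, assuming scikit-learn (the dataset and grid values are placeholders). Note that sklearn's RBF kernel takes gamma rather than σ, with gamma = 1/(2σ²), so a large σ² corresponds to a small gamma.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

sigmas = [0.1, 0.3, 1.0, 3.0]
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0, 100.0],
    "gamma": [1.0 / (2.0 * s ** 2) for s in sigmas],  # gamma = 1/(2*sigma^2)
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```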

7.6 Using an SVM

Use an SVM software package (liblinear, libsvm, …) to solve for the parameters θ.

Need to specify:

Choice of parameter C.
Choice of kernel (similarity function):
    No kernel ('linear kernel')
    Gaussian kernel (need to choose σ)

Note:

Do perform feature scaling before using the Gaussian kernel; otherwise features with large ranges dominate the distance ||x − l||². A sketch of both kernel choices, including scaling, follows.
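
A minimal sketch of these choices, assuming scikit-learn (whose SVC wraps libsvm and LinearSVC wraps liblinear); the data and parameter values are placeholders, not recommendations.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Toy data with very differently scaled features (placeholder values).
X = np.array([[0.0, 1000.0], [1.0, 2000.0], [0.5, 800.0], [1.5, 2500.0]])
y = np.array([0, 1, 0, 1])

# "No kernel" (linear kernel).
linear_svm = LinearSVC(C=1.0).fit(X, y)

# Gaussian kernel: scale features first, and choose sigma.
sigma = 1.0
rbf_svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, gamma=1.0 / (2.0 * sigma ** 2)),
).fit(X, y)
print(linear_svm.predict(X), rbf_svm.predict(X))
```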

Other choices of kernel:

Not all similarity functions similarity(x, l) make valid kernels.
(They need to satisfy Mercer's Theorem so that the SVM packages' optimizations run correctly and do not diverge.)

Multi-class classification:

Train K SVMs (one-vs-all): one classifier for each of the K classes, i.e., K classifiers rather than K − 1. A sketch follows.
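
A minimal one-vs-all sketch, assuming scikit-learn's SVC (data and helper names are made up for illustration): train one binary SVM per class and predict the class whose classifier outputs the largest decision value.

```python
import numpy as np
from sklearn.svm import SVC

def one_vs_all_fit(X, y, K, C=1.0):
    # One binary SVM per class k, trained on "is class k" labels.
    return [SVC(kernel="linear", C=C).fit(X, (y == k).astype(int))
            for k in range(K)]

def one_vs_all_predict(models, X):
    # Pick the class whose classifier is most confident.
    scores = np.column_stack([m.decision_function(X) for m in models])
    return np.argmax(scores, axis=1)

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])
models = one_vs_all_fit(X, y, K=3)
print(one_vs_all_predict(models, X))   # predicted class per example
```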

Logistic regression vs SVM:

n = number of features, m = number of training examples.

If n is large (relative to m):
    use logistic regression, or SVM without a kernel.
If n is small and m is intermediate:
    use SVM with a Gaussian kernel.
If n is small and m is large:
    create/add more features,
    then use logistic regression or SVM without a kernel.

Neural networks are likely to work well for most of these settings,
but may be slower to train.