7. Support Vector Machines
7.1 Optimization objective
7.2 Large Margin Intuition
When C is very large, the SVM insists on classifying every training example correctly, so in the presence of outliers the margin between the two classes becomes smaller.
7.3 The Mathematics behind large margin classification
The lecture first explains the meaning of the notation ||u||: the length (norm) of the vector u. The inner product u^T v can be computed as p * ||u||, where p is the signed length of the projection of v onto u.
This property is used when analyzing the SVM decision boundary.
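The projection identity above can be checked numerically. A minimal sketch with hypothetical 2-D vectors:

```python
import math

# Illustrates u^T v = p * ||u||, where p is the signed length of the
# projection of v onto u. The vectors are arbitrary examples.
u = [3.0, 4.0]
v = [1.0, 2.0]

norm_u = math.sqrt(u[0] ** 2 + u[1] ** 2)   # ||u|| = 5
inner = u[0] * v[0] + u[1] * v[1]           # u^T v = 11
p = inner / norm_u                          # projection length of v onto u

assert abs(p * norm_u - inner) < 1e-12      # p * ||u|| recovers u^T v
```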
7.4 Kernels
7.4.1 Kernels I
Gaussian Kernel.
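The Gaussian kernel measures similarity between an example x and a landmark l. A minimal sketch of the formula (the function name is illustrative):

```python
import math

def gaussian_kernel(x, l, sigma):
    """Similarity exp(-||x - l||^2 / (2 * sigma^2)) between x and landmark l."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, l))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# When x coincides with the landmark, similarity is 1; far away it approaches 0.
print(gaussian_kernel([1.0, 2.0], [1.0, 2.0], 1.0))  # 1.0
```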
7.5 Kernels II
SVM parameters:
Large C: lower bias, higher variance.
Small C: higher bias, lower variance.
Large σ²: features vary more smoothly. Higher bias, lower variance.
Small σ²: features vary less smoothly. Lower bias, higher variance.
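These parameters map directly onto a library SVM. A minimal sketch, assuming scikit-learn, whose RBF kernel takes a gamma parameter related to the σ² in these notes by gamma = 1 / (2σ²); the variable names are illustrative:

```python
from sklearn.svm import SVC

sigma = 1.0
gamma = 1.0 / (2.0 * sigma ** 2)  # scikit-learn's gamma vs. these notes' sigma^2

# Small C -> stronger regularization: higher bias, lower variance.
smooth_svm = SVC(kernel='rbf', C=0.1, gamma=gamma)
# Large C -> weaker regularization: lower bias, higher variance.
wiggly_svm = SVC(kernel='rbf', C=100.0, gamma=gamma)

X = [[0.0], [1.0], [2.0], [3.0]]  # toy separable 1-D training set
y = [0, 0, 1, 1]
wiggly_svm.fit(X, y)
```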
7.6 Using an SVM
Use an SVM software package (e.g. liblinear, libsvm…) to solve for the parameters theta.
Need to specify:
Choice of parameter C.
Choice of Kernel(similarity function):
No Kernel('Linear Kernel')
Gaussian Kernel (need to choose σ²).
Note:
Do perform feature scaling before using the Gaussian kernel.
Other Choice of kernel:
Not all similarity functions similarity(x, l) make valid kernels.
(They need to satisfy Mercer's Theorem so that the SVM package's optimizations run correctly and do not diverge.)
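Putting the above together, a minimal sketch assuming scikit-learn as the SVM package; the pipeline scales features first (important for the Gaussian kernel), and the data are toy values:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data: the second feature is on a much larger scale, so scaling matters.
X = np.array([[0.0, 100.0], [1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
y = np.array([0, 0, 1, 1])

# Linear kernel ('no kernel') vs. Gaussian (RBF) kernel.
linear_svm = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0))
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0, gamma='scale'))

linear_svm.fit(X, y)
rbf_svm.fit(X, y)
```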
Multi-class classification:
Train K SVMs, one per class (one-vs-all); K classifiers are needed for K classes, not K-1.
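The one-vs-all prediction step can be sketched in pure Python: each of the K classifiers scores "class i vs. the rest," and the class with the highest score wins. The toy scoring functions below stand in for trained SVMs:

```python
def predict_one_vs_all(decision_functions, x):
    """decision_functions: list of K callables, one per class.
    Returns the index of the class with the highest score."""
    scores = [f(x) for f in decision_functions]
    return max(range(len(scores)), key=lambda i: scores[i])

# Toy scorers standing in for K = 3 trained one-vs-all SVMs.
fs = [lambda x: -x, lambda x: 0.0, lambda x: x - 1.0]
```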
Logistic regression vs SVM:
n= number of features, m =number of training examples
if n is large (relative to m):
using Logistic regression, or SVM without kernel.
if n is small, m is intermediate:
use SVM with Gaussian Kernel.
if n is small, m is large:
create/add more features,
then use logistic regression or SVM without kernel.
A neural network is likely to work well for most of these settings,
but may be slower to train.
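The rule of thumb above can be written down as a small helper; the function name and the numeric threshold for "intermediate" m are assumptions for illustration only:

```python
def choose_model(n, m):
    """Encodes the notes' heuristic: n = number of features, m = training examples.
    The 10,000 cutoff for 'intermediate' m is an illustrative assumption."""
    if n >= m:                 # n large relative to m
        return 'logistic regression or linear SVM'
    if m <= 10_000:            # n small, m intermediate
        return 'SVM with Gaussian kernel'
    # n small, m large: engineer more features first
    return 'add features, then logistic regression or linear SVM'
```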