Getting Started with AI: Notes on Andrew Ng's "Machine Learning" Course, Week 2

Course Overview

The course mainly covers machine learning, data mining, and statistical pattern recognition. Topics include:
i) Supervised learning (parametric and non-parametric algorithms, support vector machines, kernels, neural networks)
ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning)
iii) Best practices in machine learning (bias/variance theory; innovation in machine learning and AI)

Course Notes

Week 2

1. Environment Setup

2. Multivariate Linear Regression
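
The hypothesis for multivariate linear regression is h_θ(x) = θᵀx = θ0 + θ1 x1 + ... + θn xn, with cost J(θ) = (1/(2m)) Σ (h_θ(x⁽ⁱ⁾) - y⁽ⁱ⁾)². A minimal Octave sketch of the vectorized cost computation, assuming X is an m x (n+1) design matrix whose first column is all ones (the name computeCost is illustrative):

    % Minimal sketch, assuming X is m x (n+1) with a leading column of
    % ones, y is m x 1, and theta is (n+1) x 1 (names are illustrative).
    function J = computeCost(X, y, theta)
      m = length(y);             % number of training examples
      predictions = X * theta;   % h_theta(x) for all examples at once
      J = sum((predictions - y) .^ 2) / (2 * m);
    end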


Feature Scaling:
Make sure features are on a similar scale.

Why use feature scaling?
It speeds up gradient descent by reducing the number of iterations needed to reach the minimum: when features are on similar scales, the contours of J(θ) are closer to circular, so gradient descent takes a more direct path.

Get every feature into approximately the range -1 <= xi <= 1.
Mean normalization: replace xi with (xi - μi) / si, where μi is the feature's mean and si is its range (or standard deviation), so that features have approximately zero mean.
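
A minimal Octave sketch of mean normalization, assuming X stores one example per row and one feature per column (mu, s, and X_norm are illustrative names):

    mu = mean(X);              % 1 x n vector of per-feature means
    s  = std(X);               % per-feature standard deviation (the range max - min also works)
    X_norm = (X - mu) ./ s;    % Octave broadcasting applies this column-wise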

How do we make sure gradient descent is working correctly?

Plot J(θ) against the number of iterations: if gradient descent is working correctly, J(θ) should decrease after every iteration.
If J(θ) instead grows as the iterations proceed, use a smaller α.
If α is too small: slow convergence.
If α is too large: J(θ) may not decrease on every iteration and may not converge.
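
A sketch of the vectorized update θ := θ - (α/m) Xᵀ(Xθ - y) that records J(θ) at every iteration, so convergence can be checked by plotting the history afterwards (it reuses the computeCost sketch from above; names are illustrative):

    function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
      m = length(y);
      J_history = zeros(num_iters, 1);
      for iter = 1:num_iters
        theta = theta - (alpha / m) * (X' * (X * theta - y));  % simultaneous update of all theta_j
        J_history(iter) = computeCost(X, y, theta);            % track J(theta) per iteration
      end
    end

Plotting J_history (e.g. plot(1:num_iters, J_history)) should give a curve that decreases on every iteration and flattens out as gradient descent converges.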


3. Computing Parameters Analytically

The Normal Equation

The normal equation solves for θ in closed form, with no iterations: θ = (XᵀX)⁻¹Xᵀy.

Octave: theta = pinv(X' * X) * X' * y    (X' denotes the transpose Xᵀ)
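
A quick sanity check on a made-up toy dataset:

    X = [ones(3, 1), [1; 2; 3]];     % bias column of ones plus one feature
    y = [2; 4; 6];                   % y = 2x exactly
    theta = pinv(X' * X) * X' * y    % gives theta = [0; 2]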

When should we use gradient descent, and when the normal equation?

m training examples, n features.

Gradient Descent
* Need to choose α.
* Need many iterations.
* Works well even when n is large.

Normal Equation
* No need to choose α.
* Don't need to iterate.
* Need to compute (XᵀX)⁻¹, which is O(n³).
* Slow if n is very large.

What if XᵀX is non-invertible?

Octave has two functions for computing a matrix inverse: pinv and inv.
pinv returns a usable result (the Moore-Penrose pseudo-inverse) whether or not the matrix is actually invertible.

Common causes and fixes:
* Redundant features (linearly dependent).
* Too many features (e.g. m <= n):
    delete some features, or use regularization.
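
A small sketch of the difference between the two functions, using a made-up rank-1 matrix:

    A = [1 2; 2 4];    % columns are linearly dependent, so A is singular
    pinv(A)            % returns the Moore-Penrose pseudo-inverse
    inv(A)             % warns that A is singular and its result is unusable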