Course Overview
The course mainly covers machine learning, data mining, and statistical pattern recognition. Topics include:
i) Supervised learning (parametric and non-parametric algorithms, support vector machines, kernels, neural networks)
ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning)
iii) Machine learning in practice (bias/variance theory; innovations in machine learning and AI)
Course Notes
Week 2
1. Environment setup
2. Multivariate Linear Regression
Feature Scaling:
Make sure features are on a similar scale.
Why use feature scaling?
It speeds up gradient descent by making it require fewer iterations to converge.
(Get every feature into approximately a -1 <= xi <= 1 range.)
(Mean normalization: replace xi with xi - μi to make features have approximately zero mean.)
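A minimal Octave sketch of mean normalization (the variable names are my own, and dividing by the standard deviation goes slightly beyond the text, which only mentions subtracting the mean):

% X is an m-by-n matrix of training examples, one feature per column.
mu = mean(X);                  % 1-by-n vector of feature means
sigma = std(X);                % 1-by-n vector of feature standard deviations
X_norm = (X - mu) ./ sigma;    % each feature now has roughly zero mean and unit spread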
How do we make sure gradient descent is working correctly?
Plot J(θ) against the number of iterations: J(θ) should decrease after every iteration.
If J(θ) instead increases as the iterations go on, use a smaller α.
If α is too small: slow convergence.
If α is too large: J(θ) may not decrease on every iteration, and may not converge.
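As a sketch, monitoring J(θ) in Octave might look like this (my own variable names; X and y are assumed given, and alpha and the iteration count are assumed settings):

% Batch gradient descent for linear regression, recording J(theta) per iteration.
% X: m-by-(n+1) design matrix with a leading column of ones; y: m-by-1 targets.
alpha = 0.01;
num_iters = 400;
m = length(y);
theta = zeros(size(X, 2), 1);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
  theta = theta - (alpha / m) * (X' * (X * theta - y));   % simultaneous update of all thetas
  J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);  % cost J(theta) after this step
end
plot(1:num_iters, J_history);   % the curve should decrease on every iteration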
3. Computing Parameters Analytically
The normal equation: θ = (XᵀX)⁻¹ Xᵀ y
Octave: pinv(X' * X) * X' * y (X' is the transpose Xᵀ)
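A tiny worked example in Octave (the data here is made up for illustration):

% Fit y = theta0 + theta1 * x in closed form with the normal equation.
x = [1; 2; 3; 4];
y = [2; 4; 6; 8];
X = [ones(length(x), 1), x];     % design matrix with an intercept column of ones
theta = pinv(X' * X) * X' * y    % expect approximately [0; 2]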
When should you use gradient descent, and when the normal equation?
m training examples, n features.
Gradient Descent
* Need to choose α.
* Need many iterations.
* Works well even when n is large.
Normal Equation
* No need to choose α.
* Don't need to iterate.
* Need to compute (XᵀX)⁻¹, which is O(n³).
* Slow if n is very large.
What if XᵀX is non-invertible?
Octave has two functions for computing a matrix inverse: pinv and inv.
pinv returns a usable result (the pseudoinverse) whether or not the matrix is actually invertible.
* Redundant features (linearly dependent).
* Too many features (e.g. m <= n): delete some features, or use regularization.
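A small sketch of the pinv vs. inv difference on a singular matrix (the toy matrix is my own):

% The columns of A are linearly dependent, so A is not invertible.
A = [1 2; 2 4];
inv(A)    % Octave warns the matrix is singular; the result contains Inf
pinv(A)   % returns the finite Moore-Penrose pseudoinverse instead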