10、Large Scale Machine Learning

10.1、Gradient Descent with Large Datasets

It’s not who has the best algorithm that wins.
It’s who has the most data.

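Stochastic gradient descent, unlike batch gradient descent, does not need to run the entire dataset through the computation on every iteration.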

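When the dataset is large, we first randomly shuffle it.
We then compute the gradient on the first example alone and take a gradient descent step, and repeat this with each of the following examples in turn.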
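
The shuffle-then-update loop can be sketched as follows. This is a minimal illustration, assuming linear regression with a squared-error cost; the function name, the default values, and the (x, y) data layout are assumptions made for the example, not part of these notes.

```python
import random

def sgd_linear_regression(examples, alpha=0.01, epochs=50, seed=0):
    """Stochastic gradient descent for h(x) = theta . x (illustrative sketch).

    `examples` is a list of (x, y) pairs; x is a list of features with a
    leading 1.0 for the intercept term.
    """
    rng = random.Random(seed)
    data = list(examples)              # copy so the caller's list keeps its order
    theta = [0.0] * len(data[0][0])
    for _ in range(epochs):
        rng.shuffle(data)              # randomly shuffle the dataset
        for x, y in data:              # use one example at a time
            error = sum(t * xi for t, xi in zip(theta, x)) - y
            # update all parameters from this single example's gradient
            theta = [t - alpha * error * xi for t, xi in zip(theta, x)]
    return theta
```

Each parameter update touches only one example, so progress starts before a full pass over the data has finished.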

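Mini-batch gradient descent combines the batch and stochastic approaches. Each step uses a small group of examples, then the process repeats with the next group.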
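
A minimal sketch of the mini-batch variant, again assuming linear regression with a squared-error cost. The function name, `batch_size`, and the other defaults are illustrative choices, not values from these notes.

```python
import random

def minibatch_gradient_descent(examples, batch_size=10, alpha=0.01,
                               epochs=50, seed=0):
    """Mini-batch gradient descent for h(x) = theta . x (illustrative sketch)."""
    rng = random.Random(seed)
    data = list(examples)
    n = len(data[0][0])
    theta = [0.0] * n
    for _ in range(epochs):
        rng.shuffle(data)
        # walk through the shuffled data one group of examples at a time
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad = [0.0] * n
            for x, y in batch:
                error = sum(t * xi for t, xi in zip(theta, x)) - y
                for j in range(n):
                    grad[j] += error * x[j]
            # average the gradient over the batch, then take one step
            theta = [t - alpha * g / len(batch) for t, g in zip(theta, grad)]
    return theta
```

With `batch_size=1` this reduces to the stochastic version, and with `batch_size=len(examples)` it becomes batch gradient descent.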

The learning rate α is typically held constant. We can slowly decrease α over time if we want θ to converge.
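
One common way to decrease α slowly is to divide a constant by the iteration count plus another constant. The sketch below assumes that schedule; the names `const1` and `const2` and their default values are placeholders that would need tuning.

```python
def decayed_learning_rate(iteration, const1=1.0, const2=10.0):
    """Return alpha = const1 / (iteration + const2), which shrinks over time."""
    return const1 / (iteration + const2)

# alpha for the first few iterations under the assumed constants
alphas = [decayed_learning_rate(i) for i in range(3)]
```

The extra constants are two more values to tune, which is one reason a constant α is often used instead.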

The methods above can be applied to real-time online learning, where the dataset has no fixed size and the data arrives as a stream.
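
A minimal sketch of online learning, assuming a logistic regression model that is updated once per incoming example and then discards it. The class name and its methods are hypothetical, chosen only for illustration.

```python
import math

class OnlineLogisticRegression:
    """Logistic regression updated one streamed example at a time (sketch)."""

    def __init__(self, n_features, alpha=0.1):
        self.theta = [0.0] * n_features
        self.alpha = alpha

    def predict_proba(self, x):
        z = sum(t * xi for t, xi in zip(self.theta, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # one stochastic gradient step; the example is not stored afterwards
        error = self.predict_proba(x) - y
        self.theta = [t - self.alpha * error * xi
                      for t, xi in zip(self.theta, x)]
```

Calling `update(x, y)` for each example as it arrives lets the model keep adapting without ever holding the full dataset in memory.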

Gradient descent includes a step that sums over the training examples. We can use Map-reduce and parallel computation to shorten the processing time.
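
The idea can be sketched by splitting the summation into chunks, computing a partial sum per chunk (map), and adding the partial sums together (reduce). The sketch assumes linear regression and uses the built-in `map`, which runs sequentially; in practice each chunk would be sent to a separate core or machine. All function names here are illustrative.

```python
from functools import reduce

def partial_gradient(theta, chunk):
    """Map step: sum of (h(x) - y) * x over one chunk of the data."""
    grad = [0.0] * len(theta)
    for x, y in chunk:
        error = sum(t * xi for t, xi in zip(theta, x)) - y
        for j, xi in enumerate(x):
            grad[j] += error * xi
    return grad

def add_vectors(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [ai + bi for ai, bi in zip(a, b)]

def batch_gradient_step(theta, examples, alpha, n_chunks=4):
    """One batch gradient descent step with the sum split across chunks."""
    size = -(-len(examples) // n_chunks)   # ceiling division
    chunks = [examples[i:i + size] for i in range(0, len(examples), size)]
    # sequential stand-in for a parallel map over workers
    partials = map(lambda chunk: partial_gradient(theta, chunk), chunks)
    total = reduce(add_vectors, partials)
    m = len(examples)
    return [t - alpha * g / m for t, g in zip(theta, total)]
```

Because the partial sums are independent, the result matches the single-machine computation up to floating-point rounding, while the work per worker shrinks roughly in proportion to the number of chunks.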