Miscellaneous review notes

Gradient descent update rule: step along the negative gradient direction to find the minimum
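\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}, \quad J(\theta) = \frac{1}{2m}\sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)})^2
let a^{(i)} = h_{\theta}(x^{(i)}) - y^{(i)}; by the chain rule, \frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{2m}\sum_{i=1}^{m} 2a^{(i)} \frac{\partial a^{(i)}}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m} a^{(i)} \frac{\partial a^{(i)}}{\partial \theta_j}
= \frac{1}{m}\sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)}) \frac{\partial}{\partial \theta_j}(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \dots + \theta_n x_n^{(i)} - y^{(i)}), \quad x_0^{(i)} = 1
= \frac{1}{m}\sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)}) x_j^{(i)}
\therefore \theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)}) x_j^{(i)} \quad (\text{all } \theta_j,\ j = 0, \dots, n, \text{ updated simultaneously})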
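The following minimal NumPy sketch shows the update rule above. It assumes a toy dataset y = 1 + 2x and a leading column of ones playing the role of x_0, so theta_0 is the intercept. The names cost, gradient and gradient_descent are illustrative and do not come from the notes. The sketch also checks the derived gradient against a central finite difference.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum((h_theta(x^(i)) - y^(i))^2), with h_theta(x) = X @ theta."""
    m = X.shape[0]
    residual = X @ theta - y
    return residual @ residual / (2 * m)

def gradient(theta, X, y):
    """Vector of dJ/dtheta_j = 1/m * sum((h_theta(x^(i)) - y^(i)) * x_j^(i)) for every j."""
    m = X.shape[0]
    return X.T @ (X @ theta - y) / m

def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent starting from theta = 0."""
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        # simultaneous update of every theta_j
        theta = theta - alpha * gradient(theta, X, y)
    return theta

# Toy data (assumption): y = 1 + 2x, first column of X is x_0 = 1
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

theta = gradient_descent(X, y, alpha=0.5, num_iters=5000)
print(theta)  # close to [1, 2]

# Central finite-difference check of the analytic gradient at a random point
rng = np.random.default_rng(0)
t0 = rng.normal(size=2)
eps = 1e-6
numeric = np.array([(cost(t0 + eps * e, X, y) - cost(t0 - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(2)])
print(np.allclose(numeric, gradient(t0, X, y), atol=1e-6))
```

Because J is quadratic, the central difference matches the analytic gradient up to rounding error.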
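\alpha must satisfy 0 \lt \alpha \lt \frac{2}{tr(R)} \le \frac{2}{\lambda_{max}}. With such a step size, gradient descent on the LSE (least squares error) cost J is guaranteed to converge. One more point to note: under this condition SGD (e.g. the LMS algorithm) also converges, in the mean. Here R is the autocorrelation matrix of X, i.e. of the input, and tr(R) is the trace of R. Since R is positive semidefinite, tr(R) = \sum_k \lambda_k \ge \lambda_{max}, so \frac{2}{tr(R)} is the more conservative bound, and unlike \lambda_{max} it needs no eigendecomposition.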
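The following small sketch illustrates the step-size condition. It assumes R = X^T X / m. This normalization matches the 1/(2m) in J, whose Hessian is exactly this R. The sketch computes both bounds, then runs batch gradient descent once just inside 2/tr(R) and once just outside 2/λ_max. The helper names and the toy data are hypothetical.

```python
import numpy as np

def step_size_bounds(X):
    """Return (2/tr(R), 2/lambda_max) for R = X^T X / m (assumed normalization)."""
    m = X.shape[0]
    R = X.T @ X / m
    lam_max = np.linalg.eigvalsh(R).max()  # R is symmetric PSD, so eigvalsh applies
    return 2.0 / np.trace(R), 2.0 / lam_max

def run_gd(X, y, alpha, num_iters=200):
    """Batch gradient descent on J = 1/(2m) * ||X theta - y||^2; returns the final J."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        theta -= alpha * X.T @ (X @ theta - y) / m
    r = X @ theta - y
    return r @ r / (2 * m)

# Toy data (assumption): y = 1 + 2x with an intercept column
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

safe, limit = step_size_bounds(X)
print(safe <= limit)               # tr(R) >= lambda_max, so the trace bound is stricter
print(run_gd(X, y, 0.9 * safe))    # inside the bound: J shrinks toward 0
print(run_gd(X, y, 1.1 * limit))   # beyond 2/lambda_max: J blows up
```

With α above 2/λ_max, the error component along the top eigenvector is multiplied by |1 - αλ_max| > 1 on every step, so it grows geometrically.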
In traditional ML, when the dataset is not very large, the usual split is train set : test set = 7:3, or train set : test set : cross-validation set = 6:2:2.
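The following minimal sketch shuffles the sample indices and then splits them 6:2:2 (or 7:3). It assumes a fixed random seed for reproducibility. split_indices is a hypothetical helper, and it does not stratify by class.

```python
import numpy as np

def split_indices(n, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle indices 0..n-1 and cut them into consecutive parts with the given ratios."""
    if not np.isclose(sum(ratios), 1.0):
        raise ValueError("ratios must sum to 1")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    # cumulative cut points, e.g. (0.6, 0.2, 0.2) with n=100 -> [60, 80]
    cuts = np.round(np.cumsum(ratios)[:-1] * n).astype(int)
    return np.split(perm, cuts)

train, test, cv = split_indices(100)                    # 6:2:2
train7, test3 = split_indices(100, ratios=(0.7, 0.3))   # 7:3
print(len(train), len(test), len(cv))   # 60 20 20
print(len(train7), len(test3))          # 70 30
```

Shuffling before cutting matters when the data are stored in some order, e.g. sorted by label or by time.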
Common loss functions
cross-entropy loss
mean squared error (MSE)
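The following are minimal NumPy versions of the two losses. For cross-entropy, they assume integer class labels and per-class predicted probabilities. MSE here uses 1/m. The J in the gradient-descent derivation uses 1/(2m), and that extra 1/2 only cancels the 2 that appears when differentiating the square. The clipping constant eps is an arbitrary choice to avoid log(0).

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: (1/m) * sum((y_pred - y_true)^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy(y_true, probs, eps=1e-12):
    """Categorical cross-entropy averaged over samples.

    y_true: integer class labels, shape (m,)
    probs:  predicted class probabilities, shape (m, k), each row summing to 1
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)  # avoid log(0)
    y_true = np.asarray(y_true)
    # pick the predicted probability of the true class for each sample
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))               # (0 + 0 + 4) / 3
print(cross_entropy([0, 1], [[0.5, 0.5], [0.25, 0.75]]))   # -(ln 0.5 + ln 0.75) / 2
```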