$g_i$ denotes the gradient at the current step, $\nabla F(x)|_{x=x_i}$; $\alpha_i$ denotes the current learning rate;
$A_i$ denotes the current Hessian matrix, $\nabla^2 F(x)|_{x=x_i}$.

Conjugate gradient method

$$\begin{aligned} &p_0=-\nabla F(x)|_{x=x_0}\\ &\alpha_0=\frac{-g_0^Tp_0}{p_0^TA_0p_0}\\ &x_1=x_0+\alpha_0p_0 \end{aligned}$$

while True: (subscript 1 denotes the quantities of the next step)

$$\begin{aligned} &\beta_1=\frac{g_1^Tg_1}{g_0^Tg_0}\\ &p_1=-g_1+\beta_1p_0\\ &\alpha_1=\frac{-g_1^Tp_1}{p_1^TA_1p_1}\\ &x_2=x_1+\alpha_1p_1 \end{aligned}$$
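The two steps above (initialization, then the $\beta$/$p$/$\alpha$ loop) can be sketched in NumPy. This is a minimal illustration on an assumed quadratic $F(x)=\tfrac12 x^TAx+b^Tx$ with a symmetric positive definite $A$; the matrix, vector, and starting point are made-up values, not from the notes.

```python
import numpy as np

# Illustrative quadratic: F(x) = 0.5 x^T A x + b^T x, so grad F = A x + b
# and the Hessian is the constant matrix A. (A, b, x chosen for the demo.)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.array([0.0, 0.0])              # x_0

g = A @ x + b                         # g_0 = grad F at x_0
p = -g                                # p_0 = -g_0
for _ in range(len(b)):               # exact in n steps for an n-dim quadratic
    alpha = -(g @ p) / (p @ A @ p)    # alpha_k = -g_k^T p_k / (p_k^T A p_k)
    x = x + alpha * p                 # x_{k+1} = x_k + alpha_k p_k
    g_new = A @ x + b                 # g_{k+1}
    beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves beta_{k+1}
    p = -g_new + beta * p             # p_{k+1} = -g_{k+1} + beta_{k+1} p_k
    g = g_new

# At the minimum the gradient vanishes: A x* + b = 0
print(np.allclose(A @ x + b, 0.0))    # prints True
```

For a quadratic, the loop reaches the exact minimizer after $n$ iterations (here $n=2$), which is the classical finite-termination property of conjugate directions.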

Steepest descent method

$$x_{k+1}=x_k-\alpha_kg_k$$

Maximum stable learning rate (quadratic function)

$$\alpha<\frac{2}{\lambda_{\max}}$$

where $\lambda_{\max}$ is the largest eigenvalue of $A$.
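The stability bound can be checked numerically: a minimal sketch, assuming the same kind of quadratic $F(x)=\tfrac12 x^TAx+b^Tx$ as above, with a fixed step chosen just under $2/\lambda_{\max}$ (the values of $A$, $b$, and the iteration count are illustrative).

```python
import numpy as np

# Fixed-step steepest descent on F(x) = 0.5 x^T A x + b^T x.
# Any alpha < 2 / lambda_max keeps the iteration stable; per eigendirection
# the error contracts by |1 - alpha * lambda|, which stays below 1.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
lam_max = np.linalg.eigvalsh(A).max()
alpha = 1.9 / lam_max                 # just under the bound 2 / lambda_max
x = np.zeros(2)
for _ in range(500):
    g = A @ x + b                     # g_k
    x = x - alpha * g                 # x_{k+1} = x_k - alpha g_k
print(np.allclose(A @ x + b, 0.0, atol=1e-8))   # prints True
```

With `alpha = 2.1 / lam_max` instead, the factor $|1-\alpha\lambda_{\max}|$ exceeds 1 and the iterates diverge along that eigendirection.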

Steepest descent along a line (exact line search)

while True: (subscript 1 denotes the quantities of the next step)

$$\begin{aligned} &p_0=-\nabla F(x)|_{x=x_0}\\ &\alpha_0=\frac{-g_0^Tp_0}{p_0^TA_0p_0}\\ &x_1=x_0+\alpha_0p_0 \end{aligned}$$
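The loop above can be sketched as follows: each iteration takes $p_k=-g_k$ and the closed-form $\alpha_k$ that minimizes $F$ along that line. A minimal sketch on an assumed quadratic (the values of $A$ and $b$, and the stopping tolerance, are illustrative; a tolerance test stands in for the notes' unbounded `while True:`).

```python
import numpy as np

# Steepest descent with exact line minimization on F(x) = 0.5 x^T A x + b^T x.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.zeros(2)
while True:
    g = A @ x + b                     # g_k = grad F at x_k
    if np.linalg.norm(g) < 1e-10:     # stop once the gradient vanishes
        break
    p = -g                            # p_k: steepest-descent direction
    alpha = -(g @ p) / (p @ A @ p)    # exact minimizer of F along x_k + a p_k
    x = x + alpha * p                 # x_{k+1}
print(np.allclose(x, np.linalg.solve(A, -b)))   # prints True
```

Unlike the conjugate gradient loop, consecutive directions here are merely orthogonal, not $A$-conjugate, so convergence is linear rather than finite; it still reaches the minimizer quickly on this well-conditioned example.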

Newton's method

$$x_{k+1}=x_k-A_k^{-1}g_k$$
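A minimal sketch of this update on a non-quadratic example, so the Hessian $A_k$ actually changes between steps. The function $F(x,y)=x^4+y^4+xy$, its derivatives, the start point, and the iteration count are illustrative assumptions, not from the notes.

```python
import numpy as np

# Newton's method on F(x, y) = x^4 + y^4 + x*y (illustrative example).
def grad(v):                          # g_k = grad F
    x, y = v
    return np.array([4 * x**3 + y, 4 * y**3 + x])

def hess(v):                          # A_k = Hessian of F
    x, y = v
    return np.array([[12 * x**2, 1.0], [1.0, 12 * y**2]])

v = np.array([1.0, -1.0])
for _ in range(20):
    # x_{k+1} = x_k - A_k^{-1} g_k; solve the linear system instead of
    # forming the inverse explicitly.
    v = v - np.linalg.solve(hess(v), grad(v))
print(np.allclose(grad(v), 0.0))      # prints True
```

On a quadratic, $A_k$ is constant and a single Newton step lands exactly on the minimizer; here the iterates converge quadratically to the stationary point $(1/2,\,-1/2)$.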