diff --git a/Chapters/5-Optimization/INDEX.md b/Chapters/5-Optimization/INDEX.md
index 97b8aa1..90d9542 100644
--- a/Chapters/5-Optimization/INDEX.md
+++ b/Chapters/5-Optimization/INDEX.md
@@ -281,12 +281,12 @@ small value, usually in the order of $10^{-8}$
 > instead of a vector. To make it easier to understand in matricial notation:
 >
 > $$
-> \begin{aligned}
-> \nabla L^{(k + 1)} &= \frac{d \, Loss^{(k)}}{d \, W^{(k)}} \\
-> G^{(k + 1)} &= G^{(k)} +(\nabla L^{(k+1)}) ^2 \\
-> W^{(k+1)} &= W^{(k)} - \eta \frac{\nabla L^{(k + 1)}}
+ \begin{aligned}
+ \nabla L^{(k + 1)} &= \frac{d \, Loss^{(k)}}{d \, W^{(k)}} \\
+ G^{(k + 1)} &= G^{(k)} +(\nabla L^{(k+1)}) ^2 \\
+ W^{(k+1)} &= W^{(k)} - \eta \frac{\nabla L^{(k + 1)}}
 {\sqrt{G^{(k+1)} + \epsilon}}
-> \end{aligned}
+ \end{aligned}
 > $$
 >
 > In other words, compute the gradient and scale it for the sum of its squares
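
Below is a minimal NumPy sketch of the AdaGrad-style update that the patched equations describe: accumulate the squared gradients in `G`, then scale each step element-wise by `1/sqrt(G + eps)`. The function name `adagrad_step`, the toy quadratic loss, and the `lr`/`eps` values are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def adagrad_step(W, G, grad, lr=0.01, eps=1e-8):
    """One AdaGrad-style step matching the equations above (illustrative sketch)."""
    G = G + grad ** 2                       # G^(k+1) = G^(k) + (∇L^(k+1))^2
    W = W - lr * grad / np.sqrt(G + eps)    # W^(k+1) = W^(k) - η ∇L^(k+1) / sqrt(G^(k+1) + ε)
    return W, G

# Toy example: quadratic loss L(W) = 0.5 * ||W||^2, so the gradient is simply W.
W = np.array([1.0, -2.0])
G = np.zeros_like(W)
for _ in range(3):
    grad = W                                # gradient of the toy loss (assumption for the demo)
    W, G = adagrad_step(W, G, grad, lr=0.1)
print(W)
```

Because `G` is accumulated per coordinate, each weight ends up with its own effective learning rate, which is what the element-wise division by `sqrt(G + eps)` in the equations expresses.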