diff --git a/Chapters/5-Optimization/Fancy-Methods/ADAM.md b/Chapters/5-Optimization/Fancy-Methods/ADAM.md
index a3fdf99..60684c2 100644
--- a/Chapters/5-Optimization/Fancy-Methods/ADAM.md
+++ b/Chapters/5-Optimization/Fancy-Methods/ADAM.md
@@ -69,7 +69,7 @@ Usual values are these:
 
 - We need to ***tune 2 momentum parameters instead of 1***
 
-## Notes[^anelli-adam-3]
+## Notes[^anelli-adam-3][^adamw-notes]
 
 - Whenever the ***gradient*** is constant, the `local gain` is 1, as
@@ -97,3 +97,5 @@ Usual values are these:
 
 [^anelli-adam-2]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 5 pg. 54
 [^anelli-adam-3]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 5 pg. 57-58
+
+[^adamw-notes]: [AdamW Notes](./ADAM-W.md)
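
For context on the "tune 2 momentum parameters instead of 1" bullet touched by the first hunk: ADAM keeps two exponential moving averages of the gradient, so both decay rates (β1 and β2) need tuning, versus the single momentum coefficient of plain SGD with momentum. Below is a minimal sketch of one ADAM step, not code from this repo; the function and variable names are illustrative only.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update; beta1 and beta2 are the two momentum-style decay rates to tune."""
    m = beta1 * m + (1 - beta1) * grad       # first moment: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad**2    # second moment: moving average of squared gradients
    m_hat = m / (1 - beta1**t)               # bias correction for the first few steps (t starts at 1)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The defaults shown (β1 = 0.9, β2 = 0.999) are the usual values from Kingma & Ba, matching the "Usual values are these:" list that both hunk headers reference.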