# ADADELTA[^adadelta-offcial-paper]

`ADADELTA` was inspired by [`AdaGrad`](./ADAGRAD.md) and
created to address some of its problems, such as its
***sensitivity to the initial `parameters` and their
corresponding gradients***[^adadelta-offcial-paper].

## First Formulation

To address these problems, `ADADELTA` accumulates
***gradients over a `window`***, albeit as an
***exponentially decaying average***:

$$
E[g^2]_t = \alpha \cdot E[g^2]_{t-1} + (1 - \alpha) \cdot g^2_t
$$
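
As a concrete illustration, the decaying average can be sketched in a few
lines of `NumPy` (the function name and the `alpha` value below are
illustrative, not from the paper):

```python
import numpy as np

def update_sq_grad_avg(avg, grad, alpha=0.9):
    # E[g^2]_t = alpha * E[g^2]_{t-1} + (1 - alpha) * g_t^2
    return alpha * avg + (1.0 - alpha) * grad ** 2

# Unlike AdaGrad's unbounded sum, the decaying average forgets old
# gradients: a constant gradient of 1 drives it toward 1, not infinity.
avg = np.zeros(3)
for _ in range(100):
    avg = update_sq_grad_avg(avg, np.ones(3))
```

Because old terms are discounted by $\alpha$ at every step, the
accumulator behaves like an average over a recent `window` without
having to store any past gradients.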

The update, which is very similar to the one in
[AdaGrad](./ADAGRAD.md#the-algorithm), becomes:

$$
\bar{w}_{t+1, i} =
\bar{w}_{t, i} - \frac{
    \eta
}{
    \sqrt{E[g^2]_t + \epsilon}
} \cdot g_{t,i}
$$
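
A minimal sketch of this first formulation as a single training step
(the function name and hyper-parameter values are illustrative):

```python
import numpy as np

def step_first_formulation(w, grad, sq_avg, eta=0.01, alpha=0.9, eps=1e-6):
    # Decaying average of squared gradients, then an AdaGrad-style step.
    sq_avg = alpha * sq_avg + (1.0 - alpha) * grad ** 2
    w = w - eta / np.sqrt(sq_avg + eps) * grad
    return w, sq_avg

# Minimize f(w) = w^2 (gradient 2w), starting from w = 1.
w, sq_avg = np.array([1.0]), np.zeros(1)
for _ in range(500):
    w, sq_avg = step_first_formulation(w, 2.0 * w, sq_avg)
```

Note that a `learning rate` $\eta$ is still required here; removing it
is exactly what the second formulation achieves.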

Defining $RMS[g]_t = \sqrt{E[g^2]_t + \epsilon}$, the last
equation can be rewritten as:

$$
\bar{w}_{t+1, i} =
\bar{w}_{t, i} - \frac{
    \eta
}{
    RMS[g]_t
} \cdot g_{t,i}
$$

However, this is ***still not the actual equation***, as
its `units` are ***all over the place***: the update ends
up dimensionless instead of carrying the units of the
`parameters`.

## Second Formulation

As seen above, this update is ***dimensionless***, while a
`parameter` update should have the units of the
`parameters` themselves. As noted by the authors of the
paper[^adadelta-units], we can correct this by
***assuming the curvature is locally smooth*** and
approximating the unknown update at the current step with
the `RMS` of the `parameter` updates up to the previous
step, making the full update equation:

$$
\bar{w}_{t + 1, i} =
\bar{w}_{t, i} - \frac{
    RMS[\Delta \bar{w}_{i}]_{t - 1}
}{
    RMS[g]_t
} \cdot g_{t,i}
$$
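
The full method can be sketched by tracking a second decaying average,
this one over the squared `parameter` updates (again, the names and the
`alpha`/`eps` values below are illustrative):

```python
import numpy as np

def adadelta_step(w, grad, sq_grad_avg, sq_delta_avg, alpha=0.95, eps=1e-6):
    # Decaying average of squared gradients: E[g^2]_t.
    sq_grad_avg = alpha * sq_grad_avg + (1.0 - alpha) * grad ** 2
    # The RMS of past updates replaces eta: no learning rate is needed.
    delta = -np.sqrt(sq_delta_avg + eps) / np.sqrt(sq_grad_avg + eps) * grad
    # Decaying average of squared updates for the next step.
    sq_delta_avg = alpha * sq_delta_avg + (1.0 - alpha) * delta ** 2
    return w + delta, sq_grad_avg, sq_delta_avg

# Minimize f(w) = w^2 (gradient 2w) without choosing a learning rate.
w = np.array([1.0])
g_avg, d_avg = np.zeros(1), np.zeros(1)
for _ in range(100):
    w, g_avg, d_avg = adadelta_step(w, 2.0 * w, g_avg, d_avg)
# w has moved from 1.0 toward the minimum at 0.
```

The `eps` term both avoids division by zero and gives the very first
step a non-zero size, since the average of squared updates starts
at zero.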

As we can notice, the ***`learning rate` $\eta$ completely
disappears from the equation, eliminating the need to
set one***.

> [!NOTE]
>
> It can be noticed that [`RMSProp`](./../INDEX.md#rmsprop-in-detail)
> is basically the [first update](#first-formulation) we derived for this method

<!-- Footnotes -->
[^adadelta-offcial-paper]: [Official ADADELTA Paper | arXiv:1212.5701v1](https://arxiv.org/pdf/1212.5701)
[^adadelta-units]: [Official ADADELTA Paper | Paragraph 3.2 Idea 2: Correct Units with Hessian Approximation | arXiv:1212.5701v1](https://arxiv.org/pdf/1212.5701)