diff --git a/Chapters/5-Optimization/Fancy-Methods/ADADELTA.md b/Chapters/5-Optimization/Fancy-Methods/ADADELTA.md
new file mode 100644
index 0000000..73949a1
--- /dev/null
+++ b/Chapters/5-Optimization/Fancy-Methods/ADADELTA.md
@@ -0,0 +1,78 @@
+# ADADELTA[^adadelta-offcial-paper]
+
+`ADADELTA` was inspired by [`AdaGrad`](./ADAGRAD.md) and
+created to address some of its problems, such as its
+***sensitivity to the initial `parameters` and their
+gradients***[^adadelta-offcial-paper].
+
+## First Formulation
+
+To address these problems, `ADADELTA` accumulates the
+***squared gradients over a `window`***, in the form of an
+***exponentially decaying average***:
+
+$$
+E[g^2]_t = \alpha \cdot E[g^2]_{t-1} +
+    (1 - \alpha) \cdot g^2_t
+$$
+
+The update, which is very similar to the one in
+[AdaGrad](./ADAGRAD.md#the-algorithm), becomes:
+
+$$
+    \bar{w}_{t+1, i} =
+        \bar{w}_{t, i} - \frac{
+            \eta
+        }{
+            \sqrt{E[g^2]_t + \epsilon}
+        } \cdot g_{t,i}
+$$
+
+Writing the denominator as the root mean square (`RMS`) of the
+gradient, the last equation can be rewritten as:
+
+$$
+    \bar{w}_{t+1, i} =
+        \bar{w}_{t, i} - \frac{
+            \eta
+        }{
+            RMS[g]_t
+        } \cdot g_{t,i}
+$$
+
+However, this is ***still not the final equation***, as
+its `units` ***do not match those of the parameters***.
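+
+A short unit check makes the mismatch concrete. This is only a sketch, under the
+assumption (also made in the paper[^adadelta-offcial-paper]) that the loss $f$ is
+unitless, and treating $\eta$ as a pure number:
+
+$$
+\text{units of } g \propto \frac{\partial f}{\partial w}
+    \propto \frac{1}{\text{units of } w}
+\qquad \Rightarrow \qquad
+\text{units of } \left( \frac{\eta}{RMS[g]_t} \cdot g_{t,i} \right)
+    \propto \frac{\text{units of } g}{\text{units of } g} = 1
+$$
+
+The step is a pure number, while the parameter it is subtracted from carries
+the units of $w$.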
+
+## Second Formulation
+
+The update above is ***dimensionless***, while it should
+carry the units of the parameters. As noted by the authors of the
+paper[^adadelta-units], we can correct this problem
+by ***assuming the curvature is locally smooth*** and
+approximating the current step with the `RMS` of the
+previous updates $\Delta \bar{w}_{i}$, giving the full
+update equation:
+
+$$
+    \bar{w}_{t + 1, i} =
+        \bar{w}_{t, i} - \frac{
+            RMS[\Delta \bar{w}_{i}]_{t - 1}
+        }{
+            RMS[g]_t
+        } \cdot g_{t,i}
+$$
+
+As we can see, the ***`learning rate` completely
+disappears from the equation, eliminating the need to
+set one***.
+
+> [!NOTE]
+>
+> [`RMSProp`](./../INDEX.md#rmsprop-in-detail)
+> is essentially the [first update](#first-formulation) we derived for this method.
+
+
+
+[^adadelta-offcial-paper]: [Official ADADELTA Paper | arXiv:1212.5701v1](https://arxiv.org/pdf/1212.5701)
+
+[^adadelta-units]: [Official ADADELTA Paper | Paragraph 3.2 Idea 2: Correct Units with Hessian Approximation | arXiv:1212.5701v1](https://arxiv.org/pdf/1212.5701)
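+
+For reference, the numerator of the final update can be written out explicitly.
+The following is a sketch that follows the paper[^adadelta-units] but keeps this
+note's symbols ($\alpha$ as the decay rate, $\bar{w}$ as the parameters);
+$\Delta \bar{w}_{t,i}$ is introduced here to denote the update applied at step $t$:
+
+$$
+\begin{aligned}
+RMS[\Delta \bar{w}_{i}]_{t-1} &=
+    \sqrt{E[\Delta \bar{w}_{i}^2]_{t-1} + \epsilon} \\
+\Delta \bar{w}_{t,i} &=
+    - \frac{RMS[\Delta \bar{w}_{i}]_{t-1}}{RMS[g]_t} \cdot g_{t,i} \\
+E[\Delta \bar{w}_{i}^2]_t &=
+    \alpha \cdot E[\Delta \bar{w}_{i}^2]_{t-1} +
+    (1 - \alpha) \cdot \Delta \bar{w}_{t,i}^2
+\end{aligned}
+$$
+
+Because the numerator carries the units of $\bar{w}$ and $g_{t,i} / RMS[g]_t$ is a
+pure number, the step now has the units of the parameters. Both accumulators
+start at zero, so the $\epsilon$ in the numerator is what allows the first step
+to be non-zero.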