# ADADELTA[^adadelta-offcial-paper]

`ADADELTA` was inspired by [`AdaGrad`](./ADAGRAD.md) and
created to address some of its problems, such as
***sensitivity to the initial `parameters` and their corresponding
gradients***[^adadelta-offcial-paper].

## First Formulation

To address these problems, `ADADELTA` accumulates
***squared gradients over a `window`***. Instead of storing the
window explicitly, it keeps an
***exponentially decaying average***:

$$
E[g^2]_t = \alpha \cdot E[g^2]_{t-1} +
(1 - \alpha) \cdot g^2_t
$$

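The decaying average above can be sketched in a few lines of Python. This is only an illustration: the function name `decaying_average` and the value `alpha=0.9` are assumptions, not something fixed by the paper or by these notes.

```python
def decaying_average(previous_avg, grad, alpha=0.9):
    """One step of E[g^2]_t = alpha * E[g^2]_{t-1} + (1 - alpha) * g_t^2."""
    return alpha * previous_avg + (1.0 - alpha) * grad ** 2


# Feed a constant gradient g = 2: the average starts at 0
# and drifts towards g^2 = 4 without storing any past gradient.
avg = 0.0
for _ in range(50):
    avg = decaying_average(avg, grad=2.0)
```

Only one number per parameter is kept, which is why no explicit window of past gradients is needed.
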
The update, which is very similar to the one in
[AdaGrad](./ADAGRAD.md#the-algorithm), becomes:

$$
\bar{w}_{t+1, i} =
\bar{w}_{t, i} - \frac{
\eta
}{
\sqrt{E[g^2]_t + \epsilon}
} \cdot g_{t,i}
$$

Since the denominator is the root mean square of the
gradients, $RMS[g]_t = \sqrt{E[g^2]_t + \epsilon}$, the last
equation can be rewritten as:

$$
\bar{w}_{t+1, i} =
\bar{w}_{t, i} - \frac{
\eta
}{
RMS[g]_t
} \cdot g_{t,i}
$$

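As a minimal sketch, this update can be written for a single weight as follows. The function name `first_formulation_step`, the default values of `eta`, `alpha` and `eps`, and the toy loss $f(w) = w^2$ are all assumptions made for illustration.

```python
import math


def first_formulation_step(w, grad, avg_sq_grad, eta=0.01, alpha=0.9, eps=1e-6):
    """Update a single weight; returns (new_w, new_avg_sq_grad)."""
    # E[g^2]_t = alpha * E[g^2]_{t-1} + (1 - alpha) * g_t^2
    avg_sq_grad = alpha * avg_sq_grad + (1.0 - alpha) * grad ** 2
    # RMS[g]_t = sqrt(E[g^2]_t + eps)
    rms_grad = math.sqrt(avg_sq_grad + eps)
    # w_{t+1} = w_t - eta / RMS[g]_t * g_t
    return w - eta / rms_grad * grad, avg_sq_grad


# Toy problem (assumed for illustration): minimise f(w) = w^2, so g = 2w.
w, avg = 1.0, 0.0
for _ in range(200):
    w, avg = first_formulation_step(w, 2.0 * w, avg)
```

Note that `eta` still has to be chosen by hand here.
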
However, this is ***still not the actual equation***, as
its `units` ***do not match***: the step does not carry the
units of the parameter it updates.

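A rough dimensional check, following the argument in the paper[^adadelta-units] and assuming the loss $f$ is unitless and $\eta$ is a plain number, makes the mismatch explicit. The gradient and its RMS share the same units, so they cancel:

$$
\text{units of } \left( \frac{\eta}{RMS[g]_t} \cdot g_{t,i} \right)
\propto \frac{\text{units of } g}{\text{units of } g} = 1
$$

By contrast, a second-order step such as Newton's method does carry the units of the parameter. Here $H$ denotes the Hessian, a symbol introduced only for this comparison:

$$
\text{units of } \left( H^{-1} g \right) \propto
\frac{\partial f / \partial w}{\partial^2 f / \partial w^2}
\propto \text{units of } w
$$
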
## Second Formulation

The previous update is ***dimensionless***, whereas a change
applied to a parameter should carry the units of that parameter.
As noted by the authors of the
paper[^adadelta-units], we can correct this problem by putting
the RMS of the updates
$\Delta \bar{w}_{t, i} = \bar{w}_{t+1, i} - \bar{w}_{t, i}$
in the numerator, computed with the same decaying average used
for the gradients. Since the update at step $t$ is not known yet,
we ***consider the curvature locally smooth*** and approximate it
with the value accumulated up to the previous step, making the
full update equation:

$$
\bar{w}_{t + 1, i} =
\bar{w}_{t, i} - \frac{
RMS[\Delta \bar{w}_{i}]_{t - 1}
}{
RMS[g]_t
} \cdot g_{t,i}
$$

As we can notice, the ***`learning rate` completely
disappears from the equation, eliminating the need to
set one***.

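The full method can be sketched for a single weight as below. It keeps two decaying averages, one for the squared gradients and one for the squared updates, as in the paper. The function name `adadelta_step`, the default values of `alpha` and `eps`, and the toy loss $f(w) = w^2$ are assumptions made for illustration.

```python
import math


def adadelta_step(w, grad, avg_sq_grad, avg_sq_delta, alpha=0.95, eps=1e-6):
    """One step for a single weight.

    Returns (new_w, new_avg_sq_grad, new_avg_sq_delta).
    """
    # Decaying average of squared gradients, up to step t
    avg_sq_grad = alpha * avg_sq_grad + (1.0 - alpha) * grad ** 2
    # RMS of past updates (up to t - 1) and RMS of gradients (up to t)
    rms_delta = math.sqrt(avg_sq_delta + eps)
    rms_grad = math.sqrt(avg_sq_grad + eps)
    # The step has no learning rate: the ratio of the two RMS values scales it
    delta = -(rms_delta / rms_grad) * grad
    # Decaying average of squared updates, used at the next step
    avg_sq_delta = alpha * avg_sq_delta + (1.0 - alpha) * delta ** 2
    return w + delta, avg_sq_grad, avg_sq_delta


# Toy problem (assumed for illustration): minimise f(w) = w^2, so g = 2w.
w, acc_grad, acc_delta = 1.0, 0.0, 0.0
history = [w]
for _ in range(500):
    w, acc_grad, acc_delta = adadelta_step(w, 2.0 * w, acc_grad, acc_delta)
    history.append(w)
```

Because the update accumulator starts at zero, the very first step is on the order of `sqrt(eps)`, so `eps` also decides how quickly the method gets moving.
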
> [!NOTE]
>
> [`RMSProp`](./../INDEX.md#rmsprop-in-detail) is essentially
> the [first update](#first-formulation) we derived for this method.

<!-- Footnotes -->

[^adadelta-offcial-paper]: [Official ADADELTA Paper | arXiv:1212.5701v1](https://arxiv.org/pdf/1212.5701)

[^adadelta-units]: [Official ADADELTA Paper | Paragraph 3.2 Idea 2: Correct Units with Hessian Approximation | arXiv:1212.5701v1](https://arxiv.org/pdf/1212.5701)