Added Hessian Free and Conjugate method
This commit is contained in:
parent 26e0f11b42
commit 8d8266059b
@ -411,10 +411,78 @@ $0$ we don't necessarily know where we are.

> [!NOTE]
>
> [Here in detail](./Fancy-Methods/LION.md)

### Hessian Free[^anelli-hessian-free]

How much can we `learn` from a given `Loss` space?

The ***best way to move*** would be along the ***gradient***, assuming the `Loss` has the ***same curvature*** in every direction (e.g. a circular bowl around a local minimum).

But ***usually this is not the case***, so we need to move ***where the ratio of gradient to curvature is high***.

#### Newton's Method

This method takes into account the ***curvature*** of the `Loss`.

With this method, the update would be:

$$
\Delta\vec{w} = - \epsilon H(\vec{w})^{-1} \cdot \frac{d \, E}{d \, \vec{w}}
$$

***If this were feasible we would reach the minimum in one step***, but it is not, as the ***computations*** needed to get the `Hessian` ***grow quadratically with the number of `weights`*** (and inverting it costs even more).

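On a tiny problem, though, the exact step is easy to show. Below is a minimal sketch of a single Newton step on an assumed 2-D quadratic loss $E(\vec{w}) = \frac{1}{2}\vec{w}^T A \vec{w} - \vec{b}^T \vec{w}$ (the matrix `A`, vector `b`, and starting point are illustrative, not from the notes):

```python
import numpy as np

# Hypothetical quadratic loss: E(w) = 0.5 * w^T A w - b^T w
# Its gradient is A w - b and its Hessian is the constant matrix A.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])   # assumed positive-definite curvature
b = np.array([1.0, -2.0])

w = np.array([5.0, 5.0])     # arbitrary starting point

grad = A @ w - b             # dE/dw
H = A                        # Hessian of a quadratic is constant

# Newton update with eps = 1: delta_w = -H^{-1} * dE/dw
delta_w = -np.linalg.solve(H, grad)
w = w + delta_w

print(w, np.allclose(A @ w, b))  # w now satisfies A w = b: the exact minimum
```

For `N` `weights` the `Hessian` has `N * N` entries, which is why this exact step is never computed for a real network.
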
The thing is that whenever we ***update `weights`*** with the `Steepest Descent` method, each update *messes up* another, while the ***curvature*** can help to ***scale these updates*** so that they do not disturb each other.

#### Curvature Approximations

However, since the `Hessian` is ***too expensive to compute***, we can approximate it:

- We can take only the ***diagonal elements*** (sketched below)
- ***Other algorithms*** (e.g. Hessian Free)
- ***Conjugate Gradient*** to minimize the ***approximation error***

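A minimal sketch of the first option, assuming the same toy quadratic loss as above (all numbers are illustrative): each `weight` gets its own step size, scaled by its own diagonal curvature, without ever inverting the full `Hessian`.

```python
import numpy as np

# Same hypothetical quadratic loss as before: E(w) = 0.5 * w^T A w - b^T w.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -2.0])

w = np.array([5.0, 5.0])
eps = 1.0

for _ in range(20):
    grad = A @ w - b              # dE/dw
    diag_H = np.diag(A)           # keep only the diagonal of the Hessian
    w = w - eps * grad / diag_H   # each weight's step scaled by its own curvature

print(w)  # close to the true minimum, using only N curvature values
```
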
#### Conjugate Gradient[^conjugate-wikipedia][^anelli-conjugate-gradient]

> [!CAUTION]
>
> This is an oversimplification of the topic, so reading
> the footnote material is greatly advised.

The basic idea is that, in order not to mess up the minimization already done along previous directions, we ***`optimize` along conjugate directions***: directions that do not interfere with each other with respect to the curvature (not simply perpendicular ones).

This method is ***mathematically guaranteed to succeed after N steps***, where N is the dimension of the space; in practice, far fewer steps already leave only a minimal error.

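As a minimal sketch of the N-step claim (the 2x2 matrix, vector, and starting point are assumed for illustration), here is conjugate gradient minimizing the toy quadratic $E(\vec{w}) = \frac{1}{2}\vec{w}^T A \vec{w} - \vec{b}^T \vec{w}$, i.e. solving $A\vec{w} = \vec{b}$:

```python
import numpy as np

def conjugate_gradient(A, b, w0, n_steps):
    """Minimize 0.5 * w^T A w - b^T w, with A symmetric positive-definite."""
    w = w0.astype(float)
    r = b - A @ w              # residual = negative gradient
    d = r.copy()               # first direction: steepest descent
    for _ in range(n_steps):
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)    # exact step size along d
        w = w + alpha * d
        r_new = r - alpha * Ad
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d          # next direction, conjugate to the previous ones
        r = r_new
    return w

A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
w = conjugate_gradient(A, b, np.zeros(2), n_steps=2)  # N = 2 steps suffice
print(np.allclose(A @ w, b))  # True: the quadratic is minimized after N steps
```
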
This ***method works well for `non-quadratic errors`***, and the `Hessian Free` `optimizer` uses it on ***genuinely quadratic surfaces***, which are the ***quadratic approximations of the real surface***.

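To connect the two ideas, here is a sketch of the `Hessian Free` recipe under stated assumptions: the non-quadratic toy loss, its gradient, and all constants are invented for illustration, and the `Hessian` is never formed, only `Hessian`-vector products approximated by finite differences of the gradient.

```python
import numpy as np

b = np.array([1.0, -2.0])

def grad(w):
    # Gradient of an assumed non-quadratic toy loss E(w) = 0.25 * ||w||^4 - b.w
    return (w @ w) * w - b

def hessian_vector_product(w, v, eps=1e-5):
    # H(w) @ v  ~  (grad(w + eps*v) - grad(w - eps*v)) / (2 * eps), no Hessian built
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

def cg_step(w, n_steps=10):
    # Conjugate gradient on the local quadratic approximation:
    # solve H(w) d = -grad(w) using only Hessian-vector products.
    d = np.zeros_like(w)
    r = -grad(w)
    p = r.copy()
    for _ in range(n_steps):
        if r @ r < 1e-12:
            break
        Hp = hessian_vector_product(w, p)
        alpha = (r @ r) / (p @ Hp)
        d += alpha * p
        r_new = r - alpha * Hp
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return d

w = np.array([2.0, 2.0])
for _ in range(10):          # outer loop: re-approximate the surface, then step
    w = w + cg_step(w)
print(grad(w))               # gradient is (near) zero: a minimum was reached
```
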
<!-- TODO: Add PDF 5 pg. 38 -->

<!-- Footnotes -->

[^momentum]: [Distill Pub | 18th April 2025](https://distill.pub/2017/momentum/)

@ -425,3 +493,9 @@ $0$ we don't necessarily know where we are.

[^rprop-torch]: [Rprop | Official PyTorch Documentation | 19th April 2025](https://pytorch.org/docs/stable/generated/torch.optim.Rprop.html)

[^rmsprop-torch]: [RMSprop | Official PyTorch Documentation | 19th April 2025](https://pytorch.org/docs/stable/generated/torch.optim.RMSprop.html)

[^anelli-hessian-free]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 5 pg. 67-81

[^conjugate-wikipedia]: [Conjugate Gradient Method | Wikipedia | 20th April 2025](https://en.wikipedia.org/wiki/Conjugate_gradient_method)

[^anelli-conjugate-gradient]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 5 pg. 74-76