# Lion (evoLved sIgn mOmeNtum)[^official-paper]

`Lion` is an `optimizer` discovered by a
***genetic search algorithm*** aimed at
finding the best `optimizer`.

The search starts from a population of `AdamW` algorithms to
***speed it up***. Unlike
`Adam` and `AdamW`, `Lion` keeps track
***only of the momentum*** and uses the ***gradient sign***
to compute the update,
requiring ***less `memory`***.

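The update described above can be sketched in a few lines. This is a minimal pure-Python sketch, assuming the update rule given in the paper (interpolate momentum and gradient, take the sign, then refresh the momentum); the function name `lion_step`, its argument names and the default values are illustrative choices, not an official API.

```python
def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(theta, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion-style update over lists of floats; returns (new_theta, new_m)."""
    new_theta, new_m = [], []
    for p, g, mom in zip(theta, grad, m):
        # Interpolate momentum and current gradient, then keep only the sign.
        c = beta1 * mom + (1.0 - beta1) * g
        # Decoupled weight decay is added to the sign update before scaling by lr.
        new_theta.append(p - lr * (sign(c) + weight_decay * p))
        # The momentum is the only optimizer state carried to the next step.
        new_m.append(beta2 * mom + (1.0 - beta2) * g)
    return new_theta, new_m


# Toy usage: minimize 0.5 * ||theta||^2, whose gradient is theta itself.
theta = [1.0, -2.0, 3.0]
m = [0.0, 0.0, 0.0]
for _ in range(10):
    theta, m = lion_step(theta, grad=theta, m=m, lr=0.01)
```

Only `m` is stored between steps, whereas `Adam` and `AdamW` also keep a second-moment estimate per parameter, which is where the memory saving comes from.
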
Since the ***uniform-magnitude updates*** produced by the
sign operation ***yield larger norms***,
`Lion` requires a ***smaller `learning-rate`***
and, to keep a similar decay strength,
a ***larger decoupled `weight-decay`***
$\lambda$[^official-paper-1].

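A short worked version of this argument, as a sketch under stated assumptions: $d$ parameters, a sign update $u_t = \operatorname{sign}(c_t)$ with no zero entries, and a decoupled step with learning rate $\eta$. The symbols $u_t$, $c_t$, $\eta$, $d$ and $k$ are notation introduced here for illustration.

$$
\|u_t\|_2 = \sqrt{d}, \qquad \theta_t = \theta_{t-1} - \eta\,(u_t + \lambda\,\theta_{t-1})
$$

Every coordinate moves by exactly $\eta$, so the step norm $\eta\sqrt{d}$ is typically larger than that of an adaptive update with many small entries, which motivates a smaller $\eta$. The decay term shrinks the weights by $\eta\lambda\,\theta_{t-1}$ per step, so dividing $\eta$ by a factor $k$ while multiplying $\lambda$ by $k$ leaves the decay strength unchanged:

$$
\frac{\eta}{k}\cdot(k\lambda) = \eta\lambda
$$
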
The ***advantages of `Lion` over `Adam` and `AdamW`
increase with the size of
the `mini-batch`***[^official-paper-1].

## Symbolic Representation[^official-paper-2]

Candidate ***algorithms*** are represented
***symbolically*** as `programs`, bringing these advantages:

- It matches the fact that `algorithms` must be
  ***implemented*** as `programs` to be executed
- It is ***easier to analyze, comprehend and transfer to
  new tasks*** these `algorithms`, rather than
  parameterized models such as `NeuralNetworks`
- We can **estimate the *complexity*** by looking
  at the ***length of code***

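The last point can be illustrated with a toy measurement. This is a hedged sketch, not the paper's tooling: it uses Python's standard `ast` module to count statements, and the two program strings (`ADAMW_LIKE`, `SIGN_BASED`) are simplified, hypothetical stand-ins written only for this comparison.

```python
import ast


def program_length(source: str) -> int:
    """Count statement nodes in a program as a crude complexity proxy."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.stmt) for node in ast.walk(tree))


# Simplified, hypothetical candidate programs (they are parsed, never executed).
ADAMW_LIKE = """
def train(w, g, m, v, lr):
    m = 0.9 * m + 0.1 * g
    v = 0.999 * v + 0.001 * g * g
    update = m / (v ** 0.5 + 1e-8)
    w = w - lr * (update + 0.01 * w)
    return w, m, v
"""

SIGN_BASED = """
def train(w, g, m, lr):
    update = sign(0.9 * m + 0.1 * g)
    w = w - lr * (update + 0.01 * w)
    m = 0.99 * m + 0.01 * g
    return w, m
"""

adamw_len = program_length(ADAMW_LIKE)
sign_len = program_length(SIGN_BASED)
```

Statement count is only one possible length measure; counting tokens or expression nodes would be an equally reasonable choice.
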
## Tournament[^official-paper-3]

The best code is found with a ***tournament-style
evolution***. Each cycle it samples a few `algorithms`
from the population and picks the ***best
`algorithm`*** among them, which is
***copied and mutated*** to produce a new one,
and the ***oldest is removed***.

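The cycle described above can be sketched as a small loop. This is a toy sketch under loud assumptions: candidates are plain lists of floats instead of programs, and `fitness`, `mutate`, `evolve` and all parameter values are hypothetical stand-ins chosen for illustration, not the paper's search infrastructure.

```python
import random
from collections import deque


def fitness(candidate):
    """Toy score (higher is better): closeness to the all-ones vector."""
    return -sum((x - 1.0) ** 2 for x in candidate)


def mutate(candidate, rng):
    """Perturb one randomly chosen entry of the candidate in place."""
    i = rng.randrange(len(candidate))
    candidate[i] += rng.gauss(0.0, 0.1)
    return candidate


def evolve(initial, cycles=200, tournament_size=3, seed=0):
    """Tournament-style evolution that always removes the oldest member."""
    rng = random.Random(seed)
    population = deque(initial)  # left end = oldest, right end = newest
    best = max(population, key=fitness)
    for _ in range(cycles):
        # Tournament: sample a few members at random, the fittest is the parent.
        contenders = rng.sample(list(population), tournament_size)
        parent = max(contenders, key=fitness)
        # Copy the parent, then mutate the copy to create a child.
        child = mutate(list(parent), rng)
        population.append(child)
        population.popleft()  # remove the oldest member, not the worst one
        best = max(best, child, key=fitness)
    return best


# Warm start: every member begins as a copy of the same baseline candidate.
initial = [[0.0, 0.0, 0.0] for _ in range(10)]
best = evolve(initial)
```

Removing the oldest member rather than the worst keeps the population turning over, so a candidate that scored well once by luck cannot stay in the population indefinitely.
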
<!-- Footnotes -->

[^official-paper]: [Official Lion Paper | arXiv:2302.06675v4](https://arxiv.org/pdf/2302.06675)

[^official-paper-1]: [Official Lion Paper | Paragraph 1 pg. 3 | arXiv:2302.06675v4](https://arxiv.org/pdf/2302.06675)

[^official-paper-2]: [Official Lion Paper | Paragraph 1 pg. 3 | arXiv:2302.06675v4](https://arxiv.org/pdf/2302.06675)

[^official-paper-3]: [Official Lion Paper | Paragraph 2 pg. 4-5 | arXiv:2302.06675v4](https://arxiv.org/pdf/2302.06675)