Added images and revised content

This commit is contained in:
Christian Risi 2025-10-26 19:39:20 +01:00
parent 294c9a6783
commit 2823aeba76


@ -194,6 +194,30 @@ them `learn` or `forget` chosen pieces of information
> With ***chosen*** we mean choosing from the
> `hyperspace`, so it's not really precise.
### High Level Scheme
The point of an `RNN` cell is to be able to modify its internal state.
As shown in the image, this can be implemented by having gates (`AND` operations) to read, write and
keep (remember) pieces of information in the memory.
![high level cell](./pngs/high-level-cell.png)
Even though this is a *high level and simplistic architecture*, it gives a rough idea of
how to implement it.
- First of all, instead of using `AND` gates, we can substitute them with an elementwise
multiplication.
- Secondly, we can implement an elementwise addition to combine a newly written element with
a past one, as sketched below.
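As a toy illustration of this idea (a sketch only, with made-up vectors and gate values, not the
actual architecture), elementwise multiplication plays the role of the `AND` gates and elementwise
addition combines the kept memory with the new write:

```python
import numpy as np

# toy memory of 4 units and a candidate value to be written
memory    = np.array([0.5, -1.0, 0.2, 0.0])
candidate = np.array([1.0,  0.3, -0.5, 0.8])

# gate signals in [0, 1]: 1 = fully open, 0 = fully closed
keep_gate  = np.array([1.0, 0.0, 1.0, 1.0])   # what to keep remembering
write_gate = np.array([0.0, 1.0, 0.0, 0.5])   # what (and how much) to write

# elementwise multiplication replaces the AND gates,
# elementwise addition combines the kept memory with the new write
new_memory = keep_gate * memory + write_gate * candidate
print(new_memory)   # [0.5  0.3  0.2  0.4]
```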
### Standard RNN Cell
This is the simplest type of implementation: here all signals are set to `1`.
![simple rnn cell](./pngs/standard-cell.png)
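For reference, a single step of such a plain cell can be sketched as follows (a minimal sketch,
assuming a `tanh` activation; the weight names `W_xh`, `W_hh`, `b_h` are placeholders, not from
the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5

# placeholder weights: input-to-hidden, hidden-to-hidden, bias
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h  = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One step: the new hidden state mixes the current input with the previous state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)
for x_t in np.eye(input_size):   # a dummy 3-step sequence
    h = rnn_step(x_t, h)
```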
### Long Short Term Memory | LSTM[^anelli-RNNs-9][^LSTM-wikipedia]
This `cell` has a ***separate signal***, namely the
@ -203,6 +227,28 @@ initialized to `1`***.
![LSTM cell](./pngs/lstm-cell.png)
As shown in the image, we can identify a `keep (or forget) gate`, a `write gate` and a `read gate`
(a minimal code sketch follows the detailed image below).
- **Forget Gate**:\
The previous `read state` ($h_{t-1}$), concatenated with the `input` ($x_{t}$), controls
how much of the `previous cell state` keeps being remembered. Since its output has values
$\in [0, 1]$, it has been called the **forget gate**
- **Input Gate**:\
It is controlled by a `sigmoid` with the same inputs as the forget gate, but with different
weights. It regulates how much of the `tanh` of the same inputs goes into the cell state.
Here `tanh` has an advantage over the `sigmoid` for the value itself, as it admits values
$\in [-1, 1]$
- **Output Gate**:\
It is controlled by a `sigmoid` with the same inputs as the previous gates, but with different
weights. It regulates how much of the `tanh` of the `current cell state` goes into the
`output`
![detailed LSTM cell](./pngs/lstm-cell-detailed.png)
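Putting the three gates together, one step of the cell could look like the sketch below (an
illustration only: the weights are random placeholders and it assumes the common formulation where
every gate is a `sigmoid` over the concatenation of $h_{t-1}$ and $x_t$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
concat_size = input_size + hidden_size

# one placeholder weight matrix and bias per gate, acting on [h_{t-1}, x_t]
W_f, W_i, W_c, W_o = (rng.normal(scale=0.1, size=(hidden_size, concat_size)) for _ in range(4))
b_f, b_i, b_c, b_o = (np.zeros(hidden_size) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z + b_f)         # forget gate: how much of c_{t-1} to keep
    i = sigmoid(W_i @ z + b_i)         # input gate: how much new info to write
    c_tilde = np.tanh(W_c @ z + b_c)   # candidate values, in [-1, 1]
    c = f * c_prev + i * c_tilde       # new cell state
    o = sigmoid(W_o @ z + b_o)         # output gate
    h = o * np.tanh(c)                 # new hidden (read) state
    return h, c

h = c = np.zeros(hidden_size)
for x_t in np.eye(input_size):         # dummy 3-step sequence
    h, c = lstm_step(x_t, h, c)
```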
<!-- TODO: revise formulas -->
> [!NOTE]
>
> $W$ will be weights associated with $\vec{x}$ and
@ -214,10 +260,6 @@ initialized to `1`***.
> $\odot$ is the [Hadamard Product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)), also called the
> ***pointwise product***
#### Forget Gate | Keep Gate
This `gate` ***controls the `cell-state`***:
@ -289,12 +331,23 @@ the `hidden-state`***, while keeping
![GRU cell](./pngs/gru-cell.png)
> [!NOTE]
> [`GRU`](#gru) doesn't have any `output-gate` and
> $h_0 = 0$
As shown in the image, we have only 2 gates (a minimal code sketch follows the detailed image
below):
- **Reset Gate**:\
Tells us **how much of the old information should pass along with the input**. It is controlled
by the `old state` and the `input`.
- **Update Gate**:\
Tells us
**how much of the old info will be kept and how much of the new info will be learnt**.
It is also controlled by the `old state` and the `input` through a `sigmoid`; the new candidate
info comes from the `output of the reset gate` (applied to the old state), concatenated with the
`input` and passed through a `tanh`.
![detailed GRU cell](./pngs/gru-cell-detailed.png)
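A minimal sketch of one `GRU` step under the common formulation (placeholder weights; whether
$z_t$ or $1 - z_t$ weighs the old state may differ from the formulas below):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
concat_size = input_size + hidden_size

# placeholder weights acting on [h_{t-1}, x_t]
W_r, W_z, W_h = (rng.normal(scale=0.1, size=(hidden_size, concat_size)) for _ in range(3))

def gru_step(x_t, h_prev):
    z_in = np.concatenate([h_prev, x_t])
    r = sigmoid(W_r @ z_in)                                      # reset gate
    z = sigmoid(W_z @ z_in)                                      # update gate
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))   # candidate state
    return (1 - z) * h_prev + z * h_tilde                        # mix old and new info

h = np.zeros(hidden_size)          # h_0 = 0, as noted above
for x_t in np.eye(input_size):     # dummy 3-step sequence
    h = gru_step(x_t, h)
```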
#### Update Gate
This `gate` unifies [`forget gate`](#forget-gate--keep-gate) and [`input gate`](#input-gate--write-gate)
@ -344,12 +397,13 @@ $$
### Bi-LSTM[^anelli-RNNs-12][^Bi-LSTM-stackoverflow]
It is a technique in which we put 2 `LSTM` `networks`,
***one to remember the `past` and one to remember the
`future`***.
We implement 2 networks: one computes its hidden states by processing the sequence in order,
while the other one computes them by processing the sequence in reverse order.
This type of `network` ***improves context
understanding***.
Then we take the outputs from both networks and apply further operations, such as attention,
`linear` layers and a `softmax`, to get the output.
In this way we gain information coming from both directions of the sequence.
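A minimal sketch of the idea (assuming an `lstm_step` function like the one sketched in the LSTM
section above; in practice the two directions have separate weights):

```python
import numpy as np

def bilstm_encode(sequence, lstm_step_fw, lstm_step_bw, hidden_size):
    """Run one LSTM over the sequence in order and one in reverse,
    then concatenate the hidden states of both directions per time step."""
    h_fw = c_fw = np.zeros(hidden_size)
    h_bw = c_bw = np.zeros(hidden_size)
    fw_states, bw_states = [], []

    for x_t in sequence:                      # past -> future
        h_fw, c_fw = lstm_step_fw(x_t, h_fw, c_fw)
        fw_states.append(h_fw)

    for x_t in reversed(sequence):            # future -> past
        h_bw, c_bw = lstm_step_bw(x_t, h_bw, c_bw)
        bw_states.append(h_bw)
    bw_states.reverse()                       # realign with the original order

    # each time step now carries context from both directions of the sequence
    return [np.concatenate([f, b]) for f, b in zip(fw_states, bw_states)]
```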
### Applications[^anelli-RNNs-11]