diff --git a/Chapters/8-Recurrent-Networks/INDEX.md b/Chapters/8-Recurrent-Networks/INDEX.md
index 86ecfb0..9b4e770 100644
--- a/Chapters/8-Recurrent-Networks/INDEX.md
+++ b/Chapters/8-Recurrent-Networks/INDEX.md
@@ -194,6 +194,30 @@ them `learn` or `forget` chosen pieces of information
 
 > With ***chosen*** we intend choosing from the
 > `hyperspace`, so it's not really precise.
+
+### High Level Scheme
+
+The point of an `RNN` cell is to be able to modify its internal state.
+
+As shown in the image, this can be implemented by having gates (`AND` operations) to read,
+write and keep (remember) pieces of information in the memory.
+
+![high level cell](./pngs/high-level-cell.png)
+
+Even though this is a *high-level and simplistic architecture*, it gives a rough idea of
+how to implement it:
+
+- First of all, instead of using `AND` gates, we can substitute them with an elementwise
+  multiplication.
+
+- Secondly, we can use an elementwise addition to combine a newly written element with
+  a past one.
+
+### Standard RNN Cell
+
+This is the simplest possible implementation: here all gate signals are fixed to `1`.
+
+![simple rnn cell](./pngs/standard-cell.png)
+
 ### Long Short Term Memory | LSTM[^anelli-RNNs-9][^LSTM-wikipedia]
 
 This `cell` has a ***separate signal***, namely the
@@ -203,6 +227,28 @@ initialized to `1`***.
 
 ![LSTM cell](./pngs/lstm-cell.png)
 
+As shown in the image, we can identify a `keep (or forget) gate`, a `write gate` and a
+`read gate` (a minimal code sketch follows the detailed figure below).
+
+- **Forget Gate**:\
+  A `sigmoid` of the previous `read state` ($h_{t-1}$) concatenated with the `input` ($x_{t}$)
+  controls how much of the `previous cell state` keeps being remembered. Since its values are
+  $\in [0, 1]$, it is called the **forget gate**.
+- **Input Gate**:\
+  It is controlled by a `sigmoid` with the same inputs as the forget gate, but with different
+  weights. It regulates how much of the `tanh` of the same inputs goes into the cell state.
+
+  For the value itself, `tanh` has an advantage over the `sigmoid`, as it admits values
+  $\in [-1, 1]$.
+
+- **Output Gate**:\
+  It is controlled by a `sigmoid` with the same inputs as the previous gates, but with
+  different weights. It regulates how much of the `tanh` of the `current cell state` goes to
+  the `output`.
+
+![detailed LSTM cell](./pngs/lstm-cell-detailed.png)
+
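+The three gates just described fit in a few lines of code. Below is a minimal, illustrative
+NumPy sketch of a single `LSTM` step; the function and weight names (`lstm_step`, `W_f`,
+`U_f`, `b_f` and so on) are assumptions made here for readability and are not taken from the
+chapter.
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    return 1.0 / (1.0 + np.exp(-z))
+
+def lstm_step(x_t, h_prev, c_prev, p):
+    """One LSTM step; `p` holds the (illustrative) weights `W_*`, `U_*` and biases `b_*`."""
+    # Forget gate: how much of the previous cell state keeps being remembered (values in [0, 1])
+    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])
+    # Input (write) gate and candidate values (tanh admits values in [-1, 1])
+    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])
+    g_t = np.tanh(p["W_g"] @ x_t + p["U_g"] @ h_prev + p["b_g"])
+    # New cell state: keep part of the old state, write part of the new candidate
+    c_t = f_t * c_prev + i_t * g_t
+    # Output (read) gate: how much of tanh(cell state) goes to the output
+    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])
+    h_t = o_t * np.tanh(c_t)
+    return h_t, c_t
+```
+
+Here the elementwise products (`*`) play the role of the pointwise product $\odot$ mentioned
+in the note below, and `c_t` is the separate `cell-state` signal.
+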
 > [!NOTE]
 >
 > $W$ will be weights associated with $\vec{x}$ and
@@ -214,10 +260,6 @@ initialized to `1`***.
 > $\odot$ is the [Hadamard Product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)), also called the
 > ***pointwise product***
 
-![detailed LSTM cell](./pngs/lstm-cell-detailed.png)
-
-
-
 #### Forget Gate | Keep Gate
 
 This `gate` ***controls the `cell-state`***:
@@ -289,12 +331,23 @@ the `hidden-state`***, while keeping
 
 ![GRU cell](./pngs/gru-cell.png)
 
-> [!NOTE]
-> [`GRU`](#gru) doesn't have any `output-gate` and
-> $h_0 = 0$
+As shown in the image, we have only 2 gates:
+
+- **Reset Gate**:\
+  Tells us **how much of the old information should pass through together with the input**.
+  It is controlled by the `old state` and the `input`.
+- **Update Gate**:\
+  Tells us
+  **how much of the old info will be kept and how much of the new info will be learnt**.
+  The new content it lets in is a `tanh` of the `input` concatenated with the
+  `output of the reset gate`.
 
 ![detailed GRU cell](./pngs/gru-cell-detailed.png)
 
+> [!NOTE]
+>
+> [`GRU`](#gru) doesn't have any `output-gate` and
+> $h_0 = 0$
 
 #### Update Gate
 
 This `gate` unifies [`forget gate`](#forget-gate--keep-gate) and [`input gate`](#input-gate--write-gate)
@@ -344,12 +397,13 @@ $$
 
 ### Bi-LSTM[^anelli-RNNs-12][^Bi-LSTM-stackoverflow]
 
-It is a technique in which we put 2 `LSTM` `networks`,
-***one to remember the `past` and one to remember the
-`future`***.
+We use 2 `LSTM` networks: one processes the sequence in its original order, while the other
+processes it in reverse order, so each produces its own sequence of hidden states.
 
-This type of `networks` ***improve context
-understanding***
+The outputs of the two networks are then combined through further operations, such as
+attention, `linear` layers and a `softmax`, to obtain the final output.
+
+In this way we gain information coming from both directions of the sequence.
 
 ### Applications[^anelli-RNNs-11]
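+
+As a small bridge towards the applications that follow, here is a minimal, illustrative
+sketch of the Bi-LSTM described in the previous section, using PyTorch's built-in
+`torch.nn.LSTM`; all the sizes and the final `linear` + `softmax` head are made-up choices
+and are not prescribed by the chapter.
+
+```python
+import torch
+import torch.nn as nn
+
+seq_len, batch, features, hidden, n_classes = 12, 4, 8, 16, 3
+
+# bidirectional=True runs one LSTM over the sequence in order and a second one in reverse
+# order, then concatenates their hidden states at every time step (size 2 * hidden).
+bilstm = nn.LSTM(input_size=features, hidden_size=hidden,
+                 batch_first=True, bidirectional=True)
+head = nn.Sequential(nn.Linear(2 * hidden, n_classes), nn.Softmax(dim=-1))
+
+x = torch.randn(batch, seq_len, features)   # a toy input sequence
+outputs, (h_n, c_n) = bilstm(x)             # outputs: (batch, seq_len, 2 * hidden)
+per_step_predictions = head(outputs)        # e.g. one label distribution per position
+```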