Added images and revised content

commit 2823aeba76 (parent 294c9a6783)

@@ -194,6 +194,30 @@ them `learn` or `forget` chosen pieces of information

> With ***chosen*** we mean choosing from the
> `hyperspace`, so it is not really precise.

### High Level Scheme

The point of an `RNN` cell is to be able to modify its internal state.

As shown in the image, this can be implemented by having gates (AND operations) to read, write and
keep (remember) pieces of information in the memory.



Even though this is a *high level and simplistic architecture*, it gives a rough idea of
how to implement it.

- First of all, instead of using `AND` gates, we can substitute them with an elementwise
  multiplication.
- Secondly, we can implement an elementwise addition to combine a newly written element with
  a past one (see the sketch below).
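
This idea can be sketched in a few lines of `numpy` (the names are hypothetical and this is not the exact cell defined later): the gates become elementwise multiplications and writing becomes an elementwise addition.

```python
import numpy as np

def gated_memory_step(state, candidate, keep_gate, write_gate, read_gate):
    """One step of the high-level scheme: gates are elementwise multiplications,
    writing is an elementwise addition (all arrays share the same shape)."""
    new_state = keep_gate * state + write_gate * candidate  # keep part of the old state, add new info
    output = read_gate * new_state                          # expose only part of the state
    return new_state, output

# toy usage: keep half of the old state, write the new value fully, read everything
state = np.array([1.0, -2.0, 0.5])
candidate = np.array([0.2, 0.3, -0.1])
state, out = gated_memory_step(state, candidate,
                               keep_gate=np.full(3, 0.5),
                               write_gate=np.ones(3),
                               read_gate=np.ones(3))
```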

### Standard RNN Cell

This is the simplest type of implementation: here all the gate signals are set to `1`.


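
A minimal sketch of such a cell (variable names and shapes are assumptions, not taken from these notes): the new hidden state is simply a `tanh` of the current input and the previous state.

```python
import numpy as np

def rnn_step(x, h_prev, W_x, W_h, b):
    """Standard (vanilla) RNN cell: no gating, just a tanh of input and previous state."""
    return np.tanh(W_x @ x + W_h @ h_prev + b)

# toy usage: input size 3, hidden size 4, a sequence of 5 inputs
rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)
h = np.zeros(4)
for x in rng.normal(size=(5, 3)):
    h = rnn_step(x, h, W_x, W_h, b)
```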

### Long Short Term Memory | LSTM[^anelli-RNNs-9][^LSTM-wikipedia]

This `cell` has a ***separate signal***, namely the
@@ -203,6 +227,28 @@ initialized to `1`***.

|
||||
|
||||
As for the image, we can identify a `keep (or forget) gate`, `write gate` and `read gate`.
|
||||
|
||||
- **Forget Gate**:\
|
||||
The previous `read state` ($h_{t-1}$) concatenated with `input` ($x_{t}$) is what controls
|
||||
how much of the `previous cell state` keeps being remembered. Since it has values
|
||||
$\in [0, 1]$, it has been called **forget gate**
|
||||
- **Input Gate**:\
|
||||
It is controlled by a `sigmoid` with the same inputs as the forget gate, but with different
|
||||
weights. It regulates how much of the `tanh` of same inputs goes into the cell state.
|
||||
|
||||
`tanh` here has an advantage over the `sigmoid` for the value as it admits values
|
||||
$\in [-1, 1]$
|
||||
|
||||
- **Output Gate**:\
|
||||
It is controlled by a `sigmoid` with the same inputs as the previous gates, but different
|
||||
weights. It regulates how much of the `tanh` of the `current state cell` goes over the
|
||||
`output`
|
||||
|
||||

|
||||
|
||||
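
A minimal `numpy` sketch of a single step with these three gates (the variable names, the `sigmoid` helper and the dictionary layout are assumptions; the exact notation used in these notes is given by the formulas below):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b are dicts keyed by gate ('f', 'i', 'o') plus the
    candidate 'c'; W_* multiplies the input x, U_* the previous hidden state."""
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])        # forget gate: how much of c_prev to keep
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])        # input (write) gate: how much new info enters
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])        # output (read) gate: how much state is exposed
    c_tilde = np.tanh(W['c'] @ x + U['c'] @ h_prev + b['c'])  # candidate values, in [-1, 1] thanks to tanh
    c = f * c_prev + i * c_tilde                              # elementwise (Hadamard) products
    h = o * np.tanh(c)
    return h, c
```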

<!-- TODO: revise formulas -->

> [!NOTE]
>
> $W$ will be weights associated with $\vec{x}$ and
@@ -214,10 +260,6 @@ initialized to `1`***.
> $\odot$ is the [Hadamard Product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)), also called the
> ***pointwise product***
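
For example, the pointwise product of two $2 \times 2$ matrices:

$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\odot
\begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}
=
\begin{bmatrix} 1 \cdot 5 & 2 \cdot 6 \\ 3 \cdot 7 & 4 \cdot 8 \end{bmatrix}
=
\begin{bmatrix} 5 & 12 \\ 21 & 32 \end{bmatrix}
$$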



<!-- TODO: revise formulas -->

#### Forget Gate | Keep Gate

This `gate` ***controls the `cell-state`***:

@@ -289,12 +331,23 @@ the `hidden-state`***, while keeping



As shown in the image, we have only 2 gates:

- **Reset Gate**:\
  Tells us **how much of the old information should pass along with the input**. It is controlled
  by the `old state` and the `input`.
- **Update Gate**:\
  Tells us
  **how much of the old info will be kept and how much of the new info will be learnt**.
  It is controlled by a concatenation of the `output of the reset gate` and the `input`, passed
  through a `tanh` (see the sketch below).



> [!NOTE]
>
> [`GRU`](#gru) doesn't have any `output-gate` and
> $h_0 = 0$
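
A matching `numpy` sketch of one `GRU` step (again, the names and dictionary layout are assumptions; the exact gate formulas follow in the next sections):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, W, U, b):
    """One GRU step. W, U, b are dicts keyed by 'r' (reset), 'z' (update), 'h' (candidate)."""
    r = sigmoid(W['r'] @ x + U['r'] @ h_prev + b['r'])              # reset gate: how much old info reaches the candidate
    z = sigmoid(W['z'] @ x + U['z'] @ h_prev + b['z'])              # update gate: old info kept vs new info learnt
    h_tilde = np.tanh(W['h'] @ x + U['h'] @ (r * h_prev) + b['h'])  # candidate, built from the reset-gated old state
    return (1.0 - z) * h_prev + z * h_tilde                         # (which term gets z vs 1 - z varies across sources)
```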

#### Update Gate

This `gate` unifies [`forget gate`](#forget-gate--keep-gate) and [`input gate`](#input-gate--write-gate)
@@ -344,12 +397,13 @@ $$

### Bi-LSTM[^anelli-RNNs-12][^Bi-LSTM-stackoverflow]

It is a technique in which we use 2 `LSTM` `networks`,
***one to remember the `past` and one to remember the
`future`***.
We implement 2 networks: one consumes the sequence in order and produces its hidden states,
while the other one consumes the sequence in reverse order.

This type of `network` ***improves context
understanding***.
We then take the outputs of both networks and compute attention and other operations, such as
`softmax` and `linear` ones, to get the output.

In this way we gain info coming from both directions of a sequence (see the sketch below).
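
A minimal sketch of the idea, assuming a cell with the same signature as the hypothetical `lstm_step` sketched in the LSTM section:

```python
import numpy as np

def run_lstm(cell, xs, hidden_size, params):
    """Run an LSTM cell (e.g. the lstm_step sketched earlier) over a whole
    sequence, collecting the hidden state produced at every position."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    hs = []
    for x in xs:
        h, c = cell(x, h, c, *params)
        hs.append(h)
    return hs

def bilstm(cell, xs, hidden_size, fwd_params, bwd_params):
    """One network reads the sequence in order (the past), a second reads it in
    reverse (the future); the per-position hidden states are concatenated."""
    forward = run_lstm(cell, xs, hidden_size, fwd_params)
    backward = run_lstm(cell, xs[::-1], hidden_size, bwd_params)[::-1]  # re-align with the original order
    # every position now sees context from both directions of the sequence;
    # attention / linear / softmax layers can then be applied on top of these vectors
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]
```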

### Applications[^anelli-RNNs-11]