diff --git a/Chapters/8-Recurrent-Networks/INDEX.md b/Chapters/8-Recurrent-Networks/INDEX.md
index 86ecfb0..9b4e770 100644
--- a/Chapters/8-Recurrent-Networks/INDEX.md
+++ b/Chapters/8-Recurrent-Networks/INDEX.md
@@ -194,6 +194,30 @@ them `learn` or `forget` chosen pieces of information
 
 > With ***chosen*** we intend choosing from the
 > `hyperspace`, so it's not really precise.
+
+### High Level Scheme
+
+The point of an `RNN` cell is to be able to modify its internal state.
+
+As shown in the image, this can be implemented by having gates (`AND` operations) to read,
+write and keep (remember) pieces of information in the memory.
+
+![high level cell](./pngs/high-level-cell.png)
+
+Even though this is a *high-level and simplistic architecture*, it gives a rough idea of
+how to implement it:
+
+- First of all, instead of using `AND` gates, we can substitute them with an elementwise
+  multiplication.
+
+- Secondly, we can use an elementwise addition to combine a newly written element with
+  a past one.
+
+### Standard RNN Cell
+
+This is the simplest possible implementation: here all gate signals are fixed to `1`.
+
+![simple rnn cell](./pngs/standard-cell.png)
+
 ### Long Short Term Memory | LSTM[^anelli-RNNs-9][^LSTM-wikipedia]
 
 This `cell` has a ***separate signal***, namely the
@@ -203,6 +227,28 @@ initialized to `1`***.
 
 ![LSTM cell](./pngs/lstm-cell.png)
 
+As shown in the image, we can identify a `keep (or forget) gate`, a `write gate` and a
+`read gate` (a minimal code sketch follows the detailed figure below).
+
+- **Forget Gate**:\
+  A `sigmoid` of the previous `read state` ($h_{t-1}$) concatenated with the `input` ($x_{t}$)
+  controls how much of the `previous cell state` keeps being remembered. Since its values are
+  $\in [0, 1]$, it is called the **forget gate**.
+- **Input Gate**:\
+  It is controlled by a `sigmoid` with the same inputs as the forget gate, but with different
+  weights. It regulates how much of the `tanh` of the same inputs goes into the cell state.
+
+  For the value itself, `tanh` has an advantage over the `sigmoid`, as it admits values
+  $\in [-1, 1]$.
+
+- **Output Gate**:\
+  It is controlled by a `sigmoid` with the same inputs as the previous gates, but with
+  different weights. It regulates how much of the `tanh` of the `current cell state` goes to
+  the `output`.
+
+![detailed LSTM cell](./pngs/lstm-cell-detailed.png)
+
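+The three gates just described fit in a few lines of code. Below is a minimal, illustrative
+NumPy sketch of a single `LSTM` step; the function and weight names (`lstm_step`, `W_f`,
+`U_f`, `b_f` and so on) are assumptions made here for readability and are not taken from the
+chapter.
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    return 1.0 / (1.0 + np.exp(-z))
+
+def lstm_step(x_t, h_prev, c_prev, p):
+    """One LSTM step; `p` holds the (illustrative) weights `W_*`, `U_*` and biases `b_*`."""
+    # Forget gate: how much of the previous cell state keeps being remembered (values in [0, 1])
+    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])
+    # Input (write) gate and candidate values (tanh admits values in [-1, 1])
+    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])
+    g_t = np.tanh(p["W_g"] @ x_t + p["U_g"] @ h_prev + p["b_g"])
+    # New cell state: keep part of the old state, write part of the new candidate
+    c_t = f_t * c_prev + i_t * g_t
+    # Output (read) gate: how much of tanh(cell state) goes to the output
+    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])
+    h_t = o_t * np.tanh(c_t)
+    return h_t, c_t
+```
+
+Here the elementwise products (`*`) play the role of the pointwise product $\odot$ mentioned
+in the note below, and `c_t` is the separate `cell-state` signal.
+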
 > [!NOTE]
 >
 > $W$ will be weights associated with $\vec{x}$ and
@@ -214,10 +260,6 @@ initialized to `1`***.
 > $\odot$ is the [Hadamard Product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)), also called the
 > ***pointwise product***
 
-![detailed LSTM cell](./pngs/lstm-cell-detailed.png)
-
-
-
 #### Forget Gate | Keep Gate
 
 This `gate` ***controls the `cell-state`***:
@@ -289,12 +331,23 @@ the `hidden-state`***, while keeping
 
 ![GRU cell](./pngs/gru-cell.png)
 
-> [!NOTE]
-> [`GRU`](#gru) doesn't have any `output-gate` and
-> $h_0 = 0$
+As shown in the image, we have only 2 gates:
+
+- **Reset Gate**:\
+  Tells us **how much of the old information should pass through together with the input**.
+  It is controlled by the `old state` and the `input`.
+- **Update Gate**:\
+  Tells us
+  **how much of the old info will be kept and how much of the new info will be learnt**.
+  The new content it lets in is a `tanh` of the `input` concatenated with the
+  `output of the reset gate`.
 
 ![detailed GRU cell](./pngs/gru-cell-detailed.png)
 
+> [!NOTE]
+>
+> [`GRU`](#gru) doesn't have any `output-gate` and
+> $h_0 = 0$
 
 #### Update Gate
 
 This `gate` unifies [`forget gate`](#forget-gate--keep-gate) and [`input gate`](#input-gate--write-gate)
@@ -344,12 +397,13 @@ $$
 
 ### Bi-LSTM[^anelli-RNNs-12][^Bi-LSTM-stackoverflow]
 
-It is a technique in which we put 2 `LSTM` `networks`,
-***one to remember the `past` and one to remember the
-`future`***.
+We use 2 `LSTM` networks: one processes the sequence in its original order, while the other
+processes it in reverse order, so each produces its own sequence of hidden states.
 
-This type of `networks` ***improve context
-understanding***
+The outputs of the two networks are then combined through further operations, such as
+attention, `linear` layers and a `softmax`, to obtain the final output.
+
+In this way we gain information coming from both directions of the sequence.
 
 ### Applications[^anelli-RNNs-11]
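+
+As a small bridge towards the applications that follow, here is a minimal, illustrative
+sketch of the Bi-LSTM described in the previous section, using PyTorch's built-in
+`torch.nn.LSTM`; all the sizes and the final `linear` + `softmax` head are made-up choices
+and are not prescribed by the chapter.
+
+```python
+import torch
+import torch.nn as nn
+
+seq_len, batch, features, hidden, n_classes = 12, 4, 8, 16, 3
+
+# bidirectional=True runs one LSTM over the sequence in order and a second one in reverse
+# order, then concatenates their hidden states at every time step (size 2 * hidden).
+bilstm = nn.LSTM(input_size=features, hidden_size=hidden,
+                 batch_first=True, bidirectional=True)
+head = nn.Sequential(nn.Linear(2 * hidden, n_classes), nn.Softmax(dim=-1))
+
+x = torch.randn(batch, seq_len, features)   # a toy input sequence
+outputs, (h_n, c_n) = bilstm(x)             # outputs: (batch, seq_len, 2 * hidden)
+per_step_predictions = head(outputs)        # e.g. one label distribution per position
+```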