Added images and revised content

commit 2823aeba76 (parent 294c9a6783)

@@ -194,6 +194,30 @@ them `learn` or `forget` chosen pieces of information

> With ***chosen*** we mean choosing from the
> `hyperspace`, so it is not really precise.

### High Level Scheme

The point of an `RNN` cell is to be able to modify its internal state.

As shown in the image, this can be implemented by having gates (AND operations) to read, write and
keep (remember) pieces of information in the memory.



Even though this is a *high level and simplistic architecture*, it gives a rough idea of
how to implement it.

- First of all, instead of using `AND` gates, we can substitute them with an elementwise
  multiplication.
- Secondly, we can implement an elementwise addition to combine a newly written element with
  a past one (see the sketch below).
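
This idea can be sketched in a few lines of `numpy` (the names are hypothetical and this is not the exact cell defined later): the gates become elementwise multiplications and writing becomes an elementwise addition.

```python
import numpy as np

def gated_memory_step(state, candidate, keep_gate, write_gate, read_gate):
    """One step of the high-level scheme: gates are elementwise multiplications,
    writing is an elementwise addition (all arrays share the same shape)."""
    new_state = keep_gate * state + write_gate * candidate  # keep part of the old state, add new info
    output = read_gate * new_state                          # expose only part of the state
    return new_state, output

# toy usage: keep half of the old state, write the new value fully, read everything
state = np.array([1.0, -2.0, 0.5])
candidate = np.array([0.2, 0.3, -0.1])
state, out = gated_memory_step(state, candidate,
                               keep_gate=np.full(3, 0.5),
                               write_gate=np.ones(3),
                               read_gate=np.ones(3))
```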

### Standard RNN Cell

This is the simplest type of implementation: here all the gate signals are set to `1`.


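
A minimal sketch of such a cell (variable names and shapes are assumptions, not taken from these notes): the new hidden state is simply a `tanh` of the current input and the previous state.

```python
import numpy as np

def rnn_step(x, h_prev, W_x, W_h, b):
    """Standard (vanilla) RNN cell: no gating, just a tanh of input and previous state."""
    return np.tanh(W_x @ x + W_h @ h_prev + b)

# toy usage: input size 3, hidden size 4, a sequence of 5 inputs
rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)
h = np.zeros(4)
for x in rng.normal(size=(5, 3)):
    h = rnn_step(x, h, W_x, W_h, b)
```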

### Long Short Term Memory | LSTM[^anelli-RNNs-9][^LSTM-wikipedia]

This `cell` has a ***separate signal***, namely the
@@ -203,6 +227,28 @@ initialized to `1`***.

|
||||
|
||||
As for the image, we can identify a `keep (or forget) gate`, `write gate` and `read gate`.
|
||||
|
||||
- **Forget Gate**:\
|
||||
The previous `read state` ($h_{t-1}$) concatenated with `input` ($x_{t}$) is what controls
|
||||
how much of the `previous cell state` keeps being remembered. Since it has values
|
||||
$\in [0, 1]$, it has been called **forget gate**
|
||||
- **Input Gate**:\
|
||||
It is controlled by a `sigmoid` with the same inputs as the forget gate, but with different
|
||||
weights. It regulates how much of the `tanh` of same inputs goes into the cell state.
|
||||
|
||||
`tanh` here has an advantage over the `sigmoid` for the value as it admits values
|
||||
$\in [-1, 1]$
|
||||
|
||||
- **Output Gate**:\
|
||||
It is controlled by a `sigmoid` with the same inputs as the previous gates, but different
|
||||
weights. It regulates how much of the `tanh` of the `current state cell` goes over the
|
||||
`output`
|
||||
|
||||

|
||||
|
||||
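
A minimal `numpy` sketch of a single step with these three gates (the variable names, the `sigmoid` helper and the dictionary layout are assumptions; the exact notation used in these notes is given by the formulas below):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b are dicts keyed by gate ('f', 'i', 'o') plus the
    candidate 'c'; W_* multiplies the input x, U_* the previous hidden state."""
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])        # forget gate: how much of c_prev to keep
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])        # input (write) gate: how much new info enters
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])        # output (read) gate: how much state is exposed
    c_tilde = np.tanh(W['c'] @ x + U['c'] @ h_prev + b['c'])  # candidate values, in [-1, 1] thanks to tanh
    c = f * c_prev + i * c_tilde                              # elementwise (Hadamard) products
    h = o * np.tanh(c)
    return h, c
```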

<!-- TODO: revise formulas -->

> [!NOTE]
>
> $W$ will be weights associated with $\vec{x}$ and
@@ -214,10 +260,6 @@ initialized to `1`***.
> $\odot$ is the [Hadamard Product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)), also called the
> ***pointwise product***
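
For example, the pointwise product of two $2 \times 2$ matrices:

$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\odot
\begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}
=
\begin{bmatrix} 1 \cdot 5 & 2 \cdot 6 \\ 3 \cdot 7 & 4 \cdot 8 \end{bmatrix}
=
\begin{bmatrix} 5 & 12 \\ 21 & 32 \end{bmatrix}
$$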



<!-- TODO: revise formulas -->

#### Forget Gate | Keep Gate

This `gate` ***controls the `cell-state`***:

@@ -289,12 +331,23 @@ the `hidden-state`***, while keeping



As shown in the image, we have only 2 gates:

- **Reset Gate**:\
  Tells us **how much of the old information should pass along with the input**. It is controlled
  by the `old state` and the `input`.
- **Update Gate**:\
  Tells us
  **how much of the old info will be kept and how much of the new info will be learnt**.
  It is controlled by a concatenation of the `output of the reset gate` and the `input`, passed
  through a `tanh` (see the sketch below).



> [!NOTE]
>
> [`GRU`](#gru) doesn't have any `output-gate` and
> $h_0 = 0$
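
A matching `numpy` sketch of one `GRU` step (again, the names and dictionary layout are assumptions; the exact gate formulas follow in the next sections):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, W, U, b):
    """One GRU step. W, U, b are dicts keyed by 'r' (reset), 'z' (update), 'h' (candidate)."""
    r = sigmoid(W['r'] @ x + U['r'] @ h_prev + b['r'])              # reset gate: how much old info reaches the candidate
    z = sigmoid(W['z'] @ x + U['z'] @ h_prev + b['z'])              # update gate: old info kept vs new info learnt
    h_tilde = np.tanh(W['h'] @ x + U['h'] @ (r * h_prev) + b['h'])  # candidate, built from the reset-gated old state
    return (1.0 - z) * h_prev + z * h_tilde                         # (which term gets z vs 1 - z varies across sources)
```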

#### Update Gate

This `gate` unifies [`forget gate`](#forget-gate--keep-gate) and [`input gate`](#input-gate--write-gate)
@@ -344,12 +397,13 @@ $$

### Bi-LSTM[^anelli-RNNs-12][^Bi-LSTM-stackoverflow]

It is a technique in which we use 2 `LSTM` `networks`,
***one to remember the `past` and one to remember the
`future`***.
We implement 2 networks: one consumes the sequence in order and produces its hidden states,
while the other one consumes the sequence in reverse order.

This type of `network` ***improves context
understanding***.
We then take the outputs of both networks and compute attention and other operations, such as
`softmax` and `linear` ones, to get the output.

In this way we gain info coming from both directions of a sequence (see the sketch below).
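
A minimal sketch of the idea, assuming a cell with the same signature as the hypothetical `lstm_step` sketched in the LSTM section:

```python
import numpy as np

def run_lstm(cell, xs, hidden_size, params):
    """Run an LSTM cell (e.g. the lstm_step sketched earlier) over a whole
    sequence, collecting the hidden state produced at every position."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    hs = []
    for x in xs:
        h, c = cell(x, h, c, *params)
        hs.append(h)
    return hs

def bilstm(cell, xs, hidden_size, fwd_params, bwd_params):
    """One network reads the sequence in order (the past), a second reads it in
    reverse (the future); the per-position hidden states are concatenated."""
    forward = run_lstm(cell, xs, hidden_size, fwd_params)
    backward = run_lstm(cell, xs[::-1], hidden_size, bwd_params)[::-1]  # re-align with the original order
    # every position now sees context from both directions of the sequence;
    # attention / linear / softmax layers can then be applied on top of these vectors
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]
```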

### Applications[^anelli-RNNs-11]