Added Autoencoders

This commit is contained in:
Christian Risi 2025-09-02 21:25:17 +02:00
parent 56c0cf768b
commit 8fdfd57c73


# Autoencoders
Here we are trying to make a `model` learn an **identity** function
$$
h_{\theta} (x) \approx x
$$
If we were just to do this, it would be trivial: simply pass
the `input` directly to the `output`.
The innovation comes from the fact that we can ***compress*** `data` by using
an `NN` that has **fewer `neurons` per layer than the `input` dimension**, or that has
**fewer `connections` (sparse)**.
## Compression
In a very simple fashion, we train a network to compress $\vec{x}$ into a **denser**, lower-dimensional
vector $\vec{y}$ and then **expand** it back into $\vec{z}$, also called the
**prediction** of $\vec{x}$:
$$
\begin{aligned}
\vec{x} &\in \mathbb{R}^{d_x} \\
\vec{y} &= g(W_{0}\vec{x} + \vec{b}_{0}), \qquad \vec{y} \in \mathbb{R}^{d_y}, \; d_y < d_x \\
\vec{z} &= g(W_{1}\vec{y} + \vec{b}_{1}), \qquad \vec{z} \in \mathbb{R}^{d_x} \\
\vec{z} &\approx \vec{x}
\end{aligned}
$$
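A minimal sketch of this encoder/decoder pair, assuming PyTorch, a sigmoid non-linearity for $g$, and illustrative dimensions `d_x` and `d_y`:
```python
import torch
import torch.nn as nn

# Illustrative dimensions: d_y < d_x gives the compression bottleneck
d_x, d_y = 784, 32

encoder = nn.Sequential(nn.Linear(d_x, d_y), nn.Sigmoid())  # y = g(W0 x + b0)
decoder = nn.Sequential(nn.Linear(d_y, d_x), nn.Sigmoid())  # z = g(W1 y + b1)

x = torch.rand(16, d_x)                # a batch of 16 inputs
z = decoder(encoder(x))                # prediction z, should approximate x
loss = nn.functional.mse_loss(z, x)    # reconstruction error ||z - x||^2
loss.backward()                        # gradients for W0, b0, W1, b1
```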
## Sparse Training
A sparse hidden representation is obtained by penalizing the values assigned to the hidden `neurons`
(their activations $a_i$).
$$
\min_{\theta}
\underbrace{||h_{\theta}(x) - x ||^{2}}_{\text{
Reconstruction Error
}} +
\underbrace{\lambda \sum_{i}|a_i|}_{\text{
L1 sparsity
}}
$$
The reason why we want **sparsity** is that we want the **best** representation
in the `latent space`; we want to **avoid** our `network` simply **learning the
identity mapping**.
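A minimal sketch of this objective, assuming PyTorch, ReLU hidden units, and an illustrative weight `lam` for $\lambda$:
```python
import torch
import torch.nn as nn

# Illustrative sizes and L1 weight
d_x, d_y, lam = 784, 256, 1e-3

encoder = nn.Sequential(nn.Linear(d_x, d_y), nn.ReLU())
decoder = nn.Linear(d_y, d_x)

x = torch.rand(16, d_x)
a = encoder(x)                                  # hidden activations a_i
x_hat = decoder(a)                              # reconstruction h_theta(x)
recon = nn.functional.mse_loss(x_hat, x)        # ||h_theta(x) - x||^2
l1 = lam * a.abs().sum(dim=1).mean()            # lambda * sum_i |a_i|
loss = recon + l1
loss.backward()
```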
## Layerwise Training
To train an `autoencoder` we train it `layer` by `layer`, which mitigates `vanishing gradients`.
The trick is to train one `layer`, then use its output as the input for the next `layer`,
training that layer as if its input were our $x$. Rinse and repeat for approximately 3 `layers`.
At the end, if you want, you can add another `layer` trained on the `data` to
**fine-tune** the whole network; a sketch of this greedy procedure follows the notes below.
<!-- TODO: See Deep Belief Networks and Deep Boltzmann Machines-->
<!-- TODO: See Deep autoencoders training-->
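A rough sketch of the greedy layer-wise procedure described above, assuming PyTorch; the layer sizes, epochs, and learning rate are illustrative:
```python
import torch
import torch.nn as nn

def train_one_layer(inputs, d_in, d_hidden, epochs=10, lr=1e-3):
    """Train a single-layer autoencoder on `inputs` and return its encoder."""
    enc = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
    dec = nn.Linear(d_hidden, d_in)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(inputs)), inputs)
        loss.backward()
        opt.step()
    return enc

x = torch.rand(256, 784)
sizes = [784, 256, 64, 32]               # roughly 3 stacked layers
encoders, h = [], x
for d_in, d_hidden in zip(sizes[:-1], sizes[1:]):
    enc = train_one_layer(h, d_in, d_hidden)
    encoders.append(enc)
    h = enc(h).detach()                  # the codes become the "x" of the next stage
```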
## U-Net
It was developed to analyze medical images and to perform segmentation, the step in which we
assign a class to each pixel. To train these segmentation models we use **target maps**
that contain the desired per-pixel classes.
### Architecture
- **Encoder**:\
We have several convolutional and pooling layers that make the representation progressively smaller.
Once it is small enough, we'll have an `FCNN`
- **Decoder**:\
In this phase we restore the representation to the original dimension (`up-sampling`).
Here we have many **deconvolution** layers; however, these are learned functions
- **Skip Connection**:\
These are connections used to tell the **deconvolutional** layers where the features
came from. Basically, we concatenate the output of an earlier convolutional block with the
corresponding up-sampled one and apply a convolution over the result (see the sketch below).
<!-- TODO: See PDF anelli 10 to see complete architecture -->
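A minimal sketch of one encoder/decoder stage with a skip connection, assuming PyTorch; the channel counts, kernel sizes, and input resolution are illustrative:
```python
import torch
import torch.nn as nn

enc_block = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
down      = nn.MaxPool2d(2)                            # encoder down-sampling
up        = nn.ConvTranspose2d(16, 16, 2, stride=2)    # learned "deconvolution"
dec_block = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())

x = torch.rand(1, 1, 64, 64)
f = enc_block(x)                              # encoder features, 16 x 64 x 64
u = up(down(f))                               # down-sample, then learned up-sample
y = dec_block(torch.cat([f, u], dim=1))       # skip: concatenate, then convolve
```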
## Variational Autoencoders
Until now we were reconstructing points in the `latent space` into points in the
**target space**.
However, this means that the **immediate neighbours of the data point** (in the `latent space`) are
**meaningless**.
The idea is to make it such that all **immediate neighbour regions of our data point**
will be decoded as our **data point**.
To achieve this, our **point** will become a **distribution** over the `latent space`;
we then sample from it and decode the sample. We then operate as usual by
backpropagating the error (using the reparameterization trick so that the sampling step stays differentiable).
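A minimal sketch of the sampling step, assuming PyTorch and the reparameterization trick (writing $z = \mu + \sigma \cdot \epsilon$ keeps the sample differentiable with respect to the encoder outputs):
```python
import torch

# Hypothetical encoder outputs for a batch of 16 points, 8 latent dimensions
mu      = torch.zeros(16, 8, requires_grad=True)   # mean of the latent distribution
log_var = torch.zeros(16, 8, requires_grad=True)   # log-variance of the latent distribution

eps = torch.randn_like(mu)                  # noise drawn from N(0, I)
z   = mu + torch.exp(0.5 * log_var) * eps   # sampled latent point, fed to the decoder
```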
### Regularization Term
We use the `Kullback-Leibler` divergence to measure the difference between distributions. For Gaussians this has a
**closed form** in terms of the **means** and **covariance matrices**.
The regularization makes these encoders both continuous and
complete (each point is meaningful). Without it, the encoder could place
data points in regions that are ***too concentrated (essentially collapsing to a point) or too far
apart from each other***.
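Assuming a diagonal covariance $\Sigma_{x} = \operatorname{diag}(\sigma_{x}^{2})$, the closed form against a standard normal prior is:
$$
KL\big[N(\mu_{x}, \Sigma_{x}) \,\|\, N(0, I)\big]
= \frac{1}{2} \sum_{i} \left( \sigma_{x,i}^{2} + \mu_{x,i}^{2} - 1 - \log \sigma_{x,i}^{2} \right)
$$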
### Loss
$$
L(x) = ||x - \hat{x}||^{2}_{2} + KL\big[N(\mu_{x}, \Sigma_{x}) \,\|\, N(0, I)\big]
$$
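A minimal sketch of this loss, assuming PyTorch and a diagonal Gaussian encoder that outputs `mu` and `log_var`:
```python
import torch
import torch.nn as nn

def vae_loss(x, x_hat, mu, log_var):
    # Reconstruction error ||x - x_hat||^2_2, summed over the batch
    recon = nn.functional.mse_loss(x_hat, x, reduction="sum")
    # Closed-form KL[N(mu, diag(sigma^2)) || N(0, I)]
    kl = 0.5 * torch.sum(log_var.exp() + mu.pow(2) - 1.0 - log_var)
    return recon + kl
```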