# Autoencoders

Here we are trying to make a `model` learn an **identity** function

$$
h_{\theta} (x) \approx x
$$

Now, if we were just to do this, it would be very simple: just pass the
`input` directly to the `output`.

The innovation comes from the fact that we can ***compress*** `data` by using
an `NN` that has **fewer `neurons` per layer than the `input` dimension**, or that has
**fewer `connections` (sparse)**.

## Compression

In a very simple fashion, we train a network to compress $\vec{x}$ into a **denser**
vector $\vec{y}$ and then later **expand** it into $\vec{z}$, also called the
**prediction** of $\vec{x}$

$$
\begin{aligned}
\vec{x} &= [a, b]^{d_x} \\
\vec{y} &= g(\vec{W_{0}}\vec{x} + b_{0}) \rightarrow \vec{y} = [a_1, b_1]^{d_y} \\
\vec{z} &= g(\vec{W_{1}}\vec{y} + b_{1}) \rightarrow \vec{z} = [a, b]^{d_x} \\
\vec{z} &\approx \vec{x}
\end{aligned}
$$
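As a concrete sketch (not part of the original notes): a minimal undercomplete autoencoder in `PyTorch`. The sizes $d_x = 784$, $d_y = 32$ and the choice of `Sigmoid` for $g$ are illustrative assumptions; the only essential point is that $d_y < d_x$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Autoencoder(nn.Module):
    def __init__(self, d_x: int = 784, d_y: int = 32):
        super().__init__()
        # y = g(W0 x + b0): compress to the smaller dimension d_y
        self.encoder = nn.Sequential(nn.Linear(d_x, d_y), nn.Sigmoid())
        # z = g(W1 y + b1): expand back to the input dimension d_x
        self.decoder = nn.Sequential(nn.Linear(d_y, d_x), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(16, 784)         # dummy batch standing in for real data in [0, 1]
loss = F.mse_loss(model(x), x)  # push z towards x
loss.backward()
```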
## Sparse Training

A sparse hidden representation is obtained by penalizing the values assigned to the
`neurons` (the hidden activations $a_i$ below).
$$
\min_{\theta}
\underbrace{||h_{\theta}(x) - x ||^{2}}_{\text{Reconstruction Error}} +
\underbrace{\lambda \sum_{i}|a_i|}_{\text{L1 sparsity}}
$$

The reason why we want **sparsity** is that we want the **best** representation
in the `latent space`; thus we want to **prevent** our `network` from **learning the
identity mapping**.
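A hedged sketch of this objective in `PyTorch`, reusing the `Autoencoder` class from the sketch above; `lam` is an assumed name for $\lambda$.

```python
import torch.nn.functional as F

def sparse_loss(model, x, lam=1e-3):
    """Reconstruction error plus an L1 penalty on the hidden activations a_i."""
    a = model.encoder(x)                                # hidden activations a_i
    z = model.decoder(a)                                # reconstruction h_theta(x)
    reconstruction = F.mse_loss(z, x, reduction="sum")  # ||h_theta(x) - x||^2
    l1_sparsity = lam * a.abs().sum()                   # lambda * sum_i |a_i|
    return reconstruction + l1_sparsity
```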
## Layerwise Training

To train an `autoencoder` we train `layer` by `layer`, mitigating `vanishing gradients`.

The trick is to train one `layer`, then use its output as the input for the next `layer`,
training over it as if it were our $x$. Rinse and repeat for approximately 3 `layers`.

If you want, **at the end**, you can add another `layer` that you train over the `data` to
**fine-tune**.
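A rough sketch of the idea (the layer sizes, number of epochs and optimizer are assumptions, and `Autoencoder` is the class from the earlier sketch): train one `layer` in isolation, encode the data with it, and train the next `layer` on those codes as if they were $x$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed sizes for a 3-layer stack: 784 -> 256 -> 64 -> 16.
sizes = [784, 256, 64, 16]
data = torch.rand(512, 784)      # dummy dataset standing in for the real inputs
encoders = []

current = data
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    ae = Autoencoder(d_x=d_in, d_y=d_out)         # class from the earlier sketch
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    for _ in range(100):                          # train this layer alone
        opt.zero_grad()
        F.mse_loss(ae(current), current).backward()
        opt.step()
    encoders.append(ae.encoder)
    with torch.no_grad():                         # its codes become the next layer's "x"
        current = ae.encoder(current)

# Optionally stack the trained encoders and add one more layer on top,
# trained on the data, to fine-tune.
stacked = nn.Sequential(*encoders)
```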
<!-- TODO: See Deep Belief Networks and Deep Boltzmann Machines-->
<!-- TODO: See Deep autoencoders training-->
## U-Net

It was developed to analyze medical images and perform segmentation, a step in which we
assign a classification to each pixel. To train these segmentation models we use **target maps**
that contain the desired classification maps.
### Architecture

- **Encoder**:\
  We have several convolutional and pooling layers to make the representation smaller.
  Once small enough, we'll have a `FCNN`
- **Decoder**:\
  In this phase we restore the representation to the original dimension (`up-sampling`).
  Here we have many **deconvolution** layers; however, these are learnt functions
- **Skip Connection**:\
  These are connections used to tell the **deconvolutional** layers where a feature
  came from. Basically, we concatenate a previous convolutional block with the
  up-sampled one and apply a convolution over these layers (see the sketch after this list).
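A minimal sketch of one such decoder step in `PyTorch` (channel counts and spatial sizes are assumptions): up-sample with a learnt deconvolution, concatenate with the saved encoder feature map, then convolve the concatenation.

```python
import torch
import torch.nn as nn

# One U-Net-style decoder step with a skip connection.
up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)  # learnt "deconvolution"
conv = nn.Conv2d(64 + 64, 64, kernel_size=3, padding=1)    # convolution over the concatenation

encoder_features = torch.rand(1, 64, 56, 56)    # feature map saved from the encoder
decoder_features = torch.rand(1, 128, 28, 28)   # current (smaller) decoder feature map

upsampled = up(decoder_features)                        # -> (1, 64, 56, 56)
skip = torch.cat([encoder_features, upsampled], dim=1)  # concatenate along channels
out = conv(skip)                                        # -> (1, 64, 56, 56)
```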
<!-- TODO: See PDF anelli 10 to see complete architecture -->
## Variational Autoencoders

Until now we were reconstructing points in the latent space to points in the
**target space**.

However, this means that the **immediate neighbours of the data point** are
**meaningless**.

The idea is to make it such that all **immediate neighbouring regions of our data point**
will be decoded as our **data point**.

To achieve this, our **point** will become a **distribution** over the `latent-space`;
we then sample from it and decode the sampled point. We then operate as usual by
backpropagating the error.
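A sketch of that sampling step (the usual reparameterization trick) in `PyTorch`; `mu` and `log_var` are assumed to be the two outputs of the encoder, so the noise is the only non-differentiable part.

```python
import torch

def sample_latent(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Sample z ~ N(mu, sigma^2) while keeping the graph differentiable."""
    std = torch.exp(0.5 * log_var)  # sigma
    eps = torch.randn_like(std)     # noise drawn from N(0, I)
    return mu + eps * std           # gradients flow through mu and std, not through eps
```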
### Regularization Term

We use the `Kullback-Leibler` divergence to measure the difference between distributions. This has a
**closed form** in terms of the **mean** and **covariance matrices**.
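For reference (not derived in these notes), with a diagonal $\Sigma_{x} = \mathrm{diag}(\sigma_{1}^{2}, \dots, \sigma_{d}^{2})$ the closed form against $N(0, I)$ is

$$
KL\big[N(\mu_{x}, \Sigma_{x}),\, N(0, I)\big] =
\frac{1}{2} \sum_{j} \left( \sigma_{j}^{2} + \mu_{j}^{2} - 1 - \log \sigma_{j}^{2} \right)
$$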
Regularization makes these encoders both continuous and
complete (each point is meaningful). Without it, we would get results that are too similar within our
regions. It also ensures that we don't have regions that are ***too concentrated and
similar to a point, nor too far apart from each other***.
### Loss

$$
L(x) = ||x - \hat{x}||^{2}_{2} + KL[N(\mu_{x}, \Sigma_{x}), N(0, I)]
$$
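A hedged sketch of this loss in `PyTorch`, assuming the encoder outputs `mu` and `log_var` for a diagonal covariance and using the closed-form `KL` from the previous section:

```python
import torch
import torch.nn.functional as F

def vae_loss(x_hat, x, mu, log_var):
    """||x - x_hat||^2_2 plus KL[N(mu, Sigma), N(0, I)] for a diagonal Sigma."""
    reconstruction = F.mse_loss(x_hat, x, reduction="sum")
    kl = 0.5 * torch.sum(log_var.exp() + mu.pow(2) - 1.0 - log_var)
    return reconstruction + kl
```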