# GANs

## Pseudorandom Numbers

### Rejection Sampling

Instead of sampling directly from the **complex distribution**, we sample from a **simpler one** and accept or reject values according to some conditions.

If these conditions are crafted in a specific way, our accepted samples will resemble those of the **complex distribution**.

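
The recipe above can be sketched as follows (a minimal illustration; the function names, the triangular target density, and the envelope constant `M` are our own choices):

```python
import random

def rejection_sample(f, proposal_sample, proposal_pdf, M):
    """Draw one sample from the density proportional to f:
    propose from a simple distribution, then accept with
    probability f(x) / (M * proposal_pdf(x))."""
    while True:
        x = proposal_sample()   # sample from the simple distribution
        u = random.random()     # uniform acceptance threshold
        if u <= f(x) / (M * proposal_pdf(x)):
            return x            # accepted: x follows the target

# Example: sample the triangular density f(x) = 2x on [0, 1]
# using a uniform proposal on [0, 1] with envelope M = 2.
random.seed(0)
samples = [rejection_sample(lambda x: 2 * x,
                            random.random,
                            lambda x: 1.0,
                            M=2.0) for _ in range(10_000)]
mean = sum(samples) / len(samples)  # true mean of this density is 2/3
```

The envelope must satisfy $f(x) \leq M \cdot q(x)$ everywhere, otherwise the accepted samples are biased.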
### Metropolis-Hastings

The idea is to **construct a stationary Markov chain** over the possible values we can sample.

We then follow the chain for a long path and take the point where we end up as our sample.

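
A minimal random-walk sketch of this idea (function names, the target density, and the step size are our own choices; with a symmetric proposal the Hastings correction cancels out):

```python
import math
import random

def metropolis_hastings(log_f, x0, n_steps, step=1.0):
    """Random-walk Metropolis: a Markov chain whose stationary
    distribution is proportional to exp(log_f)."""
    x = x0
    chain = []
    for _ in range(n_steps):
        proposal = x + random.uniform(-step, step)  # symmetric proposal
        # Accept with probability min(1, f(proposal) / f(x))
        if random.random() < math.exp(min(0.0, log_f(proposal) - log_f(x))):
            x = proposal
        chain.append(x)
    return chain

# Example: sample a standard normal from its unnormalized log-density.
random.seed(0)
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_steps=50_000)
burned = chain[5_000:]                          # discard the burn-in prefix
mean = sum(burned) / len(burned)                # should be near 0
var = sum(v * v for v in burned) / len(burned)  # should be near 1
```

Note that only the *ratio* of densities is needed, so the normalizing constant of the target never has to be computed.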
### Inverse Transform

The idea is to sample directly from a **well-known distribution** (the uniform) and then return a number from the **complex distribution** whose cumulative probability is at least as high as our **simple sample**:

$$
u \sim \mathcal{U}[0, 1] \\
\text{take } x \text{ so that } u \leq F(x), \text{ where } \\
F(x) \text{ is the cumulative distribution function of } x
$$

As a proof, we have that

$$
F_{X} : \mathbb{R} \rightarrow [0, 1], \qquad F_{X}^{-1} : [0, 1] \rightarrow \mathbb{R} \\
\text{Let's define } Y = F_{X}^{-1}(U), \quad U \sim \mathcal{U}[0, 1] \\
F_{Y}(y) =
P(Y \leq y) =
P(F_{X}^{-1}(U) \leq y) = \\
= P(F_{X}(F_{X}^{-1}(U)) \leq F_{X}(y)) =
P(U \leq F_{X}(y)) = F_{X}(y)
$$

In particular, we demonstrated that $Y$ has the same distribution as $X$, so crafting a variable in this way makes it possible to sample $X$ by sampling $U$.

> [!NOTE]
> The last passage says that a uniform variable is less than a value that
> comes from a distribution. Since $F_{U}(x) = P(U \leq x) = x$, the last
> equality holds.
>
> We can apply the CDF of $X$ to both sides of the inequality without
> flipping it, because any CDF is monotonically non-decreasing.

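
For example, the exponential distribution has a closed-form inverse CDF, so the recipe above becomes (a minimal sketch; the function name is our own):

```python
import math
import random

def sample_exponential(lam):
    """Inverse transform sampling: draw u ~ U[0, 1] and return
    F^{-1}(u) for the exponential CDF F(x) = 1 - exp(-lam * x)."""
    u = random.random()
    return -math.log(1.0 - u) / lam  # the x such that F(x) = u

# Example: 100k draws from Exp(lambda = 2); the true mean is 1/2.
random.seed(0)
samples = [sample_exponential(2.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)
```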
## Generative Models

The idea is that, given an $n \times n$ image stacked into a vector $N$ of pixels, not all such vectors will be a dog photo, so we must find the probability distribution associated with dog images.

However, we have little to no information about the **actual distribution** of dog images.

Our solution is to **sample from a simple distribution, transform** it into our target distribution, and then **compare** the result **with** a **sampled subset** of the **complex distribution**.

We then train our network to make better transformations.

### Direct Learning

We measure the MMD (Maximum Mean Discrepancy) distance between our generated set and the real one.

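
A toy sketch of such a comparison in one dimension, using the biased MMD estimator with an RBF kernel (the sample sizes, the kernel, and `sigma` are arbitrary illustrative choices):

```python
import math
import random

def rbf(x, y, sigma=1.0):
    """RBF (Gaussian) kernel between two scalars."""
    return math.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased squared-MMD estimate:
    mean k(x, x') + mean k(y, y') - 2 mean k(x, y)."""
    kxx = sum(rbf(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

random.seed(0)
real = [random.gauss(0.0, 1.0) for _ in range(200)]
same = [random.gauss(0.0, 1.0) for _ in range(200)]  # same distribution
far  = [random.gauss(3.0, 1.0) for _ in range(200)]  # shifted distribution

close_mmd = mmd2(real, same)  # near 0: the sets match
far_mmd = mmd2(real, far)     # clearly positive: the sets differ
```

A generator trained this way would be pushed to make `mmd2(generated, real)` as small as possible.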
### Indirect Learning

We introduce the concept of a **discriminator**, which will guess whether our image was generated or comes from the actual distribution.

We then update our model based on the score achieved by the **discriminator**.

## Generative Adversarial Networks

If we use the **indirect learning** approach, we need a `Network` capable of classifying generated and genuine content according to their labels.

However, for the same reasons as with **generative models**, we don't have such a `Network` readily available; rather, we have to **learn** it.

## Training Phases

We can train both the **generator and discriminator** together. They will `backpropagate` based on the **classification errors** of the **discriminator**; however, they will have two distinct objectives:

- `Generator`: **Maximize** this error
- `Discriminator`: **Minimize** this error

From a **game theory** view, they are playing a **zero-sum** game, and the perfect outcome is **when the discriminator guesses the labels correctly 50% of the time**.

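
The alternating training can be sketched on a toy one-dimensional problem, where the "images" are just numbers drawn from $\mathcal{N}(3,\,0.5)$, the generator only learns a shift $\theta$, and the discriminator is a single logistic unit (all names and hyperparameters are our own illustrative choices; the losses are the standard cross-entropy ones):

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

random.seed(0)
w, b = 0.0, 0.0   # discriminator D(x) = sigmoid(w * x + b)
theta = 0.0       # generator G(z) = theta + z; real data is N(3, 0.5)
lr_d, lr_g, batch = 0.1, 0.05, 64

for _ in range(4000):
    real = [random.gauss(3.0, 0.5) for _ in range(batch)]
    fake = [theta + random.gauss(0.0, 1.0) for _ in range(batch)]

    # Discriminator step: minimize -log D(x) - log(1 - D(G(z)))
    gw = gb = 0.0
    for x in real:
        d = sigmoid(w * x + b)
        gw += -(1 - d) * x
        gb += -(1 - d)
    for g in fake:
        d = sigmoid(w * g + b)
        gw += d * g
        gb += d
    w -= lr_d * gw / batch
    b -= lr_d * gb / batch

    # Generator step: maximize log D(G(z)) (non-saturating loss)
    gt = sum(-(1 - sigmoid(w * g + b)) * w for g in fake)
    theta -= lr_g * gt / batch

# theta should end up close to the real mean, 3: the generator has
# moved its samples to where the discriminator calls them "real".
```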
### Loss functions

Assuming that $G(\vec{z})$ is the output of the `generator`, $D(\vec{a})$ is the output of the `discriminator`, $y = 0$ labels generated content, and $y = 1$ labels real content:

- **`Generator`**: \
  We want to maximize the error of the `Discriminator` $D(\vec{a})$ when $y = 0$:

$$
\max_{G}\{ -(1 - y) \log(1 - D(G(\vec{z})))\} =
\max_{G}\{- \log(1 - D(G(\vec{z})))\}
$$

- **`Discriminator`**: \
  We want to minimize its error when $y = 1$. However, if the `Discriminator` is good at its job, $D(G(\vec{z})) \approx 0$ and the generator's objective above stays near 0 (its gradient vanishes), so the non-saturating form is used instead:

$$
\begin{aligned}
\min_{G}\{ (1 - y) \log(1 - D(G(\vec{z})))\} &=
\min_{G}\{\log(1 - D(G(\vec{z})))\} \\
&\approx \max_{G}\{- \log(D(G(\vec{z})))\}
\end{aligned}
$$


In the end, they are essentially playing a `minimax` game:

$$
\min_{D} \max_{G} \{
- \mathbb{E}_{\vec{x} \sim Data} \log(D(\vec{x})) - \mathbb{E}_{\vec{z} \sim Noise} \log(1 - D(G(\vec{z})))
\}
$$
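
As a small sanity check (a sketch; the helper name is our own): at the equilibrium the discriminator outputs $1/2$ on every input, and the objective above takes the well-known value $2 \log 2$:

```python
import math

def gan_value(d_real, d_fake):
    """(Negated) GAN objective for constant discriminator outputs:
    -E[log D(x)] - E[log(1 - D(G(z)))]."""
    return -math.log(d_real) - math.log(1.0 - d_fake)

# At equilibrium the discriminator is maximally confused
# and outputs 1/2 everywhere.
equilibrium = gan_value(0.5, 0.5)  # = 2 * log(2)
```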