GANs
Pseudorandom Numbers
Rejection Sampling
Instead of sampling directly from the complex distribution, we sample from a simpler one and accept or reject each value according to some condition.
If this condition is crafted in the right way, the accepted samples will follow the complex distribution.
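A minimal NumPy sketch of the idea, assuming an unnormalized standard normal target and a uniform proposal on [-4, 4] (both choices are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)

def target_pdf(x):
    # Unnormalized standard normal density: our "complex" distribution.
    return np.exp(-x**2 / 2)

def rejection_sample(n, lo=-4.0, hi=4.0):
    """Sample n points from target_pdf using a uniform proposal on [lo, hi].

    M must bound target_pdf / proposal_pdf everywhere for the accept
    condition to be valid.
    """
    samples = []
    M = 1.0 * (hi - lo)  # max of target_pdf is 1; proposal pdf is 1/(hi - lo)
    while len(samples) < n:
        x = rng.uniform(lo, hi)    # sample from the simple distribution
        u = rng.uniform(0.0, 1.0)  # uniform height for the accept/reject test
        # Accept x with probability target_pdf(x) / (M * proposal_pdf(x)).
        if u <= target_pdf(x) / (M * (1.0 / (hi - lo))):
            samples.append(x)
    return np.array(samples)

xs = rejection_sample(10_000)
print(xs.mean(), xs.std())  # ≈ 0 and ≈ 1 for a (truncated) standard normal
```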
Metropolis-Hastings
The idea is to construct a Markov chain over the possible values, whose stationary distribution is the target one.
We then follow a long path along the chain and take the point where we end up as our sample.
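A random-walk Metropolis sketch (a symmetric Gaussian proposal, so the Hastings correction cancels out), again targeting an unnormalized standard normal; the step size and burn-in length are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Log of an unnormalized standard normal density.
    return -x**2 / 2

def metropolis_hastings(n_steps, step=1.0):
    """Random-walk Metropolis: propose a move around the current state
    and accept it with probability min(1, p(proposal) / p(current))."""
    x = 0.0
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.normal(0.0, step)
        # Compare densities in log space for numerical stability.
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        chain.append(x)
    return np.array(chain)

chain = metropolis_hastings(50_000)
burned = chain[10_000:]  # discard the transient before the chain mixes
print(burned.mean(), burned.std())  # ≈ 0 and ≈ 1
```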
Inverse Transform
The idea is to sample directly from a well-known distribution and then return a number from the complex distribution whose cumulative probability is at least our simple sample:
u \sim \mathcal{U}[0, 1] \\
\text{take } x \text{ so that } u \leq F(x) \text{, where} \\
F(x) \text{ is the cumulative distribution function of } X
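As a concrete worked example (the exponential distribution is our own choice here, for illustration): for X \sim \text{Exp}(\lambda) the CDF can be inverted in closed form,

F(x) = 1 - e^{-\lambda x} \\
u = 1 - e^{-\lambda x} \Rightarrow x = F^{-1}(u) = -\frac{\ln(1 - u)}{\lambda}

so sampling u \sim \mathcal{U}[0, 1] and returning -\ln(1 - u)/\lambda yields exponential samples.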
As proof, we have that

F_{X}(x) \in [0, 1] \text{, so } F_{X}^{-1} \text{ maps } [0, 1] \text{ back to } \mathbb{R} \\
\text{Let's define } Y = F_{X}^{-1}(U) \text{ with } U \sim \mathcal{U}[0, 1] \\
\begin{aligned}
F_{Y}(y) &= P(Y \leq y) = P(F_{X}^{-1}(U) \leq y) \\
&= P(F_{X}(F_{X}^{-1}(U)) \leq F_{X}(y)) = P(U \leq F_{X}(y)) = F_{X}(y)
\end{aligned}
In particular, we demonstrated that a variable Y crafted this way has the same CDF as X,
which makes it possible to sample from X by sampling U.
Note
The last passage says that a uniform variable is less than a value that comes from a distribution. Since

F_{U}(x) = P(U \leq x) = x \text{ for } x \in [0, 1]

the last equality holds. We can apply the CDF of X to both sides of the inequality, without flipping it, because any CDF is monotonically non-decreasing.
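A minimal NumPy sketch of inverse transform sampling, using the exponential example above (the rate lam is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_exponential(n, lam=1.5):
    """Inverse-transform sampling of Exp(lam): invert F(x) = 1 - exp(-lam*x)."""
    u = rng.uniform(0.0, 1.0, size=n)  # sample from the simple distribution
    return -np.log(1.0 - u) / lam      # x = F^{-1}(u)

xs = sample_exponential(100_000)
print(xs.mean())  # ≈ 1 / lam ≈ 0.667
```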
Generative Models
The idea is that, given an n \times n image flattened into a vector of
stacked pixels, not all such vectors will be a dog photo, so we must find
the probability distribution associated with dog images.
However, we have little to no information about the actual distribution of dog images.
Our solution is to sample from a simple distribution, transform the samples into our target distribution, and then compare them with a subset sampled from the complex distribution.
We then train our network to produce better transformations.
Direct Learning
We measure the Maximum Mean Discrepancy (MMD) between our generated set and the real one, as in the sketch below.
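A small NumPy sketch of a (biased) squared-MMD estimate with an RBF kernel; the kernel choice and the bandwidth sigma are assumptions:

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # Pairwise RBF kernel k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)).
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    return (rbf_kernel(x, x, sigma).mean()
            - 2 * rbf_kernel(x, y, sigma).mean()
            + rbf_kernel(y, y, sigma).mean())

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 2))
fake_close = rng.normal(0.1, 1.0, size=(500, 2))  # nearly matches the real set
fake_far = rng.normal(3.0, 1.0, size=(500, 2))    # clearly different
print(mmd2(real, fake_close), mmd2(real, fake_far))  # small vs. large
```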
Indirect Learning
We introduce a discriminator, which guesses whether an image was generated or comes from the actual distribution.
We then update our model based on the error made by the discriminator.
Generative Adversarial Networks
If we use the indirect learning approach, we need a network capable of
classifying generated and genuine content according to their labels.
However, for the same reasons as with generative models, we don't have such
a network readily available; rather, we have to learn it.
Training Phases
We can train both the generator and the discriminator together. They both
backpropagate based on the classification errors of the discriminator;
however, they have two distinct objectives:
Generator: maximize this error
Discriminator: minimize this error
From a game-theory point of view, they are playing a zero-sum game, and the perfect outcome is when the discriminator labels samples correctly only 50% of the time, i.e. no better than chance.
Loss functions
Assume that G(\vec{z}) is the output of the generator, D(\vec{a}) is the output of the discriminator, y = 0 labels generated content, and y = 1 labels real content.
Generator:
We want to maximize the error of the discriminator D(\vec{a}) when y = 0:
\max_{G}\{ -(1 - y) \log(1 - D(G(\vec{z})))\} =
\max_{G}\{- \log(1 - D(G(\vec{z})))\}
Discriminator:
We want to minimize its error on real content (y = 1) as well as on generated content (y = 0):
\min_{D}\{ -\log(D(\vec{x})) - \log(1 - D(G(\vec{z})))\}
However, if the discriminator is good at its job, D(G(\vec{z})) \approx 0 and the generator's objective above saturates: \log(1 - D(G(\vec{z}))) is near 0 and its gradient vanishes, giving the generator almost no signal. So the generator uses the non-saturating form instead:
\begin{aligned}
\min_{G}\{ (1 - y) \log(1 - D(G(\vec{z})))\} &=
\min_{G}\{\log(1 - D(G(\vec{z})))\} \\
&\approx \min_{G}\{- \log(D(G(\vec{z})))\}
\end{aligned}
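A tiny PyTorch check of why the non-saturating form helps, comparing the gradients of the two losses when the discriminator confidently rejects a fake (the value 0.01 is illustrative):

```python
import torch

# Suppose the discriminator confidently rejects a fake: D(G(z)) ≈ 0.01.
d_out = torch.tensor(0.01, requires_grad=True)

# Saturating loss: log(1 - D(G(z))) — its gradient is tiny here.
loss_saturating = torch.log(1 - d_out)
loss_saturating.backward()
print(d_out.grad)  # ≈ -1.01: weak learning signal

d_out.grad = None

# Non-saturating loss: -log(D(G(z))) — its gradient is large here.
loss_non_saturating = -torch.log(d_out)
loss_non_saturating.backward()
print(d_out.grad)  # ≈ -100: strong push to increase D(G(z))
```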
Putting the two together, they are playing a minimax game:
\min_{D} \max_{G} \{
- E_{x \sim Data} [\log(D(\vec{x}))] - E_{z \sim Noise} [\log(1 - D(G(\vec{z})))]
\}