GANs
Pseudorandom Numbers
Rejection Sampling
Instead of sampling directly from the complex distribution, we sample from a simpler one and accept or reject each value according to some condition.
If this condition is crafted in the right way, the accepted samples will follow the complex distribution.
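A minimal NumPy sketch of the idea, assuming an unnormalized standard normal target and a uniform proposal on [-4, 4] (both choices are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)

def target_pdf(x):
    # Unnormalized standard normal density: our "complex" distribution.
    return np.exp(-x**2 / 2)

def rejection_sample(n, lo=-4.0, hi=4.0):
    """Sample n points from target_pdf using a uniform proposal on [lo, hi].

    M must bound target_pdf / proposal_pdf everywhere for the accept
    condition to be valid.
    """
    samples = []
    M = 1.0 * (hi - lo)  # max of target_pdf is 1; proposal pdf is 1/(hi - lo)
    while len(samples) < n:
        x = rng.uniform(lo, hi)    # sample from the simple distribution
        u = rng.uniform(0.0, 1.0)  # uniform height for the accept/reject test
        # Accept x with probability target_pdf(x) / (M * proposal_pdf(x)).
        if u <= target_pdf(x) / (M * (1.0 / (hi - lo))):
            samples.append(x)
    return np.array(samples)

xs = rejection_sample(10_000)
print(xs.mean(), xs.std())  # ≈ 0 and ≈ 1 for a (truncated) standard normal
```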
Metropolis-Hastings
The idea is to construct a Markov chain over the possible values, whose stationary distribution is the target one.
We then follow a long path along the chain and take the point where we end up as our sample.
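A random-walk Metropolis sketch (a symmetric Gaussian proposal, so the Hastings correction cancels out), again targeting an unnormalized standard normal; the step size and burn-in length are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Log of an unnormalized standard normal density.
    return -x**2 / 2

def metropolis_hastings(n_steps, step=1.0):
    """Random-walk Metropolis: propose a move around the current state
    and accept it with probability min(1, p(proposal) / p(current))."""
    x = 0.0
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.normal(0.0, step)
        # Compare densities in log space for numerical stability.
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        chain.append(x)
    return np.array(chain)

chain = metropolis_hastings(50_000)
burned = chain[10_000:]  # discard the transient before the chain mixes
print(burned.mean(), burned.std())  # ≈ 0 and ≈ 1
```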
Inverse Transform
The idea is to sample directly from a well-known distribution and then return a number from the complex distribution whose cumulative probability is at least our simple sample:
u \sim \mathcal{U}[0, 1] \\
\text{take } x \text{ so that } u \leq F(x) \text{, where} \\
F(x) \text{ is the cumulative distribution function of } X
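As a concrete worked example (the exponential distribution is our own choice here, for illustration): for X \sim \text{Exp}(\lambda) the CDF can be inverted in closed form,

F(x) = 1 - e^{-\lambda x} \\
u = 1 - e^{-\lambda x} \Rightarrow x = F^{-1}(u) = -\frac{\ln(1 - u)}{\lambda}

so sampling u \sim \mathcal{U}[0, 1] and returning -\ln(1 - u)/\lambda yields exponential samples.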
As proof, we have that

F_{X}(x) \in [0, 1] \text{, so } F_{X}^{-1} \text{ maps } [0, 1] \text{ back to } \mathbb{R} \\
\text{Let's define } Y = F_{X}^{-1}(U) \text{ with } U \sim \mathcal{U}[0, 1] \\
\begin{aligned}
F_{Y}(y) &= P(Y \leq y) = P(F_{X}^{-1}(U) \leq y) \\
&= P(F_{X}(F_{X}^{-1}(U)) \leq F_{X}(y)) = P(U \leq F_{X}(y)) = F_{X}(y)
\end{aligned}
In particular, we demonstrated that a variable Y crafted this way has the same CDF as X,
which makes it possible to sample from X by sampling U.
Note
The last passage says that a uniform variable is less than a value that comes from a distribution. Since

F_{U}(x) = P(U \leq x) = x \text{ for } x \in [0, 1]

the last equality holds. We can apply the CDF of X to both sides of the inequality, without flipping it, because any CDF is monotonically non-decreasing.
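A minimal NumPy sketch of inverse transform sampling, using the exponential example above (the rate lam is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_exponential(n, lam=1.5):
    """Inverse-transform sampling of Exp(lam): invert F(x) = 1 - exp(-lam*x)."""
    u = rng.uniform(0.0, 1.0, size=n)  # sample from the simple distribution
    return -np.log(1.0 - u) / lam      # x = F^{-1}(u)

xs = sample_exponential(100_000)
print(xs.mean())  # ≈ 1 / lam ≈ 0.667
```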
Generative Models
The idea is that, given an n \times n image flattened into a vector of
stacked pixels, not all such vectors will be a dog photo, so we must find
the probability distribution associated with dog images.
However, we have little to no information about the actual distribution of dog images.
Our solution is to sample from a simple distribution, transform the samples into our target distribution, and then compare them with a subset sampled from the complex distribution.
We then train our network to produce better transformations.
Direct Learning
We measure the Maximum Mean Discrepancy (MMD) between our generated set and the real one, as in the sketch below.
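A small NumPy sketch of a (biased) squared-MMD estimate with an RBF kernel; the kernel choice and the bandwidth sigma are assumptions:

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # Pairwise RBF kernel k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)).
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    return (rbf_kernel(x, x, sigma).mean()
            - 2 * rbf_kernel(x, y, sigma).mean()
            + rbf_kernel(y, y, sigma).mean())

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 2))
fake_close = rng.normal(0.1, 1.0, size=(500, 2))  # nearly matches the real set
fake_far = rng.normal(3.0, 1.0, size=(500, 2))    # clearly different
print(mmd2(real, fake_close), mmd2(real, fake_far))  # small vs. large
```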
Indirect Learning
We introduce a discriminator, which guesses whether an image was generated or comes from the actual distribution.
We then update our model based on the error made by the discriminator.
Generative Adversarial Networks
If we use the indirect learning approach, we need a network capable of
classifying generated and genuine content according to their labels.
However, for the same reasons as with generative models, we don't have such
a network readily available; rather, we have to learn it.
Training Phases
We can train both the generator and the discriminator together. They both
backpropagate based on the classification errors of the discriminator;
however, they have two distinct objectives:
Generator: maximize this error
Discriminator: minimize this error
From a game-theory point of view, they are playing a zero-sum game, and the perfect outcome is when the discriminator labels samples correctly only 50% of the time, i.e. no better than chance.
Loss functions
Assume that G(\vec{z}) is the output of the generator, D(\vec{a}) is the output of the discriminator, y = 0 labels generated content, and y = 1 labels real content.
Generator:
We want to maximize the error of the discriminator D(\vec{a}) when y = 0:
\max_{G}\{ -(1 - y) \log(1 - D(G(\vec{z})))\} =
\max_{G}\{- \log(1 - D(G(\vec{z})))\}
Discriminator:
We want to minimize its error on real content (y = 1) as well as on generated content (y = 0):
\min_{D}\{ -\log(D(\vec{x})) - \log(1 - D(G(\vec{z})))\}
However, if the discriminator is good at its job, D(G(\vec{z})) \approx 0 and the generator's objective above saturates: \log(1 - D(G(\vec{z}))) is near 0 and its gradient vanishes, giving the generator almost no signal. So the generator uses the non-saturating form instead:
\begin{aligned}
\min_{G}\{ (1 - y) \log(1 - D(G(\vec{z})))\} &=
\min_{G}\{\log(1 - D(G(\vec{z})))\} \\
&\approx \min_{G}\{- \log(D(G(\vec{z})))\}
\end{aligned}
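A tiny PyTorch check of why the non-saturating form helps, comparing the gradients of the two losses when the discriminator confidently rejects a fake (the value 0.01 is illustrative):

```python
import torch

# Suppose the discriminator confidently rejects a fake: D(G(z)) ≈ 0.01.
d_out = torch.tensor(0.01, requires_grad=True)

# Saturating loss: log(1 - D(G(z))) — its gradient is tiny here.
loss_saturating = torch.log(1 - d_out)
loss_saturating.backward()
print(d_out.grad)  # ≈ -1.01: weak learning signal

d_out.grad = None

# Non-saturating loss: -log(D(G(z))) — its gradient is large here.
loss_non_saturating = -torch.log(d_out)
loss_non_saturating.backward()
print(d_out.grad)  # ≈ -100: strong push to increase D(G(z))
```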
Putting the two together, they are playing a minimax game:
\min_{D} \max_{G} \{
- E_{x \sim Data} [\log(D(\vec{x}))] - E_{z \sim Noise} [\log(1 - D(G(\vec{z})))]
\}