diff --git a/Chapters/11-GANs/INDEX.md b/Chapters/11-GANs/INDEX.md
index e69de29..2c050cf 100644
--- a/Chapters/11-GANs/INDEX.md
+++ b/Chapters/11-GANs/INDEX.md
@@ -0,0 +1,132 @@
# GANs

## Pseudorandom Numbers

### Rejection Sampling

Instead of sampling directly from the **complex distribution**, we sample from a **simpler one**
and accept or reject each draw according to some condition.

If this condition is crafted in a specific way (accept $x$ with probability
$p(x) / (M\,q(x))$, where $p$ is the complex density, $q$ the simple one and $M$ a
constant bounding their ratio), the accepted samples follow the **complex distribution**
exactly. A code sketch of this and the next two methods appears at the end of this section.

### Metropolis-Hastings

The idea is to **construct a Markov chain** over the possible values, built so that its
**stationary distribution** is the target distribution.

We then walk the chain for a long time, discard the initial *burn-in* steps, and take the
states it visits as samples.

### Inverse Transform

The idea is to sample directly from a **well-known distribution** (the uniform on $[0, 1]$)
and map the draw back to the **complex distribution** through its cumulative distribution
function:

$$
u \sim \mathcal{U}[0, 1] \\
\text{take the smallest } x \text{ such that } u \leq F(x) \text{, where} \\
F \text{ is the cumulative distribution function of the target}
$$

As proof we have that

$$
F_{X} : \mathbb{R} \rightarrow [0, 1], \qquad F_{X}^{-1} : [0, 1] \rightarrow \mathbb{R} \\
\text{Let's define } Y = F_{X}^{-1}(U) \text{ with } U \sim \mathcal{U}[0, 1] \\
F_{Y}(y) =
  P(Y \leq y) =
  P(F_{X}^{-1}(U) \leq y) = \\
= P(F_{X}(F_{X}^{-1}(U)) \leq F_{X}(y)) =
  P(U \leq F_{X}(y)) = F_{X}(y)
$$

In particular, we demonstrated that $Y$ has the same distribution as $X$,
so we can sample $X$ by sampling $U$ and applying $F_{X}^{-1}$.

> [!NOTE]
> The last passage asks for the probability that a uniform variable is at most
> a value $F_{X}(y) \in [0, 1]$. Since $F_{U}(x) = P(U \leq x) = x$ on $[0, 1]$, the last
> equality holds.
>
> We can apply the CDF of $X$ to both sides of the inequality, without flipping
> it, because any CDF is monotonically non-decreasing.
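As a concrete illustration, here is a minimal NumPy sketch of **rejection sampling**; the function names and the normal-from-uniform example are illustrative assumptions, not something fixed by these notes:

```python
import numpy as np

def rejection_sample(p, q_sample, q_pdf, M, n):
    """Draw n samples from density p using proposals from q,
    assuming p(x) <= M * q_pdf(x) wherever q has mass."""
    samples = []
    while len(samples) < n:
        x = q_sample()                     # sample the simple distribution
        u = np.random.uniform()            # uniform acceptance threshold
        if u <= p(x) / (M * q_pdf(x)):     # accept with probability p/(M*q)
            samples.append(x)
    return np.array(samples)

# Example: a standard normal from a uniform proposal on [-5, 5]
# (the tails beyond +-5 are negligible for this sketch).
p = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
q_sample = lambda: np.random.uniform(-5, 5)
q_pdf = lambda x: 1 / 10                   # uniform density on [-5, 5]
M = 4.0                                    # bounds p(x)/q(x), max ~3.99 at x = 0
xs = rejection_sample(p, q_sample, q_pdf, M, 10_000)
```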
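A similar sketch of **Metropolis-Hastings**, using a symmetric Gaussian random-walk proposal so that the Hastings correction cancels and the acceptance ratio reduces to $p(x') / p(x)$; the bimodal example density is again an illustrative assumption:

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_steps, step=0.5, burn_in=1_000):
    """Sample from the (unnormalized) density exp(log_p) by walking
    a Markov chain whose stationary distribution is the target."""
    x, chain = x0, []
    for i in range(n_steps):
        x_new = x + np.random.normal(scale=step)     # propose a local move
        log_alpha = log_p(x_new) - log_p(x)          # log acceptance ratio
        if np.log(np.random.uniform()) < log_alpha:  # accept, else stay put
            x = x_new
        if i >= burn_in:                             # discard the warm-up path
            chain.append(x)
    return np.array(chain)

# Example: an unnormalized bimodal target.
log_p = lambda x: np.log(np.exp(-(x - 2)**2) + np.exp(-(x + 2)**2))
samples = metropolis_hastings(log_p, x0=0.0, n_steps=50_000)
```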
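Finally, **inverse transform sampling** for a case where $F^{-1}$ has a closed form: the exponential distribution, with $F(x) = 1 - e^{-\lambda x}$ and therefore $F^{-1}(u) = -\ln(1 - u) / \lambda$:

```python
import numpy as np

def sample_exponential(lam, n):
    """Exponential samples via the inverse CDF: x = -ln(1-u)/lam."""
    u = np.random.uniform(size=n)   # u ~ U[0, 1]
    return -np.log(1 - u) / lam     # push u through the inverse CDF

xs = sample_exponential(lam=2.0, n=10_000)
print(xs.mean())                    # converges to 1/lam = 0.5
```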
## Generative Models

The idea is that, given an $n \times n$ image flattened into a vector of stacked pixels,
not every such vector is a dog photo, so we must find the probability
distribution associated with dog images.

However, we have little to no information about the **actual distribution**
of dog images.

Our solution is to **sample from a simple distribution, transform** it
into our target distribution and then **compare with** a **sampled subset** of
the **complex distribution**.

We then train our network to make better transformations.

### Direct Learning

We measure the MMD (Maximum Mean Discrepancy) distance between our generated set and
the real one, and minimize it directly (a short estimator sketch appears at the end of
these notes).

### Indirect Learning

We introduce the concept of a **discriminator**, which guesses whether
an image was generated or comes from the actual distribution.

We then update our model based on the score achieved by the **discriminator**.

## Generative Adversarial Networks

If we use the **indirect learning** approach, we need a `Network` capable of
classifying generated and genuine content according to their labels.

However, for the same reasons as with **generative models**, we don't have such
a `Network` readily available; we have to **learn** it too.

## Training Phases

We can train both the **generator and discriminator** together. They will
`backpropagate` based on the **classification errors** of the **discriminator**,
but with two distinct objectives:

- `Generator`: **Maximize** this error
- `Discriminator`: **Minimize** this error

From a **game theory** view, they are playing a **zero-sum** game and the
perfect outcome is **when the discriminator guesses the labels correctly only 50% of
the time**, i.e. no better than chance.

### Loss functions

Assume that $G(\vec{z})$ is the output of the `generator`, $D(\vec{a})$ is the output of
the `discriminator` (the probability that $\vec{a}$ is real), $y = 0$ labels generated
content and $y = 1$ labels real content.

- **`Generator`**:\
We want to maximize the error of the `Discriminator` on generated content ($y = 0$)

$$
\max_{G}\{ -(1 - y) \log(1 - D(G(\vec{z})))\} =
  \max_{G}\{- \log(1 - D(G(\vec{z})))\}
$$

However, early in training the `Discriminator` rejects fakes easily, so
$D(G(\vec{z})) \approx 0$, the loss above stays near 0 and its gradient vanishes.
In practice we use the **non-saturating** objective instead, which has the same
fixed point but stronger gradients:

$$
\begin{aligned}
\max_{G}\{- \log(1 - D(G(\vec{z})))\} &=
  \min_{G}\{\log(1 - D(G(\vec{z})))\} \\
  &\approx \min_{G}\{- \log(D(G(\vec{z})))\}
\end{aligned}
$$

- **`Discriminator`**: \
We want to minimize its classification error on both real ($y = 1$) and generated
($y = 0$) content, which is the usual binary cross-entropy:

$$
\min_{D}\{ -y \log(D(\vec{x})) - (1 - y) \log(1 - D(G(\vec{z})))\}
$$

Overall, the two are basically playing a `minimax` game:

$$
\min_{D} \max_{G} \{
- \mathbb{E}_{x \sim \text{Data}} \log(D(\vec{x})) - \mathbb{E}_{z \sim \text{Noise}} \log(1 - D(G(\vec{z})))
\}
$$
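Putting the two losses together, here is a minimal PyTorch sketch of one alternating training step. The networks `G` and `D` (with `D` ending in a sigmoid over a single output), their optimizers and the batch of real data are assumed to exist; the generator uses the non-saturating $-\log D(G(\vec{z}))$ objective derived above:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # -y log(D) - (1 - y) log(1 - D)

def train_step(G, D, opt_G, opt_D, real, noise_dim=100):
    batch = real.size(0)
    ones = torch.ones(batch, 1)    # y = 1: real labels
    zeros = torch.zeros(batch, 1)  # y = 0: generated labels

    # Discriminator: minimize BCE on real (y = 1) and generated (y = 0) data.
    # detach() keeps this update from backpropagating into the generator.
    z = torch.randn(batch, noise_dim)
    fake = G(z)
    loss_D = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator: minimize -log D(G(z)), i.e. BCE against the *real* label,
    # the non-saturating surrogate for maximizing the discriminator's error.
    loss_G = bce(D(fake), ones)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```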
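As an aside, the **direct learning** route mentioned earlier compares the two sample sets with MMD instead of a discriminator. A (biased) RBF-kernel estimator of $\text{MMD}^2$ takes only a few lines; the bandwidth `sigma` is an illustrative choice:

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Biased MMD^2 estimate between sample sets x and y (rows = samples)."""
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2          # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma**2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```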