# GANs

## Pseudorandom Numbers

### Rejection Sampling

Instead of sampling directly from the **complex distribution**, we sample from a **simpler one** and accept or reject each value according to some condition. If this condition is crafted appropriately, the accepted samples will resemble those of the **complex distribution**

### Metropolis-Hastings

The idea is to **construct a Markov chain whose stationary distribution is the target one**, defined over the possible values we can sample. We then follow a long path along the chain and sample the point where we end up

### Inverse Transform

The idea is to sample directly from a **well-known distribution** and then return a number from the **complex distribution** whose cumulative probability is at least our **simple sample**

$$
u \in \mathcal{U}[0, 1] \\
\text{take } x \text{ so that } u \leq F(x), \text{ where } \\
F(x) \text{ is the cumulative distribution function of } x
$$

As proof we have that

$$
F_{X} : \mathbb{R} \rightarrow [0, 1], \text{ so } F_{X}^{-1} : [0, 1] \rightarrow \mathbb{R} \\
\text{Let's define } Y = F_{X}^{-1}(U) \\
F_{Y}(y) = P(Y \leq y) = P(F_{X}^{-1}(U) \leq y) = \\
= P(F_{X}(F_{X}^{-1}(U)) \leq F_{X}(y)) = P(U \leq F_{X}(y)) = F_{X}(y)
$$

In particular, we demonstrated that a variable crafted this way has the same distribution as $X$, so we can sample $X$ by sampling $U$

> [!NOTE]
> The last passage says that a uniform variable is less than a value that
> comes from a distribution. Since $F_{U}(x) = P(U \leq x) = x$ for
> $x \in [0, 1]$, the last equality holds.
>
> We can apply the CDF of $X$ to both sides of the inequality, without
> flipping it, because any CDF is monotonically non-decreasing

## Generative Models

The idea is that, given an $n \times n$ image flattened into a vector $N$ of stacked pixels, not all such vectors will be a dog photo, so we must find the probability distribution associated with dog images

However, we have little to no information about the **actual distribution** of dog images.
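As a concrete sketch of the inverse transform method described above: take the exponential distribution $Exp(\lambda)$ as the "complex" target, whose CDF $F(x) = 1 - e^{-\lambda x}$ inverts to $F^{-1}(u) = -\ln(1 - u)/\lambda$. A minimal pure-Python example (function names are mine):

```python
import math
import random

def sample_exponential(lam: float, rng: random.Random) -> float:
    # Inverse transform: draw u ~ U[0, 1], then return x = F^{-1}(u).
    # For Exp(lam), F(x) = 1 - exp(-lam * x), so F^{-1}(u) = -ln(1 - u) / lam.
    u = rng.random()
    return -math.log(1.0 - u) / lam

rng = random.Random(42)
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)  # the true mean of Exp(2) is 1/2
```

The sample mean converges to $1/\lambda = 0.5$, which is the expected value of $Exp(2)$: we sampled a non-uniform distribution using only a uniform source.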
Our solution is to **sample from a simple distribution, transform** it into our target distribution, and then **compare** the result **with** a **sampled subset** of the **complex distribution**

We then train our network to make better transformations

### Direct Learning

We measure the MMD (Maximum Mean Discrepancy) distance between our generated set and the real one

### Indirect Learning

We introduce the concept of a **discriminator**, which will guess whether our image was generated or comes from the actual distribution. We then update our model based on the score given by the **discriminator**

## Generative Adversarial Networks

If we use the **indirect learning** approach, we need a `Network` capable of classifying generated and genuine content according to their labels. However, for the same reasons as with **generative models**, we don't have such a `Network` readily available; rather, we have to **learn** it.

## Training Phases

We can train both the **generator and discriminator** together. They will `backpropagate` based on the **classification errors** of the **discriminator**; however, they have 2 distinct objectives:

- `Generator`: **Maximize** this error
- `Discriminator`: **Minimize** this error

From a **game theory** view, they are playing a **zero-sum** game, and the perfect outcome is **when the discriminator guesses the label right 50% of the time**, i.e. it can do no better than chance

### Loss functions

Assuming that $G(\vec{z})$ is the output of the `generator`, $D(\vec{a})$ is the output of the `discriminator`, $y = 0$ labels generated content and $y = 1$ labels real content

- **`Generator`**:\
  We want to maximize the error of the `Discriminator` on generated samples, i.e. when $y = 0$

  $$
  \max_{G}\{ -(1 - y) \log(1 - D(G(\vec{z})))\} = \max_{G}\{- \log(1 - D(G(\vec{z})))\}
  $$

  However, early in training the `Discriminator` easily rejects generated samples, so $D(G(\vec{z})) \approx 0$: the term $\log(1 - D(G(\vec{z})))$ is then near $0$ and its gradient vanishes. We therefore use the **non-saturating** objective instead, which drives $D(G(\vec{z}))$ towards $1$ just like the original one:

  $$
  \begin{aligned}
  \min_{G}\{ (1 - y) \log(1 - D(G(\vec{z})))\} &= \min_{G}\{\log(1 - D(G(\vec{z})))\} \\
  &\approx \min_{G}\{- \log(D(G(\vec{z})))\}
  \end{aligned}
  $$

- **`Discriminator`**: \
  We want to minimize its classification error over both real ($y = 1$) and generated ($y = 0$) samples

  $$
  \min_{D}\{ -y \log(D(\vec{x})) - (1 - y) \log(1 - D(G(\vec{z})))\}
  $$

Putting the two together, they are playing a `minimax` game:

$$
\min_{D} \max_{G} \{ - E_{x \sim Data}[\log(D(\vec{x}))] - E_{z \sim Noise}[\log(1 - D(G(\vec{z})))] \}
$$
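The switch from the saturating to the non-saturating generator loss is usually justified by gradient magnitudes: writing $D = \sigma(a)$ for the discriminator's logit $a$, the gradient of $\log(1 - \sigma(a))$ vanishes exactly when the discriminator confidently rejects fakes ($a \ll 0$), while the gradient of $-\log(\sigma(a))$ stays large there. A small numeric sketch (pure Python, function names are mine):

```python
import math

def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(a: float) -> float:
    # d/da [ log(1 - sigmoid(a)) ] = -sigmoid(a)
    return -sigmoid(a)

def non_saturating_grad(a: float) -> float:
    # d/da [ -log(sigmoid(a)) ] = sigmoid(a) - 1
    return sigmoid(a) - 1.0

# Early in training D confidently rejects fakes: logit a << 0, D(G(z)) ~ 0
a = -5.0
print(abs(saturating_grad(a)))      # ~0.0067: almost no learning signal
print(abs(non_saturating_grad(a)))  # ~0.9933: strong learning signal
```

Both objectives push $D(G(\vec{z}))$ towards $1$, but only the non-saturating one gives the generator a usable gradient when it is still producing obvious fakes.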