In particular, we demonstrated that by crafting such a variable, it becomes
possible to sample $X$ and $Y$ by sampling $U$
> [!NOTE]
> The last passage states that a uniform variable is less than a value that
> comes from a distribution. Since $F_{U}(x) = P(U \leq x) = x$, the last
> part holds.
>
> We can apply the CDF of $X$ inside the probability without inverting the
> inequality, because any CDF is monotonically non-decreasing
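As a minimal sketch of this inverse-transform idea (the exponential distribution is my illustrative choice of target, not from the notes): if $U \sim \text{Uniform}(0, 1)$, then $F^{-1}(U)$ is distributed according to the CDF $F$.

```python
import math
import random

def sample_exponential(lam: float) -> float:
    """Inverse transform sampling for Exponential(lam).

    If U ~ Uniform(0, 1), then X = F^{-1}(U) has CDF F.
    Here F(x) = 1 - exp(-lam * x), so F^{-1}(u) = -ln(1 - u) / lam.
    """
    u = random.random()
    return -math.log(1.0 - u) / lam

random.seed(0)
samples = [sample_exponential(2.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)  # should be close to 1 / lam = 0.5
```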
## Generative Models
The idea is that, given an $n \times n$ image flattened into a vector $N$ of
stacked pixels, not all such vectors will be a dog photo, so we must find the
probability distribution associated with dog images
However, we have little to no information about the **actual distribution**
of dog images.
Our solution is to **sample from a simple distribution, transform** it
into our target distribution, and then **compare** the result **with** a
**sampled subset** of the **complex distribution**.
We then train our network to produce better transformations
### Direct Learning
We measure the MMD (Maximum Mean Discrepancy) distance between our generated set and the real one
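A minimal sketch of such a comparison, using a Gaussian-kernel estimate of the squared MMD between two sample sets (the kernel choice and `sigma` value are illustrative assumptions, not from the notes):

```python
import numpy as np

def mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD.

    x: (n, d) generated samples, y: (m, d) real samples.
    """
    def rbf(a, b):
        # Pairwise squared Euclidean distances, then Gaussian kernel values.
        d2 = (np.sum(a**2, axis=1)[:, None]
              + np.sum(b**2, axis=1)[None, :]
              - 2.0 * a @ b.T)
        return np.exp(-d2 / (2.0 * sigma**2))
    return rbf(x, x).mean() + rbf(y, y).mean() - 2.0 * rbf(x, y).mean()

rng = np.random.default_rng(0)
# Two samples from the same distribution: MMD^2 should be near 0.
same = mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
# Samples from shifted distributions: MMD^2 should be clearly larger.
diff = mmd2(rng.normal(size=(200, 2)), rng.normal(3.0, 1.0, size=(200, 2)))
```

Training then amounts to pushing the transformation's parameters in the direction that decreases this distance.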
### Indirect Learning
We introduce a **discriminator**, which guesses whether
our image was generated or comes from the actual distribution.
We then update our model based on the score assigned by the **discriminator**
## Generative Adversarial Networks
If we use the **indirect learning** approach, we need a `Network` capable of
classifying generated and genuine content according to their labels.
However, for the same reasons as with **generative models**, we don't have such
a `Network` readily available; rather, we have to **learn** it.
## Training Phases
We can train both the **generator and discriminator** together. They both
`backpropagate` based on the **classification errors** of the **discriminator**,
however they have two distinct objectives:
- `Generator`: **maximize** this error
- `Discriminator`: **minimize** this error
From a **game theory** point of view, they are playing a **zero-sum** game, and
the perfect outcome is **when the discriminator guesses correctly 50% of the
time**, i.e. it can do no better than chance
### Loss functions
Assume that $G(\vec{z})$ is the output of the `generator`, $D(\vec{a})$ is the output of the `discriminator`, $y = 0$ labels generated content and $y = 1$ labels real content
- **`Generator`**:\
We want to maximize the error of the `Discriminator` $D(\vec{a})$ when $y = 0$:
$$
\max_{G}\{ -(1 -y) \log(1 - D(G(\vec{z})))\} =
\max_{G}\{- \log(1 - D(G(\vec{z})))\}
$$
- **`Discriminator`**: \
We want to minimize its classification error on both real ($y = 1$) and generated ($y = 0$) samples:
$$
\min_{D}\{-y \log(D(\vec{a})) - (1 - y) \log(1 - D(\vec{a}))\}
$$
However, if the `Discriminator` is good at its job, $D(G(\vec{z})) \approx 0$ and the
`Generator` objective above will be near $0$ (vanishing gradients), so in practice the
`Generator` uses this instead:
$$
\begin{aligned}
\max_{G}\{-\log(1 - D(G(\vec{z})))\}
&\approx \max_{G}\{\log(D(G(\vec{z})))\} \\
&= \min_{G}\{-\log(D(G(\vec{z})))\}
\end{aligned}
$$
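These objectives can be written out per sample; the sketch below (assuming a scalar discriminator output $D \in (0, 1)$; the function names are mine) contrasts the saturating and non-saturating generator losses:

```python
import math

def discriminator_loss(d_real: float, d_fake: float) -> float:
    # Binary cross-entropy: real samples labeled y = 1, fakes y = 0.
    return -math.log(d_real) - math.log(1.0 - d_fake)

def generator_loss_saturating(d_fake: float) -> float:
    # Minimizing log(1 - D(G(z))) == maximizing the discriminator's
    # error on generated samples.
    return math.log(1.0 - d_fake)

def generator_loss_nonsaturating(d_fake: float) -> float:
    # -log(D(G(z))): pushes D(G(z)) in the same direction, but keeps a
    # strong gradient even when the discriminator confidently rejects
    # fakes (D(G(z)) near 0).
    return -math.log(d_fake)

# When the discriminator easily spots fakes, D(G(z)) is near 0:
flat = generator_loss_saturating(0.01)      # close to 0, tiny gradient
steep = generator_loss_nonsaturating(0.01)  # large, strong signal
```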
Ultimately, the two are playing a `minimax` game: