In particular, we demonstrated that by crafting such a variable, it becomes
possible to sample $X$ and $Y$ by sampling $U$
> [!NOTE]
> The last passage states that a uniform variable is less than a value that
> comes from a distribution. Since $F_{U}(x) = P(U \leq x) = x$, the last
> part holds.
>
> We can apply the CDF of $X$ inside the probability without inverting the
> inequality, because any CDF is monotonically non-decreasing
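As a minimal sketch of this inverse-transform idea (the exponential distribution is my illustrative choice of target, not from the notes): if $U \sim \text{Uniform}(0, 1)$, then $F^{-1}(U)$ is distributed according to the CDF $F$.

```python
import math
import random

def sample_exponential(lam: float) -> float:
    """Inverse transform sampling for Exponential(lam).

    If U ~ Uniform(0, 1), then X = F^{-1}(U) has CDF F.
    Here F(x) = 1 - exp(-lam * x), so F^{-1}(u) = -ln(1 - u) / lam.
    """
    u = random.random()
    return -math.log(1.0 - u) / lam

random.seed(0)
samples = [sample_exponential(2.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)  # should be close to 1 / lam = 0.5
```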
## Generative Models
The idea is that, given an $n \times n$ image flattened into a vector $N$ of
stacked pixels, not all such vectors will be a dog photo, so we must find the
probability distribution associated with dog images
However, we have little to no information about the **actual distribution**
of dog images.
Our solution is to **sample from a simple distribution, transform** it
into our target distribution, and then **compare** the result **with** a
**sampled subset** of the **complex distribution**.
We then train our network to produce better transformations
### Direct Learning
We measure the MMD (Maximum Mean Discrepancy) distance between our generated set and the real one
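A minimal sketch of such a comparison, using a Gaussian-kernel estimate of the squared MMD between two sample sets (the kernel choice and `sigma` value are illustrative assumptions, not from the notes):

```python
import numpy as np

def mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD.

    x: (n, d) generated samples, y: (m, d) real samples.
    """
    def rbf(a, b):
        # Pairwise squared Euclidean distances, then Gaussian kernel values.
        d2 = (np.sum(a**2, axis=1)[:, None]
              + np.sum(b**2, axis=1)[None, :]
              - 2.0 * a @ b.T)
        return np.exp(-d2 / (2.0 * sigma**2))
    return rbf(x, x).mean() + rbf(y, y).mean() - 2.0 * rbf(x, y).mean()

rng = np.random.default_rng(0)
# Two samples from the same distribution: MMD^2 should be near 0.
same = mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
# Samples from shifted distributions: MMD^2 should be clearly larger.
diff = mmd2(rng.normal(size=(200, 2)), rng.normal(3.0, 1.0, size=(200, 2)))
```

Training then amounts to pushing the transformation's parameters in the direction that decreases this distance.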
### Indirect Learning
We introduce a **discriminator**, which guesses whether
our image was generated or comes from the actual distribution.
We then update our model based on the score assigned by the **discriminator**
## Generative Adversarial Networks
If we use the **indirect learning** approach, we need a `Network` capable of
classifying generated and genuine content according to their labels.
However, for the same reasons as with **generative models**, we don't have such
a `Network` readily available; rather, we have to **learn** it.
## Training Phases
We can train both the **generator and discriminator** together. They both
`backpropagate` based on the **classification errors** of the **discriminator**,
however they have two distinct objectives:
- `Generator`: **maximize** this error
- `Discriminator`: **minimize** this error
From a **game theory** point of view, they are playing a **zero-sum** game, and
the perfect outcome is **when the discriminator guesses correctly 50% of the
time**, i.e. it can do no better than chance
### Loss functions
Assume that $G(\vec{z})$ is the output of the `generator`, $D(\vec{a})$ is the output of the `discriminator`, $y = 0$ labels generated content and $y = 1$ labels real content
- **`Generator`**:\
We want to maximize the error of the `Discriminator` $D(\vec{a})$ when $y = 0$:
$$
\max_{G}\{ -(1 -y) \log(1 - D(G(\vec{z})))\} =
\max_{G}\{- \log(1 - D(G(\vec{z})))\}
$$
- **`Discriminator`**: \
We want to minimize its classification error on both real ($y = 1$) and generated ($y = 0$) samples:
$$
\min_{D}\{-y \log(D(\vec{a})) - (1 - y) \log(1 - D(\vec{a}))\}
$$
However, if the `Discriminator` is good at its job, $D(G(\vec{z})) \approx 0$ and the
`Generator` objective above will be near $0$ (vanishing gradients), so in practice the
`Generator` uses this instead:
$$
\begin{aligned}
\max_{G}\{-\log(1 - D(G(\vec{z})))\}
&\approx \max_{G}\{\log(D(G(\vec{z})))\} \\
&= \min_{G}\{-\log(D(G(\vec{z})))\}
\end{aligned}
$$
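These objectives can be written out per sample; the sketch below (assuming a scalar discriminator output $D \in (0, 1)$; the function names are mine) contrasts the saturating and non-saturating generator losses:

```python
import math

def discriminator_loss(d_real: float, d_fake: float) -> float:
    # Binary cross-entropy: real samples labeled y = 1, fakes y = 0.
    return -math.log(d_real) - math.log(1.0 - d_fake)

def generator_loss_saturating(d_fake: float) -> float:
    # Minimizing log(1 - D(G(z))) == maximizing the discriminator's
    # error on generated samples.
    return math.log(1.0 - d_fake)

def generator_loss_nonsaturating(d_fake: float) -> float:
    # -log(D(G(z))): pushes D(G(z)) in the same direction, but keeps a
    # strong gradient even when the discriminator confidently rejects
    # fakes (D(G(z)) near 0).
    return -math.log(d_fake)

# When the discriminator easily spots fakes, D(G(z)) is near 0:
flat = generator_loss_saturating(0.01)      # close to 0, tiny gradient
steep = generator_loss_nonsaturating(0.01)  # large, strong signal
```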
Ultimately, the two are playing a `minimax` game: