Revised Chapter 3 and added definitions to appendix

Christian Risi
2025-11-17 17:04:33 +01:00
parent e07a80649a
commit 247daf4d56
3 changed files with 84 additions and 11 deletions


@@ -96,8 +96,9 @@ $$
RReLU(x) =
\begin{cases}
x \text{ if } x \geq 0 \\
-a\cdot x \text{ if } x < 0
-\end{cases}
+\vec{a} \cdot x \text{ if } x < 0
+\end{cases} \\
+a_{i,j} \sim U (l, u): \;l < u \wedge l, u \in [0, 1[
$$
It is not ***differentiable*** at $0$, but there the derivative is usually set to $\vec{a}$ or $1$, though any value between them is
@@ -108,19 +109,22 @@ $$
\frac{d\,RReLU(x)}{dx} &=
\begin{cases}
1 \text{ if } x \geq 0 \\
-a_{i,j} \cdot x_{i,j} \text{ if } x < 0
+\vec{a} \text{ if } x < 0
\end{cases} \\
a_{i,j} \sim U (l, u)&: \;l < u \wedge l, u \in [0, 1[
\end{aligned}
$$
-Here $\vec{a}$ is a **random** paramter that is
+Here $\vec{a}$ is a **random** parameter that is
**always sampled** during **training** and **fixed**
during **tests and inference** to $\frac{l + u}{2}$
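
As a minimal sketch of this train/inference behaviour in Python (the bounds $l = 1/8$ and $u = 1/3$ below are an assumed choice, not taken from the text):

```python
import random

# Assumed bounds for the uniform distribution; any 0 <= l < u < 1 works.
L_BOUND, U_BOUND = 1 / 8, 1 / 3

def rrelu(x: float, training: bool = True) -> float:
    """RReLU on a single value: identity for x >= 0, slope a for x < 0."""
    if x >= 0:
        return x
    # Training: sample a fresh slope a ~ U(l, u); inference: fix a = (l + u) / 2.
    a = random.uniform(L_BOUND, U_BOUND) if training else (L_BOUND + U_BOUND) / 2
    return a * x

# The same negative input varies across training calls but is deterministic at inference.
print([round(rrelu(-2.0, training=True), 3) for _ in range(3)])
print(rrelu(-2.0, training=False))  # -2 * (1/8 + 1/3) / 2
```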
### ELU
This function pushes the average output toward $0$, so training may converge faster
$$
ELU(x) =
\begin{cases}
@@ -207,6 +211,9 @@ space of maneuver.
### Softplus
This is a smoothed version of a [ReLU](#relu) and as such outputs only positive
values
$$
Softplus(x) =
\frac{1}{\beta} \cdot
@@ -224,7 +231,7 @@ to **constrain the output to positive values**.
The **larger $\beta$**, the **more similar to [ReLU](#relu)** the function becomes
$$
-\frac{d\,Softplus(x)}{dx} = \frac{e^{b*x}}{e^{b*x} + 1}
+\frac{d\,Softplus(x)}{dx} = \frac{e^{\beta*x}}{e^{\beta*x} + 1}
$$
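
A rough Python sketch of this behaviour, including the linear fallback for large $\beta \cdot x$ noted just below (the threshold value of $20$ is an assumed default, not given in the text):

```python
import math

def softplus(x: float, beta: float = 1.0, threshold: float = 20.0) -> float:
    # Revert to the linear branch for large beta*x to avoid overflow in exp().
    if beta * x > threshold:
        return x
    return (1.0 / beta) * math.log1p(math.exp(beta * x))

def softplus_grad(x: float, beta: float = 1.0) -> float:
    # d/dx Softplus(x) = e^(beta*x) / (e^(beta*x) + 1), i.e. a sigmoid in beta*x.
    return 1.0 / (1.0 + math.exp(-beta * x))

# Finite-difference check of the derivative at a sample point.
x, eps = 0.7, 1e-6
numeric = (softplus(x + eps) - softplus(x - eps)) / (2 * eps)
print(abs(numeric - softplus_grad(x)) < 1e-5)  # True
```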
For **numerical stability**, when $\beta \cdot x > threshold$ the
@@ -232,12 +239,16 @@ implementation **reverts back to a linear function**
### GELU[^GELU]
This function saturates over negative values, much like a ramp does.
$$
-GELU(x) = x \cdot \Phi(x)
+GELU(x) = x \cdot \Phi(x) \\
+\Phi(x) = P(X \leq x) \,\, X \sim \mathcal{N}(0, 1)
$$
This can be considered a **smooth [ReLU](#relu)**;
however, it is **not monotonic**
$$
\frac{d\,GELU(x)}{dx} = \Phi(x) + x\cdot \phi(x) \\
\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}
$$
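
A plain-Python sketch of these formulas, computing $\Phi$ through the error function and checking the derivative numerically ($\phi$ is the standard normal density; the tanh approximation used by some implementations is not shown):

```python
import math

def std_normal_cdf(x: float) -> float:
    # Phi(x) for X ~ N(0, 1), written via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x: float) -> float:
    # phi(x), the density of N(0, 1).
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gelu(x: float) -> float:
    return x * std_normal_cdf(x)

def gelu_grad(x: float) -> float:
    # d/dx GELU(x) = Phi(x) + x * phi(x)
    return std_normal_cdf(x) + x * std_normal_pdf(x)

# Not monotonic on the negative side: a small dip below zero before saturating.
print(gelu(-0.5), gelu(-2.0))   # about -0.154 vs about -0.045

# Finite-difference check of the derivative.
x, eps = -0.5, 1e-6
numeric = (gelu(x + eps) - gelu(x - eps)) / (2 * eps)
print(abs(numeric - gelu_grad(x)) < 1e-5)  # True
```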
@@ -388,7 +399,7 @@ Hardtanh(x) =
$$
It is not ***differentiable***, but
-**works well with values around $0$**.
+**works well with small values around $0$**.
$$
\frac{d\,Hardtanh(x)}{dx} =