Revised Chapter 3 and added definitions to appendix
@@ -96,8 +96,9 @@ $$
RReLU(x) =
\begin{cases}
x \text{ if } x \geq 0 \\
\vec{a} \cdot x \text{ if } x < 0
\end{cases} \\
a_{i,j} \sim U(l, u): \; l < u \wedge l, u \in [0, 1[
$$

It is not ***differentiable***, but at $0$ we usually set the value to $\vec{a}$ or $1$, though any value between them is
@@ -108,19 +109,22 @@ $$
\frac{d\,RReLU(x)}{dx} &=
\begin{cases}
1 \text{ if } x \geq 0 \\
\vec{a} \text{ if } x < 0
\end{cases} \\

a_{i,j} \sim U(l, u)&: \; l < u \wedge l, u \in [0, 1[

\end{aligned}
$$

Here $\vec{a}$ is a **random** parameter that is
**always sampled** during **training** and **fixed**
during **tests and inference** to $\frac{l + u}{2}$

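This sample-while-training, fix-at-inference behaviour can be sketched in a few lines of NumPy; the default bounds $l = 1/8$ and $u = 1/3$ here are an assumed choice, not something fixed by the definition above:

```python
import numpy as np

def rrelu(x, l=1/8, u=1/3, training=True, rng=None):
    """Sketch of RReLU: the negative slope a is sampled from U(l, u)
    per element while training and fixed to (l + u) / 2 otherwise."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=float)
    if training:
        a = rng.uniform(l, u, size=x.shape)  # fresh sample on every forward pass
    else:
        a = (l + u) / 2                      # fixed slope for tests and inference
    return np.where(x >= 0, x, a * x)
```

In evaluation mode, e.g. `rrelu(np.array([-2.0, 3.0]), training=False)`, every negative input is scaled by the same fixed $\frac{l + u}{2}$.
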
### ELU

This function allows the outputs to average around $0$, so the network may
converge faster

$$
ELU(x) =
\begin{cases}
@@ -207,6 +211,9 @@ space of maneuver.

### Softplus

This is a smoothed version of a [ReLU](#relu) and as such outputs only positive
values

$$
Softplus(x) =
\frac{1}{\beta} \cdot
@@ -224,7 +231,7 @@ to **constrain the output to positive values**.
The **larger $\beta$**, the **more similar to [ReLU](#relu)** it becomes

$$
\frac{d\,Softplus(x)}{dx} = \frac{e^{\beta x}}{e^{\beta x} + 1}
$$

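A minimal NumPy sketch of Softplus with the linear fallback; the `threshold` value of $20$ is an assumed default, not something stated here:

```python
import numpy as np

def softplus(x, beta=1.0, threshold=20.0):
    """Softplus(x) = (1 / beta) * log(1 + exp(beta * x)).
    Where beta * x exceeds the threshold, exp would overflow,
    and the value is numerically indistinguishable from x itself."""
    x = np.asarray(x, dtype=float)
    bx = beta * x
    safe = np.log1p(np.exp(np.minimum(bx, threshold))) / beta  # clamp keeps exp finite
    return np.where(bx > threshold, x, safe)
```

Raising `beta` pushes the curve toward ReLU: `softplus(-1.0, beta=10.0)` is already about $5 \cdot 10^{-6}$, while `softplus(-1.0)` is roughly $0.31$.
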
For **numerical stability**, when $\beta \cdot x > \text{threshold}$ the
@@ -232,12 +239,16 @@ implementation **reverts back to a linear function**

### GELU[^GELU]

This function saturates over negative values, like ramp functions do.

$$
GELU(x) = x \cdot \Phi(x) \\
\Phi(x) = P(X \leq x), \; X \sim \mathcal{N}(0, 1)
$$

This can be considered a **smooth [ReLU](#relu)**,
however it's **not monotonic**

$$
\frac{d\,GELU(x)}{dx} = \Phi(x) + x \cdot \varphi(x)
$$

where $\varphi$ is the density of $\mathcal{N}(0, 1)$

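A tiny Python sketch of the exact form, using the error function for $\Phi$ rather than the tanh approximation:

```python
import math

def gelu(x: float) -> float:
    """GELU(x) = x * Phi(x), with Phi the CDF of N(0, 1)
    written through the error function."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_grad(x: float) -> float:
    """d GELU / dx = Phi(x) + x * phi(x), phi being the N(0, 1) density."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf
```

`gelu_grad(-1.0)` is about $-0.08$, i.e. the slope turns negative on part of the negative axis, which is exactly why the function is not monotonic.
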
@@ -388,7 +399,7 @@ Hardtanh(x) =
$$

It is not ***differentiable***, but
**works well for small values around $0$**.

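A minimal sketch, assuming the common bounds of $-1$ and $1$:

```python
import numpy as np

def hardtanh(x, min_val=-1.0, max_val=1.0):
    """Hardtanh sketch: identity inside [min_val, max_val],
    clamped to the nearer bound outside of it."""
    return np.clip(x, min_val, max_val)
```

Inside the interval the slope is exactly $1$, which is the "works well around $0$" behaviour noted above.
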
$$
\frac{d\,Hardtanh(x)}{dx} =