Revised Chapter 3 and added definitions to appendix
@@ -96,8 +96,9 @@ $$
RReLU(x) =
\begin{cases}
x \text{ if } x \geq 0 \\
\vec{a} \cdot x \text{ if } x < 0
\end{cases} \\
a_{i,j} \sim U(l, u): \; l < u \wedge l, u \in [0, 1[
$$

It is not ***differentiable***, but at $0$ we usually set the value to $\vec{a}$ or $1$, though any value between them is
@@ -108,19 +109,22 @@ $$
\frac{d\,RReLU(x)}{dx} &=
\begin{cases}
1 \text{ if } x \geq 0 \\
\vec{a} \text{ if } x < 0
\end{cases} \\

a_{i,j} \sim U(l, u)&: \; l < u \wedge l, u \in [0, 1[

\end{aligned}
$$

Here $\vec{a}$ is a **random** parameter that is
**always sampled** during **training** and **fixed**
during **tests and inference** to $\frac{l + u}{2}$

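This sample-while-training, fix-at-inference behaviour can be sketched in a few lines of NumPy; the default bounds $l = 1/8$ and $u = 1/3$ here are an assumed choice, not something fixed by the definition above:

```python
import numpy as np

def rrelu(x, l=1/8, u=1/3, training=True, rng=None):
    """Sketch of RReLU: the negative slope a is sampled from U(l, u)
    per element while training and fixed to (l + u) / 2 otherwise."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=float)
    if training:
        a = rng.uniform(l, u, size=x.shape)  # fresh sample on every forward pass
    else:
        a = (l + u) / 2                      # fixed slope for tests and inference
    return np.where(x >= 0, x, a * x)
```

In evaluation mode, e.g. `rrelu(np.array([-2.0, 3.0]), training=False)`, every negative input is scaled by the same fixed $\frac{l + u}{2}$.
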
### ELU

This function allows the outputs to average around $0$, so the network may
converge faster

$$
ELU(x) =
\begin{cases}
@@ -207,6 +211,9 @@ space of maneuver.

### Softplus

This is a smoothed version of a [ReLU](#relu) and as such outputs only positive
values

$$
Softplus(x) =
\frac{1}{\beta} \cdot
@@ -224,7 +231,7 @@ to **constrain the output to positive values**.
The **larger $\beta$**, the **more similar to [ReLU](#relu)** it becomes

$$
\frac{d\,Softplus(x)}{dx} = \frac{e^{\beta x}}{e^{\beta x} + 1}
$$

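A minimal NumPy sketch of Softplus with the linear fallback; the `threshold` value of $20$ is an assumed default, not something stated here:

```python
import numpy as np

def softplus(x, beta=1.0, threshold=20.0):
    """Softplus(x) = (1 / beta) * log(1 + exp(beta * x)).
    Where beta * x exceeds the threshold, exp would overflow,
    and the value is numerically indistinguishable from x itself."""
    x = np.asarray(x, dtype=float)
    bx = beta * x
    safe = np.log1p(np.exp(np.minimum(bx, threshold))) / beta  # clamp keeps exp finite
    return np.where(bx > threshold, x, safe)
```

Raising `beta` pushes the curve toward ReLU: `softplus(-1.0, beta=10.0)` is already about $5 \cdot 10^{-6}$, while `softplus(-1.0)` is roughly $0.31$.
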
For **numerical stability**, when $\beta \cdot x > \text{threshold}$ the
@@ -232,12 +239,16 @@ implementation **reverts back to a linear function**

### GELU[^GELU]

This function saturates over negative values, like ramp functions do.

$$
GELU(x) = x \cdot \Phi(x) \\
\Phi(x) = P(X \leq x), \; X \sim \mathcal{N}(0, 1)
$$

This can be considered a **smooth [ReLU](#relu)**,
however it's **not monotonic**

$$
\frac{d\,GELU(x)}{dx} = \Phi(x) + x \cdot \varphi(x)
$$

where $\varphi$ is the density of $\mathcal{N}(0, 1)$

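A tiny Python sketch of the exact form, using the error function for $\Phi$ rather than the tanh approximation:

```python
import math

def gelu(x: float) -> float:
    """GELU(x) = x * Phi(x), with Phi the CDF of N(0, 1)
    written through the error function."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_grad(x: float) -> float:
    """d GELU / dx = Phi(x) + x * phi(x), phi being the N(0, 1) density."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf
```

`gelu_grad(-1.0)` is about $-0.08$, i.e. the slope turns negative on part of the negative axis, which is exactly why the function is not monotonic.
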
@@ -388,7 +399,7 @@ Hardtanh(x) =
$$

It is not ***differentiable***, but
**works well for small values around $0$**.

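A minimal sketch, assuming the common bounds of $-1$ and $1$:

```python
import numpy as np

def hardtanh(x, min_val=-1.0, max_val=1.0):
    """Hardtanh sketch: identity inside [min_val, max_val],
    clamped to the nearer bound outside of it."""
    return np.clip(x, min_val, max_val)
```

Inside the interval the slope is exactly $1$, which is the "works well around $0$" behaviour noted above.
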
$$
\frac{d\,Hardtanh(x)}{dx} =