Added 4th Chapter
This commit is contained in:
parent
73c11ebf9d
commit
f1f89417a9
210
Chapters/4-Loss-Functions/INDEX.md
Normal file
@ -0,0 +1,210 @@
# Loss Functions

## MSELoss | AKA L2

$$
MSE(\vec{\bar{y}}, \vec{y}) = \begin{bmatrix}
(\bar{y}_1 - y_1)^2 \\
(\bar{y}_2 - y_2)^2 \\
... \\
(\bar{y}_n - y_n)^2 \\
\end{bmatrix}^T
$$

Though, it can be reduced to a **scalar** by taking
either the `sum` of all the values or the `mean`.
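
A minimal `Pytorch` sketch of the three `reduction` modes (the tensor values are made up for illustration):

```python
import torch
import torch.nn as nn

y_hat = torch.tensor([2.5, 0.0, 2.0, 8.0])  # predictions (illustrative)
y = torch.tensor([3.0, -0.5, 2.0, 7.0])     # targets (illustrative)

# reduction="none" keeps the per-element vector of squared errors
per_element = nn.MSELoss(reduction="none")(y_hat, y)

# "sum" and "mean" collapse the vector to a scalar
total = nn.MSELoss(reduction="sum")(y_hat, y)
average = nn.MSELoss(reduction="mean")(y_hat, y)  # the default

print(per_element, total, average)
```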

## L1Loss

This measures the **M**ean **A**bsolute **E**rror

$$
L1(\vec{\bar{y}}, \vec{y}) = \begin{bmatrix}
|\bar{y}_1 - y_1| \\
|\bar{y}_2 - y_2| \\
... \\
|\bar{y}_n - y_n| \\
\end{bmatrix}^T
$$

This is more **robust against outliers** as their
value is not **squared**.

However, this is not ***differentiable*** around
**small values** (at $0$), thus the existence of
[SmoothL1Loss](#smoothl1loss--aka-huber-loss)

As [MSELoss](#mseloss--aka-l2), it can be reduced to
a **scalar**
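
A small sketch contrasting the two (the outlier value is made up): the squared error explodes for the outlier, while the absolute error grows only linearly.

```python
import torch
import torch.nn as nn

y_hat = torch.tensor([1.0, 2.0, 100.0])  # the last prediction is an outlier
y = torch.tensor([1.5, 2.5, 3.0])

print(nn.L1Loss(reduction="none")(y_hat, y))   # tensor([ 0.5,  0.5, 97.0])
print(nn.MSELoss(reduction="none")(y_hat, y))  # tensor([2.50e-01, 2.50e-01, 9.409e+03])
```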

## SmoothL1Loss | AKA Huber Loss

> [!NOTE]
> Called `Elastic Network` when used as an
> **objective function**

$$
SmoothL1(\vec{\bar{y}}, \vec{y}) = \begin{bmatrix}
l_1 \\
l_2 \\
... \\
l_n \\
\end{bmatrix}^T;\\
l_n = \begin{cases}
\frac{0.5 \cdot (\bar{y}_n - y_n)^2}{\beta}
&\text{ if }
|\bar{y}_n - y_n| < \beta \\
|\bar{y}_n - y_n| - 0.5 \cdot \beta
&\text{ if }
|\bar{y}_n - y_n| \geq \beta
\end{cases}
$$

This behaves like [MSELoss](#mseloss--aka-l2) for
values **under a threshold** $\beta$ and like [L1Loss](#l1loss)
**otherwise**.

It has the **advantage** of being **differentiable**
and is **very useful for `computer vision`**

As [MSELoss](#mseloss--aka-l2), it can be reduced to
a **scalar**
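
A minimal sketch of the `Pytorch` module, where the threshold $\beta$ is exposed as the `beta` argument (the input values are made up):

```python
import torch
import torch.nn as nn

y_hat = torch.tensor([0.1, 5.0])  # one small error, one large error
y = torch.tensor([0.0, 0.0])

loss = nn.SmoothL1Loss(beta=1.0, reduction="none")
# small error -> quadratic: 0.5 * 0.1**2 / 1 = 0.005
# large error -> linear:    5.0 - 0.5 * 1   = 4.5
print(loss(y_hat, y))
```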

## L1 vs L2 For Image Classification

Usually with `L2` losses, we get a **blurrier** image as
opposed to the `L1` loss. This comes from the fact that
`L2` averages all values and does not respect
`distances`.

Moreover, since `L1` takes the absolute difference, its
gradient is constant over **all values** and **does not
decrease towards $0$**
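
A tiny sketch of that last point (the scalar errors are chosen arbitrarily): the gradient of $|x|$ stays at $1$ no matter how small the error gets, while the gradient of $x^2$ shrinks along with it.

```python
import torch

for error in (4.0, 0.1):
    x1 = torch.tensor(error, requires_grad=True)
    x1.abs().backward()      # d|x|/dx = 1 for x > 0, constant
    x2 = torch.tensor(error, requires_grad=True)
    (x2 ** 2).backward()     # d(x^2)/dx = 2x, shrinks towards 0
    print(error, x1.grad.item(), x2.grad.item())
```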

## NLLLoss[^NLLLoss]

This is basically the ***distance*** towards the
real ***class tags***.

$$
NLLLoss(\vec{\bar{y}}, \vec{y}) = \begin{bmatrix}
l_1 \\
l_2 \\
... \\
l_n \\
\end{bmatrix}^T;\\
l_n = - w_{y_n} \cdot \bar{y}_{n, y_n}
$$

Even here there's the possibility to reduce the vector
to a **scalar**:

$$
NLLLoss(\vec{\bar{y}}, \vec{y}, mode) = \begin{cases}
\frac{
\sum^N_{n=1} l_n
}{
\sum^N_{n=1} w_{y_n}
} & \text{ if mode = "mean"}\\
\sum^N_{n=1} l_n & \text{ if mode = "sum"}
\end{cases}
$$

Technically speaking, in `Pytorch` you have the
possibility to ***exclude*** some `classes` during
training (via `ignore_index`). Moreover it's possible to
pass `weights` for the `classes`, **useful when dealing
with an unbalanced training set**
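
A minimal sketch of both options (the class count, weights and ignored class are made up):

```python
import torch
import torch.nn as nn

# 4 points, 3 classes: the input rows are log-probabilities (e.g. from LogSoftmax)
log_probs = torch.log_softmax(torch.randn(4, 3), dim=1)
targets = torch.tensor([0, 2, 1, 2])  # real class tag of each point

loss = nn.NLLLoss(
    weight=torch.tensor([1.0, 2.0, 0.5]),  # per-class weights for unbalanced data
    ignore_index=2,                        # points tagged with class 2 are skipped
)
print(loss(log_probs, targets))
```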

> [!TIP]
>
> So, what's $\vec{\bar{y}}$?
>
> It's the `tensor` containing the probability of
> each `point` belonging to each of the `classes`.
>
> For example, let's say we have 10 `points` and 3
> `classes`, then $\vec{\bar{y}}_{p,c}$ is the
> **`probability` of `point` `p` belonging to `class`
> `c`**
>
> This is why we have
> $l_n = - w_{y_n} \cdot \bar{y}_{n, y_n}$.
> In fact, we take the error over the
> **actual `class tag` of that `point`**.
>
> To get a clear idea, check this website[^NLLLoss]

<!-- Comment to suppress linter -->

> [!WARNING]
>
> Technically speaking, the `input` data should come
> from a `LogLikelihood` like
> [LogSoftmax](./../3-Activation-Functions/INDEX.md#logsoftmax).
> However this is not enforced by `Pytorch`

## CrossEntropyLoss[^Anelli-CEL]

$$
CrossEntropyLoss(\vec{\bar{y}}, \vec{y}) = \begin{bmatrix}
l_1 \\
l_2 \\
... \\
l_n \\
\end{bmatrix}^T;\\
l_n = - w_{y_n} \cdot \ln\left(
\frac{
e^{\bar{y}_{n, y_n}}
}{
\sum_c e^{\bar{y}_{n, c}}
}
\right)
$$

Even here there's the possibility to reduce the vector
to a **scalar**:

$$
CrossEntropyLoss(\vec{\bar{y}}, \vec{y}, mode) = \begin{cases}
\frac{
\sum^N_{n=1} l_n
}{
\sum^N_{n=1} w_{y_n}
} & \text{ if mode = "mean"}\\
\sum^N_{n=1} l_n & \text{ if mode = "sum"}
\end{cases}
$$

> [!NOTE]
>
> This is basically [NLLLoss](#nllloss) with
> [LogSoftmax](./../3-Activation-Functions/INDEX.md#logsoftmax)
> applied to the `input` first, so it takes raw `logits`
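
A minimal sketch of that equivalence (random logits, the shapes are illustrative):

```python
import torch
import torch.nn as nn

logits = torch.randn(5, 3)           # 5 points, 3 classes, raw scores
targets = torch.randint(0, 3, (5,))  # real class tag of each point

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(torch.allclose(ce, nll))  # True
```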

## AdaptiveLogSoftmaxWithLoss

## BCELoss | AKA Binary Cross Entropy Loss

## KLDivLoss | AKA Kullback-Leibler Divergence Loss

## BCEWithLogitsLoss

## HingeEmbeddingLoss

## MarginRankingLoss

## TripletMarginLoss

## SoftMarginLoss

## MultiLabelMarginLoss

## CosineEmbeddingLoss

[^NLLLoss]: [Remy Lau | Towards Data Science | 4th April 2025](https://towardsdatascience.com/cross-entropy-negative-log-likelihood-and-all-that-jazz-47a95bd2e81/)

[^Anelli-CEL]: Anelli | Deep Learning PDF 4 pg. 11