Added other Loss Functions

Christian Risi 2025-04-15 17:21:47 +02:00
parent f614eaf565
commit 47eac8ff47

@ -187,24 +187,264 @@ $$
## AdaptiveLogSoftmaxWithLoss
<!-- TODO: Read https://arxiv.org/abs/1609.04309 -->
This is an ***approximate*** method to train models with ***large `outputs`*** on `GPUs`.
It is usually used when we have ***many `classes`*** and ***[imbalances](DEALING-WITH-IMBALANCES.md)*** in
our `training set`.
## BCELoss | AKA Binary Cross Entropy Loss
$$
BCELoss(\vec{\bar{y}}, \vec{y}) = \begin{bmatrix}
l_1 \\
l_2 \\
... \\
l_n \\
\end{bmatrix}^T;\\
l_n = - w_n \cdot \left(
y_n \ln{\bar{y}_n} + (1 - y_n) \cdot \ln{(1 - \bar{y}_n)}
\right)
$$
This is a ***special case of [Cross Entropy Loss](#crossentropyloss)*** with just 2 `classes`.
Here too, we can reduce the result with either the `mean` or the `sum` modifier.
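
As a minimal sketch of the formula above, the following pure-Python function computes the per-sample terms $l_n$ and then applies a reduction. The function name `bce_loss`, the optional `weights` argument and the `"none"` reduction are assumptions made for illustration; they only mimic the naming used by common libraries.

```python
import math

def bce_loss(y_pred, y_true, weights=None, reduction="mean"):
    """Binary cross entropy over lists of probabilities and 0/1 labels."""
    if weights is None:
        weights = [1.0] * len(y_pred)  # assumption: w_n = 1 when not given
    # l_n = -w_n * (y_n * ln(ybar_n) + (1 - y_n) * ln(1 - ybar_n))
    losses = [
        -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        for p, y, w in zip(y_pred, y_true, weights)
    ]
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    return losses  # no reduction: one loss per sample

y_pred = [0.9, 0.2, 0.6]  # predicted probabilities, strictly between 0 and 1
y_true = [1, 0, 1]
print(bce_loss(y_pred, y_true, reduction="none"))
print(bce_loss(y_pred, y_true, reduction="mean"))
print(bce_loss(y_pred, y_true, reduction="sum"))
```

Note that this sketch fails for predictions equal to exactly $0$ or $1$, since $\ln{0}$ is undefined.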
## KLDivLoss | AKA Kullback-Leibler Divergence Loss
$$
KLDivLoss(\vec{\bar{y}}, \vec{y}) = \begin{bmatrix}
l_1 \\
l_2 \\
... \\
l_n \\
\end{bmatrix}^T;\\
l_n = y_n \cdot \ln{
\frac{
y_n
}{
\bar{y}_n
}
} = y_n \cdot \left(
\ln{(y_n)} - \ln{(\bar{y}_n)}
\right)
$$
This is just the ***[Kullback-Leibler divergence](https://en.wikipedia.org/wiki/KullbackLeibler_divergence)***.
It is used because we are approximating the ***distribution*** $\vec{y}$ with $\vec{\bar{y}}$.
> [!CAUTION]
> This method assumes you have `probabilities`, but it does not enforce the use of
> [Softmax](./../3-Activation-Functions/INDEX.md#softmax) or [LogSoftmax](./../3-Activation-Functions/INDEX.md#logsoftmax),
> which can lead to ***numerical instabilities***.
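
A small sketch of the pointwise terms, assuming raw scores are first turned into probabilities with a softmax. The helper names `softmax` and `kl_div` are hypothetical, and terms with $y_n = 0$ are treated as $0$ by convention.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(y_pred, y_true):
    """Pointwise terms l_n = y_n * (ln(y_n) - ln(ybar_n))."""
    return [
        y * (math.log(y) - math.log(p)) if y > 0 else 0.0
        for p, y in zip(y_pred, y_true)
    ]

y_true = [0.7, 0.2, 0.1]             # target distribution
y_pred = softmax([2.0, 1.0, 0.1])    # predicted distribution
terms = kl_div(y_pred, y_true)
print(terms)       # single terms may be negative
print(sum(terms))  # their sum is never negative for two valid distributions
```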
## BCEWithLogitsLoss
<!-- TODO: Define Logits -->
$$
BCEWithLogitsLoss(\vec{\bar{y}}, \vec{y}) = \begin{bmatrix}
l_1 \\
l_2 \\
... \\
l_n \\
\end{bmatrix}^T;\\
l_n = - w_n \cdot \left(\,
y_n \ln{\sigma(\bar{y}_n)} + (1 - y_n)
\cdot
\ln{(1 - \sigma(\bar{y}_n))}\,
\right)
$$
This is basically a [BCELoss](#bceloss--aka-binary-cross-entropy-loss)
with a built-in
[Sigmoid](./../3-Activation-Functions/INDEX.md#sigmoid--aka-logistic) layer, to deal with
***numerical instabilities***.
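
To see where the instability comes from, the sketch below compares the formula applied literally with an algebraically equal rewrite, $\max(x, 0) - x \cdot y + \ln{(1 + e^{-|x|})}$ for a logit $x$. This rewrite is a common trick and an assumption here: the notes do not say which form a given library uses. All function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_naive(logit, y):
    """Formula applied literally: breaks when sigmoid saturates to 0 or 1."""
    p = sigmoid(logit)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_with_logits(logit, y, weight=1.0):
    """Equal rewrite: max(x, 0) - x * y + ln(1 + e^{-|x|})."""
    return weight * (
        max(logit, 0.0) - logit * y + math.log1p(math.exp(-abs(logit)))
    )

# For small logits the two forms agree up to rounding
print(bce_naive(0.3, 1), bce_with_logits(0.3, 1))
# For a huge logit the rewrite stays finite, while the naive form would
# need ln(0) because sigmoid(1000.0) rounds to exactly 1.0
print(bce_with_logits(1000.0, 0))
```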
## HingeEmbeddingLoss
$$
HingeEmbeddingLoss(\vec{\bar{y}}, \vec{y}) = \begin{bmatrix}
l_1 \\
l_2 \\
... \\
l_n \\
\end{bmatrix}^T;\\
l_n = \begin{cases}
\bar{y}_n \; \; & y_n = 1 \\
max(0, margin - \bar{y}_n) & y_n = -1
\end{cases}
$$
To understand this type of `Loss`, let's reason
as if we were the `model`: our ***objective*** is to reduce
the `Loss`.
By observing the `Loss`, we get that if we
***predict high values*** for `positive classes` we get a
***high `loss`***.
At the same time, we observe that if we ***predict low values***
for `negative classes`, we get a ***high `loss`***.
Now, what we'll do is:
- ***predict low `outputs` for `positive classes`***
- ***predict high `outputs` for `negative classes`***
This pushes the 2 `classes` ***further apart***
from each other
and pulls the `points` of each `class` ***closer together***.
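
The reasoning above can be checked with a few numbers. This is a minimal sketch of the per-sample formula; the function name is hypothetical and the default `margin` of `1.0` is an assumption.

```python
def hinge_embedding_loss(y_pred, y_true, margin=1.0):
    """l_n = ybar_n if y_n == 1, else max(0, margin - ybar_n)."""
    return [
        p if y == 1 else max(0.0, margin - p)
        for p, y in zip(y_pred, y_true)
    ]

y_pred = [0.2, 1.5, 0.2, 1.5]
y_true = [1, 1, -1, -1]
# Positive labels: the loss grows with the prediction.
# Negative labels: the loss vanishes once the prediction exceeds the margin.
print(hinge_embedding_loss(y_pred, y_true))
```

With these values, the low prediction is cheaper for the positive label, while the high prediction is cheaper for the negative label.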
## MarginRankingLoss
$$
MarginRankingLoss(\vec{\bar{y}}_1, \vec{\bar{y}}_2, \vec{y}) =
\begin{bmatrix}
l_1 \\
l_2 \\
... \\
l_n \\
\end{bmatrix}^T;\\
l_n = max\left(
0,\, -y_n \cdot (\bar{y}_{1,n} - \bar{y}_{2,n} ) + margin \,
\right)
$$
$\vec{\bar{y}}_1$ and $\vec{\bar{y}}_2$ represent ***vectors***
of predictions of that `point` being `class 1` or `class 2`
(both ***positive***),
while $\vec{y}$ is the ***vector*** of `labels`.
As before, our goal is to ***minimize*** the `loss`, thus we
always want $-y_n \cdot (\bar{y}_{1,n} - \bar{y}_{2,n})$ to be ***negative***, with an
***absolute value*** of at least $margin$:
- $y_n = 1 \rightarrow \bar{y}_{1,n} \gg \bar{y}_{2,n}$
- $y_n = -1 \rightarrow \bar{y}_{2,n} \gg \bar{y}_{1,n}$
> [!TIP]
> Let's say we are trying to use this for `classification`,
> we can ***cheat*** a bit to make the `model` more ***robust***
> by having ***all correct predictions*** on $\vec{\bar{y}}_1$
> and on $\vec{\bar{y}}_2$ only the
> ***highest wrong prediction*** repeated $n$ times.
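
A minimal sketch of the formula for this `loss`, written in pure Python. The function name and the default `margin` of `0.0` are assumptions made for illustration.

```python
def margin_ranking_loss(y1, y2, y_true, margin=0.0):
    """l_n = max(0, -y_n * (ybar_1n - ybar_2n) + margin)."""
    return [
        max(0.0, -y * (a - b) + margin)
        for a, b, y in zip(y1, y2, y_true)
    ]

y1 = [2.0, 0.5, 0.5]
y2 = [0.5, 2.0, 2.0]
y_true = [1, 1, -1]
# Sample 1: correct order with enough gap     -> zero loss
# Sample 2: wrong order for y = 1             -> positive loss
# Sample 3: same values, but y = -1 fits them -> zero loss
print(margin_ranking_loss(y1, y2, y_true, margin=1.0))
```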
## TripletMarginLoss[^tripletmarginloss]
$$
TripletMarginLoss(\vec{a}, \vec{p}, \vec{n}) =
\begin{bmatrix}
l_1 \\
l_2 \\
... \\
l_n \\
\end{bmatrix}^T;\\
l_n = max\left(
0,\,
d(a_n, p_n) - d(a_n, n_n) + margin \,
\right)
$$
Here we have:
- $\vec{a}$: ***anchor point*** that represents a `class`
- $\vec{p}$: ***positive example*** that is a `point` in the same
`class` as $\vec{a}$
- $\vec{n}$: ***negative example*** that is a `point` in another
`class` with respect to $\vec{a}$.
Optimizing here means
***having similar points near each other*** and
***dissimilar points further from each other***, with the
***latter*** being the most important thing to do (as it's
the only *negative* term in the equation).
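
A minimal sketch of the formula, assuming that $d$ is the Euclidean distance (the notes do not fix a specific distance). The function name and the default `margin` are also assumptions.

```python
import math

def triplet_margin_loss(anchors, positives, negatives, margin=1.0):
    """l_n = max(0, d(a_n, p_n) - d(a_n, n_n) + margin), with Euclidean d."""
    return [
        max(0.0, math.dist(a, p) - math.dist(a, n) + margin)
        for a, p, n in zip(anchors, positives, negatives)
    ]

anchors = [(0.0, 0.0), (0.0, 0.0)]
positives = [(0.0, 1.0), (3.0, 4.0)]
negatives = [(3.0, 4.0), (0.0, 1.0)]
# Triplet 1: positive is close, negative is far -> zero loss
# Triplet 2: the roles are swapped              -> large loss
print(triplet_margin_loss(anchors, positives, negatives))
```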
## SoftMarginLoss
$$
SoftMarginLoss(\vec{\bar{y}}, \vec{y}) =
\sum_i \frac{
\ln \left( 1 + e^{-y_i \cdot \bar{y}_i} \right)
}{
N
}
$$
This `loss` gives only ***positive*** results, thus the
optimization consists in reducing $e^{-y_i \cdot \bar{y}_i}$, that is, in making $y_i \cdot \bar{y}_i$ as ***large*** as possible.
Since $\vec{y}$ has only $1$ or $-1$ as values, our strategy
is to make:
- $y_i = 1 \rightarrow \bar{y}_i \gg 0$
- $y_i = -1 \rightarrow \bar{y}_i \ll 0$
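
Evaluating the formula for a few values shows which predictions give a small `loss`. This is a minimal pure-Python sketch; the function name is hypothetical.

```python
import math

def soft_margin_loss(y_pred, y_true):
    """Mean over all elements of ln(1 + e^{-y_i * ybar_i})."""
    total = sum(
        math.log1p(math.exp(-y * p))
        for p, y in zip(y_pred, y_true)
    )
    return total / len(y_pred)

# Same sign between label and prediction: small loss
print(soft_margin_loss([5.0], [1]), soft_margin_loss([-5.0], [-1]))
# Opposite sign between label and prediction: large loss
print(soft_margin_loss([-5.0], [1]), soft_margin_loss([5.0], [-1]))
```

The term $e^{-y_i \cdot \bar{y}_i}$ shrinks only when $y_i$ and $\bar{y}_i$ share the same sign, which is what the printed values reflect.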
## MultiClassHingeLoss | AKA MultiLabelMarginLoss
<!-- TODO: make examples to understand it better -->
$$
MultiClassHingeLoss(\vec{\bar{y}}, \vec{y}) =
\begin{bmatrix}
l_1 \\
l_2 \\
... \\
l_n \\
\end{bmatrix}^T;\\
l_n =
\sum_{i,j} \frac{
max(0, 1 - (\bar{y}_{n,y_{n,j}} - \bar{y}_{n,i}) \,)
}{
\text{num\_of\_classes}
}
\; \;\forall i,j \text{ with } i \neq y_{n,j}
$$
Essentially, it works like the [HingeLoss](#hingeembeddingloss), but
with ***multiple classes***, as in
[CrossEntropyLoss](#crossentropyloss).
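
A small worked example for a single sample may help. It assumes, as PyTorch's `MultiLabelMarginLoss` does, that $j$ ranges over the target `classes` of the sample and $i$ over all `classes` that are ***not*** targets. The function name and the plain list of target indices are simplifications made for illustration.

```python
def multilabel_margin_loss(scores, targets):
    """Hinge loss for one sample with several correct classes.

    scores:  one predicted score per class
    targets: indices of the correct classes for this sample
    """
    non_targets = [i for i in range(len(scores)) if i not in targets]
    total = 0.0
    for j in targets:          # every correct class ...
        for i in non_targets:  # ... against every wrong class
            total += max(0.0, 1.0 - (scores[j] - scores[i]))
    return total / len(scores)

scores = [0.1, 0.2, 0.4, 0.8]
targets = [3, 0]  # classes 3 and 0 are the correct ones
# Pairs: (3,1) -> 0.4, (3,2) -> 0.6, (0,1) -> 1.1, (0,2) -> 1.3
# Sum 3.4 divided by 4 classes gives approximately 0.85
print(multilabel_margin_loss(scores, targets))
```

Each correct `class` is asked to beat each wrong `class` by a margin of $1$, and every pair that fails to do so adds to the `loss`.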
## CosineEmbeddingLoss
$$
CosineEmbeddingLoss(\vec{\bar{y}}_1, \vec{\bar{y}}_2, \vec{y}) =
\begin{bmatrix}
l_1 \\
l_2 \\
... \\
l_n \\
\end{bmatrix}^T;\\
l_n = \begin{cases}
1 - \cos{(\bar{y}_{n,1}, \bar{y}_{n,2})} & y_n = 1 \\
max(0, \cos{(\bar{y}_{n,1}, \bar{y}_{n,2})} - margin)
& y_n = -1
\end{cases}
$$
With this `loss` we aim to:
- bring the angle between $\bar{y}_{n,1}$
and $\bar{y}_{n,2}$ to $0$ when $y_n = 1$
- bring the angle between $\bar{y}_{n,1}$
and $\bar{y}_{n,2}$ to at least $\arccos{(margin)}$ when $y_n = -1$: this is $\pi$ for $margin = -1$, or $\frac{\pi}{2}$ for
$margin = 0$ (only positive values of $\cos$ are penalized), making them
***orthogonal***
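
A minimal sketch of the formula for one pair of vectors, written in pure Python. The function names and the default `margin` of `0.0` are assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_embedding_loss(u, v, y, margin=0.0):
    """1 - cos(u, v) if y == 1, else max(0, cos(u, v) - margin)."""
    cos = cosine_similarity(u, v)
    if y == 1:
        return 1.0 - cos
    return max(0.0, cos - margin)

# y = 1: aligned vectors cost nothing, opposite vectors cost the most
print(cosine_embedding_loss([1.0, 0.0], [2.0, 0.0], 1))
print(cosine_embedding_loss([1.0, 0.0], [-1.0, 0.0], 1))
# y = -1 with margin 0: orthogonal vectors already cost nothing
print(cosine_embedding_loss([1.0, 0.0], [0.0, 1.0], -1))
print(cosine_embedding_loss([1.0, 0.0], [2.0, 0.0], -1))
```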
[^NLLLoss]: [Remy Lau | Towards Data Science | 4th April 2025](https://towardsdatascience.com/cross-entropy-negative-log-likelihood-and-all-that-jazz-47a95bd2e81/)
[^Anelli-CEL]: Anelli | Deep Learning PDF 4 pg. 11
[^tripletmarginloss]: [Official Paper](https://bmva-archive.org.uk/bmvc/2016/papers/paper119/paper119.pdf)