# Learning Representations
## Linear Classifiers and Limitations
A `Linear Classifier` can be defined as a **hyperplane** given by:
$$
\sum_{i=1}^{N} w_{i}x_{i} + b = 0
$$
<!-- TODO: Add images -->
So, for each point $\vec{x}$ in the input space the output is
$$
o = \operatorname{sign} \left( \sum_{i=1}^{N} w_{i}x_{i} + b \right)
$$
This gives us a separation of points into 2 classes. However,
is it always possible to linearly separate $P$ `points` in $N$ `dimensions`?
According to Cover's Theorem, ***"the probability that a dichotomy over $P$
points in $N$ dimensions is linearly separable goes to zero as $P$ gets
larger than $N$"***[^cover].
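We can make this concrete with Cover's counting function
$C(P, N) = 2\sum_{k=0}^{N-1}\binom{P-1}{k}$, which counts the linearly
separable dichotomies of $P$ points in general position. A minimal Python
sketch (the specific values of $P$ and $N$ below are arbitrary):

```python
from math import comb

def frac_separable(P: int, N: int) -> float:
    """Fraction of the 2^P dichotomies of P points (in general
    position) in N dimensions that are linearly separable,
    via Cover's counting function C(P, N)."""
    C = 2 * sum(comb(P - 1, k) for k in range(N))
    return C / 2 ** P

for P in (5, 10, 20, 40, 80):
    print(P, frac_separable(P, N=10))
# Stays at 1.0 while P <= N, is exactly 0.5 at P = 2N,
# and falls toward 0 as P grows past 2N.
```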
## Ideas to Extract Features Generically
### Polynomial Classifier[^polynomial-sklearn]
We could try to extract features by combining all the input features;
however, this scales poorly in higher dimensions.
Say we choose a degree $d$ and the points $\vec{x}$ have $N$ dimensions:
we would then have on the order of $N^d$ feature computations
(the exact count is $\binom{N+d}{d}$).
However, most of the time we will have $d = 2$, so as long as we don't
exaggerate with $d$ it is still feasible.
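As a quick sanity check, here is the scikit-learn class cited above on a
single 2-dimensional point with $d = 2$ (the names and values are just for
illustration):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])           # one point, N = 2
poly = PolynomialFeatures(degree=2)  # the common d = 2 case
print(poly.fit_transform(X))
# [[1. 2. 3. 4. 6. 9.]]  ->  1, x1, x2, x1^2, x1*x2, x2^2
print(poly.get_feature_names_out(["x1", "x2"]))
```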
### Space Tiling
It is a family of functions meant to preserve as much information as
possible in images while reducing their dimensionality.
### Random Projections
It consists of lowering the dimensionality of the data by multiplying it by
a randomly chosen matrix. The idea is that points in a high-dimensional
space can be projected into a low-dimensional space while preserving most
of their pairwise distances (this is made precise by the
Johnson-Lindenstrauss lemma).
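A minimal NumPy sketch, assuming a Gaussian projection matrix (all the
dimensions below are arbitrary); the ratio of pairwise distances after and
before the projection stays close to 1:

```python
import numpy as np

rng = np.random.default_rng(0)
P, N, M = 50, 1000, 50              # points, original dim, reduced dim
X = rng.normal(size=(P, N))

# Random Gaussian matrix, scaled so squared norms are preserved
# in expectation (as in the Johnson-Lindenstrauss lemma).
R = rng.normal(size=(N, M)) / np.sqrt(M)
Y = X @ R

def pairwise_distances(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

i, j = np.triu_indices(P, k=1)
ratio = pairwise_distances(Y)[i, j] / pairwise_distances(X)[i, j]
print(ratio.mean(), ratio.std())    # mean ~1, small spread
```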
### Radial Basis Functions[^wikipedia-radial]
This is a method that employs a family of functions that operate on the
**distance** between the input and a fixed point.
> [!CAUTION]
> They may take a point (or vector) as their input, but they'll compute a
> distance, usually the Euclidean one, and then use it for the final
> computation
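For instance, the common Gaussian RBF; a minimal sketch (the value of
`gamma` is arbitrary):

```python
import numpy as np

def gaussian_rbf(x, center, gamma=1.0):
    """Gaussian RBF: depends on x only through the Euclidean
    distance ||x - center||, as the caution above points out."""
    return np.exp(-gamma * np.sum((x - center) ** 2))

x = np.array([1.0, 2.0])
c = np.array([0.0, 0.0])
print(gaussian_rbf(x, c))  # exp(-5) ~ 0.0067
```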
### Kernel Machines[^wikipedia-kernel-machines]
They are machine-learning algorithms that make use of a kernel function,
which makes computation in higher-dimensional spaces possible without
explicitly computing the higher-dimensional coordinates of our points.
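A toy illustration of the trick with the degree-2 polynomial kernel
$k(x, y) = (x \cdot y)^2$: the kernel computed in the original 2-D space
matches the dot product of the explicit (and, in general, much larger)
feature maps:

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D input. In higher
    dimensions this expansion blows up combinatorially, which is
    exactly what the kernel lets us avoid."""
    x1, x2 = v
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

def k(x, y):
    """Polynomial kernel (x . y)^2, computed in the original space."""
    return np.dot(x, y) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(np.dot(phi(x), phi(y)), k(x, y))  # both print 121.0
```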
## Deep vs Shallow Networks
While it is **theoretically possible** to use only `shallow-networks` as
**universal predictors**,
this is **not practically feasible**, as it would require far more **hardware**.
Instead, `deep-networks` trade `time` for `space`: they take longer, but require **less hardware**.
Even for plain Boolean functions, a shallow network may need on the order of
$O(2^{n})$ units for $n$ inputs, i.e. it scales exponentially, while a deep
network can compute many such functions with far fewer units.
> [!NOTE]
> Mainly, the hardware we are talking about is both compute units (we don't
> care whether they come from a CPU, GPU or NPU, you name it) and memory.
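The classic example is the parity (chained XOR) function: a deep circuit
needs $n - 1$ two-input gates, while a shallow two-layer AND/OR realization
needs one term per odd-weight input pattern, i.e. $2^{n-1}$ terms. A toy
sketch of the count:

```python
from itertools import product

n = 8  # number of Boolean inputs

def parity_deep(bits):
    """'Deep' realization: a chain of n - 1 two-input XOR gates."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc

# 'Shallow' realization: one AND term per odd-weight input pattern,
# all ORed together -> 2^(n-1) terms.
odd_minterms = {p for p in product((0, 1), repeat=n) if sum(p) % 2 == 1}

def parity_shallow(bits):
    return int(tuple(bits) in odd_minterms)

assert parity_deep((1, 0, 1, 1, 0, 0, 0, 0)) == parity_shallow((1, 0, 1, 1, 0, 0, 0, 0))
print(len(odd_minterms), "shallow terms vs", n - 1, "deep gates")
# 128 shallow terms vs 7 deep gates
```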
## Invariant Feature Learning
When we need to learn something that may have varying features (**backgrounds**, **colors**, **shapes**, etc.),
it is useful to follow these steps:
1. Embed data into **high-dimensional** spaces
2. Bring **closer** data that are **similar** and **reduce dimensions** (a toy sketch follows)
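A toy NumPy sketch of both steps, assuming two noisy "views" of the same
underlying item (a real setup would also push dissimilar pairs apart to
avoid collapse):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two views of the "same" item: shared signal, different nuisance noise
# (think: the same shape on two different backgrounds).
signal = rng.normal(size=2)
view_a = signal + 0.1 * rng.normal(size=2)
view_b = signal + 0.1 * rng.normal(size=2)

W = rng.normal(size=(2, 256))

def embed(x):
    """Step 1: random non-linear expansion into a high-dim space."""
    return np.tanh(x @ W)

ea, eb = embed(view_a), embed(view_b)

# Step 2: learn a low-dimensional projection V that pulls the two
# similar embeddings together (gradient descent on their squared
# distance -- a stand-in for a real contrastive loss).
V = 0.1 * rng.normal(size=(256, 8))
for _ in range(100):
    za, zb = ea @ V, eb @ V
    V -= 0.05 * 2 * np.outer(ea - eb, za - zb)  # grad of ||za - zb||^2

print(np.linalg.norm(ea @ V - eb @ V))  # much smaller than at the start
```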
## Sparse Non-Linear Expansion
In this case, we break our data apart into a sparse, non-linear expansion, and then aggregate the pieces back together.
## Manifold Hypothesis
We can assume that whatever we are trying to represent **doesn't need that many features**, but rather
**is a point in a latent space with a lower number of dimensions**.
Thus, our goal is to find this **low-dimensional latent-space** and **disentangle** its features, so that each
`direction` of our **latent-space** is a `feature`. Essentially, what we want to achieve with `Deep-Learning` is
a **system** capable of *learning* this **latent-space** on ***its own***.
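As a (purely linear) stand-in for what a deep network would do
non-linearly, here is a toy example where 10-dimensional data actually
lives near a low-dimensional latent space, and PCA recovers it (all sizes
below are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 500 points driven by a single latent feature t, embedded as a
# curve in a 10-D ambient space, plus a little observation noise.
t = rng.uniform(-1, 1, size=500)
basis = rng.normal(size=(2, 10))
X = np.column_stack([t, t ** 2]) @ basis + 0.01 * rng.normal(size=(500, 10))

pca = PCA(n_components=2)
pca.fit(X)
print(pca.explained_variance_ratio_.sum())
# ~1.0: almost all the variance fits in 2 directions, far fewer
# than the 10 ambient dimensions.
```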
[^cover]: T. Cover, "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition", 1965
[^polynomial-sklearn]: [Sklearn | Polynomial Features | 14th November 2025](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)
[^wikipedia-radial]: [Wikipedia | Radial Basis Function | 14th November 2025](https://en.wikipedia.org/wiki/Radial_basis_function)
[^wikipedia-kernel-machines]: [Wikipedia | Kernel Method | 14th November 2025](https://en.wikipedia.org/wiki/Kernel_method)