Revised notes and added citations
parent 434e4cdd0e
commit e07a80649a
@@ -2,30 +2,80 @@
## Linear Classifiers and Limitations
A `Linear Classifier` can be defined as a **hyperplane** with the following expression:

$$
\sum_{i=1}^{N} w_{i}x_{i} + b = 0
$$
<!-- TODO: Add images -->
So, for each point in the hyperspace, the output is

$$
o = \operatorname{sign}\left( \sum_{i=1}^{N} w_{i}x_{i} + b \right)
$$
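A minimal NumPy sketch of this decision rule (the weights, bias, and points are made-up values, purely for illustration):

```python
import numpy as np

# Made-up parameters of the hyperplane: weights w and bias b
w = np.array([2.0, -1.0])
b = 0.5

# Two example points, one on each side of the hyperplane
points = np.array([[1.0, 1.0],
                   [-1.0, 3.0]])

# o = sign(sum_i w_i * x_i + b), computed for each point
o = np.sign(points @ w + b)
print(o)  # [ 1. -1.] -> the two predicted classes
```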
### Limitations
This quickly gives us a separation of the points into 2 classes. However,
is it always possible to linearly separate $P$ `points` in $N$ `dimensions`?
According to Cover's Theorem, ***"The probability that a dichotomy over $P$
points in $N$ dimensions is linearly separable goes to zero as $P$ gets
larger than $N$"***[^cover].
So, for a point with $N$ features, combining them up to degree $d$ makes the number of
dimensions **exponentially higher**: $N^{d}$.
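As a numerical illustration (an addition to these notes), Cover's function-counting formula $C(P, N) = 2\sum_{k=0}^{N-1}\binom{P-1}{k}$ counts the linearly separable dichotomies, so $C(P, N)/2^{P}$ is the probability that a random dichotomy is separable:

```python
from math import comb

def prob_separable(P: int, N: int) -> float:
    """Probability that a random dichotomy of P points in general position
    in N dimensions is linearly separable (Cover's counting formula)."""
    separable = 2 * sum(comb(P - 1, k) for k in range(N))
    return separable / 2 ** P

for P in (10, 20, 40, 80):
    print(P, prob_separable(P, N=10))  # 1.0, 0.5, then it drops toward 0
```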
## Ideas to extract features generically
### Polynomial Classifier[^polinomial-sklearn]
We could try to extract features by combining all of the input features;
however, this scales poorly in higher dimensions.
Let's say that we choose a degree $d$ and the number of dimensions of
a point $\vec{x}$ is $N$: we would have $N^d$ computations to do.
However, most of the time we will have $d = 2$, so as long as we don't
exaggerate with $d$ it is still feasible.
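A short example with scikit-learn's `PolynomialFeatures`[^polinomial-sklearn] (random input, arbitrary sizes), only to show how quickly the number of generated features grows with the degree $d$:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(5, 10)  # 5 arbitrary points with N = 10 features

for d in (2, 3, 4):
    poly = PolynomialFeatures(degree=d)
    X_poly = poly.fit_transform(X)
    print(d, X_poly.shape[1])  # 66, 286, 1001 generated features
```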
### Space Tiling
It is a family of functions that preserve as much information as possible in images
while reducing their dimensionality.
### Random Projections
It consists of lowering the dimensionality of the data by using a randomly chosen
matrix. The idea is that points that are in a high-dimensional space may be
projected into a low-dimensional space while preserving most of their
distances.
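A minimal sketch of the idea with a random Gaussian matrix (the sizes below are arbitrary): distances measured after the projection stay close to the original ones.

```python
import numpy as np

rng = np.random.default_rng(0)

N, k, n_points = 1000, 50, 20             # original dim, reduced dim, number of points
X = rng.normal(size=(n_points, N))        # points living in the high-dimensional space

R = rng.normal(size=(N, k)) / np.sqrt(k)  # randomly chosen projection matrix
Y = X @ R                                 # same points, now in k dimensions

# Distance between the first two points, before and after the projection
print(np.linalg.norm(X[0] - X[1]))  # high-dimensional distance
print(np.linalg.norm(Y[0] - Y[1]))  # roughly the same value in low dimension
```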
### Radial Basis Functions[^wikipedia-radial]
This is a method that employs a family of functions that operate on the
**distance** of the input from a fixed point.
> [!CAUTION]
> They may take a point (or vector) as their input, but they'll compute a
> distance, usually the Euclidean one, and then use it for the final
> computation.
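A minimal sketch of one common member of this family, the Gaussian RBF (the center and the width $\sigma$ are arbitrary choices here):

```python
import numpy as np

def gaussian_rbf(x: np.ndarray, center: np.ndarray, sigma: float = 1.0) -> float:
    """The value depends only on the Euclidean distance between x and the fixed center."""
    r = np.linalg.norm(x - center)
    return float(np.exp(-(r ** 2) / (2 * sigma ** 2)))

center = np.array([0.0, 0.0])                      # the fixed point
print(gaussian_rbf(np.array([0.1, 0.1]), center))  # close to 1: near the center
print(gaussian_rbf(np.array([3.0, 3.0]), center))  # close to 0: far from the center
```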
### Kernel Machines[^wikipedia-kernel-machines]
They are machine-learning algorithms that make use of a kernel function,
which makes computation in higher dimensions possible without actually
computing the higher-dimensional coordinates of our points.
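A small sketch of the trick with a degree-2 polynomial kernel: the kernel value equals the dot product of the explicitly lifted feature vectors, without ever constructing them.

```python
import numpy as np

def poly_kernel(x: np.ndarray, y: np.ndarray) -> float:
    """Degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return float(np.dot(x, y) ** 2)

def explicit_lift(x: np.ndarray) -> np.ndarray:
    """Explicit degree-2 feature map for a 2D input: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

# Same number, computed with and without the higher-dimensional coordinates
print(poly_kernel(x, y))                                   # 16.0
print(float(np.dot(explicit_lift(x), explicit_lift(y))))   # 16.0, up to floating-point error
```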
## Deep vs Shallow Networks
While it is **theoretically possible** to have only `shallow-networks` as
**universal predictors**, this is **not feasible in practice**, as it would
require far more **hardware**.
Instead, `deep-networks` trade `time` for `space`, taking longer but requiring **less hardware**.
Usually, even for just Boolean functions, a shallow network has a complexity of $O(2^{n})$
units, making it scale exponentially with the input size $n$.
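As a rough illustration (an addition to these notes, using the classic parity example): a flat, depth-2 representation of the parity of $n$ bits needs about $2^{n-1}$ terms, while a deep chain of XOR gates only needs $n - 1$ of them.

```python
# Size of a flat (depth-2, DNF-style) representation of n-bit parity
# versus a deep chain of two-input XOR gates.
for n in (4, 8, 16, 32):
    shallow_terms = 2 ** (n - 1)  # one term per odd-parity input pattern
    deep_gates = n - 1            # x1 XOR x2 XOR ... XOR xn as a chain
    print(n, shallow_terms, deep_gates)
```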
> [!NOTE]
> The hardware we are talking about is mainly both compute units (we don't care
> if they come from a CPU, GPU, or NPU, you name it) and memory.
## Invariant Feature Learning
@@ -47,3 +97,11 @@ We can assume that whatever we are trying to represent **doesn't need that much
Thus, our goal is to find this **low dimensional latent-space** and **disentangle** features, so that each
`direction` of our **latent-space** is a `feature`. Essentially, what we want to achieve with `Deep-Learning` is
a **system** capable of *learning* this **latent-space** on ***its own***.
[^cover]: T. M. Cover, "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition", 1965
[^polinomial-sklearn]: [Sklearn | Polynomial Features | 14th November 2025](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)
[^wikipedia-radial]: [Wikipedia | Radial Basis Function | 14th November 2025](https://en.wikipedia.org/wiki/Radial_basis_function)
[^wikipedia-kernel-machines]: [Wikipedia | Kernel Method | 14th November 2025](https://en.wikipedia.org/wiki/Kernel_method)