Added 2nd Chapter

Chapters/2-Learning-Representations/INDEX.md

# Learning Representations
## Linear Classifiers and Limitations
A `Linear Classifier` can be defined as a **hyperplane** with the following expression:

$$
\sum_{i=1}^{N} w_{i}x_{i} + b = 0
$$

<!-- TODO: Add images -->
So, the output is $o = \operatorname{sign} \left( \sum_{i=1}^{N} w_{i}x_{i} + b \right)$.
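
For concreteness, here is a minimal NumPy sketch of evaluating such a classifier (the weights and points are made up for illustration):

```python
import numpy as np

def linear_classifier(x, w, b):
    """Output of the hyperplane rule: sign(w . x + b)."""
    return np.sign(np.dot(w, x) + b)

# A 2-D example: the hyperplane is a line in the plane.
w = np.array([1.0, -1.0])
b = 0.5
print(linear_classifier(np.array([2.0, 0.0]), w, b))  #  1.0 (positive side)
print(linear_classifier(np.array([0.0, 2.0]), w, b))  # -1.0 (negative side)
```
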
### Limitations
The question is: how probable is it to linearly divide $P$ `points` in $N$ `dimensions`?
> *The probability that a dichotomy over $P$ points in $N$ dimensions is **linearly separable** goes to zero as $P$ gets larger than $N$.* [Cover's theorem, 1965]

So, once you fix the number of features $d$, the number of dimensions grows **exponentially** with it, as $N^{d}$.
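
Cover's result comes with a standard counting formula: the fraction of the $2^{P}$ dichotomies of $P$ points in general position that are linearly separable in $N$ dimensions is $C(P, N)/2^{P}$, with $C(P, N) = 2\sum_{k=0}^{N-1}\binom{P-1}{k}$. The short sketch below (the choice of $N = 10$ and the sample values of $P$ are mine, not from the notes) makes the collapse visible:

```python
from math import comb

def separable_fraction(P, N):
    """Fraction of the 2**P dichotomies of P points in general position
    that are linearly separable in N dimensions (Cover's counting function)."""
    return 2 * sum(comb(P - 1, k) for k in range(N)) / 2**P

N = 10
for P in (5, 10, 20, 40, 80):
    print(f"P={P:3d}  fraction={separable_fraction(P, N):.4f}")
# ~1.0 while P <= N, exactly 0.5 at P = 2N, and rapidly approaching 0 beyond.
```
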
## Deep vs Shallow Networks
While it is **theoretically possible** to use only `shallow-networks` as **universal approximators**, this is **not feasible in practice**, as it would require far more **hardware**.

Instead, `deep-networks` trade `time` for `space`: they take longer, but require **less hardware**.
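
As a rough illustration of this space/time trade-off (the layer sizes below are invented for the example and say nothing about the two nets computing the same function), compare the parameter counts and depths of a wide shallow MLP and a narrow deep one:

```python
def mlp_params(sizes):
    """Weights + biases of a fully connected net with the given layer sizes."""
    return sum(i * o + o for i, o in zip(sizes, sizes[1:]))

# Illustrative shapes only: same input (128) and output (10) dimensions.
shallow = [128, 8192, 10]           # one very wide hidden layer
deep    = [128, 256, 256, 256, 10]  # three narrow hidden layers

print("shallow:", mlp_params(shallow), "parameters,", len(shallow) - 1, "layers")
print("deep:   ", mlp_params(deep), "parameters,", len(deep) - 1, "layers")
# The deep net uses far fewer parameters ("space"/hardware) but needs more
# sequential layer evaluations ("time").
```
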
## Invariant Feature Learning
When we need to learn something that may have varying features (**backgrounds**, **colors**, **shapes**, etc.), it is useful to follow these steps (sketched in code after the list):

1. Embed data into **high-dimensional** spaces
2. Bring **closer** data that are **similar** and **reduce dimensions**
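
A minimal NumPy sketch of these two steps on toy data (the random expansion and the plain PCA reduction are stand-ins chosen for illustration; in a real system the second step is *learned*, e.g. so that similar samples end up close together):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))       # toy data: 100 points, 5 raw features

# Step 1: embed into a high-dimensional space (random non-linear features).
W = rng.normal(size=(5, 512))
H = np.maximum(X @ W, 0.0)          # (100, 512) ReLU expansion

# Step 2: reduce dimensions again; here plain PCA keeps the 3 main directions.
H = H - H.mean(axis=0)
_, _, Vt = np.linalg.svd(H, full_matrices=False)
Z = H @ Vt[:3].T                    # (100, 3) compact representation
print(Z.shape)
```
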
## Sparse Non-Linear Expansion
In this case, we break our data apart into a **sparse**, non-linearly expanded representation, and then we aggregate the pieces back together.
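
A small sketch of this idea (the random dictionary, the threshold, and the pooling scheme are assumptions made only for the example): expand the input against many directions, keep only the strong responses, then pool them back into a compact summary.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=16)                 # one input vector

# "Break apart": project onto an overcomplete set of directions and keep only
# strong, non-negative responses -> a sparse, non-linear expansion.
D = rng.normal(size=(256, 16))          # overcomplete random dictionary
z = np.maximum(D @ x - 4.0, 0.0)        # thresholded ReLU: mostly zeros
print("non-zero components:", np.count_nonzero(z), "of", z.size)

# "Aggregate": pool the sparse components back together (max over groups of 8).
pooled = z.reshape(32, 8).max(axis=1)   # (32,) aggregated representation
print(pooled.shape)
```
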
## Manifold Hypothesis
We can assume that whatever we are trying to represent **doesn't need that many features**, but rather **is a point in a latent space with a lower number of dimensions**.

Thus, our goal is to find this **low-dimensional latent-space** and **disentangle** features, so that each `direction` of our **latent-space** is a `feature`. Essentially, what we want to achieve with `Deep-Learning` is a **system** capable of *learning* this **latent-space** on ***its own***.
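
One simple way to picture such a system is an autoencoder with a low-dimensional bottleneck. The PyTorch sketch below is only an illustration: the data, layer sizes, and training schedule are invented, and a plain autoencoder finds a low-dimensional latent space but does not by itself guarantee disentangled directions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy data that secretly lives on a 2-D manifold embedded in 100 dimensions:
# x = g(z) for a fixed random non-linear map g and 2-D latent codes z.
g = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 100))
with torch.no_grad():
    X = g(torch.randn(1024, 2))

# An autoencoder with a 2-D bottleneck: the encoder must discover a
# low-dimensional latent space on its own in order to reconstruct the data.
encoder = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 2))
decoder = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 100))
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(decoder(encoder(X)), X)
    loss.backward()
    opt.step()

print(f"reconstruction error: {loss.item():.4f}")  # low error => 2 latent dims suffice
```
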