Learning Representations
Linear Classifiers and Limitations
A linear classifier can be defined as a hyperplane with the following expression:
\sum_{i=1}^{N} w_{i}x_{i} + b = 0
So, the output is o = \text{sign}\left( \sum_{i=1}^{N} w_{i}x_{i} + b \right)
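A minimal numpy sketch of such a classifier (the weights, bias, and inputs below are made up for illustration):

```python
import numpy as np

def linear_classifier(x, w, b):
    """Return sign(w.x + b): +1 on one side of the hyperplane, -1 on the other."""
    return np.sign(np.dot(w, x) + b)

# Hypothetical 2-D example: the hyperplane x1 + x2 - 1 = 0
w, b = np.array([1.0, 1.0]), -1.0
print(linear_classifier(np.array([2.0, 2.0]), w, b))   # 1.0
print(linear_classifier(np.array([0.0, 0.0]), w, b))   # -1.0
```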
Limitations
The question is: how likely is it that a dichotomy of P points in N dimensions is linearly separable?
The probability that a dichotomy over $P$ points in $N$ dimensions is linearly separable goes to zero as $P$ gets larger than $N$ [Cover's theorem, 1966].
So, once you have $d$ features, the number of dimensions required is exponentially higher: $N^{d}$.
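To make the separability claim concrete, Cover's function-counting theorem gives the fraction of the $2^{P}$ possible dichotomies of $P$ points in general position that a hyperplane through the origin in $N$ dimensions can realize; a small sketch (the values of $P$ and $N$ are arbitrary):

```python
from math import comb

def frac_separable(P, N):
    """Fraction of the 2**P dichotomies of P points in general position that are
    realizable by a hyperplane through the origin in N dimensions (Cover's theorem)."""
    C = 2 * sum(comb(P - 1, k) for k in range(N))
    return C / 2**P

N = 10
for P in (10, 20, 40, 80):          # P = N, 2N, 4N, 8N
    print(P, round(frac_separable(P, N), 4))
# The fraction is 1.0 at P = N, exactly 0.5 at P = 2N, and drops towards 0 beyond that.
```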
Deep vs Shallow Networks
While it is theoretically possible to use only shallow networks as universal approximators,
this is not practical, as it would require far more hardware (units).
Deep networks instead trade time for space: they take longer (more sequential layers) but require less hardware.
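A classical illustration of this trade-off is the "sawtooth" construction: a deep ReLU chain with a couple of units per layer produces a function whose number of linear pieces doubles with every layer, while a single-hidden-layer network needs exponentially many units to match it (cf. Telgarsky's depth-separation result). A sketch in plain numpy:

```python
import numpy as np

def triangle(x):
    """One 'hat' built from two ReLU units: 2*relu(x) - 4*relu(x - 0.5)."""
    relu = lambda t: np.maximum(0.0, t)
    return 2 * relu(x) - 4 * relu(x - 0.5)

def deep_sawtooth(x, k):
    """Composing the hat k times = a depth-k ReLU net with only 2 units per layer,
    yet the resulting function has 2**k linear pieces."""
    for _ in range(k):
        x = triangle(x)
    return x

xs = np.linspace(0.0, 1.0, 9)
print(deep_sawtooth(xs, 3).round(2))   # oscillates: [0, 1, 0, 1, 0, 1, 0, 1, 0]
# A shallow (one-hidden-layer) ReLU net needs on the order of 2**k units
# to produce the same number of pieces: more hardware, less depth.
```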
Invariant Feature Learning
When we need to learn something whose features may vary (backgrounds, colors, shapes, etc.), it is useful to follow these steps (see the sketch after this list):
- Embed the data into a high-dimensional space
- Bring similar data points closer together and reduce the dimensionality
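A minimal numpy sketch of this pipeline (the random projection, the ReLU non-linearity, and all sizes below are arbitrary illustrative choices, not a prescribed recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    """Non-linear embedding into a higher-dimensional space: ReLU(Wx)."""
    return np.maximum(0.0, W @ x)

def pool(z, group=8):
    """Aggregate groups of units (max-pooling) to reduce dimensions; this is the
    step that makes the representation more invariant to small variations."""
    return z.reshape(-1, group).max(axis=1)

d_in, d_hidden = 16, 256                             # hypothetical sizes
W = rng.standard_normal((d_hidden, d_in))

x = rng.standard_normal(d_in)
x_similar = x + 0.05 * rng.standard_normal(d_in)     # a slightly perturbed "similar" input

h, h_similar = pool(embed(x, W)), pool(embed(x_similar, W))
print(np.linalg.norm(h - h_similar), np.linalg.norm(h))   # difference is small relative to ||h||
```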
Sparse Non-Linear Expansion
In this case, we break our data apart (expanding it into a sparse, high-dimensional code) and then aggregate the pieces back together.
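Continuing the sketch above, the expansion step typically yields a sparse code (most units are zero), which is then aggregated; a toy check with made-up sizes and an arbitrary threshold:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_code = 16, 512
W = rng.standard_normal((d_code, d_in)) / np.sqrt(d_in)   # keep pre-activations roughly unit-scale
x = rng.standard_normal(d_in)

z = np.maximum(0.0, W @ x - 1.0)           # non-linear expansion with a threshold -> sparse code
print((z == 0).mean())                     # most units are zero: the "breaking apart" step
print(z.reshape(-1, 8).sum(axis=1).shape)  # aggregation (sum-pooling) puts the pieces back together
```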
Manifold Hypothesis
We can assume that whatever we are trying to represent does not actually need that many features, but rather corresponds to a point in a latent space with far fewer dimensions.
Thus, our goal is to find this low-dimensional latent space and disentangle its features, so that each
direction of the latent space corresponds to a feature. Essentially, what we want to achieve with deep learning is
a system capable of learning this latent space on its own.
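A tiny linear illustration of the idea (here PCA stands in for the learned latent space; the data, dimensions, and noise level are all made up, and real deep models learn non-linear versions of this):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 1000 points living in a 100-D ambient space,
# but generated from only 2 latent factors plus a little noise.
latent = rng.standard_normal((1000, 2))
mixing = rng.standard_normal((2, 100))
X = latent @ mixing + 0.01 * rng.standard_normal((1000, 100))

# PCA via SVD: how much variance do the leading directions explain?
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = (s**2) / (s**2).sum()
print(explained[:4].round(3))   # nearly all variance sits in the first 2 components
```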