Learning Representations
Linear Classifiers and Limitations
A linear classifier can be defined as a hyperplane with the following expression:
\sum_{i=1}^{N} w_{i}x_{i} + b = 0
So, the output is o = \text{sign}\left( \sum_{i=1}^{N} w_{i}x_{i} + b \right)
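A minimal numpy sketch of such a classifier (the weights, bias, and inputs below are made up for illustration):

```python
import numpy as np

def linear_classifier(x, w, b):
    """Return sign(w.x + b): +1 on one side of the hyperplane, -1 on the other."""
    return np.sign(np.dot(w, x) + b)

# Hypothetical 2-D example: the hyperplane x1 + x2 - 1 = 0
w, b = np.array([1.0, 1.0]), -1.0
print(linear_classifier(np.array([2.0, 2.0]), w, b))   # 1.0
print(linear_classifier(np.array([0.0, 0.0]), w, b))   # -1.0
```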
Limitations
The question is: how likely is it that a dichotomy of P points in N dimensions is linearly separable?
The probability that a dichotomy over $P$ points in $N$ dimensions is linearly separable goes to zero as $P$ gets larger than $N$ [Cover's theorem, 1966].
So, once you have $d$ features, the number of dimensions required is exponentially higher: $N^{d}$.
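To make the separability claim concrete, Cover's function-counting theorem gives the fraction of the $2^{P}$ possible dichotomies of $P$ points in general position that a hyperplane through the origin in $N$ dimensions can realize; a small sketch (the values of $P$ and $N$ are arbitrary):

```python
from math import comb

def frac_separable(P, N):
    """Fraction of the 2**P dichotomies of P points in general position that are
    realizable by a hyperplane through the origin in N dimensions (Cover's theorem)."""
    C = 2 * sum(comb(P - 1, k) for k in range(N))
    return C / 2**P

N = 10
for P in (10, 20, 40, 80):          # P = N, 2N, 4N, 8N
    print(P, round(frac_separable(P, N), 4))
# The fraction is 1.0 at P = N, exactly 0.5 at P = 2N, and drops towards 0 beyond that.
```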
Deep vs Shallow Networks
While it is theoretically possible to use only shallow networks as universal approximators,
this is not practical, as it would require far more hardware (units).
Deep networks instead trade time for space: they take longer (more sequential layers) but require less hardware.
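A classical illustration of this trade-off is the "sawtooth" construction: a deep ReLU chain with a couple of units per layer produces a function whose number of linear pieces doubles with every layer, while a single-hidden-layer network needs exponentially many units to match it (cf. Telgarsky's depth-separation result). A sketch in plain numpy:

```python
import numpy as np

def triangle(x):
    """One 'hat' built from two ReLU units: 2*relu(x) - 4*relu(x - 0.5)."""
    relu = lambda t: np.maximum(0.0, t)
    return 2 * relu(x) - 4 * relu(x - 0.5)

def deep_sawtooth(x, k):
    """Composing the hat k times = a depth-k ReLU net with only 2 units per layer,
    yet the resulting function has 2**k linear pieces."""
    for _ in range(k):
        x = triangle(x)
    return x

xs = np.linspace(0.0, 1.0, 9)
print(deep_sawtooth(xs, 3).round(2))   # oscillates: [0, 1, 0, 1, 0, 1, 0, 1, 0]
# A shallow (one-hidden-layer) ReLU net needs on the order of 2**k units
# to produce the same number of pieces: more hardware, less depth.
```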
Invariant Feature Learning
When we need to learn something whose features may vary (backgrounds, colors, shapes, etc.), it is useful to follow these steps (see the sketch after this list):
- Embed the data into a high-dimensional space
- Bring similar data points closer together and reduce the dimensionality
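A minimal numpy sketch of this pipeline (the random projection, the ReLU non-linearity, and all sizes below are arbitrary illustrative choices, not a prescribed recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    """Non-linear embedding into a higher-dimensional space: ReLU(Wx)."""
    return np.maximum(0.0, W @ x)

def pool(z, group=8):
    """Aggregate groups of units (max-pooling) to reduce dimensions; this is the
    step that makes the representation more invariant to small variations."""
    return z.reshape(-1, group).max(axis=1)

d_in, d_hidden = 16, 256                             # hypothetical sizes
W = rng.standard_normal((d_hidden, d_in))

x = rng.standard_normal(d_in)
x_similar = x + 0.05 * rng.standard_normal(d_in)     # a slightly perturbed "similar" input

h, h_similar = pool(embed(x, W)), pool(embed(x_similar, W))
print(np.linalg.norm(h - h_similar), np.linalg.norm(h))   # difference is small relative to ||h||
```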
Sparse Non-Linear Expansion
In this case, we break our data apart (expanding it into a sparse, high-dimensional code) and then aggregate the pieces back together.
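Continuing the sketch above, the expansion step typically yields a sparse code (most units are zero), which is then aggregated; a toy check with made-up sizes and an arbitrary threshold:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_code = 16, 512
W = rng.standard_normal((d_code, d_in)) / np.sqrt(d_in)   # keep pre-activations roughly unit-scale
x = rng.standard_normal(d_in)

z = np.maximum(0.0, W @ x - 1.0)           # non-linear expansion with a threshold -> sparse code
print((z == 0).mean())                     # most units are zero: the "breaking apart" step
print(z.reshape(-1, 8).sum(axis=1).shape)  # aggregation (sum-pooling) puts the pieces back together
```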
Manifold Hypothesis
We can assume that whatever we are trying to represent does not actually need that many features, but rather corresponds to a point in a latent space with far fewer dimensions.
Thus, our goal is to find this low-dimensional latent space and disentangle its features, so that each
direction of the latent space corresponds to a feature. Essentially, what we want to achieve with deep learning is
a system capable of learning this latent space on its own.
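A tiny linear illustration of the idea (here PCA stands in for the learned latent space; the data, dimensions, and noise level are all made up, and real deep models learn non-linear versions of this):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 1000 points living in a 100-D ambient space,
# but generated from only 2 latent factors plus a little noise.
latent = rng.standard_normal((1000, 2))
mixing = rng.standard_normal((2, 100))
X = latent @ mixing + 0.01 * rng.standard_normal((1000, 100))

# PCA via SVD: how much variance do the leading directions explain?
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = (s**2) / (s**2).sum()
print(explained[:4].round(3))   # nearly all variance sits in the first 2 components
```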