# Convolutional Networks[^anelli-convolutional-networks]

> [!WARNING]
> We apply this concept ***mainly*** to `images`

Usually, for `images`, `FCnn` (short for `f`ully
`c`onnected `n`eural `n`etworks) are not suitable,
as `images` are ***highly dimensional `inputs`***
(e.g. a `32x32` `RGB` picture already requires
`32 * 32 * 3 = 3072` `weights` for a single
fully-connected neuron)[^anelli-convolutional-networks-1]

Combine this with the fact that ***nowadays pictures
have (at least) `1920x1080` pixels***, and `FCnn`
become ***prone to overfitting***[^anelli-convolutional-networks-1]

> [!NOTE]
>
> - From here on `depth` is the **3rd dimension of the
>   activation volume**
> - `FCnn` are just ***traditional `Neural Networks`***

## ConvNet

The basic network we can achieve with a
`convolutional-layer` is a `ConvNet`.

It is composed of:

1. `input` (picture)
2. [`Convolutional Layer`](#convolutional-layer)
3. [`ReLU`](./../3-Activation-Functions/INDEX.md#relu)
4. [`Pooling layer`](#pooling-layer)
5. `FCnn` (normal `Neural Network`)
6. `output` (class tags)

## Building Blocks

### Convolutional Layer

`Convolutional Layers` are `layers` that ***reduce the
computational load*** by creating `activation maps`
***computed starting from a `subset` of all the
available `data`***

#### Local Connectivity

To achieve this, we introduce the concept of
`local connectivity`: ***each `output` is linked to a
`volume` smaller than the original one in `width` and
`height`*** (the `depth` is always fully connected)

#### Filters

These are the ***work-horse*** of the whole `layer`.
A filter is a ***small window that contains weights***
and produces the `outputs`.
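The sliding-window behaviour described above can be sketched in a few lines. This is a minimal, illustrative pure-Python version (not the course's reference code); the helper names `out_side_len` and `conv2d` are hypothetical, and it handles a single channel rather than a full `depth`-wise `volume`:

```python
def out_side_len(in_side, filter_side, stride, padding=0):
    """Output side length: (in - filter + 2*padding) / stride + 1."""
    num = in_side - filter_side + 2 * padding
    assert num % stride == 0, "stride does not fit: adjust zero-padding"
    return num // stride + 1

def conv2d(image, kernel, stride=1, padding=0):
    """Slide a 2D `kernel` (list of lists) over a 2D `image` (single channel)."""
    h, w = len(image), len(image[0])
    f = len(kernel)
    # zero-pad the input on every side
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = image[i][j]
    oh = out_side_len(h, f, stride, padding)
    ow = out_side_len(w, f, stride, padding)
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # local connectivity: each output sees only an f x f window
            out[i][j] = sum(
                padded[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(f)
                for b in range(f)
            )
    return out

# a 4x4 input, a 3x3 filter, stride 1, zero-padding 1 -> 4x4 activation map
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
result = conv2d(image, identity, stride=1, padding=1)
assert result[2][1] == image[2][1]  # identity kernel reproduces the input
```

With zero-padding of 1 and stride 1 the `activation-map` keeps the input's `width` and `height`, matching the "avoid downsizing" note later in this chapter.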
We have a ***number of `filters` equal to the `depth`
of the `output`***.
This means that ***each `output-value` at
the same `depth` has been generated by the same
`filter`***, and as such, ***any `volume` shares
`weights` across a single `depth`***.

Each `filter` shares the same `height` and `width` and
has a `depth` equal to the one of the `input`; their
`output` is usually called an `activation-map`.

> [!NOTE]
> Usually what the first `activation-maps` *learn* are
> oriented edges, opposing colors, etc...

Another parameter for `filters` is the `stride`, which
is the number of "hops" the window makes between one
convolution and the next.

The formula to determine the `output` size for any side
is:

$$
out_{side\_len} = \frac{
  in_{side\_len} - filter_{side\_len}
}{
  stride
} + 1
$$

Whenever the `stride` makes $out_{side\_len}$ ***not
an integer value, we add $0$ `padding`*** to correct
this (with a $0$ `padding` of size $P$ per side, the
numerator becomes
$in_{side\_len} - filter_{side\_len} + 2P$).

> [!NOTE]
>
> To avoid downsizing, it is not uncommon to apply a
> $0$ `padding` of size 1 (per dimension) before
> applying a `filter` with `stride` equal to 1
>
> However, for ***fast downsizing*** we can increase
> the `stride`

> [!CAUTION]
> Don't shrink too fast, it doesn't bring good results

### Pooling Layer[^pooling-layer-wikipedia]

It ***downsamples the image without resorting to
`learnable-parameters`***

There are many `algorithms` for this `layer`, such as:

#### Max Pooling

Takes the max element in the `window`

#### Average Pooling

Takes the average of the elements in the `window`

#### Mixed Pooling

A linear combination of [Max Pooling](#max-pooling) and
[Average Pooling](#average-pooling)

> [!NOTE]
> This list is **NOT EXHAUSTIVE**, please refer to
> [this article](https://en.wikipedia.org/wiki/Pooling_layer)
> to know more.

This `layer` ***introduces space invariance***

## Tips[^anelli-convolutional-networks-2]

- `1x1` `filters` make sense.
  ***They allow us to reduce the `depth` of the next
  `volume`***
- ***The trend goes towards increasing the `depth` and
  having smaller `filters`***
- ***The trend is to remove
  [`pooling-layers`](#pooling-layer) and use only
  [`convolutional-layers`](#convolutional-layer)***
- ***Common settings for
  [`convolutional-layers`](#convolutional-layer) are:***
  - number of filters: $K = 2^{a}$
    [^anelli-convolutional-networks-3]
  - tuples of `filter-size` $F$, `stride` $S$,
    `0-padding` $P$:
    - (3, 1, 1)
    - (5, 1, 2)
    - (5, 2, *whatever fits*)
    - (1, 1, 0)
- See ResNet/GoogLeNet

[^anelli-convolutional-networks]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7

[^anelli-convolutional-networks-1]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 2

[^pooling-layer-wikipedia]: [Pooling Layer | Wikipedia | 22nd April 2025](https://en.wikipedia.org/wiki/Pooling_layer)

[^anelli-convolutional-networks-2]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 85

[^anelli-convolutional-networks-3]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 70