Added Convolutional Networks

2025-04-24 13:22:58 +02:00
parent ac20c47e5a
commit 9c8caf4120
1 changed files with 178 additions and 0 deletions
--- a/Chapters/7-Convolutional-Networks/INDEX.md
+++ b/Chapters/7-Convolutional-Networks/INDEX.md
@@ -0,0 +1,178 @@
+# Convolutional Networks[^anelli-convolutional-networks]
+
+<!-- TODO: Add Images -->
+
+> [!WARNING]
+> We apply this concept ***mainly*** to `images`
+
+For `images`, `FCnn` (short for `f`ully
+`c`onnected `n`eural `n`etworks) are usually not suitable,
+as `images` have a ***large number of `inputs`***, i.e. they are
+***highly dimensional*** (e.g. a `32x32` `RGB` picture
+has `32 * 32 * 3 = 3072` `inputs`, i.e. `3072` `weights` for every fully connected neuron)[^anelli-convolutional-networks-1]
+
+Combined with the fact that ***nowadays pictures
+have at least `1920x1080` pixels***, this makes `FCnn`
+***prone to overfitting***[^anelli-convolutional-networks-1]
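+
+As a rough worked count, assuming 3 `RGB` channels and a
+hypothetical first hidden layer of $1000$ neurons (a number
+chosen only for illustration, not taken from the source),
+the number of `inputs` and first-layer `weights` grows as:
+
+$$
+\begin{aligned}
+32 \times 32 \times 3 &= 3072 \text{ inputs} \\
+1920 \times 1080 \times 3 &= 6\,220\,800 \text{ inputs} \\
+6\,220\,800 \times 1000 &= 6\,220\,800\,000 \text{ weights (biases excluded)}
+\end{aligned}
+$$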
+
+> [!NOTE]
+>
+> - From here on `depth` is the **3rd dimension of the
+> activation volume**
+> - `FCnn` are just ***traditional `NeuralNetworks`***
+>
+
+## ConvNet
+
+The simplest network we can build around a
+`convolutional-layer` is a `ConvNet`.
+
+<!-- TODO: Insert mermaid or image -->
+
+It is composed of:
+
+<!-- TODO: Add links -->
+
+1. `input` (picture)
+2. [`Convolutional Layer`](#convolutional-layer)
+3. [`ReLU`](./../3-Activation-Functions/INDEX.md#relu)
+4. [`Pooling layer`](#pooling-layer)
+5. `FCnn` (Normal `NeuralNetwork`)
+6. `output` (class labels)
+
+<!-- TODO: Add PDF 7 pg 7-8 -->
+
+## Building Blocks
+
+### Convolutional Layer
+
+`Convolutional Layers` are `layers` that ***reduce the
+computational load*** by creating
+`activation maps`, each value of which is ***computed from a `subset` of
+all the available `data`***.
+
+#### Local Connectivity
+
+To achieve this, we introduce the concept of
+`local connectivity`. Basically ***each `output` is
+connected only to a `volume` that is smaller than the original one
+in `width` and `height`***
+(the `depth` is always fully connected).
+
+<!-- TODO: Add image -->
+
+#### Filters
+
+`Filters` are the ***workhorses*** of the whole `layer`.
+A `filter` is a ***small window that contains weights***,
+slides over the `input` and produces the `outputs`.
+
+<!-- TODO: Add image -->
+
+There are as ***many `filters` as the `depth` of
+the `output`***.
+This means that ***each `output-value` at
+the same `depth` has been generated by the same `filter`***, and as such,
+***the `output` `volume` shares `weights`
+across each single `depth` slice***.
+
+All `filters` share the same `height` and `width` and
+have a `depth` equal to that of the `input`; the
+`output` of each one is usually called an `activation-map`.
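+
+As a minimal worked count of the parameters in one
+`convolutional-layer`, assume $K$ `filters` of side $F$ over an
+`input` of `depth` $D_{in}$, plus one bias per `filter` (the bias
+term and the symbol $D_{in}$ are assumptions for illustration,
+not taken from the source):
+
+$$
+params = K \cdot (F \cdot F \cdot D_{in} + 1)
+$$
+
+For example, $K = 10$ `filters` of size `5x5` on a `32x32` `RGB`
+`input` ($D_{in} = 3$) would need
+$10 \cdot (5 \cdot 5 \cdot 3 + 1) = 760$ parameters, regardless
+of the `width` and `height` of the `input`.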
+
+> [!NOTE]
+> The first `activation-maps` usually *learn*
+> oriented edges, opposing colors, etc.
+
+Another parameter for `filters` is the `stride`, which
+is basically the number of positions the window "hops"
+between one convolution and the next.
+
+The formula to determine the `output` size for any side
+is:
+
+$$
+out_{side\_len} = \frac{
+    in_{side\_len} - filter_{side\_len}
+}{
+    stride
+} + 1
+$$
+
+Whenever the `stride` makes $out_{side\_len}$ ***not
+an integer value, we add $0$ `padding`*** around the `input`
+to correct this: a `padding` of $P$ per side turns
+$in_{side\_len}$ into $in_{side\_len} + 2P$.
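+
+A few worked cases may help. They assume the usual form of the
+relation with a $0$ `padding` of $P$ per side,
+$out_{side\_len} = (in_{side\_len} - filter_{side\_len} + 2P) / stride + 1$;
+the numbers are chosen only for illustration:
+
+$$
+\begin{aligned}
+in = 32,\ filter = 5,\ stride = 1,\ P = 0 &: \quad \frac{32 - 5 + 0}{1} + 1 = 28 \\
+in = 32,\ filter = 5,\ stride = 1,\ P = 2 &: \quad \frac{32 - 5 + 4}{1} + 1 = 32 \\
+in = 7,\ filter = 3,\ stride = 2,\ P = 0 &: \quad \frac{7 - 3 + 0}{2} + 1 = 3 \\
+in = 7,\ filter = 3,\ stride = 3,\ P = 0 &: \quad \frac{7 - 3 + 0}{3} + 1 = \frac{7}{3} \text{ (not an integer)} \\
+in = 7,\ filter = 3,\ stride = 3,\ P = 1 &: \quad \frac{7 - 3 + 2}{3} + 1 = 3
+\end{aligned}
+$$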
+
+> [!NOTE]
+>
+> To avoid downsizing, it is not uncommon to apply a
+> $0$ padding of size 1 (on each side) before applying
+> a `3x3` `filter` with `stride` equal to 1
+>
+> However, for a ***fast downsizing*** we can increase
+> the `stride`
+
+> [!CAUTION]
+> Don't shrink too fast: it doesn't bring good results
+
+### Pooling Layer[^pooling-layer-wikipedia]
+
+It ***downsamples the image without resorting to
+`learnable-parameters`***
+
+<!-- TODO: Insert image -->
+
+There are many `algorithms` to implement this `layer`, such as:
+
+#### Max Pooling
+
+Takes the max element in the `window`
+
+#### Average Pooling
+
+Takes the average of elements in the `window`
+
+#### Mixed Pooling
+
+Linear sum of [Max Pooling](#max-pooling) and [Average
+Pooling](#average-pooling)
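+
+As a small worked example of the three variants above, take a
+hypothetical `2x2` `window` with values $1, 3, 2, 6$ and a mixing
+weight $\lambda \in [0, 1]$ (the symbol $\lambda$ and the
+convex-combination form of the linear sum are assumptions made
+for illustration):
+
+$$
+\begin{aligned}
+max &= \max(1, 3, 2, 6) = 6 \\
+avg &= \frac{1 + 3 + 2 + 6}{4} = 3 \\
+mixed &= \lambda \cdot max + (1 - \lambda) \cdot avg = 0.5 \cdot 6 + 0.5 \cdot 3 = 4.5 \quad (\lambda = 0.5)
+\end{aligned}
+$$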
+
+> [!NOTE]
+> This list is **NOT EXHAUSTIVE**, please refer to
+> [this article](https://en.wikipedia.org/wiki/Pooling_layer)
+> to know more.
+
+This `layer` ***introduces spatial invariance***
+
+## Tips[^anelli-convolutional-networks-2]
+
+- `1x1` `filters` make sense. ***They allow us
+    to reduce the `depth` of the next `volume`***
+- ***The trend is towards increasing the `depth` and
+    using smaller `filters`***
+- ***The trend is to remove
+    [`pooling-layers`](#pooling-layer) and use only
+    [`convolutional-layers`](#convolutional-layer)***
+- ***Common settings for
+  [`convolutional-layers`](#convolutional-layer) are:***
+    - number of filters: $K = 2^{a}$
+    [^anelli-convolutional-networks-3]
+    - tuple of `filter-size` $F$, `stride` $S$,
+    `0-padding` $P$:
+        - (3, 1, 1)
+        - (5, 1, 2)
+        - (5, 2, *whatever fits*)
+        - (1, 1, 0)
+- See ResNet/GoogLeNet
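+
+Plugging the common $(F, S, P)$ tuples above into the usual
+output-size relation $out = (in - F + 2P) / S + 1$ (written here
+with $P$ as the per-side `0-padding`, which is an assumption on
+notation) suggests why they are popular: with `stride` 1 they
+leave `width` and `height` unchanged, while `stride` 2 roughly
+halves them. The value $in = 31$ in the last line is chosen only
+for illustration:
+
+$$
+\begin{aligned}
+(3, 1, 1) &: \quad \frac{in - 3 + 2}{1} + 1 = in \\
+(5, 1, 2) &: \quad \frac{in - 5 + 4}{1} + 1 = in \\
+(1, 1, 0) &: \quad \frac{in - 1 + 0}{1} + 1 = in \\
+(5, 2, 2),\ in = 31 &: \quad \frac{31 - 5 + 4}{2} + 1 = 16
+\end{aligned}
+$$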
+
+
+<!-- Footnotes -->
+[^anelli-convolutional-networks]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7
+
+[^anelli-convolutional-networks-1]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 2
+
+[^pooling-layer-wikipedia]: [Pooling Layer | Wikipedia | 22nd April 2025](https://en.wikipedia.org/wiki/Pooling_layer)
+
+[^anelli-convolutional-networks-2]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 85
+
+[^anelli-convolutional-networks-3]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 70