Added receptive fields section and fixed some info

> [!WARNING]
> We apply this concept ***mainly*** to `images`

Usually, for `images`, `fcnn` (short for **f**ully
**c**onnected **n**eural **n**etworks) are not suitable,
as `images` have a ***large number of `inputs`***, i.e. they are
***highly dimensional*** (e.g. a `32x32` `RGB` picture
already has `3072` data inputs)[^anelli-convolutional-networks-1]

Combine this with the fact that ***nowadays pictures
have at least `1920x1080` pixels***. This makes `FCnn`s
***prone to overfitting***[^anelli-convolutional-networks-1]
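
To get a feel for those numbers, here is a minimal sketch in plain Python (the `1000`-unit hidden layer is a hypothetical width, not from the source):

```python
# Data inputs of the two pictures mentioned above
small_picture = 32 * 32 * 3   # 3072 data inputs for a 32x32 RGB picture
full_hd = 1920 * 1080 * 3     # 6_220_800 data inputs

# Weights of a single fully connected layer with 1000 hidden units
hidden_units = 1000
print(small_picture * hidden_units)  # 3_072_000 weights
print(full_hd * hidden_units)        # 6_220_800_000 weights (~6.2 billion)
```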

> [!NOTE]

<!-- TODO: Add image -->

#### Filters (aka Kernels)

These are the ***work-horse*** of the whole `layer`.
A filter is a ***small window that contains weights***
and produces the `outputs`.

![kernel](../../assets/deep-learning/convolutional-kernel.png)

We have a ***number of `filters` equal to the `depth` of
the `output`***.

Each `filter` shares the same `height` and `width` and
has a `depth` equal to the one in the `input`, and its
`output` is usually called an `activation-map`.

> [!WARNING]
> Don't forget about biases, one for each `kernel`

> [!NOTE]
> Usually what the first `activation-maps` *learn* are
> oriented edges, opposing colors, etc...
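
A minimal sketch of these shape relations, assuming `PyTorch` (the `16` filters and `5x5` window are arbitrary choices, not from the source):

```python
import torch
import torch.nn as nn

# 16 filters (output depth), each 5x5 with depth 3 (input depth)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5)

x = torch.randn(1, 3, 32, 32)  # one 32x32 RGB picture
activation_maps = conv(x)

print(conv.weight.shape)      # [16, 3, 5, 5]: filter depth equals input depth
print(conv.bias.shape)        # [16]: one bias per kernel
print(activation_maps.shape)  # [1, 16, 28, 28]: output depth = number of filters
```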

$$
out_{side\_len} = \frac{in_{side\_len} - filter_{side\_len}}{stride} + 1
$$
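
The same formula as a small helper (plain Python; the function name is ours):

```python
def out_side_len(in_side_len, filter_side_len, stride=1):
    # Only valid when the stride divides (in - filter) exactly
    return (in_side_len - filter_side_len) // stride + 1

print(out_side_len(32, 5))            # (32 - 5) / 1 + 1 = 28
print(out_side_len(32, 5, stride=3))  # (32 - 5) / 3 + 1 = 10
```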

Whenever the `stride` makes $out_{side\_len}$ ***not

This `layer` ***introduces space invariance***
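
As a minimal sketch of that invariance (assuming `PyTorch` and `max-pooling`; the section covers other pooling types too): shifting an activation within a `2x2` pooling window leaves the pooled `output` unchanged:

```python
import torch
import torch.nn.functional as F

a = torch.zeros(1, 1, 4, 4)
b = torch.zeros(1, 1, 4, 4)
a[0, 0, 0, 0] = 1.0  # activation at (0, 0)
b[0, 0, 0, 1] = 1.0  # same activation, shifted right by one pixel

# Both pool to the same 2x2 map, so the small shift is invisible downstream
print(torch.equal(F.max_pool2d(a, 2), F.max_pool2d(b, 2)))  # True
```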

## Receptive Fields[^youtube-video-receptive-fields]

At the end of our convolution we may want our output to have been influenced by all
pixels in our picture.

The number of pixels that influenced our output is called the receptive field, and it grows
by $k - 1$ each time we do a convolution, where $k$ is the kernel size. This is
because each kernel produces an output deriving from several inputs, which is thus
influenced by more pixels.

However, this means that before being able to have an output influenced by all pixels, we need to
go very deep.

To mitigate this, we can downsample by striding. Upper layers will then collect pixel
information over a wider (though sparser) area, and thus the receptive field grows much
faster with depth (see the sketch below).
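
A minimal sketch of this growth in plain Python (`layers` is a list of `(kernel_size, stride)` pairs; the helper is ours):

```python
def receptive_field(layers):
    rf = 1    # a raw input pixel sees only itself
    jump = 1  # input pixels between adjacent outputs at the current depth
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each convolution adds (k - 1) * jump
        jump *= stride                  # striding spreads outputs further apart
    return rf

print(receptive_field([(3, 1)] * 3))  # 7: grows by k - 1 = 2 per layer
print(receptive_field([(3, 2)] * 3))  # 15: striding compounds the growth
```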

## Tips[^anelli-convolutional-networks-2]

- `1x1` `filters` make sense. ***They allow us

[^anelli-convolutional-networks-2]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 85

[^anelli-convolutional-networks-3]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 70

[^youtube-video-receptive-fields]: [CNN Receptive Fields | YouTube | 23rd October 2025](https://www.youtube.com/watch?v=ip2HYPC_T9Q)