Added receptive fields section and fixed some info

Christian Risi 2025-10-23 17:55:09 +02:00
parent fc7cefb93e
commit d23d847c2e


@ -5,14 +5,14 @@
> [!WARNING]
> We apply this concept ***mainly*** to `images`
Usually, for `images`, `fcnn` (short for **f**ully
**c**onnected **n**eural **n**etworks) are not suitable,
as `images` have a ***large number of `inputs`***, i.e. the
`input` is ***highly dimensional*** (e.g. a `32x32` `RGB` picture
corresponds to 3072 input values)[^anelli-convolutional-networks-1]
Combine this with the fact that ***nowadays pictures
have at least `1920x1080` pixels***. This makes `FCnn`
***prone to overfitting***[^anelli-convolutional-networks-1]
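A quick back-of-the-envelope check of those numbers (the hidden layer of `1000` units is just an illustrative assumption, not from the material):
$$
32 \cdot 32 \cdot 3 = 3072
\qquad\qquad
1920 \cdot 1080 \cdot 3 \cdot 1000 \approx 6.2 \times 10^{9}
$$
So even a single fully connected layer on a Full-HD `RGB` picture would already need billions of `weights`.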
> [!NOTE]
@ -61,13 +61,13 @@ concerning the `width` and `height`***
<!-- TODO: Add image -->
#### Filters (aka Kernels)
These are the ***work-horses*** of the whole `layer`.
A `filter` is a ***small window that contains weights***
and produces the `outputs`.
![Filter acting on an RGB picture that is 9x9](./pngs/convolution.png)
We have a ***number of `filters` equal to the `depth` of
the `output`***.
@ -80,6 +80,9 @@ Each `filter` shares the same `height` and `width` and
has a `depth` equal to that of the `input`, and its
`output` is usually called an `activation-map`.
> [!WARNING]
> Don't forget about biases, one for each `kernel`
> [!NOTE]
> Usually what the first `activation-maps` *learn* are
> oriented edges, opposing colors, etc.
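To make the shapes concrete, here is a minimal `NumPy` sketch of one `convolutional layer` (the `9x9` `RGB` input, the `4` `filters` of size `3x3` and the `stride` of `1` are illustrative assumptions, not from the material):
```python
import numpy as np

# Illustrative sizes: a 9x9 RGB input, 4 filters of size 3x3, stride 1, no padding
in_side, in_depth = 9, 3
num_filters, k, stride = 4, 3, 1

image = np.random.rand(in_side, in_side, in_depth)
# Each filter spans the full input depth; there is one bias per filter (kernel)
filters = np.random.rand(num_filters, k, k, in_depth)
biases = np.random.rand(num_filters)

out_side = (in_side - k) // stride + 1  # output side length
activation_maps = np.zeros((out_side, out_side, num_filters))  # depth = number of filters

for f in range(num_filters):
    for i in range(out_side):
        for j in range(out_side):
            # Small window of the input, with the same depth as the input
            window = image[i * stride:i * stride + k, j * stride:j * stride + k, :]
            activation_maps[i, j, f] = np.sum(window * filters[f]) + biases[f]

print(activation_maps.shape)  # (7, 7, 4): one activation-map per filter
```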
@ -95,8 +98,8 @@ $$
out_{side\_len} = \frac{
in_{side\_len} - filter_{side\_len}
}{
stride
} + 1
$$
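For instance, with an `input` side of `9`, a `filter` side of `3` and a `stride` of `2` (numbers picked only for illustration):
$$
out_{side\_len} = \frac{9 - 3}{2} + 1 = 4
$$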
Whenever the `stride` makes $out_{side\_len}$ ***not
@ -144,6 +147,23 @@ Pooling](#average-pooling)
This `layer` ***introduces space invariance***
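As a minimal illustration of that invariance (the `4x4` array and the `2x2` max-pooling window below are my own assumptions):
```python
import numpy as np

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2 on a square 2-D array
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.array([[9, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 7, 0]])
b = np.roll(a, shift=1, axis=1)  # the same pattern shifted one pixel to the right

print(max_pool_2x2(a))  # [[9 0]
                        #  [0 7]]
print(max_pool_2x2(b))  # identical: this 1-pixel shift stays inside each 2x2 window
```
The invariance only holds for shifts small enough to stay within a pooling window, which is exactly the *local* spatial invariance this `layer` provides.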
## Receptive Fields[^youtube-video-receptive-fields]
At the end of our stack of convolutions we may want our output to have been influenced by all
pixels in our picture.
The number of pixels that influenced an output value is called its receptive field, and it grows
by $k - 1$ each time we apply a convolution, where $k$ is the kernel size. This is because each
kernel produces an output that derives from several inputs, and is therefore influenced by more
pixels.
However, this means that we need to go very deep before an output is influenced by all
pixels.
To mitigate this, we can downsample by striding. Upper layers then gather information from
pixels that are farther apart (even if more sparsely sampled), so deeper layers cover much
more of the picture with far fewer convolutions, as the sketch below shows.
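A small sketch of that bookkeeping (the stacks of `3x3` convolutions below are just example configurations):
```python
def receptive_field(layers):
    # Receptive field of one output pixel after a stack of (kernel_size, stride) conv layers
    rf, jump = 1, 1  # jump = spacing, in input pixels, between adjacent outputs of the current layer
    for k, stride in layers:
        rf += (k - 1) * jump
        jump *= stride
    return rf

# Three 3x3 convolutions with stride 1: each layer only adds k - 1 = 2 pixels
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7

# The same kernels with stride 2: every later layer grows the receptive field faster
print(receptive_field([(3, 2), (3, 2), (3, 2)]))  # 15
```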
## Tips[^anelli-convolutional-networks-2]
- `1x1` `filters` make sense. ***They allow us
@ -176,3 +196,5 @@ This `layer` ***introduces space invariance***
[^anelli-convolutional-networks-2]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 85
[^anelli-convolutional-networks-3]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 70
[^youtube-video-receptive-fields]: [CNN Receptive Fields | YouTube | 23rd October 2025](https://www.youtube.com/watch?v=ip2HYPC_T9Q)