From d23d847c2e9d1e25d960cd46f5fae1088ef80af3 Mon Sep 17 00:00:00 2001
From: Christian Risi <75698846+CnF-Gris@users.noreply.github.com>
Date: Thu, 23 Oct 2025 17:55:09 +0200
Subject: [PATCH] Added receptive fields section and fixed some info

---
 Chapters/7-Convolutional-Networks/INDEX.md | 38 +++++++++++++++++-----
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/Chapters/7-Convolutional-Networks/INDEX.md b/Chapters/7-Convolutional-Networks/INDEX.md
index 333956f..2bdac04 100644
--- a/Chapters/7-Convolutional-Networks/INDEX.md
+++ b/Chapters/7-Convolutional-Networks/INDEX.md
@@ -5,14 +5,14 @@
 > [!WARNING]
 > We apply this concept ***mainly*** to `images`
 
-Usually, for `images`, `fcnn` (short for `f`ully
-`c`onnected `n`eural `n`etworks), are not suitable,
+Usually, for `images`, `fcnn` (short for **f**ully
+**c**onnected **n**eural **n**etworks) are not suitable,
 as `images` have a ***large number of `inputs`*** that
 is ***highly dimensional*** (e.g. a `32x32`, `RGB` picture
-has dimension of `weights`)[^anelli-convolutional-networks-1]
+has an input dimension of `3072`)[^anelli-convolutional-networks-1]
 
 Combine this with the fact that ***nowadays pictures
-have (the least) `1920x1080` pixels*** makes `FCnn`
+have (at least) `1920x1080` pixels***. This makes `fcnn`
 ***prone to overfitting***[^anelli-convolutional-networks-1]
 
 > [!NOTE]
@@ -61,13 +61,13 @@ concerning the `width` and `height`***
 
 
 
-#### Filters
+#### Filters (aka Kernels)
 
 These are the ***work-horse*** of the whole `layer`.
 A filter is a ***small window that contains weights***
 and produces the `outputs`.
 
-
+![Filter acting on an RGB picture that is 9x9](./pngs/convolution.png)
 
 We have a ***number of `filter` equal to the `depth` of
 the `output`***.
@@ -80,6 +80,9 @@ Each `filter` share the same `height` and `width` and has
 a `depth` equal to the one in the `input`, and their
 `output` is usually called `activation-map`.
 
+> [!WARNING]
+> Don't forget about the biases, one for each `kernel`
+
 > [!NOTE]
 > Usually what the first `activation-maps` *learn* are
 > oriented edges, opposing colors, etc...
@@ -95,8 +98,8 @@ $$
 out_{side\_len} = \frac{
     in_{side\_len} - filter_{side\_len}
 }{
-    stride + 1
-}
+    stride
+} + 1
 $$
 
 Whenever the `stride` makes $out_{side\_len}$ ***not
@@ -144,6 +147,23 @@ Pooling](#average-pooling)
 
 This `layer` ***introduces space invariance***
 
+## Receptive Fields[^youtube-video-receptive-fields]
+
+At the end of our convolutions we want each output value to have been
+influenced by all pixels in our picture.
+
+The set of pixels that influence an output value is called its receptive
+field, and it grows by $k - 1$ (where $k$ is the kernel size) with each
+convolution we apply. This is because each kernel output derives from
+several inputs, each of which is in turn influenced by even more pixels.
+
+However, this means that before an output is influenced by all pixels,
+the network needs to go very deep.
+
+To mitigate this, we can downsample by striding. Later layers then cover
+more pixels per step, even though more sparsely, so the receptive field
+grows much faster with depth.
+
 ## Tips[^anelli-convolutional-networks-2]
 
 - `1x1` `filters` make sense. ***They allow us
@@ -176,3 +196,5 @@
 [^anelli-convolutional-networks-2]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 85
 [^anelli-convolutional-networks-3]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 70
+
+[^youtube-video-receptive-fields]: [CNN Receptive Fields | YouTube | 23rd October 2025](https://www.youtube.com/watch?v=ip2HYPC_T9Q)
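
To make the `filter` bookkeeping concrete (number of `filters` = output `depth`, each `filter` spanning the full input `depth`, plus the one-bias-per-`kernel` warning the patch adds), here is a minimal Python sketch of the parameter count of a single convolutional `layer`. The helper name `conv_layer_params` and the numbers are illustrative, not from the lecture material.

```python
def conv_layer_params(kernel_side: int, in_depth: int, num_filters: int) -> int:
    """Trainable parameters in one convolutional layer.

    Each filter is a kernel_side x kernel_side x in_depth block of
    weights, and each filter (i.e. each kernel) carries one bias.
    """
    weights = kernel_side * kernel_side * in_depth * num_filters
    biases = num_filters  # one bias for each kernel
    return weights + biases

# Six 5x5 filters over an RGB (depth 3) input:
# 5 * 5 * 3 * 6 = 450 weights, plus 6 biases.
print(conv_layer_params(5, 3, 6))  # 456
```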
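The output-size formula this patch corrects is easy to sanity-check in code. Below is a minimal sketch, assuming a convolution with no padding; `conv_output_side` is an invented helper name, and the divisibility check mirrors the document's caveat about $out_{side\_len}$ not being an integer.

```python
def conv_output_side(in_side: int, filter_side: int, stride: int) -> int:
    """Side length of the activation map: (in - filter) / stride + 1."""
    span = in_side - filter_side
    if span % stride != 0:
        # Mirrors the caveat in the text: this stride never lines the
        # filter up with the last pixels, so the setup is invalid.
        raise ValueError(
            f"stride {stride} does not fit input {in_side} "
            f"with filter {filter_side}"
        )
    return span // stride + 1

# 32x32 input, 5x5 filter, stride 1 -> 28x28 activation map.
assert conv_output_side(32, 5, 1) == 28
# Same input with stride 3 -> (32 - 5) / 3 + 1 = 10.
assert conv_output_side(32, 5, 3) == 10
```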
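The receptive-field growth described in the new section can also be computed directly. The sketch below uses the standard recurrence (each layer adds $k - 1$ scaled by the product of all earlier strides, which reduces to $k - 1$ per layer when every stride is 1); the function name and the ten-layer configurations are illustrative assumptions.

```python
def receptive_field(layers: list[tuple[int, int]]) -> int:
    """Receptive field (in input pixels) after a stack of convolutions.

    `layers` is a list of (kernel_size, stride) pairs, ordered from the
    input. Each layer widens the field by (k - 1) times the product of
    all earlier strides.
    """
    rf, jump = 1, 1  # one input pixel; distance between adjacent outputs
    for k, stride in layers:
        rf += (k - 1) * jump
        jump *= stride
    return rf

# Ten 3x3 convolutions with stride 1: 1 + 10 * 2 = 21 pixels.
print(receptive_field([(3, 1)] * 10))  # 21
# The same ten layers with stride 2 cover thousands of pixels.
print(receptive_field([(3, 2)] * 10))  # 2047
```

The two configurations bear out the section's point: at stride 1 the field grows so slowly that the network must go very deep, while downsampling by striding makes the same depth span orders of magnitude more pixels.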