From d23d847c2e9d1e25d960cd46f5fae1088ef80af3 Mon Sep 17 00:00:00 2001
From: Christian Risi <75698846+CnF-Gris@users.noreply.github.com>
Date: Thu, 23 Oct 2025 17:55:09 +0200
Subject: [PATCH] Added receptive fields section and fixed some info

---
 Chapters/7-Convolutional-Networks/INDEX.md | 38 +++++++++++++++++-----
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/Chapters/7-Convolutional-Networks/INDEX.md b/Chapters/7-Convolutional-Networks/INDEX.md
index 333956f..2bdac04 100644
--- a/Chapters/7-Convolutional-Networks/INDEX.md
+++ b/Chapters/7-Convolutional-Networks/INDEX.md
@@ -5,14 +5,14 @@
 > [!WARNING]
 > We apply this concept ***mainly*** to `images`
 
-Usually, for `images`, `fcnn` (short for `f`ully
-`c`onnected `n`eural `n`etworks), are not suitable,
+Usually, for `images`, `fcnn` (short for **f**ully
+**c**onnected **n**eural **n**etworks) are not suitable,
 as `images` have a ***large number of `inputs`*** that
 is ***highly dimensional*** (e.g. a `32x32`, `RGB` picture
-has dimension of `weights`)[^anelli-convolutional-networks-1]
+has an input dimension of `3072`)[^anelli-convolutional-networks-1]
 
 Combine this with the fact that ***nowadays pictures
-have (the least) `1920x1080` pixels*** makes `FCnn`
+have (at least) `1920x1080` pixels***. This makes `fcnn`
 ***prone to overfitting***[^anelli-convolutional-networks-1]
 
 > [!NOTE]
@@ -61,13 +61,13 @@ concerning the `width` and `height`***
 
 
 
-#### Filters
+#### Filters (aka Kernels)
 
 These are the ***work-horse*** of the whole `layer`.
 A filter is a ***small window that contains weights***
 and produces the `outputs`.
 
-
+![Filter acting on an RGB picture that is 9x9](./pngs/convolution.png)
 
 We have a ***number of `filter` equal to the `depth` of
 the `output`***.
@@ -80,6 +80,9 @@ Each `filter` share the same `height` and `width` and has
 a `depth` equal to the one in the `input`, and their
 `output` is usually called `activation-map`.
 
+> [!WARNING]
+> Don't forget about the biases, one for each `kernel`
+
 > [!NOTE]
 > Usually what the first `activation-maps` *learn* are
 > oriented edges, opposing colors, etc...
@@ -95,8 +98,8 @@ $$
 out_{side\_len} = \frac{
     in_{side\_len} - filter_{side\_len}
 }{
-    stride + 1
-}
+    stride
+} + 1
 $$
 
 Whenever the `stride` makes $out_{side\_len}$ ***not
@@ -144,6 +147,23 @@ Pooling](#average-pooling)
 
 This `layer` ***introduces space invariance***
 
+## Receptive Fields[^youtube-video-receptive-fields]
+
+At the end of our convolutions we want each output value to have been
+influenced by all pixels in our picture.
+
+The set of pixels that influence an output value is called its receptive
+field, and it grows by $k - 1$ (where $k$ is the kernel size) with each
+convolution we apply. This is because each kernel output derives from
+several inputs, each of which is in turn influenced by even more pixels.
+
+However, this means that before an output is influenced by all pixels,
+the network needs to go very deep.
+
+To mitigate this, we can downsample by striding. Later layers then cover
+more pixels per step, even though more sparsely, so the receptive field
+grows much faster with depth.
+
 ## Tips[^anelli-convolutional-networks-2]
 
 - `1x1` `filters` make sense. ***They allow us
@@ -176,3 +196,5 @@
 [^anelli-convolutional-networks-2]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 85
 [^anelli-convolutional-networks-3]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 70
+
+[^youtube-video-receptive-fields]: [CNN Receptive Fields | YouTube | 23rd October 2025](https://www.youtube.com/watch?v=ip2HYPC_T9Q)
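
To make the `filter` bookkeeping concrete (number of `filters` = output `depth`, each `filter` spanning the full input `depth`, plus the one-bias-per-`kernel` warning the patch adds), here is a minimal Python sketch of the parameter count of a single convolutional `layer`. The helper name `conv_layer_params` and the numbers are illustrative, not from the lecture material.

```python
def conv_layer_params(kernel_side: int, in_depth: int, num_filters: int) -> int:
    """Trainable parameters in one convolutional layer.

    Each filter is a kernel_side x kernel_side x in_depth block of
    weights, and each filter (i.e. each kernel) carries one bias.
    """
    weights = kernel_side * kernel_side * in_depth * num_filters
    biases = num_filters  # one bias for each kernel
    return weights + biases

# Six 5x5 filters over an RGB (depth 3) input:
# 5 * 5 * 3 * 6 = 450 weights, plus 6 biases.
print(conv_layer_params(5, 3, 6))  # 456
```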
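The output-size formula this patch corrects is easy to sanity-check in code. Below is a minimal sketch, assuming a convolution with no padding; `conv_output_side` is an invented helper name, and the divisibility check mirrors the document's caveat about $out_{side\_len}$ not being an integer.

```python
def conv_output_side(in_side: int, filter_side: int, stride: int) -> int:
    """Side length of the activation map: (in - filter) / stride + 1."""
    span = in_side - filter_side
    if span % stride != 0:
        # Mirrors the caveat in the text: this stride never lines the
        # filter up with the last pixels, so the setup is invalid.
        raise ValueError(
            f"stride {stride} does not fit input {in_side} "
            f"with filter {filter_side}"
        )
    return span // stride + 1

# 32x32 input, 5x5 filter, stride 1 -> 28x28 activation map.
assert conv_output_side(32, 5, 1) == 28
# Same input with stride 3 -> (32 - 5) / 3 + 1 = 10.
assert conv_output_side(32, 5, 3) == 10
```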
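The receptive-field growth described in the new section can also be computed directly. The sketch below uses the standard recurrence (each layer adds $k - 1$ scaled by the product of all earlier strides, which reduces to $k - 1$ per layer when every stride is 1); the function name and the ten-layer configurations are illustrative assumptions.

```python
def receptive_field(layers: list[tuple[int, int]]) -> int:
    """Receptive field (in input pixels) after a stack of convolutions.

    `layers` is a list of (kernel_size, stride) pairs, ordered from the
    input. Each layer widens the field by (k - 1) times the product of
    all earlier strides.
    """
    rf, jump = 1, 1  # one input pixel; distance between adjacent outputs
    for k, stride in layers:
        rf += (k - 1) * jump
        jump *= stride
    return rf

# Ten 3x3 convolutions with stride 1: 1 + 10 * 2 = 21 pixels.
print(receptive_field([(3, 1)] * 10))  # 21
# The same ten layers with stride 2 cover thousands of pixels.
print(receptive_field([(3, 2)] * 10))  # 2047
```

The two configurations bear out the section's point: at stride 1 the field grows so slowly that the network must go very deep, while downsampling by striding makes the same depth span orders of magnitude more pixels.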