Added receptive fields section and fixed some info

> [!WARNING]
> We apply this concept ***mainly*** to `images`

Usually, for `images`, `fcnn` (short for **f**ully
**c**onnected **n**eural **n**etworks) are not suitable,
as `images` have a ***large number of `inputs`***, i.e. they are
***highly dimensional*** (e.g. a `32x32` `RGB` picture
already has `3072` data inputs)[^anelli-convolutional-networks-1]

Combine this with the fact that ***nowadays pictures
have at least `1920x1080` pixels***. This makes `FCnn`s
***prone to overfitting***[^anelli-convolutional-networks-1]
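
To get a feel for those numbers, here is a minimal sketch in plain Python (the `1000`-unit hidden layer is a hypothetical width, not from the source):

```python
# Data inputs of the two pictures mentioned above
small_picture = 32 * 32 * 3   # 3072 data inputs for a 32x32 RGB picture
full_hd = 1920 * 1080 * 3     # 6_220_800 data inputs

# Weights of a single fully connected layer with 1000 hidden units
hidden_units = 1000
print(small_picture * hidden_units)  # 3_072_000 weights
print(full_hd * hidden_units)        # 6_220_800_000 weights (~6.2 billion)
```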

> [!NOTE]

<!-- TODO: Add image -->

#### Filters (aka Kernels)

These are the ***work-horse*** of the whole `layer`.
A filter is a ***small window that contains weights***
and produces the `outputs`.

![kernel](../../assets/deep-learning/convolutional-kernel.png)

We have a ***number of `filters` equal to the `depth` of
the `output`***.

Each `filter` shares the same `height` and `width` and
has a `depth` equal to the one in the `input`, and its
`output` is usually called an `activation-map`.

> [!WARNING]
> Don't forget about biases, one for each `kernel`

> [!NOTE]
> Usually what the first `activation-maps` *learn* are
> oriented edges, opposing colors, etc...
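
A minimal sketch of these shape relations, assuming `PyTorch` (the `16` filters and `5x5` window are arbitrary choices, not from the source):

```python
import torch
import torch.nn as nn

# 16 filters (output depth), each 5x5 with depth 3 (input depth)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5)

x = torch.randn(1, 3, 32, 32)  # one 32x32 RGB picture
activation_maps = conv(x)

print(conv.weight.shape)      # [16, 3, 5, 5]: filter depth equals input depth
print(conv.bias.shape)        # [16]: one bias per kernel
print(activation_maps.shape)  # [1, 16, 28, 28]: output depth = number of filters
```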

$$
out_{side\_len} = \frac{in_{side\_len} - filter_{side\_len}}{stride} + 1
$$
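
The same formula as a small helper (plain Python; the function name is ours):

```python
def out_side_len(in_side_len, filter_side_len, stride=1):
    # Only valid when the stride divides (in - filter) exactly
    return (in_side_len - filter_side_len) // stride + 1

print(out_side_len(32, 5))            # (32 - 5) / 1 + 1 = 28
print(out_side_len(32, 5, stride=3))  # (32 - 5) / 3 + 1 = 10
```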

Whenever the `stride` makes $out_{side\_len}$ ***not

This `layer` ***introduces space invariance***
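
As a minimal sketch of that invariance (assuming `PyTorch` and `max-pooling`; the section covers other pooling types too): shifting an activation within a `2x2` pooling window leaves the pooled `output` unchanged:

```python
import torch
import torch.nn.functional as F

a = torch.zeros(1, 1, 4, 4)
b = torch.zeros(1, 1, 4, 4)
a[0, 0, 0, 0] = 1.0  # activation at (0, 0)
b[0, 0, 0, 1] = 1.0  # same activation, shifted right by one pixel

# Both pool to the same 2x2 map, so the small shift is invisible downstream
print(torch.equal(F.max_pool2d(a, 2), F.max_pool2d(b, 2)))  # True
```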

## Receptive Fields[^youtube-video-receptive-fields]

At the end of our convolution we may want our output to have been influenced by all
pixels in our picture.

The number of pixels that influenced our output is called the receptive field, and it grows
by $k - 1$ each time we do a convolution, where $k$ is the kernel size. This is
because each kernel produces an output deriving from several inputs, which is thus
influenced by more pixels.

However, this means that before being able to have an output influenced by all pixels, we need to
go very deep.

To mitigate this, we can downsample by striding. Upper layers will then collect pixel
information over a wider (though sparser) area, and thus the receptive field grows much
faster with depth (see the sketch below).
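
A minimal sketch of this growth in plain Python (`layers` is a list of `(kernel_size, stride)` pairs; the helper is ours):

```python
def receptive_field(layers):
    rf = 1    # a raw input pixel sees only itself
    jump = 1  # input pixels between adjacent outputs at the current depth
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each convolution adds (k - 1) * jump
        jump *= stride                  # striding spreads outputs further apart
    return rf

print(receptive_field([(3, 1)] * 3))  # 7: grows by k - 1 = 2 per layer
print(receptive_field([(3, 2)] * 3))  # 15: striding compounds the growth
```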

## Tips[^anelli-convolutional-networks-2]

- `1x1` `filters` make sense. ***They allow us

[^anelli-convolutional-networks-2]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 85

[^anelli-convolutional-networks-3]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 70

[^youtube-video-receptive-fields]: [CNN Receptive Fields | YouTube | 23rd October 2025](https://www.youtube.com/watch?v=ip2HYPC_T9Q)