# Convolutional Networks[^anelli-convolutional-networks]

<!-- TODO: Add Images -->

> [!WARNING]
> We apply this concept ***mainly*** to `images`

Usually, for `images`, `FCnn` (short for `F`ully
`C`onnected `n`eural `n`etworks) are not suitable,
as `images` have a ***large number of `inputs`*** that is
***highly dimensional*** (e.g. a `32x32`, `RGB` picture
already means $32 \times 32 \times 3 = 3072$ `weights` per `neuron`)[^anelli-convolutional-networks-1]

Combining this with the fact that ***nowadays pictures
have (at least) `1920x1080` pixels*** makes `FCnn`
***prone to overfitting***[^anelli-convolutional-networks-1]

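
To make the blow-up concrete, here is a quick back-of-the-envelope check (plain Python; the function name is just for illustration):

```python
# The number of weights a single fully-connected neuron needs is equal
# to the number of input values it is connected to.
def fc_weights_per_neuron(width: int, height: int, channels: int = 3) -> int:
    return width * height * channels

print(fc_weights_per_neuron(32, 32))      # 3072 weights for a 32x32 RGB image
print(fc_weights_per_neuron(1920, 1080))  # 6220800 weights for a Full-HD image
```
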

> [!NOTE]
>
> - From here on `depth` is the **3rd dimension of the
>   activation volume**
> - `FCnn` are just ***traditional `NeuralNetworks`***

## ConvNet

The basic network we can achieve with a
`convolutional-layer` is a `ConvNet`.

<!-- TODO: Insert mermaid or image -->

It is composed of:

<!-- TODO: Add links -->

1. `input` (picture)
2. [`Convolutional Layer`](#convolutional-layer)
3. [`ReLU`](./../3-Activation-Functions/INDEX.md#relu)
4. [`Pooling layer`](#pooling-layer)
5. `FCnn` (normal `NeuralNetwork`)
6. `output` (class tags)

<!-- TODO: Add PDF 7 pg 7-8 -->

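The composition above can be sketched as a shape walk-through. The hyper-parameters here (8 filters of side 5, stride 1, padding 2, then a 2x2 pool with stride 2) are hypothetical choices for illustration, not prescribed by the notes:

```python
# General output-side rule for a sliding window:
# out = (in - filter + 2*padding) / stride + 1
def out_side(in_side: int, filter_side: int, stride: int, padding: int = 0) -> int:
    return (in_side - filter_side + 2 * padding) // stride + 1

side, depth = 32, 3                  # 1. input: a 32x32 RGB picture
side = out_side(side, 5, 1, 2)       # 2. convolutional layer: 8 filters, 5x5
depth = 8                            #    -> 32x32x8 activation volume
                                     # 3. ReLU: shape unchanged
side = out_side(side, 2, 2)          # 4. pooling layer: 2x2 window, stride 2
flat = side * side * depth           # 5. FCnn receives the flattened volume
print(side, depth, flat)             # 16 8 2048 -> 6. output: class tags
```
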
## Building Blocks

### Convolutional Layer

`Convolutional Layers` are `layers` that ***reduce the
computational load*** by creating
`activation maps` ***computed starting from a `subset` of
all the available `data`***

#### Local Connectivity

To achieve this, we introduce the concept of
`local connectivity`: ***each `output` is
linked to a `volume` smaller than the original one
concerning the `width` and `height`***
(the `depth` is always fully connected)

<!-- TODO: Add image -->

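
A minimal numpy sketch of `local connectivity` (the 3x3 patch size and its position are arbitrary illustrative values):

```python
import numpy as np

volume = np.random.rand(32, 32, 3)   # input volume: height x width x depth

# One output value is connected only to a small spatial patch...
y, x, filter_side = 10, 10, 3
patch = volume[y:y + filter_side, x:x + filter_side, :]

# ...but across the FULL depth of the input.
print(patch.shape)  # (3, 3, 3): 3x3 in width/height, all 3 depth channels
```
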
#### Filters

These are the ***workhorse*** of the whole `layer`.
A `filter` is a ***small window that contains `weights`***
and produces the `outputs`.

<!-- TODO: Add image -->

We have a ***number of `filters` equal to the `depth` of
the `output`***.
This means that ***each `output-value` at
the same `depth` has been generated by the same `filter`***, and as such,
***the `output-volume` shares `weights`
across each single `depth-slice`***.

Each `filter` shares the same `height` and `width` and
has a `depth` equal to the one of the `input`, and its
`output` is usually called `activation-map`.

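
The sharing scheme can be sketched with numpy. The filter count and sizes below are illustrative, and the triple loop is written for clarity rather than speed:

```python
import numpy as np

H, W, D_in = 8, 8, 3        # input volume: height x width x depth
K, F, S = 4, 3, 1           # 4 filters of side 3, stride 1 (illustrative)

volume = np.random.rand(H, W, D_in)
filters = np.random.rand(K, F, F, D_in)   # one weight window per output depth

out = (H - F) // S + 1
activation_maps = np.zeros((out, out, K))
for k in range(K):                  # every value at depth k is produced by
    for i in range(out):            # the SAME filter k: the weights are
        for j in range(out):        # shared across the whole depth-slice
            patch = volume[i*S:i*S + F, j*S:j*S + F, :]
            activation_maps[i, j, k] = np.sum(patch * filters[k])

print(activation_maps.shape)  # (6, 6, 4): output depth == number of filters
```
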
> [!NOTE]
> Usually what the first `activation-maps` *learn* are
> oriented edges, opposing colors, etc...

Another parameter for `filters` is the `stride`, which
is basically the number of "hops" made between one
convolution and the next.

The formula to determine the `output` size for any side
is:

$$
out_{side\_len} = \frac{
    in_{side\_len} - filter_{side\_len}
}{
    stride
} + 1
$$

Whenever the `stride` makes $out_{side\_len}$ ***not
an integer value, we add $0$ `padding`***
to correct this.

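
A sketch of the rule in code, extended with the usual $2P$ term for the $0$ `padding` just mentioned (the `assert` flags the non-integer case):

```python
def out_side_len(in_side: int, filter_side: int, stride: int, padding: int = 0) -> int:
    # out = (in - filter + 2*padding) / stride + 1
    numerator = in_side - filter_side + 2 * padding
    assert numerator % stride == 0, "filter does not fit: adjust the padding"
    return numerator // stride + 1

print(out_side_len(32, 5, 1))             # 28: a 5x5 filter shrinks each side
print(out_side_len(32, 5, 1, padding=2))  # 32: padding of 2 preserves the size
print(out_side_len(32, 2, 2))             # 16: stride 2 halves each side
```
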

> [!NOTE]
>
> To avoid downsizing, it is not uncommon to apply a
> $0$ padding of size 1 (per dimension) before applying
> a `filter` with `stride` equal to 1
>
> However, for a ***fast downsizing*** we can increase
> the `stride`

> [!CAUTION]
> Don't shrink too fast, it doesn't bring good results

### Pooling Layer[^pooling-layer-wikipedia]

It ***downsamples the image without resorting to
`learnable-parameters`***

<!-- TODO: Insert image -->

There are many `algorithms` to implement this `layer`, such as:

#### Max Pooling

Takes the max element in the `window`

#### Average Pooling

Takes the average of the elements in the `window`

#### Mixed Pooling

Linear combination of [Max Pooling](#max-pooling) and [Average
Pooling](#average-pooling)

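
The three variants above, sketched with numpy on a single 4x4 `activation-map` with a 2x2 `window` and `stride` 2 (the mixing weight 0.5 is an arbitrary example):

```python
import numpy as np

x = np.array([[1., 3., 2., 4.],
              [5., 7., 6., 8.],
              [9., 2., 1., 3.],
              [4., 6., 5., 7.]])

# Group the map into non-overlapping 2x2 windows: axes 1 and 3 index
# the elements inside each window.
windows = x.reshape(2, 2, 2, 2)

max_pool = windows.max(axis=(1, 3))            # max element per window
avg_pool = windows.mean(axis=(1, 3))           # average per window
mixed_pool = 0.5 * max_pool + 0.5 * avg_pool   # linear combination of the two

print(max_pool.tolist())  # [[7.0, 8.0], [9.0, 7.0]]
print(avg_pool.tolist())  # [[4.0, 5.0], [5.25, 4.0]]
```
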
> [!NOTE]
> This list is **NOT EXHAUSTIVE**, please refer to
> [this article](https://en.wikipedia.org/wiki/Pooling_layer)
> to know more.

This `layer` ***introduces spatial invariance***

## Tips[^anelli-convolutional-networks-2]

- `1x1` `filters` make sense. ***They allow us
  to reduce the `depth` of the next `volume`***
- ***The trend goes towards increasing the `depth` and
  having smaller `filters`***
- ***The trend is to remove
  [`pooling-layers`](#pooling-layer) and use only
  [`convolutional-layers`](#convolutional-layer)***
- ***Common settings for
  [`convolutional-layers`](#convolutional-layer) are:***
  - number of filters: $K = 2^{a}$[^anelli-convolutional-networks-3]
  - tuple of `filter-size` $F$, `stride` $S$,
    `0-padding` $P$:
    - (3, 1, 1)
    - (5, 1, 2)
    - (5, 2, *whatever fits*)
    - (1, 1, 0)
  - See ResNet/GoogLeNet

<!-- Footnotes -->

[^anelli-convolutional-networks]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7

[^anelli-convolutional-networks-1]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 2

[^pooling-layer-wikipedia]: [Pooling Layer | Wikipedia | 22nd April 2025](https://en.wikipedia.org/wiki/Pooling_layer)

[^anelli-convolutional-networks-2]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 85

[^anelli-convolutional-networks-3]: Vito Walter Anelli | Deep Learning Material 2024/2025 | PDF 7 pg. 70