Added First Chapter
This commit is contained in: parent 76c39bc9c3, commit ed09a0b9ee

Chapters/1-Basic-Architecture/INDEX.md (new file, 106 lines added)

@@ -0,0 +1,106 @@
# Index

$g()$ is any ***Non-Linear Function***

## Basic Architecture

### Multiplicative Modules

With these modules we can modify the ***traditional*** way of building ***neural networks***
and implement ***switch-like*** functions.
#### Professor's version

Basically here we want a ***way to modify `weights` with `inputs`***.

Here $\vec{z}$ and $\vec{x}$ are both `inputs`:

$$
\begin{aligned}
y_{i} &= \sum_{j} w_{i,j} x_{j} \\
w_{i,j} &= \sum_{k} u_{i,j,k} z_{k} \\
\Rightarrow y_{i} &= \sum_{j,k} u_{i,j,k} z_{k} x_{j}
\end{aligned}
$$

As we can see, $z_{k}$ modifies $x_{j}$ through $u$.
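Below is a minimal NumPy sketch of such a multiplicative module, assuming the component-wise form above; the names and shapes (`x`, `z`, `u`) are my own illustration, not something fixed by the notes.

```python
import numpy as np

def multiplicative_module(x, z, u):
    """Compute y_i = sum_{j,k} u[i,j,k] * z[k] * x[j].

    The input z first generates the weight matrix w[i,j] = sum_k u[i,j,k] * z[k],
    which is then applied to the other input x.
    """
    w = np.einsum("ijk,k->ij", u, z)  # weights produced from the input z
    return w @ x                      # y_i = sum_j w[i,j] * x[j]

# Toy usage with assumed dimensions
rng = np.random.default_rng(0)
x = rng.normal(size=4)           # input x_j
z = rng.normal(size=3)           # modulating input z_k
u = rng.normal(size=(2, 4, 3))   # third-order tensor u_{i,j,k}
y = multiplicative_module(x, z, u)   # shape (2,)
```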
#### Quadratic Layer

This layer expands the data by taking all the **pairwise products** of its components:

$$
\begin{aligned}
\vec{v} &= [a_1, a_2, a_3] \\
quad\_layer(\vec{v}) &= [ a_1 \cdot a_1, a_1 \cdot a_2, a_1 \cdot a_3, \dots , a_3 \cdot a_3 ]
\end{aligned}
$$
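A one-line sketch of this expansion in NumPy; the function name `quad_layer` just mirrors the formula above:

```python
import numpy as np

def quad_layer(v):
    """Expand v into all pairwise products v_i * v_j (outer product, flattened)."""
    return np.outer(v, v).ravel()

quad_layer(np.array([1.0, 2.0, 3.0]))
# -> [1., 2., 3., 2., 4., 6., 3., 6., 9.]
```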
#### Product Unit[^product-unit]

Each term is a ***product*** of the inputs raised to learnable exponents $w_{j,i}$, and the terms are combined linearly by the $v_{k,j}$:

$$
o_k = \sum_{j=1}^{m} v_{k,j} \cdot \left( \prod_{i=1}^{n} x_{i}^{w_{j,i}}\right) + v_{k,0}
$$
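A small NumPy sketch of a product unit, assuming `x > 0` so that non-integer exponents are well defined; the array shapes are my own choice:

```python
import numpy as np

def product_unit(x, w, v):
    """o_k = sum_j v[k, j] * prod_i x_i**w[j, i] + v[k, 0].

    x: (n,) positive inputs, w: (m, n) learnable exponents,
    v: (out, m + 1) linear weights, with column 0 used as bias.
    """
    products = np.prod(x[None, :] ** w, axis=1)   # shape (m,): one product per hidden term
    return v[:, 1:] @ products + v[:, 0]          # shape (out,)

# Toy usage
x = np.array([1.5, 0.5, 2.0])
w = np.array([[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]])
v = np.array([[0.1, 1.0, -0.5]])
o = product_unit(x, w, v)
```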
#### Sigma-Pi Unit[^sigma-pi][^sigma-pi-2]

This *layer* is basically a sum of products of `input` terms, each product **times**
a `weight`, instead of the `matrix multiplication` of a `linear-layer`.

Moreover, this is ***not necessarily*** `fully-connected`:

$$
o_k = g\left( \sum_{q \in \text{conjuncts}} w_{q} \prod_{i=1}^{N} z_{q,i} \right)
$$
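A minimal sketch, assuming the wiring is given as a list of index tuples (one tuple per conjunct), which is one way to express the "not necessarily fully-connected" part:

```python
import numpy as np

def sigma_pi_unit(z, conjuncts, w, g=np.tanh):
    """o = g( sum_q w[q] * prod_{i in conjuncts[q]} z[i] ).

    `conjuncts` lists which inputs each product term is wired to,
    so the unit need not touch every input.
    """
    terms = np.array([np.prod(z[list(q)]) for q in conjuncts])
    return g(np.dot(w, terms))

# Toy usage: three conjuncts over a 4-dimensional input (assumed wiring)
z = np.array([0.5, -1.0, 2.0, 0.1])
conjuncts = [(0, 1), (1, 2, 3), (3,)]
w = np.array([0.3, -0.7, 1.2])
o = sigma_pi_unit(z, conjuncts, w)
```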
### Attention Modules

They define a way for our `model` to decide what is ***more important***.

#### Softmax

We use this function to output the ***importance*** of a certain
value over all the others:

$$
\begin{aligned}
\sigma(\vec{x})_{j} &= \frac{e^{x_{j}}}{\sum_{k} e^{x_{k}}} \;\; \forall j \in \{0, \dots, N\} \\
\sigma(\vec{x})_{j} &\in [0, 1] \;\; \forall j \in \{0, \dots, N\}
\end{aligned}
$$
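A standard numerically stable implementation in NumPy; subtracting the maximum before exponentiating does not change the result:

```python
import numpy as np

def softmax(x):
    """Softmax over a 1-D array: outputs lie in [0, 1] and sum to 1."""
    e = np.exp(x - np.max(x))   # shift by the max for numerical stability
    return e / e.sum()

softmax(np.array([1.0, 2.0, 3.0]))  # each entry in [0, 1], entries sum to 1
```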
## Mixture of Experts[^mixture-of-experts]

What happens if we have more than one `model` and we want to combine their `outputs`?

Basically we place a set of `weights` over the experts' `outputs` before the `output-layer`.
Both the **experts** and the **gating-function** need to be `trained`.

> [!TIP]
>
> Since we are talking about `weights` and `importance`, it is probably better to use an [attention-model](#attention-modules) here
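A rough sketch of the data flow with linear experts and a softmax gate; the expert and gate matrices here are random placeholders standing in for trained parameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def mixture_of_experts(x, expert_weights, gate_weights):
    """The gate (softmax over experts) weights the experts' outputs before the output layer."""
    expert_outputs = np.stack([W @ x for W in expert_weights])  # one row per expert
    gate = softmax(gate_weights @ x)                            # importance of each expert
    return gate @ expert_outputs                                # weighted combination

# Toy usage with assumed sizes: 3 experts, 4-d input, 2-d output
rng = np.random.default_rng(0)
x = rng.normal(size=4)
experts = [rng.normal(size=(2, 4)) for _ in range(3)]
gate_w = rng.normal(size=(3, 4))
y = mixture_of_experts(x, experts, gate_w)
```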
## Parameter Transformation

This is basically when the `weights` are the `output` of a ***function***.

Since they are controlled by some other `parameters`, we need to ***learn***
those instead.
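A tiny sketch of the idea, with a hypothetical transformation `f` (just a scaled reshape) chosen only to make the chain rule explicit:

```python
import numpy as np

# The weight matrix w is not a free parameter: it is produced from a smaller
# parameter vector u by some function f. Learning updates u, not w.
def f(u):
    """Hypothetical transformation: a scaled reshape of the real parameters."""
    return 2.0 * u.reshape(2, 3)

u = np.arange(6, dtype=float)     # the parameters we actually learn
x = np.array([1.0, -1.0, 0.5])

w = f(u)                          # weights are the *output* of a function
y = w @ x

# Backprop sketch: dL/du = dL/dw * dw/du (chain rule through f).
grad_y = np.ones(2)               # pretend dL/dy
grad_w = np.outer(grad_y, x)      # dL/dw for y = w @ x
grad_u = 2.0 * grad_w.ravel()     # f is 2 * reshape, so dw/du is the constant 2
```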
### Weights Sharing

Here we ***copy*** our weights across several ***basic components***.
Since we then get ***more than one gradient value*** for each `original weight`, we need to ***sum*** those.

> [!TIP]
>
> This is used to find ***motifs*** in an `input`
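A minimal sketch of weight sharing as a 1-D sliding window (convolution-like), showing that the gradient contributions for the shared `w` are summed over positions:

```python
import numpy as np

def shared_weight_scan(x, w):
    """Apply the *same* weight vector w at every position of x (1-D, no padding).

    All the copies point to the same original parameters, which is why
    the unit can detect the same motif anywhere in the input.
    """
    n = len(x) - len(w) + 1
    return np.array([np.dot(w, x[i:i + len(w)]) for i in range(n)])

x = np.array([0.0, 1.0, 0.0, 2.0, 1.0])
w = np.array([1.0, -1.0])            # one shared set of weights
y = shared_weight_scan(x, w)         # len(x) - len(w) + 1 outputs

# Backprop sketch: every position contributes a gradient for the same w,
# so the contributions are summed.
grad_y = np.ones_like(y)             # pretend dL/dy
grad_w = sum(grad_y[i] * x[i:i + len(w)] for i in range(len(y)))
```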
<!-- Footnotes -->

[^sigma-pi]: [University of Pretoria | sigma-pi | pg. 2](https://repository.up.ac.za/bitstream/handle/2263/29715/03chapter3.pdf?sequence=4#:~:text=A%20pi%2Dsigma%20network%20\(PSN,of%20sums%20of%20input%20components.)
[^sigma-pi-2]: []
[^product-unit]: doi: 10.13053/CyS-20-2-2218
[^mixture-of-experts]: [Wikipedia | 1st April 2025](https://en.wikipedia.org/wiki/Mixture_of_experts)