

g() is any Non-Linear Function

Basic Architecture

Multiplicative Modules

With these modules we can modify the traditional building blocks of neural networks and implement switch-like behaviour.

Professor's formulation

Basically, we want a way to modify the weights using the inputs themselves.

Here \vec{z} and \vec{x} are both inputs.


\begin{aligned}
    y_{i} &= \sum_{j} w_{i,j} x_{j} \\
    w_{i,j} &= \sum_{k} u_{i,j,k} z_{k} \\
    \Rightarrow y_{i} &= \sum_{j,k} u_{i,j,k} z_{k} x_{j}
\end{aligned}

As we can see, z_{k} modulates, through u, the contribution of x_{j}.
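
A minimal NumPy sketch of this module (all sizes and names are illustrative assumptions, not the professor's notation):

```python
import numpy as np

# Hypothetical sizes: x has 4 components, z has 3, the output y has 2.
n_out, n_in, n_z = 2, 4, 3

u = np.random.randn(n_out, n_in, n_z)   # third-order weight tensor u_{i,j,k}
x = np.random.randn(n_in)               # "data" input
z = np.random.randn(n_z)                # "modulating" input

# First build the input-dependent weight matrix w_{i,j} = sum_k u_{i,j,k} z_k ...
w = np.einsum('ijk,k->ij', u, z)
# ... then apply it to x: y_i = sum_j w_{i,j} x_j
y = w @ x

# Equivalent one-shot form: y_i = sum_{j,k} u_{i,j,k} z_k x_j
y_direct = np.einsum('ijk,k,j->i', u, z, x)
assert np.allclose(y, y_direct)
```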

Quadratic Layer

This layer expands the input by taking all pairwise products of its components (a quadratic expansion).


\begin{aligned}
    \vec{v} &= [a_1, a_2, a_3] \\
    quad\_layer(\vec{v}) &= [ a_1 \cdot a_1,\; a_1 \cdot a_2,\; a_1 \cdot a_3,\; \dots,\; a_3 \cdot a_3 ]
\end{aligned}
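
A minimal NumPy sketch; here both orderings of each pair (a_1 a_2 and a_2 a_1) are kept, which is one possible convention:

```python
import numpy as np

def quad_layer(v):
    """Return all pairwise products v_i * v_j, flattened:
    a length-n input becomes a length-n*n output."""
    return np.outer(v, v).ravel()

v = np.array([1.0, 2.0, 3.0])
print(quad_layer(v))  # [1, 2, 3, 2, 4, 6, 3, 6, 9]
```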

Product Unit


o_k =  \sum_{j=1}^{m} v_{k,j} \cdot \left( \prod_{i=1}^{n} x_{i}^{w_{j,i}}\right) + v_{k,0}
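
A minimal NumPy sketch of this unit, assuming strictly positive inputs so the products of powers can be computed in log space (all shapes and names are illustrative):

```python
import numpy as np

def product_unit(x, w, v):
    """x: (n,) positive inputs, w: (m, n) exponents w_{j,i},
    v: (K, m+1) output weights, with column 0 holding the bias v_{k,0}."""
    # prod_i x_i^{w_{j,i}} = exp(sum_i w_{j,i} * log x_i)  (assumes x > 0)
    p = np.exp(w @ np.log(x))          # shape (m,)
    return v[:, 1:] @ p + v[:, 0]      # o_k = sum_j v_{k,j} p_j + v_{k,0}

x = np.array([0.5, 2.0, 1.5])
w = np.random.randn(4, 3)
v = np.random.randn(2, 5)
print(product_unit(x, w, v))
```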

Sigma-Pi Unit

This layer computes a weighted sum of products of input terms, instead of the matrix multiplication of a linear layer.

Moreover, this unit is not necessarily fully connected.


o_k = g\left( \sum_{q \in \text{conjuncts}} w_{q} \prod_{i=1}^{N} z_{q,i} \right)
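
A minimal NumPy sketch, where the conjuncts (the index tuples), the weights, and the choice of g are made-up examples:

```python
import numpy as np

def sigma_pi_unit(z, conjuncts, w, g=np.tanh):
    """z: input vector, conjuncts: list of index tuples (one per term q),
    w: one weight per conjunct, g: any non-linear function."""
    s = sum(w_q * np.prod(z[list(q)]) for q, w_q in zip(conjuncts, w))
    return g(s)

z = np.array([0.5, -1.0, 2.0, 0.3])
conjuncts = [(0, 1), (1, 2, 3), (0,)]   # not fully connected
w = np.array([0.7, -0.2, 1.1])
print(sigma_pi_unit(z, conjuncts, w))
```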

Attention Modules

They give the model a way to decide which parts of the input are more important.

Softmax

We use this function to express the importance of each value relative to all the others.


\begin{aligned}
    \sigma(\vec{x})_{j} &= \frac{e^{x_{j}}}{\sum_{k} e^{x_{k}}} \;\; \forall j \in \{0, \dots, N\} \\
    \sigma(\vec{x})_{j} &\in [0, 1] \;\; \forall j \in \{0, \dots, N\}
\end{aligned}
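
A minimal NumPy sketch (subtracting the max is a standard numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # stable: shift by max before exponentiating
    return e / e.sum()

x = np.array([2.0, 1.0, 0.1])
p = softmax(x)
print(p, p.sum())   # each entry is in [0, 1] and the entries sum to 1
```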

Mixture of Experts

What happens if we have several models and we want to combine their outputs?

Basically, a gating function produces a set of weights over the experts' outputs before the final output layer. Both the experts and the gating function need to be trained.

Tip

Since we are talking about weights and importance, it is probably better to use an attention-style model (e.g. a softmax) for the gating function.
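
A minimal NumPy sketch of this gating idea; the linear experts, the softmax gate, and all sizes are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def mixture_of_experts(x, expert_weights, gate_weights):
    """expert_weights: one matrix per (linear) expert, illustrative choice;
    gate_weights: matrix producing one score per expert from the input."""
    gate = softmax(gate_weights @ x)                      # importance of each expert
    outputs = np.stack([W @ x for W in expert_weights])   # (n_experts, out_dim)
    return gate @ outputs                                 # weighted combination

x = np.random.randn(4)
experts = [np.random.randn(3, 4) for _ in range(2)]
gate_W = np.random.randn(2, 4)
print(mixture_of_experts(x, experts, gate_W))
```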

Parameter Transformation

This is basically when the weights are the output of a function.

Since the weights are controlled by some other parameters, we need to learn those parameters instead; the gradients reach them through the chain rule.
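
A minimal NumPy sketch with a made-up transformation H(u) = tanh(u), just to show that the gradient with respect to the underlying parameters follows from the chain rule:

```python
import numpy as np

# Hypothetical example: the weight vector w is not free, it is produced by a
# function H(u) of the underlying parameters u, so we learn u instead of w.
def H(u):
    return np.tanh(u)               # any differentiable transformation

def loss(u, x, y_target):
    w = H(u)                        # weights are the output of a function
    y = w @ x
    return 0.5 * (y - y_target) ** 2

# Chain rule: dL/du = dL/dw * dw/du
def grad_u(u, x, y_target):
    w = H(u)
    y = w @ x
    dL_dw = (y - y_target) * x
    dw_du = 1.0 - np.tanh(u) ** 2   # derivative of tanh, elementwise
    return dL_dw * dw_du

u = np.random.randn(3)
x = np.random.randn(3)
print(grad_u(u, x, 1.0))
```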

Weight Sharing

Here the same weights are copied (reused) across several basic components of the model. Since each shared weight then receives more than one gradient contribution, those contributions are summed.

Tip

This is used, for example, to detect the same motif at different positions of an input.
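
A minimal NumPy sketch of weight sharing along one dimension; the convolution-like scan and all sizes are illustrative assumptions:

```python
import numpy as np

# One small weight vector w is reused ("shared") at every position of the
# input, scanning for the same motif everywhere.
def shared_weight_layer(x, w):
    n, k = len(x), len(w)
    return np.array([w @ x[i:i + k] for i in range(n - k + 1)])

# Because w appears at every output position, its gradient is the SUM of the
# gradients coming from each position where it was used.
def grad_w(x, w, dL_dy):
    k = len(w)
    return sum(dL_dy[i] * x[i:i + k] for i in range(len(dL_dy)))

x = np.random.randn(8)
w = np.random.randn(3)
y = shared_weight_layer(x, w)
print(grad_w(x, w, np.ones_like(y)))
```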