

g() is any Non-Linear Function

Basic Architecture

Multiplicative Modules

With these modules we can modify the traditional building blocks of neural networks and implement switch-like behaviour.

Professor's formulation

Basically, we want a way to modify the weights using the inputs themselves.

Here \vec{z} and \vec{x} are both inputs.


\begin{aligned}
    y_{i} &= \sum_{j} w_{i,j} x_{j} \\
    w_{i,j} &= \sum_{k} u_{i,j,k} z_{k} \\
    \Rightarrow y_{i} &= \sum_{j,k} u_{i,j,k} z_{k} x_{j}
\end{aligned}

As we can see, z_{k} modulates, through u, the contribution of x_{j}.
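
A minimal NumPy sketch of this module (all sizes and names are illustrative assumptions, not the professor's notation):

```python
import numpy as np

# Hypothetical sizes: x has 4 components, z has 3, the output y has 2.
n_out, n_in, n_z = 2, 4, 3

u = np.random.randn(n_out, n_in, n_z)   # third-order weight tensor u_{i,j,k}
x = np.random.randn(n_in)               # "data" input
z = np.random.randn(n_z)                # "modulating" input

# First build the input-dependent weight matrix w_{i,j} = sum_k u_{i,j,k} z_k ...
w = np.einsum('ijk,k->ij', u, z)
# ... then apply it to x: y_i = sum_j w_{i,j} x_j
y = w @ x

# Equivalent one-shot form: y_i = sum_{j,k} u_{i,j,k} z_k x_j
y_direct = np.einsum('ijk,k,j->i', u, z, x)
assert np.allclose(y, y_direct)
```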

Quadratic Layer

This layer expands the input by taking all pairwise products of its components (a quadratic expansion).


\begin{aligned}
    \vec{v} &= [a_1, a_2, a_3] \\
    quad\_layer(\vec{v}) &= [ a_1 \cdot a_1,\; a_1 \cdot a_2,\; a_1 \cdot a_3,\; \dots,\; a_3 \cdot a_3 ]
\end{aligned}
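
A minimal NumPy sketch; here both orderings of each pair (a_1 a_2 and a_2 a_1) are kept, which is one possible convention:

```python
import numpy as np

def quad_layer(v):
    """Return all pairwise products v_i * v_j, flattened:
    a length-n input becomes a length-n*n output."""
    return np.outer(v, v).ravel()

v = np.array([1.0, 2.0, 3.0])
print(quad_layer(v))  # [1, 2, 3, 2, 4, 6, 3, 6, 9]
```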

Product Unit


o_k =  \sum_{j=1}^{m} v_{k,j} \cdot \left( \prod_{i=1}^{n} x_{i}^{w_{j,i}}\right) + v_{k,0}
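
A minimal NumPy sketch of this unit, assuming strictly positive inputs so the products of powers can be computed in log space (all shapes and names are illustrative):

```python
import numpy as np

def product_unit(x, w, v):
    """x: (n,) positive inputs, w: (m, n) exponents w_{j,i},
    v: (K, m+1) output weights, with column 0 holding the bias v_{k,0}."""
    # prod_i x_i^{w_{j,i}} = exp(sum_i w_{j,i} * log x_i)  (assumes x > 0)
    p = np.exp(w @ np.log(x))          # shape (m,)
    return v[:, 1:] @ p + v[:, 0]      # o_k = sum_j v_{k,j} p_j + v_{k,0}

x = np.array([0.5, 2.0, 1.5])
w = np.random.randn(4, 3)
v = np.random.randn(2, 5)
print(product_unit(x, w, v))
```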

Sigma-Pi Unit

This layer computes a weighted sum of products of input terms, instead of the matrix multiplication of a linear layer.

Moreover, this unit is not necessarily fully connected.


o_k = g\left( \sum_{q \in \text{conjuncts}} w_{q} \prod_{i=1}^{N} z_{q,i} \right)
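
A minimal NumPy sketch, where the conjuncts (the index tuples), the weights, and the choice of g are made-up examples:

```python
import numpy as np

def sigma_pi_unit(z, conjuncts, w, g=np.tanh):
    """z: input vector, conjuncts: list of index tuples (one per term q),
    w: one weight per conjunct, g: any non-linear function."""
    s = sum(w_q * np.prod(z[list(q)]) for q, w_q in zip(conjuncts, w))
    return g(s)

z = np.array([0.5, -1.0, 2.0, 0.3])
conjuncts = [(0, 1), (1, 2, 3), (0,)]   # not fully connected
w = np.array([0.7, -0.2, 1.1])
print(sigma_pi_unit(z, conjuncts, w))
```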

Attention Modules

They give the model a way to decide which parts of the input are more important.

Softmax

We use this function to express the importance of each value relative to all the others.


\begin{aligned}
    \sigma(\vec{x})_{j} &= \frac{e^{x_{j}}}{\sum_{k} e^{x_{k}}} \;\; \forall j \in \{0, \dots, N\} \\
    \sigma(\vec{x})_{j} &\in [0, 1] \;\; \forall j \in \{0, \dots, N\}
\end{aligned}
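
A minimal NumPy sketch (subtracting the max is a standard numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # stable: shift by max before exponentiating
    return e / e.sum()

x = np.array([2.0, 1.0, 0.1])
p = softmax(x)
print(p, p.sum())   # each entry is in [0, 1] and the entries sum to 1
```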

Mixture of Experts

What happens if we have several models and we want to combine their outputs?

Basically, a gating function produces a set of weights over the experts' outputs before the final output layer. Both the experts and the gating function need to be trained.

Tip

Since we are talking about weights and importance, it is probably better to use an attention-style model (e.g. a softmax) for the gating function.
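
A minimal NumPy sketch of this gating idea; the linear experts, the softmax gate, and all sizes are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def mixture_of_experts(x, expert_weights, gate_weights):
    """expert_weights: one matrix per (linear) expert, illustrative choice;
    gate_weights: matrix producing one score per expert from the input."""
    gate = softmax(gate_weights @ x)                      # importance of each expert
    outputs = np.stack([W @ x for W in expert_weights])   # (n_experts, out_dim)
    return gate @ outputs                                 # weighted combination

x = np.random.randn(4)
experts = [np.random.randn(3, 4) for _ in range(2)]
gate_W = np.random.randn(2, 4)
print(mixture_of_experts(x, experts, gate_W))
```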

Parameter Transformation

This is basically when the weights are the output of a function.

Since the weights are controlled by some other parameters, we need to learn those parameters instead; the gradients reach them through the chain rule.
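
A minimal NumPy sketch with a made-up transformation H(u) = tanh(u), just to show that the gradient with respect to the underlying parameters follows from the chain rule:

```python
import numpy as np

# Hypothetical example: the weight vector w is not free, it is produced by a
# function H(u) of the underlying parameters u, so we learn u instead of w.
def H(u):
    return np.tanh(u)               # any differentiable transformation

def loss(u, x, y_target):
    w = H(u)                        # weights are the output of a function
    y = w @ x
    return 0.5 * (y - y_target) ** 2

# Chain rule: dL/du = dL/dw * dw/du
def grad_u(u, x, y_target):
    w = H(u)
    y = w @ x
    dL_dw = (y - y_target) * x
    dw_du = 1.0 - np.tanh(u) ** 2   # derivative of tanh, elementwise
    return dL_dw * dw_du

u = np.random.randn(3)
x = np.random.randn(3)
print(grad_u(u, x, 1.0))
```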

Weight Sharing

Here the same weights are copied (reused) across several basic components of the model. Since each shared weight then receives more than one gradient contribution, those contributions are summed.

Tip

This is used, for example, to detect the same motif at different positions of an input.
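
A minimal NumPy sketch of weight sharing along one dimension; the convolution-like scan and all sizes are illustrative assumptions:

```python
import numpy as np

# One small weight vector w is reused ("shared") at every position of the
# input, scanning for the same motif everywhere.
def shared_weight_layer(x, w):
    n, k = len(x), len(w)
    return np.array([w @ x[i:i + k] for i in range(n - k + 1)])

# Because w appears at every output position, its gradient is the SUM of the
# gradients coming from each position where it was used.
def grad_w(x, w, dL_dy):
    k = len(w)
    return sum(dL_dy[i] * x[i:i + k] for i in range(len(dL_dy)))

x = np.random.randn(8)
w = np.random.randn(3)
y = shared_weight_layer(x, w)
print(grad_w(x, w, np.ones_like(y)))
```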