Index
Throughout these notes, g() denotes any non-linear function.
Basic Architecture
Multiplicative Modules
These modules extend traditional neural networks with multiplicative interactions, which lets us implement switch-like functions.
Professor's one
Here we want a way to modify the weights using the inputs.
Here \vec{z} and \vec{x} are both inputs
\begin{aligned}
y_{i} &= \sum_{j} w_{i,j} x_{j} \\
w_{i,j} &= \sum_{k} u_{i,j,k} z_{k} \rightarrow \\
\rightarrow y_{i} &= \sum_{j,k} u_{i,j,k} z_{k} x_{j}
\end{aligned}
As we can see, z_{k} acts through u to modulate the weight applied to x_{j}.
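The two steps above (generate the weights from z, then apply them to x) can be sketched in plain Python. This is a minimal illustration, not the lecture's implementation: the function name `multiplicative_layer` and the toy tensor `u` are assumptions chosen to show the switch-like behaviour.

```python
def multiplicative_layer(u, z, x):
    """Compute y_i = sum_{j,k} u[i][j][k] * z[k] * x[j]."""
    # Step 1: z generates the weights, w[i][j] = sum_k u[i][j][k] * z[k]
    w = [[sum(u_ijk * z_k for u_ijk, z_k in zip(u_ij, z)) for u_ij in u_i]
         for u_i in u]
    # Step 2: ordinary linear layer using the generated weights
    return [sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) for w_i in w]


# Hypothetical toy tensor: 1 output, 2 components in x, 2 components in z.
u = [[[1.0, 0.0],    # weight on x[0] is switched on by z[0]
      [0.0, 1.0]]]   # weight on x[1] is switched on by z[1]
x = [3.0, 5.0]

y_first = multiplicative_layer(u, [1.0, 0.0], x)   # z routes x[0] to the output
y_second = multiplicative_layer(u, [0.0, 1.0], x)  # z routes x[1] to the output
```

With this particular `u`, the second input z behaves like a selector over the components of x, which is the switch-like function mentioned earlier.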
Quadratic Layer
This layer expands the data by computing the pairwise products of the input components.
\begin{aligned}
\vec{v} &= [a_1, a_2, a_3] \\
quad\_layer(\vec{v}) &= [ a_1 \cdot a_1, a_1 \cdot a_2, a_1 \cdot a_3, ... , a_3 \cdot a_3 ]
\end{aligned}
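A possible sketch of this expansion follows. The listing above does not say whether symmetric products such as a_2 \cdot a_1 are repeated, so this version assumes each unordered pair appears once; the name `quad_layer` mirrors the formula.

```python
from itertools import combinations_with_replacement


def quad_layer(v):
    """Return the products a_i * a_j for every unordered pair with i <= j."""
    # Assumption: symmetric duplicates (a_j * a_i) are not repeated.
    return [a * b for a, b in combinations_with_replacement(v, 2)]


expanded = quad_layer([1.0, 2.0, 3.0])
```

For an input of size n this yields n(n+1)/2 features, so the layer grows quadratically with the input dimension.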
Product Unit1
o_k = \sum_{j=1}^{m} v_{k,j} \cdot \left( \prod_{i=1}^{n} x_{i}^{w_{j,i}}\right) + v_{k,0}
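The formula can be read as a hidden layer of products followed by a linear output layer. The sketch below follows that reading; the argument names (`w`, `v`, `bias`) and the toy values are assumptions, and inputs are assumed positive so that non-integer exponents stay real.

```python
import math


def product_unit(x, w, v, bias):
    """o_k = sum_j v[k][j] * prod_i x[i] ** w[j][i] + bias[k]."""
    # Hidden product terms: the weights w act as exponents on the inputs.
    hidden = [math.prod(x_i ** w_ji for x_i, w_ji in zip(x, w_j)) for w_j in w]
    # Linear combination of the product terms; bias[k] plays the role of v_{k,0}.
    return [sum(v_kj * h_j for v_kj, h_j in zip(v_k, hidden)) + b_k
            for v_k, b_k in zip(v, bias)]


x = [2.0, 3.0]
w = [[1.0, 1.0],   # first product term: x_1 * x_2
     [2.0, 0.0]]   # second product term: x_1 squared
v = [[1.0, 0.5]]
bias = [1.0]
o = product_unit(x, w, v, bias)
```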
Sigma-Pi Unit23
This layer computes a weighted sum of products of input terms,
instead of the matrix multiplication of a linear layer.
Moreover, it is not necessarily fully connected.
o_k = g\left( \sum_{q \in \text{conjunct}} w_{q} \prod_{i=1}^{N} z_{q,i} \right)
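As a rough sketch, each conjunct can be stored as a tuple of input indices, which also makes the "not necessarily fully connected" property explicit. The representation, the function name `sigma_pi_unit`, and the choice of tanh for g are all assumptions made for illustration.

```python
import math


def sigma_pi_unit(z, conjuncts, weights, g=math.tanh):
    """o = g( sum_q w_q * prod_{i in q} z[i] )."""
    # Each conjunct q lists the input indices multiplied together (the "pi" part);
    # the weighted products are then summed (the "sigma" part).
    total = sum(w_q * math.prod(z[i] for i in q)
                for q, w_q in zip(conjuncts, weights))
    return g(total)


z = [1.0, 2.0, 3.0]
conjuncts = [(0, 1), (2,)]   # hypothetical sparse connectivity: z0*z1 and z2
weights = [0.5, -1.0]

pre_activation = sigma_pi_unit(z, conjuncts, weights, g=lambda s: s)
out = sigma_pi_unit(z, conjuncts, weights)
```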
Attention Modules
They give our model a way to decide which parts of its input matter most.
Softmax
We use this function to output the importance of a certain value over all the others.
\begin{aligned}
\sigma(\vec{x})_{j} &= \frac{e^{x_{j}}}{\sum_{k=0}^{N} e^{x_{k}}} \;\; \forall j \in \{0, ..., N\} \\
\sigma(\vec{x})_{j} &\in [0, 1] \;\; \forall j \in \{0, ..., N\}
\end{aligned}
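A small sketch of the formula follows. Subtracting the maximum before exponentiating is a common numerical-stability trick that is not part of the definition above; it leaves the result unchanged because the common factor cancels in the ratio.

```python
import math


def softmax(x):
    """Return sigma(x)_j = exp(x_j) / sum_k exp(x_k) for every j."""
    m = max(x)  # shift by the maximum to avoid overflow; the output is unchanged
    exps = [math.exp(x_j - m) for x_j in x]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
stable = softmax([1000.0, 1000.0])  # would overflow without the shift
```

The outputs are non-negative and sum to one, so they can be read as the relative importance of each entry.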
Mixture of Experts4
What happens if we have several models and we want to combine their outputs?
We place a set of weights, produced by a gating function, over the experts' outputs before the output layer.
Both the experts and the gating function need to be trained.
Tip
Since we are talking about weights and importance, an attention model is probably the better choice here.
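The forward pass of such a mixture can be sketched as below, assuming a linear gating function followed by a softmax. The experts here are hypothetical toy functions and nothing is trained; in practice both the experts and `gate_weights` would be learned.

```python
import math


def softmax(x):
    m = max(x)  # shift for numerical stability
    exps = [math.exp(x_j - m) for x_j in x]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x, experts, gate_weights):
    """Combine expert outputs with input-dependent gating weights."""
    # One linear gating score per expert, normalised with softmax.
    scores = [sum(g_i * x_i for g_i, x_i in zip(g, x)) for g in gate_weights]
    gates = softmax(scores)
    outputs = [expert(x) for expert in experts]
    # Final output: gate-weighted sum of the expert outputs.
    return sum(p * o for p, o in zip(gates, outputs)), gates


experts = [lambda x: sum(x), lambda x: max(x)]   # hypothetical toy experts
gate_weights = [[0.0, 0.0], [0.0, 0.0]]          # untrained gate: uniform weights
y, gates = mixture_of_experts([1.0, 3.0], experts, gate_weights)
```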
Parameter Transformation
Here the weights are the output of a function.
Since they are controlled by some other parameters, we need to learn
those parameters instead.
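As an illustration, suppose the weights are produced by a hypothetical transformation w_i = exp(u_i), which forces them to be positive. The parameters u are then learned through the chain rule. The transformation, the squared-error loss, and the learning rate are all assumptions made for this sketch.

```python
import math


def transform(u):
    # Hypothetical transformation: w_i = exp(u_i) keeps every weight positive.
    return [math.exp(u_i) for u_i in u]


def forward(u, x):
    w = transform(u)
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def grad_u(u, x, target):
    # Loss L = 0.5 * (y - target)^2.
    # Chain rule: dL/du_i = dL/dy * dy/dw_i * dw_i/du_i = (y - target) * x_i * exp(u_i)
    err = forward(u, x) - target
    return [err * x_i * math.exp(u_i) for u_i, x_i in zip(u, x)]


u = [0.0, 0.0]
x = [1.0, 2.0]
target = 5.0
for _ in range(200):  # plain gradient descent on u, never on w directly
    g = grad_u(u, x, target)
    u = [u_i - 0.05 * g_i for u_i, g_i in zip(u, g)]
y = forward(u, x)
```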
Weights Sharing
Here we copy the same weights across several basic components.
Since each copy produces its own gradient for the original weights, we need to sum those contributions.
Tip
This is used to find motifs in an input.
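A minimal sketch of weight sharing is a single weight vector applied to every window of a sequence, as in a 1-D convolution. The code assumes the summed quantities are the per-copy gradients of the shared weights; the function names and the toy motif are illustrative only.

```python
def shared_forward(x, w):
    """Apply the same weights w to every window of x."""
    k = len(w)
    return [sum(w_j * x[p + j] for j, w_j in enumerate(w))
            for p in range(len(x) - k + 1)]


def shared_weight_grad(x, upstream, k):
    """Gradient of the loss w.r.t. the shared w, given upstream[p] = dL/dy[p]."""
    # Every window position p uses a copy of w; the copies' gradients are summed.
    return [sum(upstream[p] * x[p + j] for p in range(len(upstream)))
            for j in range(k)]


x = [0.0, 1.0, 2.0, 1.0, 0.0]
w = [1.0, 2.0, 1.0]   # hypothetical detector for a "bump" motif
y = shared_forward(x, w)
grad = shared_weight_grad(x, [1.0, 1.0, 1.0], len(w))
```

The response `y` peaks at the window where the input matches the motif, wherever it occurs in the sequence.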