Revised Notes

2025-11-14 11:11:32 +01:00
parent 82310e51f2
commit 434e4cdd0e
1 changed files with 122 additions and 6 deletions
--- a/Chapters/1-Basic-Architecture/INDEX.md
+++ b/Chapters/1-Basic-Architecture/INDEX.md
@@ -1,15 +1,127 @@
-# Index
+# Basic Architecture

-$g()$ is any ***Non-Linear Function***
+> [!NOTE]
+> Here $g(\vec{x})$ is any
+> [activation function](./../3-Activation-Functions/INDEX.md)

-## Basic Architecture
+## Multiplicative Modules

-### Multiplicative Modules
+These modules let us combine outputs from other networks to modify
+a network's behaviour.
+
+### Sigma-Pi Unit
+
+> [!NOTE]
+> This module takes its name from its sum ($\sum$, sigma) and multiplication
+> ($\prod$, pi) operations.
+
+This module multiplies the input by the output of another network:
+
+$$
+\begin{aligned}
+    W &= \vec{z} \times U &
+        \vec{z} \in \R^{1 \times b}, \,\, U \in \R^{b \times c \times d}\\
+    \vec{y} &= \vec{x} \times W &
+        \vec{x} \in \R^{1 \times c}, \,\, W \in \R^{c \times d}
+\end{aligned}
+$$
+
+This is equivalent to:
+
+$$
+\begin{aligned}
+    w_{i,j} &= \sum_{h = 1}^{b} z_h u_{h,i,j} \\
+    y_{j} &= \sum_{i = 1}^{c} x_i w_{i,j} = \sum_{h, i}x_i z_h u_{h,i,j}
+\end{aligned}
+$$
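+
+As a small worked example (the sizes $b = 2$, $c = 2$, $d = 1$ and all numbers
+are chosen here purely for illustration), take $\vec{z} = [1, 2]$,
+$\vec{x} = [3, 4]$ and a $U$ whose two $c \times d$ slices are $U_1$ and $U_2$:
+
+$$
+\begin{aligned}
+    U_1 &= \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
+    U_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \\
+    W &= z_1 U_1 + z_2 U_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \\
+    y_1 &= \vec{x} \times W = 3 \cdot 1 + 4 \cdot 2 = 11
+\end{aligned}
+$$
+
+The same value comes out of the double sum: the only non-zero terms are
+$x_1 z_1 u_{1,1,1} = 3$ and $x_2 z_2 u_{2,2,1} = 8$.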
+
+According to this paper[^stanford-sigma-pi] from Stanford University, `sigma-pi`
+units can be represented as follows:
+
+![stanford university sigma-pi](./pngs/stanford-sigma-pi.png)
+
+Assuming $a_b$ and $a_d$ are elements of $\vec{a}_1$, and $a_c$ and $a_e$ are
+elements of $\vec{a}_2$, this becomes:
+
+$$
+\hat{y}_i = \sum_{j} w_{i,j} \prod_{k \in \{1, 2\}} a_{j, k}
+$$
+
+In other words, you can mix outputs coming from other networks via
+element-wise products and then combine the results through weights as usual.
+
+### Mixture of Experts
+
+If you have different networks trained for the same objective, you can
+multiply their outputs by a weight vector coming from another controlling
+network.
+
+The controller network has the objective of giving each expert a score
+(the entries of $\vec{z}$) based on which is the most *"experienced"* in that
+context. The more *"experienced"* an expert, the higher its influence over
+the output.
+
+$$
+\begin{aligned}
+    \vec{w} &= \text{softmax}\left(
+        \vec{z}
+    \right)
+    \\
+    \hat{y} &= \sum_{j} \text{expert\_out}_j \cdot w_j
+\end{aligned}
+$$
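+
+For instance, assuming two experts with scalar outputs 2 and 6 and controller
+scores $\vec{z} = [0, \ln 3]$ (illustrative values, not from the original notes):
+
+$$
+\begin{aligned}
+    \vec{w} &= \text{softmax}([0, \ln 3])
+        = \left[\frac{1}{1 + 3}, \frac{3}{1 + 3}\right]
+        = [0.25, 0.75] \\
+    \hat{y} &= 2 \cdot 0.25 + 6 \cdot 0.75 = 5
+\end{aligned}
+$$
+
+The second expert gets the higher score, so the output is pulled towards its
+answer.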
+
+> [!NOTE]
+> While we used a [`softmax`](./../3-Activation-Functions/INDEX.md#softmax),
+> this can be replaced by a `softmin` or any other scoring function.
+
+### Switch Like
+
+> [!NOTE]
+> I call them switch-like because if we set one element $z_i$ of $\vec{z}$
+> to 1 and all the others to 0, the result is $\hat{y} = \vec{x}_i$.
+
+We can use another network to produce a signal that mixes the outputs of
+other networks through a matrix multiplication, with the $\vec{x}_i$ stacked
+as the rows of $X$:
+
+$$
+\begin{aligned}
+    X &= \text{concat}(\vec{x}_1, \dots, \vec{x}_n) \\
+    \hat{y} &= \vec{z} \times X = \sum_{i=1}^{n} z_i \cdot \vec{x}_i
+\end{aligned}
+$$
+
+Even though it's difficult to see here, this means that each $z_i$ is the
+mixing weight of the corresponding output vector $\vec{x}_i$.
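+
+A quick numeric check, assuming three illustrative outputs stacked as the rows
+of $X$ (the values are made up for this example):
+
+$$
+\begin{aligned}
+    X &= \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \\
+    \vec{z} = [0, 1, 0] &\implies \hat{y} = [3, 4] = \vec{x}_2 \\
+    \vec{z} = [0.5, 0.5, 0] &\implies \hat{y} = [2, 3]
+\end{aligned}
+$$
+
+A one-hot $\vec{z}$ selects a single output, while a soft $\vec{z}$ blends
+several of them.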
+
+### Parameter Transformation
+
+This is when we use the output of a **fixed function** as weights for our
+network:
+
+$$
+\begin{aligned}
+    W &= f(\vec{z}) \\
+    \hat{y} &= g(\vec{x}W)
+\end{aligned}
+$$
+
+#### Weight Sharing
+
+This is a special case of parameter transformation, where:
+
+$$
+f(\vec{z}) = \begin{bmatrix}
+    z_1, & z_1, & \dots, & z_n, & z_n
+\end{bmatrix}
+$$
+
+or similar, replicating elements of $\vec{z}$ across its output.
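+
+As an illustrative case (the values and the scalar input are assumptions made
+for this example), replicating each element of $[2, 5]$ twice and using the
+result as the weights for an input $[3]$ gives:
+
+$$
+\begin{aligned}
+    W &= f([2, 5]) = \begin{bmatrix} 2 & 2 & 5 & 5 \end{bmatrix} \\
+    \hat{y} &= g([3] \times W)
+        = g\left(\begin{bmatrix} 6 & 6 & 15 & 15 \end{bmatrix}\right)
+\end{aligned}
+$$
+
+Only two distinct values determine the four entries of $W$.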
+
+<!--

 With these modules we can modify our ***traditional*** ways of ***neural networks***
 and implement ***switch-like*** functions

-#### Professor's one
+### Professor's one

 Basically here we want a ***way to modify `weights` with `inputs`***.

@@ -97,10 +209,14 @@ Since we have ***more than one value*** for our `original weights`, then we need
 > [!TIP]
 >
 > This is used to find ***motifs*** on an `input`
+-->

 <!-- Footnotes -->

 [^simga-pi]: [University of Pretoria | sigma-pi | pg. 2](https://repository.up.ac.za/bitstream/handle/2263/29715/03chapter3.pdf?sequence=4#:~:text=A%20pi%2Dsigma%20network%20\(PSN,of%20sums%20of%20input%20components.)
-[^simga-pi-2]:[]
+
 [^product-unit]: doi: 10.13053/CyS-20-2-2218
+
 [^mixture-of-experts]: [Wikipedia | 1st April 2025](https://en.wikipedia.org/wiki/Mixture_of_experts)
+
+[^stanford-sigma-pi]: [D. E. Rumelhart, G. E. Hinton, J. L. McClelland | A General Framework for Parallel Distributed Processing | Ch. 2 pg. 73](https://stanford.edu/~jlmcc/papers/PDP/Chapter2.pdf)