# Graph ML
## Graph Introduction
- **Nodes**: Pieces of Information
- **Edges**: Relationship between nodes
- **Mutual**
- **One-Sided**
- **Directionality**
- **Directed**: We care about the order of connections
- **Unidirectional**
- **Bidirectional**
- **Undirected**: We don't care about order of connections
Now, we can have attributes over
- **nodes**
- **edges**
- **master nodes** (a collection of nodes and edges)
For example, an image may be represented as a graph where each non-border pixel is a vertex connected to its 8 neighbouring pixels.
The information at each vertex is a 3- (or 4-) dimensional vector (think of RGB and RGBA)
### Adjacency Matrix
Take a picture and build a matrix of dimension $\{0, 1\}^{(h \cdot w) \times (h \cdot w)}$: we put a 1 if two
nodes are connected (share an edge), and a 0 if they do not.
> [!NOTE]
> For a $300 \times 250$ image our matrix would be $\{0, 1\}^{(250 \cdot 300) \times (250 \cdot 300)}$
The way we put a 1 or a 0 follows these rules:
- **Row element** has a connection **towards** the **Column element**
- **Column element** has a connection **coming from** the **Row element**
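For instance, here is a minimal sketch of this convention for a hypothetical 4-node directed graph (the nodes and edges are purely illustrative):
```python
import numpy as np

# Hypothetical directed graph on 4 nodes: (u, v) means an edge from u towards v
edge_list = [(0, 1), (0, 2), (1, 3), (2, 3)]

n = 4
A = np.zeros((n, n), dtype=int)
for u, v in edge_list:
    A[u, v] = 1  # row u has a connection towards column v

print(A)
# [[0 1 1 0]
#  [0 0 0 1]
#  [0 0 0 1]
#  [0 0 0 0]]
```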
### Tasks
#### Graph-Level
We want to predict a graph property
#### Node-Level
We want to predict a node property, such as classification
#### Edge-Level
We want to predict relationships between nodes such as if they share an edge, or the value of the edge they share.
For this task we may start from a fully connected graph and prune edges as predictions progress, ending up with a
sparse graph
### Downsides of Graphs
- They are not consistent in their structure and sometimes representing something as a graph is difficult
- If we don't care about the order of nodes, we need a representation that respects this **node-order equivariance**
- Graphs may be too large
## Representing Graphs
### Adjacency List
We store info about:
- **Nodes**: a list of values; index $k$ holds the value of node $k$
- **Edges**: a list of values; index $k$ holds the value of edge $k$
- **Adjacency list**: a list of tuples of node indices; the $k$-th tuple holds the nodes joined by the $k$-th edge
- **Graph**: the value of the whole graph
```python
from typing import Any

# Node values: one entry per node
nodes: list[Any] = [
    "fork", "spaghetti", "knife", "spoon", "broth"
]
# Edge values: one entry per edge in adj_list below
edges: list[Any] = [
    "used for eating", "utensil", "food",
    "utensil", "utensil", "used for eating"
]
# adj_list[k] holds the indices of the two nodes joined by the k-th edge
adj_list: list[tuple[int, int]] = [
    (0, 1), (0, 2), (1, 4),
    (0, 3), (2, 3), (3, 4)
]
# Value of the whole graph
graph: Any = "dinner table"
```
If we find some parts of the graph that are disconnected, we can just avoid storing and computing those parts
## Graph Neural Networks (GNNs)
In its simplest form we take a **graph-in**, **graph-out** approach with separate MLPs for
vertices, edges and master nodes, applied **one at a time** over each element
$$
\begin{aligned}
V_{i + 1} &= MLP_{V_{i}}(V_{i}) \\
E_{i + 1} &= MLP_{E_{i}}(E_{i}) \\
U_{i + 1} &= MLP_{U_{i}}(U_{i}) \\
\end{aligned}
$$
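A minimal sketch of this graph-in, graph-out update, assuming each MLP is a single linear layer followed by a ReLU and that node, edge and master-node features sit in NumPy arrays (all shapes and names below are illustrative):
```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(dim):
    """One-layer 'MLP': a weight matrix plus ReLU, applied row-wise."""
    W = rng.normal(size=(dim, dim))
    return lambda X: np.maximum(X @ W, 0.0)

# One row per node / edge, and a single row for the master (global) node
V = rng.normal(size=(5, 8))   # 5 nodes, 8 features each
E = rng.normal(size=(6, 4))   # 6 edges, 4 features each
U = rng.normal(size=(1, 3))   # master node features

mlp_V, mlp_E, mlp_U = make_mlp(8), make_mlp(4), make_mlp(3)

# Each MLP only sees its own elements; the connectivity of the graph is untouched
V_next, E_next, U_next = mlp_V(V), mlp_E(E), mlp_U(U)
```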
### Pooling
> [!CAUTION]
> This step comes after the embedding phase described above
This step can be used to take information from elements other than the ones we are updating
(for example, taking info from edges while computing over vertices).
With this approach we usually gather the embeddings of the edges incident to a vertex, concatenate them into a matrix, and
aggregate them by summing.
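As an illustration, here is a sketch of pooling edge information into a vertex, reusing the adjacency list from the earlier example (the random edge embeddings are placeholders):
```python
import numpy as np

adj_list = [(0, 1), (0, 2), (1, 4), (0, 3), (2, 3), (3, 4)]
E = np.random.default_rng(1).normal(size=(len(adj_list), 4))  # edge embeddings

def pool_edges_into_node(v: int) -> np.ndarray:
    # Gather the embeddings of every edge incident to v, then aggregate by summing
    incident = [E[k] for k, (a, b) in enumerate(adj_list) if v in (a, b)]
    return np.sum(incident, axis=0)

pooled = np.stack([pool_edges_into_node(v) for v in range(5)])  # one row per node
```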
### Message Passing
Take all node embeddings in the neighbourhood and apply similar steps to the pooling function.
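A sketch of the node-to-node case, assuming node embeddings are stored row-wise in `X`: summing the neighbour embeddings of every node at once is just a product with the adjacency matrix.
```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])                     # undirected 3-node graph
X = np.arange(6, dtype=float).reshape(3, 2)   # node embeddings, one row per node

messages = A @ X   # row v = sum of the neighbour embeddings of node v
```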
### Special Layers
## Polynomial Filters
Each polynomial filter is node-order equivariant
### Graph Laplacian
Fix an order over the nodes of the graph and let $A$ be the adjacency matrix. The degree matrix $D$ is the diagonal matrix with
$$
D_{v,v} = \sum_{u} A_{v,u}
$$
In other words, $D_{v,v}$ is the number of nodes connected to node $v$ (its degree)
The **graph Laplacian** of the graph will be
$$
L = D - A
$$
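A quick sketch of computing $D$ and $L$ for a small undirected graph with NumPy:
```python
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])        # adjacency matrix of a 4-cycle

D = np.diag(A.sum(axis=1))          # D[v, v] = degree of node v
L = D - A                           # graph Laplacian
```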
### Polynomials of Laplacian
These polynomials, which have the same dimensions as $L$, can be thought of as **filters**, like in
[CNNs](./../7-Convolutional-Networks/INDEX.md#convolutional-networks)
$$
p_{\vec{w}}(L) = w_{0}I_{n} + w_{1}L^{1} + \dots + w_{d}L^{d} = \sum_{i=0}^{d} w_{i}L^{i}
$$
We can then get ***filtered node features*** by multiplying the polynomial, evaluated at $L$, with the node features
$$
\begin{aligned}
\vec{x}' = p_{\vec{w}}(L) \vec{x}
\end{aligned}
$$
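A sketch applying a degree-2 polynomial filter to scalar node features, reusing the 4-cycle Laplacian from above (the weights are arbitrary, purely for illustration):
```python
import numpy as np

L = np.array([[ 2, -1,  0, -1],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [-1,  0, -1,  2]], dtype=float)   # Laplacian of the 4-cycle
x = np.array([1.0, 0.0, 0.0, 0.0])              # one scalar feature per node

w = [0.5, -0.2, 0.1]                             # w_0, w_1, w_2 (illustrative)
p_L = sum(w_i * np.linalg.matrix_power(L, i) for i, w_i in enumerate(w))

x_filtered = p_L @ x                             # filtered node features
```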
> [!NOTE]
> To see how this extracts new features for a single vertex, suppose only $w_{1} = 1$ is non-zero, so that $p_{\vec{w}}(L) = L$.
>
> Observe that we only need row $v$ of $L$:
>
> $$
> \begin{aligned}
> \vec{x}'_{v} &= (L\vec{x})_{v} \\
> &= \sum_{u \in G} L_{v,u} \vec{x}_{u} \\
> &= \sum_{u \in G} (D_{v,u} - A_{v,u}) \vec{x}_{u} \\
> &= \sum_{u \in G} \left( D_{v,u} \vec{x}_{u} - A_{v,u} \vec{x}_{u} \right) \\
> &= D_{v, v} \vec{x}_{v} - \sum_{u \in \mathcal{N}(v)} \vec{x}_{u}
> \end{aligned}
> $$
>
> Where the last step holds because $D$ is a diagonal matrix, and in the summation we only consider the neighbours
> of $v$
>
> It can be demonstrated that in any graph
>
> $$
> dist_{G}(v, u) > i \rightarrow L_{v, u}^{i} = 0
> $$
>
> More in general it holds
>
> $$
> \begin{aligned}
> \vec{x}'_{v} = (p_{\vec{w}}(L)\vec{x})_{v} &= (p_{\vec{w}}(L))_{v} \vec{x} \\
> &= \sum_{i = 0}^{d} w_{i}L_{v}^{i} \vec{x} \\
> &= \sum_{i = 0}^{d} w_{i} \sum_{u \in G} L_{v,u}^{i}\vec{x}_{u} \\
> &= \sum_{i = 0}^{d} w_{i} \sum_{\substack{u \in G \\ dist_{G}(v, u) \leq i}} L_{v,u}^{i}\vec{x}_{u} \\
> \end{aligned}
> $$
>
> So this shows that the degree of the polynomial decides the maximum number of hops
> included during the filtering stage, as if it were defining a [kernel](./../7-Convolutional-Networks/INDEX.md#filters)
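A small numerical check of this locality property on the 4-cycle above: nodes 0 and 2 are two hops apart, so $L_{0,2} = 0$ while $L^{2}_{0,2}$ need not be.
```python
import numpy as np

L = np.array([[ 2, -1,  0, -1],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [-1,  0, -1,  2]], dtype=float)

print(L[0, 2])                              # 0.0 -> dist(0, 2) = 2 > 1
print(np.linalg.matrix_power(L, 2)[0, 2])   # 2.0 -> non-zero once i >= dist(0, 2)
```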
### ChebNet
The polynomial in ChebNet becomes:
$$
\begin{aligned}
p_{\vec{w}}(L) &= \sum_{i = 1}^{d} w_{i} T_{i}(\tilde{L}) \\
T_{i}(\cos\theta) &= \cos(i\theta) \\
\tilde{L} &= \frac{2L}{\lambda_{\max}(L)} - I_{n}
\end{aligned}
$$
- $T_{i}$ is the $i$-th Chebyshev polynomial of the first kind
- $\tilde{L}$ is a rescaled version of $L$: since $L$ is positive semi-definite (it has no negative eigenvalues),
dividing by its largest eigenvalue and subtracting $I_{n}$ keeps the eigenvalues in the range $[-1, 1]$
These polynomials are more stable, as they do not explode with higher powers
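A sketch of evaluating this filter via the standard Chebyshev recurrence $T_{0} = I$, $T_{1} = \tilde{L}$, $T_{i} = 2\tilde{L}T_{i-1} - T_{i-2}$, again on the 4-cycle Laplacian with illustrative weights:
```python
import numpy as np

L = np.array([[ 2, -1,  0, -1],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [-1,  0, -1,  2]], dtype=float)

lam_max = np.linalg.eigvalsh(L).max()
L_tilde = 2 * L / lam_max - np.eye(4)   # rescaled Laplacian, eigenvalues in [-1, 1]

w = [0.3, -0.1, 0.05]                    # w_1, w_2, w_3 (illustrative)
T_prev, T_curr = np.eye(4), L_tilde      # T_0 and T_1
p_L = np.zeros_like(L)
for w_i in w:
    p_L += w_i * T_curr                  # accumulate w_i * T_i, starting at i = 1
    T_prev, T_curr = T_curr, 2 * L_tilde @ T_curr - T_prev  # Chebyshev recurrence
```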
### Embedding Computation
## Other methods
The following methods all compute new node embeddings from:
- learnable parameters
- the embedding of node $v$
- the embeddings of the neighbours of $v$
### Graph Convolutional Networks
$$
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
\underbrace{\textcolor{skyblue}{W^{(k)}} \cdot
\frac{
\sum_{u \in \mathcal{N}(v)} \textcolor{violet}{h_{u}^{(k-1)}}
}{
|\mathcal{N}(v)|
}}_{\text{mean of previous neighbour embeddings}} + \underbrace{\textcolor{skyblue}{B^{(k)}} \cdot
\textcolor{orange}{h_{v}^{(k - 1)}}}_{\text{previous embeddings}}
\right) \forall v \in V
$$
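A minimal NumPy sketch of one such update, assuming $f$ is a ReLU and using small random matrices for $W$ and $B$ (embeddings are stored row-wise, so the weight matrices multiply on the right; everything here is illustrative):
```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # undirected 3-node graph
H = rng.normal(size=(3, 4))              # previous embeddings h^(k-1), one row per node

W = rng.normal(size=(4, 4))              # weights applied to the neighbour mean
B = rng.normal(size=(4, 4))              # weights applied to the node's own embedding

deg = A.sum(axis=1, keepdims=True)       # |N(v)| for every node
neighbour_mean = (A @ H) / deg           # mean of previous neighbour embeddings

H_next = np.maximum(neighbour_mean @ W + H @ B, 0.0)   # f = ReLU
```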
### Graph Attention Networks
$$
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
\textcolor{skyblue}{W^{(k)}} \cdot \left[
\underbrace{
\sum_{u \in \mathcal{N}(v)} \alpha^{(k-1)}_{v,u}
\textcolor{violet}{h_{u}^{(k-1)}}
}_{\text{weighted mean of previous neighbour embeddings}} +
\underbrace{\alpha^{(k-1)}_{v,v}
\textcolor{orange}{h_{v}^{(k-1)}}}_{\text{previous embeddings}}
\right] \right) \forall v \in V
$$
where
$$
\alpha^{(k)}_{v,u} = \frac{
\textcolor{skyblue}{A^{(k)}}(
\textcolor{orange}{h_{v}^{(k)}},
\textcolor{violet}{h_{u}^{(k)}}
)
}{
\sum_{w \in \mathcal{N}(v)} \textcolor{skyblue}{A^{(k)}}(
\textcolor{orange}{h_{v}^{(k)}},
\textcolor{violet}{h_{w}^{(k)}}
)
} \forall (v, u) \in E
$$
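A sketch of the attention coefficients, assuming $A^{(k)}$ is a simple scoring function (here, the exponential of a dot product between the two embeddings, which makes the normalisation below a softmax over the neighbourhood); the graph and embeddings are illustrative:
```python
import numpy as np

rng = np.random.default_rng(0)

neighbours = {0: [1, 2], 1: [0], 2: [0]}   # neighbourhoods of a 3-node graph
H = rng.normal(size=(3, 4))                # current embeddings h^(k)

def score(h_v, h_u):
    # Illustrative attention function A(h_v, h_u)
    return np.exp(h_v @ h_u)

def alpha(v, u):
    # Normalise the score over all neighbours of v
    return score(H[v], H[u]) / sum(score(H[v], H[w]) for w in neighbours[v])

print(alpha(0, 1) + alpha(0, 2))           # the weights over N(0) sum to 1
```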
### Graph Sample and Aggregate (GraphSAGE)
### Graph Isomorphism Network (GIN)
$$
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}}
\left(
\sum_{u \in \mathcal{N}(v)}
\textcolor{violet}{h_{u}^{(k - 1)}} +
(
1 +
\textcolor{skyblue}{\epsilon^{(k)}}
) \cdot \textcolor{orange}{h_{v}^{(k - 1)}}
\right)
\forall v \in V
$$
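A minimal sketch of one GIN update, assuming $f$ is a one-layer MLP with ReLU and that $\epsilon$ starts at zero (illustrative shapes and weights):
```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)    # undirected 3-node graph
H = rng.normal(size=(3, 4))               # previous embeddings h^(k-1)

W = rng.normal(size=(4, 4))               # weights of the one-layer "MLP" f
eps = 0.0                                 # learnable epsilon, initialised to 0

aggregated = A @ H + (1 + eps) * H        # sum of neighbours + (1 + eps) * own embedding
H_next = np.maximum(aggregated @ W, 0.0)  # f = ReLU after the linear layer
```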