# Graph ML

## Graph Introduction

- **Nodes**: pieces of information
- **Edges**: relationships between nodes
  - **Mutual**
  - **One-sided**
- **Directionality**
  - **Directed**: we care about the order of connections
    - **Unidirectional**
    - **Bidirectional**
  - **Undirected**: we don't care about the order of connections

Now, we can have attributes over:

- **nodes**
- **edges**
- **master nodes** (a collection of nodes and edges)

For example, an image may be represented as a graph where each non-border pixel is a vertex connected to its 8 neighbouring pixels.
The information at each vertex is a 3- (or 4-) dimensional vector (think of RGB and RGBA).

### Adjacency Matrix

Take a picture and build a matrix of dimension $\{0, 1\}^{(h \cdot w) \times (h \cdot w)}$: we put a 1 if two
nodes are connected (share an edge), and a 0 if they do not.

> [!NOTE]
> For a $300 \times 250$ image our matrix would be $\{0, 1\}^{(250 \cdot 300) \times (250 \cdot 300)}$

The placement of a 1 or a 0 follows these rules:

- the **row element** has a connection **towards** the **column element**
- the **column element** has a connection **coming from** the **row element**
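
As a quick illustration, here is a minimal NumPy sketch (the edge list and node count are made-up examples) that builds an adjacency matrix following the row-towards-column convention above:

```python
import numpy as np

# Hypothetical directed edge list: (source, target) pairs over 4 nodes.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
n_nodes = 4

# A[row, col] = 1 means "row node has a connection towards column node".
A = np.zeros((n_nodes, n_nodes), dtype=np.int8)
for src, dst in edges:
    A[src, dst] = 1

print(A)
# Row 0 has 1s in columns 1 and 2: node 0 points towards nodes 1 and 2.
```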

### Tasks

#### Graph-Level

We want to predict a property of the whole graph.

#### Node-Level

We want to predict a node property, such as its class.

#### Edge-Level

We want to predict relationships between nodes, such as whether they share an edge, or the value of the edge they share.

For this task we may start with a fully connected graph and then prune edges as predictions go on, ending up with a
sparse graph.

### Downsides of Graphs

- Their structure is not consistent, and sometimes representing something as a graph is difficult
- If we don't care about the order of nodes, we need a representation that respects **node-order equivariance**
- Graphs may be too large

## Representing Graphs

### Adjacency List

We store info about:

- **Nodes**: a list of values, where index $Node_k$ holds the value of that node
- **Edges**: a list of values, where index $Edge_k$ holds the value of that edge
- **Adjacency list**: a list of tuples of node indices, where tuple $Tuple_k$
  holds the nodes involved in the $k$-th edge
- **Graph**: the value of the graph

```python
from typing import Any

nodes: list[Any] = [
    "fork", "spaghetti", "knife", "spoon", "broth"
]

edges: list[Any] = [
    "used for eating", "utensil", "food",
    "utensil", "utensil", "used for eating"
]

adj_list: list[tuple[int, int]] = [
    (0, 1), (0, 2), (1, 4),
    (0, 3), (2, 3), (3, 4)
]

graph: Any = "dinner table"
```

If some parts of the graph are disconnected, we can simply avoid storing and computing those parts.

## Graph Neural Networks (GNNs)

In its simplest form, we take a **graph-in**, **graph-out** approach with separate MLPs for
vertices, edges and master nodes, applied **one at a time** over each element:

$$
\begin{aligned}
V_{i + 1} &= MLP_{V_{i}}(V_{i}) \\
E_{i + 1} &= MLP_{E_{i}}(E_{i}) \\
U_{i + 1} &= MLP_{U_{i}}(U_{i}) \\
\end{aligned}
$$
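
A minimal NumPy sketch of such a layer, assuming tiny two-layer MLPs and made-up embedding sizes (none of this comes from the course material, it only makes the update concrete):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    """A tiny two-layer MLP applied row-wise (one row per element)."""
    return np.maximum(x @ w1 + b1, 0) @ w2 + b2

# Illustrative embeddings: 5 nodes, 6 edges, 1 master node, all of size 8.
V = rng.normal(size=(5, 8))
E = rng.normal(size=(6, 8))
U = rng.normal(size=(1, 8))

# Separate parameters for vertices, edges and the master node.
params = {name: (rng.normal(size=(8, 16)), np.zeros(16),
                 rng.normal(size=(16, 8)), np.zeros(8))
          for name in ("V", "E", "U")}

# One graph-in / graph-out layer: each element is updated independently.
V_next = mlp(V, *params["V"])
E_next = mlp(E, *params["E"])
U_next = mlp(U, *params["U"])
```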

### Pooling

> [!CAUTION]
> This step comes after the embedding phase described above

This step can be used to take in information about elements different from the ones we are considering
(for example, taking info from edges while doing the computation over vertices).

With this approach we usually gather some info from the edges of a vertex, concatenate it into a matrix and
aggregate by summing.
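
A sketch of this edge-to-vertex pooling under the same toy assumptions (edge features, adjacency list and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

edge_feats = rng.normal(size=(6, 8))   # one feature vector per edge
adj_list = [(0, 1), (0, 2), (1, 4), (0, 3), (2, 3), (3, 4)]
n_nodes = 5

# For every vertex, gather the features of its incident edges and sum them.
pooled = np.zeros((n_nodes, 8))
for edge_idx, (u, v) in enumerate(adj_list):
    pooled[u] += edge_feats[edge_idx]
    pooled[v] += edge_feats[edge_idx]

# `pooled[v]` can now be concatenated with (or added to) the vertex features
# before the vertex MLP is applied.
```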

### Message Passing

Take all node embeddings in the neighbourhood and perform steps similar to the pooling function.
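
A sketch of one message-passing step, assuming a binary adjacency matrix and an illustrative linear-plus-ReLU update function:

```python
import numpy as np

rng = np.random.default_rng(2)

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])           # undirected toy graph
H = rng.normal(size=(4, 8))            # current node embeddings
W = rng.normal(size=(16, 8))           # illustrative update weights

# 1) each node sums the embeddings of its neighbours (the "messages"),
# 2) the sum is concatenated with the node's own embedding,
# 3) an update function (here linear + ReLU) produces the new embedding.
messages = A @ H
H_next = np.maximum(np.concatenate([H, messages], axis=1) @ W, 0)
```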

### Special Layers

<!-- TODO: Read PDF 14 Anelli pg 47 to 52 -->

## Polynomial Filters

Each polynomial filter is order invariant.

### Graph Laplacian

Let's fix an order over the nodes of a graph, and let $A$ be the adjacency matrix. The degree matrix $D$ is

$$
D_{v,v} = \sum_{u} A_{v,u}
$$

In other words, $D_{v, v}$ is the number of nodes connected to $v$.

The **graph Laplacian** of the graph is

$$
L = D - A
$$
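
A quick NumPy check of these two definitions on an arbitrary toy adjacency matrix:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])

D = np.diag(A.sum(axis=1))   # D[v, v] = degree of node v
L = D - A                    # graph Laplacian

print(D.diagonal())          # [2 2 2 2]
print(L)
```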

### Polynomials of the Laplacian

These polynomials, which have the same dimensions as $L$, can be thought of as **filters**, like in
[CNNs](./../7-Convolutional-Networks/INDEX.md#convolutional-networks)

$$
p_{\vec{w}}(L) = w_{0}I_{n} + w_{1}L^{1} + \dots + w_{d}L^{d} = \sum_{i=0}^{d} w_{i}L^{i}
$$

We can then get ***filtered node features*** by simply multiplying the polynomial by the node features

$$
\vec{x}' = p_{\vec{w}}(L) \vec{x}
$$

> [!NOTE]
> To extract new features for a single vertex, suppose only $w_1 \neq 0$ (take $w_1 = 1$ for simplicity), so that $p_{\vec{w}}(L) = L$
>
> Observe that we are only taking the row $L_{v}$
>
> $$
> \begin{aligned}
> \vec{x}'_{v} &= (L\vec{x})_{v} \\
> &= \sum_{u \in G} L_{v,u} \vec{x}_{u} \\
> &= \sum_{u \in G} (D_{v,u} - A_{v,u}) \vec{x}_{u} \\
> &= \sum_{u \in G} \left( D_{v,u} \vec{x}_{u} - A_{v,u} \vec{x}_{u} \right) \\
> &= D_{v, v} \vec{x}_{v} - \sum_{u \in \mathcal{N}(v)} \vec{x}_{u}
> \end{aligned}
> $$
>
> The last step holds because $D$ is a diagonal matrix, and in the sum we are only considering the neighbours
> of $v$
>
> It can be shown that in any graph
>
> $$
> dist_{G}(v, u) > i \rightarrow L_{v, u}^{i} = 0
> $$
>
> More generally, it holds that
>
> $$
> \begin{aligned}
> \vec{x}'_{v} = (p_{\vec{w}}(L)\vec{x})_{v} &= (p_{\vec{w}}(L))_{v} \vec{x} \\
> &= \sum_{i = 0}^{d} w_{i}L_{v}^{i} \vec{x} \\
> &= \sum_{i = 0}^{d} w_{i} \sum_{u \in G} L_{v,u}^{i}\vec{x}_{u} \\
> &= \sum_{i = 0}^{d} w_{i} \sum_{\substack{u \in G \\ dist_{G}(v, u) \leq i}} L_{v,u}^{i}\vec{x}_{u} \\
> \end{aligned}
> $$
>
> This shows that the degree of the polynomial decides the maximum number of hops
> included during the filtering stage, as if it were defining a [kernel](./../7-Convolutional-Networks/INDEX.md#filters)
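
A sketch of applying such a polynomial filter to node features, with arbitrary weights and features; powers of $L$ are applied iteratively to the feature vector instead of being materialized:

```python
import numpy as np

def poly_filter(L, w, x):
    """Compute p_w(L) x = sum_i w[i] * L^i x without forming L^i explicitly."""
    out = np.zeros_like(x)
    Lix = x.copy()                 # L^0 x
    for wi in w:
        out += wi * Lix
        Lix = L @ Lix              # next power of L applied to x
    return out

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

w = np.array([0.5, 0.3, 0.2])      # degree-2 filter: at most 2-hop influence
x = np.random.default_rng(3).normal(size=(4, 1))
x_filtered = poly_filter(L, w, x)
```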

### ChebNet

The polynomial in ChebNet becomes:

$$
\begin{aligned}
p_{\vec{w}}(L) &= \sum_{i = 1}^{d} w_{i} T_{i}(\tilde{L}) \\
T_{i}(\cos\theta) &= \cos(i\theta) \\
\tilde{L} &= \frac{2L}{\lambda_{\max}(L)} - I_{n}
\end{aligned}
$$

- $T_{i}$ is the Chebyshev polynomial of the first kind
- $\tilde{L}$ is a rescaled version of $L$: dividing by the largest eigenvalue
  keeps its eigenvalues in the range $[-1, 1]$. Moreover, $L$ has no negative eigenvalues, so it is
  positive semi-definite

These polynomials are more stable as they do not explode with higher powers.
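
A sketch that evaluates this filter with the standard three-term Chebyshev recurrence $T_{0}(x) = 1$, $T_{1}(x) = x$, $T_{i}(x) = 2xT_{i-1}(x) - T_{i-2}(x)$ (the rescaled Laplacian and weights below are placeholders):

```python
import numpy as np

def chebnet_filter(L_tilde, w, x):
    """Apply sum_{i=1..d} w[i-1] * T_i(L_tilde) x via the Chebyshev recurrence."""
    T_prev = x                     # T_0(L~) x = x
    T_curr = L_tilde @ x           # T_1(L~) x
    out = w[0] * T_curr
    for wi in w[1:]:
        T_next = 2 * (L_tilde @ T_curr) - T_prev
        out += wi * T_next
        T_prev, T_curr = T_curr, T_next
    return out

# Toy rescaled Laplacian (eigenvalues in [-1, 1]) and toy features.
A = np.array([[0, 1], [1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
L_tilde = 2 * L / np.linalg.eigvalsh(L).max() - np.eye(2)
x = np.array([[1.0], [2.0]])
y = chebnet_filter(L_tilde, np.array([0.7, 0.3]), x)
```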

### Embedding Computation

<!-- TODO: Read PDF 14 Anelli from 81 to 83 -->

## Other methods

In the formulas below, colors mark:

- <span style="color:skyblue">Learnable parameters</span>
- <span style="color:orange">Embeddings of node v</span>
- <span style="color:violet">Embeddings of neighbours of v</span>

### Graph Convolutional Networks

$$
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
\underbrace{\textcolor{skyblue}{W^{(k)}} \cdot
\frac{
\sum_{u \in \mathcal{N}(v)} \textcolor{violet}{h_{u}^{(k-1)}}
}{
|\mathcal{N}(v)|
}}_{\text{mean of previous neighbour embeddings}} + \underbrace{\textcolor{skyblue}{B^{(k)}} \cdot
\textcolor{orange}{h_{v}^{(k - 1)}}}_{\text{previous embeddings}}
\right) \quad \forall v \in V
$$
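
A sketch of one such GCN update in NumPy, with placeholder weight matrices and a ReLU playing the role of $f^{(k)}$:

```python
import numpy as np

def gcn_layer(A, H, W, B, f=lambda z: np.maximum(z, 0)):
    """h_v <- f(W * mean of neighbour embeddings + B * h_v), for every node v."""
    deg = A.sum(axis=1, keepdims=True)           # |N(v)| for each node
    neigh_mean = (A @ H) / np.maximum(deg, 1)    # guard against isolated nodes
    return f(neigh_mean @ W.T + H @ B.T)

rng = np.random.default_rng(4)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H = rng.normal(size=(3, 8))
W = rng.normal(size=(8, 8))
B = rng.normal(size=(8, 8))
H_next = gcn_layer(A, H, W, B)
```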

### Graph Attention Networks

$$
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
\textcolor{skyblue}{W^{(k)}} \cdot \left[
\underbrace{
\sum_{u \in \mathcal{N}(v)} \alpha^{(k-1)}_{v,u}
\textcolor{violet}{h_{u}^{(k-1)}}
}_{\text{weighted mean of previous neighbour embeddings}} +
\underbrace{\alpha^{(k-1)}_{v,v}
\textcolor{orange}{h_{v}^{(k-1)}}}_{\text{previous embeddings}}
\right] \right) \quad \forall v \in V
$$

where

$$
\alpha^{(k)}_{v,u} = \frac{
\textcolor{skyblue}{A^{(k)}}(
\textcolor{orange}{h_{v}^{(k)}},
\textcolor{violet}{h_{u}^{(k)}}
)
}{
\sum_{w \in \mathcal{N}(v)} \textcolor{skyblue}{A^{(k)}}(
\textcolor{orange}{h_{v}^{(k)}},
\textcolor{violet}{h_{w}^{(k)}}
)
} \quad \forall (v, u) \in E
$$
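
A sketch of how the $\alpha$ coefficients can be computed, assuming $A^{(k)}$ is an exponentiated dot-product score after a shared linear map; this particular scoring function is an illustrative assumption, not necessarily the one used in the slides:

```python
import numpy as np

def attention_coefficients(adj, H, W_score):
    """alpha[v, u]: normalised attention of node v over its neighbours u."""
    Z = H @ W_score.T                              # shared transform (assumption)
    scores = np.exp(Z @ Z.T)                       # unnormalised A(h_v, h_u)
    scores = scores * adj                          # keep only existing edges
    return scores / scores.sum(axis=1, keepdims=True)

rng = np.random.default_rng(5)
adj = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=float)  # self-loops so alpha_vv is defined
H = rng.normal(size=(3, 4))
W_score = rng.normal(size=(4, 4))
alpha = attention_coefficients(adj, H, W_score)    # each row sums to 1
```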

### Graph Sample and Aggregate (GraphSAGE)

<!-- TODO: See PDF 14 Anelli from 98 to 102 -->

### Graph Isomorphism Network (GIN)

$$
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}}
\left(
\sum_{u \in \mathcal{N}(v)}
\textcolor{violet}{h_{u}^{(k - 1)}} +
(
1 +
\textcolor{skyblue}{\epsilon^{(k)}}
) \cdot \textcolor{orange}{h_{v}^{(k - 1)}}
\right)
\quad \forall v \in V
$$
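
A sketch of one GIN update, with a placeholder two-layer MLP playing the role of $f^{(k)}$ and an arbitrary $\epsilon$:

```python
import numpy as np

def gin_layer(A, H, eps, w1, w2):
    """h_v <- MLP( sum of neighbour embeddings + (1 + eps) * h_v )."""
    aggregated = A @ H + (1.0 + eps) * H          # neighbour sum + scaled self term
    return np.maximum(aggregated @ w1, 0) @ w2    # two-layer MLP as f^(k)

rng = np.random.default_rng(6)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H = rng.normal(size=(3, 8))
H_next = gin_layer(A, H, eps=0.1,
                   w1=rng.normal(size=(8, 16)),
                   w2=rng.normal(size=(16, 8)))
```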