# Graph ML

## Graph Introduction

- **Nodes**: pieces of information
- **Edges**: relationships between nodes
  - **Mutual**
  - **One-sided**
- **Directionality**
  - **Directed**: we care about the order of connections
    - **Unidirectional**
    - **Bidirectional**
  - **Undirected**: we don't care about the order of connections

Now, we can have attributes over:

- **nodes**
- **edges**
- **master nodes** (a collection of nodes and edges)

For example, an image may be represented as a graph where each non-border pixel is a vertex connected to its 8 neighbouring pixels.
The information at each vertex is a 3- (or 4-) dimensional vector (think of RGB and RGBA).

### Adjacency Matrix

Take a picture and build a matrix of dimension $\{0, 1\}^{(h \cdot w) \times (h \cdot w)}$: we put a 1 if two
nodes are connected (share an edge), and a 0 if they do not.

> [!NOTE]
> For a $300 \times 250$ image our matrix would be $\{0, 1\}^{(250 \cdot 300) \times (250 \cdot 300)}$

The placement of a 1 or a 0 follows these rules:

- the **row element** has a connection **towards** the **column element**
- the **column element** has a connection **coming from** the **row element**
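
As a quick illustration, here is a minimal NumPy sketch (the edge list and node count are made-up examples) that builds an adjacency matrix following the row-towards-column convention above:

```python
import numpy as np

# Hypothetical directed edge list: (source, target) pairs over 4 nodes.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
n_nodes = 4

# A[row, col] = 1 means "row node has a connection towards column node".
A = np.zeros((n_nodes, n_nodes), dtype=np.int8)
for src, dst in edges:
    A[src, dst] = 1

print(A)
# Row 0 has 1s in columns 1 and 2: node 0 points towards nodes 1 and 2.
```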

### Tasks

#### Graph-Level

We want to predict a property of the whole graph.

#### Node-Level

We want to predict a node property, such as its class.

#### Edge-Level

We want to predict relationships between nodes, such as whether they share an edge, or the value of the edge they share.

For this task we may start with a fully connected graph and then prune edges as predictions go on, ending up with a
sparse graph.

### Downsides of Graphs

- Their structure is not consistent, and sometimes representing something as a graph is difficult
- If we don't care about the order of nodes, we need a representation that respects **node-order equivariance**
- Graphs may be too large

## Representing Graphs

### Adjacency List

We store info about:

- **Nodes**: a list of values, where index $Node_k$ holds the value of that node
- **Edges**: a list of values, where index $Edge_k$ holds the value of that edge
- **Adjacency list**: a list of tuples of node indices, where tuple $Tuple_k$
  holds the nodes involved in the $k$-th edge
- **Graph**: the value of the graph

```python
from typing import Any

nodes: list[Any] = [
    "fork", "spaghetti", "knife", "spoon", "broth"
]

edges: list[Any] = [
    "used for eating", "utensil", "food",
    "utensil", "utensil", "used for eating"
]

adj_list: list[tuple[int, int]] = [
    (0, 1), (0, 2), (1, 4),
    (0, 3), (2, 3), (3, 4)
]

graph: Any = "dinner table"
```

If some parts of the graph are disconnected, we can simply avoid storing and computing those parts.

## Graph Neural Networks (GNNs)

In its simplest form, we take a **graph-in**, **graph-out** approach with separate MLPs for
vertices, edges and master nodes, applied **one at a time** over each element:

$$
\begin{aligned}
V_{i + 1} &= MLP_{V_{i}}(V_{i}) \\
E_{i + 1} &= MLP_{E_{i}}(E_{i}) \\
U_{i + 1} &= MLP_{U_{i}}(U_{i}) \\
\end{aligned}
$$
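
A minimal NumPy sketch of such a layer, assuming tiny two-layer MLPs and made-up embedding sizes (none of this comes from the course material, it only makes the update concrete):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    """A tiny two-layer MLP applied row-wise (one row per element)."""
    return np.maximum(x @ w1 + b1, 0) @ w2 + b2

# Illustrative embeddings: 5 nodes, 6 edges, 1 master node, all of size 8.
V = rng.normal(size=(5, 8))
E = rng.normal(size=(6, 8))
U = rng.normal(size=(1, 8))

# Separate parameters for vertices, edges and the master node.
params = {name: (rng.normal(size=(8, 16)), np.zeros(16),
                 rng.normal(size=(16, 8)), np.zeros(8))
          for name in ("V", "E", "U")}

# One graph-in / graph-out layer: each element is updated independently.
V_next = mlp(V, *params["V"])
E_next = mlp(E, *params["E"])
U_next = mlp(U, *params["U"])
```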

### Pooling

> [!CAUTION]
> This step comes after the embedding phase described above

This step can be used to take in information about elements different from the ones we are considering
(for example, taking info from edges while doing the computation over vertices).

With this approach we usually gather some info from the edges of a vertex, concatenate it into a matrix and
aggregate by summing.
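
A sketch of this edge-to-vertex pooling under the same toy assumptions (edge features, adjacency list and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

edge_feats = rng.normal(size=(6, 8))   # one feature vector per edge
adj_list = [(0, 1), (0, 2), (1, 4), (0, 3), (2, 3), (3, 4)]
n_nodes = 5

# For every vertex, gather the features of its incident edges and sum them.
pooled = np.zeros((n_nodes, 8))
for edge_idx, (u, v) in enumerate(adj_list):
    pooled[u] += edge_feats[edge_idx]
    pooled[v] += edge_feats[edge_idx]

# `pooled[v]` can now be concatenated with (or added to) the vertex features
# before the vertex MLP is applied.
```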

### Message Passing

Take all node embeddings in the neighbourhood and perform steps similar to the pooling function.
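
A sketch of one message-passing step, assuming a binary adjacency matrix and an illustrative linear-plus-ReLU update function:

```python
import numpy as np

rng = np.random.default_rng(2)

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])           # undirected toy graph
H = rng.normal(size=(4, 8))            # current node embeddings
W = rng.normal(size=(16, 8))           # illustrative update weights

# 1) each node sums the embeddings of its neighbours (the "messages"),
# 2) the sum is concatenated with the node's own embedding,
# 3) an update function (here linear + ReLU) produces the new embedding.
messages = A @ H
H_next = np.maximum(np.concatenate([H, messages], axis=1) @ W, 0)
```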

### Special Layers

<!-- TODO: Read PDF 14 Anelli pg 47 to 52 -->

## Polynomial Filters

Each polynomial filter is order invariant.

### Graph Laplacian

Let's fix an order over the nodes of a graph, and let $A$ be the adjacency matrix. The degree matrix $D$ is

$$
D_{v,v} = \sum_{u} A_{v,u}
$$

In other words, $D_{v, v}$ is the number of nodes connected to $v$.

The **graph Laplacian** of the graph is

$$
L = D - A
$$
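
A quick NumPy check of these two definitions on an arbitrary toy adjacency matrix:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])

D = np.diag(A.sum(axis=1))   # D[v, v] = degree of node v
L = D - A                    # graph Laplacian

print(D.diagonal())          # [2 2 2 2]
print(L)
```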

### Polynomials of the Laplacian

These polynomials, which have the same dimensions as $L$, can be thought of as **filters**, like in
[CNNs](./../7-Convolutional-Networks/INDEX.md#convolutional-networks)

$$
p_{\vec{w}}(L) = w_{0}I_{n} + w_{1}L^{1} + \dots + w_{d}L^{d} = \sum_{i=0}^{d} w_{i}L^{i}
$$

We can then get ***filtered node features*** by simply multiplying the polynomial by the node features

$$
\vec{x}' = p_{\vec{w}}(L) \vec{x}
$$

> [!NOTE]
> To extract new features for a single vertex, suppose only $w_1 \neq 0$ (take $w_1 = 1$ for simplicity), so that $p_{\vec{w}}(L) = L$
>
> Observe that we are only taking the row $L_{v}$
>
> $$
> \begin{aligned}
> \vec{x}'_{v} &= (L\vec{x})_{v} \\
> &= \sum_{u \in G} L_{v,u} \vec{x}_{u} \\
> &= \sum_{u \in G} (D_{v,u} - A_{v,u}) \vec{x}_{u} \\
> &= \sum_{u \in G} \left( D_{v,u} \vec{x}_{u} - A_{v,u} \vec{x}_{u} \right) \\
> &= D_{v, v} \vec{x}_{v} - \sum_{u \in \mathcal{N}(v)} \vec{x}_{u}
> \end{aligned}
> $$
>
> The last step holds because $D$ is a diagonal matrix, and in the sum we are only considering the neighbours
> of $v$
>
> It can be shown that in any graph
>
> $$
> dist_{G}(v, u) > i \rightarrow L_{v, u}^{i} = 0
> $$
>
> More generally, it holds that
>
> $$
> \begin{aligned}
> \vec{x}'_{v} = (p_{\vec{w}}(L)\vec{x})_{v} &= (p_{\vec{w}}(L))_{v} \vec{x} \\
> &= \sum_{i = 0}^{d} w_{i}L_{v}^{i} \vec{x} \\
> &= \sum_{i = 0}^{d} w_{i} \sum_{u \in G} L_{v,u}^{i}\vec{x}_{u} \\
> &= \sum_{i = 0}^{d} w_{i} \sum_{\substack{u \in G \\ dist_{G}(v, u) \leq i}} L_{v,u}^{i}\vec{x}_{u} \\
> \end{aligned}
> $$
>
> This shows that the degree of the polynomial decides the maximum number of hops
> included during the filtering stage, as if it were defining a [kernel](./../7-Convolutional-Networks/INDEX.md#filters)
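
A sketch of applying such a polynomial filter to node features, with arbitrary weights and features; powers of $L$ are applied iteratively to the feature vector instead of being materialized:

```python
import numpy as np

def poly_filter(L, w, x):
    """Compute p_w(L) x = sum_i w[i] * L^i x without forming L^i explicitly."""
    out = np.zeros_like(x)
    Lix = x.copy()                 # L^0 x
    for wi in w:
        out += wi * Lix
        Lix = L @ Lix              # next power of L applied to x
    return out

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

w = np.array([0.5, 0.3, 0.2])      # degree-2 filter: at most 2-hop influence
x = np.random.default_rng(3).normal(size=(4, 1))
x_filtered = poly_filter(L, w, x)
```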

### ChebNet

The polynomial in ChebNet becomes:

$$
\begin{aligned}
p_{\vec{w}}(L) &= \sum_{i = 1}^{d} w_{i} T_{i}(\tilde{L}) \\
T_{i}(\cos\theta) &= \cos(i\theta) \\
\tilde{L} &= \frac{2L}{\lambda_{\max}(L)} - I_{n}
\end{aligned}
$$

- $T_{i}$ is the Chebyshev polynomial of the first kind
- $\tilde{L}$ is a rescaled version of $L$: dividing by the largest eigenvalue
  keeps its eigenvalues in the range $[-1, 1]$. Moreover, $L$ has no negative eigenvalues, so it is
  positive semi-definite

These polynomials are more stable as they do not explode with higher powers.
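
A sketch that evaluates this filter with the standard three-term Chebyshev recurrence $T_{0}(x) = 1$, $T_{1}(x) = x$, $T_{i}(x) = 2xT_{i-1}(x) - T_{i-2}(x)$ (the rescaled Laplacian and weights below are placeholders):

```python
import numpy as np

def chebnet_filter(L_tilde, w, x):
    """Apply sum_{i=1..d} w[i-1] * T_i(L_tilde) x via the Chebyshev recurrence."""
    T_prev = x                     # T_0(L~) x = x
    T_curr = L_tilde @ x           # T_1(L~) x
    out = w[0] * T_curr
    for wi in w[1:]:
        T_next = 2 * (L_tilde @ T_curr) - T_prev
        out += wi * T_next
        T_prev, T_curr = T_curr, T_next
    return out

# Toy rescaled Laplacian (eigenvalues in [-1, 1]) and toy features.
A = np.array([[0, 1], [1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
L_tilde = 2 * L / np.linalg.eigvalsh(L).max() - np.eye(2)
x = np.array([[1.0], [2.0]])
y = chebnet_filter(L_tilde, np.array([0.7, 0.3]), x)
```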

### Embedding Computation

<!-- TODO: Read PDF 14 Anelli from 81 to 83 -->

## Other methods

In the formulas below, colors mark:

- <span style="color:skyblue">Learnable parameters</span>
- <span style="color:orange">Embeddings of node v</span>
- <span style="color:violet">Embeddings of neighbours of v</span>

### Graph Convolutional Networks

$$
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
\underbrace{\textcolor{skyblue}{W^{(k)}} \cdot
\frac{
\sum_{u \in \mathcal{N}(v)} \textcolor{violet}{h_{u}^{(k-1)}}
}{
|\mathcal{N}(v)|
}}_{\text{mean of previous neighbour embeddings}} + \underbrace{\textcolor{skyblue}{B^{(k)}} \cdot
\textcolor{orange}{h_{v}^{(k - 1)}}}_{\text{previous embeddings}}
\right) \quad \forall v \in V
$$
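
A sketch of one such GCN update in NumPy, with placeholder weight matrices and a ReLU playing the role of $f^{(k)}$:

```python
import numpy as np

def gcn_layer(A, H, W, B, f=lambda z: np.maximum(z, 0)):
    """h_v <- f(W * mean of neighbour embeddings + B * h_v), for every node v."""
    deg = A.sum(axis=1, keepdims=True)           # |N(v)| for each node
    neigh_mean = (A @ H) / np.maximum(deg, 1)    # guard against isolated nodes
    return f(neigh_mean @ W.T + H @ B.T)

rng = np.random.default_rng(4)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H = rng.normal(size=(3, 8))
W = rng.normal(size=(8, 8))
B = rng.normal(size=(8, 8))
H_next = gcn_layer(A, H, W, B)
```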

### Graph Attention Networks

$$
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
\textcolor{skyblue}{W^{(k)}} \cdot \left[
\underbrace{
\sum_{u \in \mathcal{N}(v)} \alpha^{(k-1)}_{v,u}
\textcolor{violet}{h_{u}^{(k-1)}}
}_{\text{weighted mean of previous neighbour embeddings}} +
\underbrace{\alpha^{(k-1)}_{v,v}
\textcolor{orange}{h_{v}^{(k-1)}}}_{\text{previous embeddings}}
\right] \right) \quad \forall v \in V
$$

where

$$
\alpha^{(k)}_{v,u} = \frac{
\textcolor{skyblue}{A^{(k)}}(
\textcolor{orange}{h_{v}^{(k)}},
\textcolor{violet}{h_{u}^{(k)}}
)
}{
\sum_{w \in \mathcal{N}(v)} \textcolor{skyblue}{A^{(k)}}(
\textcolor{orange}{h_{v}^{(k)}},
\textcolor{violet}{h_{w}^{(k)}}
)
} \quad \forall (v, u) \in E
$$
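
A sketch of how the $\alpha$ coefficients can be computed, assuming $A^{(k)}$ is an exponentiated dot-product score after a shared linear map; this particular scoring function is an illustrative assumption, not necessarily the one used in the slides:

```python
import numpy as np

def attention_coefficients(adj, H, W_score):
    """alpha[v, u]: normalised attention of node v over its neighbours u."""
    Z = H @ W_score.T                              # shared transform (assumption)
    scores = np.exp(Z @ Z.T)                       # unnormalised A(h_v, h_u)
    scores = scores * adj                          # keep only existing edges
    return scores / scores.sum(axis=1, keepdims=True)

rng = np.random.default_rng(5)
adj = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=float)  # self-loops so alpha_vv is defined
H = rng.normal(size=(3, 4))
W_score = rng.normal(size=(4, 4))
alpha = attention_coefficients(adj, H, W_score)    # each row sums to 1
```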

### Graph Sample and Aggregate (GraphSAGE)

<!-- TODO: See PDF 14 Anelli from 98 to 102 -->

### Graph Isomorphism Network (GIN)

$$
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}}
\left(
\sum_{u \in \mathcal{N}(v)}
\textcolor{violet}{h_{u}^{(k - 1)}} +
(
1 +
\textcolor{skyblue}{\epsilon^{(k)}}
) \cdot \textcolor{orange}{h_{v}^{(k - 1)}}
\right)
\quad \forall v \in V
$$
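
A sketch of one GIN update, with a placeholder two-layer MLP playing the role of $f^{(k)}$ and an arbitrary $\epsilon$:

```python
import numpy as np

def gin_layer(A, H, eps, w1, w2):
    """h_v <- MLP( sum of neighbour embeddings + (1 + eps) * h_v )."""
    aggregated = A @ H + (1.0 + eps) * H          # neighbour sum + scaled self term
    return np.maximum(aggregated @ w1, 0) @ w2    # two-layer MLP as f^(k)

rng = np.random.default_rng(6)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H = rng.normal(size=(3, 8))
H_next = gin_layer(A, H, eps=0.1,
                   w1=rng.normal(size=(8, 16)),
                   w2=rng.normal(size=(16, 8)))
```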