
Graph ML

Graph Introduction

  • Nodes: Pieces of Information
  • Edges: Relationship between nodes
    • Mutual
    • One-Sided
  • Directionality
    • Directed: We care about the order of connections
      • Unidirectional
      • Bidirectional
    • Undirected: We don't care about order of connections

Now, we can have attributes over

  • nodes
  • edges
  • master nodes (a collection of nodes and edges)

For example, images may be represented as a graph where each non-border pixel is a vertex connected to its 8 neighbouring pixels. The information at each vertex is a 3- (or 4-) dimensional vector (think of RGB and RGBA).

Adjacency Matrix

Take a picture and build a matrix in \{0, 1\}^{(h \cdot w) \times (h \cdot w)}: we put a 1 if two nodes are connected (share an edge), and a 0 if they do not.

Note

For a 300 \times 250 image our matrix would be \{0, 1\}^{(250 \cdot 300) \times (250 \cdot 300)}

The rules for placing a 1 or a 0 are:

  • the row element has a connection towards the column element
  • the column element has a connection coming from the row element
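
As a minimal sketch, here is how such a matrix could be built for a small, hypothetical directed graph (the nodes and edges are purely illustrative):

import numpy as np

# Hypothetical directed graph on 4 nodes; (i, j) means
# "row i has a connection towards column j"
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]

n = 4
A = np.zeros((n, n), dtype=int)   # the {0, 1}^(n x n) adjacency matrix
for i, j in edges:
    A[i, j] = 1                   # row -> column direction

print(A)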

Tasks

Graph-Level

We want to predict a graph property

Node-Level

We want to predict a node property, such as classification

Edge-Level

We want to predict relationships between nodes, such as whether they share an edge, or the value of the edge they share.

For this task we may start with a fully connected graph and then prune edges as predictions proceed, ending up with a sparse graph

Downsides of Graphs

  • Their structure is not consistent from graph to graph, and sometimes representing something as a graph is difficult
  • If we don't care about the order of nodes, we need a representation that respects this node-order equivariance
  • Graphs may be too large

Representing Graphs

Adjacency List

We store info about:

  • Nodes: list of values; index k holds the value of node k
  • Edges: list of values; index k holds the value of edge k
  • Adjacency list: list of tuples of node indices; tuple k holds the nodes involved in the k-th edge
  • Graph: value of the whole graph
from typing import Any

# Node values: index k holds the value of node k
nodes: list[Any] = [
    "fork", "spaghetti", "knife", "spoon", "broth"
]

# Edge values: index k holds the value of edge k
edges: list[Any] = [
    "used for eating", "utensil", "food",
    "utensil", "utensil", "used for eating"
]

# Adjacency list: index k holds the pair of node indices joined by edge k
adj_list: list[tuple[int, int]] = [
    (0, 1), (0, 2), (1, 4),
    (0, 3), (2, 3), (3, 4)
]

# Graph-level (master node) value
graph: Any = "table"

If we find some parts of the graph that are disconnected, we can just avoid storing and computing those parts

Graph Neural Networks (GNNs)

In its simplest form we take a graph-in, graph-out approach, with separate MLPs for vertices, edges and master nodes that we apply one at a time over each element


\begin{aligned}
    V_{i + 1} &= MLP_{V_{i}}(V_{i}) \\
    E_{i + 1} &= MLP_{E_{i}}(E_{i}) \\
    U_{i + 1} &= MLP_{U_{i}}(U_{i}) \\
\end{aligned}
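
A minimal sketch of one such layer in NumPy, assuming two-layer MLPs with random weights and illustrative embedding sizes:

import numpy as np

def mlp(x, w1, w2):
    # A tiny two-layer MLP with ReLU, applied row-wise to each element
    return np.maximum(x @ w1, 0) @ w2

rng = np.random.default_rng(0)
V = rng.normal(size=(5, 8))   # 5 vertex embeddings of size 8
E = rng.normal(size=(6, 8))   # 6 edge embeddings of size 8
U = rng.normal(size=(1, 8))   # 1 master-node embedding of size 8

# Separate MLPs for vertices, edges and the master node
Wv1, Wv2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 8))
We1, We2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 8))
Wu1, Wu2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 8))

V_next = mlp(V, Wv1, Wv2)     # V_{i+1} = MLP_V(V_i)
E_next = mlp(E, We1, We2)     # E_{i+1} = MLP_E(E_i)
U_next = mlp(U, Wu1, Wu2)     # U_{i+1} = MLP_U(U_i)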

Pooling

Caution

This step comes after the embedding phase described above

This is a step used to bring in information from elements other than the ones we are currently updating (for example, taking information from edges while computing over vertices).

With this approach we usually gather the embeddings of the edges incident to a vertex, stack them into a matrix, and aggregate them by summing.
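
A minimal sketch of this edge-to-vertex pooling with sum aggregation, reusing the adj_list convention from above (the embedding size is illustrative):

import numpy as np

def pool_edges_to_node(v, adj_list, edge_emb):
    # Gather the embeddings of the edges incident to vertex v,
    # stack them into a matrix and aggregate by summing
    incident = [k for k, (a, b) in enumerate(adj_list) if v in (a, b)]
    stacked = np.stack([edge_emb[k] for k in incident])   # (num_incident, d)
    return stacked.sum(axis=0)                            # (d,)

adj_list = [(0, 1), (0, 2), (1, 4), (0, 3), (2, 3), (3, 4)]
edge_emb = np.random.default_rng(0).normal(size=(6, 8))   # one embedding per edge
pooled_0 = pool_edges_to_node(0, adj_list, edge_emb)      # info pooled at vertex 0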

Message Passing

Take all node embeddings in the neighbourhood and apply steps similar to the pooling function.
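
A minimal sketch of one message-passing step with sum aggregation over neighbours, assuming a toy adjacency matrix and a hypothetical update weight:

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])              # toy undirected adjacency matrix
H = rng.normal(size=(3, 8))            # current node embeddings

# For each node, sum the embeddings of its neighbours
messages = A @ H                       # (3, 8)

# Hypothetical update: combine each node's own embedding with its message
W = rng.normal(size=(16, 8))
H_next = np.maximum(np.concatenate([H, messages], axis=1) @ W, 0)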

Special Layers

Polynomial Filters

Each polynomial filter is node-order equivariant: relabelling the nodes relabels the output in the same way

Graph Laplacian

Let's fix an order over the nodes of the graph; with A the adjacency matrix, the degree matrix D is the diagonal matrix defined by:



D_{v,v} = \sum_{u} A_{v,u}

In other words, D_{v, v} is the number of nodes connected to v (its degree)

The graph Laplacian of the graph will be


L = D - A
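
A quick sketch computing D and L for a toy undirected graph:

import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])           # toy undirected adjacency matrix

D = np.diag(A.sum(axis=1))             # D_{v,v} = sum_u A_{v,u}
L = D - A                              # graph Laplacian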

Polynomials of the Laplacian

These polynomials, which have the same dimensions as L, can be thought of as filters, like in CNNs


p_{\vec{w}}(L) = w_{0}I_{n} + w_{1}L^{1} + \dots + w_{d}L^{d} = \sum_{i=0}^{d} w_{i}L^{i}

We can then get the filtered node features by simply multiplying the polynomial by the node feature vector


\begin{aligned}
    \vec{x}' = p_{\vec{w}}(L) \vec{x}
\end{aligned}
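
A sketch of applying such a filter, with the same toy graph as above and hypothetical weights w:

import numpy as np

def poly_filter(L, w, x):
    # p_w(L) x = sum_i w_i L^i x, with L^0 = I
    result = np.zeros_like(x, dtype=float)
    L_power = np.eye(L.shape[0])
    for w_i in w:
        result += w_i * (L_power @ x)
        L_power = L_power @ L
    return result

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])            # same toy graph as above
L = np.diag(A.sum(axis=1)) - A
w = [0.5, -1.0, 0.25]                   # hypothetical filter weights (degree 2)
x = np.array([1.0, 0.0, 2.0, -1.0])     # one scalar feature per node
x_filtered = poly_filter(L, w, x)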

Note

To see how new features are extracted for a single vertex v, suppose only w_{1} \neq 0 (with w_{1} = 1 for simplicity).

Observe that we are only taking row v of L:


\begin{aligned}
    \vec{x}'_{v} &= (L\vec{x})_{v}    \\
      &= \sum_{u \in G} L_{v,u} \vec{x}_{u}   \\
      &= \sum_{u \in G} (D_{v,u} - A_{v,u}) \vec{x}_{u}   \\
      &= \sum_{u \in G} \left( D_{v,u} \vec{x}_{u} - A_{v,u} \vec{x}_{u} \right) \\
      &= D_{v, v} \vec{x}_{v} - \sum_{u \in \mathcal{N}(v)} \vec{x}_{u}
\end{aligned}

Where the last step holds because D is a diagonal matrix, and in the summation we are only considering the neighbours of v

It can be demonstrated that in any graph


dist_{G}(v, u) > i \rightarrow L_{v, u}^{i} = 0

More generally, it holds that


\begin{aligned}
    \vec{x}'_{v} = (p_{\vec{w}}(L)\vec{x})_{v} &= (p_{\vec{w}}(L))_{v} \vec{x}   \\
      &= \sum_{i = 0}^{d} w_{i}L_{v}^{i} \vec{x}   \\
      &= \sum_{i = 0}^{d} w_{i} \sum_{u \in G} L_{v,u}^{i}\vec{x}_{u}   \\
      &= \sum_{i = 0}^{d} w_{i} \sum_{\substack{u \in G \\ dist_{G}(v, u) \leq i}} L_{v,u}^{i}\vec{x}_{u}   \\
\end{aligned}

This shows that the degree of the polynomial determines the maximum number of hops included in the filtering stage, much like a kernel size in a CNN
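
A quick numerical check of this locality property on a path graph (purely illustrative):

import numpy as np

# Path graph 0 - 1 - 2 - 3 - 4, so dist(0, 4) = 4
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
L = np.diag(A.sum(axis=1)) - A

# Entries of L^i between nodes farther than i hops apart are zero
print(np.linalg.matrix_power(L, 2)[0, 4])   # dist > 2  -> 0.0
print(np.linalg.matrix_power(L, 3)[0, 4])   # dist > 3  -> 0.0
print(np.linalg.matrix_power(L, 4)[0, 4])   # dist <= 4 -> generally nonzero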

ChebNet

The polynomial in ChebNet becomes:


\begin{aligned}
p_{\vec{w}}(L) &= \sum_{i = 1}^{d} w_{i} T_{i}(\tilde{L}) \\
T_{i} &= cos(i\theta) \\
\tilde{L} &= \frac{2L}{\lambda_{\max}(L)} - I_{n}
\end{aligned}
  • T_{i} is the Chebyshev polynomial of the first kind of degree i
  • \tilde{L} is a rescaled version of L: dividing by the largest eigenvalue and shifting by I_{n} maps its spectrum into the range [-1, 1]. This works because L has no negative eigenvalues, i.e. it is positive semi-definite

These polynomials are more stable as they do not explode with higher powers
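
A sketch of building p_{\vec{w}}(\tilde{L}) with the Chebyshev recurrence T_0 = I, T_1 = \tilde{L}, T_k = 2\tilde{L}T_{k-1} - T_{k-2}; the weights are hypothetical and, unlike the sum above, a w_0 term for T_0 is included for completeness:

import numpy as np

def chebnet_filter(L, w):
    # Rescale L so that its spectrum lies in [-1, 1]
    lam_max = np.linalg.eigvalsh(L).max()
    n = L.shape[0]
    L_tilde = 2.0 * L / lam_max - np.eye(n)

    # Chebyshev recurrence: T_0 = I, T_1 = L~, T_k = 2 L~ T_{k-1} - T_{k-2}
    # (assumes at least two weights)
    T_prev, T_curr = np.eye(n), L_tilde
    result = w[0] * T_prev + w[1] * T_curr
    for w_i in w[2:]:
        T_prev, T_curr = T_curr, 2.0 * L_tilde @ T_curr - T_prev
        result += w_i * T_curr
    return result

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])            # same toy graph as above
L = np.diag(A.sum(axis=1)) - A
w = [0.2, -0.5, 0.1]                    # hypothetical weights
p_L = chebnet_filter(L, w)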

Embedding Computation

Other methods

The following architectures compute the new embedding of node v from:

  • learnable parameters
  • the current embedding of node v
  • the embeddings of the neighbours of v

Graph Convolutional Networks


\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
    \underbrace{\textcolor{skyblue}{W^{(k)}} \cdot
    \frac{
        \sum_{u \in \mathcal{N}(v)} \textcolor{violet}{h_{u}^{(k-1)}}
    }{
        |\mathcal{N}(v)|
    }}_{\text{mean of previous neighbour embeddings}} + \underbrace{\textcolor{skyblue}{B^{(k)}} \cdot
\textcolor{orange}{h_{v}^{(k - 1)}}}_{\text{previous embeddings}}
\right) \forall v \in V
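
A hedged NumPy sketch of this update, with f^{(k)} taken to be a ReLU and toy shapes (it assumes no isolated nodes, so |\mathcal{N}(v)| > 0):

import numpy as np

def gcn_layer(A, H, W, B, f=lambda z: np.maximum(z, 0)):
    # Mean of the previous neighbour embeddings for every node v
    deg = A.sum(axis=1, keepdims=True)      # |N(v)| for each node
    neigh_mean = (A @ H) / deg              # (n, d_in)
    # h_v^{(k)} = f( W . mean_{u in N(v)} h_u^{(k-1)} + B . h_v^{(k-1)} )
    return f(neigh_mean @ W.T + H @ B.T)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])                   # toy adjacency matrix
H = rng.normal(size=(3, 8))                 # previous embeddings h^{(k-1)}
W = rng.normal(size=(4, 8))                 # learnable W^{(k)}
B = rng.normal(size=(4, 8))                 # learnable B^{(k)}
H_next = gcn_layer(A, H, W, B)              # new embeddings h^{(k)}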

Graph Attention Networks


\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
    \textcolor{skyblue}{W^{(k)}} \cdot \left[
    \underbrace{
        \sum_{u \in \mathcal{N}(v)} \alpha^{(k-1)}_{v,u}
        \textcolor{violet}{h_{u}^{(k-1)}}
     }_{\text{weighted mean of previous neighbour embeddings}}  +
        \underbrace{\alpha^{(k-1)}_{v,v}
        \textcolor{orange}{h_{v}^{(k-1)}}}_{\text{previous embeddings}}
\right] \right) \forall v \in V

where


\alpha^{(k)}_{v,u} = \frac{
    \textcolor{skyblue}{A^{(k)}}(
        \textcolor{orange}{h_{v}^{(k)}},
        \textcolor{violet}{h_{u}^{(k)}}
    )
}{
    \sum_{w \in \mathcal{N}(v)} \textcolor{skyblue}{A^{(k)}}(
        \textcolor{orange}{h_{v}^{(k)}},
        \textcolor{violet}{h_{w}^{(k)}}
    )
} \forall (v, u) \in E
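
A hedged sketch of this layer; the scoring function A^{(k)} here is just an exponentiated dot product, which is an assumption of this sketch (the real GAT scoring function is a small learned network):

import numpy as np

def gat_layer(A, H, W, f=lambda z: np.maximum(z, 0)):
    # Hypothetical attention score A^{(k)}(h_v, h_u): exp of a dot product
    scores = np.exp(H @ H.T)                 # scores[v, u]
    n = H.shape[0]
    H_next = np.empty((n, W.shape[0]))
    for v in range(n):
        neigh = np.flatnonzero(A[v])         # indices of N(v)
        denom = scores[v, neigh].sum()       # normalisation over N(v)
        alpha_vu = scores[v, neigh] / denom  # attention towards neighbours
        alpha_vv = scores[v, v] / denom      # attention towards v itself
        msg = alpha_vu @ H[neigh] + alpha_vv * H[v]
        H_next[v] = f(W @ msg)
    return H_next

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
H = rng.normal(size=(3, 8))
W = rng.normal(size=(4, 8))
H_next = gat_layer(A, H, W)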

Graph Sample and Aggregate (GraphSAGE)

Graph Isomorphism Network (GIN)


\textcolor{orange}{h_{v}^{(k)}} =
 \textcolor{skyblue}{f^{(k)}}
\left(
    \sum_{u \in \mathcal{N}(v)}
    \textcolor{violet}{h_{u}^{(k - 1)}} +
    (
        1 +
        \textcolor{skyblue}{\epsilon^{(k)}}
    ) \cdot \textcolor{orange}{h_{v}^{(k - 1)}}
\right)
\forall v \in V
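
A minimal sketch of this update, with f^{(k)} taken to be a small two-layer MLP and \epsilon fixed to an illustrative value:

import numpy as np

def gin_layer(A, H, eps, W1, W2):
    # sum_{u in N(v)} h_u^{(k-1)} + (1 + eps) * h_v^{(k-1)}
    aggregated = A @ H + (1.0 + eps) * H
    # f^{(k)} as a small two-layer MLP with ReLU
    return np.maximum(aggregated @ W1, 0) @ W2

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
H = rng.normal(size=(3, 8))
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 8))
H_next = gin_layer(A, H, eps=0.1, W1=W1, W2=W2)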