Graph ML
Graph Introduction
- Nodes: pieces of information
- Edges: relationships between nodes
  - Mutual
  - One-sided
- Directionality
  - Directed: we care about the order of connections
    - Unidirectional
    - Bidirectional
  - Undirected: we don't care about the order of connections
Now, we can have attributes over
- nodes
- edges
- master nodes (a collection of nodes and edges)
For example, an image may be represented as a graph where each non-edge pixel is a vertex connected to its 8 neighbouring pixels. The information at each vertex is a 3- (or 4-) dimensional vector (think of RGB and RGBA).
Adjacency Matrix
Take a picture and build a matrix in \{0, 1\}^{(h \cdot w) \times (h \cdot w)}: we put a 1 if two nodes are connected (share an edge), and a 0 if they do not.
Note
For a 300 \times 250 image our matrix would be in \{0, 1\}^{(250 \cdot 300) \times (250 \cdot 300)}
The way we put a 1 or a 0 follows these rules:
- The row element has a connection towards the column element
- The column element has a connection coming from the row element
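A minimal NumPy sketch of this construction (the edge list and node count below are made up purely for illustration), filling the matrix row → column:

```python
import numpy as np

# Hypothetical directed edge list over 4 nodes, only for illustration.
edges = [(0, 1), (0, 2), (2, 3)]
n_nodes = 4

# A[row, col] = 1 means: the row node has a connection towards the column node.
A = np.zeros((n_nodes, n_nodes), dtype=np.uint8)
for src, dst in edges:
    A[src, dst] = 1

print(A)
```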
Tasks
Graph-Level
We want to predict a graph property
Node-Level
We want to predict a node property, such as classification
Edge-Level
We want to predict relationships between nodes, such as whether they share an edge or the value of the edge they share.
For this task we may start from a fully connected graph and prune edges as predictions proceed, arriving at a sparse graph
Downsides of Graphs
- They are not consistent in their structure and sometimes representing something as a graph is difficult
- If we don't care about order of nodes, we need to find a way to represent this node-order equivariance
- Graphs may be too large
Representing Graphs
Adjacency List
We store info about:
- Nodes: a list of values; the entry Node_k is the value of the k-th node
- Edges: a list of values; the entry Edge_k is the value of the k-th edge
- Adjacency list: a list of tuples of node indices; Tuple_k contains the nodes involved in the k-th edge
- Graph: the value of the graph
nodes: list[str] = [          # value of each node
    "fork", "spaghetti", "knife", "spoon", "broth"
]
edges: list[str] = [          # value of each edge
    "used for eating", "utensil", "food",
    "utensil", "utensil", "used for eating"
]
adj_list: list[tuple[int, int]] = [   # k-th tuple: nodes joined by the k-th edge
    (0, 1), (0, 2), (1, 4),
    (0, 3), (2, 3), (3, 4)
]
graph: str = "dinner table"   # value of the whole graph
If some parts of the graph are disconnected, we can simply avoid storing and computing over those parts
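A minimal sketch (reusing the adj_list above and treating its edges as undirected) that splits the node set into connected components with a BFS, so each part can be handled on its own:

```python
from collections import deque

def connected_components(n_nodes, adj_list):
    # Build neighbour sets from the edge tuples, treating edges as undirected.
    neighbours = [set() for _ in range(n_nodes)]
    for u, v in adj_list:
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen, components = set(), []
    for start in range(n_nodes):
        if start in seen:
            continue
        # A BFS from an unvisited node collects exactly one component.
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            queue.extend(neighbours[node] - seen)
        components.append(component)
    return components

adj_list = [(0, 1), (0, 2), (1, 4), (0, 3), (2, 3), (3, 4)]  # from the example above
print(connected_components(5, adj_list))  # [{0, 1, 2, 3, 4}] -> a single component
```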
Graph Neural Networks (GNNs)
In its simplest form we take a graph-in, graph-out approach with separate MLPs for vertices, edges and the master node, applied one at a time to each element:
\begin{aligned}
V_{i + 1} &= MLP_{V_{i}}(V_{i}) \\
E_{i + 1} &= MLP_{E_{i}}(E_{i}) \\
U_{i + 1} &= MLP_{U_{i}}(U_{i}) \\
\end{aligned}
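A rough NumPy sketch of this layer (all sizes, the random weights and the single hidden layer are illustrative assumptions); each element type is updated by its own MLP and no information is exchanged between them:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, d_in, d_out):
    # One hidden ReLU layer applied row-wise; random weights, just for the sketch.
    W1, W2 = rng.normal(size=(d_in, 16)), rng.normal(size=(16, d_out))
    return np.maximum(x @ W1, 0) @ W2

V = rng.normal(size=(5, 8))   # 5 vertex embeddings of size 8
E = rng.normal(size=(6, 4))   # 6 edge embeddings of size 4
U = rng.normal(size=(1, 3))   # 1 master-node embedding of size 3

# Each element type is updated by its own MLP, independently of the others.
V_next, E_next, U_next = mlp(V, 8, 8), mlp(E, 4, 4), mlp(U, 3, 3)
```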
Pooling
Caution
This step comes after the embedding phase described above
This step can be used to gather information from elements different from the ones we are currently updating (for example, taking information from the edges while computing over the vertices).
With this approach we usually gather the embeddings of the edges incident to a vertex, concatenate them into a matrix, and aggregate them by summing.
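A minimal NumPy sketch of this edge-to-vertex pooling, assuming one embedding per edge and the adj_list pairing from the example above:

```python
import numpy as np

adj_list = [(0, 1), (0, 2), (1, 4), (0, 3), (2, 3), (3, 4)]   # k-th edge -> (u, v)
E = np.random.default_rng(1).normal(size=(len(adj_list), 4))  # one embedding per edge
n_nodes = 5

pooled = np.zeros((n_nodes, E.shape[1]))
for k, (u, v) in enumerate(adj_list):
    # Each edge contributes its embedding to both of its endpoints.
    pooled[u] += E[k]
    pooled[v] += E[k]

# pooled[v] can now be combined with the embedding of vertex v.
```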
Message Passing
Take all node embeddings in the neighbourhood and perform steps similar to the pooling function.
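A rough sketch of one message-passing step, assuming a binary adjacency matrix A: summing the neighbours' embeddings is just a matrix product, after which the node's own embedding is combined in:

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])             # small undirected toy graph
H = rng.normal(size=(3, 4))           # current node embeddings, one row per node

messages = A @ H                      # row v = sum of the neighbours' embeddings
H_next = np.maximum(H + messages, 0)  # combine with the node's own embedding, then ReLU
```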
Special Layers
Polynomial Filters
Each polynomial filter is invariant to the ordering of the nodes
Graph Laplacian
Let's fix an order over the nodes of the graph. With A the adjacency matrix, the degree matrix D is defined by
D_{v,v} = \sum_{u} A_{v,u}
In other words, D_{v, v} is the number of nodes connected to v
The graph Laplacian of the graph will be
L = D - A
Polynomials of Laplacian
These polynomials, which have the same dimensions as L, can be thought of as the equivalent of filters in CNNs
p_{\vec{w}}(L) = w_{0}I_{n} + w_{1}L^{1} + \dots + w_{d}L^{d} = \sum_{i=0}^{d} w_{i}L^{i}
We can then get filtered node features by simply multiplying the polynomial by the node features
\begin{aligned}
\vec{x}' = p_{\vec{w}}(L) \vec{x}
\end{aligned}
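A minimal NumPy sketch on a toy graph, putting the pieces together: build D from A, form L = D - A, evaluate p_{\vec{w}}(L) for arbitrary weights, and filter the node features:

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)    # undirected toy graph
D = np.diag(A.sum(axis=1))                   # D[v, v] = degree of node v
L = D - A                                    # graph Laplacian

w = [0.5, -1.0, 0.25]                        # arbitrary w_0, w_1, w_2
p_L = sum(w_i * np.linalg.matrix_power(L, i) for i, w_i in enumerate(w))

x = np.array([1.0, 0.0, 2.0, -1.0])          # one scalar feature per node
x_filtered = p_L @ x                         # x' = p_w(L) x
```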
Note
To extract new features for a single vertex, suppose only w_1 \neq 0. Observe that we only use the row L_{v}:
\begin{aligned}
\vec{x}'_{v} &= (L\vec{x})_{v} \\
&= \sum_{u \in G} L_{v,u} \vec{x}_{u} \\
&= \sum_{u \in G} (D_{v,u} - A_{v,u}) \vec{x}_{u} \\
&= \sum_{u \in G} D_{v,u} \vec{x}_{u} - A_{v,u} \vec{x}_{u} \\
&= D_{v, v} \vec{x}_{v} - \sum_{u \in \mathcal{N}(v)} \vec{x}_{u}
\end{aligned}
The last step holds because D is a diagonal matrix, and in the summation we only consider the neighbours of v.
It can be shown that in any graph
dist_{G}(v, u) > i \rightarrow L_{v, u}^{i} = 0
More generally it holds that
\begin{aligned}
\vec{x}'_{v} = (p_{\vec{w}}(L)\vec{x})_{v} &= (p_{\vec{w}}(L))_{v} \vec{x} \\
&= \sum_{i = 0}^{d} w_{i}L_{v}^{i} \vec{x} \\
&= \sum_{i = 0}^{d} w_{i} \sum_{u \in G} L_{v,u}^{i}\vec{x}_{u} \\
&= \sum_{i = 0}^{d} w_{i} \sum_{\substack{u \in G \\ dist_{G}(v, u) \leq i}} L_{v,u}^{i}\vec{x}_{u}
\end{aligned}
So the degree of the polynomial decides the maximum number of hops included during the filtering stage, as if it were defining a kernel size.
ChebNet
The polynomial in ChebNet becomes:
\begin{aligned}
p_{\vec{w}}(L) &= \sum_{i = 1}^{d} w_{i} T_{i}(\tilde{L}) \\
T_{i}(\cos \theta) &= \cos(i\theta) \\
\tilde{L} &= \frac{2L}{\lambda_{\max}(L)} - I_{n}
\end{aligned}
T_{i} is the Chebyshev polynomial of the first kind. \tilde{L} is a rescaled version of L: we divide by its largest eigenvalue, which keeps its eigenvalues in the range [-1, 1]. Moreover L has no negative eigenvalues, so it is positive semi-definite.
These polynomials are more stable as they do not explode with higher powers
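A minimal sketch of how the Chebyshev terms can be evaluated with the standard recurrence T_0(x) = 1, T_1(x) = x, T_i(x) = 2x T_{i-1}(x) - T_{i-2}(x), applied to the rescaled Laplacian (this sketch also includes a w_0 term, and the weights are arbitrary):

```python
import numpy as np

def chebyshev_filter(L, x, w):
    n = L.shape[0]
    lam_max = np.linalg.eigvalsh(L).max()     # L is symmetric and positive semi-definite
    L_tilde = 2.0 * L / lam_max - np.eye(n)   # eigenvalues rescaled into [-1, 1]

    # Recurrence: T_0 = I, T_1 = L~, T_i = 2 L~ T_{i-1} - T_{i-2}
    T_prev, T_curr = np.eye(n), L_tilde
    out = w[0] * (T_prev @ x) + w[1] * (T_curr @ x)
    for w_i in w[2:]:
        T_prev, T_curr = T_curr, 2.0 * L_tilde @ T_curr - T_prev
        out = out + w_i * (T_curr @ x)
    return out

# e.g. reusing L and x from the previous sketch:
# x_cheb = chebyshev_filter(L, x, [0.3, 0.5, 0.2])
```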
Embedding Computation
Other methods compute the embeddings by combining:
- learnable parameters
- the embeddings of node v
- the embeddings of the neighbours of v
Graph Convolutional Networks
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
\underbrace{\textcolor{skyblue}{W^{(k)}} \cdot
\frac{
\sum_{u \in \mathcal{N}(v)} \textcolor{violet}{h_{u}^{(k-1)}}
}{
|\mathcal{N}(v)|
}}_{\text{mean of previous neighbour embeddings}} + \underbrace{\textcolor{skyblue}{B^{(k)}} \cdot
\textcolor{orange}{h_{v}^{(k - 1)}}}_{\text{previous embeddings}}
\right) \forall v \in V
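A minimal NumPy sketch of one such layer (row-vector convention; the sizes, random weights and the choice f = ReLU are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)    # undirected toy graph
H = rng.normal(size=(3, 4))               # h^(k-1), one row per node
W = rng.normal(size=(4, 4))               # W^(k)
B = rng.normal(size=(4, 4))               # B^(k)

deg = A.sum(axis=1, keepdims=True)        # |N(v)| for each node
neigh_mean = (A @ H) / deg                # mean of the previous neighbour embeddings
H_next = np.maximum(neigh_mean @ W + H @ B, 0)   # f^(k) = ReLU here
```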
Graph Attention Networks
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
\textcolor{skyblue}{W^{(k)}} \cdot \left[
\underbrace{
\sum_{u \in \mathcal{N}(v)} \alpha^{(k-1)}_{v,u}
\textcolor{violet}{h_{u}^{(k-1)}}
}_{\text{weighted mean of previous neighbour embeddings}} +
\underbrace{\alpha^{(k-1)}_{v,v}
\textcolor{orange}{h_{v}^{(k-1)}}}_{\text{previous embeddings}}
\right] \right) \forall v \in V
where
\alpha^{(k)}_{v,u} = \frac{
\textcolor{skyblue}{A^{(k)}}(
\textcolor{orange}{h_{v}^{(k)}},
\textcolor{violet}{h_{u}^{(k)}}
)
}{
\sum_{w \in \mathcal{N}(v)} \textcolor{skyblue}{A^{(k)}}(
\textcolor{orange}{h_{v}^{(k)}},
\textcolor{violet}{h_{w}^{(k)}}
)
} \forall (v, u) \in E
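A minimal sketch of the attention weights \alpha on a toy graph. The scoring function A^{(k)} is replaced here by a plain exponentiated dot product purely for illustration; in GAT it is a learned function:

```python
import numpy as np

rng = np.random.default_rng(4)
H = rng.normal(size=(3, 4))                      # node embeddings h^(k)
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # N(v) for a toy triangle graph

def score(h_v, h_u):
    # Illustrative stand-in for A^(k); GAT learns this function instead.
    return float(np.exp(h_v @ h_u))

alpha = {}
for v, neigh in neighbours.items():
    denom = sum(score(H[v], H[w]) for w in neigh)
    for u in neigh:
        alpha[(v, u)] = score(H[v], H[u]) / denom   # weights over N(v) sum to 1
```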
Graph Sample and Aggregate (GraphSAGE)
Graph Isomorphism Network (GIN)
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}}
\left(
\sum_{u \in \mathcal{N}(v)}
\textcolor{violet}{h_{u}^{(k - 1)}} +
(
1 +
\textcolor{skyblue}{\epsilon^{(k)}}
) \cdot \textcolor{orange}{h_{v}^{(k - 1)}}
\right)
\forall v \in V
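A minimal NumPy sketch of the GIN update: the sum over the neighbours comes from a product with the adjacency matrix (the sizes, random weights and the choice of f^(k) are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)    # undirected path graph on 3 nodes
H = rng.normal(size=(3, 4))               # h^(k-1), one row per node
W = rng.normal(size=(4, 4))               # weights of f^(k)
eps = 0.1                                 # epsilon^(k), a learnable scalar in GIN

aggregated = A @ H + (1.0 + eps) * H      # sum over N(v) plus (1 + eps) * h_v
H_next = np.maximum(aggregated @ W, 0)    # f^(k): linear map + ReLU, applied row-wise
```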