A graph is made of:

- **Nodes**: pieces of information
- **Edges**: relationships between nodes
  - **Mutual**
  - **One-Sided**
- **Directionality**
  - **Directed**: we care about the order of the connections
    - **Unidirectional**
    - **Bidirectional**
  - **Undirected**: we don't care about the order of the connections

Now, we can have attributes over:

- **nodes**
  - identity
  - number of neighbours
- **edges**
  - identity
  - weight
- **master nodes** (a collection of nodes and edges)
  - number of nodes
  - longest path

For example, an image may be represented as a graph where each non-border pixel is a vertex connected to its 8 surrounding pixels.
The information at each vertex is a 3- (or 4-) dimensional vector (think of RGB and RGBA).

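As a small sketch of this idea (assuming a `numpy` RGB image and 8-connectivity; the helper name `image_to_graph` is illustrative, not from the original notes):

```python
import numpy as np

def image_to_graph(img: np.ndarray):
    """img: HxWx3 RGB array -> (node features, edge list with 8-connectivity)."""
    h, w, _ = img.shape
    nodes = img.reshape(h * w, 3)  # one RGB vector per pixel
    edges = []
    for r in range(h):
        for c in range(w):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= nr < h and 0 <= nc < w:
                        edges.append((r * w + c, nr * w + nc))
    return nodes, edges
```
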
We want to predict relationships between nodes, such as whether they share an edge.
For this task we may start with a fully connected graph and then prune edges, as
predictions go on, to arrive at a sparse graph.

### Challenges of dealing with graphs

While graphs are very powerful at representing structures in a compact and
natural way, they come with several challenges.

**Graphs are not consistent in their structure**, and sometimes representing
something as a graph is difficult in the first place. **The number of nodes in a
graph may change wildly**, making it difficult to work with different graphs.

**Sometimes nodes have no meaningful order**, so we need to treat
different orderings in the same way (**node-order equivariance**).

**Graphs can be very large**, thus taking a lot of space. However, they are
usually sparse in nature, making it possible to find ways to compress their
representation.

## Representing Graphs

We store info about the nodes, the edges, the adjacency list and the graph (master node) itself:

```python
from typing import Any

nodes: list[Any] = [
    "fork", "spaghetti", "knife", "spoon", "soup"
]

edges: list[Any] = [
    "needed to eat", "cutlery", "food",
    "cutlery", "cutlery", "needed to eat"
]

adj_list: list[tuple[int, int]] = [
    (0, 3), (2, 3), (3, 4)
]

graph: Any = "dining table"
```

If we find that some parts of the graph are disconnected, we can simply avoid storing and computing those parts.

> [!CAUTION]
> Even though in this example we used single values, nodes, edges and graphs may
> be made of tensors or structured data

## Graph Neural Networks (GNNs)

In its simplest form, we take a **graph-in**, **graph-out** approach with separate `MLPs`
([Multi Layer Perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron),
aka neural network) for vertices, edges and master
nodes, which we apply **one at a time** over each element

$$
\begin{aligned}
\vec{v}' &= \text{MLP}_{V}(\vec{v}) \\
\vec{e}' &= \text{MLP}_{E}(\vec{e}) \\
\vec{g}' &= \text{MLP}_{G}(\vec{g})
\end{aligned}
$$

This also means that the output is a graph, and we need further refinement
to get other kinds of outputs. For example, we can apply a classifier to each node
embedding to get a class for that node.

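A minimal sketch of this graph-in, graph-out step in `torch` (the class name `PerElementGNNLayer` and the choice of `ReLU` are illustrative assumptions, not from the notes):

```python
import torch
import torch.nn as nn

class PerElementGNNLayer(nn.Module):
    """Applies a separate MLP to every node, edge and master-node embedding."""

    def __init__(self, node_dim: int, edge_dim: int, global_dim: int):
        super().__init__()
        self.node_mlp = nn.Sequential(nn.Linear(node_dim, node_dim), nn.ReLU())
        self.edge_mlp = nn.Sequential(nn.Linear(edge_dim, edge_dim), nn.ReLU())
        self.global_mlp = nn.Sequential(nn.Linear(global_dim, global_dim), nn.ReLU())

    def forward(self, nodes, edges, master):
        # Each MLP is applied element-wise; the output is again a graph
        return self.node_mlp(nodes), self.edge_mlp(edges), self.global_mlp(master)
```
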
### Pooling

> [!CAUTION]
> This step comes after the embedding phase described above

Pooling is a step that allows us to take info from a different type of graph element
than the one we are working on. For example, we may want info coming from
edges, but bring it over to the vertices.

With this approach we usually gather some info from the edges of a vertex,
concatenate it into a matrix and aggregate it by summing.

```python
import torch

# Pooling by summation: add up the embeddings of the given items
def pool(items: list[torch.Tensor]) -> torch.Tensor:
    # The pooled value is a 1xD tensor,
    # where D is the embedding dimension
    pooled_value = torch.zeros_like(items[0])

    for item in items:
        pooled_value += item

    return pooled_value
```

However, to make use of parallel computation, we can also do it like this:

```python
import torch

# Parallel pooling: stack the embeddings and reduce them with a single sum
def parallel_pool(items: list[torch.Tensor]) -> torch.Tensor:
    # NxD matrix: each row of this matrix is an embedding
    item_embedding_matrix = torch.cat(items, dim=0)

    # Sum over rows, yielding a single 1xD pooled embedding
    return item_embedding_matrix.sum(dim=0, keepdim=True)
```

This, at its core, is useful when we lack properties for a portion of the data,
for example the edges or the hypernode, and we use data coming from other parts
to enable the computation.

### Message Passing

We take all node embeddings in the neighbourhood and perform steps similar to the pooling function.
If we already have some info, we can use [`pooling`](#pooling) to augment
it, taking the connectivity of the graph into account by pooling **only
adjacent info of the same type**.

This means that at each step a node receives info from its neighbours in the
same fashion. After step $k$ our node will have received partial information
from nodes located up to $k$ steps away, as in the sketch below.

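A minimal sketch of one message-passing step over an adjacency list, assuming each node carries a row of a `torch` embedding matrix (the function name `message_passing_step` is illustrative, not from the notes):

```python
import torch

def message_passing_step(node_embeddings: torch.Tensor,
                         adj_list: list[tuple[int, int]]) -> torch.Tensor:
    # node_embeddings: NxD matrix, one row per node
    pooled = torch.zeros_like(node_embeddings)
    for u, v in adj_list:
        # Each edge passes the embedding of one endpoint to the other
        pooled[v] += node_embeddings[u]
        pooled[u] += node_embeddings[v]
    # Combine the pooled neighbour info with each node's own embedding
    return node_embeddings + pooled
```
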
### Weaving

If we combine the previous techniques, we can merge info coming from
many parts of the graph. However, if the embeddings do not share the same dimension,
we need a linear layer to make the dimensions match.

Since we are using a linear layer, this means that we are using a **learned**
representation, **which must be trained**.

> [!CAUTION]
> Because graphs may be very sparsely connected and the longest path may exceed
> our number of layers, to bypass this we weave info over the hypernode as well.
>
> The graph node will be used as if it were a node connected to all nodes, giving
> each node an overview of the information of all nodes

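As a sketch of the dimension-matching step (a `torch` projection of pooled edge info into the node dimension; the class name `EdgeToNodeProjection` and the residual-style combination are illustrative assumptions):

```python
import torch
import torch.nn as nn

class EdgeToNodeProjection(nn.Module):
    """Learned linear layer mapping pooled edge info into the node embedding space."""

    def __init__(self, edge_dim: int, node_dim: int):
        super().__init__()
        self.project = nn.Linear(edge_dim, node_dim)

    def forward(self, pooled_edges: torch.Tensor, nodes: torch.Tensor) -> torch.Tensor:
        # pooled_edges: NxE pooled edge info per node, nodes: NxD node embeddings
        return nodes + self.project(pooled_edges)
```
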
### Special Layers

Given the **adjacency matrix** $A$ of the graph, we can build the diagonal **degree matrix** $D$

$$
D_{v,v} = \sum_{u} A_{v,u}
$$

In other words, $D_{v, v}$ is the number of nodes connected to the node $v$. In
fact, notice that we are summing over rows of the **Adjacency Matrix** $A$.

The [**Graph Laplacian**](https://en.wikipedia.org/wiki/Laplacian_matrix)
$L$ of the graph will be

$$
L = D - A
$$

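As a quick sketch in `numpy` (the tiny adjacency matrix below is only an illustration, chosen to match the worked example further down):

```python
import numpy as np

# Adjacency matrix of a tiny graph: node 0 is connected to nodes 1 and 2
A = np.array([
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
])

# Degree matrix: D[v, v] is the number of neighbours of v (row sums of A)
D = np.diag(A.sum(axis=1))

# Graph Laplacian
L = D - A
print(L)
# [[ 2 -1 -1]
#  [-1  1  0]
#  [-1  0  1]]
```
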
As we can see, $L$ keeps all the elements of $D$ untouched, since no node has a
connection to itself in the adjacency matrix, and it contains all the elements of the
adjacency matrix with opposite sign (so they are all negative or zero).

### Convolution through Polynomials of the Laplacian

<!-- TODO: Study Laplacian Filters -->

Let's construct some polynomials using the Laplacian matrix. These polynomials,
which have the same dimensions as $L$, can be thought of as **filters**, like in
[CNNs](./../7-Convolutional-Networks/INDEX.md#convolutional-networks):

$$
p_{\vec{w}}(L) = w_{0}I_{n} + w_{1}L^{1} + \dots + w_{d}L^{d} = \sum_{i=0}^{d} w_{i}L^{i}
$$

Here each $w_i$ is a weight of the vector $\vec{w} = [w_0, \dots, w_d]$.
We can then define the convolution as $p_{\vec{w}}(L) \times \vec{x}$, where
**<ins>$\vec{x}$ is the matrix of all stacked vertex embeddings</ins>**

$$
\begin{aligned}
\vec{x} &\in \R^{l\times d}\\
\vec{x}' &= p_{\vec{w}}(L) \vec{x}
\end{aligned}
$$

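A small sketch (in `numpy`, reusing the Laplacian `L` from the earlier snippet; the helper name `laplacian_polynomial_filter` is illustrative) of applying such a polynomial filter to the stacked embeddings:

```python
import numpy as np

def laplacian_polynomial_filter(L: np.ndarray, w: np.ndarray,
                                x: np.ndarray) -> np.ndarray:
    # p_w(L) = sum_i w_i * L^i, applied to the stacked embeddings x (l x d)
    p = np.zeros_like(L, dtype=float)
    L_power = np.eye(L.shape[0])   # L^0 = I
    for w_i in w:
        p += w_i * L_power
        L_power = L_power @ L      # next power of L
    return p @ x

# Example: a degree-1 polynomial over 3 nodes with 2-dimensional embeddings
# x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# filtered = laplacian_polynomial_filter(L, np.array([0.5, 0.2]), x)
```
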
To explain the ***convolutional effect***, let's see what happens for a polynomial
with $\vec{w}_0 = [w_0, 0, \dots, 0]$:

$$
\vec{x}' = p_{\vec{w}_0}(L)\, \vec{x} = w_0I_{l}\, \vec{x}
$$

Here each item of $\vec{x}'$ is just a scaled version of the corresponding item of
$\vec{x}$. Now consider a polynomial with $\vec{w}_1 = [0, w_1, 0, \dots, 0]$:

$$
\vec{x}' = p_{\vec{w}_1}(L)\, \vec{x} = w_1L^{1} \vec{x}
$$

Here each vertex merges the info coming from its connected vertices plus a contribution
from itself (scaled by its number of connections).

> [!TIP]
> As this may not have been very clear from these formulas, let's first look at a
> single vertex and then make a practical example, where vertex 0 is connected to
> both vertices 1 and 2:
>
> $$
> \begin{aligned}
> \vec{x}'_{v} &= (L\vec{x})_{v} \\
> &= \sum_{u \in G} L_{v,u} \vec{x}_{u} \\
> &= \sum_{u \in G} (D_{v,u} - A_{v,u}) \vec{x}_{u} \\
> &= \sum_{u \in G} D_{v,u} \vec{x}_{u} - A_{v,u} \vec{x}_{u} \\
> &= D_{v, v} \vec{x}_{v} - \sum_{u \in \mathcal{N}(v)} \vec{x}_{u}
> \end{aligned}
> $$
>
> where the last step holds because $D$ is a diagonal matrix and, in the summation,
> only the neighbours of $v$ contribute.
>
> Concretely, for the graph where vertex 0 is connected to vertices 1 and 2:
>
> $$
> \begin{aligned}
> \vec{x} &= \begin{bmatrix}
> \vec{x}_0 \in \R^{1 \times d} \\
> \vec{x}_1 \in \R^{1 \times d} \\
> \vec{x}_2 \in \R^{1 \times d} \\
> \end{bmatrix}
> \\
> L &= \begin{bmatrix}
> 2 & -1 & -1 \\
> -1 & 1 & 0 \\
> -1 & 0 & 1
> \end{bmatrix}
> \\
> \vec{x}' &= w_1 L \vec{x} = \begin{bmatrix}
> 2w_1 & -w_1 & -w_1 \\
> -w_1 & w_1 & 0 \\
> -w_1 & 0 & w_1
> \end{bmatrix}
> \times
> \begin{bmatrix}
> \vec{x}_0 \\
> \vec{x}_1 \\
> \vec{x}_2
> \end{bmatrix}
> \\
> &= \begin{bmatrix}
> 2w_1\vec{x}_0 - w_1\vec{x}_1 - w_1 \vec{x}_2 \in \R^{1 \times d} \\
> -w_1\vec{x}_0 + w_1\vec{x}_1 + 0\vec{x}_2 \in \R^{1 \times d} \\
> -w_1\vec{x}_0 + 0\vec{x}_1 + w_1\vec{x}_2 \in \R^{1 \times d} \\
> \end{bmatrix}
> \end{aligned}
> $$
>
> For simplicity we wrote vertices as vectors rather than decomposing them into their
> elements. As we can notice here, each vertex combined the adjacent vertices.

Now let's see what happens to the Laplacian matrix for the power of 2:

$$
\begin{aligned}
L &= \begin{bmatrix}
l_{0, 0} & l_{0, 1} & l_{0, 2} \\
l_{1, 0} & l_{1, 1} & l_{1, 2} \\
l_{2, 0} & l_{2, 1} & l_{2, 2} \\
\end{bmatrix}
\\
L \times L &=
\begin{bmatrix}
l_{0, 0} & l_{0, 1} & l_{0, 2} \\
l_{1, 0} & l_{1, 1} & l_{1, 2} \\
l_{2, 0} & l_{2, 1} & l_{2, 2} \\
\end{bmatrix}
\times
\begin{bmatrix}
l_{0, 0} & l_{0, 1} & l_{0, 2} \\
l_{1, 0} & l_{1, 1} & l_{1, 2} \\
l_{2, 0} & l_{2, 1} & l_{2, 2} \\
\end{bmatrix} \\
&= \begin{bmatrix}
l_{0, 0}^2 + l_{0, 1}l_{1, 0} + l_{0, 2}l_{2, 0} &
l_{0, 0}l_{0, 1} + l_{0, 1}l_{1, 1} + l_{0, 2}l_{2, 1} &
l_{0, 0}l_{0, 2} + l_{0, 1}l_{1, 2} + l_{0, 2}l_{2, 2} \\
l_{1, 0}l_{0, 0} + l_{1, 1}l_{1, 0} + l_{1, 2} l_{2, 0} &
l_{1, 0}l_{0, 1} + l_{1, 1}^2 + l_{1, 2}l_{2, 1} &
l_{1, 0}l_{0, 2} + l_{1, 1}l_{1, 2} + l_{1, 2}l_{2, 2} \\
l_{2, 0}l_{0, 0} + l_{2, 1}l_{1, 0}+ l_{2, 2} l_{2, 0} &
l_{2, 0}l_{0, 1} + l_{2, 1}l_{1, 1} + l_{2, 2}l_{2, 1} &
l_{2, 0}l_{0, 2} + l_{2, 1}l_{1, 2} + l_{2, 2}^2
\end{bmatrix}
\end{aligned}
$$

As we can see, information from neighbouring connections leaks into the powers of the
Laplacian. In the extreme case, it's possible to show that if the distance
between two nodes is greater than the power of the Laplacian matrix, they still
appear as detached in that power (the corresponding entry is zero).

This means that by manipulating the degree of the polynomial, we can choose the
diffusion distance of the information. For example, if we want info from vertices
at most 3 steps away, we will use a polynomial of 3rd degree.

Another interesting property of these polynomials is that they do not depend on
the order of the connections, but only on their existence, so they are order
equivariant.

> [!TIP]
> Go [here](./python-experiments/laplacian_graph.ipynb) to see some experiments.
>
> It can be demonstrated that in any graph
>
> $$
> dist_{G}(v, u) > i \rightarrow L_{v, u}^{i} = 0
> $$
>
> More in general it holds that
>
> $$
> \begin{aligned}
> \vec{x}'_{v} = (p_{\vec{w}}(L)\vec{x})_{v} &= (p_{\vec{w}}(L))_{v} \vec{x} \\
> &= \sum_{i = 0}^{d} w_{i}L_{v}^{i} \vec{x} \\
> &= \sum_{i = 0}^{d} w_{i} \sum_{u \in G} L_{v,u}^{i}\vec{x}_{u} \\
> &= \sum_{i = 0}^{d} w_{i} \sum_{\substack{u \in G \\ dist_{G}(v, u) \leq i}} L_{v,u}^{i}\vec{x}_{u} \\
> \end{aligned}
> $$
>
> This shows that the degree of the polynomial decides the max number of hops
> to be included during the filtering stage, as if it were defining a [kernel](./../7-Convolutional-Networks/INDEX.md#filters)

### ChebNet

ChebNet swaps the plain powers $L^{i}$ for Chebyshev polynomials of a rescaled
Laplacian $\tilde{L}$, with learnable weights $\theta$:

$$
\begin{aligned}
p_{\theta}(\tilde{L}) &= \sum_{i=0}^{d} \theta_{i} T_{i}(\tilde{L}) \\
T_{i} &= \cos(i\theta) \\
\end{aligned}
$$

- $T_{i}$ is the Chebyshev polynomial of the first kind
- $\theta$ are the weights to be learnt[^chebnet]
- $\tilde{L}$ is a reduced version of $L$, obtained by dividing by its max eigenvalue,
  which keeps it in the range $[-1, 1]$. Moreover, $L$ has no negative eigenvalues, so it's
  positive semi-definite

These polynomials are more stable, as they do not explode with higher powers.

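A sketch (in `numpy`) of building the matrices $T_{i}(\tilde{L})$ via the standard Chebyshev recurrence $T_0 = I$, $T_1 = \tilde{L}$, $T_i = 2\tilde{L}T_{i-1} - T_{i-2}$; the recurrence itself is not spelled out in the original notes:

```python
import numpy as np

def rescale_laplacian(L: np.ndarray) -> np.ndarray:
    # Divide by the largest eigenvalue; the spectrum of L / lambda_max lies in [0, 1]
    lam_max = np.linalg.eigvalsh(L).max()
    return L / lam_max

def chebyshev_basis(L_tilde: np.ndarray, degree: int) -> list[np.ndarray]:
    # T_0 = I, T_1 = L_tilde, T_i = 2 * L_tilde @ T_{i-1} - T_{i-2}
    n = L_tilde.shape[0]
    T = [np.eye(n), L_tilde.copy()]
    for _ in range(2, degree + 1):
        T.append(2 * L_tilde @ T[-1] - T[-2])
    return T[:degree + 1]
```
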
> [!NOTE]
>
> Even though we said that we can control the radius from which neighbours leak info,
> technically speaking, if we stack many GCN layers together, we get info from nodes at
> $N \cdot hops$, assuming each layer takes from at most $hops$.
>
> So, let's say we have 2 layers taking from distance 3: after the computation a
> node gets info from nodes at 6 hops of distance.

### Embedding Computation

<!-- TODO: Read PDF 14 Anelli from 81 to 83 -->

Whenever we compute over nodes, we first need to compute their embeddings.
From there on, all embeddings will be treated as inputs for our networks.

## Other methods

A non-linearity can also be applied on top of the polynomial filter:

$$
\begin{aligned}
g(x) &\coloneqq \text{Any non linear function} \\
\vec{x}' &= g\left(p_{\vec{w}}(L) \times \vec{x}\right)
\end{aligned}
$$

- <span style="color:skyblue">Learnable parameters</span>: for each layer the same
  weights are applied, like in
  [CNNs](./../7-Convolutional-Networks/INDEX.md#convolutional-layer)

## Real Graph Networks Configurations

To sum up all we have seen until now: to make computations over graphs we
usually take info from the neighbours (Node Aggregation), transform the old info
of our node, and combine the two (Node Update).

In the formulas below:

- <span style="color:skyblue">(Potentially) Learnable parameters</span>
- <span style="color:orange">Embeddings of node v</span>
- <span style="color:violet">Embeddings of neighbours of v</span>

### Graph Neural Network

$$
\textcolor{orange}{h_v^{(k)}} =
\textcolor{skyblue}{f_2^{(k)}} \left(
\underbrace{
\sum_{n\in\mathcal{N}(v)} \textcolor{skyblue}{f_1^{(k)}}(
\textcolor{violet}{h_n^{(k-1)}},
\textcolor{orange}{h_v^{(k-1)}},
\textcolor{orange}{e_{v, n}}
),
}_{\text{message passing of edges to nodes}}\,\,
\textcolor{orange}{h_v^{(k -1)}}
\right)
$$

### Graph Convolutional Network

$$
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
\textcolor{skyblue}{W^{(k)}}
\cdot
\frac{
\textcolor{orange}{h_{v}^{(k-1)}} +
\sum_{u\in \mathcal{N}(v)}\textcolor{violet}{h_{u}^{(k-1)}}
}{
1 + |\mathcal{N}(v)|
}
\right) \forall v \in V
$$

> [!TIP]
>
> $\textcolor{skyblue}{f^{(k)}}$ here is a
> **<ins>Non-Linear Activation function</ins>**

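A minimal sketch of this GCN update over an adjacency matrix, in `torch` (the class name `SimpleGCNLayer` and the choice of `relu` as the activation are illustrative assumptions, not from the notes):

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # learnable W^(k)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: N x in_dim node embeddings, adj: N x N adjacency matrix (0/1)
        deg = adj.sum(dim=1, keepdim=True)    # |N(v)| for each node
        mean = (h + adj @ h) / (1.0 + deg)    # (h_v + sum_u h_u) / (1 + |N(v)|)
        return torch.relu(self.W(mean))       # f^(k)(W^(k) . mean)
```
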
### Graph Attention Networks

$$
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
\textcolor{skyblue}{W^{(k)}}
\cdot
\left[
\sum_{u\in \mathcal{N}(v)} \alpha^{(k)}_{v,u}\, \textcolor{violet}{h_{u}^{(k-1)}}
,\,
\textcolor{orange}{h_{v}^{(k-1)}}
\right] \right) \forall v \in V
$$

where $\alpha^{(k)}_{v,u}$ are weights generated by an attention mechanism
$A^{(k)}$, normalized so that the $\alpha^{(k)}_{v,u}$ of each node sum to 1.

$$
\alpha^{(k)}_{v,u} = \frac{
\textcolor{skyblue}{A^{(k)}}(
\textcolor{orange}{h_{v}^{(k)}},
\textcolor{violet}{h_{u}^{(k)}}
)
}{
\sum_{w \in \mathcal{N}(v)} \textcolor{skyblue}{A^{(k)}}(
\textcolor{orange}{h_{v}^{(k)}},
\textcolor{violet}{h_{w}^{(k)}}
)
} \forall (v, u) \in E
$$

> [!TIP]
> To make computations faster, we can just compute all the
> $\textcolor{skyblue}{A^{(k)}}(
> \textcolor{orange}{h_{v}^{(k)}},
> \textcolor{violet}{h_{u}^{(k)}}
> )$ values and then sum them up at a later stage to compute $\alpha^{(k)}_{v,u}$
>
> Also, this system (GAT) is flexible enough to let us change the attention
> mechanism and include more heads, as the one above is done with just one
> attention head

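A sketch of the normalization described in the tip, assuming the raw attention scores $A^{(k)}(h_v, h_u)$ have already been computed for every edge (the dictionary-based layout and the function name `normalize_attention` are illustrative assumptions):

```python
def normalize_attention(raw_scores: dict[tuple[int, int], float]
                        ) -> dict[tuple[int, int], float]:
    # raw_scores[(v, u)] holds the unnormalized attention A(h_v, h_u) for edge (v, u)
    totals: dict[int, float] = {}
    for (v, _u), score in raw_scores.items():
        totals[v] = totals.get(v, 0.0) + score
    # alpha_{v,u} = A(h_v, h_u) / sum_w A(h_v, h_w), so each node's alphas sum to 1
    return {(v, u): score / totals[v] for (v, u), score in raw_scores.items()}
```
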
### Graph Sample and Aggregate (GraphSAGE)

<!-- TODO: See PDF 14 Anelli from 98 to 102 -->

Here, instead of taking all the neighbours, we take a fixed-size set of neighbours,
called `pool`, chosen at random. This increases variance but allows the method to be
applied to large graphs.

$$
\textcolor{orange}{h_v^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
\textcolor{skyblue}{W^{(k)}}
\cdot
\underbrace{
\left[
\underbrace{
\textcolor{skyblue}{AGGR}_{u\in \mathcal{N}(v)}(
\textcolor{violet}{h_u^{(k-1)}}
)
}_{\text{Aggregation of v's neighbours}}
,
\textcolor{orange}{h_v^{(k-1)}}
\right]
}_{\text{Concatenation of aggr. and prev. embedding}}
\right) \forall v \in V
$$

where
$\textcolor{skyblue}{AGGR}_{u\in \mathcal{N}(v)}(
\textcolor{violet}{h_u^{(k-1)}}
)$ may be one of the following:

#### Mean

This is similar to what we had in the GCN above

$$
\textcolor{skyblue}{AGGR}_{u\in \mathcal{N}(v)}(
\textcolor{violet}{h_u^{(k-1)}}
) = \textcolor{skyblue}{W_{pool}^{(k)}} \cdot
\frac{
\textcolor{orange}{h_v^{(k-1)}}+
\sum_{u\in \mathcal{N}(v)}\textcolor{violet}{h_u^{(k-1)}}
}{
1 + |\mathcal{N}(v)|
}
$$

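A sketch of this mean aggregator with a random fixed-size neighbour sample, in `torch` (the function name `sample_and_mean_aggregate` and the sample size are illustrative assumptions):

```python
import random
import torch

def sample_and_mean_aggregate(h: torch.Tensor, v: int,
                              neighbours: list[int],
                              w_pool: torch.nn.Linear,
                              sample_size: int = 5) -> torch.Tensor:
    # h: N x D node embeddings from the previous layer
    pool = random.sample(neighbours, min(sample_size, len(neighbours)))
    # Mean of the node itself and its sampled neighbours, then the learned projection
    mean = (h[v] + h[pool].sum(dim=0)) / (1 + len(pool))
    return w_pool(mean)
```
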
#### Dimension Wise Maximum

$$
\textcolor{skyblue}{AGGR}_{u\in \mathcal{N}(v)}(
\textcolor{violet}{h_u^{(k-1)}}
) = \max_{u \in \mathcal{N}(v)} \left\{
\sigma(
\textcolor{skyblue}{W_{pool}^{(k)}} \cdot
\textcolor{violet}{h_u^{(k-1)}} +
\textcolor{skyblue}{b}
)
\right\}
$$

#### LSTM Aggregator

This is another network, based on an
[LSTM](./../8-Recurrent-Networks/INDEX.md#long-short-term-memory--lstm). This
method **requires ordering the sequence of neighbours**.

### Graph Isomorphism Network (GIN)

$$
\textcolor{orange}{h_{v}^{(k)}} =
\textcolor{skyblue}{f^{(k)}} \left(
\sum_{u\in \mathcal{N}(v)} \textcolor{violet}{h_{u}^{(k - 1)}} +
(1 + \textcolor{skyblue}{\epsilon^{(k)}}
) \cdot \textcolor{orange}{h_{v}^{(k - 1)}}
\right)
\forall v \in V
$$

[^chebnet]: [Data Warrior | Graph Convolutional Neural Network (Part II) | 9th November 2025](https://datawarrior.wordpress.com/2018/08/12/graph-convolutional-neural-network-part-ii/)