18. Limitations of GNN ============================ Recap `GNN `_ : **Aggregate Neighbors** **Computation Graph** **Graph Pooling**. Many model variants have been proposed with different choice of neural networks. * GCN [Kipf & Welling ICLR'2017] * GraphSAGE [Hamilton+ NeurIPS’2017] Applications: * Node classification [Kipf+ ICLR’2017] * Graph Classification [Ying+ NeurIPS’2018] * Link Prediction [Zhang+ NeurIPS’2018] 18.1 Limitation : Graph Structure ---------------------------- **Graph Structure** Some simple graph structures cannot be distinguished by conventional GNNs. -> **graph isomorphism test problem** . .. image:: images/limit_1.PNG :align: center :width: 80% Node representation captures rooted subtree structure : Most discriminative GNNs map different subtrees into different node representations. -> **Injectivity** [1]_ of the funtion to map to different graph structure. -> **Injective Neighbor Aggregation** . Entire neighbor aggregation is injective if every step of neighbor aggregation is injective. **Multi-set Functions** : Neighbor aggregation is essentially a function over multi-set (set with repeating elements). Discriminative Power of GNNs can be characterized by that of **multi-set functions** . * GCN (mean pooling) will fail to distinguish proportionally equivalent multi-sets. * GraphSAGE (max pooling) will even fail to distinguish multi-set with the same distinct elements. **Injective Multi-set Functions** Theorem: Any injective multi-set function can be expressed by, where 𝜙 and 𝑓 are non-linear functions : .. math:: \phi(\sum_{x\in S}f(x)) We can model 𝜙 and 𝑓 using Multi-Layer-Perceptron (MLP) (Note: MLP is a universal approximator). .. image:: images/mlp.PNG :align: center :width: 60% Graph Isomorphism Network (GIN) [Xu+ ICLR’2019], using MLP + sum pooling. So far: GIN achieves maximal discriminative power by using injective neighbor aggregation. GIN fits training data much better than GCN, GraphSAGE. GIN is closely related to Weisfeiler-Lehman (WL) [2]_ Graph Isomorphism Test (1968). -> GIN is as discriminative as the WL test. .. [1] Function is injective if it maps different elements into different outputs. .. [2] WL test has been known to distinguish most of the graphs, except for some corner cases. 18.2 Limitation : Graph Noise -------------------- **Graph Noise** GNNs are not robust to noise in graph data. .. image:: images/limit_2.PNG :align: center :width: 100% Adversaries are very common in applications of graph neural networks, e.g., search engines, recommender systems, social networks, etc. **Semi-supervised node classification** using Graph Convolutional Neural Networks (GCN) [Kipf+ ICLR’2017] .. image:: images/attack.PNG :align: center :width: 80% **Net Attack** : Zügner+, Adversarial Attacks on Neural Networks for Graph Data, KDD’18. *Maximize (Change of predicted labels of target node) Subject to (Limited noise in the graph)* . .. image:: images/attack_math.PNG :align: center :width: 60% * Some heuristics have been proposed to efficiently obtain an approximate solution (see the original paper). * The GCN prediction is easily manipulated by only 5 modifications of graph structure (|V|=~2k, |E|=~5k) * GCN does neighbor aggregation : only local attack will work. **GNNs are not robust to adversarial attacks!** 18.2 Future Directions ------------------------ Science Domains: * Chemistry: Molecular graphs. * Biology: Protein-Protein Interaction Networks. Challenges of Applying GNN: * Scarcity of labeled data : Labels require expensive experiments -> Models overfit to small training datasets. * Out-of-distribution prediction : Test examples are very different from training in scientific discovery -> Models typically perform poorly. Pre-training GNNs [Hu+ 2019] : Pre-train GNNs on relevant, easy to obtain graph data. 19. Applications of GNN ========================== 19.1 GNN recommendation (PinSage) ---------------------- `Recap ML Bipartite `_ * Given : Users interacts with items. * Goal: Learn what items are related -> Recommend items users might like. **Define similarity** : * Content-based: User and item features, in the form of images, text, categories, etc. * Graph-based (collaborative filtering): User-item interactions, in the form of graph/network structure (Estimate what X will like based on what similar others like). **Recommendations via Embeddings** : Goal: Learn embeddings for items. Find the closest embedding by nearest neighbor. Recommend it. **PinSage** graph convolutional network : Goal: Generate embeddings for nodes (e.g., pins) in the Pinterest graph containing billions of objects. Key Idea: Borrow information from nearby nodes. **Data Set** : Positive pair: Two pins that are consecutively saved into the same board within a time interval (1 hour) ; Negative pair: A random pair of 2 pins. **Max-margin loss** : .. math:: L = \sum_{(u,v)\in D} \max(0, -z^{T}_{u}z_{v} + z^{T}_{u}z_{n} + \Delta) **Four key innovations** : * On-the-fly graph convolutions : Perform a localized graph convolution around a particular node, Does not need the entire graph during training. * Selecting neighbors via random walks, as performing aggregation on all neighbors is infeasible. Personalized PageRank can help, and Define Importance pooling. * Efficient MapReduce inference to avoid repeated computation. Problem: Many repeated computation if using localized graph convolution at inference step. * Use harder and harder negative samples (using random walk) : Force model to learn subtle distinctions between pins. 19.2 Heterogeneous GNN (Decagon) --------------------------- *Model and predict side effects of drug pairs* `http://snap.stanford.edu/decagon/ `_ **Polypharmacy** : use multiple drugs for a disease. *High risk of side effects due to interactions* . Difficult to identify manually. Idea: Computationally screen/predict polypharmacy side effects. .. image:: images/polypharmacy-graph.png :align: center :width: 80% **Heterogeneous (multimodal) graphs** : graphs with different node types and/or edge types. Goal: Given a partially observed graph, predict labeled edges between drug nodes. **Polypharmacy heterogenous graph** : constructs a large two-layer multimodal graph of protein-protein interactions, drug-protein interactions, and drug-drug interactions (i.e. side effects). * There is a wide range in how frequently certain side effects occur in drug combinations. (more than 53% of the side effects occur in less than 3% of the drug combinations) * Polypharmacy side effects do not appear independently of one another in co-prescribed drug paris, suggesting that joint modeling over multiple side effects can aid in the prediction task. **Decagon** : * *heterogenous graph -> node embeddings* : Compute GNN messages from each edge type, then aggregate across different edge types. .. math:: h_{i}^{(k+1)} = \phi(\sum_{r}\sum_{j\in N_{i, r}}c_{r}^{ij}W_{r}^{k}h_{j}^{(k)} + c_{r}^{i}h_{i}^{k}) * *Node embeddings of query drug pairs -> predicted edges* : Use pair of computed node embeddings to make edge predictions. .. math:: g(v_{i},r,v_{j}) = \begin{cases} z_{i}^{T}D_{r}RD_{r}z_{j}& if\ v_{i}\ and\ v_{j}\ are \ drugs \\ z_{i}^{T}M_{r}z_{j} & otherwise (both\ proteins,\ or\ one\ protein\ one\ drug) \end{cases} Where, :math:`D_{r}` modeling the importance of each dimension in the node embedding towards the side effect r (tensor factorization). And R models the interaction. .. image:: images/Decagon.PNG :align: center :width: 90% 19.3 Goal-directed generation (GCPN) ------------------------------- `Recap Graph RNN `_ [You et al., ICML 2018]. Generating graphs via sequentially adding nodes and edges. `code github `_ Including “Black-box” in ML: (Environment is a blackbox to the agent) Objectives like drug-likeness are governed by physical law, which are assumed to be unknown to us! -> **Solution : Reinforcement Learning** : A ML agent observes the environment, takes an action to interact with the environment, and receives positive or negative reward. The agent then learns from this loop. Policy: Agent behavior, which maps observation to action. Policy-based RL: An agent directly learns an optimal policy from data. **Graph Convolutional Policy Network** combines graph representation + RL: * Graph Neural Network captures complex structural information, and enables validity check in each state transition (Valid). * Reinforcement learning optimizes intermediate/final rewards (High scores). * Adversarial training imitates examples in given datasets (Realistic). .. image:: images/GCPN.PNG :align: center * (a) Insert nodes/scaffolds * (b) Compute state via GCN * (c) Sample next action * (d) Take action (check chemical validity) * (e, f) Compute reward