5. Spectral Clustering ================================= For community detection using eigenvectors and eigenvalues. * Pre-processing : matrix representation : (noramlized) **Laplacian matrix**. * Decomposition : **second** eigenvalue and eigenvector. * Grouping : take the **sign** / maximize **eigengap**. 5.1 Concepts ----------------- **Cut** : :math:`cut(A,B) =\sum_{i\in A, j\in B} w_{i,j}`, but it sometimes could not capture the inner properties of the groups. **Conductance** : :math:`\phi(A,B) = cut(A,B)/ \min(vol(A), vol(B))`, will produce more balanced partitions. While optimize it is NP-hard. **Spectral graph partitioning** : using spectral properties to have approximation to conductance optimization problem. 5.2 Spectral Clustering ------------------------- * The property of the adjacency matrix A, with the label vector x. * Introudce the **Laplacian matrix** :math:`L = D-A`. * L has degrees in the disagonal part, and negative edge weight at the other part. * Non-negative real eigenvalues, and real eigenvectors. * We have :math:`L \mathbb{1} = 0`, :math:`\lambda_{1} = 0`, and :math:`\mathbb{1}` is its corresponding eigenvector. * Use normalized laplacian matrix :math:`\bar{L} = D^{-1/2}LD^{-1/2}` for better performance. (see the proof in HW1-Q4) The problem could be reformed to : .. math:: \begin{align*} &minimize \quad x^{T}Lx \\ &subject\ to \quad x^{T}x = 1, \ x^{T}\mathbb{1} = 0 \end{align*} We could prove that its optimal value is :math:`\lambda_{2}`, and we also have :math:`x^{T}Lx = \sum_{(i,j)\in E}(x_{i}-x_{j})^{2}`. (See the details in my hand writing notes or in the course slides.) Then the problem reduce to find the second eigenvalue and eigenvector of the Laplacian matrix. **Relationship with modularity** see in HW1-Q4. **Questions** : The upper problem is actually a L2-minimization problem. While the problem is better modeled as a cardinality problem, should we use l1 heuristic to get a better result? 5.3 K Clustering -------------------- * **Find K**: maximize the **eigengap** :math:`\Delta k = \mid \lambda_{k} - \lambda_{k-1}\mid` * Select the K eigenvectors, then apply K-mean clustering. 5.4 Motif-based spectral clustering ----------------------------- Generalize the 'edge' definitation to motif level. * Motif cut. * Motif count. .. math:: \phi_{M}(S) = \frac{\#motif\ cut}{motif\ volume} Example in the gene regulation show the **feedward loops**, exactly the same as shown in the course Human Behavioral Biology BIO150. 5.5 HW1 - q4 ------------------ `Hand writing homework `_ : **Normalized laplacian matrix** and **Relationship with modularity**. 6. Message Passing and Node Classification =================== Start of **Machine Learning** with Graphs. Message Passing and Node Classification : Given a network with labels on some nodes, assign labels to all other nodes in the network. * Relational classification * Iterative classification * Belief propagation 6.1 Correlations -------------------- * Homophily : the tendency of individuals to associate and bond with similar others. * Influence : social connections can influence the individual characteristics of a person. * Cofounding : both of the upper two acting together. Leverage the correlation : **Guilt-by-association** If I am connected to a node with label 𝑋, then I am likely to have label 𝑋 as well. Motivation: the close relationship of *Lable*, *Feature*, and *Neighborhood*. Problem: Given the adjacency matrix and the initial label vector, predict the full label. 6.2 Collectivce classification ----------------------- **Collectivce classification** (based on Markov assumption) the exact inference is NP-hard, so we will look at approximation inference. * Local classifier : initial assignment. * Relation Classifier : capture correlations between nodes (use the graph information). * Collective inference : propagate correlations through network. Apply to each node iteratively, for minimize the inconsistency between neighboring nodes' labels. 6.3 Probabilistic Relational Classifier ------------------------------ A flow process driven by the potential energy. .. math:: P(Y_{i} = c) = \frac{1}{\mid N_{i}\mid}\sum_{(i,j)\in E}W(i,j)P(Y_{j}=c) * The convergence not guaranteed. * Cannot use node feature information, only use the graph information. See HW2 Q1.1 for an example. 6.4 Iterative Classification --------------------------- Classify also based on the attributes of neighbor set. (plays a key role in Graph Neural Network) Each node maintains a vector of neighborhood labels. * **Bootstrap phase** , a warm start. * **Iteration phase** , repeatly update relational features and classify. * However, convergence is also not guaranteed. Example, `Fake Review Spam `_ . Using the property that *graph structure of the reviews is hard to fake*. 6.5 Brief Propagation ----------------------------- **Message Passing** .. image:: images/message_passing.png :align: center :width: 50% .. math:: m_{i\to j}(Y_{j}) = \alpha \sum_{Y_{i}\in L}\psi(Y_{i}, Y_{j})\phi_{i}(Y_{i}) \prod_{k\in N_{i} \setminus j} m_{k\to i}(Y_{i}) It contains the message of ith node multiplied by the edge weight, and all the messages node i received. (See HW2 Q1.2 for an example.) .. math:: b_{i}(Y_{i}) = \alpha \phi_{i}(Y_{i}) \prod_{j\in N_{i}}m_{j\to i }(Y_{i}), \ \forall Y_{i}\in L Summary of all states, the production of label-label potential, the prior, and all messages sent by the neigbors from previous round. Message Loop could cause bad effect, and convergence is also not guaranteed. `Fraud Detection in Online Auction Networks `_