KDD2020

InfiniteWalk: Deep Network Embeddings as Laplacian Embeddings with a Nonlinearity

Sudhanshu Chanpuriya, Cameron Musco

被引用 25 次

摘要

The skip-gram model for learning word embeddings [21] has been widely popular, and DeepWalk [25] , among other methods, has extended the model to learning node representations from networks. Recent work of Qiu et al. [26] provides a closed-form expression for the DeepWalk objective, obviating the need for sampling for small datasets and improving accuracy. In these methods, the "window size" T within which words or nodes are considered to co-occur is a key hyperparameter. We study the objective in the T → ∞ limit, which allows us to simplify the expression from [26] . We prove that this limiting objective corresponds to factoring a simple transformation of the pseudoinverse of the graph Laplacian, linking DeepWalk to extensive prior work in spectral graph embeddings. Further, we show that by applying a simple nonlinear entrywise transformation to this pseudoinverse, we recover a good approximation of the finite-T objective and embeddings that are competitive with those from DeepWalk and other skip-gram methods in multilabel classification. Surprisingly, we find that even simple binary thresholding of the Laplacian pseudoinverse is often competitive, suggesting that the core advancement of recent methods is a nonlinearity on top of the classical spectral embedding approach. CCS CONCEPTS • Information systems → Data mining; • Computing methodologies → Dimensionality reduction and manifold learning.