VLDB 2024

A Sampling-based Framework for Hypothesis Testing on Large Attributed Graphs

Yun Wang, Chrysanthi Kosyfaki, Sihem Amer-Yahia, Reynold Cheng

Abstract

Hypothesis testing is a statistical method for drawing conclusions about populations from sample data, which is typically represented in tables. With the prevalence of graph representations in real-life applications, hypothesis testing on graphs is gaining importance. In this work, we formalize node, edge, and path hypotheses on attributed graphs. We develop a sampling-based hypothesis testing framework that can accommodate existing hypothesis-agnostic graph sampling methods. To achieve accurate and time-efficient sampling, we then propose a Path-Hypothesis-Aware SamplEr, PHASE, an m-dimensional random walk that accounts for the paths specified in the hypothesis. We further optimize its time efficiency and propose PHASE_opt. Experiments on three real datasets demonstrate the ability of our framework to leverage common graph sampling methods for hypothesis testing, and the superiority of hypothesis-aware sampling methods in terms of accuracy and time efficiency.
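To make the idea of sampling-based testing concrete, the following is a minimal sketch of a node hypothesis tested on a sample rather than on the full graph. It is not PHASE and not the framework's actual implementation: it assumes a uniform node sampler as a stand-in for any hypothesis-agnostic sampling method, a one-sample z-test as the test statistic, and a synthetic attribute table. All names (`sample_nodes`, `one_sample_z_test`, `run_node_hypothesis_test`, `node_attrs`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def sample_nodes(node_attrs, k, rng):
    """Hypothesis-agnostic sampler (assumed): k node ids, uniform, no replacement."""
    return rng.sample(sorted(node_attrs), k)


def one_sample_z_test(values, mu0):
    """Two-sided one-sample z-test of H0: mean == mu0. Returns (z, p_value)."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    z = (mean(values) - mu0) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


def run_node_hypothesis_test(node_attrs, mu0, k, alpha=0.05, seed=0):
    """Sample k nodes, then test H0: the mean node attribute equals mu0."""
    rng = random.Random(seed)
    sampled_ids = sample_nodes(node_attrs, k, rng)
    values = [node_attrs[v] for v in sampled_ids]
    z, p_value = one_sample_z_test(values, mu0)
    return {"z": z, "p_value": p_value, "reject_h0": p_value < alpha}


# Synthetic attributed nodes (illustrative data only): attribute ~ N(30, 5).
data_rng = random.Random(42)
node_attrs = {v: data_rng.gauss(30.0, 5.0) for v in range(10_000)}

# H0 matches the generating mean, so rejection should be rare.
result_plausible = run_node_hypothesis_test(node_attrs, mu0=30.0, k=500)
# H0 is far from the generating mean, so it should be rejected.
result_implausible = run_node_hypothesis_test(node_attrs, mu0=35.0, k=500)
```

In this sketch the sampler and the test are separate functions, so a different sampling method can be swapped in without touching the test. Edge and path hypotheses would additionally require the sampler to return edges or paths rather than individual nodes.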