SIGMOD2025

Incremental Rule Discovery in Response to Parameter Updates

Haoxian Chen, Wenfei Fan, Jiaye Zheng

被引用 1 次

摘要

This paper studies incremental rule discovery. Given a dataset D, rule discovery is to mine the set of the rules on D such that their supports and confidences are above thresholds 𝜎 and 𝛅 , respectively. We formulate incremental problems in response to updates Δ𝜎 and/or Δ𝛅, to compute rules added and/or removed with respect to 𝜎 + Δ𝜎 and 𝛅 + Δ𝛅. The need for studying the problems is evident since practitioners often want to adjust their support and confidence thresholds during discovery. The objective is to minimize unnecessary recomputation during the adjustments, not to restart the costly discovery process from scratch. As a testbed, we consider entity enhancing rules, which subsume popular data quality rules as special cases. We develop three incremental algorithms, in response to Δ𝜎 , Δ𝜎 and both. We show that relative to a batch discovery algorithm, these algorithms are bounded, i.e., they incur the minimum cost among all incrementalizations of the batch one, and parallelly scalable, i.e., they guarantee to reduce runtime when given more processors. Using real-life data, we empirically verify that the incremental algorithms outperform the batch counterpart by up to 658× when Δ𝜎 and Δ𝜎 are either positive or negative.