SIGMOD2025
Incremental Rule Discovery in Response to Parameter Updates
Haoxian Chen, Wenfei Fan, Jiaye Zheng
1 citation
Abstract
This paper studies incremental rule discovery. Given a dataset D, rule discovery is to mine the set of the rules on D such that their supports and confidences are above thresholds ๐ and ๐ , respectively. We formulate incremental problems in response to updates ฮ๐ and/or ฮ๐ , to compute rules added and/or removed with respect to ๐ + ฮ๐ and ๐ + ฮ๐ . The need for studying the problems is evident since practitioners often want to adjust their support and confidence thresholds during discovery. The objective is to minimize unnecessary recomputation during the adjustments, not to restart the costly discovery process from scratch. As a testbed, we consider entity enhancing rules, which subsume popular data quality rules as special cases. We develop three incremental algorithms, in response to ฮ๐ , ฮ๐ and both. We show that relative to a batch discovery algorithm, these algorithms are bounded, i.e., they incur the minimum cost among all incrementalizations of the batch one, and parallelly scalable, i.e., they guarantee to reduce runtime when given more processors. Using real-life data, we empirically verify that the incremental algorithms outperform the batch counterpart by up to 658ร when ฮ๐ and ฮ๐ are either positive or negative.