VLDB2020

The Computation of Optimal Subset Repairs

Dongjing Miao, Zhipeng Cai, Jianzhong Li, Xiangyu Gao, Xianmin Liu

23 citations

Abstract

Computing an optimal subset repair of an inconsistent database is becoming a standalone research problem and has a wide range of applications. However, it has not been well-studied yet. A tight inapproximability bound of the problem computing optimal subset repairs is still unknown, and there is still no existing algorithm with a constant approximation factor better than two. In this paper, we prove a new tighter inapproximability bound of the problem computing optimal subset repairs. We show that it is usually NP-hard to approximate it within a factor better than 17/16. An algorithm with an approximation ratio (2 - 1/2σ-1)is developed, where σ is the number of functional dependencies. It is the current best algorithm in terms of approximation ratio. The ratio can be further improved if there are a large amount of quasi-Turán clusters in the input database. Plenty of experiments are conducted on real data to examine the performance and the effectiveness of the proposed approximation algorithms in real-world applications.