VLDB 2024

SDEcho: Efficient Explanation of Aggregated Sequence Difference

Fei Ye, Zikang Liu, Xi Zhang, Yinan Jing, Zhenying He, Yuxin Che, Haoran Xiong, Kai Zhang, X. Sean Wang

Abstract

Understanding why aggregated sequences derived from SQL queries differ is crucial for data scientists. However, existing methods are often labor-intensive, scale poorly, provide only approximate solutions, or offer inadequate support for explaining sequence differences. To address these limitations, we introduce SDEcho, a novel framework that automates the search for explanations of sequence differences in high-dimensional, high-volume datasets. SDEcho prunes the entire explanation space using techniques drawn from three perspectives (pattern, order, and dimension) and from their interactions, while keeping the resulting explanations accurate and concise. This hybrid pruning approach significantly accelerates explanation search, making SDEcho a valuable tool for data analysis tasks. Extensive experiments on synthetic and real-world datasets, along with a case study, demonstrate that SDEcho outperforms existing methods in both effectiveness and efficiency.
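The abstract does not define an "explanation" precisely, so the following is only a hypothetical sketch of the kind of search space that pruning is meant to shrink; it is not SDEcho's algorithm. It assumes that each aggregated sequence is a SUM of `value` per time step `t` for one group, that the difference between two sequences is their L1 distance, and that an explanation is a conjunction of attribute=value predicates whose removal most reduces that distance. The functions `aggregate`, `difference`, and `explain`, the toy rows, and the `max_order` parameter are all illustrative names introduced here. The search is brute force: it re-aggregates the data for every candidate predicate, so its cost grows with the number of dimensions, the number of values per dimension, and the number of conjuncts.

```python
from collections import defaultdict
from itertools import combinations, product

# Hypothetical toy table: two groups (A, B) observed over two time steps.
ROWS = [dict(zip(("group", "t", "region", "product", "value"), r)) for r in [
    ("A", 1, "east", "x", 10), ("A", 1, "west", "y", 10),
    ("A", 2, "east", "x", 20), ("A", 2, "east", "y", 20),
    ("A", 2, "west", "x", 10), ("A", 2, "west", "y", 10),
    ("B", 1, "east", "x", 10), ("B", 1, "west", "y", 10),
    ("B", 2, "east", "x", 10), ("B", 2, "east", "y", 10),
    ("B", 2, "west", "x", 10), ("B", 2, "west", "y", 10),
]]

def aggregate(rows, group, exclude=None):
    """SUM(value) per time step for one group, dropping rows matching the `exclude` predicate."""
    seq = defaultdict(float)
    for r in rows:
        if r["group"] != group:
            continue
        if exclude and all(r[k] == v for k, v in exclude):
            continue
        seq[r["t"]] += r["value"]
    return seq

def difference(a, b):
    """L1 distance between two sequences over the union of their time steps."""
    return sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in set(a) | set(b))

def explain(rows, g1, g2, dims, max_order=2):
    """Return (reduction, predicate) for the predicate whose removal most shrinks the difference."""
    base = difference(aggregate(rows, g1), aggregate(rows, g2))
    values = {d: sorted({r[d] for r in rows}) for d in dims}
    candidates = []
    # Enumerate every conjunction of 1..max_order attribute=value pairs (no pruning).
    for k in range(1, max_order + 1):
        for combo in combinations(dims, k):
            for vals in product(*(values[d] for d in combo)):
                pred = tuple(zip(combo, vals))
                d = difference(aggregate(rows, g1, pred), aggregate(rows, g2, pred))
                candidates.append((base - d, pred))
    # Prefer the largest reduction; break ties in favor of shorter (more concise) predicates.
    candidates.sort(key=lambda c: (-c[0], len(c[1])))
    return candidates[0]

print(explain(ROWS, "A", "B", ["region", "product"]))
```

On this toy data, group A exceeds group B only because of extra `east` sales at the second time step, so removing `region = east` makes the two sequences identical. Under the stated assumptions the sketch reports that single predicate as the best explanation, ahead of partial explanations such as `product = x` or `region = east AND product = y`.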