VLDB 2025
Data Discovery in Data Lakes: Operations, Indexes, Systems
Ziawasch Abedjan, Mahdi Esmailoghli, Sainyam Galhotra
Abstract
Data discovery has gained significant traction in the database community, resulting in various discovery operations, indexing schemes, and discovery systems. This tutorial explores the architecture and components of data discovery systems, focusing on indexing structures and scalable algorithms for typical operations such as join and union discovery. While giving insights into individual algorithms, we point out open challenges for holistic systems, for the evaluation of data discovery, and for discovery in federated setups.
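As a rough illustration of the kind of index that join discovery relies on, the sketch below assumes a data lake held in memory as a dict mapping table names to columns (column name to list of cell values). It builds a plain inverted index from normalized cell values to the (table, column) pairs containing them, then ranks candidate join columns by how many distinct query values they share. This exact-overlap formulation is one simple, common choice and not necessarily the algorithm the tutorial presents; the function names `build_inverted_index` and `top_k_joinable` are hypothetical.

```python
from collections import Counter, defaultdict


def normalize(value):
    """Hypothetical normalization: compare cell values case- and whitespace-insensitively."""
    return str(value).strip().lower()


def build_inverted_index(lake):
    """Map each normalized cell value to the set of (table, column) pairs that contain it."""
    index = defaultdict(set)
    for table_name, columns in lake.items():
        for column_name, values in columns.items():
            for value in values:
                index[normalize(value)].add((table_name, column_name))
    return index


def top_k_joinable(query_values, index, k=3):
    """Rank lake columns by the number of distinct query values they share (set overlap)."""
    overlap = Counter()
    for value in {normalize(v) for v in query_values}:
        # .get avoids inserting empty entries into the defaultdict for unseen values
        for location in index.get(value, ()):
            overlap[location] += 1
    return overlap.most_common(k)
```

For example, with a lake containing `cities.name = ["Berlin", "Paris", "Rome"]` and `airports.city = ["Berlin", "Paris", "Madrid"]`, the query column `["berlin", "Paris", "Rome", "Oslo"]` ranks `("cities", "name")` first with an overlap of 3 and `("airports", "city")` second with 2. Scanning only the posting lists of the query's values, rather than every column in the lake, is what makes this kind of index attractive at lake scale.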