VLDB 2021

Hyperspace: The Indexing Subsystem of Azure Synapse

Rahul Potharaju, Terry Kim, Eunjin Song, Wentao Wu, Lev Novik, Apoorve Dave, Pouria Pirzadeh, Andrew Fogarty, Gurleen Dhody, Jiying Li, Vidip Acharya, Sinduja Ramanujam, Nicolas Bruno, César A. Galindo-Legaria, Vivek R. Narasayya, Surajit Chaudhuri, Anil Nori, Tomas Talius, Raghu Ramakrishnan

Cited by 10

Abstract

Microsoft recently introduced Azure Synapse Analytics, which offers an integrated experience across data ingestion, storage, and querying in Apache Spark and T-SQL over data in the lake, including files and warehouse tables. In this paper, we present our experiences designing and implementing Hyperspace, the indexing subsystem underlying Synapse. Hyperspace lets users build multiple types of secondary indexes on their data, maintain them through a multi-user concurrency model, and use them automatically, without any change to their application code, to accelerate queries and workloads. Many of Hyperspace's requirements are based on feedback from several enterprise customers. We present the details of Hyperspace's underlying design, its user-facing APIs, its concurrency control protocol for index access, its index-aware query processing techniques, and its maintenance mechanisms for handling index updates. Evaluations over standard industry benchmarks and real customer workloads show that Hyperspace can accelerate query execution by up to 10x, and in certain real-world workloads by up to two orders of magnitude.
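As a rough illustration of what using a secondary index "automatically" can involve, the sketch below checks whether a covering index could answer a simple filter query in place of a full scan of the base data. The `CoveringIndex` and `FilterQuery` types, the `can_use_index` rule, and the `choose_index` helper are all hypothetical simplifications written for this example; they are not Hyperspace's actual API or its real rewrite rules.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class CoveringIndex:
    # Hypothetical model of a secondary covering index: rows are organized by
    # indexed_columns, and included_columns are stored alongside them.
    name: str
    indexed_columns: Tuple[str, ...]
    included_columns: Tuple[str, ...]

    def covers(self, columns: FrozenSet[str]) -> bool:
        # True if every requested column is stored in the index.
        return columns <= set(self.indexed_columns) | set(self.included_columns)


@dataclass(frozen=True)
class FilterQuery:
    # Hypothetical query shape: columns used in the WHERE clause and
    # columns returned by the SELECT list.
    filter_columns: FrozenSet[str]
    projected_columns: FrozenSet[str]


def can_use_index(index: CoveringIndex, query: FilterQuery) -> bool:
    # Assumed applicability rule: the filter must constrain the leading
    # indexed column, and the index must contain every column the query reads.
    leading = index.indexed_columns[0]
    needed = query.filter_columns | query.projected_columns
    return leading in query.filter_columns and index.covers(needed)


def choose_index(indexes: Sequence[CoveringIndex],
                 query: FilterQuery) -> Optional[CoveringIndex]:
    # Return the first applicable index, or None to fall back to a full scan.
    # A real optimizer would rank candidates by estimated cost instead.
    for idx in indexes:
        if can_use_index(idx, query):
            return idx
    return None
```

For example, an index on `deptId` that includes `name` and `salary` can serve `SELECT name WHERE deptId = ...`, but not a query filtering only on `name`, nor one that reads a column the index does not store. Because a check like this runs against the query plan rather than the query text, the application's query stays unchanged and only the data source chosen for the scan differs.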