VLDB 2021

AutoExecutor: Predictive Parallelism for Spark SQL Queries

Rathijit Sen, Abhishek Roy, Alekh Jindal, Rui Fang, Jeff Zheng, Xiaolei Liu, Ruiping Li

11 citations

Abstract

Right-sizing resources for query execution is important for cost-efficient performance, but estimating upfront, before a query runs, how resource allocation affects performance is difficult. We demonstrate AutoExecutor, a predictive system that uses machine learning models to predict query run times as a function of the number of allocated executors, which limits the maximum allowed parallelism, for Spark SQL queries running on Azure Synapse.
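
To make the idea of predicting run time as a function of executor count concrete, here is a minimal sketch. It is not AutoExecutor's model: it assumes a simple Amdahl-style curve t(n) = serial + parallel / n fitted by least squares to a few hypothetical (executor count, run time) observations, and all function names and the `slowdown` tolerance are illustrative assumptions.

```python
import numpy as np


def fit_runtime_curve(executors, runtimes):
    """Least-squares fit of the assumed curve t(n) = serial + parallel / n."""
    n = np.asarray(executors, dtype=float)
    t = np.asarray(runtimes, dtype=float)
    # Design matrix columns: constant term and 1/n term.
    design = np.column_stack([np.ones_like(n), 1.0 / n])
    (serial, parallel), *_ = np.linalg.lstsq(design, t, rcond=None)
    return float(serial), float(parallel)


def predict_runtime(serial, parallel, n):
    """Predicted run time with n executors under the assumed curve."""
    return serial + parallel / n


def smallest_executor_count(serial, parallel, max_executors, slowdown=1.1):
    """Smallest n whose predicted run time is within `slowdown` times
    the predicted run time at max_executors."""
    target = slowdown * predict_runtime(serial, parallel, max_executors)
    for n in range(1, max_executors + 1):
        if predict_runtime(serial, parallel, n) <= target:
            return n
    return max_executors


# Hypothetical observations, for illustration only.
serial, parallel = fit_runtime_curve([1, 2, 4, 8], [110.0, 60.0, 35.0, 22.5])
chosen = smallest_executor_count(serial, parallel, max_executors=16)
```

Under this assumed curve, adding executors yields diminishing returns, so a tolerance such as `slowdown` lets a caller trade a small predicted slowdown for fewer executors.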