ASE 2021

VizSmith: Automated Visualization Synthesis by Mining Data-Science Notebooks

Rohan Bavishi, Shadaj Laddad, Hiroaki Yoshida, Mukul R. Prasad, Koushik Sen

7 citations

Abstract

Visualizations are widely used to communicate findings and make data-driven decisions. Unfortunately, creating bespoke and reproducible visualizations requires procedural tools such as matplotlib. These tools present a steep learning curve, as their documentation often lacks sufficient usage examples to help beginners get started or accomplish a specific task. Forums such as StackOverflow have long helped developers search for code online and adapt it for their use. However, developers still have to sift through search results and understand the code before adapting it. We built a tool called VIZSMITH, which enables code reuse for visualizations by mining visualization code from Kaggle notebooks and creating a database of 7176 reusable Python functions. Given a dataset, columns to visualize, and a text query from the user, VIZSMITH searches this database for appropriate functions, runs them, and displays the generated visualizations to the user. At the core of VIZSMITH is a novel metamorphic-testing-based approach to automatically assess the reusability of functions, which improves end-to-end synthesis performance by 10% and cuts the number of execution failures by 50%.

Example visualization function and its invocation:

    def visualization(df, col1, col2):
        import pandas as pd
        import matplotlib.pyplot as plt
        import seaborn as sns
        sns.set()
        # Cross-tabulate the two columns, then normalize each row to proportions
        tab = pd.crosstab(df[col1], df[col2])
        tab.div(tab.sum(1).astype(float), axis=0).plot(kind="bar", stacked=True)
        plt.ylabel('Text-1')
        plt.title('Text-2')

    visualization(df=df, col1='Operator', col2='Call Drop Category')
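The abstract does not spell out which metamorphic relations VIZSMITH checks, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes one plausible relation: if every column of the input table is renamed consistently and the new names are passed as arguments, a reusable function should still run and produce an equivalent result. The helpers `rename_columns` and `is_reusable` are hypothetical names. To keep the sketch dependency-free, tables are modeled as dicts of column lists rather than pandas DataFrames, and results are compared by equality rather than as rendered plots.

```python
from collections import Counter


def rename_columns(table, mapping):
    """Return a copy of the table with columns renamed according to mapping."""
    return {mapping.get(name, name): list(values) for name, values in table.items()}


def is_reusable(func, table, columns):
    """Hypothetical metamorphic check (an assumption, not the paper's exact method).

    Runs func on the original table, then on a copy whose columns are all
    renamed, passing the renamed column arguments. The function counts as
    reusable only if both runs succeed and give equal results.
    """
    try:
        baseline = func(table, *columns)
    except Exception:
        return False

    mapping = {name: f"renamed_{i}" for i, name in enumerate(table)}
    renamed = rename_columns(table, mapping)
    try:
        follow_up = func(renamed, *(mapping[c] for c in columns))
    except Exception:
        return False
    return baseline == follow_up


def crosstab_generic(table, col1, col2):
    # Uses only the column names it is given, so it survives renaming.
    return Counter(zip(table[col1], table[col2]))


def crosstab_hardcoded(table, col1, col2):
    # Ignores col2 and reads a fixed column name left over from its source notebook.
    return Counter(zip(table[col1], table["Call Drop Category"]))


calls = {
    "Operator": ["A", "B", "A"],
    "Call Drop Category": ["Normal", "Dropped", "Normal"],
}
cols = ("Operator", "Call Drop Category")

generic_ok = is_reusable(crosstab_generic, calls, cols)
hardcoded_ok = is_reusable(crosstab_hardcoded, calls, cols)
```

In this sketch the hard-coded variant passes on the original table but fails once the columns are renamed. A check of this kind is meant to catch that sort of hidden dependence on the source notebook's data before a function is offered for reuse.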