ACL 2025

NBDESCRIB: A Dataset for Text Description Generation from Tables and Code in Jupyter Notebooks with Guidelines

Xuye Liu, Tengfei Ma, Yimu Wang, Fengjie Wang, Jian Zhao

Cited by: 1

Abstract

Generating cell-level descriptions for Jupyter Notebooks, a major resource consisting of code, tables, and descriptions, has been attracting increasing research attention. However, existing methods for Jupyter Notebooks mostly focus on generating descriptions from code snippets or table outputs independently. On the other hand, descriptions should be personalized, since users have different purposes in different scenarios, yet previous work ignores this during description generation. In this work, we formulate a new task: personalized description generation from code, tables, and user-written guidelines in Jupyter Notebooks. To evaluate this new task, we collect and propose a benchmark, NBDESCRIB, containing code, tables, and user-written guidelines as inputs and personalized descriptions as targets. Extensive experiments show that while existing text generation models are able to generate fluent and readable descriptions, they still struggle to produce factually correct descriptions without user-written guidelines. In human evaluation, CodeT5 achieved the highest scores among foundation models in Orientation (1.27) and Correctness (-0.43), while the ground truth scored higher in both Orientation (1.45) and Correctness (1.19). Common error patterns include misalignment with guidelines, incorrect variable values, omission of important code information, and reasoning errors. Moreover, ablation studies show that adding guidelines significantly enhances performance, both qualitatively and quantitatively.
Table

|   | passengerid | pclass | sex    | age  | fare    | cabin | embarked | y_pred | y_scores |
| 0 | 892         | 3      | male   | 34.5 | 7.8292  | nan   | q        | 0      | 0.108682 |
| 1 | 893         | 3      | female | 47.0 | 7.0000  | nan   | s        | 1      | 0.516569 |
| 2 | 894         | 2      | male   | 62.0 | 9.6875  | nan   | q        | 0      | 0.130859 |
| 3 | 895         | 3      | male   | 27.0 | 8.6625  | nan   | s        | 0      | 0.116366 |
| 4 | 896         | 3      | female | 22.0 | 12.2875 | nan   | s        | 1      | 0.553491 |

| Guideline   | Description |
| Value       | The 'y_pred' column in the output table represents the binary prediction (0 or 1) generated by the 'model_consumer' for each passenger, such as passenger 893's prediction being 1 (indicating survival) |
| Goal        | Builds a prediction pipeline and applies it to a 'df_test' dataset, generating predictions and scores for further analysis |
| Association | It is clear that fare has nothing to do with age in the titanic dataset |
| Outlier     | Cabin feature has some missing values in the test set |
| Aggregation | There are two classes for Embarked feature |
| Reason      | Analyze the data like passenger class, gender, age, and fare to generate predictions and scores with prediction pipeline |

| Code Only | Table Only |
| It sets up a machine learning prediction pipeline with initial preprocessing and model consumption stages and applies it to test data from a CSV file | The table lists passenger data with features like class, sex, age, fare, and predictions with scores for a Titanic dataset. |
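The example above pairs a code cell and its table output with a guideline and a target description. As a rough illustration of how such an instance might be represented and turned into a single model input, here is a minimal Python sketch. The class name, field names, the `build_model_input` helper, and the `[GUIDELINE]`/`[CODE]`/`[TABLE]` markers are all assumptions made for illustration; they are not the dataset's actual schema or the authors' input format.

```python
from dataclasses import dataclass


@dataclass
class NotebookExample:
    """One hypothetical benchmark instance (field names are assumptions)."""
    code: str              # source of the notebook code cell
    table: str             # textual rendering of the cell's table output
    guideline: str         # user-written guideline, e.g. "Value" or "Goal"
    description: str = ""  # target description; empty at inference time


def build_model_input(example, use_code=True, use_table=True, use_guideline=True):
    """Join the selected inputs into one string.

    The boolean flags mimic input ablations such as code only, table only,
    or no guideline. The bracketed markers are hypothetical separators.
    """
    parts = []
    if use_guideline:
        parts.append(f"[GUIDELINE] {example.guideline}")
    if use_code:
        parts.append(f"[CODE] {example.code}")
    if use_table:
        parts.append(f"[TABLE] {example.table}")
    return "\n".join(parts)


if __name__ == "__main__":
    example = NotebookExample(
        code="predictions = pipeline.predict(df_test)",  # hypothetical code cell
        table="passengerid pclass sex age fare cabin embarked y_pred y_scores",
        guideline="Outlier",
        description="Cabin feature has some missing values in the test set",
    )
    # Code-only setting: drop both the table and the guideline.
    print(build_model_input(example, use_table=False, use_guideline=False))
```

Keeping the guideline as a separate field makes it easy to switch it off and compare inputs with and without it, which is the kind of comparison the ablation in the abstract describes.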