NeurIPS 2023
SituatedGen: Incorporating Geographical and Temporal Contexts into Generative Commonsense Reasoning
Yunxiang Zhang, Xiaojun Wan
Abstract
Recently, commonsense reasoning in text generation has attracted much attention. Generative commonsense reasoning is the task that requires machines, given a group of keywords, to compose a single coherent sentence with commonsense plausibility. While existing datasets targeting generative commonsense reasoning focus on everyday scenarios, it is unclear how well machines reason under specific geographical and temporal contexts. We formalize this challenging task as SITUATEDGEN, where machines with commonsense should generate a pair of contrastive sentences given a group of keywords including geographical or temporal entities. We introduce a corresponding English dataset consisting of 8,268 contrastive sentence pairs, which are built upon several existing commonsense reasoning benchmarks with minimal manual labor. Experiments show that state-of-the-art generative language models struggle to generate sentences with commonsense plausibility and still lag far behind human performance. Our dataset is publicly available at https://github.com/yunx-z/situated_gen.
Introduction
In recent years, there has been substantial growth in new benchmarks evaluating commonsense reasoning for natural language processing (NLP) models, especially large-scale Pretrained Language Models (PLMs). Most existing commonsense reasoning benchmarks adopt natural language understanding formats because they are easy to evaluate (e.g., by accuracy), including multiple-choice question answering [44, 41, 20, 24], natural language inference [4], and detecting true/false statements [33, 43]. However, datasets measuring commonsense knowledge in natural language generation are still relatively scarce. We aim to fill this research gap with a novel benchmark, since real-world users of NLP systems would expect the outputs generated by LMs not only to be grammatically correct but also to adhere to commonsense knowledge.
COMMONGEN [25], a generative commonsense reasoning challenge, has attracted wide attention recently.
Given a set of keywords (e.g., dog, frisbee, catch, throw), the task requires models to compose a plausible sentence describing an everyday scenario using all the provided keywords (e.g., "The dog catches the frisbee when the boy throws it."). While COMMONGEN focuses on social and physical commonsense in everyday life, it is unclear how well current commonsense generation models reason with factual knowledge about specific entities, which is referred to as entity commonsense [33]. In this work, we mainly consider geographical and temporal entities, as they provide extra-linguistic contexts [52] for commonsense reasoning and appear in a significant proportion of existing commonsense benchmarks (Section 4.2). To the best of our knowledge, we are the first to incorporate these situations into generative commonsense reasoning.
37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks.
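To make the keyword-coverage requirement of the COMMONGEN-style task concrete, the following is a minimal sketch that checks whether a sentence uses all the provided keywords, using the example from the text above. The function name `covers_all_keywords` and the prefix-matching rule are illustrative assumptions (a crude stand-in for lemmatization); this is not the evaluation procedure of COMMONGEN or SITUATEDGEN, and it says nothing about commonsense plausibility.

```python
import re


def covers_all_keywords(keywords, sentence):
    """Return True if every keyword is a prefix of some token in the sentence.

    Hypothetical helper: prefix matching approximates inflection handling
    (e.g. "catch" matches "catches"). It can over-match (e.g. "cat" would
    also match "catches"), so a real check would use a proper lemmatizer.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return all(
        any(token.startswith(keyword.lower()) for token in tokens)
        for keyword in keywords
    )


# Example input and output taken from the task description above.
keywords = ["dog", "frisbee", "catch", "throw"]
sentence = "The dog catches the frisbee when the boy throws it."
print(covers_all_keywords(keywords, sentence))
```

A check like this only captures the surface constraint that all keywords appear; whether the sentence is plausible under commonsense is the harder part that the task is meant to measure.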