EMNLP 2024
What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
Akshay Paruchuri, Jake Garrison, Shun Liao, John Hernandez, Jacob E. Sunshine, Tim Althoff, Xin Liu, Daniel McDuff
Cited by 3
Abstract
Language models (LMs) are capable of remarkably complex linguistic tasks; however, numerical reasoning is an area in which they frequently struggle. An important but rarely evaluated form of reasoning is understanding probability distributions. In this paper, we focus on evaluating the probabilistic reasoning capabilities of LMs using idealized and real-world statistical distributions. We perform a systematic evaluation of state-of-the-art LMs on three tasks: estimating percentiles, drawing samples, and calculating probabilities. We evaluate three ways to provide context to LMs: 1) anchoring examples from within a distribution or family of distributions, 2) real-world context, and 3) summary statistics on which to base a Normal approximation. Models can make inferences about distributions, and can be further aided by the incorporation of real-world context, example shots, and simplified assumptions, even if these assumptions are incorrect or misspecified. To conduct this work, we developed a comprehensive benchmark distribution dataset with associated question-answer pairs that we have released publicly.

* Work completed during an internship at Google.

## Consider the following distribution:

Type: Log-Normal Distribution

Characteristics: This distribution models values that are the result of the multiplicative product of many independent random variables, such as income levels, stock prices, or city sizes.

Log Mean (mu): 3.543
Log Sigma (sigma): 0.677

These parameters mean that the natural logarithm of the values follows a normal distribution with the specified mean and standard deviation.

Your task is to estimate the percentile of a given average exercise minutes count value for a population that regularly uses Fitbit devices and is active on a daily basis. The data is filtered for individuals aged 18-65. The data is age-balanced and gender-balanced, and pertains to the U.S. population only.

Consider the following parameters that describe a normal distribution:
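For reference, the three tasks named in the abstract (estimating percentiles, drawing samples, and calculating probabilities) have closed-form answers for the Log-Normal prompt shown above. The following is a minimal sketch using only the Python standard library, not the authors' benchmark code. The function names are hypothetical, and only the two parameters (`mu = 3.543`, `sigma = 0.677`) come from the prompt.

```python
import math
import random
from statistics import NormalDist

# Parameters taken from the example prompt; everything else is illustrative.
LOG_MU = 3.543
LOG_SIGMA = 0.677


def lognormal_percentile(value, mu=LOG_MU, sigma=LOG_SIGMA):
    """Percentile (0-100) of `value` under a log-normal distribution.

    If ln(X) ~ Normal(mu, sigma), then P(X <= x) = Phi((ln x - mu) / sigma).
    """
    if value <= 0:
        return 0.0  # a log-normal variable has no mass at or below zero
    return 100.0 * NormalDist(mu, sigma).cdf(math.log(value))


def lognormal_value_at(percentile, mu=LOG_MU, sigma=LOG_SIGMA):
    """Inverse of lognormal_percentile; requires 0 < percentile < 100."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(percentile / 100.0))


def lognormal_probability_between(low, high, mu=LOG_MU, sigma=LOG_SIGMA):
    """P(low <= X <= high), computed as a difference of two CDF values."""
    return (lognormal_percentile(high, mu, sigma)
            - lognormal_percentile(low, mu, sigma)) / 100.0


def draw_samples(n, mu=LOG_MU, sigma=LOG_SIGMA, seed=0):
    """Draw n samples; the seed is fixed only to make the sketch repeatable."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


if __name__ == "__main__":
    median = math.exp(LOG_MU)  # the median of a log-normal is exp(mu)
    print(f"median value: {median:.2f}")
    print(f"percentile of the median: {lognormal_percentile(median):.1f}")
    print(f"value at the 90th percentile: {lognormal_value_at(90):.2f}")
    print(f"five samples: {[round(s, 1) for s in draw_samples(5)]}")
```

With these parameters the median is `exp(3.543)`, about 34.6, so a value near 34.6 should be placed at roughly the 50th percentile. Reference values of this kind are what a model's answer to the prompt could be compared against.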