ACL 2024

When is a Language Process a Language Model?

Li Du, Holden Lee, Jason Eisner, Ryan Cotterell

Abstract

A language model may be viewed as a Σ-valued stochastic process for some alphabet Σ. However, in some pathological situations, such a stochastic process may "leak" probability mass onto the set of infinite strings and hence is not equivalent to the conventional view of a language model as a distribution over ordinary (finite) strings. Such ill-behaved language processes are referred to as non-tight in the literature. In this work, we study conditions of tightness through the lens of stochastic processes. In particular, by regarding the EOS symbol as marking a stopping time and using results from martingale theory, we give characterizations of tightness that generalize our previous work (Du et al., 2023).
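As an informal illustration of what "leaking" mass means, the following minimal sketch considers a toy process in which the probability of emitting EOS at step t depends only on t, not on the preceding symbols. This simplification, and the names `termination_mass`, `leaky`, and `tight`, are assumptions made for the example and do not come from the paper. Under that assumption, the probability of having terminated within T steps is one minus the product of the per-step survival probabilities.

```python
from fractions import Fraction


def termination_mass(eos_prob, steps):
    """Probability that EOS is emitted within the first `steps` steps.

    `eos_prob(t)` is the (hypothetical) probability of emitting EOS at
    step t, assumed here to be independent of the prefix generated so far.
    """
    survive = Fraction(1)  # probability that no EOS has been emitted yet
    for t in range(1, steps + 1):
        survive *= 1 - eos_prob(t)
    return 1 - survive


def leaky(t):
    # EOS probability decays like 1/t^2: summable, so the process is non-tight.
    return Fraction(1, (t + 1) ** 2)


def tight(t):
    # EOS probability decays like 1/t: not summable, so the process is tight.
    return Fraction(1, t + 1)


if __name__ == "__main__":
    for steps in (10, 100, 1000):
        print(steps,
              float(termination_mass(leaky, steps)),
              float(termination_mass(tight, steps)))
```

In this toy setting the `leaky` process terminates within T steps with probability T / (2(T + 1)), which approaches 1/2, so half of the mass ends up on infinite strings. The `tight` process terminates with probability T / (T + 1), which approaches 1.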