EMNLP 2025

Analyzing and Modeling LLM Response Lengths with Extreme Value Theory: Anchoring Effects and Hybrid Distributions

Liuxuan Jiao, Chen Gao, Yiqian Yang, Chenliang Zhou, YiXian Huang, Xinlei Chen, Yong Li

Abstract

Accurate modeling and control of response length is essential for optimizing large language model (LLM) deployment, as it affects computational efficiency, user experience, and system reliability. We develop a statistical framework based on extreme value theory, analyzing 14,301 GPT-4o responses across temperature settings and prompting strategies, with cross-validation on Qwen and DeepSeek architectures. Our analysis reveals that response lengths follow Weibull-type generalized extreme value (GEV) distributions, with heavier tails under stochastic generation conditions. Our key contributions are: (1) a novel hybrid model combining the GEV and generalized Pareto (GPD) distributions that achieves a superior tail fit ($R^2_{\mathrm{CDF}} = 0.9993$ versus 0.998 for a standalone GEV) while preserving architectural generalizability; (2) a quantitative characterization of prompt anchoring effects, showing reduced dispersion but increased outlier propensity under randomization; and (3) the identification of temperature-dependent response patterns that remain consistent across architectures, where higher temperatures amplify length variability while preserving the underlying extreme-value mechanisms. The hybrid model's adaptive threshold selection enables precise verbosity control in production systems, regardless of the specific LLM architecture employed. These findings provide both theoretical insight into LLM generation patterns and practical tools for response length optimization.
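
To make the idea of a GEV-GPD hybrid concrete, the following is a minimal sketch of one common way to splice the two distributions: a GEV body below a threshold and a GPD tail for exceedances above it, joined so that the CDF is continuous at the threshold. The abstract does not specify the paper's exact construction or its adaptive threshold rule, so the splicing form, the function names (`gev_cdf`, `gpd_cdf`, `hybrid_cdf`), and all parameter values here are illustrative assumptions, not the authors' method. In the standard parameterization used below, a Weibull-type GEV corresponds to a negative shape parameter `xi`.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """CDF of the generalized extreme value distribution.

    xi < 0 is the Weibull type (bounded upper tail), xi = 0 the Gumbel type,
    and xi > 0 the Frechet type.
    """
    if xi == 0.0:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: above the upper endpoint when xi < 0,
        # below the lower endpoint when xi > 0.
        return 1.0 if xi < 0 else 0.0
    return math.exp(-(t ** (-1.0 / xi)))


def gpd_cdf(y, beta, xi):
    """CDF of the generalized Pareto distribution for an excess y over a threshold."""
    if y <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-y / beta)
    t = 1.0 + xi * y / beta
    if t <= 0.0:
        return 1.0  # xi < 0 and y is beyond the upper endpoint
    return 1.0 - t ** (-1.0 / xi)


def hybrid_cdf(x, u, gev_params, gpd_params):
    """Spliced CDF (assumed form): GEV body for x <= u, rescaled GPD tail for x > u."""
    f_u = gev_cdf(u, *gev_params)
    if x <= u:
        return gev_cdf(x, *gev_params)
    # The tail carries the remaining probability mass (1 - f_u).
    return f_u + (1.0 - f_u) * gpd_cdf(x - u, *gpd_params)


# Hypothetical parameters, chosen only for illustration (lengths in tokens).
GEV_PARAMS = (200.0, 50.0, -0.2)  # mu, sigma, xi (Weibull type)
GPD_PARAMS = (40.0, -0.1)         # beta, xi
THRESHOLD = 300.0

if __name__ == "__main__":
    for length in (100, 200, 300, 400, 600):
        p = hybrid_cdf(length, THRESHOLD, GEV_PARAMS, GPD_PARAMS)
        print(f"P(length <= {length}) = {p:.4f}")
```

In practice the parameters would be estimated from observed response lengths, and the threshold would be chosen by a data-driven rule; neither step is shown here.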