ICLR2025

Can LLMs Understand Time Series Anomalies?

Zihao Zhou, Rose Yu

Abstract

Large Language Models (LLMs) have gained popularity in time series forecasting, but their potential for anomaly detection remains largely unexplored. Our study investigates whether LLMs can understand and detect anomalies in time series data, focusing on zero-shot and few-shot scenarios. Inspired by conjectures about LLMs' behavior from time series forecasting research, we formulate key hypotheses about LLMs' capabilities in time series anomaly detection. We design and conduct principled experiments to test each of these hypotheses. Our investigation reveals several surprising findings about LLMs for time series: (1) LLMs understand time series better as images than as text; (2) LLMs do not perform better when prompted to reason explicitly about time series; (3) contrary to common beliefs, LLMs' understanding of time series does not stem from their repetition biases or arithmetic abilities; and (4) LLMs' behaviors and performance in time series analysis vary significantly across models. This study provides the first comprehensive analysis of contemporary LLM capabilities in time series anomaly detection. Our results suggest that while LLMs can understand trivial time series anomalies, we have no evidence that they can understand more subtle real-world anomalies. Many common conjectures based on their reasoning capabilities do not hold. All synthetic dataset generators, final prompts, and evaluation scripts are available at https://github.com/rose-stl-lab/anomllm.

Published as a conference paper at ICLR 2025

claims about LLMs' time series understanding. Our findings reveal a more nuanced understanding of LLMs' capabilities and limitations in time series data, including:
• Visual Advantage: LLMs perform significantly better in anomaly detection when processing time series data as images rather than text tokens.
• Limited Reasoning: LLMs do not benefit from chain-of-thought reasoning when analyzing time series data. Their performance often decreases when prompted to explain their reasoning.
• Alternative Processing Mechanisms: Contrary to common beliefs, LLMs' understanding of time series does not stem from their repetition biases or arithmetic abilities, challenging prevailing assumptions about how these models process numerical data.
• Model Heterogeneity: Time series understanding and anomaly detection capabilities differ across LLM architectures, highlighting the importance of model selection.

In summary, while LLMs can detect simple anomalies, their ability to understand and reason about numerical time series is considerably limited. Our research calls for more caution in applying LLMs to time series and for controlled studies to examine LLMs' behavior for other data modalities.

RELATED WORK

LLMs for Time Series Analysis. LLMs have been applied to various time series analysis tasks, with recent literature establishing strong claims about their capabilities. Gruver et al. (2023) showed that LLMs such as GPT-3 and LLaMA-2 possess broad pattern extrapolation capabilities, enabling zero-shot time series forecasting by encoding time series as strings of numerical digits and achieving performance comparable to purpose-built models. Building on these claims, Liu et al. (2024c) proposed a Cross-Modal LLM Fine-Tuning framework, suggesting that LLMs can provide interpretable predictions while addressing the distribution discrepancy between textual and temporal input tokens in multivariate time series forecasting. Liu et al. (2024b) introduced Time-MMD, a multi-domain, multimodal time series dataset for LLM fine-tuning. For anomaly detection, Liu et al. (2024a) proposed AnomalyLLM, a knowledge distillation-based approach using GPT-2 as the teacher network. Zhang et al. (2024) provided a comprehensive survey of LLM applications in time series analysis.
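
To make the idea of "encoding time series as strings of numerical digits" concrete, the following is a minimal sketch of one such text encoding. It is an illustration only, not the exact scheme of Gruver et al. (2023) or of this paper: the function name `series_to_digit_string`, the fixed `precision` parameter, the dropped decimal point, and the " , " separator are all assumptions chosen for clarity.

```python
def series_to_digit_string(values, precision=2):
    """Encode a numeric series as text (hypothetical helper, for illustration).

    Each value is scaled by 10**precision and rounded to an integer, so the
    decimal point can be dropped. Digits are separated by spaces so that a
    tokenizer is more likely to see one digit per token, and values are
    separated by " , ".
    """
    tokens = []
    for v in values:
        scaled = round(v * 10 ** precision)   # fixed-precision integer
        sign = "- " if scaled < 0 else ""     # keep the sign as its own token
        tokens.append(sign + " ".join(str(abs(scaled))))
    return " , ".join(tokens)


if __name__ == "__main__":
    print(series_to_digit_string([0.12, 1.5, 0.07]))  # -> 1 2 , 1 5 0 , 7
```

A string like this would be placed in the prompt as the textual representation of the series; the image-based alternative instead renders the same values as a plot and passes the picture to a multimodal model.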
In the financial domain, Wimmer & Rekabsaz (2023) used vision-language models (i.e., CLIP) rather than M-LLMs to process visualizations of stock data for market change prediction. However, these prevailing beliefs about LLMs' pattern extrapolation capabilities and interpretable predictions remain controversial. Zeng et al. (2023) argued that the permutation-invariant nature of self-attention mechanisms may lead to loss of critical temporal information. Tan et al. (2024) found that removing the LLM component or replacing it with a basic attention layer often improved performance in popular LLM-based forecasting methods, challenging the assumed benefits of LLMs' pattern recognition abilities. Interpretability also remains a challenge in time series analysis, as the reasoning behind LLMs' predictions is often opaque.

Time Series Anomaly Detection. Time series anomaly detection is a critical task in various domains, including finance, healthcare, and cybersecurity. Traditional methods rely on statistical techniques, while recent work has focused on developing deep learning-based a