ASE2025

Exploring Static Taint Analysis in LLMs: A Dynamic Benchmarking Framework for Measurement and Enhancement

Haoran Zhao, Lei Zhang, Keke Lian, Fute Sun, Bofei Chen, Yongheng Liu, Zhiyu Wu, Yuan Zhang, Min Yang

Abstract

LLMs offer a promising avenue to overcome the limitations of traditional taint analysis techniques, and a growing number of studies leverage LLMs for taint analysis and its downstream applications. However, these studies lack a systematic understanding of LLMs’ taint analysis capabilities, which limits their transferability and reliability. To bridge this gap and better apply LLMs to static taint analysis, we aim to comprehensively measure and understand LLMs’ taint analysis capabilities.

Using existing benchmarks is a straightforward approach, but they are unsuitable because of training data leakage, failure to account for LLMs’ features, and improper assessment criteria. Manually constructing new benchmarks is labor-intensive, and such benchmarks struggle to remain effective as LLMs evolve. To address these issues, we propose LLMCapLens, a dynamic benchmark generation framework that systematically measures and enhances LLMs’ capabilities. LLMCapLens models the factors that influence LLMs’ taint analysis capabilities, and it combines a Basic Unit-Based generation method with a lightweight verification method based on dynamic taint analysis to automatically generate targeted benchmarks, ensuring both diversity and correctness. Furthermore, LLMCapLens proposes a measurement-driven, training-free, model-specific enhancement approach.

We apply LLMCapLens to 10 mainstream LLMs, revealing how they perform under various influencing factors and identifying unique characteristics, such as the underlying error causes for each model. Notably, our enhancement approach significantly improves LLM performance: GPT-4 Turbo, for instance, achieves improvements on 16 of 19 factors, with an average True Negative Rate increase of 21.29%. Finally, we validate the real-world impact of our method by applying enhanced LLMs to vulnerability detection, demonstrating a substantial improvement over prior approaches.