ICLR 2025

How efficient is LLM-generated code? A rigorous & high-standard benchmark

Ruizhong Qiu, Weiliang Will Zeng, James Ezick, Christopher Lott, Hanghang Tong

Abstract

Example problem (from [1]): computing the 𝑛-th Fibonacci number.

• While all correct, the three implementations above have different efficiencies.
• Efficiency is crucial in real-world applications but is largely overlooked in existing benchmarks.
• How does LLM-generated code compare with expert-written code in terms of efficiency?

[1] Chen et al. Evaluating large language models trained on code. arXiv:2107.03374, 2021.
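To make the efficiency gap concrete, here is a minimal Python sketch of three correct ways to compute the 𝑛-th Fibonacci number. These are illustrative assumptions, not necessarily the implementations shown in the original example: the function names are hypothetical, and the convention F(0) = 0, F(1) = 1 is assumed.

```python
def fib_recursive(n: int) -> int:
    """Naive recursion: the number of calls grows exponentially in n."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_iterative(n: int) -> int:
    """Simple loop: n additions, constant extra memory."""
    a, b = 0, 1  # (F(0), F(1))
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_fast_doubling(n: int) -> int:
    """Fast doubling: O(log n) arithmetic steps, using
    F(2m) = F(m) * (2*F(m+1) - F(m)) and F(2m+1) = F(m)^2 + F(m+1)^2."""

    def pair(k: int) -> tuple[int, int]:
        # Returns (F(k), F(k+1)).
        if k == 0:
            return 0, 1
        a, b = pair(k // 2)        # a = F(m), b = F(m+1), with m = k // 2
        c = a * (2 * b - a)        # F(2m)
        d = a * a + b * b          # F(2m+1)
        return (d, c + d) if k % 2 else (c, d)

    return pair(n)[0]
```

All three return the same values, but their running times diverge quickly as 𝑛 grows: the recursive version becomes impractical for moderately large 𝑛, while the other two remain fast. The standard-library `timeit` module is one way to measure the difference on a given machine.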