EMNLP 2025

Easy as PIE? Identifying Multi-Word Expressions with LLMs

Kai Golan Hashiloni, Ofri Hefetz, Kfir Bar

Abstract

We investigate the identification of idiomatic expressions, a semantically noncompositional subclass of multiword expressions (MWEs), in running text using large language models (LLMs) without any fine-tuning. Instead, we adopt a prompt-based approach and evaluate a range of prompting strategies, including zero-shot, few-shot, and chain-of-thought variants, across multiple languages, datasets, and model types. Our experiments show that, with well-crafted prompts, LLMs can perform competitively with supervised models trained on annotated data. These findings highlight the potential of prompt-based LLMs as a flexible and effective alternative for idiomatic expression identification.