EMNLP 2025

Can LLMs be Literary Companions?: Analysing LLMs on Bengali Figures of Speech Identification

Sourav Das, Kripabandhu Ghosh

Abstract

Although Bengali is one of the most widely spoken languages and carries great cultural importance and richness, NLP research on it remains relatively limited. Figures of speech (FoS) not only contribute to the phonetic and semantic nuances of a language but also convey aesthetics, expression, and creativity in literature. In this paper, we present BengFoS, to our knowledge the first Bengali figures of speech classification dataset, built from the works of six renowned poets of Bengali literature. We apply state-of-the-art (SoTA) models to this dataset, improve them, and finally dissect them, revealing novel insights into the intrinsic behavior of two open-source LLMs (Llama and DeepSeek) in FoS detection. Although we focus on Bengali, the experimental framework can be reproduced for English as well as for other low-resource languages. 1