EMNLP 2025
Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs
Dawid Jan Kopiczko, Tijmen Blankevoort, Yuki M. Asano
Abstract
Decoder-only large language models typically rely solely on masked causal attention, which limits their expressiveness by restricting information flow to one direction. We propose Bitune, a method that enhances pretrained decoder-only LLMs by incorporating bidirectional attention into prompt processing. We evaluate Bitune in instruction-tuning and question-answering settings, showing significant improvements in performance on commonsense reasoning, arithmetic, and language understanding tasks. Furthermore, extensive ablation studies validate the role of each component of the method, and demonstrate that Bitune is compatible with various parameter-efficient finetuning techniques and full model finetuning.
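To make the contrast between the two attention patterns concrete, the following minimal sketch builds a standard causal mask and a variant in which prompt tokens attend to every other prompt token while the remaining tokens stay causal. It illustrates only the masking idea named in the abstract and is not the authors' implementation; the function names and the `prompt_len` parameter are hypothetical, and how Bitune combines the resulting features is not shown here.

```python
def causal_mask(n):
    """Return an n x n mask where mask[i][j] is True if position i may attend to j.

    Causal attention: each position sees itself and earlier positions only.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def prompt_bidirectional_mask(n, prompt_len):
    """Causal mask, except that the first `prompt_len` positions (the prompt)
    may attend to each other in both directions.

    `prompt_len` is an illustrative parameter, not a name taken from the paper.
    """
    mask = causal_mask(n)
    for i in range(prompt_len):
        for j in range(prompt_len):
            mask[i][j] = True  # full visibility inside the prompt block
    return mask


def show(mask):
    """Render a mask as text: 'x' = may attend, '.' = masked out."""
    return "\n".join("".join("x" if allowed else "." for allowed in row) for row in mask)


if __name__ == "__main__":
    # 5 tokens in total, of which the first 3 form the prompt.
    print(show(causal_mask(5)))
    print()
    print(show(prompt_bidirectional_mask(5, 3)))
```

In the second mask the upper-left 3 x 3 block is fully visible, so early prompt tokens can use information from later prompt tokens, while generated tokens still see only what precedes them.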