AAAI 2026

SASST: Leveraging Syntax-Aware Chunking and LLMs for Simultaneous Speech Translation

Zeyu Yang, Lai Wei, Roman Koshkin, Xi Chen, Satoshi Nakamura

Cited by 3

Abstract

This work proposes a grammar-based chunking strategy that segments input streams into semantically complete units by parsing dependency relations (e.g., noun phrase boundaries, verb-object structures) and punctuation features. The method keeps chunks coherent and minimizes semantic fragmentation. Building on this mechanism, we present SASST (Syntax-Aware Simultaneous Speech Translation), an end-to-end framework that integrates a frozen Whisper encoder with a decoder-only LLM. The unified architecture dynamically outputs either translation tokens or <WAIT> symbols, jointly optimizing translation timing and content, while target-side reordering addresses word-order divergence. Experiments on the CoVoST2 multilingual corpus (En→De, Zh, Ja) demonstrate significant translation quality improvements across languages and validate the effectiveness of syntactic structures in LLM-driven SimulST systems.
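To make the chunking idea concrete, here is a minimal sketch of one possible boundary rule. It is not the authors' implementation. It assumes the sentence is already dependency-parsed, with each word's head given as a 0-based index (-1 for the root), and it uses a deliberately simple heuristic: close a chunk only when no word in it is still waiting for a head further to the right, and keep trailing punctuation attached to the preceding chunk. The function name `syntax_aware_chunks` and the `PUNCT` set are hypothetical, and the paper's actual rules (for example, keeping verb-object structures together) are richer than this.

```python
from typing import List

# Hypothetical punctuation inventory, used only for this illustration.
PUNCT = {".", ",", "!", "?", ";", ":"}


def syntax_aware_chunks(words: List[str], heads: List[int]) -> List[List[str]]:
    """Split a parsed sentence into chunks at syntactically 'closed' points.

    words: surface tokens in order.
    heads: heads[i] is the 0-based index of the head of words[i], or -1 for the root.
    """
    if len(words) != len(heads):
        raise ValueError("words and heads must have the same length")

    chunks: List[List[str]] = []
    start = 0
    n = len(words)
    for i in range(n):
        # A chunk is still open if some word in it attaches to a head right of i.
        pending = any(heads[j] > i for j in range(start, i + 1))
        # Punctuation forces the chunk closed; otherwise require no pending arcs.
        closed = words[i] in PUNCT or not pending
        # Defer the cut if punctuation follows, so it stays with this chunk.
        next_is_punct = i + 1 < n and words[i + 1] in PUNCT
        if (closed and not next_is_punct) or i == n - 1:
            chunks.append(words[start:i + 1])
            start = i + 1
    return chunks


if __name__ == "__main__":
    words = ["The", "cat", "sat", "on", "the", "mat", "."]
    heads = [1, 2, -1, 5, 5, 2, 2]  # hand-written toy parse, not parser output
    for chunk in syntax_aware_chunks(words, heads):
        print(" ".join(chunk))
```

In a streaming setting a rule of this kind would need an incremental parser, since heads to the right are not yet known when a word arrives. The sketch sidesteps that by working on a complete toy parse.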