EMNLP 2025

PoseStitch-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation

Abhinav Joshi, Vaibhav Sharma, Sanjeet Singh, Ashutosh Modi

Abstract

Sign language translation remains a challenging task due to the scarcity of large-scale, sentence-aligned datasets. Prior work has focused on feature extraction and architectural changes to support neural machine translation for sign languages. We propose POSESTITCH-SLT, a novel pre-training scheme inspired by linguistic-template-based sentence generation. In translation experiments on two sign language datasets, How2Sign and iSign, we show that a simple transformer-based encoder-decoder architecture outperforms the prior art when template-generated sentence pairs are included in training. We improve BLEU-4 from 1.97 to 4.56 on How2Sign and from 0.55 to 3.43 on iSign, surpassing prior state-of-the-art methods for pose-based gloss-free translation. The results demonstrate the effectiveness of template-driven synthetic supervision in low-resource sign language settings.

[Figure: overview of POSESTITCH-SLT, panels (a) to (d). Recoverable labels: BLIMP linguistic templates and a common vocabulary (CISLR, BLIMP, WLASL) produce a generated sentence ("I hope you're having fun"); per-frame video keypoints (face, hands, body) form a concatenated pose vector; pose stitching produces a sequence of stitched poses; a transformer architecture outputs English tokens ("I hope you're having fun.").]
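The template-and-stitch idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`make_frame`, `fill_template`, `stitch_poses`), the `pose_bank` dictionary, the toy template, and all keypoint values are hypothetical, and the sketch assumes that stitching amounts to concatenating word-level pose sequences along the time axis.

```python
import random

def make_frame(face, hands, body):
    """Concatenate per-part keypoints into one flat pose vector for a frame."""
    return list(face) + list(hands) + list(body)

def fill_template(template, slots, rng):
    """Fill every slot name in the template with a word sampled from its list."""
    return [rng.choice(slots[slot]) for slot in template]

def stitch_poses(words, pose_bank):
    """Concatenate the word-level pose sequences in sentence order."""
    stitched = []
    for word in words:
        stitched.extend(pose_bank[word])
    return stitched

# Hypothetical toy frame: 2 face + 2 hand + 2 body values (real data is far larger).
frame = make_frame([0.1, 0.2], [0.3, 0.4], [0.5, 0.6])

# Hypothetical word-level pose bank: each word maps to a list of frames.
pose_bank = {
    "teacher": [frame] * 2,
    "student": [frame] * 4,
    "reads": [frame] * 3,
    "book": [frame] * 1,
}

# Hypothetical template and slot vocabulary (assumed, not taken from the paper).
template = ["SUBJ", "VERB", "OBJ"]
slots = {"SUBJ": ["teacher", "student"], "VERB": ["reads"], "OBJ": ["book"]}

rng = random.Random(0)
words = fill_template(template, slots, rng)   # target-side sentence tokens
stitched = stitch_poses(words, pose_bank)     # source-side pose sequence
sentence = " ".join(words)
```

Each resulting (`stitched`, `sentence`) pair would serve as one synthetic training example for the encoder-decoder model; how the real system samples templates and handles word boundaries is not specified in this section.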