ACL 2024

MODDP: A Multi-modal Open-domain Chinese Dataset for Dialogue Discourse Parsing

Chen Gong, Dexin Kong, Suxian Zhao, Xingyu Li, Guohong Fu

Abstract

Dialogue discourse parsing (DDP) aims to capture the relations between utterances in a dialogue. In everyday real-world scenarios, dialogues are typically multi-modal and cover open-domain topics. However, most widely used benchmark datasets for DDP contain only the textual modality and are domain-specific. Without multi-modal clues, it is challenging to understand a dialogue accurately and comprehensively, and domain-specific data prevents these datasets from capturing the discourse structures of the more prevalent daily conversations. This paper proposes MODDP, the first multi-modal Chinese discourse parsing dataset derived from open-domain daily dialogues, consisting of 864 dialogues and 18,114 utterances, accompanied by 12.7 hours of video clips. We present a simple yet effective benchmark approach for multi-modal DDP and, through extensive experiments, report several benchmark results on MODDP. The significant performance improvement obtained by introducing multiple modalities into the original text-only DDP model demonstrates the necessity of integrating multiple modalities into DDP.