ACL2021

IgSEG: Image-guided Story Ending Generation

Qingbao Huang, Chuan Huang, Linzhang Mo, Jielong Wei, Yi Cai, Ho-fung Leung, Qing Li

Abstract

In this work, we propose a new task called Image-guided Story Ending Generation (IgSEG). Given a multi-sentence story plot and an ending-related image, IgSEG aims to generate a story ending that conforms to the contextual logic and the relevant visual concepts. In contrast to the story ending generation task, which generates open-ended endings, the major challenges of IgSEG are to comprehend the given context and image sufficiently, and mine the appropriate semantics from the image to make the generated story ending informative, reasonable, and coherent. To address the challenges, we propose a Multi-layer Graph convolution and Cascade-LSTM (MGCL) based model which mainly comprises of two collaborative modules: i) a multi-layer graph convolutional network to learn the dependency relations of sentences and the logical clue of the context; ii) a multiple context-image attention module to generate the story endings by gradually incorporating textual and visual semantic concepts. Our MGCL is thus capable of building logically consistent and semantically rich story endings. To evaluate the proposed model, we modify the existing VIST dataset to obtain the VIST-Ending dataset. Empirically, our MGCL outperforms all the strong baselines on both automatic and human evaluation.