KDD2023

A Sequence-to-Sequence Approach with Mixed Pointers to Topic Segmentation and Segment Labeling

Jinxiong Xia, Houfeng Wang

3 citations

Abstract

Topic segmentation is the process of dividing a text into semantically coherent segments, and segment labeling involves assigning a topic label to each of these segments. Previous work on this task has included the use of sequence labeling, segment-extraction, and generative models. While these methods have yielded impressive results, existing generative models have struggled to accurately generate strings of segment boundaries, limiting their competitiveness in this area. In this paper, we present a novel Sequence-to-Sequence approach with Mixed Pointers (Seq2Seq-MP). Seq2Seq-MP employs an encoder-decoder architecture with the pointer mechanism to generate both segment boundaries and topics, which allows for a more robust performance than string-generation models and can handle long-range dependencies better than sequence labeling and segment-extraction models. Additionally, we introduce the pairwise type encoding and type-aware relative position encoding to improve the fusion of type and position information, enhancing the interactions between sentences and topics in the encoder and decoder. Our experiments on public datasets show that Seq2Seq-MP outperforms the current state-of-the-art, with up to 2.9% and 4.0% improvements in Pk and F1, respectively.