EMNLP 2023

Spoiler Detection as Semantic Text Matching

Ryan Tran, Canwen Xu, Julian J. McAuley

Cited 2 times

Abstract

Engaging with discussion of TV shows online often requires individuals to refrain from consuming show-related content for extended periods to avoid spoilers. While existing research on spoiler detection shows promising results in safeguarding viewers from general spoilers, it fails to address the issue of users abstaining from show-related content during their watch. This is primarily because the definition of a spoiler varies depending on the viewer's progress in the show, and conventional spoiler detection methods lack the granularity to capture this complexity. To tackle this challenge, we propose the task of spoiler matching, which involves assigning an episode number to a spoiler given a specific TV show. We frame this task as semantic text matching and introduce a dataset comprised of comments and episode summaries to evaluate model performance. Given the length of each example, our dataset can also serve as a benchmark for long-range language models.

* Equal contribution.

Code and model weights are publicly available at https://github.com/bobotran/spoiler-matching

The data is available at https://huggingface.co/datasets/bobotran/spoiler-matching

Summary: "... After thinking back to Yor's training, Anya uses her "killer move" and throws the ball at Bill. However, the ball hits the ground and bounces toward Bill, who throws the ball right back and hits her. Bill and his team were excited, thinking he was going to get a Stella Star. However, Henry informs them that they do not give out Stella Stars for a simple P.E. game..."

Comments:

Relevant - Same episode: "Anya: 'Finisher strike: Star Catch Arrow!!' Ball: 'nah, i don't really feel like it'"

Relevant - Different episode: "The dog finally has a name. Borf!"

Irrelevant: "This episode was fun. Just joy from start to end."

Irrelevant: "Haven't been this hyped over a dodgeball game since Hunter x Hunter."
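
To make the task framing concrete, the following is a minimal sketch of spoiler matching as text matching: a comment is scored against each episode summary, and the best-scoring episode number is returned, or nothing if the comment matches no summary well. It is not the authors' model. It substitutes a simple bag-of-words cosine similarity for a learned matcher, and the function names, the `threshold` value, and the toy summaries are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_episode(comment, summaries, threshold=0.1):
    """Return the episode number whose summary best matches the comment.

    `summaries` maps episode number -> summary text. Returns None when
    no summary reaches `threshold` (a hypothetical cutoff standing in
    for the "irrelevant" class).
    """
    if not summaries:
        return None
    comment_vec = bag_of_words(comment)
    scores = {ep: cosine(comment_vec, bag_of_words(text))
              for ep, text in summaries.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Toy summaries invented for illustration; they are not taken from the dataset.
toy_summaries = {
    1: "Anya throws the ball at Bill during a dodgeball game",
    2: "The family adopts a dog and gives him a name",
}

print(match_episode("Anya throws the ball and it just bounces", toy_summaries))  # 1
print(match_episode("The dog finally has a name", toy_summaries))                # 2
print(match_episode("Just joy from start to end", toy_summaries))                # None
```

Word overlap is a weak signal here: a comment such as the "Star Catch Arrow" joke in the example above shares almost no vocabulary with its summary, which is the kind of case that motivates semantic rather than lexical matching.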