NeurIPS 2020

CogLTX: Applying BERT to Long Texts

Ming Ding, Chang Zhou, Hongxia Yang, Jie Tang

154 citations

Abstract

BERT is incapable of processing long texts due to its quadratically increasing memory and time consumption. The most natural ways to address this problem, such as slicing the text by a sliding window or simplifying transformers, suffer from insufficient long-range attentions or need customized CUDA kernels. The maximum length limit in BERT reminds us of the limited capacity (5∼9 chunks) of the working memory of humans. Then how do human beings Cognize Long TeXts? Founded on the cognitive theory stemming from Baddeley [2], the proposed CogLTX framework identifies key sentences by training a judge model, concatenates them for reasoning, and enables multi-step reasoning via rehearsal and decay. Since relevance annotations are usually unavailable, we propose to use interventions to create supervision. As a general algorithm, CogLTX outperforms or gets comparable results to SOTA models on various downstream tasks with memory overheads independent of the length of text.

[Figure: CogLTX applied to three types of tasks. In each panel a BERT reasoner reads the key text z that MemRecall extracts from the long text x. (a) Span extraction: the input is [CLS] Q [SEP] z with z = MemRecall([Q], x), and the outputs are yes/no and a start/end span. Example Q: "Who is the director of the 2003 film which has scenes in it filmed at the Quality Cafe in Los Angeles?" (b) Text classification: the input is [CLS] [SEP] z with z = MemRecall([], x), and an MLP outputs a probability for each class. (c) Token-wise tasks: x is decomposed into sub-sequences x[0] … x[n], the input is [CLS] x[i] [SEP] z with z = MemRecall([x[i]], x), and the output is a token-wise result for x[i] (e.g. NN IN DT).]
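The retrieve, rehearse and decay loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The names `toy_judge` and `mem_recall`, the word-overlap score standing in for the trained judge model, the word-count budget standing in for BERT's token limit, and the example blocks are all assumptions made for illustration.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a crude proxy for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_judge(memory_text: str, block: str) -> float:
    """Hypothetical stand-in for the trained judge: word overlap with memory."""
    block_tokens = tokens(block)
    if not block_tokens:
        return 0.0
    return len(block_tokens & tokens(memory_text)) / len(block_tokens)


def mem_recall(query: str, blocks: List[str], budget: int,
               steps: int = 2, keep: int = 1) -> List[str]:
    """Pick blocks whose total word count fits in `budget` (assumed interface)."""
    kept: List[int] = []      # blocks that survived decay in earlier steps
    selected: List[int] = []  # blocks currently in working memory
    for step in range(steps):
        memory_text = " ".join([query] + [blocks[i] for i in kept])

        def score(i: int) -> float:
            return toy_judge(memory_text, blocks[i])

        # Retrieval: rank the remaining blocks against the current memory.
        candidates = sorted(
            (i for i in range(len(blocks)) if i not in kept),
            key=score, reverse=True,
        )
        selected = list(kept)
        used = sum(len(blocks[i].split()) for i in selected)
        for i in candidates:
            cost = len(blocks[i].split())
            if used + cost <= budget:
                selected.append(i)
                used += cost
        if step < steps - 1:
            # Rehearsal and decay: rescore what is in memory, keep the strongest.
            selected.sort(key=score, reverse=True)
            kept = selected[: keep * (step + 1)]
    # Return the chosen blocks in their original document order.
    return [blocks[i] for i in sorted(selected)]


question = ("Who is the director of the 2003 film which has scenes "
            "filmed at the Quality Cafe?")
blocks = [
    "The Quality Cafe is a diner seen in the film Old School",
    "Bananas are rich in potassium",
    "Old School was directed by Todd Phillips",
    "Los Angeles has many famous diners",
]
two_step = mem_recall(question, blocks, budget=19, steps=2)
one_step = mem_recall(question, blocks, budget=19, steps=1)
print(two_step)
print(one_step)
```

In this toy setting, the block that names the director shares no words with the question, so a single retrieval pass skips it. Once the first block is kept in memory, the second pass can match on "Old School" and retrieve it. The size of the working memory is fixed by `budget`, whatever the number of blocks, but the sketch still scores every block at every step.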