NeurIPS 2020
CogLTX: Applying BERT to Long Texts
Ming Ding, Chang Zhou, Hongxia Yang, Jie Tang
Cited 154 times
Abstract
BERT is incapable of processing long texts because its memory and time consumption grow quadratically with input length. The most natural ways to address this problem, such as slicing the text with a sliding window or simplifying transformers, either suffer from insufficient long-range attention or need customized CUDA kernels. The maximum length limit in BERT reminds us of the limited capacity (5 to 9 chunks) of human working memory. How, then, do human beings Cognize Long TeXts? Founded on the cognitive theory stemming from Baddeley [2], the proposed CogLTX framework identifies key sentences by training a judge model, concatenates them for reasoning, and enables multi-step reasoning via rehearsal and decay. Since relevance annotations are usually unavailable, we propose to use interventions to create supervision. As a general algorithm, CogLTX outperforms or achieves results comparable to SOTA models on various downstream tasks, with memory overheads independent of the length of the text.

[Figure: CogLTX applied to three types of tasks; in each panel, MemRecall selects key blocks z from the long text x and a BERT reasoner consumes them. (1) Span extraction, e.g. Q: "Who is the director of the 2003 film which has scenes in it filmed at the Quality Cafe in Los Angeles?": MemRecall([Q], x); the reasoner reads [CLS] Q [SEP] z and predicts yes/no and the start/end span. (2) Sequence-level tasks: MemRecall([], x); the reasoner reads [CLS] [SEP] z and an MLP outputs the probability for each class. (3) Token-wise tasks: x is decomposed into sub-sequences x[0] … x[n]; MemRecall([x[i]], x); the reasoner reads [CLS] x[i] [SEP] z and outputs the token-wise result for x[i], e.g. POS tags NN, IN, DT.]
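The selection procedure outlined in the abstract (score blocks with a judge, concatenate the best ones within a fixed budget, then repeat with rehearsal and decay) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `content_words`, `overlap_judge`, and `mem_recall`, the word-count budget, and the `steps` and `keep_per_step` parameters are all assumptions made for illustration. In particular, `overlap_judge` is a toy word-overlap score standing in for the trained BERT judge, and the toy blocks are shortened from the question-answering example in the figure.

```python
from typing import Callable, List, Sequence


def content_words(text: str) -> List[str]:
    """Lowercase words of length >= 4 (a crude stopword filter, assumption)."""
    words = (w.strip(".,?!\"'").lower() for w in text.split())
    return [w for w in words if len(w) >= 4]


def overlap_judge(key: Sequence[str], block: str) -> float:
    """Toy stand-in for the trained judge: share of the block's content
    words that also occur in the key blocks."""
    key_words = {w for b in key for w in content_words(b)}
    words = content_words(block)
    if not words:
        return 0.0
    return sum(w in key_words for w in words) / len(words)


def mem_recall(query: Sequence[str], blocks: Sequence[str],
               judge: Callable[[Sequence[str], str], float],
               budget: int, steps: int = 2, keep_per_step: int = 1) -> List[str]:
    """Simplified multi-step selection; `budget` counts whitespace-separated
    words and stands in for BERT's maximum input length."""
    kept: List[int] = []      # block indices that survive between steps
    selected: List[int] = []
    for _ in range(steps):
        key = list(query) + [blocks[i] for i in kept]
        used = sum(len(b.split()) for b in key)
        # Retrieval: rank the remaining blocks against the current key blocks.
        candidates = [i for i in range(len(blocks)) if i not in kept]
        candidates.sort(key=lambda i: judge(key, blocks[i]), reverse=True)
        selected = list(kept)
        for i in candidates:
            cost = len(blocks[i].split())
            if used + cost <= budget:
                selected.append(i)
                used += cost

        # Rehearsal: re-score each new block with the other selected blocks
        # as context.
        def rehearsal_score(i: int) -> float:
            context = list(query) + [blocks[j] for j in selected if j != i]
            return judge(context, blocks[i])

        new = [i for i in selected if i not in kept]
        new.sort(key=rehearsal_score, reverse=True)
        # Decay: only the strongest new blocks are carried to the next step.
        kept = kept + new[:keep_per_step]
    # Return the selected blocks in their original document order.
    return [blocks[i] for i in sorted(selected)]


question = ("Who is the director of the 2003 film which has scenes in it "
            "filmed at the Quality Cafe in Los Angeles?")
blocks = [
    "The Quality Cafe is a now-defunct diner featured in a number of "
    "Hollywood films, including Training Day and Old School.",
    "Old School is a 2003 American comedy film directed by Todd Phillips.",
    "Confidence in the pound is widely expected to take another sharp dive.",
]
z = mem_recall([question], blocks, overlap_judge, budget=55)
print(z)
```

With these toy inputs the sketch keeps the two film-related blocks and drops the unrelated one. Because the budget is fixed, the input passed to the reasoner stays the same size however long the original text is, which is the property the abstract describes as memory overhead independent of text length.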