KDD 2022

BE3R: BERT based Early-Exit Using Expert Routing

Sourab Mangrulkar, Ankith M. S, Vivek Sembium

Abstract

Pre-trained language models like BERT have reported state-of-the-art performance on several Natural Language Processing (NLP) tasks, but their high computational demands hinder widespread adoption for large-scale NLP tasks. In this work, we propose a novel routing-based early-exit model called BE3R (BERT based Early-Exit using Expert Routing), which learns to exit dynamically at earlier layers without traversing the entire model. Unlike existing early-exit methods, our approach can be extended to a batch inference setting. We consider the specific application of search relevance filtering in Amazon India marketplace services (a large e-commerce website). Our experimental results show that BE3R improves batch inference throughput by 46.5% over the BERT-Base model and by 35.89% over the DistilBERT-Base model on a large dataset of 50 million samples, with no trade-off on the performance metric. We conduct thorough experiments with various architectural choices and loss functions, and perform a qualitative analysis. We also run experiments on the public GLUE benchmark and demonstrate performance comparable to the corresponding baseline models, with a 23% average throughput improvement across tasks in the batch inference setting.
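To make the batch-inference idea concrete, the following is a minimal sketch of routing-based early exit, not the BE3R implementation. The abstract does not describe the router or the exit heads, so everything here is an assumption made for illustration: the names `batched_early_exit`, `toy_router`, `toy_layer` and `toy_head` are hypothetical, the "layers" are toy functions rather than transformer blocks, and the router is assumed to pick an exit depth for each sample up front so that samples sharing a depth can be processed together.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def batched_early_exit(
    inputs: Sequence[Vector],
    layers: Sequence[Callable[[Vector], Vector]],
    exit_heads: Sequence[Callable[[Vector], float]],
    router: Callable[[Vector], int],
) -> Tuple[List[float], int]:
    """Group samples by the exit layer chosen by the router, then run each
    group only through the layers up to and including its exit layer.

    Returns the per-sample outputs (in input order) and the total number of
    per-sample layer evaluations, as a rough proxy for compute.
    """
    # Assumption: the routing decision is made before the layers are run,
    # so samples with the same exit depth can form one sub-batch.
    groups = defaultdict(list)
    for idx, x in enumerate(inputs):
        groups[router(x)].append(idx)

    outputs: List[float] = [0.0] * len(inputs)
    layer_evals = 0
    for exit_idx, members in groups.items():
        hidden = [inputs[i] for i in members]
        for layer in layers[: exit_idx + 1]:
            hidden = [layer(h) for h in hidden]
            layer_evals += len(members)
        # Each exit layer has its own (hypothetical) prediction head.
        for i, h in zip(members, hidden):
            outputs[i] = exit_heads[exit_idx](h)
    return outputs, layer_evals


# Toy stand-ins, purely illustrative.
def toy_layer(h: Vector) -> Vector:
    return [v + 1.0 for v in h]


def toy_head(h: Vector) -> float:
    return sum(h)


def toy_router(x: Vector) -> int:
    # Hypothetical rule: short inputs exit after the first layer,
    # longer inputs traverse all four layers.
    return 0 if len(x) <= 2 else 3


layers = [toy_layer] * 4
heads = [toy_head] * 4
batch = [[1.0], [1.0, 2.0, 3.0]]
outputs, layer_evals = batched_early_exit(batch, layers, heads, toy_router)
print(outputs, layer_evals)
```

In this toy batch, running both samples through all four layers would cost eight per-sample layer evaluations, while the routed version uses five. Any throughput gain in a real model depends on how the router distributes samples across exits and on how efficiently the resulting sub-batches run on the hardware.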