EMNLP 2021

Language Models are Few-Shot Butlers

Vincent Micheli, François Fleuret


Abstract

Pretrained language models demonstrate strong performance in most NLP tasks when fine-tuned on small task-specific datasets. Hence, these autoregressive models constitute ideal agents to operate in text-based environments where language understanding and generative capabilities are essential. Nonetheless, collecting expert demonstrations in such environments is a time-consuming endeavour. We introduce a two-stage procedure to learn from a small set of demonstrations and further improve by interacting with an environment. We show that language models fine-tuned with only 1.2% of the expert demonstrations and a simple reinforcement learning algorithm achieve a 51% absolute improvement in success rate over existing methods in the ALFWorld environment.

Goal: Rinse the egg to put it in the microwave.
Obs: Looking quickly around you, you see a cabinet, a garbagecan, a coffeemachine, [...], a stoveburner, a sinkbasin and a microwave.
Action: go to sinkbasin
Obs: You arrive at sinkbasin. You see a butterknife, a potato, a spoon and a tomato.
Action: go to garbagecan
Obs: You arrive at garbagecan. You see an egg.
Action: take egg from garbagecan
Obs: You pick up the egg from the garbagecan.
Action: go to sinkbasin
Obs: You arrive at sinkbasin. You see a butterknife, a potato, a spoon and a tomato.
Action: clean egg with sinkbasin
Obs: You clean the egg using the sinkbasin.
Action: go to microwave
Obs: You arrive at microwave. The microwave is closed.
Action: open microwave
Obs: You open the microwave.
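The Goal/Obs/Action transcript above suggests how an expert demonstration could be turned into training text for an autoregressive model in the first, learning-from-demonstrations stage. The following is a minimal sketch under stated assumptions, not the authors' preprocessing: the helper names `parse_turns` and `cloning_pairs` are hypothetical, and the choice to condition each action on the full preceding history is an assumption made for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical helpers for illustration; the paper's actual pipeline may differ.
# A turn starts with "Goal:", "Obs:" or "Action:" and runs until the next marker.
TURN = re.compile(
    r"(Goal|Obs|Action):\s*(.*?)(?=\s*(?:Goal|Obs|Action):|$)", re.S
)


def parse_turns(transcript: str) -> List[Tuple[str, str]]:
    """Split a flat transcript into (kind, text) turns, in order."""
    return [(kind, text.strip()) for kind, text in TURN.findall(transcript)]


def cloning_pairs(turns: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Build (context, expert_action) pairs, one per Action turn.

    Assumption: the context is the whole history so far, ending with an
    "Action:" cue that the model is asked to complete.
    """
    pairs = []
    history: List[str] = []
    for kind, text in turns:
        if kind == "Action":
            pairs.append((" ".join(history) + " Action:", text))
        history.append(f"{kind}: {text}")
    return pairs


# Verbatim opening of the example trajectory, elision kept as in the source.
EXCERPT = (
    "Goal: Rinse the egg to put it in the microwave. "
    "Obs: Looking quickly around you, you see a cabinet, a garbagecan, "
    "a coffeemachine, [...], a stoveburner, a sinkbasin and a microwave. "
    "Action: go to sinkbasin "
    "Obs: You arrive at sinkbasin. You see a butterknife, a potato, "
    "a spoon and a tomato. "
    "Action: go to garbagecan"
)

pairs = cloning_pairs(parse_turns(EXCERPT))
for context, action in pairs:
    print(repr(action), "<-", len(context), "context characters")
```

Each pair could then serve as a prompt and target for fine-tuning. Given the current history ending in "Action:", the same serialization could also be used to query the model during the later interaction stage, although how the authors do this is not stated in the abstract.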