AAAI2023
Med-EASi: Finely Annotated Dataset and Models for Controllable Simplification of Medical Texts
Chandrayee Basu, Rosni Vasu, Michihiro Yasunaga, Qian Yang
29 citations
Abstract
Automatic medical text simplification can assist providers with patient-friendly communication and make medical texts more accessible, thereby improving health literacy. But curating a quality corpus for this task requires the supervision of medical experts. In this work, we present Med-EASi (Medical dataset for Elaborative and Abstractive Simplification), a uniquely crowdsourced and finely annotated dataset for supervised simplification of short medical texts. Its expert-layman-AI collaborative annotations facilitate controllability over text simplification by marking four kinds of textual transformations: elaboration, replacement, deletion, and insertion. To learn medical text simplification, we fine-tune T5-large with four different styles of input-output combinations, leading to two control-free and two controllable versions of the model. We add two types of controllability into text simplification, by using a multi-angle training approach: position-aware, which uses in-place annotated inputs and outputs, and position-agnostic, where the model only knows the contents to be edited, but not their positions. Our results show that our fine-grained annotations improve learning compared to the unannotated baseline. Furthermore, our position-aware control enhances the model's ability to generate better simplification than the position-agnostic version. The data and code are available at https://github.com/Chandrayee/CTRL-SIMP.