EMNLP 2021

Flexible Generation of Natural Language Deductions

Kaj Bostrom, Xinyu Zhao, Swarat Chaudhuri, Greg Durrett

Abstract

An interpretable system for open-domain reasoning needs to express its reasoning process in a transparent form. Natural language is an attractive representation for this purpose: it is both highly expressive and easy for humans to understand. However, manipulating natural language statements in logically consistent ways is hard: models must cope with variation in how meaning is expressed while remaining precise. In this paper, we describe PARAPATTERN, a method for building models to generate deductive inferences from diverse natural language inputs without direct human supervision. We train BART-based models (Lewis et al., 2020) to generate the result of applying a particular logical operation to one or more premise statements. Crucially, we develop a largely automated pipeline for constructing suitable training examples from Wikipedia. We evaluate our models using out-of-domain sentence compositions from the QASC (Khot et al., 2020) and EntailmentBank (Dalvi et al., 2021) datasets as well as targeted perturbation sets. Our results show that our models are substantially more accurate and flexible than baseline systems. PARAPATTERN achieves 85% validity on examples of the 'substitution' operation from EntailmentBank without the use of any in-domain training data, matching the performance of a model fine-tuned for EntailmentBank. The full source code for our method is publicly available.

Substitution

  Premises: Staphylococcus epidermis is a microorganism. Microorganisms colonize the skin surface.
  Paraphrased: Staphylococcus epidermidis is a microorganism. Microbiological colonization of the skin surface. "Staphylococcus Epidermidis is a Microorganism." The skin surface is colonized by micro organisms.
  Conclusion: Staphylococcus epidermis colonizes the skin surface.

  Premises: During the undergraduate years, seminarians learn the ancient language courses. Latin is an ancient language course.
  Paraphrased: The seminars know the ancient language courses. Latin is an old language course. Seminarians learn ancient language during their undergraduate years. Latin is a language.
  Conclusion: During the undergraduate years, seminarians learn Latin.

Contraposition

  Premise: As such, rivers that have headwaters in the mountains provide water for irrigation in the surrounding lands.
  Paraphrased: In order for water to be used in the surrounding lands, the rivers in the mountains must have their headwaters there.
  Conclusion: As such, rivers that do not provide water for irrigation in the surrounding lands do not have headwaters in the mountains.

  Premise: Dogs that are especially dirty or hungry are not able to participate in contests.
  Paraphrased: To participate in a contest, dogs that are dirty or hungry, must be turned away.
  Conclusion: Dogs that are able to participate in contests are not especially dirty or hungry.

Substitution - Control

  Premises: RSA is a cryptographic system. Cryptographic systems let people exchange messages securely.
  Conclusion: RSA lets people exchange messages securely.
  Predicted: RSA lets people exchange messages securely.

Link NP mismatch

  Premises: RSA is a cryptographic system. Encryption protocols let people exchange messages securely.
  Conclusion: RSA lets people exchange messages securely.
  Predicted: RSA allows people to exchange messages securely.

Identity VP mismatch

  Premises: Dominant cryptographic systems include RSA. Cryptographic systems let people exchange messages securely.
  Conclusion: RSA lets people exchange messages securely.
  Predicted: RSA allows people to exchange messages securely.

NP + VP mismatch

  Premises: Dominant encryption protocols include RSA. Cryptographic systems let people exchange messages securely.
  Conclusion: RSA lets people exchange messages securely.
  Predicted: RSA allows people to exchange messages securely.

Number agreement

  Premises: RSA is a cryptographic system. Cryptographic systems shield web traffic from surveillance and let people communicate securely.
  Conclusion: RSA shields web traffic from surveillance and lets people communicate securely.
  Predicted: RSA shields web traffic from surveillance and let people communicate securely.
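
To make the substitution operation and the perturbation sets above more concrete, the following is a minimal sketch of a purely template-based substitution rule. It is an illustrative assumption, not the PARAPATTERN pipeline or the trained BART models: the regular expression, the function name `substitute`, and the naive pluralization and verb inflection are all hypothetical. The sketch handles the control example, returns nothing when the noun phrase or the identity verb phrase is perturbed, and inflects only the first verb, so it makes the same kind of agreement slip as the "Predicted" line of the number agreement example.

```python
import re
from typing import Optional

# Hypothetical toy template (an assumption for illustration, not the paper's code):
# identity premise of the form "X is a Y." or "X is an Y."
IDENTITY = re.compile(r"^(?P<subj>.+?) is an? (?P<cls>.+?)\.$")


def substitute(identity: str, link: str) -> Optional[str]:
    """Apply 'X is a Y.' + 'Ys <predicate>.' -> 'X <predicate>.' by string matching."""
    m = IDENTITY.match(identity.strip())
    if m is None:
        return None  # identity premise does not fit the rigid template
    subj, cls = m.group("subj"), m.group("cls")
    plural = cls + "s"  # naive pluralization, adequate only for regular nouns
    link = link.strip()
    # The link premise must begin with the pluralized class noun phrase.
    if not link.lower().startswith(plural.lower() + " "):
        return None
    predicate = link[len(plural) + 1:]
    verb, _, rest = predicate.partition(" ")
    # Naive third-person singular inflection of the first verb only;
    # verbs later in a coordinated predicate are left untouched.
    return f"{subj} {verb}s {rest}"


if __name__ == "__main__":
    print(substitute(
        "RSA is a cryptographic system.",
        "Cryptographic systems let people exchange messages securely.",
    ))
    print(substitute(
        "RSA is a cryptographic system.",
        "Encryption protocols let people exchange messages securely.",
    ))
```

The rigid matching is the point of the sketch: any rewording of either premise breaks the template, which is the kind of variation the paraphrased training examples and the mismatch perturbation sets are meant to cover.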