KDD2020

Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

Xin Luna Dong, Hannaneh Hajishirzi, Colin Lockard, Prashant Shiralkar

Cited by 8

Abstract

How do we surface the large amount of information present in HTML documents on the Web, from news articles to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications, including knowledge base construction, question answering, and recommendation. In this tutorial, we present approaches for information extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity of data modalities leveraged, e.g., text, visual, and XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision.