EMNLP 2025
Improving Online Job Advertisement Analysis via Compositional Entity Extraction
Kai Krüger, Johanna Binnewitt, Kathrin Ehmann, Stefan Winnige, Alan Akbik
Abstract
We propose a compositional entity modeling framework for requirement extraction from Online Job Advertisements (OJAs). To more accurately capture the structure of requirements in OJAs, we reframe the task from identifying single-span annotations to modeling complex, tree-like structures that connect atomic entity types via typed relationships. Based on this schema, we introduce GOJA, a high-quality dataset of 500 German job ads. GOJA captures the internal semantics of job requirements, including roles, tools, experience levels, attitudes, and their functional context. We describe the annotation process, report strong inter-annotator agreement, and benchmark transformer models to demonstrate the feasibility of training on this structure. To illustrate the analytical potential of our approach, we present a focused case study on AI-related job requirements. We show how our proposed compositional representation enables new types of labor market analyses.
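As a rough illustration of what a tree-like requirement built from atomic entities and typed relationships might look like in code, consider the minimal sketch below. The entity labels (`Experience`, `Task`, `Tool`), the relation names (`experience_in`, `uses`), and the English example phrase are hypothetical placeholders chosen for readability. They are not taken from the GOJA schema or data.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An atomic typed span; label names here are illustrative, not GOJA's."""
    label: str
    text: str
    # Each child is stored with the typed relation that links it to this node.
    children: list[tuple[str, "Entity"]] = field(default_factory=list)

    def attach(self, relation: str, child: "Entity") -> "Entity":
        """Link a child entity to this one via a typed relation."""
        self.children.append((relation, child))
        return self

    def flatten(self) -> list[tuple[str, str, str]]:
        """Return (parent text, relation, child text) triples, depth-first."""
        triples = []
        for relation, child in self.children:
            triples.append((self.text, relation, child.text))
            triples.extend(child.flatten())
        return triples


# Hypothetical requirement:
# "several years of experience in developing ML models with PyTorch"
root = Entity("Experience", "several years of experience")
task = Entity("Task", "developing ML models")
tool = Entity("Tool", "PyTorch")
root.attach("experience_in", task)
task.attach("uses", tool)

triples = root.flatten()
for parent, relation, child in triples:
    print(f"{parent} --{relation}--> {child}")
```

Under these assumptions, a single-span annotation would keep only one flat label for the whole phrase, whereas the tree keeps the experience level, the task, and the tool as separate nodes that can be queried individually, for example by counting how often a given tool appears beneath an experience requirement.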