NeurIPS 2022
A Survey and Datasheet Repository of Publicly Available US Criminal Justice Datasets
Miri Zilka, Bradley Butcher, Adrian Weller
Cited by 11
Abstract
Predictive tools are becoming widely used in police, courts, and prison systems worldwide. Criminal justice is thus an increasingly important application domain for machine learning and algorithmic fairness. A few benchmark datasets have received significant attention—e.g., COMPAS [1]—but often without proper consideration of the domain context [2]. We conduct a survey of publicly available criminal justice datasets, highlight their potential uses, discuss context, and identify limitations and gaps in the current landscape. We provide datasheets [3] for 15 datasets, and make them available via a public repository. We compare the surveyed datasets across several dimensions, including size, population coverage, and potential use, highlighting possible concerns. We hope this work provides a useful starting point for researchers looking for appropriate datasets related to criminal justice, and wish to further grow the repository in a broader community effort.