ACL 2025

Fairness Beyond Performance: Revealing Reliability Disparities Across Groups in Legal NLP

T. Y. S. S. Santosh, Irtiza Chowdhury

Abstract

Fairness in NLP must extend beyond performance parity to encompass equitable reliability across groups. This study exposes a critical blind spot: models often make less reliable or overconfident predictions for marginalized groups, even when overall performance appears fair. Using the FairLex benchmark as a case study in legal NLP, we systematically evaluate both performance and reliability disparities across demographic, regional, and legal attributes spanning four jurisdictions. We show that domain-specific pre-training consistently improves both performance and reliability, especially for underrepresented groups. However, common bias mitigation methods frequently worsen reliability disparities, revealing a trade-off not captured by performance metrics alone. Our results call for a rethinking of fairness in high-stakes NLP: to ensure equitable treatment, models must not only be accurate but also reliably self-aware across all groups.