NeurIPS 2023

Two-Stage Learning to Defer with Multiple Experts

Anqi Mao, Christopher Mohri, Mehryar Mohri, Yutao Zhong

72 citations

Abstract

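We study a two-stage scenario for learning to defer with multiple experts, which is crucial in practice for many applications. In this scenario, a predictor is derived in a first stage by training with a common loss function such as cross-entropy. In the second stage, a deferral function is learned to assign the most suitable expert to each input. We design a new family of surrogate loss functions for this scenario both in the score-based and the predictor-rejector settings and prove that they are supported by H-consistency bounds, which implies their Bayes-consistency. Moreover, we show that, for a constant cost function, our two-stage surrogate losses are realizable H-consistent. While the main focus of this work is a theoretical analysis, we also report the results of several experiments on CIFAR-10 and SVHN datasets.

Surrogate loss $L_{h_p}$ for each choice of $\ell$:

$\ell_{\rm exp}$:
$$\mathbb{1}_{h_p(x) = y} \sum_{i=1}^{n_e} e^{h(x, n+i) - \max_{y \in \mathcal{Y}} h(x, y)} + \sum_{j=1}^{n_e} c_j(x, y) \left[ \sum_{i=1, i \neq j}^{n_e} e^{h(x, n+i) - h(x, n+j)} + e^{\max_{y \in \mathcal{Y}} h(x, y) - h(x, n+j)} \right]$$

$\ell_{\rm log}$:
$$-\mathbb{1}_{h_p(x) = y} \log\left( \frac{e^{\max_{y \in \mathcal{Y}} h(x, y)}}{e^{\max_{y \in \mathcal{Y}} h(x, y)} + \sum_{i=1}^{n_e} e^{h(x, n+i)}} \right) - \sum_{j=1}^{n_e} c_j(x, y) \log\left( \frac{e^{h(x, n+j)}}{e^{\max_{y \in \mathcal{Y}} h(x, y)} + \sum_{i=1}^{n_e} e^{h(x, n+i)}} \right)$$

$\ell_{\rm gce}$:
$$\mathbb{1}_{h_p(x) = y} \frac{1}{\alpha} \left[ 1 - \left( \frac{e^{\max_{y \in \mathcal{Y}} h(x, y)}}{e^{\max_{y \in \mathcal{Y}} h(x, y)} + \sum_{i=1}^{n_e} e^{h(x, n+i)}} \right)^{\alpha} \right] + \sum_{j=1}^{n_e} c_j(x, y) \frac{1}{\alpha} \left[ 1 - \left( \frac{e^{h(x, n+j)}}{e^{\max_{y \in \mathcal{Y}} h(x, y)} + \sum_{i=1}^{n_e} e^{h(x, n+i)}} \right)^{\alpha} \right]$$

$\ell_{\rm mae}$:
$$\mathbb{1}_{h_p(x) = y} \left[ 1 - \frac{e^{\max_{y \in \mathcal{Y}} h(x, y)}}{e^{\max_{y \in \mathcal{Y}} h(x, y)} + \sum_{i=1}^{n_e} e^{h(x, n+i)}} \right] + \sum_{j=1}^{n_e} c_j(x, y) \left[ 1 - \frac{e^{h(x, n+j)}}{e^{\max_{y \in \mathcal{Y}} h(x, y)} + \sum_{i=1}^{n_e} e^{h(x, n+i)}} \right]$$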
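To make the log-type entry of the loss family concrete, the following is a minimal NumPy sketch for a single example. It treats the score for "do not defer" as the maximum label score and places it in a softmax with the $n_e$ expert scores, as in the log-type formula. The function names, argument names, and the argmax decision rule are illustrative assumptions rather than the authors' code, and the weights `costs` are taken exactly as the $c_j(x, y)$ terms appear in the formula; how they relate to each expert's error is defined in the body of the paper, which is not reproduced here.

```python
import numpy as np


def _deferral_logits(label_scores, expert_scores):
    """Stack the 'no deferral' score (max over label scores) with the expert scores."""
    label_scores = np.asarray(label_scores, dtype=float)
    expert_scores = np.asarray(expert_scores, dtype=float)
    return np.concatenate(([np.max(label_scores)], expert_scores))


def two_stage_log_surrogate(label_scores, expert_scores, predictor_correct, costs):
    """Log-type two-stage deferral surrogate for one example (illustrative sketch).

    label_scores:      shape (n,),   scores h(x, y) over the labels
    expert_scores:     shape (n_e,), scores h(x, n + j) for the experts
    predictor_correct: bool, the indicator 1_{h_p(x) = y}
    costs:             shape (n_e,), the weights c_j(x, y) as written in the formula
    """
    logits = _deferral_logits(label_scores, expert_scores)
    # Numerically stable log-softmax over [no deferral, expert 1, ..., expert n_e].
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    weights = np.concatenate(([float(predictor_correct)], np.asarray(costs, dtype=float)))
    # Weighted negative log-likelihood: index 0 is weighted by the indicator,
    # index j by c_j(x, y).
    return float(-np.dot(weights, log_probs))


def deferral_decision(label_scores, expert_scores):
    """Assumed decision rule: 0 keeps the predictor, j >= 1 defers to expert j."""
    return int(np.argmax(_deferral_logits(label_scores, expert_scores)))
```

For instance, `two_stage_log_surrogate([0.0, 0.0], [0.0], True, [0.5])` puts equal mass on the two options, so the loss is `1.5 * log(2)`. Averaging this quantity over a training set while updating only the expert scores and the deferral part of the model would correspond to the second stage described in the abstract, under the assumptions stated above.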