ICLR 2025

Optimal Learning of Kernel Logistic Regression for Complex Classification Scenarios

Hongwei Wen, Annika Betken, Hanyuan Hang

Abstract

Problem Setting

• Standard classification scenario: we observe i.i.d. data $D := (X_i, Y_i)_{i=1}^{n}$ drawn from an unknown distribution $P$, where $X_i$ denotes the input and $Y_i$ the output. The goal is to predict the output $Y$ for a given input $X$.

• Complex classification scenarios: labeled samples $D_p := (X_i, Y_i)_{i=1}^{n_p}$ are drawn from a distribution $P$, while inference is required for a different distribution $Q$ on the same space.

• Label shift assumption: the two distributions $P$ and $Q$ share the same class-conditional distribution of the input but have different class probabilities, i.e., $p(x|y) = q(x|y)$ but $p(y) \neq q(y)$.

• Goals: to construct an estimator $\hat{q}(y|x)$ of the class conditional probability (CCP) $q(y|x)$ of the target distribution $Q$. The estimator relies on the class probability ratio $w^* := (w_y^*)_{y \in [K]}$ between $q(y)$ and $p(y)$, given by $w_y^* := q(y)/p(y)$, $y \in [K]$. We then induce the plug-in classifier $\arg\max_{k \in [K]} \hat{q}(k|x)$.