ICLR2025

Optimal Learning of Kernel Logistic Regression for Complex Classification Scenarios

Hongwei Wen, Annika Betken, Hanyuan Hang

Abstract

Problem Setting

• Standard Classification Scenario: we observe i.i.d. data $D := (X_i, Y_i)_{i=1}^{n}$ drawn from an unknown distribution $P$, where $X_i$ denotes the input and $Y_i$ the output. The goal is to predict the output $Y$ for a given input $X$.
• Complex Classification Scenarios: we observe labeled samples $D_p := (X_i, Y_i)_{i=1}^{n_p}$ drawn from a distribution $P$, while inference is required for a different distribution $Q$ on the same space.
• Label shift assumption: the two distributions $P$ and $Q$ share the same conditional probability but have different class probabilities, i.e., $p(x|y) = q(x|y)$ but $p(y) \neq q(y)$.
• Goals: to estimate the class conditional probability (CCP) $q(y|x)$ by an estimator $\hat{q}(y|x)$, where the class probability ratio $w^* := (w_y^*)_{y \in [K]}$ between $q(y)$ and $p(y)$ is given by $w_y^* := q(y)/p(y)$, $y \in [K]$. We then induce the plug-in classifier defined as $\arg\max_{y \in [K]} \hat{q}(y|x)$.
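To make the role of the class probability ratio concrete, the sketch below applies the standard Bayes-rule identity that holds under label shift, $q(y|x) = w_y^* p(y|x) / \sum_{k \in [K]} w_k^* p(k|x)$, and then takes the plug-in argmax. It is only an illustration: the function names `reweight_posterior` and `plug_in_classifier` and all numbers are hypothetical, the source posterior is supplied by hand instead of being fitted by kernel logistic regression, and the ratio is computed from known class probabilities, which may differ from how the authors build their estimator.

```python
from typing import List, Sequence


def reweight_posterior(p_post: Sequence[float], w: Sequence[float]) -> List[float]:
    """Map source posteriors p(y|x) to target posteriors q(y|x) under label shift.

    Uses q(y|x) = w_y * p(y|x) / sum_k w_k * p(k|x), with w_y = q(y) / p(y).
    """
    if len(p_post) != len(w):
        raise ValueError("p_post and w must have one entry per class")
    unnormalized = [w_y * p_y for w_y, p_y in zip(w, p_post)]
    total = sum(unnormalized)
    if total <= 0:
        raise ValueError("reweighted posterior has zero total mass")
    return [u / total for u in unnormalized]


def plug_in_classifier(posterior: Sequence[float]) -> int:
    """Return the 0-based class index with the largest posterior value."""
    return max(range(len(posterior)), key=lambda y: posterior[y])


# Toy example with K = 2 classes (all numbers are illustrative assumptions).
p_y = [0.8, 0.2]                        # source class probabilities p(y)
q_y = [0.2, 0.8]                        # target class probabilities q(y)
w = [q / p for q, p in zip(q_y, p_y)]   # class probability ratio w_y = q(y)/p(y)

p_post = [0.6, 0.4]                     # source posterior p(y|x) at some input x
q_post = reweight_posterior(p_post, w)  # corrected posterior q(y|x)

print("source prediction:", plug_in_classifier(p_post))
print("target prediction:", plug_in_classifier(q_post))
```

In this toy case the predicted class flips after reweighting, because the second class is rare under $P$ but common under $Q$; a classifier trained only on source data would otherwise under-predict it.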