ICLR 2026

Counterfactual Explanations on Robust Perceptual Geodesics

Eslam Zaher, Dr Maciej Trzaskowski, Quan Nguyen, Fred Roosta

Cited by 1

Abstract

Latent-space optimization methods for counterfactual explanations—framed as minimal semantic perturbations that change model predictions—inherit the ambiguity of Wachter et al.’s objective: the choice of distance metric dictates whether perturbations are meaningful or adversarial. Existing approaches adopt flat or misaligned geometries, leading to off-manifold artifacts, semantic drift, or adversarial collapse. We introduce Perceptual Counterfactual Geodesics (PCG), a method that constructs counterfactuals by tracing geodesics under a perceptually Riemannian metric induced from robust vision features. This geometry aligns with human perception and penalizes brittle directions, enabling smooth, on-manifold, semantically valid transitions. Experiments on three vision datasets show that PCG outperforms baselines and reveals failure modes hidden under standard metrics.
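To make the idea of tracing a geodesic under a feature-induced metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy two-dimensional latent space and a hypothetical `features` map standing in for a robust vision feature extractor; it omits the generator and any classifier term that a counterfactual method would need. The discrete energy it minimizes, the sum of squared feature-space segment lengths, is a standard discretization of curve energy under the pullback metric G(z) = J(z)^T J(z), where J is the Jacobian of the feature map.

```python
import numpy as np

def features(z):
    # Hypothetical stand-in for a robust feature extractor (an assumption,
    # not from the paper): lifts a 2-D latent point onto a paraboloid.
    return np.array([z[0], z[1], z[0] ** 2 + z[1] ** 2])

def path_energy(path):
    # Discrete curve energy under the pullback metric: the sum of squared
    # feature-space distances between consecutive path points.
    feats = np.array([features(z) for z in path])
    return float(np.sum(np.diff(feats, axis=0) ** 2))

def energy_grad(path, eps=1e-5):
    # Central finite-difference gradient with respect to interior points;
    # the two endpoints are held fixed, so their gradient stays zero.
    grad = np.zeros_like(path)
    for i in range(1, len(path) - 1):
        for d in range(path.shape[1]):
            step = np.zeros_like(path)
            step[i, d] = eps
            grad[i, d] = (path_energy(path + step) - path_energy(path - step)) / (2 * eps)
    return grad

def relax_path(z_start, z_end, n_points=12, lr=0.01, n_steps=500):
    # Start from the straight latent line (the "flat geometry" choice) and
    # descend the energy so the path bends toward a discrete geodesic.
    path = np.linspace(z_start, z_end, n_points)
    for _ in range(n_steps):
        path = path - lr * energy_grad(path)
    return path

z_start = np.array([-1.0, 0.5])
z_end = np.array([1.0, 0.5])
straight = np.linspace(z_start, z_end, 12)
relaxed = relax_path(z_start, z_end)

print("straight-line energy:", path_energy(straight))
print("relaxed path energy: ", path_energy(relaxed))
```

In this toy setting the relaxed path has lower energy than the straight latent interpolation, which illustrates why the choice of metric changes which perturbation counts as minimal. The step size, number of path points, and iteration count are illustrative choices only.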