ICML 2025
MissScore: High-Order Score Estimation in the Presence of Missing Data
Wenqin Liu, Haoze Hou, Erdun Gao, Biwei Huang, Qiuhong Ke, Howard D. Bondell, Mingming Gong
Abstract
The first-order derivative of the log data density (the score), typically estimated via denoising score matching, has emerged as an effective tool for modeling data distributions and generating synthetic data. Extending this concept to higher-order scores could uncover more detailed local information about the data distribution, enabling new applications. However, learning these high-order scores usually requires complete data, which is often unavailable in real-world scenarios such as healthcare and finance due to privacy and cost constraints. In this work, we introduce MissScore, a novel score-based framework for learning high-order scores from observations with missing data. We derive objective functions for estimating high-order scores under different missing data mechanisms and propose a new algorithm to handle missing data effectively. Our empirical results demonstrate that MissScore efficiently and accurately approximates high-order scores with missing data, while enhancing sampling speed and data quality, as validated through several downstream tasks, including data generation and causal discovery.
Under review as a conference paper at ICLR 2025
Theorem 3 Suppose the missing mechanism of $x$ is MCAR, with the missing probability of every element lying between 0 and 1, i.e., $p(m_i = 1) \in [0, 1)$ for all $i \in \{1, 2, \ldots, d\}$. We denote the objective $J_{\mathrm{DSM}}(\theta) = \mathbb{E}_{x,m}\,\mathbb{E}_{\tilde{x} \mid x,m}\, s_1(\tilde{x};\theta) + \frac{1}{\sigma^2}(\tilde{x} - x) \odot (1 - m)$
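As a rough illustration of the masked denoising objective that Theorem 3 refers to, the sketch below computes a one-sample Monte Carlo estimate in plain Python. It rests on several assumptions that the text here does not confirm: the residual is wrapped in a squared Euclidean norm with a factor of 1/2 (the usual denoising score matching form), missing coordinates are zero-filled before being passed to the score model, and the names `masked_dsm_loss` and `score_fn` are hypothetical. The only convention taken from the theorem is that $m_i = 1$ marks a missing entry, so $(1 - m)$ selects the observed ones.

```python
import random


def masked_dsm_loss(score_fn, x, m, sigma, rng):
    """One-sample estimate of a masked denoising score matching loss.

    x        : list of floats; missing entries may hold any placeholder value
    m        : list of 0/1 flags, m[i] == 1 meaning entry i is missing
    sigma    : standard deviation of the Gaussian perturbation
    score_fn : hypothetical model mapping a perturbed vector to a score vector
    rng      : random.Random instance used to draw the noise
    """
    # Perturb every coordinate, then zero-fill the missing ones (assumption).
    x_tilde = [(xi + sigma * rng.gauss(0.0, 1.0)) * (1 - mi)
               for xi, mi in zip(x, m)]
    s = score_fn(x_tilde)
    # Residual s_1(x_tilde) + (x_tilde - x) / sigma^2, masked by (1 - m).
    resid = [(si + (xt - xi) / sigma ** 2) * (1 - mi)
             for si, xt, xi, mi in zip(s, x_tilde, x, m)]
    # Squared norm with a 1/2 factor (assumed; not stated in the source).
    return 0.5 * sum(r * r for r in resid)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 2.0]
    m = [0, 1]  # second coordinate is missing
    sigma = 0.5
    # Toy score model that returns zeros, for demonstration only.
    print(masked_dsm_loss(lambda xt: [0.0, 0.0], x, m, sigma, rng))
```

Because the residual is multiplied by $(1 - m)$, whatever the model outputs on a missing coordinate does not affect the loss; only observed coordinates contribute to the estimate.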