NeurIPS 2024

Sample Complexity of Interventional Causal Representation Learning

Emre Acartürk, Burak Varici, Karthikeyan Shanmugam, Ali Tajer

Abstract

Consider a data-generation process that transforms low-dimensional, causally related latent variables into high-dimensional observed variables. Causal representation learning (CRL) is the process of using the observed data to recover the latent causal variables and the causal structure among them. Despite the multitude of identifiability results under various interventional CRL settings, the existing guarantees apply exclusively to the infinite-sample regime (i.e., infinitely many observed samples). This paper establishes the first sample-complexity analysis for the finite-sample regime, characterizing how the number of observed samples relates to the probabilistic guarantees on recovering the latent variables and structure. The analysis focuses on general latent causal models, stochastic soft interventions, and a linear transformation from the latent space to the observation space. The identifiability results ensure recovery of the graph up to ancestors and recovery of the latent variables up to mixing with parent variables. Specifically, $\mathcal{O}((\log \frac{1}{\delta})^{4})$ samples suffice for latent graph recovery up to ancestors with probability $1 - \delta$, and $\mathcal{O}((\frac{1}{\epsilon} \log \frac{1}{\delta})^{4})$ samples suffice for recovery of the latent causal variables that is $\epsilon$-close to the identifiability class with probability $1 - \delta$.
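To give a feel for how the two stated rates scale, the following minimal sketch evaluates only the dependence on δ and ε. The constants hidden by the O(·) notation, and any dependence on dimension or model parameters, are not given in the abstract, so they are dropped here. The function names are hypothetical and the returned values are relative scaling factors, not actual sample counts.

```python
import math


def graph_recovery_scaling(delta: float) -> float:
    """Return (log(1/delta))^4, the delta-dependence of the graph-recovery rate.

    Assumption: all constants hidden by the O(.) notation are dropped.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.log(1.0 / delta) ** 4


def variable_recovery_scaling(epsilon: float, delta: float) -> float:
    """Return ((1/epsilon) * log(1/delta))^4, the rate for epsilon-close recovery.

    Assumption: all constants hidden by the O(.) notation are dropped.
    """
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return (math.log(1.0 / delta) / epsilon) ** 4


if __name__ == "__main__":
    # Ratio of requirements when the accuracy target epsilon is halved.
    ratio_eps = variable_recovery_scaling(0.05, 0.1) / variable_recovery_scaling(0.1, 0.1)
    # Ratio of requirements when the failure probability delta is squared.
    ratio_delta = graph_recovery_scaling(0.01) / graph_recovery_scaling(0.1)
    print(f"halving epsilon: x{ratio_eps:.1f}")
    print(f"squaring delta:  x{ratio_delta:.1f}")
```

Under these rates, halving ε multiplies the sufficient number of samples by 2^4 = 16. Squaring δ (for example, going from 0.1 to 0.01) doubles log(1/δ) and therefore also multiplies it by 16. The dependence on the confidence level is thus polylogarithmic, while the dependence on accuracy is polynomial in 1/ε.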