ICLR2026

Conditional Independent Component Analysis for Estimating Causal Structure with Latent Variables

Yewei Xia, Zhengming Chen, Haoyue Dai, Fuhong Wang, Yixin Ren, Yiqing Li, Kun Zhang, Shuigeng Zhou

Abstract

Independent Component Analysis (ICA) aims to recover independent latent variables from observed mixtures thereof. Causal Representation Learning (CRL) aims instead to infer causally related (thus often statistically dependent) latent variables, together with the unknown graph encoding their causal relationships. We introduce an intermediate problem termed Causal Component Analysis (CauCA). CauCA can be viewed as a generalization of ICA, modelling the causal dependence among the latent components, and as a special case of CRL. In contrast to CRL, it presupposes knowledge of the causal graph, focusing solely on learning the unmixing function and the causal mechanisms. Any impossibility results regarding the recovery of the ground truth in CauCA also apply to CRL, while possibility results may serve as a stepping stone for extensions to CRL. We characterize CauCA identifiability from multiple datasets generated through different types of interventions on the latent causal variables. As a corollary, this interventional perspective also leads to new identifiability results for nonlinear ICA (a special case of CauCA with an empty graph) that require strictly fewer datasets than previous results. We introduce a likelihood-based approach using normalizing flows to estimate both the unmixing function and the causal mechanisms, and demonstrate its effectiveness through extensive synthetic experiments in the CauCA and ICA settings.

* Shared last author. Code available at https://github.com/akekic/causal-component-analysis. 37th Conference on Neural Information Processing Systems (NeurIPS 2023).

• We derive sufficient and necessary conditions for identifiability of CauCA from different types of interventions (Thm. 4.2, Prop. 4.3, Thm. 4.5).
• We prove additional results for the special case with an empty graph, which corresponds to a novel ICA model with interventions on the latent variables (Prop. 4.6, Prop. 4.7, Corollary 4.8, Prop. 4.9).
• We show in synthetic experiments in both the CauCA and ICA settings that our normalizing flow-based estimation procedure effectively recovers the latent causal components (§ 5).

Preliminaries

Notation. We use $P$ to denote a probability distribution, with density function $p$. Uppercase letters $X, Y, Z$ denote unidimensional random variables, and bold uppercase $\mathbf{X}, \mathbf{Y}, \mathbf{Z}$ denote multidimensional random variables. Lowercase letters $x, y, z$ denote scalars in $\mathbb{R}$, and bold lowercase $\mathbf{x}, \mathbf{y}, \mathbf{z}$ denote vectors in $\mathbb{R}^d$. We use $\llbracket i, j \rrbracket$ to denote the integers from $i$ to $j$, and $[d]$ denotes the natural numbers from $1$ to $d$. We use common graphical notation; see App. A for details. The ancestors of $i$ in a graph $G$ are the nodes $j$ in $G$ such that there is a directed path from $j$ to $i$, and they are denoted by $\mathrm{anc}(i)$. The closure of the parents (resp. ancestors) of $i$ is defined as $\overline{\mathrm{pa}}(i) := \mathrm{pa}(i) \cup \{i\}$ (resp. $\overline{\mathrm{anc}}(i) := \mathrm{anc}(i) \cup \{i\}$).

A key definition connecting directed acyclic graphs (DAGs) and probabilistic models is the following.

Definition 2.1 (Distribution Markov relative to a DAG [41]). A joint probability distribution $P$ is Markov relative to a DAG $G$ if it admits the factorization $P(Z_1, \dots, Z_d) = \prod_{i=1}^{d} P_i(Z_i \mid \mathbf{Z}_{\mathrm{pa}(i)})$.

Defn. 2.1 is a key assumption in directed graphical models, where a distribution being Markovian relative to a graph implies that the graph encodes specific independences within the distribution, which can be exploited for efficient computation or data storage [43, §6.5].

Causal Bayesian networks and interventions. Causal systems induce multiple distributions corresponding to different interventions. Causal Bayesian networks [CBNs; 41] can be used to represent how these interventional distributions are related. In a CBN with associated graph $G$, arrows signify causal links among variables, and the conditional probabilities $P_i(Z_i \mid \mathbf{Z}_{\mathrm{pa}(i)})$ in the corresponding Markov factorization are called causal mechanisms.
where $P^0$ is the unintervened, or observational, distribution, and $P^k$ are interventional distributions.

Remark 2.3. The joint probabilities $P^k$ in (1) are uniquely factorized into causal mechanisms according to $G$. We therefore use the equivalent notation $(G, (P^k, \tau_k)_{k \in \llbracket 0, K \rrbracket})$, where $P^k$ is defined as in (1).

Problem Setting

The main object of our study is a latent variable model termed latent causal Bayesian network (CBN).

Definition 3.1 (Latent CBN). A latent CBN is a tuple $(G, \mathbf{f}, (P^k, \tau_k)_{k \in \llbracket 0, K \rrbracket})$, where $\mathbf{f}: \mathbb{R}^d \to \mathbb{R}^d$ is a diffeomorphism (i.e. invertible with both $\mathbf{f}$ and $\mathbf{f}^{-1}$ differentiable).
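
To make the ingredients of Definition 3.1 concrete, the following is a minimal sketch of how data from a latent CBN could be simulated. Everything specific in it is an assumption made for illustration and is not taken from the paper: the three-node chain graph, the linear-Gaussian causal mechanisms, the reading of each intervention target set as the nodes whose mechanism is replaced by one that ignores its parents, and the mixing function sinh(Az), chosen only because it is a diffeomorphism with a closed-form inverse. The names `sample_latents`, `mix`, `unmix`, and `make_datasets` are hypothetical.

```python
import numpy as np

# Hypothetical DAG Z0 -> Z1 -> Z2, stored as parent lists (0-indexed).
PARENTS = {0: [], 1: [0], 2: [1]}
# Assumed linear-Gaussian mechanisms: Z_i = sum_j w_ij * Z_j + N(0, 1).
WEIGHTS = {0: {}, 1: {0: 0.8}, 2: {1: -1.2}}
# Invertible matrix (determinant 1.125) used inside the assumed mixing.
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.5, 0.0, 1.0]])


def sample_latents(n, targets, rng):
    """Ancestral sampling that follows the Markov factorization of the DAG."""
    z = np.zeros((n, len(PARENTS)))
    for i in sorted(PARENTS):  # 0, 1, 2 is a topological order of this DAG
        if i in targets:
            # Assumed intervention: new mechanism that ignores the parents.
            z[:, i] = rng.normal(loc=2.0, scale=1.0, size=n)
        else:
            mean = sum(WEIGHTS[i][j] * z[:, j] for j in PARENTS[i])
            z[:, i] = mean + rng.normal(size=n)
    return z


def mix(z):
    """Assumed mixing f(z) = sinh(A z), a diffeomorphism of R^3."""
    return np.sinh(z @ A.T)


def unmix(x):
    """Closed-form inverse f^{-1}(x) = A^{-1} arcsinh(x)."""
    return np.arcsinh(x) @ np.linalg.inv(A).T


def make_datasets(n=1000, seed=0):
    """One observational dataset (empty target set) plus two interventional ones."""
    rng = np.random.default_rng(seed)
    targets = [set(), {1}, {2}]
    return [(mix(sample_latents(n, t, rng)), t) for t in targets]
```

Calling `make_datasets()` returns a list of `(observations, targets)` pairs, one per dataset. In the estimation problem the abstract describes, only the observations, the graph, and the targets would be available, and the unmixing function and mechanisms would have to be learned; here `unmix` is known in closed form only because the mixing was chosen by hand.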