ICML2025

Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions

Fabiola Ricci, Lorenzo Bardone, Sebastian Goldt

Abstract

CONTENTS

Problems
Appendix proofs

9 ICA by Maximum Likelihood Estimation
  9.1 The likelihood of the ICA model
    9.1.1 Deriving the likelihood
    9.1.2 Estimation of the densities
  9.2 Algorithms for maximum likelihood estimation
    9.2.1 Gradient algorithms
    9.2.2 A fast fixed-point algorithm
  9.3 The infomax principle
  9.4 Examples
  9.5 Concluding remarks and references
  Problems
  Appendix proofs

10 ICA by Minimization of Mutual Information
  10.1 Defining ICA by mutual information
    10.1.1 Information-theoretic concepts
    10.1.2 Mutual information as measure of dependence
  10.2 Mutual information and nongaussianity
  10.3 Mutual information and likelihood
  10.4 Algorithms for minimization of mutual information
  10.5 Examples
  10.6 Concluding remarks and references
  Problems

11 ICA by Tensorial Methods
  11.1 Definition of cumulant tensor
  11.2 Tensor eigenvalues give independent components
  11.3 Tensor decomposition by a power method
  11.4 Joint approximate diagonalization of eigenmatrices
  11.5 Weighted correlation matrix approach
    11.5.1 The FOBI algorithm
    11.5.2 From FOBI to JADE
  11.6 Concluding remarks and references
  Problems

12 ICA by Nonlinear Decorrelation and Nonlinear PCA
  12.1 Nonlinear correlations and independence
  12.2 The Hérault-Jutten algorithm
  12.3 The Cichocki-Unbehauen algorithm
  12.4 The estimating functions approach *
  12.5 Equivariant adaptive separation via independence
  12.6 Nonlinear principal components
  12.7 The nonlinear PCA criterion and ICA
  12.8 Learning rules for the nonlinear PCA criterion
    12.8.1 The nonlinear subspace rule
    12.8.2 Convergence of the nonlinear subspace rule *
    12.8.3 Nonlinear recursive least-squares rule
  12.9 Concluding remarks and references
  Problems

13 Practical Considerations
  13.1 Preprocessing by time filtering
    13.1.1 Why time filtering is possible
    13.1.2 Low-pass filtering
    13.1.3 High-pass filtering and innovations
    13.1.4 Optimal filtering
  13.2 Preprocessing by PCA
    13.2.1 Making the mixing matrix square
    13.2.2 Reducing noise and preventing overlearning
  13.3 How many components should be estimated?
  13.4 Choice of algorithm
  13.5 Concluding remarks and references
  Problems

14 Overview and Comparison of Basic ICA Methods
  14.1 Objective functions vs. algorithms
  14.2 Connections between ICA estimation principles
    14.2.1 Similarities between estimation principles
    14.2.2 Differences between estimation principles
  14.3 Statistically optimal nonlinearities
    14.3.1 Comparison of asymptotic variance *
    14.3.2 Comparison of robustness *
    14.3.3 Practical choice of nonlinearity