NeurIPS2025

Partial Correlation Network Estimation by Semismooth Newton Methods

Dongwon Kim, Sungdong Lee, Joong-Ho Won

Abstract

We develop a scalable second-order algorithm for a recently proposed ℓ 1regularized pseudolikelihood-based partial correlation network estimation framework. While the latter method admits statistical guarantees and is inherently scalable compared to likelihood-based methods such as graphical lasso, the currently available implementations rely only on first-order information and require thousands of iterations to obtain reliable estimates even on high-performance supercomputers. In this paper, we further investigate the inherent scalability of the framework and propose locally and globally convergent semismooth Newton methods. Despite the nonsmoothness of the problem, these second-order algorithms converge at a locally quadratic rate, and require only a few tens of iterations in practice. Each iteration reduces to solving linear systems of small dimensions or linear complementary problems of smaller dimensions, making the computation also suitable for less powerful computing environments. Experiments on both simulated and real-world genomic datasets demonstrate the superior convergence behavior and computational efficiency of the proposed algorithm, which position our method as a promising tool for massive-scale network analysis sought for in, e.g., modern multi-omics research. * Equal contribution 39th Conference on Neural Information Processing Systems (NeurIPS 2025). is by far the most popular choice. Here, S p denotes the space of p-dimensional symmetric matrices, and S = (1/n)X T X is the sample covariance matrix of the data X = [x 1 , . . . , x n ] ⊤ ∈ R n×p . The Θ -D denotes a matrix that takes the off-diagonal elements of Θ with zero diagonal elements. The ℓ 1 norm ∥ • ∥ 1 sums up the absolute values of the elements of its input matrix. The solution Θ to (1) estimates the target precision matrix Θ * , such that x i iid ∼ N (0, (Θ * ) -1 ). However, many methods that directly solve (1) suffer from computational bottleneck when p reaches a few thousands, which is not desirable for analyzing modern massive-scaled data such as arising from multi-omics. The root cause is that the Karush-Kuhn-Tucker (KKT) optimality condition of ( 1 ) where ∂∥Θ -D ∥ 1 denotes the subdifferential of the ℓ 1 norm at Θ -D , involves matrix inversion. Direct computation of the inverse Θ -1 requires O(p 3 ) arithmetic operations. Since most existing algorithms for solving (1) (e.g., [5, 8, 20, 13] ) can be understood as solving (2) in an iterative fashion, such a costly computation essentially occurs every iteration. Worse, matrix inversion is difficult to parallelize or distribute, limiting scalability. A recently proposed framework called ACCORD [19] overcomes this limitation by employing a computation-friendly loss function in a one-to-one transformation of the precision matrix variable Θ. It solves a convex optimization problem where Ω D denotes the diagonal matrix that takes the diagonal elements of Ω. Precision matrix Θ and the optimization variable Ω are related by Ω = Θ -1/2 D Θ and Θ = Ω D Ω. Note Ω may not be symmetric. However, the sparsity patterns of Ω and Θ coincide with each other, so the ℓ 1 penalty on Ω -D is legitimate. Under some regulatory conditions, the estimate Ω that uniquely minimizes (3) is a consistent estimator of Ω * := Θ * -1/2 D