AAAI2025

Federated Binary Matrix Factorization Using Proximal Optimization

Sebastian Dalleiger, Jilles Vreeken, Michael Kamp

1 citation

Abstract

Identifying informative components in binary data is an essential task in many research areas, including life sciences, social sciences, and recommendation systems. Boolean matrix factorization (BMF) is a family of methods that performs this task by efficiently factorizing the data. In real-world settings, the data is often distributed across stakeholders and required to stay private, prohibiting the straightforward application of BMF. To adapt BMF to this context, we approach the problem from a federated-learning perspective, while building on a state-of-the-art continuous binary matrix factorization relaxation to BMF that enables efficient gradient-based optimization. We propose to only share the relaxed component matrices, which are aggregated centrally using a proximal operator that regularizes for binary outcomes. We show the convergence of our federated proximal gradient descent algorithm and provide differential privacy guarantees. Our extensive empirical evaluation demonstrates that our algorithm outperforms, in terms of quality and efficacy, federation schemes of state-of-the-art BMF methods on a diverse set of real-world and synthetic data. introduced by Zhang et al. [44], and advanced by Araujo et al. [1] based on thresholding, and by Hess et. al [16, 17] using a proximal operator. Combining ideas from the two complementary regularization strategies of Hess et al. [17] and Zhang et al. [44], Dalleiger and Vreeken [8] recently removed the need for post-processing via a proximal operator for an elastic-net-based regularizer. With regards to federated factorization in general, 'parallel' algorithms for matrix factorization [43] as well as binary matrix factorization [22] seek computational efficiency without addressing privacy concerns. The problem of matrix factorization for privacy-sensitive distributed data has been addressed by the federated-learning community with approaches for federated matrix factorization [10] and federated non-negative matrix factorization [28] . These methods are, however, not specialized to Boolean matrices. In this work, we seek to close the research gap, addressing the need for a federated, privacy-preserving binary (or Boolean) matrix factorization algorithm. Recent advances in federated learning involve techniques like FedProx [27] and SCAFFOLD [21] . FedProx, an extension of FedAvg [30] , introduces a proximity penalty term to stabilizing the training process across different clients. SCAFFOLD enhances federated learning by correcting client drift using variance reduction techniques, thereby improving convergence rates and model accuracy compared to traditional methods like FedAvg, while ProxSkip [32] uses randomization to reduce the computational cost of proximal operators which are significantly more expensive than our operators. Despite these advances, most research focuses on training deep neural networks using stochasticgradient-based local optimization schemes. These approaches are often not ideal for factorizing matrices and are unsuitable for our case, as they neither incorporate constraint-penalties, nor do they handle alternating optimization problems, thereby achieving suboptimal empirical convergence towards infeasible non-Boolean solutions.