NeurIPS 2024

Separation and Bias of Deep Equilibrium Models on Expressivity and Learning Dynamics

Zhoutong Wu, Yimu Zhang, Cong Fang, Zhouchen Lin

Abstract

The deep equilibrium model (DEQ) generalizes the conventional feedforward neural network by fixing the same weights for each layer block and extending the number of layers to infinity. This novel model directly finds the fixed points of such a forward process and uses them as features for prediction. Despite empirical evidence showcasing its efficacy compared to feedforward neural networks, theoretical understanding of its separation and bias is still limited. In this paper, we take a step in this direction by establishing separation results and studying the bias of DEQ in its expressive power and learning dynamics. The results include: (1) A general separation is proposed, showing the existence of a width-$m$ DEQ that any fully connected neural network (FNN) with depth $O(m^{\alpha})$ for $\alpha \in (0, 1)$ cannot approximate unless its width is sub-exponential in $m$; (2) DEQ with polynomially bounded size and magnitude can efficiently approximate certain steep functions (which have very large derivatives) in the $L^{\infty}$ norm, whereas FNN with bounded depth and exponentially bounded width cannot unless its weight magnitudes are exponentially large; (3) The implicit regularization caused by gradient flow from a diagonal linear DEQ is characterized, with specific examples showing the benefits brought by such regularization. Taken together, our analysis and empirical validations suggest a high-level conjecture: DEQ has potential advantages in learning certain high-frequency components.
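To make the fixed-point view of the forward pass concrete, the following is a minimal sketch and not the construction analyzed in the paper. It assumes a layer of the form $z \mapsto \tanh(Wz + Ux + b)$, with the tanh activation, the names `W`, `U`, `b`, and the helper `deq_forward` all chosen here for illustration. The weight matrix is rescaled to spectral norm 0.5 so that the map is a contraction and plain iteration converges; practical DEQ implementations typically use faster root-finding solvers.

```python
import numpy as np

def deq_forward(W, U, b, x, tol=1e-10, max_iter=1000):
    """Hypothetical helper: iterate z <- tanh(W z + U x + b) until the update is below tol.

    Repeating the same weight-tied layer is the "infinite depth" view of a DEQ;
    the limit z* satisfies z* = tanh(W z* + U x + b).
    """
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
m, d = 8, 3                          # illustrative hidden width and input dimension
W = rng.standard_normal((m, m))
W *= 0.5 / np.linalg.norm(W, 2)      # assumption: spectral norm 0.5, so the layer is a contraction
U = rng.standard_normal((m, d))
b = rng.standard_normal(m)
x = rng.standard_normal(d)

z_star = deq_forward(W, U, b, x)
# Fixed-point residual: how far z* is from being reproduced by one more layer application.
residual = np.linalg.norm(z_star - np.tanh(W @ z_star + U @ x + b))
```

The vector `z_star` plays the role of the features that a DEQ would pass to a prediction head, and `residual` checks that applying the layer once more leaves it essentially unchanged.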