NeurIPS 2023

On the Implicit Bias of Linear Equivariant Steerable Networks

Ziyu Chen, Wei Zhu


Abstract

We study the implicit bias of gradient flow on linear equivariant steerable networks in group-invariant binary classification. Our findings reveal that the parameterized predictor converges in direction to the unique group-invariant classifier attaining a maximum margin with respect to a norm defined by the input group action. Under a unitary assumption on the input representation, we establish the equivalence between steerable networks and data augmentation. Furthermore, we demonstrate the improved margin and generalization bound of steerable networks over their non-invariant counterparts.

… models under the same norm constraint on the network parameters [Sokolic et al., 2017, Sannai et al., 2021]. Nevertheless, it remains unclear whether, or why, a GD-trained steerable network attains a minimizer whose parameter norm is comparable to that of its non-equivariant counterpart. Consequently, such complexity-measure-based arguments may not directly explain the improved generalization of steerable networks in group-symmetric learning tasks.

In light of the above issues, in this work we aim to fully characterize the implicit bias of the training algorithm on linear equivariant steerable networks in group-invariant binary classification. Our result shows that when trained under gradient flow (GF), i.e., GD with an infinitesimal step size, the steerable-network-parameterized predictor converges in direction to the unique group-invariant classifier attaining a maximum margin with respect to a norm defined by the input group representation. Under a unitary input group action, this result has three important implications:

• a linear steerable network trained on the original data set converges in the same direction as a linear fully-connected network trained on the group-augmented data set. This suggests the equivalence between training with linear steerable networks and data augmentation;
• when trained on the same original data set, a linear steerable network always attains a wider margin on the group-augmented data set than a fully-connected network does;
• when the underlying distribution is group-invariant, a GF-trained linear steerable network achieves a tighter generalization bound than its non-equivariant counterpart. This improvement does not necessarily depend on the group size; rather, it depends on the support of the invariant distribution.

Before we end this section, we note that a similar topic has recently been explored by Lawrence et al. [2021] in the context of linear Group Convolutional Neural Networks (G-CNNs), a special case of the equivariant steerable networks considered in this work. However, the models they studied were not truly group-invariant, so their implicit bias result does not explain the improved generalization of G-CNNs. We further elaborate on the comparison between our work and [Lawrence et al., 2021] in Section 2.

Related work

Implicit biases: Recent studies have shown that for linear predictors trained with the logistic or exponential loss on linearly separable data, the GD/SGD iterates converge in direction to the max-L2-margin SVM solution [Soudry et al., 2018, Nacson et al., 2019, Gunasekar et al., 2018a]. Gunasekar et al. [2018b] extended these results to linear fully-connected networks and linear Convolutional Neural Networks (CNNs) under the assumption of directional convergence and alignment of the network parameters; these assumptions were later proved by Ji and Telgarsky [2019a,b], Lyu and Li [2020], and Ji and Telgarsky [2020]. Yun et al. [2021] further generalized the implicit regularization of gradient flow (GF) to linear tensor networks.
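The max-margin result for linear predictors [Soudry et al., 2018] can be checked numerically on a toy problem. The sketch below is our own illustrative example, not taken from the paper: four points in R^2 separable through the origin, for which the KKT conditions give the bias-free hard-margin SVM solution w* = (1, 0.5) (minimize ||w|| subject to y_i <w, x_i> >= 1; both positive-point constraints are active, with multipliers 0.75 and 0.5). Full-batch GD on the summed logistic loss from zero should turn toward this direction, but only at a rate of order 1/log t, so after finitely many steps the cosine similarity is close to, not equal to, 1. The step size and checkpoints are arbitrary choices.

```python
import numpy as np

# Toy separable data (assumption for illustration): two positive points and their negatives.
X = np.array([[1.0, 0.0], [0.5, 1.0], [-1.0, 0.0], [-0.5, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w_svm = np.array([1.0, 0.5])  # bias-free hard-margin SVM solution, derived by hand via KKT


def logistic_loss_grad(w):
    """Gradient of sum_i log(1 + exp(-y_i <w, x_i>)) with respect to w."""
    u = y * (X @ w)
    return X.T @ (-y / (1.0 + np.exp(u)))


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))


w = np.zeros(2)
lr = 1.0  # below 2/L, since the Hessian is bounded by L <= sum_i ||x_i||^2 / 4 = 1.125
history = {}
for step in range(1, 10_001):
    w = w - lr * logistic_loss_grad(w)
    if step in (10, 100, 1_000, 10_000):
        history[step] = cosine(w, w_svm)

for step, c in history.items():
    print(f"step {step:>6}: cos(w, w_svm) = {c:.5f}")
print("norm of w:", np.linalg.norm(w))  # keeps growing (roughly like log t)
```

The norm of w diverges while only its direction settles, which is why such implicit-bias results are phrased as convergence in direction.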
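The first of the three implications (equivalence with data augmentation under a unitary input action) can be illustrated, in a much simpler setting, with a single linear layer. This is a toy sketch under assumptions we chose, not the paper's construction: a hypothetical group Z2 = {identity, coordinate reversal} acting on R^4 by permutation (hence orthogonal, i.e., unitary) matrices, synthetic labels produced by a reversal-invariant vector, the mean logistic loss, zero initialization, and discrete GD instead of gradient flow. The invariant model is obtained by projecting each gradient onto the invariant subspace with the group average (Reynolds operator). For an invariant beta the augmented loss equals the original loss, and its gradient equals the group-averaged original gradient, so the two GD trajectories should coincide up to rounding. The multi-layer steerable parameterization and the limit-direction analysis of the paper are not reproduced here; all function names are hypothetical.

```python
import numpy as np


def reversal_group(d):
    """Hypothetical toy group Z2 = {identity, coordinate reversal} on R^d (permutation matrices)."""
    identity = np.eye(d)
    return [identity, identity[::-1]]


def augment(X, y, mats):
    """Group-augmented data set {(rho(g) x_i, y_i) : g in G, i = 1..n}."""
    X_aug = np.concatenate([X @ R.T for R in mats])  # row i of X @ R.T is R @ x_i
    y_aug = np.concatenate([y] * len(mats))
    return X_aug, y_aug


def group_average(v, mats):
    """Reynolds operator (1/|G|) sum_g rho(g)^T v: orthogonal projection onto invariant vectors."""
    return sum(R.T @ v for R in mats) / len(mats)


def logistic_grad(beta, X, y):
    """Gradient of the mean logistic loss (1/n) sum_i log(1 + exp(-y_i <beta, x_i>))."""
    u = y * (X @ beta)
    return X.T @ (-y / (1.0 + np.exp(u))) / len(y)


def normalized_margin(beta, X, y):
    """min_i y_i <beta, x_i> / ||beta||_2."""
    return np.min(y * (X @ beta)) / np.linalg.norm(beta)


def gradient_descent(grad, d, lr=0.5, steps=2000):
    """Plain GD from the zero initialization, which is itself G-invariant."""
    beta = np.zeros(d)
    for _ in range(steps):
        beta = beta - lr * grad(beta)
    return beta


# Toy setup (assumption): labels come from a palindromic, hence reversal-invariant, vector.
rng = np.random.default_rng(0)
d, n = 4, 8
X = rng.normal(size=(n, d))
y = np.sign(X @ np.array([1.0, -2.0, -2.0, 1.0]))
mats = reversal_group(d)
X_aug, y_aug = augment(X, y, mats)

# (a) unconstrained linear model trained on the group-augmented data
beta_aug = gradient_descent(lambda b: logistic_grad(b, X_aug, y_aug), d)
# (b) invariant linear model trained on the original data (gradients group-averaged)
beta_inv = gradient_descent(lambda b: group_average(logistic_grad(b, X, y), mats), d)

print("max |beta_aug - beta_inv|:", np.max(np.abs(beta_aug - beta_inv)))
print("margins of beta_inv (original, augmented):",
      normalized_margin(beta_inv, X, y), normalized_margin(beta_inv, X_aug, y_aug))
```

Because an invariant beta satisfies <beta, rho(g) x> = <beta, x> for orthogonal rho(g), its margin is the same on the original and on the augmented data, which is the quantity compared in the second implication.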
For overparameterized nonlinear networks in the infinite-width regime, rigorous analyses of the optimization of DNNs have also been carried out from the neural tangent kernel [