ICLR2026

Divergence-Free Neural Networks with Application to Image Denoising

Sébastien Herbreteau, Etienne Meunier

Abstract

In many information processing systems, it may be desirable to ensure that any change of the input, whether by shifting or scaling, results in a corresponding change in the system response. While deep neural networks are gradually replacing all traditional automatic processing methods, they surprisingly do not guarantee such a normalization-equivariance (scale and shift) property, which can be detrimental in many applications. To address this issue, we propose a methodology for adapting existing neural networks so that normalization-equivariance holds by design. Our main claim is that not only ordinary convolutional layers, but also all activation functions that are applied element-wise to the pre-activated neurons, including the ReLU (rectified linear unit), should be completely removed from neural networks and replaced by better-conditioned alternatives. To this end, we introduce affine-constrained convolutions and channel-wise sort pooling layers as surrogates and show that these two architectural modifications preserve normalization-equivariance without loss of performance. Experimental results in image denoising show that normalization-equivariant neural networks, in addition to their better conditioning, also provide much better generalization across noise levels.

2. channel-wise sort pooling: all activation functions that apply element-wise, such as the ReLU, are substituted with higher-dimensional nonlinearities, namely two-by-two sorting along channels, which constitutes a fast and efficient normalization-equivariant alternative.

Despite strong architectural constraints, we show that these simple modifications do not degrade performance and, even better, increase robustness across noise levels in image denoising, both in practice and in theory.

Related Work

A non-exhaustive list of application fields where equivariant neural networks were studied includes graph theory, point cloud analysis and image processing.
Indeed, graph neural networks are usually expected to equivary, in the sense that a permutation of the nodes of the input graph should permute the output nodes accordingly. Several specific architectures were investigated to guarantee such a property [3, 21, 38]. In parallel, rotation- and translation-equivariant networks for dealing with point cloud data were proposed in a recent line of research [6, 13, 40]. A typical application is the ability of these networks to produce direction vectors consistent with the arbitrary orientation of the input point clouds, thus eliminating the need for data augmentation. Finally, in the domain of image processing, it may be desirable that neural networks produce outputs that equivary with regard to rotations of the input image, whether these outputs are vector fields [31], segmentation maps [41, 43], or even bounding boxes for object tracking [16].

In addition to their better conditioning, neural networks that are equivariant by design are expected to be more robust to outliers. A striking example was reported by S. Mohan et al. [33] in the field of image denoising. By simply removing the additive constant ("bias") terms in neural networks with ReLU activation functions, they obtained much better generalization at noise levels outside the training range. Although they do not fully elucidate why biases prevent generalization and why their removal allows it, the authors give some clues that the answer is probably linked to the scale-equivariance property of the resulting encoded function: rescaling the input image by a positive constant rescales the output by the same amount.

3 Overview of normalization-equivariance

Definitions and properties of three types of fundamental equivariances

We start with formal definitions of the different types of equivariances studied in this paper.
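As an informal illustration of these equivariances, the following minimal sketch checks them numerically on toy functions. It assumes that scale-equivariance means f(λx) = λ f(x) for λ > 0 and that shift-equivariance means f(x + µ) = f(x) + µ with µ added element-wise. The helper names, the (min, max) ordering inside each sorted pair, and the way the bias is added are illustrative assumptions, not the authors' implementation.

```python
# Toy numerical check of scale- and shift-equivariance (pure Python).
# All names are hypothetical and chosen for illustration only.

def relu(x):
    """Element-wise ReLU without bias."""
    return [max(v, 0.0) for v in x]

def relu_with_bias(x, b=0.5):
    """Element-wise ReLU applied after adding a constant bias b."""
    return [max(v + b, 0.0) for v in x]

def sort_pool(x):
    """Two-by-two sort along channels.

    Each consecutive pair (x[2i], x[2i+1]) is replaced by (min, max).
    The ordering within a pair is an assumption of this sketch.
    """
    assert len(x) % 2 == 0, "expects an even number of channels"
    out = []
    for i in range(0, len(x), 2):
        a, b = x[i], x[i + 1]
        out += [min(a, b), max(a, b)]
    return out

def _close(u, v, tol=1e-9):
    return all(abs(a - b) <= tol for a, b in zip(u, v))

def is_scale_equivariant(f, x, lam):
    """Check f(lam * x) == lam * f(x) for one input x and one lam > 0."""
    return _close(f([lam * v for v in x]), [lam * v for v in f(x)])

def is_shift_equivariant(f, x, mu):
    """Check f(x + mu) == f(x) + mu for one input x and one shift mu."""
    return _close(f([v + mu for v in x]), [v + mu for v in f(x)])

x = [-1.0, 2.0, 0.5, -3.0]
for name, f in [("relu_with_bias", relu_with_bias),
                ("relu", relu),
                ("sort_pool", sort_pool)]:
    print(name,
          "scale:", is_scale_equivariant(f, x, lam=2.5),
          "shift:", is_shift_equivariant(f, x, mu=1.5))
```

On this input, the ReLU with bias fails both checks, the bias-free ReLU passes only the scale check, and the pairwise sort passes both. A single test vector can refute a property but cannot prove it, so the passing checks are only sanity tests.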
Please note that our definitions of "scale" and "shift" may differ from the definitions given by some authors in the image processing literature.

Definition 1 A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is said to be:
• scale-equivariant if $f(\lambda x) = \lambda f(x)$ for all $x \in \mathbb{R}^n$ and all $\lambda > 0$,
• shift-equivariant if $f(x + \mu) = f(x) + \mu$ for all $x \in \mathbb{R}^n$ and all $\mu \in \mathbb{R}$,
• normalization-equivariant if it is both scale-equivariant and shift-equivariant,
where addition with the scalar shift $\mu$ is applied element-wise.

Note that the scale-equivariance property is more often referred to as positive homogeneity in pure mathematics. Like linear maps, which are completely determined by their values on a basis, the equivariant functions described above are entirely characterized by the values they take on specific subsets of $\mathbb{R}^n$, as stated by the following lemma (see proof in Appendix C.1).

Lemma 1 (Characterizations) $f : \mathbb{R}^n \to \mathbb{R}^m$ is entirely determined by its values on the:
• unit sphere $S$ of $\mathbb{R}^n$ if it is scale-equivariant,
• orthogonal complement of $\mathrm{Span}(1_n)$, i.e. $\mathrm{Span}(1_n)^\perp$, if it is shift-equivariant,
• intersection $S \cap \mathrm{Span}(1_n)^\perp$ if it is normalization-equivariant,
where $1_n$ denotes the all-ones vector of $\mathbb{R}^n$.

Finally, Lemma 2 highlights three basic equivariance-preserving mathematical operations that can be used as building blocks for designing neural network architectures (see proof in Appendix C.1