NeurIPS 2023

SmoothHess: ReLU Network Feature Interactions via Stein's Lemma

Max Torop, Aria Masoomi, Davin Hill, Kivanç Köse, Stratis Ioannidis, Jennifer G. Dy

Abstract

Several recent interpretability methods model feature interactions by examining the Hessian of a neural network. This poses a challenge for ReLU networks, which are piecewise-linear and thus have a zero Hessian almost everywhere. We propose SmoothHess, a method for estimating second-order interactions through Stein's Lemma. In particular, we estimate the Hessian of the network convolved with a Gaussian through an efficient sampling algorithm, requiring only network gradient calls. SmoothHess is applied post-hoc, requires no modifications to the ReLU network architecture, and the extent of smoothing can be controlled explicitly. We provide a non-asymptotic bound on the sample complexity of our estimation procedure. We validate the superior ability of SmoothHess to capture interactions on benchmark datasets and a real-world medical spirometry dataset.

Related Work

Feature Importance and First-Order Methods: Methods that quantify feature importance fall into two categories: (i) perturbation-based methods (e.g., [53, 66, 17]), which evaluate the change in model outputs with respect to perturbed inputs, and (ii) gradient-based methods (e.g., [72, 76, 81]), which leverage the natural interpretation of the gradient as infinitesimally local importance for a given sample. Most relevant to our work are gradient-based approaches. The saliency map, as defined in [72], is simply the gradient of the model output with respect to the input. Several variants have been developed to address the shortcomings of saliency maps. SmoothGrad [76] was developed

* https://github.com/MaxTorop/SmoothHess
* All ReLU network outputs, internal neurons, and SoftMax probabilities are Lipschitz continuous [29, 26].
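To illustrate the sampling idea described in the abstract, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes an isotropic Gaussian with covariance sigma^2 I, for which Stein's Lemma gives the Hessian of the smoothed function at x as E[delta grad f(x + delta)^T] / sigma^2 with delta ~ N(0, sigma^2 I). The function names `relu_net_grad` and `smooth_hessian`, the toy one-hidden-layer network, and the final symmetrization step are choices made for this illustration.

```python
import numpy as np

def relu_net_grad(X, W1, b1, w2):
    """Gradient of f(x) = w2 . relu(W1 x + b1), evaluated at each row of X.

    Hypothetical toy network for illustration; any routine returning
    per-sample input gradients could be used in its place.
    """
    active = (X @ W1.T + b1) > 0      # (n, h) activation pattern per sample
    return (active * w2) @ W1         # (n, d) gradient per sample

def smooth_hessian(grad_fn, x, sigma, n_samples, seed=0):
    """Monte Carlo estimate of the Hessian of f convolved with N(0, sigma^2 I).

    Uses only gradient calls: H ~ mean_i[ delta_i grad f(x + delta_i)^T ] / sigma^2.
    """
    rng = np.random.default_rng(seed)
    deltas = sigma * rng.standard_normal((n_samples, x.shape[0]))
    grads = grad_fn(x + deltas)                       # (n, d)
    H = deltas.T @ grads / (n_samples * sigma**2)     # (d, d)
    return 0.5 * (H + H.T)                            # symmetrize the estimate

# Toy example: a single hidden unit, f(x) = relu(x_0 + x_1).
W1 = np.array([[1.0, 1.0]])
b1 = np.array([0.0])
w2 = np.array([1.0])

H = smooth_hessian(lambda X: relu_net_grad(X, W1, b1, w2),
                   x=np.zeros(2), sigma=0.5, n_samples=200_000)
print(H)
```

For this single-unit network the smoothed Hessian at the origin has a closed form, w w^T phi(0) / (sigma ||w||) with phi the standard normal density, so every entry should be close to 1/sqrt(pi) (about 0.564). The unsmoothed Hessian is zero wherever it is defined, which is the difficulty the abstract points out. A larger sigma averages over a wider neighborhood and yields smaller entries here.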