ICLR 2026

The Lattice Geometry of Neural Network Quantization: A Short Equivalence Proof of GPTQ and Babai's Algorithm

Johann Birnick

Abstract

We explain how data-driven quantization of a linear unit in a neural network corresponds to solving the closest vector problem for a certain lattice generated by input data. We prove that the GPTQ algorithm (Frantar et al., 2023) is equivalent to Babai's well-known nearest-plane algorithm (Babai, 1986). We furthermore provide geometric intuition for both algorithms. Lastly, we note the consequences of these results, in particular hinting at the possibility of using lattice basis reduction for improved quantization.

QUANTIZATION AND LATTICES

Computations in neural networks are usually carried out in 32-bit or 16-bit floating point arithmetic. In particular, the parameters (weights) of the network are stored in this comparatively high precision. Quantization is the art of reducing precision, in favor of less memory consumption and faster computation, while keeping the accuracy as high as possible.

In this paper, we are interested only in post-training quantization of the weights: we are handed a trained neural network, and our goal is to approximate (some of) the parameters of the network with a coarse numerical alphabet, while keeping the accuracy high. Commonly, this effort is focused on the linear parts of the network. That is, we are given a linear map $\mathbb{R}^n \to \mathbb{R}^m$, represented by a weight matrix $W \in \mathbb{R}^{m \times n}$, and we seek to find another $m \times n$ matrix $V$, whose entries have lower numerical precision and which "approximates $W$ well". Concretely, this means: