CVPR 2024

PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos

Qi Zhao, M. Salman Asif, Zhan Ma

Cited by 11

Abstract

The primary goal of Neural Representation for Videos (NeRV) is to effectively model the spatiotemporal consistency of a video. However, current NeRV systems often suffer from significant spatial inconsistency, which degrades perceptual quality. To address this issue, we introduce the Pyramidal Neural Representation for Videos (PNeRV), which is built on multi-scale information connections and comprises two components: a lightweight rescaling operator, the Kronecker Fully-connected layer (KFc), and a Benign Selective Memory (BSM) mechanism. The KFc, inspired by the tensor decomposition of the vanilla Fully-connected layer, enables low-cost rescaling and global correlation modeling. BSM adaptively merges high-level features with granular ones. Furthermore, we provide an analysis of the NeRV system based on the Universal Approximation Theory and validate the effectiveness of the proposed PNeRV. Comprehensive experiments demonstrate that PNeRV surpasses contemporary NeRV models, achieving the best video regression results on UVG and DAVIS under various metrics (PSNR, SSIM, LPIPS, and FVD). Compared to vanilla NeRV, PNeRV achieves a +4.49 dB gain in PSNR and a 231% improvement in FVD on UVG, along with a +3.28 dB PSNR gain and a 634% FVD improvement on DAVIS.
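The idea of factorizing a Fully-connected layer for rescaling can be illustrated with a minimal sketch. It assumes (our assumption; the abstract does not give the layer's exact form) that a spatial feature map X of size H x W is rescaled to H' x W' by two small factor matrices A (H' x H) and B (W' x W). The implied dense weight is then the Kronecker product A ⊗ B, and the forward pass reduces to Y = A X B^T. The class name `KroneckerFC` and all helpers are hypothetical, and this is not the authors' implementation.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: entry (i, j) = a[i // m2][j // n2] * b[i % m2][j % n2]."""
    m2, n2 = len(b), len(b[0])
    return [[a[i // m2][j // n2] * b[i % m2][j % n2] for j in range(len(a[0]) * n2)]
            for i in range(len(a) * m2)]


def flatten(x: Matrix) -> List[float]:
    """Row-major vectorization vec(X)."""
    return [v for row in x for v in row]


def matvec(w: Matrix, v: List[float]) -> List[float]:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class KroneckerFC:
    """Hypothetical Kronecker-factored FC layer rescaling an (h_in, w_in) map to (h_out, w_out).

    The implied dense weight is kron(A, B) with shape (h_out*w_out, h_in*w_in),
    but only A (h_out x h_in) and B (w_out x w_in) are stored.
    """

    def __init__(self, h_in: int, w_in: int, h_out: int, w_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.a = [[rng.gauss(0.0, 1 / h_in ** 0.5) for _ in range(h_in)] for _ in range(h_out)]
        self.b = [[rng.gauss(0.0, 1 / w_in ** 0.5) for _ in range(w_in)] for _ in range(w_out)]

    def num_params(self) -> int:
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])

    def dense_weight(self) -> Matrix:
        """Materialize the full FC weight (for checking only; this is the expensive form)."""
        return kron(self.a, self.b)

    def __call__(self, x: Matrix) -> Matrix:
        # kron(A, B) @ vec(X) == vec(A @ X @ B^T) for row-major vec, so the
        # dense weight never has to be built.
        return matmul(matmul(self.a, x), transpose(self.b))


if __name__ == "__main__":
    layer = KroneckerFC(h_in=4, w_in=6, h_out=8, w_out=12)
    rng = random.Random(1)
    x = [[rng.random() for _ in range(6)] for _ in range(4)]
    y = layer(x)
    # The factored forward pass agrees with the explicit dense Kronecker weight.
    y_dense = matvec(layer.dense_weight(), flatten(x))
    assert all(abs(p - q) < 1e-9 for p, q in zip(flatten(y), y_dense))
    print(len(y), len(y[0]))   # 8 12
    print(layer.num_params())  # 104, versus 96 * 24 = 2304 for a dense FC
```

With these toy sizes the factored layer stores 8·4 + 12·6 = 104 weights, whereas a dense FC between the flattened 4 x 6 and 8 x 12 maps would need 96·24 = 2304. Every output value can still depend on every input value, which is one way such a factorization can keep global correlation modeling at low cost.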
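To make the reported dB gains concrete, the standard PSNR definition converts a PSNR gain into a mean-squared-error ratio. Treating each reported average as if it came from a single aggregate MSE is a simplifying assumption, since video PSNR is usually averaged over frames, so the ratios below are only approximate.

```latex
\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MAX}^2}{\mathrm{MSE}}
\;\Rightarrow\;
\Delta\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MSE}_{\mathrm{NeRV}}}{\mathrm{MSE}_{\mathrm{PNeRV}}}
\;\Rightarrow\;
\frac{\mathrm{MSE}_{\mathrm{NeRV}}}{\mathrm{MSE}_{\mathrm{PNeRV}}} = 10^{\Delta\mathrm{PSNR}/10}
\\
\text{UVG: } 10^{4.49/10} \approx 2.81, \qquad \text{DAVIS: } 10^{3.28/10} \approx 2.13
```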