ICCV2023
Deep Equilibrium Object Detection
Shuai Wang, Yao Teng, Limin Wang
Cited by 1
Abstract
Query-based object detectors and segmenters have made great progress in their respective tasks by employing an iterative refinement decoder. These query-based methods directly represent object instances with a set of learnable queries. The query vectors are progressively refined into stable, meaningful representations through a sequence of decoder layers, and are then used to directly predict object locations (mask or box) and categories with customized heads. In this paper, we present a novel query-based object decoder design with infinite refinement (DEQ-Decoder) through a deep equilibrium model (DEQ). Our DEQ-Decoder models query vector refinement as the fixed-point solving of an implicit (DEQ) layer. To be more specific to query refinement, we use a two-step unrolled equilibrium equation to explicitly capture the query vector refinement. Accordingly, we are able to incorporate refinement awareness into DEQ-Decoder training with inexact gradient back-propagation (RAG). In addition, to stabilize the training of our DEQ-Decoder and improve its generalization ability, we devise a deep supervision scheme on the optimization path of the DEQ-Decoder with refinement-aware perturbation (RAP). To demonstrate the effectiveness of DEQ-Decoder, we apply it to object detection and instance segmentation. For object detection, we propose DEQDet based on our DEQ-Decoder. DEQDet converges faster, consumes less memory, and achieves better results than the baseline counterpart (AdaMixer).
In particular, our DEQDet with a ResNet50 backbone and 300 queries achieves 49.6 mAP and 33.9 $\mathrm{AP}_s$ on the MS COCO benchmark under the $2\times$ training scheme (24 epochs). For instance segmentation, our DEQSeg achieves much better box mAP metrics and slightly better mask metrics for different mask decoding branches.
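
The idea of treating query refinement as a fixed-point problem can be illustrated with a small sketch. This is not the paper's decoder or solver: the `refine` function is a hypothetical, deliberately contractive stand-in for one refinement layer, `solve_fixed_point` is plain fixed-point iteration, and all names and constants are assumptions chosen for illustration. The refinement-aware gradient and perturbation schemes are not modeled here.

```python
import math

def refine(z, x, w=0.5):
    """Toy refinement step (assumption): stands in for one decoder layer.

    With |w| < 1 the map z -> tanh(w * z + x) is a contraction,
    so repeated application converges to a unique fixed point.
    """
    return [math.tanh(w * zi + xi) for zi, xi in zip(z, x)]

def solve_fixed_point(f, z0, x, tol=1e-8, max_iter=200):
    """Naive fixed-point iteration: apply f until the update is below tol."""
    z = z0
    steps = 0
    for steps in range(1, max_iter + 1):
        z_next = f(z, x)
        residual = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if residual < tol:
            break
    return z, steps

# Hypothetical "image feature" input and an all-zero initial query.
x = [0.3, -1.2, 0.8]
z_star, steps = solve_fixed_point(refine, [0.0, 0.0, 0.0], x)

# At equilibrium, one more refinement step leaves the query almost unchanged.
gap = max(abs(a - b) for a, b in zip(refine(z_star, x), z_star))
print(f"converged in {steps} steps, residual after one extra step: {gap:.2e}")
```

In this toy setting, "infinite refinement" means that stacking further refinement steps beyond the equilibrium no longer changes the query, which is why the equilibrium can be solved for directly instead of unrolling a fixed number of layers.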