CVPR 2021

Efficient Object Embedding for Spliced Image Retrieval

Bor-Chun Chen, Zuxuan Wu, Larry S. Davis, Ser-Nam Lim

Abstract

Different spatial pooling techniques [7] and postprocessing steps such as dimensionality reduction [4] have been shown to greatly affect retrieval performance. We provide a detailed analysis of the parameters involved in selecting an effective feature extraction model.

PCA and pooling. Given a convolutional feature map from the conv5_3 layer, $F \in \mathbb{R}^{W \times H \times C}$, we consider the following spatial pooling functions $P : \mathbb{R}^{W \times H \times C} \to \mathbb{R}^{C}$: (1) sum pooling [1] (SPoC), (2) max-pooling [10] (Max), (3) regional max-pooling [13] (R-MAC), and (4) generalized mean pooling [9] (GeM). We also run experiments in which the number of PCA dimensions varies from 64 to 2,048, with whitening.

Figure 1 shows a detailed analysis of the effect of different pooling techniques and postprocessing steps. Figure 1 (a) shows retrieval performance on four benchmarks with different PCA dimensions. Although the performance of all embeddings decreases as the feature dimension is reduced, embeddings from the classification model (ResNet50) consistently perform best at every dimension, which further supports our observation in the paper. Figure 1 (b) shows the mAP for different pooling techniques. Here, ResNet50 embeddings again consistently achieve the best performance among embeddings from the different pre-trained models on all datasets.

Embeddings from different layers. Figure 2 shows the performance of embeddings extracted from different layers of the ResNet50 backbone, from conv4_1 to conv5_3. For lower-level embeddings, detection models and classification models perform similarly, because they represent similar low-level texture features. Their performance diverges, however, for embeddings from high-level layers. This is an important observation, since embeddings extracted from the higher-level layers (conv5_x) achieve better retrieval performance across all datasets. It again supports the conclusion that embeddings from classification models are better suited for image retrieval.
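To make the pooling and postprocessing steps concrete, the following is a minimal NumPy sketch of three of the pooling functions (SPoC, Max, and GeM) together with PCA whitening. It is an illustration under stated assumptions and not the authors' implementation: the function names, the GeM exponent p = 3, the final L2 normalization, and the SVD-based PCA fit are all choices made here for the example. R-MAC is omitted because it additionally requires a multi-scale region grid.

```python
import numpy as np


def spoc(fmap):
    """Sum pooling (SPoC): sum activations over the W and H axes."""
    return fmap.sum(axis=(0, 1))


def max_pool(fmap):
    """Max pooling: per-channel maximum over the W and H axes."""
    return fmap.max(axis=(0, 1))


def gem(fmap, p=3.0, eps=1e-6):
    """Generalized mean pooling: (mean(x^p))^(1/p) per channel.

    p = 1 gives average pooling; large p approaches max pooling.
    The default p = 3 is an assumption for this sketch.
    """
    clipped = np.clip(fmap, eps, None)  # keep values positive before the power
    return np.mean(clipped ** p, axis=(0, 1)) ** (1.0 / p)


def l2_normalize(x, eps=1e-12):
    """Scale each row (or a single vector) to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def fit_pca_whitening(X, dim):
    """Fit PCA with whitening on descriptors X of shape (N, C).

    Returns the mean (C,) and a projection matrix (C, dim) whose outputs
    have approximately unit variance along each retained component.
    """
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    std = S[:dim] / np.sqrt(X.shape[0] - 1)  # std. dev. along each component
    proj = Vt[:dim].T / (std + 1e-12)
    return mean, proj


def apply_pca_whitening(X, mean, proj):
    """Project descriptors, whiten them, and L2-normalize the result."""
    return l2_normalize((X - mean) @ proj)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for conv5_3 feature maps, laid out as (W, H, C).
    fmaps = rng.random((200, 7, 7, 64))
    descriptors = np.stack([gem(f) for f in fmaps])       # (200, 64)
    mean, proj = fit_pca_whitening(descriptors, dim=32)
    embeddings = apply_pca_whitening(descriptors, mean, proj)
    print(embeddings.shape)
```

In this sketch, swapping `gem` for `spoc` or `max_pool` changes only the pooling step, and changing `dim` corresponds to varying the PCA dimension as in the experiments described above.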