NeurIPS 2021
See More for Scene: Pairwise Consistency Learning for Scene Classification
Gongwei Chen, Xinhang Song, Bohan Wang, Shuqiang Jiang
5 citations
Abstract
Institute of Computing Technology, Chinese Academy of Sciences

Scene Characteristics
(Figure panels: Object Images, Scene Images)
◼ More semantic concepts
◼ No clear boundary
◼ Flexible spatial configuration

◼ Current scene classification methods
Region Discovery: unspecific [1] or specific regions [2]
Region Aggregation: statistical models [3] or relation modeling [4] methods
◼ Issues and challenges
Incompatibility
Inevitable computational consumption
Digging into the CNN properties for meeting scene demands?

◼ Comparisons of scene and object classification models
Empirical receptive field [2]
Transfer results with Scales [3]

Motivation
The focus area: regions that consist of pixels with larger aggregated activation values than the mean value.
(Figure: Conv Maps → Channel-wise Average Pooling → Aggregated Map → Thresholding by the mean value → Focus Area)

Experiments
◼ The analyses of the focus area
Our method: large focus area on more images
(Box plot legend: center line: median; triangle: mean)

Experiments
◼ Comparisons of different loss items
• Combining CPC and IPC yields a slightly better and more robust model
(Table: Classification Results on Places365-small)
• IPC: superior ability of expanding the focus area.

Conclusions
◼ We investigated the CNN classification models in terms of the focus area and showed the difference between scene and object networks.
◼ We proposed a new learning framework with a tailored loss to force CNN to expand the focus area for improving scene classification.
◼ Experiments on Places365 and ImageNet verify the effectiveness of our approach, and also indicate that it is specifically designed for scenes by capturing their unique attributes.
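
The focus area defined in the Motivation slide (channel-wise average pooling of the conv maps, then thresholding by the mean value) can be sketched in a few lines. This is only a minimal illustration, not the authors' code. The function name `focus_area`, the nested-list tensor layout, and the returned area ratio are assumptions made here; the slides only say that the focus area is "large", without specifying how its size is measured.

```python
def focus_area(conv_maps):
    """Return (mask, ratio) for a C x H x W activation tensor given as nested lists.

    mask[y][x] is True where the channel-averaged activation is strictly
    larger than the mean of the aggregated map. ratio is the fraction of
    such positions (an assumed way to measure the focus area size).
    """
    channels = len(conv_maps)
    height = len(conv_maps[0])
    width = len(conv_maps[0][0])

    # Channel-wise average pooling: collapse C maps into one H x W aggregated map.
    aggregated = [
        [sum(conv_maps[c][y][x] for c in range(channels)) / channels
         for x in range(width)]
        for y in range(height)
    ]

    # Threshold by the mean value of the aggregated map.
    mean_value = sum(sum(row) for row in aggregated) / (height * width)
    mask = [[value > mean_value for value in row] for row in aggregated]

    # Fraction of spatial positions that fall inside the focus area.
    ratio = sum(sum(row) for row in mask) / (height * width)
    return mask, ratio


# Toy input: 2 channels on a 2 x 2 grid (values are made up for illustration).
toy_maps = [
    [[4.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [0.0, 0.0]],
]
mask, ratio = focus_area(toy_maps)
```

In the toy input the aggregated map is `[[3.0, 1.0], [0.0, 0.0]]` with mean 1.0, so only the top-left position exceeds the threshold. In practice the conv maps would come from the last convolutional layer of a trained network, and an array library would replace the nested lists.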