CVPR 2023

Benchmarking Self-Supervised Learning on Diverse Pathology Datasets

Mingu Kang, Heon Song, Seonwook Park, Donggeun Yoo, Sérgio Pereira

Abstract

In this supplementary material, we describe the downstream datasets adopted in the main paper and show some example images. This document also contains further implementation details regarding the pre-training and downstream training steps, including fine-tuning with limited labeled data. Finally, we provide further analyses, such as the effectiveness of pre-training for more epochs and the stability of pre-training when using data from different magnifications. Note that the corresponding or relevant sections from the main paper are referenced in blue text in the section titles.

A. Downstream Dataset Details (Section 4.2)

In this section, we describe the datasets used in our analysis. We use BACH, CRC, PCam, and MHIST for the image classification task, and CoNSeP for the nuclei instance segmentation task. We sample a few training images from each dataset and present them in Fig. A.1 and Fig. A.2.

PCam. The PatchCamelyon (PCam) [21] dataset is derived from the Camelyon16 [3] dataset, which contains 400 H&E-stained WSIs from two hospitals: Radboud Univer-