CVPR 2020

Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather

Mario Bijelic, Tobias Gruber, Fahim Mannan, Florian Kraus, Werner Ritter, Klaus Dietmayer, Felix Heide

Abstract

The proposed dataset is described in Section 3 of the main document. In this section, we present additional details, including the preselection and multimodal annotation processes, as well as the controlled weather capture setup. We illustrate the diversity of our dataset in Figures 8, 9, 10, and 11.

Data Preselection Process

Before annotation, we preselect images, as many of them are not relevant due to low scene variance, sensor failures, wipers, or a lack of objects; see Figure 4. Frames with low scene variance typically show the vehicle waiting at a traffic light or following the same car along a long road. Sensor failures are caused either by technical problems or by sensors covered with snow or dirt. Specifically, images were uniformly sampled from the dataset at a frame rate of 0.1 Hz, yielding a total of 17,799 images. These images were annotated with the scene weather and the semantic content (discard/dispensable/appropriate/very interesting). In addition, we tagged images with interpolate if interesting content was found close to a sampled frame. 44.66 % of the selected images were annotated with discard or dispensable. To increase the number of sequences with interesting semantic content, we additionally exported sequences that contained frames with interpolate annotations at a frame rate of 1 Hz, leading to an additional 4,561 samples. Since the resulting subset was biased towards good weather data, we then exported sequences in adverse weather with at least one very interesting tag at a frame rate of 1 Hz, obtaining an additional 6,444 frames. In total, 28,804 frames were annotated with scene weather and semantic content classification. From these frames, we filter out frames where the weather annotations changed quickly along the recordings, as we consider these annotations noisy or ambiguous. Finally, we chose the most interesting scenes annotated with at least appropriate scene content, leading to 12,000 annotated frames for bounding box labeling.
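
The preselection steps above can be illustrated with a minimal Python sketch. It is not the tooling used to build the dataset. The `Frame` record, the `weather_is_stable` and `preselect` helpers, and in particular the time window used to decide that weather annotations "changed quickly" are hypothetical choices made for illustration; the text does not specify the actual criterion. Only the frame counts and the four content tags are taken from the text.

```python
from dataclasses import dataclass

# Frame counts reported in the text for the three export passes.
UNIFORM_0P1_HZ = 17_799    # uniform sampling at 0.1 Hz
INTERPOLATE_1_HZ = 4_561   # sequences around "interpolate" tags at 1 Hz
ADVERSE_1_HZ = 6_444       # adverse-weather sequences at 1 Hz
TOTAL_ANNOTATED = UNIFORM_0P1_HZ + INTERPOLATE_1_HZ + ADVERSE_1_HZ
assert TOTAL_ANNOTATED == 28_804  # matches the total stated in the text

# 44.66 % of the uniformly sampled images were tagged discard or dispensable.
approx_discarded = round(0.4466 * UNIFORM_0P1_HZ)

# Semantic content tags from the text, ordered from least to most useful.
CONTENT_RANK = {
    "discard": 0,
    "dispensable": 1,
    "appropriate": 2,
    "very interesting": 3,
}


@dataclass
class Frame:
    """Hypothetical per-frame annotation record."""
    timestamp: float  # seconds since the start of the recording
    weather: str      # scene weather label, e.g. "clear", "fog", "snow"
    content: str      # one of the keys of CONTENT_RANK


def weather_is_stable(frames, index, window_s=5.0):
    """Return True if every frame within window_s seconds of frames[index]
    carries the same weather label. The window length is an assumption."""
    ref = frames[index]
    return all(
        f.weather == ref.weather
        for f in frames
        if abs(f.timestamp - ref.timestamp) <= window_s
    )


def preselect(frames, window_s=5.0, min_content="appropriate"):
    """Keep frames whose weather label is locally stable and whose content
    tag is at least min_content."""
    frames = sorted(frames, key=lambda f: f.timestamp)
    threshold = CONTENT_RANK[min_content]
    return [
        f
        for i, f in enumerate(frames)
        if CONTENT_RANK[f.content] >= threshold
        and weather_is_stable(frames, i, window_s)
    ]
```

Calling `preselect` on a list of `Frame` records from one recording returns the frames that would proceed to bounding box labeling under these assumptions. A stricter or looser stability criterion can be explored by changing `window_s`.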