NeurIPS 2022

GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization

Xuhai Xu, Han Zhang, Yasaman S. Sefidgar, Yiyi Ren, Xin Liu, Woosuk Seo, Jennifer Brown, Kevin S. Kuehn, Mike A. Merrill, Paula S. Nurius, Shwetak N. Patel, Tim Althoff, Margaret E. Morris, Eve A. Riskin, Jennifer Mankoff, Anind K. Dey


Abstract

Recent research has demonstrated that behavior signals captured by smartphones and wearables can support longitudinal behavior modeling. However, there is no comprehensive public dataset that serves as an open testbed for fair comparison among algorithms. Moreover, prior studies mainly evaluate algorithms using data from a single population within a short period, without measuring the cross-dataset generalizability of these algorithms. We present the first multi-year passive sensing datasets, containing over 700 user-years of data from 497 unique users, collected from mobile and wearable sensors together with a wide range of well-being metrics. Our datasets support multiple cross-dataset evaluations of the generalizability of behavior modeling algorithms across different users and years. As a starting point, we provide benchmark results of 18 algorithms on the task of depression detection. Our results indicate that both prior depression detection algorithms and domain generalization techniques show potential but need further research to achieve adequate cross-dataset generalizability. We envision that our multi-year datasets can support the ML community in developing generalizable longitudinal behavior modeling algorithms. The GLOBEM website can be found at the-globem.github.io. Our datasets are available at physionet.org/content/globem. Our codebase is open-sourced at github.com/UW-EXP/GLOBEM.

Table 1
| | Our datasets | Related Research [20, 97, 101] | | | Other Human Behavior Datasets | WOODS [37] |
|---|---|---|---|---|---|---|
| # of Subjects | 705 (497 unique) | 48 | 34 | 29 | <400 | 9 |
| Time Scale | 3 months × 4 years | 10 weeks | 2 years | 4 weeks | Months | Hours × 36 devices |
| Open-source | Yes | Yes | Yes | Yes | No | Yes |
| Domain Generalization | Yes | No | No | No | No | Yes |

… across domains (e.g., [7, 34, 38]); and 3) learning strategy, which aims to use the training procedure to enhance model generalizability (e.g., [30, 108, 86]).
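The cross-dataset evaluations described in the abstract can be set up the way domain generalization benchmarks commonly do it: treat each dataset as a domain and hold one out for testing. The Python sketch below is a minimal illustration under stated assumptions, not the benchmark's actual code. The dataset labels `DS1` to `DS4` (one per study year, 2018 to 2021) and the helpers `leave_one_dataset_out`, `train_past_test_future`, and `user_overlap` are hypothetical. Because previous-year students could rejoin the study, a held-out year may share users with the training years, which `user_overlap` quantifies.

```python
from typing import List, Set, Tuple

# Hypothetical labels for the four yearly datasets (2018-2021), in chronological order.
DATASETS = ["DS1", "DS2", "DS3", "DS4"]

Split = Tuple[List[str], str]  # (training datasets, held-out test dataset)


def leave_one_dataset_out(datasets: List[str]) -> List[Split]:
    """Treat each dataset as a domain: train on all others, test on the held-out one."""
    return [([d for d in datasets if d != held_out], held_out) for held_out in datasets]


def train_past_test_future(datasets: List[str]) -> List[Split]:
    """Assuming chronological order, train on all earlier years and test on the next one."""
    return [(datasets[:i], datasets[i]) for i in range(1, len(datasets))]


def user_overlap(train_users: Set[str], test_users: Set[str]) -> float:
    """Fraction of test users who also appear in training (e.g., returning participants)."""
    if not test_users:
        return 0.0
    return len(train_users & test_users) / len(test_users)


if __name__ == "__main__":
    for train, test in leave_one_dataset_out(DATASETS):
        print(f"train on {train} -> test on {test}")
    for train, test in train_past_test_future(DATASETS):
        print(f"train on {train} -> test on {test}")
    # Hypothetical user IDs: one of the two test users also appears in training.
    print(user_overlap({"u1", "u2", "u3"}, {"u3", "u4"}))  # 0.5
```

Leave-one-dataset-out measures generalization to an unseen year (and partly unseen users), while the past-to-future split additionally respects time order, closer to how a deployed model would meet a later cohort.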
Researchers have released multiple datasets such as PACS [56], VLCS [32], and Office-Home [89], and have developed cross-dataset benchmark platforms such as DomainBed [46], DeepDG [94], and WILDS [50] to facilitate related studies. However, most existing domain generalization research focuses on computer vision (CV) and natural language processing (NLP) tasks.

Generalizable Time-Series Models. There are fewer studies of model robustness to distribution shift on time-series data [37]. AdaRNN characterizes the temporal distribution shift of signals and reduces the mismatch with an RNN [31]. Godahewa et al. provided a dataset archive for evaluating general time-series forecasting algorithms [43]. For generalizable sensor-based human behavior modeling, some researchers have explored short-term human action recognition [44, 57, 114]. However, these studies primarily rely on data collected in controlled settings over short periods (minutes to hours) [73, 109]. Little research has focused on in-the-wild longitudinal human behavior sensor data (months to years) that covers the diverse and variable contexts of daily living.

Mobile Sensing and Behavior Modeling. Mobile sensing is one of the most widely available data sources for longitudinal human behavior modeling [21, 40, 52, 67, 68, 79]. Compared to traditional time-series data, mobile sensing data are much longer and collected in uncontrolled settings, and thus have a high missing-data rate [95]. Moreover, the ground truth is usually much sparser (e.g., self-reported mental health measures administered weekly or less frequently [18, 101]). Most existing human behavior modeling algorithms that use mobile sensing data are not open-sourced and do not investigate cross-dataset generalization [33, 59, 78, 97, 102, 106]. To date, there are only a few public longitudinal human behavior sensing datasets [4, 12, 41]. Table 1 summarizes and compares them against our multi-year datasets.
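Long, gappy sensor streams combined with sparse labels mean that each self-report has to be paired with a window of preceding sensor data, and windows that are mostly missing need an explicit policy. The Python sketch below shows one simple policy. The function name `window_feature`, the 14-day window, and the 50% coverage threshold are illustrative assumptions, not the feature pipeline used for these datasets.

```python
import math
from datetime import date, timedelta
from typing import Dict, List, Optional


def window_feature(
    daily: Dict[date, float],
    survey_day: date,
    window_days: int = 14,  # assumed window length
    min_coverage: float = 0.5,  # assumed minimum fraction of observed days
) -> Optional[float]:
    """Average one daily sensor feature over the `window_days` days before a survey.

    A day counts as missing if it is absent from `daily` or stored as NaN.
    The survey day itself is excluded. Returns None when no day was observed
    or fewer than `min_coverage * window_days` days were observed, so a sparse
    label is not paired with a mostly empty window.
    """
    observed: List[float] = []
    for offset in range(1, window_days + 1):
        value = daily.get(survey_day - timedelta(days=offset))
        if value is not None and not math.isnan(value):
            observed.append(value)
    if not observed or len(observed) < min_coverage * window_days:
        return None
    return sum(observed) / len(observed)


if __name__ == "__main__":
    survey = date(2019, 4, 15)  # hypothetical weekly survey date
    # Hypothetical daily step counts: only 10 of the 14 preceding days were recorded.
    steps = {survey - timedelta(days=d): 8000.0 + 100 * d for d in range(1, 11)}
    steps[survey - timedelta(days=3)] = math.nan  # a day lost to a sensing gap
    print(window_feature(steps, survey))  # mean of the 9 observed days
```

Returning None instead of imputing keeps the decision visible: a downstream pipeline can drop such samples or fill them with a dedicated imputation method.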
Existing passive mobile sensing datasets contain fewer than 50 participants and cannot support cross-dataset analysis, so they cannot serve as a gold-standard benchmark for future algorithms. We are the first to release multi-year mobile sensing datasets to support the ML community in investigating cross-dataset generalizable behavior modeling algorithms.

Multi-Year Datasets

We introduce the data collection procedure of our multi-year datasets (Sec. 3.1), together with details of the survey data (Sec. 3.2) and the passive mobile sensing data (Sec. 3.3).

Study Procedure

Our data collection studies were conducted at a Carnegie-classified R-1 university in the United States, inspired by the data collection model proposed in [95]. The study was reviewed and approved by an IRB. Fig. 1 presents an overview of the data collection process. We recruited undergraduates via emails, flyers, and social media posts from 2018 to 2021 [79]. After the first year, students from previous years were invited to join again. The study was conducted during the Spring quarter (10 weeks) each year, so the impact of seasonal effects was controlled. Participants received up to $245 in compensation each year, based on their compliance. S.A.1 provides mor