KDD2021
MULTIVERSE: Mining Collective Data Science Knowledge from Code on the Web to Suggest Alternative Analysis Approaches
Mike A. Merrill, Ge Zhang, Tim Althoff
3 citations
Abstract
Data analyses are based on a series of "decision points" including data filtering, feature operationalization and selection, model specification, and parametric assumptions. "Multiverse Analysis" research has shown that a lack of exploration of these decisions can lead to non-robust conclusions based on highly sensitive decision points. Importantly, even if myopic analyses are technically correct, analysts' focus on one set of decision points precludes them from exploring alternate formulations that may produce very different results. Prior work has also shown that analysts' exploration is often limited based on their training, domain, and personal experience. However, supporting analysts in exploring alternative approaches is challenging and typically requires expert feedback that is costly and hard to scale.