VLDB2021
Watermarks in Stream Processing Systems: Semantics and Comparative Analysis of Apache Flink and Google Cloud Dataflow
Edmon Begoli, Tyler Akidau, Slava Chernyak, Fabian Hueske, Kathryn Knight, Kenneth L. Knowles, Daniel Mills, Dan Sotolongo
44 citations
Abstract
Streaming data processing is an exercise in taming disorder: from oftentimes huge torrents of information, we hope to extract powerful and timely analyses. But when dealing with streaming data, the unbounded and temporally disordered nature of real-world streams introduces a critical challenge: how does one reason about the completeness of a stream that never ends? In this paper, we present a comprehensive definition and analysis of watermarks , a key tool for reasoning about temporal completeness in infinite streams. First, we describe what watermarks are and why they are important, highlighting how they address a suite of stream processing needs that are poorly served by eventually-consistent approaches: