VLDB2023
R3: Record-Replay-Retroaction for Database-Backed Applications
Qian Li, Peter Kraft, Michael J. Cafarella, Çagatay Demiralp, Goetz Graefe, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Xiangyao Yu, Matei Zaharia
10 citations
Abstract
Developers would benefit greatly from time travel: being able to faithfully replay past executions and retroactively execute modified code on past events. Currently, replay and retroaction are impractical because they require expensively capturing fine-grained timing information to reproduce concurrent accesses to shared state. In this paper, we propose practical time travel for database-backed applications , an important class of distributed applications that access shared state through transactions.
We present R 3 , a novel Record-Replay-Retroaction tool. R 3 implements a lightweight interceptor to record concurrency information for applications at transaction-level granularity, enabling replay and retroaction with minimal overhead. We address key challenges in both replay and retroaction. First, we design a novel algorithm for faithfully reproducing application requests running with snapshot isolation, allowing R 3 to support most production DBMSs. Second, we develop a retroactive execution mechanism that provides high fidelity with the original trace while supporting nearly arbitrary code modifications. We demonstrate how R 3 simplifies debugging for real, hard-to-reproduce concurrency bugs from popular open-source web applications. We evaluate R 3 using TPC-C and microservice workloads and show that R 3 always-on recording has a small performance overhead (<25% for point queries but <0.1% for complex transactions like in TPC-C) during normal application execution and that R 3 can retroactively execute bugfixed code over recorded traces within 0.11--0.78× of the original execution time.