SIGMOD 2024
Live Patching for Distributed In-Memory Key-Value Stores
Michael Fruth, Stefanie Scherzinger
Abstract
Providers of high-availability data stores need to roll out software updates without causing noticeable downtime. For distributed data stores like Redis Cluster, the state of the art is a rolling update, where the nodes are restarted in sequence. This requires preserving, restoring, and resynchronizing the database state, which can significantly prolong updates for larger memory states and thus delay critical security fixes. In this article, we propose applying software updates directly in memory without restarting any nodes. We present the first fully operational live patching solution for Redis Cluster on Linux. We support both push- and pull-based distribution of patches, trading dissemination speed against cluster elasticity, that is, the ability of nodes to dynamically join or leave the cluster. Our integration is very lightweight, as it piggybacks on the cluster-internal gossip protocol. Our experiments benchmark live patching against state-of-the-art rolling updates. In one scenario, live patching updates the entire cluster orders of magnitude faster, without unfavorable trade-offs regarding throughput, tail latencies, or network consumption. To showcase generalizability, we provide guidelines on integrating live patching into distributed database systems and successfully apply them to a primary-replica PostgreSQL setup. Given our overall promising results, we discuss the opportunities of live patching in database DevOps.
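The trade-off between push- and pull-based patch distribution can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the authors' implementation and not the Redis Cluster gossip message format: `Node`, `patch_level`, `push_patch`, `gossip_round`, and `pull_until_converged` are hypothetical names, and a "patch" is reduced to an integer level. It assumes that a push reaches every current member at once, while a pull relies on nodes comparing patch levels during periodic gossip exchanges and catching up from a peer that is ahead.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node; patch_level 0 means unpatched."""
    name: str
    patch_level: int = 0

    def apply_patch(self, level: int) -> None:
        # Stand-in for applying a live patch in memory; never downgrades.
        self.patch_level = max(self.patch_level, level)


def push_patch(members, level):
    """Push (assumed model): every *current* member receives the patch at once.
    Nodes that join afterwards are not covered."""
    for node in members:
        node.apply_patch(level)


def gossip_round(members, rng):
    """Pull (assumed model): each node exchanges its patch level with one
    random peer; whichever side is behind pulls the newer patch.
    Requires at least two members."""
    for node in members:
        peer = rng.choice([m for m in members if m is not node])
        if peer.patch_level < node.patch_level:
            peer.apply_patch(node.patch_level)
        elif node.patch_level < peer.patch_level:
            node.apply_patch(peer.patch_level)


def pull_until_converged(members, rng, max_rounds=100):
    """Run gossip rounds until all members share the highest patch level.
    Returns the number of rounds that were needed."""
    target = max(n.patch_level for n in members)
    rounds = 0
    while not all(n.patch_level == target for n in members):
        if rounds == max_rounds:
            raise RuntimeError("cluster did not converge within max_rounds")
        gossip_round(members, rng)
        rounds += 1
    return rounds


if __name__ == "__main__":
    rng = random.Random(42)
    cluster = [Node(f"n{i}") for i in range(6)]

    push_patch(cluster, level=1)       # fast: one step for all current members
    late = Node("late-joiner")         # elasticity: a node joins after the push
    cluster.append(late)
    print("late joiner after push:", late.patch_level)

    rounds = pull_until_converged(cluster, rng)
    print("late joiner after pull:", late.patch_level, "rounds:", rounds)
```

In this toy model, the push patches all existing members in a single step but leaves the late joiner unpatched, whereas the gossip-driven pull needs additional rounds but eventually covers nodes that join later. This mirrors the dissemination-speed versus elasticity trade-off named in the abstract.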