STOC2022
Improved communication complexity of fault-tolerant consensus
Mohammad Taghi Hajiaghayi, Dariusz R. Kowalski, Jan Olkowski
被引用 5 次
摘要
Consensus is one of the most thoroughly studied problems in distributed computing, yet there are still complexity gaps that have not been bridged for decades. In particular, in the classical message-passing setting with processes' crashes, since the seminal works of Bar-Joseph and Ben-Or [1998] [8] and Aspnes and Waarts [1996, 1998] [6, 5] in the previous century, there is still a fundamental unresolved question about communication complexity of fast randomized Consensus against a (strong) adaptive adversary crashing processes arbitrarily online. The best known upper bound on the number of communication bits is Θ( n 3/2 √ log n ) per process, while the best lower bound is Ω(1). This is in contrast to randomized Consensus against a (weak) oblivious adversary, for which time-almost-optimal algorithms guarantee amortized O(1) communication bits per process [21] . We design an algorithm against adaptive adversary that reduces the communication gap by nearly linear factor to O( √ n • polylog n) bits per process, while keeping almost-optimal (up to factor O(log More surprisingly, we show this complexity indeed can be lowered further, but at the expense of increasing time complexity, i.e., there is a trade-off between communication complexity and time complexity. More specifically, our main Consensus algorithm allows to reduce communication complexity per process to any value from polylog n to O( √ n • polylog n), as long as Time × Similarly, reducing time complexity requires more random bits per process, i.e., Time × Randomness = O(n • polylog n). Our parameterized consensus solutions are based on a few newly developed paradigms and algorithms for crash-resilient computing, interesting on their own. The first one, called a Fuzzy Counting, provides for each process a number which is in-between the numbers of alive processes at the end and in the beginning of the counting. Our deterministic Fuzzy Counting algorithm works in O(log 3 n) rounds and uses only O(polylog n) amortized communication bits per process, unlike previous solutions to counting that required Ω(n) bits. This improvement is possible due to a new Fault-tolerant Gossip solution with O(log 3 n) rounds using only O(|R| • polylog n) communication bits per process, where |R| is the length of the rumor binary representation. It exploits distributed fault-tolerant divide-and-conquer idea, in which processes run a Bipartite Gossip algorithm for a considered partition of processes. To avoid passing many long messages, processes use a family of small-degree compact expanders for local signaling to their overlay neighbors if they are in a compact (large and well-connected) party, and switch to a denser overlay graph whenever local signalling in the current one is failed. Last but not least, all algorithms in this paper can be implemented in other distributed models such as the congest model in which messages are of length O(log n).