Yanyg - Software Engineer

ZZ - Distributed Consensus Reading List

From https://github.com/heidihoward/distributed-consensus-reading-list

Since its inception in the 1980s, distributed consensus has been the subject of extensive academic research. Whilst definitions vary, distributed consensus (or equivalently, atomic broadcast) most often refers to the problem of how to decide an ordered sequence of values between a set of distributed nodes. This can be used to implement an append-only replicated log, which can be utilized, either directly or indirectly, to provide services such as primary-backup replication or state machine replication. These abstractions can, in turn, form the building blocks of new abstractions, such as a distributed key-value store. Some consensus algorithms instead decide only a single value or a partially ordered sequence of values. What unifies distributed consensus algorithms is the fact that they are always safe, regardless of delays and crashes (though they are not necessarily Byzantine fault tolerant), and are guaranteed to make progress once enough nodes are up and able to communicate in a timely manner.
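As a rough illustration of the replicated-log idea, the sketch below applies one decided sequence of commands to two replicas of a toy key-value store. The names `KVStateMachine` and `apply_log` and the `("set", key, value)` / `("delete", key)` command format are assumptions made for this example. The consensus step that decides the log order is assumed to have already happened.

```python
from typing import Dict, List, Tuple

# Hypothetical command format: ("set", key, value) or ("delete", key).
Command = Tuple[str, ...]


class KVStateMachine:
    """Deterministic state machine: the same log always yields the same state."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, command: Command) -> None:
        op = command[0]
        if op == "set":
            _, key, value = command
            self.data[key] = value
        elif op == "delete":
            _, key = command
            self.data.pop(key, None)
        else:
            raise ValueError(f"unknown command: {op}")
        self.applied += 1


def apply_log(replica: KVStateMachine, log: List[Command]) -> None:
    """Apply the log entries the replica has not yet seen, in log order."""
    for command in log[replica.applied:]:
        replica.apply(command)


# The ordered log is assumed to be the output of a consensus algorithm.
decided_log: List[Command] = [
    ("set", "x", "1"),
    ("set", "y", "2"),
    ("delete", "x"),
    ("set", "y", "3"),
]

replica_a, replica_b = KVStateMachine(), KVStateMachine()
apply_log(replica_a, decided_log[:2])  # replica_a lags behind at first
apply_log(replica_b, decided_log)
apply_log(replica_a, decided_log)      # replica_a catches up
```

Because every replica applies the same entries in the same order, a lagging replica only needs the log suffix it has missed in order to converge.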

This is a long list of papers relating to distributed consensus. Many of the papers listed below fit into more than one section. However, for simplicity, each paper is listed only in the most relevant section. Where possible, open access links for each paper have been provided.

Contributions via pull requests are welcome.

Key: acmdl = ACM Digital Library

The sections are as follows:

1 Distributed Consensus

1.1 Theoretical results

This section lists theoretical results relating to distributed consensus.

  • ⭐️ Time, Clocks, and the Ordering of Events in a Distributed System, CACM 1978 [acmdl,pdf]
    • Easily one of the most influential papers in distributed computing. Introduces the "happens-before" relation and Lamport clocks.
  • The implementation of reliable distributed multiprocess systems, Computer Networks 1978 [pdf]
    • Precursor to Paxos, notes on achieving fault-tolerance with some degree of clock synchronization.
  • Impossibility of Distributed Consensus with One Faulty Process, JACM 1985 [acmdl,pdf]
    • The famous FLP result, proving that no deterministic consensus algorithm can guarantee progress in a fully asynchronous system if even one process may crash.
  • On the Minimal Synchronism Needed for Distributed Consensus, JACM 1987 [acmdl,pdf]
    • Follow-up to the FLP result that examines which synchrony assumptions are needed for consensus to become solvable.
  • Unreliable Failure Detectors for Reliable Distributed Systems, JACM 1996 [acmdl,pdf]
  • The Weakest Failure Detector for Solving Consensus, JACM 1996 [acmdl,pdf]
  • Omega Meets Paxos: Leader Election and Stability without Eventual Timely Links, DISC 2005 [acmdl,pdf]
  • Lower Bounds for Asynchronous Consensus, Distributed Computing 2006 [acmdl,pdf]
  • The Heard-Of Model: Computing in Distributed Systems with Benign Failures, Distributed Computing 2009 [acmdl,pdf]
    • Featured in The Morning Paper
  • Virtually Synchronous Methodology for Dynamic Service Replication, MS Tech report 2010 [pdf]
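
The "happens-before" relation and Lamport clocks noted for the first paper in this list can be illustrated with one logical counter per process. This is a minimal sketch rather than code from the paper; the `LamportClock` class and its method names are assumptions made for the example.

```python
class LamportClock:
    """Logical clock: a counter that only moves forward."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the clock by one."""
        self.time += 1
        return self.time

    def receive(self, message_time: int) -> int:
        """Message receipt: move past both the local time and the sender's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
a = p.tick()                  # local event on process p
send_ts = p.tick()            # p sends a message stamped with send_ts
b = q.tick()                  # unrelated local event on process q
recv_ts = q.receive(send_ts)  # q receives p's message
```

If one event happens before another, its timestamp is smaller, which is why `a < send_ts < recv_ts` holds here. The converse does not hold: `a` and `b` carry the same timestamp even though neither happened before the other.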

1.2 Surveys

NOT PROCESSED:


Since its inception in the 1980s, [distributed consensus](https://en.wikipedia.org/wiki/Consensus_(computer_science)) has been the subject of extensive academic research. Whilst definitions vary, [distributed consensus](https://en.wikipedia.org/wiki/Consensus_(computer_science)) (or equivalently, [atomic broadcast](https://en.wikipedia.org/wiki/Atomic_broadcast)) most often refers to the problem of how to decide an ordered sequence of values between a set of distributed nodes. This can be used to implement an append-only replicated log, which can be utilized, either directly or indirectly, to provide services such as primary-backup replication or [state machine replication](https://en.wikipedia.org/wiki/State_machine_replication). These abstractions can, in turn, form the building blocks of new abstractions, such as a distributed [key-value store](https://en.wikipedia.org/wiki/Key%E2%80%93value_database). Some consensus algorithms instead decide only a single value or a partially ordered sequence of values. What unifies distributed consensus algorithms is the fact that they are always safe, regardless of delays and crashes (though they are not necessarily [Byzantine fault tolerant](https://en.wikipedia.org/wiki/Byzantine_fault)), and are guaranteed to make progress once enough nodes are up and able to communicate in a timely manner.

⭐️ Influential papers - If you are looking for a starting point, the most influential papers on distributed consensus are highlighted with a yellow star. ⭐️

💎 Hidden gems - Papers which I personally love but which are not as highly cited as the influential papers. 💎

Key: acmdl = [ACM Digital Library](https://dl.acm.org)

The sections are as follows:

2 [Distributed consensus](#distributed-consensus)

  • [Theoretical results](#theoretical-results)
  • [Surveys](#surveys)
  • [Algorithms for consensus](#algorithms-for-consensus)
  • [Consensus for specialist hardware](#consensus-for-specialist-hardware)
  • [Consensus for geo-distributed systems](#consensus-for-geo-distributed-systems)
  • [Consensus in production](#consensus-in-production)
  • [Implementations of consensus](#implementations-of-consensus)
  • [Evaluations of consensus](#evaluations-of-consensus)
  • [State machine replication](#state-machine-replication)
  • [Reconfiguration](#reconfiguration)

3 [Related Topics](#related-topics)

  • [Weaker consistency models](#weaker-consistency-models)
  • [Failures](#failures)
  • [Clocks](#clocks)
  • [Correctness of consensus algorithms](#correctness-of-consensus-algorithms)
  • [Quorum systems](#quorum-systems)
  • [Byzantine fault tolerance](#byzantine-fault-tolerance)
    • [BFT surveys](#bft-surveys)
    • [BFT in theory](#bft-in-theory)
    • [BFT in practice](#bft-in-practice)
  • [Alternative fault models in distributed consensus](#alternative-fault-models-in-distributed-consensus)
  • [Misc](#misc)

4 [Future reading list](#future-reading-list)

  • [Blogroll](#blogroll)
  • [Reading lists](#reading-lists)
  • [Academic conferences & symposiums](#academic-conferences--symposiums)
  • [Academic workshops](#academic-workshops)
  • [Academic journals & magazines](#academic-journals--magazines)

## Distributed Consensus

### Theoretical results

This section lists theoretical results relating to distributed consensus.

5 ⭐️ Time, Clocks, and the Ordering of Events in a Distributed System, CACM 1978 [[acmdl](https://dl.acm.org/citation.cfm?id=359563),[pdf](https://lamport.azurewebsites.net/pubs/time-clocks.pdf)]

6 The implementation of reliable distributed multiprocess systems, Computer Networks 1978 [[pdf](https://www.microsoft.com/en-us/research/publication/implementation-reliable-distributed-multiprocess-systems/)]

  • Precursor to Paxos, notes on achieving fault-tolerance with some degree of clock synchronization.

7 ⭐️ Impossibility of Distributed Consensus with One Faulty Process, JACM 1985 [[acmdl](https://dl.acm.org/citation.cfm?id=214121),[pdf](https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf)]

  • The famous FLP result, proving that no deterministic consensus algorithm can guarantee progress in a fully asynchronous system if even one process may crash.

8 On the Minimal Synchronism Needed for Distributed Consensus, JACM 1987 [[acmdl](https://dl.acm.org/citation.cfm?id=7533),[pdf](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.456.4362&rep=rep1&type=pdf)]

  • Follow-up to the FLP result that examines which synchrony assumptions are needed for consensus to become solvable.

9 Unreliable Failure Detectors for Reliable Distributed Systems, JACM 1996 [[acmdl](https://dl.acm.org/citation.cfm?id=226647),[pdf](https://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p225-chandra.pdf)]

10 The Weakest Failure Detector for Solving Consensus, JACM 1996 [[acmdl](https://dl.acm.org/citation.cfm?id=234549),[pdf](http://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/weakestfd.pdf)]

11 Omega Meets Paxos: Leader Election and Stability without Eventual Timely Links, DISC 2005 [[acmdl](https://dl.acm.org/citation.cfm?id=2162336),[pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2005/09/paxos-leader.pdf)]

12 Lower Bounds for Asynchronous Consensus, Distributed Computing 2006 [[acmdl](https://dl.acm.org/citation.cfm?id=3271328),[pdf](https://lamport.azurewebsites.net/pubs/lower-bound.pdf)]

13 The Heard-Of Model: Computing in Distributed Systems with Benign Failures, Distributed Computing 2009 [[acmdl](https://dl.acm.org/citation.cfm?id=3271165),[pdf](https://infoscience.epfl.ch/record/109375/files/HO-TR-2007.pdf)]

14 Virtually Synchronous Methodology for Dynamic Service Replication, MS Tech report 2010 [[pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2010/11/vs-submit.pdf)]

### Surveys

This section lists surveys, tutorials, and systemization of knowledge papers covering distributed consensus algorithms.

15 A Modular Approach to Fault-Tolerant Broadcasts and Related Problems, Tech Report 1994 [[acmdl](https://dl.acm.org/citation.cfm?id=866693),[pdf](http://csis.pace.edu/~marchese/CS865/Papers/hadzilacos_ps.ps)]

16 How to Build a Highly Available System Using Consensus, WDAG 1996 [[acmdl](https://dl.acm.org/citation.cfm?id=675640),[pdf](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.5429&rep=rep1&type=pdf)]

17 Revisiting the PAXOS algorithm, WDAG 1997 [[acmdl](https://dl.acm.org/citation.cfm?id=675657),[pdf](https://groups.csail.mit.edu/tds/papers/DePrisco/paxos-tcs.pdf)]

18 The ABCD’s of Paxos, PODC 2001 [[acmdl](https://dl.acm.org/citation.cfm?id=383969),[pdf](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.595.4829&rep=rep1&type=pdf)]

19 ⭐️ Paxos Made Simple, SIGACT News 2001 [[pdf](https://lamport.azurewebsites.net/pubs/paxos-simple.pdf)]

20 Deconstructing Paxos, SIGACT News 2003 [[pdf](http://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/deconstr_paxos.pdf),[acmdl](https://dl.acm.org/citation.cfm?id=637447)]

21 Total order broadcast and multicast algorithms: Taxonomy and survey, CSUR 2004 [[acmdl](https://dl.acm.org/citation.cfm?id=1041682),[pdf](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.110.6701&rep=rep1&type=pdf)]

22 💎 Vive La Différence: Paxos vs. Viewstamped Replication vs. Zab, TDSC 2015 [[pdf](https://www.cs.cornell.edu/fbs/publications/vivaLaDifference.pdf)]

23 The Alpha of Indulgent Consensus, Comp Journal 2007 [[acmdl](https://dl.acm.org/citation.cfm?id=1188295),[pdf](https://infoscience.epfl.ch/record/89695/files/bxl046.pdf)]

24 The Paxos Register, SRDS 2007 [[acmdl](https://dl.acm.org/citation.cfm?id=1308227),[pdf](http://www.cs.cornell.edu/lorenzo/papers/li07Paxos.pdf)]

25 Classic Paxos vs. Fast Paxos: Caveat Emptor, HotDep 2007 [[acmdl](https://dl.acm.org/citation.cfm?id=1323158),[pdf](http://www.sysnet.ucsd.edu/sysnet/miscpapers/hotdep07.pdf)]

26 Tutorial Summary: Paxos Explained from Scratch, OPODIS 2013 [[acmdl](https://dl.acm.org/citation.cfm?id=2696603),[pdf](http://www.ux.uis.no/~meling/papers/2013-paxostutorial-opodis.pdf)]

27 💎 Paxos Made Moderately Complex, CSUR 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2673577),[pdf](http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf)]

28 On the Parallels between Paxos and Raft, and how to Port Optimizations, PODC 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3331595),[pdf](http://mpaxos.com/pub/raft-paxos.pdf)]

29 Paxos vs Raft: Have we reached consensus on distributed consensus?, PaPoC 2020 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3380787.3393681),[arxiv](https://arxiv.org/abs/2004.05074)]

30 60 Years of Mastering Concurrent Computing through Sequential Thinking, SIGACT News 2020 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3406678.3406690)]

31 What's Live? Understanding Distributed Consensus, PODC 2021 [[acmdl](https://dl.acm.org/doi/10.1145/3465084.3467947),[arxiv](https://arxiv.org/abs/2001.04787)]

32 SoK: A Generalized Multi-Leader State Machine Replication Tutorial, JSys 2021 [[pdf](https://mwhittaker.github.io/publications/bipartisan_paxos.pdf)]

### Algorithms for consensus

This section lists papers describing algorithms for distributed consensus. These papers tend to be theory papers (venues such as PODC, DISC, OPODIS) whereas the [Implementations of consensus](#implementations-of-consensus) section focuses on systems papers.

33 Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols, PODC 1983 [[acmdl](https://dl.acm.org/doi/10.1145/800221.806707)]

  • Also known as the Ben-Or algorithm

34 Reliable communication in the presence of failures, TOCS 1987 [[acmdl](https://dl.acm.org/citation.cfm?id=7478),[pdf](https://pdos.csail.mit.edu/archive/6.824-2006/papers/isis87.pdf)]

35 ⭐️ Consensus in the Presence of Partial Synchrony, JACM 1988 [[acmdl](https://dl.acm.org/citation.cfm?id=42283),[pdf](https://groups.csail.mit.edu/tds/papers/Lynch/jacm88.pdf)]

36 Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems, PODC 1988 [[acmdl](https://dl.acm.org/citation.cfm?id=62549),[pdf](http://pmg.csail.mit.edu/papers/vr.pdf)]

37 Efficient Message Ordering in Dynamic Networks, PODC 1996 [[acmdl](https://dl.acm.org/citation.cfm?id=248062),[pdf](http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=A396429A7FB80EE184CC3AFC78347A86?doi=10.1.1.27.6092&rep=rep1&type=pdf)]

38 ⭐️ The Part-Time Parliament, TOCS 1998 [[acmdl](https://dl.acm.org/citation.cfm?id=279229),[pdf](https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf)]

39 Disk Paxos, DISC 2000 [[acmdl](https://dl.acm.org/citation.cfm?id=675967),[pdf](https://lamport.azurewebsites.net/pubs/disk-paxos.pdf)]

  • This paper describes how to replace acceptors in Paxos with disks.
  • Each disk is divided into blocks, one for each proposer. Each proposer may write only to its own block but may read the other proposers' blocks, and it does both in each of the two usual Paxos phases.
  • Each block contains the rough equivalent of the last promised ballot number and last accepted proposal for its assigned proposer.
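
The block layout and access pattern described above can be sketched as a small data structure. This is a heavily simplified, single-disk illustration and not the full Disk Paxos algorithm, which works across a majority of disks. The `Block`, `Disk` and `run_phase` names and the field names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Block:
    promised_ballot: int = 0              # highest ballot this proposer has started
    accepted_ballot: int = 0              # ballot of its last accepted proposal
    accepted_value: Optional[str] = None  # value of its last accepted proposal


class Disk:
    """One disk holding a block per proposer; a proposer writes only its own block."""

    def __init__(self, proposers: List[str]) -> None:
        self.blocks: Dict[str, Block] = {p: Block() for p in proposers}

    def write(self, proposer: str, block: Block) -> None:
        self.blocks[proposer] = block

    def read_others(self, proposer: str) -> List[Block]:
        return [b for p, b in self.blocks.items() if p != proposer]


def run_phase(disk: Disk, proposer: str, block: Block) -> bool:
    """Write our own block, then read the others; fail if a higher ballot is seen."""
    disk.write(proposer, block)
    return all(other.promised_ballot < block.promised_ballot
               for other in disk.read_others(proposer))


disk = Disk(["p1", "p2"])
p1_ok = run_phase(disk, "p1", Block(promised_ballot=1))
p2_ok = run_phase(disk, "p2", Block(promised_ballot=2))
# p1 now tries to continue with ballot 1 but finds p2's higher ballot on disk.
p1_retry_ok = run_phase(
    disk, "p1", Block(promised_ballot=1, accepted_ballot=1, accepted_value="v"))
```

The disk itself does no comparison of ballots. All of the logic sits with the proposers, which is why plain read/write storage is enough.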

40 Specifying and Using a Partitionable Group Communication Service, TOCS 2001 [[acmdl](https://dl.acm.org/citation.cfm?id=377776&dl=ACM&coll=DL),[pdf](https://groups.csail.mit.edu/tds/papers/Lynch/TOCS.pdf)]

41 Active Disk Paxos with infinitely many processes, PODC 2002 [[acmdl](https://dl.acm.org/citation.cfm?id=1146169),[pdf](https://groups.csail.mit.edu/tds/papers/Chockler/podc-02.pdf)]

  • This paper makes Disk Paxos more “Paxos like” by assuming the disks support more operations, e.g. conditional writes.
  • The paper claims that Disk Paxos requires a fixed set of proposers and that Active Disk Paxos (ADP) removes this restriction.

42 Cheap Paxos, DSN 2004 [[acmdl](https://dl.acm.org/citation.cfm?id=1009745),[pdf](https://lamport.azurewebsites.net/pubs/web-dsn-submission.pdf)]

43 Generalized Consensus and Paxos, Tech Report 2005 [[pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2005-33.pdf)]

  • Introduces the idea of deciding a partial ordering of values instead of a total ordering
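
To make the total-versus-partial ordering distinction concrete, the sketch below treats two command histories as equivalent when they order every pair of conflicting commands the same way. The `(id, op, key)` command format, the conflict rule (same key, at least one write) and the function names are assumptions made for the example, not definitions taken from the paper.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical command format: (unique id, "read" or "write", key).
Command = Tuple[str, str, str]


def conflict(a: Command, b: Command) -> bool:
    """Two commands conflict if they touch the same key and at least one writes."""
    return a[2] == b[2] and "write" in (a[1], b[1])


def same_partial_order(h1: List[Command], h2: List[Command]) -> bool:
    """True if both histories hold the same commands and agree on every conflicting pair."""
    if sorted(h1) != sorted(h2):
        return False
    position_in_h2 = {cmd: i for i, cmd in enumerate(h2)}
    # combinations() yields (a, b) with a before b in h1.
    return all(position_in_h2[a] < position_in_h2[b]
               for a, b in combinations(h1, 2) if conflict(a, b))


w_x = ("c1", "write", "x")
w_y = ("c2", "write", "y")
r_x = ("c3", "read", "x")

commuting_swap = same_partial_order([w_x, w_y, r_x], [w_y, w_x, r_x])
conflicting_swap = same_partial_order([w_x, w_y, r_x], [r_x, w_x, w_y])
```

Swapping the writes to `x` and `y` is harmless because they commute. Moving the read of `x` ahead of the write to `x` is not, so the two histories in the second call are different outcomes.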

44 Fast Paxos, Distributed Computing 2006 [[pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2005-112.pdf)]

  • Variant of Paxos where proposers can bypass the leader by allowing multiple values to be proposed in the same ballot. This requires stronger quorum intersection, e.g. Fast Paxos needs 3/4 of acceptors (instead of a simple majority) to provide the same liveness guarantees as classic Paxos.
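
The quorum sizes quoted above can be checked by brute force on a small cluster. The sketch below assumes `n` acceptors, classic quorums that are simple majorities, and fast quorums of size ⌈3n/4⌉. It tests the Fast Paxos requirement that any two fast quorums and any classic quorum share at least one acceptor. The function names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from math import ceil


def classic_quorum_size(n: int) -> int:
    """Simple majority of n acceptors."""
    return n // 2 + 1


def fast_quorum_size(n: int) -> int:
    """Three quarters of n acceptors, rounded up."""
    return ceil(3 * n / 4)


def fast_intersection_holds(n: int, fast_size: int) -> bool:
    """True if every two fast quorums and every classic quorum share an acceptor."""
    acceptors = range(n)
    classic = [set(q) for q in combinations(acceptors, classic_quorum_size(n))]
    fast = [set(q) for q in combinations(acceptors, fast_size)]
    return all(f1 & f2 & c
               for f1, f2 in combinations_with_replacement(fast, 2)
               for c in classic)


n = 5
with_three_quarters = fast_intersection_holds(n, fast_quorum_size(n))
with_majority_only = fast_intersection_holds(n, classic_quorum_size(n))
```

With five acceptors, fast quorums of four satisfy the requirement. Fast quorums of three (a simple majority) do not: two of them can overlap in a single acceptor that a classic quorum then misses.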

45 Consensus on Transaction Commit, TODS 2006 [[acmdl](https://dl.acm.org/citation.cfm?id=1132867),[pdf](https://lamport.azurewebsites.net/video/consensus-on-transaction-commit.pdf)]

46 Multicoordinated Paxos, PODC 2007 [[acmdl](https://dl.acm.org/citation.cfm?id=1281150),[pdf](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.8831&rep=rep1&type=pdf)]

  • Variant of Paxos which replaces the single leader with a group of leaders. Clients send operations to all leaders, and each leader proposes values to the acceptors. Acceptors only accept a value once they have received matching proposals from a quorum of leaders, similar to the non-equivocation phase in BFT protocols. Liveness no longer depends on a single leader.
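
The acceptor rule described above might look roughly like the following. This is a sketch for a single round only: ballots, phase 1 and recovery are left out, and coordinators are assumed to be non-faulty and to propose one value each. The `Acceptor` class, the method name and the choice of a simple majority as the coordinator quorum are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class Acceptor:
    """Acceptor for a single round; ballots and phase 1 are omitted for brevity."""

    def __init__(self, coordinators: Set[str]) -> None:
        self.coordinators = set(coordinators)
        # Assumption: a coordinator quorum is a simple majority of the coordinators.
        self.quorum_size = len(self.coordinators) // 2 + 1
        # Maps each proposed value to the coordinators that proposed it.
        self.proposals: Dict[str, Set[str]] = defaultdict(set)
        self.accepted: Optional[str] = None

    def on_propose(self, coordinator: str, value: str) -> Optional[str]:
        """Record a proposal; accept once a quorum of coordinators agree on a value."""
        if coordinator not in self.coordinators:
            raise ValueError(f"unknown coordinator: {coordinator}")
        if self.accepted is None:
            self.proposals[value].add(coordinator)
            if len(self.proposals[value]) >= self.quorum_size:
                self.accepted = value
        return self.accepted


acceptor = Acceptor({"c1", "c2", "c3"})
first = acceptor.on_propose("c1", "op-A")   # one coordinator is not enough
second = acceptor.on_propose("c3", "op-B")  # a conflicting proposal does not help
third = acceptor.on_propose("c2", "op-A")   # c1 and c2 now form a quorum for op-A
```

Since any two coordinators form a quorum here, the round can still complete if one of the three coordinators is slow or has crashed.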

47 Stoppable Paxos, Tech Report 2008 [[pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2008/04/stoppableV9.pdf)]

48 Yet Another Visit to Paxos, Tech report 2009 [[pdf](https://dominoweb.draco.res.ibm.com/reports/rz3754.pdf)]

49 Dynamic atomic storage without consensus, JACM 2011 [[acmdl](https://dl.acm.org/citation.cfm?id=1944348),[pdf](https://dahliamalkhi.files.wordpress.com/2015/12/dynastore-podc2009.pdf)]

50 Fast Genuine Generalized Consensus, SRDS 2011 [[acmdl](https://dl.acm.org/citation.cfm?id=2085374),[pdf](https://pages.lip6.fr/Marc.Shapiro/papers/FGGC-SRDS-2011.pdf)]

51 💎 Viewstamped Replication Revisited, Tech Report 2012 [[pdf](http://pmg.csail.mit.edu/papers/vr-revisited.pdf)]

52 On Collision-fast Atomic Broadcast, AINA 2014 [[pdf](https://infoscience.epfl.ch/record/100857/files/CFAbcastTR.pdf)]

53 Paxos Quorum Leases: Fast Reads Without Sacrificing Writes, SOCC 2014 [[acmdl](https://dl.acm.org/citation.cfm?id=2671001),[pdf](https://www.cs.cmu.edu/~dga/papers/leases-socc2014.pdf)]

  • Extends the idea of master read leases to allow the master to promise to use a specified subset of acceptors in every majority quorum. Acceptors in this quorum can then serve reads locally.
  • Similar to master read leases, it relies on clock synchrony.

54 Consus: Taming the Paxi, Unpublished 2016 [[arxiv](https://arxiv.org/abs/1612.03457)]

55 Flexible Paxos: Quorum Intersection Revisited, OPODIS 2016 [[arxiv](https://arxiv.org/abs/1608.06696),[pdf](http://drops.dagstuhl.de/opus/volltexte/2017/7094/pdf/LIPIcs-OPODIS-2016-25.pdf)]

56 💎 CASPaxos: Replicated State Machines without logs, Unpublished 2018 [[arxiv](https://arxiv.org/pdf/1802.07000.pdf)]

57 Fast Flexible Paxos: Relaxing Quorum Intersection for Fast Paxos, ICDCN 2021 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3427796.3427815),[arxiv](https://arxiv.org/abs/2008.02671)]

58 Spire: A Cooperative, Phase-Symmetric Solution to Distributed Consensus, IEEE Access 2021 [[pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9481103)]

  • Consensus algorithm which permits multiple proposals in the same round (similar to Fast Paxos) but uses two phases instead of larger quorums.

59 Paxos Made Practical, Unpublished [[pdf](http://www.scs.stanford.edu/~dm/home/papers/paxos.pdf)]

60 Relaxed Paxos: Quorum intersection revisited (again), PaPoC 2022 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3517209.3524040),[arxiv](https://arxiv.org/abs/2203.03058)]

### Consensus for specialist hardware

This section lists papers describing consensus algorithms using specialist hardware such as [SDN](https://en.wikipedia.org/wiki/Software-defined_networking), [IP-multicast](https://en.wikipedia.org/wiki/IP_multicast), or [RDMA](https://en.wikipedia.org/wiki/Remote_direct_memory_access).

61 Ring Paxos: A high-throughput atomic broadcast protocol, DSN 2010 [[pdf](https://ieeexplore.ieee.org/document/5544272),[code](http://libpaxos.sourceforge.net/paxos_projects.php#ringpaxos)]

62 Multi-Ring Paxos, DSN 2012 [[acmdl](https://dl.acm.org/citation.cfm?id=2354410.2355144),[pdf](https://ieeexplore.ieee.org/document/6263916)]

63 NetPaxos: consensus at network speed, SOSR 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2774999),[pdf](https://mcanini.github.io/papers/netpaxos.sosr15.pdf)]

64 Taming uncertainty in distributed systems with help from the network, Eurosys 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2741976),[pdf](http://www.cs.utexas.edu/falcon/papers/albatross-eurosys2015.pdf)]

65 DARE: High-Performance State Machine Replication on RDMA Networks, HPDC 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2749267),[pdf](https://spcl.inf.ethz.ch/Research/Parallel_Programming/DARE/dare-TR.pdf)]

66 Paxos Made Switch-y, CCR 2016 [[acmdl](https://dl.acm.org/citation.cfm?id=2935638),[pdf](https://www.sigcomm.org/sites/default/files/ccr/papers/2016/April/0000000-0000002.pdf)]

67 Consensus in a Box: Inexpensive Coordination in Hardware, NSDI 2016 [[acmdl](https://dl.acm.org/citation.cfm?id=2930639),[pdf](https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-istvan.pdf)]

68 Distributed Consensus and Implications of NVM on Database Management Systems, ACM Queue 2016 [[acmdl](https://dl.acm.org/citation.cfm?id=2967618),[html](https://queue.acm.org/detail.cfm?id=2967618)]

69 AllConcur: Leaderless Concurrent Atomic Broadcast, HPDC 2017 [[acmdl](https://dl.acm.org/citation.cfm?id=3078598),[pdf](https://spcl.inf.ethz.ch/Publications/.pdf/poke2017allconcur.pdf)]

70 APUS: Fast and Scalable Paxos on RDMA, SoCC 2017 [[acmdl](https://dl.acm.org/citation.cfm?id=3128609),[pdf](https://i.cs.hku.hk/~heming/papers/socc17-apus.pdf)]

71 When Raft Meets SDN: How to Elect a Leader and Reach Consensus in an Unruly Network, APNet 2017 [[acmdl](https://dl.acm.org/citation.cfm?doid=3106989.3106999),[pdf](https://conferences.sigcomm.org/events/apnet2017/papers/raft-zhang.pdf)]

72 P4xos: Consensus as a Network Service, Tech Report 2018 [[pdf](http://web.inf.usi.ch/file/pub/105/p4xos.pdf)]

73 Derecho: Fast State Machine Replication for Cloud Services, TOCS 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3302258),[pdf](http://www.cs.cornell.edu/ken/derecho-tocs.pdf),[code](https://derecho-project.github.io)]

74 NetChain: Scale-Free Sub-RTT Coordination, NSDI 2018 [[acmdl](https://dl.acm.org/citation.cfm?id=3307445),[pdf](https://www.usenix.org/system/files/conference/nsdi18/nsdi18-jin.pdf)]

75 Kernel Paxos, SRDS 2018 [[pdf](https://www.inf.usi.ch/faculty/pedone/Paper/2018/2018SRDSa.pdf)]

76 Partitioned Paxos via the Network Data Plane, Tech Report 2019 [[pdf](https://www.inf.usi.ch/faculty/soule/pubs/usi-tr-2019-01.pdf)]

77 The Impact of RDMA on Agreement, PODC 2019 [[pdf](https://arxiv.org/abs/1905.12143)]

78 CCF: A Framework for Building Confidential Verifiable Replicated Services, Whitepaper 2019 [[pdf](https://github.com/microsoft/CCF/blob/main/CCF-TECHNICAL-REPORT.pdf),[code](https://github.com/microsoft/CCF)]

79 HovercRaft: Achieving Scalability and Fault-tolerance for microsecond-scale Datacenter Services, Eurosys 2020 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3342195.3387545)]

80 FLAIR: Accelerating Reads with Consistency-Aware Network Routing, NSDI 2020 [[acmdl](https://dl.acm.org/doi/10.5555/3388242.3388295),[pdf](https://www.usenix.org/conference/nsdi20/presentation/takruri)]

81 Microsecond Consensus for Microsecond Applications, OSDI 2020 [[arxiv](https://arxiv.org/abs/2010.06288)]

82 High availability in cheap distributed key value storage, SoCC 2020 [[acmdl](https://dl.acm.org/doi/pdf/10.1145/3419111.3421290)]

83 Odyssey: The Impact of Modern Hardware on Strongly-Consistent Replication Protocols, Eurosys 2021 [[acmdl](https://dl.acm.org/doi/10.1145/3447786.3456240), [pdf](https://homepages.inf.ed.ac.uk/vnagaraj/papers/eurosys21.pdf),[techreport](https://arxiv.org/abs/2103.14701),[thesis](https://vasigavr1.github.io/files/thesis.pdf)]

### Consensus for geo-distributed systems

This section covers papers describing consensus algorithms for WANs and/or geo-replicated systems. Many of these algorithms (such as [EPaxos](https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf)) are leaderless and decide a partial-ordering over values instead of the more traditional total-ordering approach.

84 Mencius: Building Efficient Replicated State Machines for WANs, OSDI 2008 [[acmdl](https://dl.acm.org/citation.cfm?id=1855767),[pdf](https://www.usenix.org/legacy/event/osdi08/tech/full_papers/mao/mao.pdf)]

85 Scalable Consistency in Scatter, SOSP 2011 [[acmdl](https://dl.acm.org/citation.cfm?id=2043559),[pdf](https://homes.cs.washington.edu/~tom/pubs/scatter.pdf)]

86 MDCC: Multi-Data Center Consistency, Eurosys 2013 [[acmdl](https://dl.acm.org/citation.cfm?id=2465363),[pdf](http://mdcc.cs.berkeley.edu/mdcc.pdf)]

87 There Is More Consensus in Egalitarian Parliaments, SOSP 2013 [[acmdl](https://dl.acm.org/citation.cfm?id=2517350),[pdf](https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf)]

88 Geo-replicated storage with scalable deferred update replication, DSN 2013 [[acmdl](https://dl.acm.org/citation.cfm?id=2515164),[pdf](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.724.1706&rep=rep1&type=pdf)]

89 Low-Latency Multi-Datacenter Databases using Replicated Commit, VLDB 2013 [[acmdl](https://dl.acm.org/citation.cfm?id=2536366),[pdf](http://www.vldb.org/pvldb/vol6/p661-mahmoud.pdf)]

90 Be General and Don’t Give Up Consistency in Geo-Replicated Transactional Systems, OPODIS 2014 [[pdf](https://www.ssrg.ece.vt.edu/papers/opodis14-alvin.pdf)]

91 CalvinFS: Consistent WAN Replication and Scalable Metadata Management for Distributed File Systems, FAST 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2750482.2750483),[pdf](https://www.usenix.org/system/files/conference/fast15/fast15-paper-thomson.pdf)]

92 GlobalFS: A Strongly Consistent Multi-Site File System, SRDS 2016 [[pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7794339)]

93 Canopus: A Scalable and Massively Parallel Consensus Protocol, CoNEXT 2017 [[acmdl](https://dl.acm.org/citation.cfm?id=3143394),[pdf](https://cs.uwaterloo.ca/~bernard/Canopus.pdf)]

94 Multileader WAN Paxos: Ruling the Archipelago with Fast Consensus, Tech report 2017 [[pdf](https://cse.buffalo.edu/tech-reports/2017-01.pdf)]

95 WPaxos: Wide Area Network Flexible Consensus, Unpublished 2017 [[pdf](https://arxiv.org/abs/1703.08905)]

96 Speeding up Consensus by Chasing Fast Decisions, DSN 2017 [[pdf](https://arxiv.org/pdf/1704.03319.pdf)]

  • Implements an optimization to EPaxos

97 Leader Set Selection for Low-Latency Geo-Replicated State Machine, IEEE TPDS 2017 [[pdf](https://ieeexplore.ieee.org/document/7774985)]

98 DPaxos: Managing Data Closer to Users for Low-Latency and Mobile Applications, SIGMOD 2018 [[acmdl](https://dl.acm.org/citation.cfm?id=3196928),[pdf](https://nawab.me/Uploads/Nawab_DPaxos_SIGMOD2018.pdf)]

99 SDPaxos: Building Efficient Semi-Decentralized Geo-replicated State Machines, SoCC 2018 [[acmdl](https://dl.acm.org/citation.cfm?id=3267837),[pdf](https://www.microsoft.com/en-us/research/publication/sdpaxos-building-efficient-semi-decentralized-geo-replicated-state-machines/)]

100 FleetDB: Follow-the-workload Data Migration for Globe-Spanning Databases, Tech report 2018 [[pdf](https://cse.buffalo.edu/tech-reports/2018-02.pdf)]

101 Geographic State Machine Replication, SRDS 2018 [[pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8613971)]

102 Session Guarantees with Raft and Hybrid Logical Clocks, ICDCN 2019 [[acmdl](https://dl.acm.org/doi/pdf/10.1145/3288599.3288619)]

103 Near-Optimal Latency Versus Cost Tradeoffs in Geo-Distributed Storage, NSDI 2020 [[pdf](https://www.usenix.org/system/files/nsdi20-paper-uluyol.pdf)]

104 State-Machine Replication for Planet-Scale Systems, Eurosys 2020 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3342195.3387543),[arxiv](https://arxiv.org/abs/2003.11789)]

105 Low-Latency Geo-Replicated State Machines with Guaranteed Writes, PaPoC 2020 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3380787.3393686)]

106 EPaxos Revisited, NSDI 2021 [[pdf](https://www.usenix.org/system/files/nsdi21-tollman.pdf)]

107 Efficient Replication via Timestamp Stability, Eurosys 2021 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3447786.3456236),[arxiv](https://arxiv.org/abs/2104.01142)]

  • Describes Tempo, a leaderless partial ordering protocol that uses timestamp ordering.

108 Reducing the Latency of Dependent Operations in Large-Scale Geo-Distributed Systems, PhD Thesis 2021 [[pdf](https://uwspace.uwaterloo.ca/bitstream/handle/10012/17639/Yan_Xinan.pdf?sequence=1&isAllowed=y)]

109 LEGOStore: A Linearizable Geo-Distributed Store Combining Replication and Erasure Coding, Preprint 2021 [[arxiv](https://arxiv.org/abs/2111.12009)]

### Consensus in production

This section lists papers describing experiences of deploying distributed consensus in production.

110 ⭐️ The Chubby lock service for loosely-coupled distributed systems, OSDI 2006 [[acmdl](https://dl.acm.org/citation.cfm?id=1298487),[pdf](https://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf)]

111 ⭐️ Paxos Made Live - An Engineering Perspective, PODC 2007 [[acmdl](https://dl.acm.org/citation.cfm?id=1281103),[pdf](https://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/papers/paper2-1.pdf)]

112 ⭐️ ZooKeeper: Wait-free coordination for Internet-scale systems, ATC 2010 [[acmdl](https://dl.acm.org/citation.cfm?id=1855840.1855851),[pdf](https://www.usenix.org/legacy/event/atc10/tech/full_papers/Hunt.pdf)]

113 Windows Azure Storage: a highly available cloud storage service with strong consistency, SOSP 2011 [[acmdl](https://dl.acm.org/citation.cfm?id=2043571),[pdf](https://webcourse.cs.technion.ac.il/236802/Spring2018/ho/WCFiles/Azure_Cloud_Storage.pdf)]

114 Megastore: Providing Scalable, Highly Available Storage for Interactive Services, CIDR 2011 [[pdf](http://cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf)]

  • Megastore uses SMR with witnesses (replicas that participate in log replication but do not run a state machine) and read-only replicas (which only run a state machine). This paper seems to use an unusual definition of Multi-Paxos in which each instance is distinct but the 1a/1b messages for slot i are piggybacked onto the 2a/2b messages for slot i-1.

115 Zab: High-performance broadcast for primary-backup systems, DSN 2011 [[acmdl](https://dl.acm.org/citation.cfm?id=2056409),[pdf](https://knowably-attachments.s3.amazonaws.com/u/55b69a1ce4b00ab397d67250/7c8734d3cf02154499a9b3161ef9f575/Zab_2011.pdf)]

116 Large-scale cluster management at Google with Borg, Eurosys 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2741964),[pdf](https://pdos.csail.mit.edu/6.824/papers/borg.pdf)]

117 PaxosStore: High-availability Storage Made Practical in WeChat, VLDB 2017 [[acmdl](https://dl.acm.org/citation.cfm?id=3137778),[pdf](http://www.vldb.org/pvldb/vol10/p1730-lin.pdf)]

118 Bizur: A Key-value Consensus Algorithm for Scalable File-systems, Unpublished 2017 [[pdf](https://arxiv.org/pdf/1702.04242.pdf)]

119 SLOG: Serializable, Low-latency, Geo-replicated Transactions, VLDB 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3360377),[pdf](http://www.vldb.org/pvldb/vol12/p1747-ren.pdf)]

120 CockroachDB: The Resilient Geo-Distributed SQL Database, SIGMOD 2020 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3318464.3386134)]

121 Millions of Tiny Databases, NSDI 2020 [[pdf](https://www.usenix.org/system/files/nsdi20-paper-brooker.pdf)]

122 Virtual Consensus in Delos, OSDI 2020 [[pdf](https://www.usenix.org/system/files/osdi20-balakrishnan.pdf)]

123 Log-structured Protocols in Delos, SOSP 2021 [[pdf](https://maheshba.bitbucket.io/papers/delos-sosp2021.pdf)]

### Implementations of consensus

This section lists papers describing implementations of distributed consensus algorithms.

124 Replication and fault-tolerance in the ISIS system, Tech Report 1985 [[acmdl](https://dl.acm.org/citation.cfm?id=866067),[pdf](https://ecommons.cornell.edu/handle/1813/6508)]

125 The ISIS project: real experience with a fault tolerant programming system, OSR 1991 [[acmdl](https://dl.acm.org/citation.cfm?id=122133),[pdf](http://web.eecs.utk.edu/~mbeck/classes/Fall04-distrsys/p103-birman.pdf)]

126 Replication in the Harp File System, SOSP 1991 [[acmdl](https://dl.acm.org/citation.cfm?id=121169),[pdf](http://www.pmg.csail.mit.edu/papers/harp.pdf)]

127 Boxwood: Abstractions as the Foundation for Storage Infrastructure, OSDI 2004 [[acmdl](https://dl.acm.org/citation.cfm?id=1251262),[pdf](https://www.usenix.org/legacy/event/osdi04/tech/full_papers/maccormick/maccormick.pdf)]

128 The Farsite Project: A Retrospective, OSR 2007 [[acmdl](https://dl.acm.org/citation.cfm?id=1243422),[pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2007/04/OSR2007-4aa.pdf)]

129 Paxos for System Builders: An Overview, LADIS 2008 [[acmdl](https://dl.acm.org/citation.cfm?id=1529979),[pdf](http://www.cnds.jhu.edu/pub/papers/psb_ladis_08.pdf)]

130 Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore, VLDB 2011 [[acmdl](https://dl.acm.org/citation.cfm?id=1938549),[pdf](https://arxiv.org/pdf/1103.2408.pdf)]

131 Paxos replicated state machines as the basis of a high-performance data store, NSDI 2011 [[acmdl](https://dl.acm.org/citation.cfm?id=1972472),[pdf](https://www.usenix.org/legacy/events/nsdi11/tech/full_papers/Bolosky.pdf)]

132 Granola: Low-Overhead Distributed Transaction Coordination, ATC 2012 [[acmdl](https://dl.acm.org/citation.cfm?id=2342821.2342842),[pdf](https://www.usenix.org/system/files/conference/atc12/atc12-final118.pdf)]

133 S-Paxos: Offloading the Leader for High Throughput State Machine Replication, SRDS 2012 [[acmdl](https://dl.acm.org/citation.cfm?id=2477529),[pdf](https://infoscience.epfl.ch/record/179912/files/2012_SPaxos-CameraReady.pdf)]

134 Calvin: Fast Distributed Transactions for Partitioned Database Systems, SIGMOD 2012 [[acmdl](https://dl.acm.org/citation.cfm?id=2213838),[pdf](http://cs.yale.edu/homes/thomson/publications/calvin-sigmod12.pdf)]

135 Commodifying Replicated State Machines with OpenReplica, Tech report 2012 [[pdf](https://ecommons.cornell.edu/handle/1813/29009)]

136 Optimizing Paxos with batching and pipelining, Theoretical Computer Science 2013 [[acmdl](https://dl.acm.org/citation.cfm?id=2514451),[pdf](https://infoscience.epfl.ch/record/189440/files/TCS.pdf)]

137 Tango: Distributed data structures over a shared log, SOSP 2013 [[acmdl](https://dl.acm.org/citation.cfm?id=2522732),[pdf](http://www.cs.cornell.edu/~taozou/sosp13/tangososp.pdf)]

138 CORFU: A Distributed Shared Log, TOCS 2013 [[acmdl](https://dl.acm.org/citation.cfm?id=2535930),[pdf](http://www.cs.yale.edu/homes/mahesh/papers/corfumain-final.pdf)]

139 Scalable State-Machine Replication, DSN 2014 [[acmdl](https://dl.acm.org/citation.cfm?id=2672426),[pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6903591)]

140 When Paxos Meets Erasure Code: Reduce Network and Storage Cost in State Machine Replication, HPDC 2014 [[acmdl](https://dl.acm.org/doi/10.1145/2600212.2600218)]

141 ⭐️ In Search of an Understandable Consensus Algorithm, ATC 2014 [[acmdl](https://dl.acm.org/citation.cfm?id=2643666),[pdf](https://raft.github.io/raft.pdf),[code](https://github.com/logcabin/logcabin),[thesis](https://github.com/ongardie/dissertation)]

142 Paxos made transparent, SOSP 2015 [[acmdl](https://dl.acm.org/citation.cfm?doid=2815400.2815427),[pdf](https://i.cs.hku.hk/~heming/papers/crane-sosp15.pdf)]

143 Designing Distributed Systems Using Approximate Synchrony in Data Center Networks, NSDI 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2789774),[pdf](https://syslab.cs.washington.edu/papers/specpaxos-nsdi15.pdf)]

144 No compromises: distributed transactions with consistency, availability, and performance, SOSP 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2815425),[pdf](https://pdos.csail.mit.edu/6.824/papers/farm-2015.pdf)]

145 Building Consistent Transactions with Inconsistent Replication, SOSP 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2815404),[pdf](https://irenezhang.net/papers/tapir-sosp15.pdf)]

146 MetaSync: File Synchronization Across Multiple Untrusted Storage Services, ATC 2015 [[pdf](https://www.usenix.org/system/files/conference/atc15/atc15-paper-han.pdf),[acmdl](https://dl.acm.org/doi/10.5555/2813767.2813774)]

147 Making Fast Consensus Generally Faster, DSN 2016 [[pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7579738)]

148 Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics, SIGMOD 2017 [[acmdl](https://dl.acm.org/citation.cfm?id=3056100),[pdf](http://www.cs.ucf.edu/~kienhua/classes/COP5711/Papers/MSazure2017.pdf)]

149 Leader or Majority: Why have one when you can have both? Improving Read Scalability in Raft-like consensus protocols, HotCloud 2017 [[pdf](https://www.usenix.org/system/files/conference/hotcloud17/hotcloud17-paper-arora.pdf),[acmdl](https://dl.acm.org/citation.cfm?id=3154594),[slides](https://www.usenix.org/sites/default/files/conference/protected-files/hotcloud17_slides_arora.pdf)]

150 Bolt-On Global Consistency for the Cloud, SoCC 2018 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3267809.3267835),[pdf](https://web.eecs.umich.edu/~harshavm/papers/socc18.pdf)]

151 Stable and Consistent Membership at Scale with Rapid, ATC 2018 [[pdf](https://www.usenix.org/system/files/conference/atc18/atc18-suresh.pdf)]

152 The FuzzyLog: A Partially Ordered Shared Log, OSDI 2018 [[pdf](https://www.usenix.org/system/files/osdi18-lockerman.pdf)]

153 Aegean: Replication beyond the client-server model, SOSP 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3359663)]

  • Also supports BFT

154 Exploiting Commutativity For Practical Fast Replication, NSDI 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3323240),[pdf](https://www.usenix.org/system/files/nsdi19-park.pdf),[thesis](https://web.stanford.edu/~ouster/cgi-bin/papers/ParkPhD.pdf)]

155 Unifying Consensus and Atomic Commitment for Effective Cloud Data Management, VLDB 2019 [[acmdl](https://dl.acm.org/doi/10.14778/3303753.3303765),[pdf](http://www.vldb.org/pvldb/vol12/p611-maiyya.pdf)]

156 Linearizable Quorum Reads in Paxos, HotStorage 2019 [[pdf](https://www.usenix.org/system/files/hotstorage19-paper-charapko.pdf),[slides](https://www.usenix.org/sites/default/files/conference/protected-files/hotstorage19_slides_charapko.pdf)]

  • A two-phase quorum read algorithm that does not involve the leader and, unlike read leases, does not rely on bounded clock drift.

157 RMWPaxos: Fault-Tolerant In-Place Consensus Sequences, Unpublished 2020 [[arxiv](https://arxiv.org/abs/2001.03362)]

158 Bipartisan Paxos: A Modular State Machine Replication Protocol, Unpublished [[pdf](https://mwhittaker.github.io/publications/compartmentalized_bipartisan_paxos.pdf)]

159 Scalog: Seamless Reconfiguration and Total Order in a Scalable Shared Log, NSDI 2020 [[pdf](https://www.usenix.org/system/files/nsdi20-paper-ding.pdf)]

160 Hermes: A Fast, Fault-Tolerant and Linearizable Replication Protocol, ASPLOS 2020 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3373376.3378496),[arxiv](https://arxiv.org/abs/2001.09804),[thesis](https://arxiv.org/pdf/2112.02405.pdf)]

161 PigPaxos: Devouring the communication bottlenecks in distributed consensus, SIGMOD 2021 [[arxiv](https://arxiv.org/abs/2003.07760),[acmdl](https://dl.acm.org/doi/10.1145/3448016.3452834)]

162 CRaft: An Erasure-coding-supported Version of Raft for Reducing Storage Cost and Network Cost, FAST 2020 [[pdf](https://www.usenix.org/conference/fast20/presentation/wang-zizhong)]

163 Scaling Replicated State Machines with Compartmentalization, VLDB 2021 [[acmdl](https://dl.acm.org/doi/10.14778/3476249.3476273),[arxiv](https://arxiv.org/abs/2012.15762),[pdf](https://mwhittaker.github.io/publications/compartmentalized_consensus.pdf)]

164 Rabia: Simplifying State-Machine Replication Through Randomization, SOSP 2021 [[arxiv](https://arxiv.org/pdf/2109.12616.pdf)]

165 Boki: Stateful Serverless Computing with Shared Logs, SOSP 2021 [[pdf](https://www.cs.utexas.edu/~zjia/boki-sosp21.pdf)]

  • Latest addition to the line of papers on shared totally-ordered logs: Tango, CORFU, vCorfu, Scalog, Delos.

166 Gossip Consensus, Middleware 2021 [[pdf](https://www.inf.usi.ch/faculty/pedone/Paper/2021/middleware2021b.pdf)]

  • Looks at using gossip to reduce communication overhead of Multi-Paxos. An unstructured version of PigPaxos/Canopus.

### Evaluations of consensus

This section lists papers describing standalone evaluations of consensus algorithms.

167 The Performance of Paxos in the Cloud, SRDS 2014 [[acmdl](https://dl.acm.org/citation.cfm?id=2707675.2707801),[pdf](https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SRDS14.pdf)]

168 Consensus in the Cloud: Paxos Systems Demystified, Tech report 2016 [[pdf](https://cse.buffalo.edu/tech-reports/2016-02.pdf)]

169 Spectrum: A Framework for Adapting Consensus Protocols, Unpublished 2019 [[pdf](https://arxiv.org/abs/1902.05873)]

170 Dissecting the Performance of Strongly-Consistent Replication Protocols, SIGMOD 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3319893),[pdf](https://dl.acm.org/ft_gateway.cfm?id=3319893&type=pdf)]

171 Blockchains and Distributed Databases: a Twin Study [[arxiv](https://arxiv.org/abs/1910.01310)]

  • Performance analysis of 5 consensus systems: 3 non-Byzantine algorithms (including etcd) and 2 Byzantine consensus algorithms.

172 Scalable but Wasteful: Current State of Replication in the Cloud, HotStorage 2021 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3465332.3470882)]

  • Study of the efficiency (CPU utilization) of Multi-Paxos vs EPaxos, finding that EPaxos provides better throughput than Multi-Paxos at the cost of much worse efficiency.

### State machine replication

This section lists papers about the application of consensus to State Machine Replication (SMR/RSMs) and Linearizability.

173 ⭐️ Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial, CSUR 1990 [[acmdl](https://dl.acm.org/citation.cfm?id=98167),[pdf](https://www.cs.cornell.edu/fbs/publications/SMSurvey.pdf)]

174 ⭐️ Linearizability: A Correctness Condition for Concurrent Objects, TOPLAS 1990 [[acmdl](https://dl.acm.org/citation.cfm?id=78972),[pdf](https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf)]

175 Implementing Linearizability at Large Scale and Low Latency, SOSP 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2815416),[pdf](https://web.stanford.edu/~ouster/cgi-bin/papers/rifl.pdf)]

176 Cheap and Available State Machine Replication, ATC 2016 [[acmdl](https://dl.acm.org/citation.cfm?id=3026984),[pdf](https://www.usenix.org/system/files/conference/atc16/atc16_paper-shi.pdf)]

177 Fine-Grained Replicated State Machines for a Cluster Storage System, NSDI 2020 [[pdf](https://www.usenix.org/system/files/nsdi20spring_liu-ming_prepub_0.pdf)]

178 Rolis: A software approach to efficiently replicating multi-core transactions, EuroSys 2022 [[pdf](https://www.cis.upenn.edu/~sga001/papers/rolis-eurosys22.pdf)]

  • Optimistic execution of transactions before ordering in CFT-SMR.

179 State Machine Replication Scalability Made Simple, EuroSys 2022 [[pdf](https://vukolic.com/eurosys22-final269.pdf),[extended version](https://arxiv.org/pdf/2203.05681.pdf)]

### Reconfiguration

This section lists papers on reconfiguration and leader election.

180 The SMART Way to Migrate Replicated Stateful Services, EuroSys 2006 [[acmdl](https://dl.acm.org/citation.cfm?id=1217946),[pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/eurosys2006.pdf)]

181 💎 Vertical Paxos and Primary-Backup Replication, PODC 2009 [[acmdl](https://dl.acm.org/citation.cfm?id=1582783),[pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2009/05/podc09v6.pdf)]

182 Reconfiguring a State Machine, SIGACT News 2010 [[acmdl](https://dl.acm.org/citation.cfm?id=1753191),[pdf](https://lamport.azurewebsites.net/pubs/reconfiguration-tutorial.pdf)]

183 Dynamic Reconfiguration of Primary/Backup Clusters, ATC 2012 [[acmdl](https://dl.acm.org/doi/10.5555/2342821.2342860),[pdf](https://www.usenix.org/system/files/conference/atc12/atc12-final74.pdf)]

184 Take me to your leader! Online Optimization of Distributed Storage Configurations, VLDB 2015 [[pdf](https://research.google/pubs/pub43999/)]

185 Unbounded Pipelining in Dynamically Reconfigurable Paxos Clusters, Unpublished 2016 [[pdf](http://tessanddave.com/paxos-reconf-902f8b7.pdf)]

186 Matchmaker Paxos: A Reconfigurable Consensus Protocol, JSys 2021 [[pdf](https://mwhittaker.github.io/publications/matchmaker_paxos.pdf),[arxiv](https://arxiv.org/abs/2007.09468)]

## Related Topics

### Weaker consistency models

This section lists papers that discuss alternative consistency models to [linearizability](https://en.wikipedia.org/wiki/Linearizability) and/or systems that depend upon synchrony for correctness (not just liveness).

187 Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency, SOSP 1989 [[acmdl](https://dl.acm.org/citation.cfm?id=74870),[pdf](https://web.stanford.edu/class/cs240/readings/89-leases.pdf)]

  • This paper introduced the idea of leases for distributed caches. This idea is used in master leases and read quorum leases.
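
The lease idea noted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's protocol: the `Lease` class, its field names, and the `max_drift` safety margin are all hypothetical. The holder treats its lease as expired slightly early, so a bounded amount of clock drift cannot make it act on a lease the grantor already considers expired.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """Hypothetical lease record kept by the lease holder."""
    granted_at: float  # holder's local clock when the lease was requested (seconds)
    duration: float    # lease length agreed with the grantor (seconds)
    max_drift: float   # assumed upper bound on clock drift over the lease (seconds)

    def is_valid(self, now: float) -> bool:
        # Be conservative: stop using the lease max_drift seconds early.
        return now < self.granted_at + self.duration - self.max_drift


lease = Lease(granted_at=0.0, duration=10.0, max_drift=1.0)
can_serve_locally = lease.is_valid(now=5.0)
```

Measuring the lease from the time of the request, rather than from the time the grant arrives, is a deliberate choice in this sketch: it keeps the holder's view of the expiry no later than the grantor's.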

188 ⭐️ Towards Robust Distributed Systems, PODC 2000 [[acmdl](https://dl.acm.org/citation.cfm?id=343502),[pdf](https://people.eecs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf)]

189 Chain replication for supporting high throughput and availability, OSDI 2004 [[acmdl](https://dl.acm.org/citation.cfm?id=1251261),[pdf](https://www.cs.cornell.edu/home/rvr/papers/OSDI04.pdf)]

190 Dynamo: Amazon’s Highly Available Key-value Store, SOSP 2007 [[acmdl](https://dl.acm.org/citation.cfm?id=1294281),[pdf](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf)]

191 Bigtable: A Distributed Storage System for Structured Data, TOCS 2008 [[acmdl](https://dl.acm.org/citation.cfm?id=1365816),[pdf](https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf)]

192 What consistency does your key-value store actually provide?, HotDep 2010 [[acmdl](https://dl.acm.org/citation.cfm?id=1924919),[pdf](https://www.usenix.net/legacy/events/hotdep10/tech/full_papers/Anderson.pdf)]

  • Offline consistency checking of key-value traces

193 Cassandra - A Decentralized Structured Storage System, OSR 2010 [[acmdl](https://dl.acm.org/citation.cfm?id=1773922),[pdf](https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf)]

194 Benchmarking Cloud Serving Systems with YCSB, SoCC 2010 [[acmdl](https://dl.acm.org/citation.cfm?id=1807152),[pdf](https://www2.cs.duke.edu/courses/fall13/compsci590.4/838-CloudPapers/ycsb.pdf),[code](https://github.com/brianfrankcooper/YCSB)]

  • Popular benchmarking tool for key-value stores, actively maintained with support for various data stores.

195 Spanner: Google’s Globally-Distributed Database, OSDI 2012 [[acmdl](https://dl.acm.org/citation.cfm?id=2387880.2387905),[pdf](https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf)]

196 TAO: Facebook’s Distributed Data Store for the Social Graph, ATC 2013 [[acmdl](https://dl.acm.org/citation.cfm?id=2535468),[pdf](https://www.usenix.org/system/files/conference/atc13/atc13-bronson.pdf)]

197 Highly Available Transactions: Virtues and Limitations, VLDB 2013 [[pdf](https://www.vldb.org/pvldb/vol7/p181-bailis.pdf)]

198 Eventual Consistency Today: Limitations, Extensions, and Beyond, ACM Queue 2013 [[acmdl](https://dl.acm.org/citation.cfm?id=2462076),[pdf](https://queue.acm.org/detail.cfm?id=2462076)]

199 Quantifying eventual consistency with PBS, CACM 2014 [[acmdl](https://dl.acm.org/citation.cfm?id=2632792),[pdf](http://www.bailis.org/papers/pbs-vldbj2014.pdf)]

200 Existential Consistency: Measuring and Understanding Consistency at Facebook, SOSP 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2815426),[pdf](http://sigops.org/s/conferences/sosp/2015/current/2015-Monterey/printable/240-lu.pdf)]

201 Minimizing coordination in replicated systems, PaPoC 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2745955),[pdf](http://staff.ustc.edu.cn/~chengli7/papers/a8-Li.pdf)]

202 Consistency in Non-Transactional Distributed Storage Systems, CSUR 2016 [[acmdl](https://dl.acm.org/citation.cfm?id=2926965),[pdf](http://www.vukolic.com/consistency-survey.pdf)]

203 Just say NO to Paxos Overhead: Replacing Consensus with Network Ordering, OSDI 2016 [[acmdl](https://dl.acm.org/citation.cfm?id=3026914),[pdf](https://www.usenix.org/system/files/conference/osdi16/osdi16-li.pdf)]

204 The many faces of consistency, DE 2016 [[pdf](http://sites.computer.org/debull/A16mar/p3.pdf)]

205 Spanner, TrueTime & The CAP Theorem, Tech Report 2017 [[pdf](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45855.pdf)]

206 Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases, SIGMOD 2017 [[acmdl](https://dl.acm.org/citation.cfm?id=3056101)]

207 Fine-grained consistency for geo-replicated systems, ATC 2018 [[pdf](https://www.usenix.org/system/files/conference/atc18/atc18-li_cheng.pdf)]

208 Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes, SIGMOD 2018 [[acmdl](https://dl.acm.org/citation.cfm?id=3183713.3196937)]

209 Sharding the Shards: Managing Datastore Locality at Scale with Akkio, OSDI 2018 [[acmdl](https://dl.acm.org/citation.cfm?id=3291201),[pdf](https://www.usenix.org/system/files/osdi18-annamalai.pdf)]

210 On mixing eventual and strong consistency: Bayou revisited, PODC 2019 [[arxiv](https://arxiv.org/abs/1905.11762),[pdf](https://dl.acm.org/citation.cfm?id=3331583)]

211 Harmonia: Near-Linear Scalability for Replicated Storage with In-Network Conflict Detection, VLDB 2020 [[pdf](https://drkp.net/papers/harmonia-vldb20.pdf)]

212 Strong and Efficient Consistency with Consistency-Aware Durability, FAST 2020 [[pdf](http://pages.cs.wisc.edu/~ag/cad.pdf)]

213 Regular Sequential Serializability and Regular Sequential Consistency, SOSP 2021 [[pdf](https://arxiv.org/pdf/2109.08930.pdf)]

  • New consistency models that are invariant-equivalent to linearizability.

214 Making CRDTs Byzantine Fault Tolerant, PaPoC 2022 [[pdf](https://martin.kleppmann.com/papers/bft-crdt-papoc22.pdf),[acmdl](https://dl.acm.org/doi/abs/10.1145/3517209.3524042)]

215 Stabilizer: Geo-Replication with User-defined Consistency, ICDCS 2022 [[pdf](https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/ICDCS_Stabilizer2022-camera_ready-0413.pdf)]

### Failures

This section lists papers that analyze and/or handle real-world failures of distributed systems.

216 Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications, SIGCOMM 2011 [[acmdl](https://dl.acm.org/citation.cfm?id=2018477),[pdf](http://conferences.sigcomm.org/sigcomm/2011/papers/sigcomm/p350.pdf)]

217 The Network is Reliable: An informal survey of real-world communications failures, ACM Queue 2014 [[acmdl](https://dl.acm.org/citation.cfm?id=2655736),[pdf](https://queue.acm.org/detail.cfm?id=2655736)]

218 What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems, SoCC 2014 [[acmdl](https://dl.acm.org/citation.cfm?id=2670986),[pdf](https://ucare.cs.uchicago.edu/pdf/socc14-cbs.pdf)]

219 All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI 2014 [[acmdl](https://dl.acm.org/citation.cfm?id=2685082),[pdf](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf)]

220 Gray Failure: The Achilles’ Heel of Cloud-Scale Systems, HotOS 2017 [[acmdl](https://dl.acm.org/doi/10.1145/3102980.3103005)]

221 Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions, FAST 2017 [[acmdl](https://dl.acm.org/citation.cfm?id=3129648),[pdf](https://www.usenix.org/system/files/conference/fast17/fast17-ganesan.pdf)]

222 An Analysis of Network-Partitioning Failures in Cloud Systems, OSDI 2018 [[acmdl](https://dl.acm.org/doi/10.5555/3291168.3291173),[pdf](https://cs.uwaterloo.ca/~amsalqur/neat/NEAT-OSDI18.pdf)]

223 CrashTuner: Detecting Crash-Recovery Bugs in Cloud Systems via Meta-Info Analysis, SOSP 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3359645)]

224 The Inflection Point Hypothesis: A Principled Debugging Approach for Locating the Root Cause of a Failure, SOSP 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3359650)]

225 Toward a Generic Fault Tolerance Technique for Partial Network Partitioning, OSDI 2020 [[pdf](https://www.usenix.org/system/files/osdi20-alfatafta.pdf)]

226 Tolerating Slowdowns in Replicated State Machines using Copilots, OSDI 2020 [[pdf](https://www.usenix.org/system/files/osdi20-ngo.pdf)]

227 Metastable Failures in Distributed Systems, HotOS 2021 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3458336.3465286)]

228 Immunizing Systems from Distant Failures by Limiting Lamport Exposure, HotNets 2021 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3484266.3487387)]

229 Cores That Don’t Count, HotOS 2021 [[acmdl](https://dl.acm.org/doi/10.1145/3458336.3465297),[pdf](https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s01-hochschild.pdf),[talk](https://youtu.be/QMF3rqhjYuM)]

### Clocks

The liveness of distributed consensus depends on some degree of clock synchronization. The following section lists papers on the topic of clock synchronization.

230 IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, Standard 1588-2008 [[ieee](https://ieeexplore.ieee.org/document/4579760)]

231 Globally Synchronized Time via Datacenter Networks, SIGCOMM 2016 [[acmdl](https://dl.acm.org/doi/10.1145/2934872.2934885),[pdf](http://fireless.cs.cornell.edu/publications/dtp_sigcomm16.pdf)]

232 Exploiting a Natural Network Effect for Scalable, Fine-grained Clock Synchronization, NSDI 2018 [[acmdl](https://dl.acm.org/doi/10.5555/3307441.3307449),[pdf](https://www.usenix.org/conference/nsdi18/presentation/geng)]

233 Sundial: Fault-tolerant Clock Synchronization for Datacenters, OSDI 2020 [[pdf](https://www.usenix.org/system/files/osdi20-li_yuliang.pdf)]

234 Systems Research is Running out of Time, HotOS 2021 [[acmdl](https://dl.acm.org/doi/10.1145/3458336.3465293),[pdf](https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s04-najafi.pdf),[talk](https://youtu.be/euBWOgfgZIo)]

  • Some great examples of things that can go wrong with clocks.

235 Graham: Synchronizing Clocks by Leveraging Local Clock Properties, NSDI 2022 [[pdf](https://www.usenix.org/system/files/nsdi22-paper-najafi_1.pdf)]

  • Substantially reduces clock drift when time synchronization (such as NTP, PTP, Sundial) fails. Uses only commodity hardware. Won the best paper award.

### Correctness of consensus algorithms

This section lists papers on proving or testing the correctness of consensus algorithms.

236 Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers, Book 2002 [[acmdl](https://dl.acm.org/citation.cfm?id=579617),[pdf](https://lamport.azurewebsites.net/tla/book-02-08-08.pdf),[website](https://lamport.azurewebsites.net/tla/book.html),[amazon](https://www.amazon.com/Specifying-Systems-Language-Hardware-Engineers/dp/032114306X)]

237 I Do Declare: Consensus in a Logic Language, NetDB 2009 [[pdf](https://dsf.berkeley.edu/papers/netdb09-idodeclare.pdf)]

238 A Proof of Correctness for Egalitarian Paxos, Tech report 2013 [[pdf](http://www.cs.cmu.edu/~imoraru/epaxos/tr.pdf)]

239 Verdi: A framework for implementing and formally verifying distributed systems, PLDI 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2737958),[pdf](https://homes.cs.washington.edu/~ztatlock/pubs/verdi-wilcox-pldi15.pdf)]

240 IronFleet: Proving Practical Distributed Systems Correct, SOSP 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2815428),[pdf](http://sigops.org/s/conferences/sosp/2015/current/2015-Monterey/printable/250-hawblitzel.pdf)]

241 Lineage-driven Fault Injection, SIGMOD 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2723711),[pdf](https://people.ucsc.edu/~palvaro/molly.pdf)]

242 How Amazon web services uses formal methods, CACM 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2699417),[html](https://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-services-uses-formal-methods/fulltext)]

243 PSYNC: A partially synchronous language for fault-tolerant distributed algorithms, POPL 2016 [[acmdl](https://dl.acm.org/citation.cfm?id=2837650),[pdf](https://www.di.ens.fr/~cezarad/popl16.pdf)]

244 Ivy: safety verification by interactive generalization, PLDI 2016 [[acmdl](https://dl.acm.org/citation.cfm?id=2908080.2908118),[pdf](http://www.cs.tau.ac.il/~odedp/pldi16-paper228.pdf),[code](http://apanda.github.io/ivy/)]

245 Brief Announcement: A Family of Leaderless Generalized-Consensus Algorithms, PODC 2016 [[acmdl](https://dl.acm.org/citation.cfm?id=2933072),[pdf](https://www.losa.fr/2016_podc.pdf)]

246 Paxos Made EPR: Decidable Reasoning about Distributed Protocols, OOPSLA 2017 [[acmdl](https://dl.acm.org/citation.cfm?doid=3152284.3140568),[pdf](https://www.cs.tau.ac.il/~odedp/paxos-made-epr-oopsla17.pdf)]

247 Growing a protocol, HotCloud 2017 [[acmdl](https://dl.acm.org/citation.cfm?id=3154593),[pdf](https://www.usenix.org/conference/hotcloud17/program/presentation/ramasubramanian)]

248 Teaching Rigorous Distributed Systems With Efficient Model Checking, EuroSys 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3303947),[pdf](https://homes.cs.washington.edu/~mernst/pubs/dslabs-eurosys2019.pdf)]

249 FlyMC: Highly Scalable Testing of Complex Interleavings in Distributed Systems, EuroSys 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3303986),[pdf](https://ucare.cs.uchicago.edu/pdf/eurosys19-flyMC.pdf)]

250 Proving the Correctness of Disk Paxos in Isabelle/HOL, Unpublished 2019 [[pdf](https://www.isa-afp.org/browser_info/current/AFP/DiskPaxos/outline.pdf)]

251 I4: Incremental Inference of Inductive Invariants for Verification of Distributed Protocols, SOSP 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3359651),[code](https://github.com/GLaDOS-Michigan/I4)]

252 Scaling symbolic evaluation for automated verification of systems code with Serval, SOSP 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3359641)]

253 WormSpace: A Modular Foundation for Simple, Verifiable Distributed Systems, SoCC 2019 [[acmdl](https://dl.acm.org/doi/10.1145/3357223.3362739)]

254 TLA+ model checking made symbolic, OOPSLA 2019 [[acmdl](https://dl.acm.org/doi/10.1145/3360549)]

255 Towards an Automatic Proof of Lamport’s Paxos, FMCAD 2021 [[arxiv](https://arxiv.org/abs/2108.08796)]

  • Automatic inference of Paxos's inductive invariants

256 DistAI: Data-Driven Automated Invariant Learning for Distributed Protocols, OSDI 2021 [[code](https://github.com/VeriGu/DistAI),[pdf](https://www.usenix.org/system/files/osdi21-yao.pdf)]

  • Next step in the inference of inductive invariants for distributed protocols, following on from Ivy and I4. Still does not support Paxos.

257 Much ADO about Failures: A Fault-Aware Model for Compositional Verification of Strongly Consistent Distributed Systems, OOPSLA 2021 [[pdf](https://flint.cs.yale.edu/flint/publications/ado-tr.pdf)]

  • Formal proofs of distributed protocols in Coq including Multi-Paxos, produces verified C executables.

258 Adore: Atomic Distributed Objects with Certified Reconfiguration, PLDI 2022 [[pdf](https://flint.cs.yale.edu/flint/publications/adore.pdf)]

259 Formal Verification of a Distributed Dynamic Reconfiguration Protocol, CPP 2022 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3497775.3503688),[arxiv](https://arxiv.org/abs/2109.11987),[pdf](https://will62794.github.io/assets/papers/cpp22-formal-verification-reconfig.pdf),[code](https://github.com/will62794/logless-reconfig/tree/master),[talk](https://youtu.be/VwCBlmS7XEA)]

260 Plain and Simple Inductive Invariant Inference for Distributed Protocols in TLA+, Draft 2022 [[pdf](https://will62794.github.io/assets/papers/dist-invariant-inference-tla.pdf)]

### Quorum systems

This section lists papers on quorum systems.

261 ⭐️ A Majority Consensus Approach to Concurrency Control for Multiple Copy Databases, TODS 1979 [[acmdl](https://dl.acm.org/citation.cfm?id=320076),[pdf](http://csis.pace.edu/~marchese/CS865/Papers/p180-thomas.pdf)]

262 ⭐️ Weighted Voting for Replicated Data, SOSP 1979 [[acmdl](https://dl.acm.org/citation.cfm?id=806583),[pdf](http://pages.cs.wisc.edu/~remzi/Classes/739/Fall2015/Papers/gifford79.pdf)]

263 How to Assign Votes in a Distributed System, JACM 1985 [[acmdl](https://dl.acm.org/citation.cfm?id=4223),[pdf](https://www.cs.purdue.edu/homes/bb/cs542-17Spr/How%20to%20assign%20Votes-JACM-garcia-molina.pdf)]

264 A √N algorithm for mutual exclusion in decentralized systems, TOCS 1985 [[acmdl](https://dl.acm.org/citation.cfm?id=214445),[pdf](https://cseweb.ucsd.edu/classes/wi09/cse223a/p145-maekawa.pdf)]

265 A Quorum-Consensus Replication Method for Abstract Data Types, TOCS 1986 [[acmdl](https://dl.acm.org/doi/10.1145/6306.6308)]

266 The Reliability of Voting Mechanisms, TC 1987 [[acmdl](https://dl.acm.org/citation.cfm?id=32406),[pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1676860)]

267 The Tree Quorum Protocol: An Efficient Approach for Managing Replicated Data, VLDB 1990 [[pdf](http://www.vldb.org/conf/1990/P243.PDF)]

268 An Efficient and Fault-tolerant Solution for Distributed Mutual Exclusion, TOCS 1991 [[acmdl](https://dl.acm.org/citation.cfm?doid=103727.103728),[pdf](https://users.soe.ucsc.edu/~scott/courses/Fall11/221/Papers/Sync/agrawal-tocs91.pdf)]

269 Hierarchical Quorum Consensus: A New Algorithm for Managing Replicated Data, TC 1991 [[acmdl](https://dl.acm.org/citation.cfm?id=126154),[pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=83661)]

270 The Generalized Tree Quorum Protocol: An Efficient Approach for Managing Replicated Data, TODS 1992 [[acmdl](https://dl.acm.org/citation.cfm?id=146935),[pdf](https://www.cs.rice.edu/~alc/old/comp520/papers/generalized-tree.pdf)]

271 The Grid Protocol: A High Performance Scheme for Maintaining Replicated Data, TKDE 1992 [[acmdl](https://dl.acm.org/citation.cfm?id=627546),[pdf](https://ieeexplore.ieee.org/abstract/document/180609)]

272 Enhancing concurrency and availability for database systems, Thesis [[acmdl](https://dl.acm.org/doi/10.5555/143912)]

273 The Availability of Quorum Systems, Tech report 1993 [[acmdl](https://dl.acm.org/citation.cfm?id=903705),[pdf](https://pdfs.semanticscholar.org/ab7d/30f7a808173bc305d679262f9838869cb681.pdf)]

274 Crumbling Walls: A Class of Practical and Efficient Quorum Systems, PODC 1995 [[acmdl](https://dl.acm.org/citation.cfm?id=224978),[pdf](https://link.springer.com/article/10.1007/s004460050027)]

275 Evaluating quorum systems over the Internet, PODC 1996 [[acmdl](https://dl.acm.org/doi/10.1145/248052.248125)]

276 An Adaptive Data Replication Algorithm, TODS 1997 [[acmdl](https://dl.acm.org/doi/10.1145/249978.249982)]

277 💎 The Load, Capacity, and Availability of Quorum Systems, SIAM 1998 [[acmdl](https://dl.acm.org/citation.cfm?id=279082.279096),[pdf](https://epubs.siam.org/doi/pdf/10.1137/S0097539795281232)]

278 Optimal availability quorum systems: Theory and practice, IPL 1998 [[pdf](https://www.eng.tau.ac.il/~yash/ipl98.pdf)]

279 Are Quorums an Alternative for Data Replication?, TODS 2003 [[acmdl](https://dl.acm.org/doi/10.1145/937598.937601)]

280 Coterie Availability in Sites, DISC 2005 [[acmdl](https://dl.acm.org/citation.cfm?id=2162323),[pdf](https://link.springer.com/chapter/10.1007/11561927_3)]

281 The virtue of dependent failures in multi-site systems, HotDep 2005 [[acmdl](https://dl.acm.org/citation.cfm?id=1973401),[pdf](https://pdfs.semanticscholar.org/720c/1b5222bc91e8238b1ced2991232b9742dedc.pdf)]

282 Read-Write Quorum Systems Made Practical, PaPoC 2021 [[acmdl](https://dl.acm.org/doi/10.1145/3447865.3457962),[arxiv](https://arxiv.org/abs/2104.04102),[code](https://github.com/mwhittaker/quoracle),[pdf](https://mwhittaker.github.io/publications/quoracle.pdf),[talk](https://youtu.be/CRLw9PcK8d4)]

### Byzantine fault tolerance

This section lists papers on [Byzantine Fault Tolerance](https://en.wikipedia.org/wiki/Byzantine_fault) (BFT), often used as the basis of permissioned blockchains.

#### BFT surveys

283 SoK: Consensus in the Age of Blockchains, AFT 2019 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3318041.3355458),[arxiv](https://arxiv.org/abs/1711.03936)]

284 BFT in Blockchains: From Protocols to Use Cases, ACM Computing Surveys 2021 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3503042)]

  • New survey paper on BFT, more up-to-date than "Consensus in the Age of Blockchains".

#### BFT in theory

285 ⭐️ Reaching Agreement in the Presence of Faults, JACM 1980 [[pdf](https://lamport.azurewebsites.net/pubs/reaching.pdf)]

  • Considered to be the first proof that Byzantine agreement requires at least 3f+1 nodes to tolerate f faults.

286 ⭐️ The Byzantine Generals Problem, TOPLAS 1982 [[acmdl](https://dl.acm.org/doi/10.1145/357172.357176),[pdf](https://www.microsoft.com/en-us/research/uploads/prod/2016/12/The-Byzantine-Generals-Problem.pdf)]

  • Famous Lamport paper that popularized the Byzantine agreement problem.

287 Asynchronous consensus and broadcast protocols, JACM 1985 [[acmdl](https://dl.acm.org/citation.cfm?id=214134),[pdf](https://zoo.cs.yale.edu/classes/cs426/2013/bib/bracha85asynchronous.pdf)]

  • Another proof that crash fault tolerance requires 2f+1 nodes and BFT requires 3f+1 nodes.
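The node counts quoted above follow from simple quorum arithmetic. The sketch below is illustrative only: the helper names `min_nodes`, `quorum_size` and `min_overlap` are hypothetical, and it assumes the usual threshold-quorum construction rather than the protocol of any particular paper in this list. It computes the smallest cluster that tolerates f faults and the fewest nodes that any two quorums must share.

```python
def min_nodes(f: int, byzantine: bool) -> int:
    """Smallest cluster size tolerating f faults (2f+1 crash, 3f+1 Byzantine)."""
    return 3 * f + 1 if byzantine else 2 * f + 1


def quorum_size(f: int, byzantine: bool) -> int:
    """Quorum size at the minimum cluster size: f+1 for crash, 2f+1 for Byzantine."""
    return 2 * f + 1 if byzantine else f + 1


def min_overlap(n: int, q: int) -> int:
    """Fewest nodes shared by any two quorums of size q in a cluster of n nodes."""
    return max(0, 2 * q - n)


if __name__ == "__main__":
    for f in range(1, 4):
        for byzantine in (False, True):
            n = min_nodes(f, byzantine)
            q = quorum_size(f, byzantine)
            print(f"f={f} byzantine={byzantine}: n={n}, quorum={q}, "
                  f"overlap={min_overlap(n, q)}")
```

With crash faults, an overlap of one node is enough because a node that responds is assumed to be truthful. With Byzantine faults, the overlap of f+1 nodes ensures that at least one node in the intersection is correct even if f of them lie.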

288 Byzantine quorum systems, STOC 1997 [[acmdl](https://dl.acm.org/citation.cfm?id=258650),[pdf](https://dahliamalkhi.files.wordpress.com/2015/12/byzquorums-distcomputing1998.pdf)]

289 The load and availability of Byzantine quorum systems, PODC 1997 [[acmdl](https://dl.acm.org/doi/abs/10.1145/259380.259450)]

  • Follow up to Byzantine quorum systems paper.

290 Byzantine disk paxos: optimal resilience with byzantine shared memory, PODC 2004 [[acmdl](https://dl.acm.org/citation.cfm?id=1011801),[pdf](https://dahliamalkhi.files.wordpress.com/2015/12/byzdp-dc2006.pdf)]

291 Fast Byzantine Consensus, IEEE TDSC 2006 [[acmdl](https://dl.acm.org/citation.cfm?id=1159374),[pdf](http://www.cs.cornell.edu/lorenzo/papers/Martin06Fast.pdf)]

  • Describes FaB, similar in nature to Q/U.

292 Matrix Signatures: From MACs to Digital Signatures in Distributed Systems, DISC 2008 [[pdf](http://www.cs.cornell.edu/lorenzo/papers/Aiyer08Matrix.pdf)]

293 Leaderless Byzantine Paxos, DISC 2011 [[pdf](https://www.microsoft.com/en-us/research/uploads/prod/2016/12/Leaderless-Byzantine-Paxos.pdf)]

294 Byzantizing Paxos by Refinement, DISC 2011 [[acmdl](https://dl.acm.org/citation.cfm?id=2075058),[pdf](https://lamport.azurewebsites.net/tla/byzsimple.pdf)]

295 Revisiting Fast Practical Byzantine Fault Tolerance, Unpublished 2017 [[arxiv](https://arxiv.org/abs/1712.01367)]

  • Describes bugs in Zyzzyva and FaB

296 Making Byzantine Consensus Live, DISC 2020 [[arxiv](https://arxiv.org/abs/2008.04167),[talk](https://youtu.be/8tBk4pFkrVo)]

  • Liveness proofs for various BFT protocols, including view synchronization.

297 Order-Fairness for Byzantine Consensus, Crypto 2020 [[acmdl](https://dl.acm.org/doi/10.1007/978-3-030-56877-1_16),[pdf](https://eprint.iacr.org/2020/269)]

298 Quadratic worst-case message complexity for State Machine Replication in the partial synchrony model, Preprint 2022 [[arxiv](https://arxiv.org/abs/2201.01107)]

299 Liveness and Latency of Byzantine State-Machine Replication, Preprint 2022 [[arxiv](https://arxiv.org/pdf/2202.06679.pdf)]

300 Byzantine Agreement in Polynomial Time with Near-Optimal Resilience, Preprint 2022 [[arxiv](https://arxiv.org/abs/2202.13452)]

301 On the Correctness of Speculative Consensus, Preprint 2022 [[arxiv](https://arxiv.org/pdf/2204.03552.pdf)]

302 Basilic: Resilient Optimal Consensus Protocols With Benign and Deceitful Faults, Preprint 2022 [[arxiv](https://arxiv.org/abs/2204.08670)]

#### BFT in practice

303 ⭐️ Practical Byzantine Fault Tolerance, OSDI 1999 [[acmdl](https://dl.acm.org/citation.cfm?id=296824),[pdf](http://pmg.csail.mit.edu/papers/osdi99.pdf),[proof](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/tm590.pdf),[talk](https://youtu.be/Q0xYCN-rvUs)]

  • Considered to be the first practical BFT-SMR protocol.

304 Separating agreement from execution for byzantine fault tolerant services, SOSP 2003 [[acmdl](https://dl.acm.org/citation.cfm?id=945470),[pdf](http://www.cs.cornell.edu/lorenzo/papers/sosp03.pdf)]

  • Proposes decoupling consensus from state machine execution, similar to the distinction in Paxos between proposers/acceptors and learners.
  • Byzantized variant of Disk Paxos.

305 Fault-Scalable Byzantine Fault-Tolerant Services, SOSP 2005 [[acmdl](https://dl.acm.org/citation.cfm?id=1095817)]

  • Describes the Q/U protocol, which is leaderless but requires 5f+1 nodes instead of 3f+1 nodes.

306 HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance, OSDI 2006 [[acmdl](https://dl.acm.org/citation.cfm?id=1298473),[pdf](http://pmg.csail.mit.edu/papers/hq/hq-osdi06.pdf)]

307 Zyzzyva: speculative byzantine fault tolerance, SOSP 2007 [[acmdl](https://dl.acm.org/citation.cfm?id=1294267),[pdf](http://www.cs.cornell.edu/lorenzo/papers/kotla07Zyzzyva.pdf)]

308 Attested Append-Only Memory: Making Adversaries Stick to their Word, SOSP 2007 [[acmdl](https://dl.acm.org/doi/10.1145/1294261.1294280),[pdf](http://www.sosp2007.org/papers/sosp134-chun.pdf)]

309 Tolerating Byzantine Faults in Transaction Processing Systems using Commit Barrier Scheduling, OSR 2007 [[pdf](http://db.csail.mit.edu/pubs/hrdb.pdf),[acmdl](https://dl.acm.org/doi/10.1145/1323293.1294268)]

310 Upright cluster services, SOSP 2009 [[acmdl](https://dl.acm.org/citation.cfm?id=1629602),[pdf](http://www.cs.albany.edu/~jhh/courses/readings/clement.sosp09.upright.pdf),[code](https://github.com/amiller/upright)]

  • Develops BFT forks of ZooKeeper and HDFS. The source code does not appear to be used or maintained.

311 Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults, NSDI 2009 [[acmdl](https://dl.acm.org/citation.cfm?id=1558988),[pdf](http://static.usenix.org/events/nsdi09/tech/full_papers/clement/clement.pdf)]

312 TrInc: Small Trusted Hardware for Large Distributed Systems, NSDI 2009 [[pdf](https://www.usenix.org/legacy/events/nsdi09/tech/full_papers/levin/levin.pdf),[acmdl](https://dl.acm.org/doi/10.5555/1558977.1558978)]

313 Zzyzx: Scalable Fault Tolerance through Byzantine Locking, DSN 2010 [[pdf](https://www.cs.unc.edu/~reiter/papers/2010/DSN.pdf)]

314 Byzantine Chain Replication, OPODIS 2012 [[pdf](http://www.cs.cornell.edu/home/rvr/newpapers/opodis2012.pdf)]

315 Automatic Reconfiguration for Large-Scale Reliable Storage Systems, TDSC 2012 [[pdf](http://www.pmg.csail.mit.edu/papers/tdsc12.pdf)]

  • Describes an approach to reconfigure BFT systems

316 State Machine Replication for the Masses with BFT-SMART, DSN 2014 [[code](https://github.com/bft-smart/library),[pdf](https://www.di.fc.ul.pt/~bessani/publications/dsn14-bftsmart.pdf)]

  • BFT-SMR implementation, similar to PBFT. Often used as a benchmark against which new BFT protocols are evaluated.

317 The Next 700 BFT Protocols, TOCS 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2658994),[pdf](http://www.vukolic.com/700-Eurosys.pdf)]

318 Algorand: Scaling Byzantine Agreements for Cryptocurrencies, SOSP 2017 [[acmdl](https://dl.acm.org/doi/10.1145/3132747.3132757),[pdf](https://people.csail.mit.edu/nickolai/papers/gilad-algorand-eprint.pdf)]

319 Hardening Cassandra Against Byzantine Failures, OPODIS 2017 [[pdf](http://drops.dagstuhl.de/opus/volltexte/2018/8642/pdf/LIPIcs-OPODIS-2017-27.pdf)]

320 Casper the Friendly Finality Gadget, Tech report 2017 [[arxiv](https://arxiv.org/abs/1710.09437)]

321 BFT Protocols Under Fire, NSDI 2008 [[pdf](https://www.usenix.org/legacy/event/nsdi08/tech/full_papers/singh/singh.pdf)]

322 Algorand: A secure and efficient distributed ledger, TCS 2019 [[code](https://github.com/algorand/go-algorand),[pdf](https://www.algorand.com/Algorand_%20A%20secure%20and%20efficient%20distributed%20ledger.pdf)]

  • 2nd of the two Algorand papers

323 HotStuff: BFT Consensus with Linearity and Responsiveness, PODC 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3331591),[arxiv](https://arxiv.org/abs/1803.05069)]

324 The latest gossip on BFT consensus, Unpublished 2018 [[arxiv](https://arxiv.org/abs/1807.04938)]

325 SBFT: a Scalable and Decentralized Trust Infrastructure, DSN 2019 [[arxiv](https://arxiv.org/abs/1804.01626),[code](https://github.com/vmware/concord-bft)]

326 Stellar Consensus by Instantiation, DISC 2019 [[pdf](http://drops.dagstuhl.de/opus/volltexte/2019/11334/pdf/LIPIcs-DISC-2019-27.pdf)]

327 Fast and secure global payments with Stellar, SOSP 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3359636)]

  • Formal verification in Ivy and Isabelle/HOL

328 Flexible Byzantine Fault Tolerance, CCS 2019 [[acmdl](https://dl.acm.org/citation.cfm?id=3319535.3354225),[pdf](https://dahliamalkhi.files.wordpress.com/2019/09/flex-bft-ccs19.pdf)]

329 Byzantine Ordered Consensus without Byzantine Oligarchy, OSDI 2020 [[acmdl](https://dl.acm.org/doi/10.5555/3488766.3488802),[pdf](https://www.usenix.org/system/files/osdi20-zhang_yunhao_0.pdf),[talk](https://youtu.be/m2wVye5FCS8)]

330 Making Reads in BFT State Machine Replication Fast, Linearizable, and Live, SRDS 2021 [[arxiv](https://arxiv.org/abs/2107.11144)]

331 Be Aware of Your Leaders, Unpublished 2021 [[pdf](https://sonnino.com/papers/leader-reputation.pdf)]

  • Reputation-based leader rotation algorithm as an alternative to simple round robin.
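To make the contrast with round robin concrete, here is a minimal sketch. It is not the algorithm from the paper: `reputation_leader` and its `suspected` input are hypothetical, and the sketch simply assumes that all replicas already agree on which nodes have been unresponsive recently.

```python
from typing import Sequence, Set


def round_robin_leader(view: int, nodes: Sequence[str]) -> str:
    """Baseline: the leader of a view is picked by position (view mod n)."""
    return nodes[view % len(nodes)]


def reputation_leader(view: int, nodes: Sequence[str], suspected: Set[str]) -> str:
    """Illustrative variant: rotate only over nodes that are not suspected.

    `suspected` is a hypothetical input (for example, nodes that missed recent
    rounds). Every replica must hold the same set for the choice to agree.
    """
    candidates = [n for n in nodes if n not in suspected]
    if not candidates:
        # Everyone is suspected: fall back to plain round robin.
        candidates = list(nodes)
    return candidates[view % len(candidates)]


if __name__ == "__main__":
    replicas = ["a", "b", "c", "d"]
    for v in range(6):
        print(v, round_robin_leader(v, replicas),
              reputation_leader(v, replicas, {"b"}))
```

Plain round robin keeps handing views to a crashed or slow node, so each of its turns costs a timeout. Skipping suspected nodes avoids those wasted views, at the price of needing an agreed-upon notion of reputation.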

332 Basil: Breaking up BFT with ACID (transactions), SOSP 2021 [[acmdl](https://dl.acm.org/doi/abs/10.1145/3477132.3483552),[arxiv](https://arxiv.org/pdf/2109.12443.pdf),[pdf](https://www.cs.cornell.edu/~fsp/reports/Suri21Basil.pdf)]

333 BigBFT: A Multileader Byzantine Fault Tolerance Protocol for High Throughput, 2021 [[arxiv](https://arxiv.org/abs/2109.12664)]

334 Scaling Membership of Byzantine Consensus, TOCS 2021 [[acmdl](https://dl.acm.org/doi/full/10.1145/3473138)]

335 DiemBFT v4: State Machine Replication in the Diem Blockchain, White paper 2021 [[pdf](https://developers.diem.com/papers/diem-consensus-state-machine-replication-in-the-diem-blockchain/2021-08-17.pdf)]

  • Describes the latest version of DiemBFT, based on a variant of HotStuff with 2-phases and quadratic view changes.

336 Dissecting the Performance of Chained-BFT, Preprint 2021 [[arxiv](https://arxiv.org/abs/2103.00777)]

337 Crime and Punishment in Distributed Byzantine Decision Tasks, Preprint 2022 [[pdf](https://eprint.iacr.org/2022/121)]

338 Dissecting BFT Consensus: In Trusted Components we Trust!, Preprint 2022 [[arxiv](https://arxiv.org/pdf/2202.01354.pdf)]

339 Scalable Byzantine Fault Tolerance via Partial Decentralization, Preprint 2022 [[arxiv](https://arxiv.org/abs/2202.13408)]

340 Block-STM: Scaling Blockchain Execution by Turning Ordering Curse to a Performance Blessing, Preprint 2022 [[arxiv](https://arxiv.org/abs/2203.06871)]

341 Hierarchical Consensus: A Horizontal Scaling Framework for Blockchains, Preprint 2022 [[pdf](https://research.protocol.ai/publications/hierarchical-consensus-a-horizontal-scaling-framework-for-blockchains/delarocha2022.pdf)]

342 IA-CCF: Individual Accountability for Permissioned Ledgers, NSDI 2022 [[arxiv](https://arxiv.org/abs/2105.13116),[pdf](https://www.usenix.org/system/files/nsdi22-paper-shamis.pdf)]

343 DispersedLedger: High-Throughput Byzantine Consensus on Variable Bandwidth Networks, NSDI 2022 [[pdf](https://www.usenix.org/system/files/nsdi22-paper-yang_lei.pdf)]

344 DAMYSUS: Streamlined BFT Consensus Leveraging Trusted Components, EuroSys 2022 [[acmdl](https://dl.acm.org/doi/pdf/10.1145/3492321.3519568)]

345 State Machine Replication Scalability Made Simple, EuroSys 2022 [[acmdl](https://dl.acm.org/doi/pdf/10.1145/3492321.3519579)]

346 Narwhal and Tusk: A DAG-based Mempool and Efficient BFT Consensus, EuroSys 2022 [[acmdl](https://dl.acm.org/doi/pdf/10.1145/3492321.3519594)]

347 UTT: Decentralized Ecash with Accountable Privacy, Preprint 2022 [[pdf](https://eprint.iacr.org/2022/452.pdf)]

348 Treaty: Secure Distributed Transactions, DSN 2022 [[pdf](https://dse.in.tum.de/wp-content/uploads/2022/04/Treaty_PDFExpress.pdf)]

### Alternative fault models in distributed consensus

Most of the papers in this list handle either crash faults or Byzantine faults. This section considers fault models that lie between the two.

349 Practical Hardening of Crash-Tolerant Systems, ATC 2012 [[pdf](https://www.usenix.org/system/files/conference/atc12/atc12-final190.pdf)]

350 Visigoth Fault Tolerance, EuroSys 2015 [[acmdl](https://dl.acm.org/citation.cfm?id=2741979),[pdf](http://staff.ustc.edu.cn/~chengli7/papers/a8-porto.pdf)]

351 XFT: Practical Fault Tolerance beyond Crashes, OSDI 2016 [[acmdl](https://dl.acm.org/citation.cfm?id=3026877.3026915),[pdf](https://www.usenix.org/system/files/conference/osdi16/osdi16-liu.pdf)]

352 💎 Protocol-Aware Recovery for Consensus-Based Storage, FAST 2018 [[acmdl](https://dl.acm.org/citation.cfm?id=3241062),[pdf](https://www.usenix.org/system/files/conference/fast18/fast18-alagappan.pdf)]

### Misc

Blog posts, books, talks, dissertations, etc.

353 Readings in Database Systems (5th Edition), Book 2015 [[pdf](http://www.redbook.io/pdf/redbook-5th-edition.pdf)]

354 Introduction to Reliable and Secure Distributed Programming, Book 2011 [[acmdl](https://dl.acm.org/citation.cfm?id=1972495),[website](https://www.distributedprogramming.net)]

355 Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, Book 2017 [[website](https://dataintensive.net),[amazon](https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321)]

356 CAP Twelve Years Later: How the "Rules" Have Changed, Computer Magazine 2012 [[html](https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed/)]

357 [FaunaDB: An Architectural Overview](https://fauna-assets.s3.amazonaws.com/public/FaunaDB-Technical-Whitepaper.pdf)

358 [Distributed Coordination Engine (DConE)](https://www.wandisco.com/assets/blt1d792cb4d9252692/WANdisco_DConE_White_Paper.pdf)

359 [Communication Costs in Real-world Networks](http://www.bailis.org/blog/communication-costs-in-real-world-networks/)

360 [Modeling Paxos and Flexible Paxos in Pluscal and TLA+](http://muratbuffalo.blogspot.com/2016/11/modeling-paxos-and-flexible-paxos-in.html)

361 [Waltz: A Distributed Write-Ahead Log](https://wecode.wepay.com/posts/waltz-a-distributed-write-ahead-log)

362 [Open-sourcing LogDevice, a distributed data store for sequential data](https://logdevice.io/blog/)

363 [Apache BookKeeper Insights Part 1 — External Consensus and Dynamic Membership](https://medium.com/splunk-maas/apache-bookkeeper-insights-part-1-external-consensus-and-dynamic-membership-c259f388da21)

364 Building on Quicksand, CIDR 2009 [[pdf](https://dsf.berkeley.edu/cs286/papers/quicksand-cidr2009.pdf)]

## Future reading list

The following lists contain places to watch for new writings in the field of distributed consensus. They are in no particular order.

### Blogroll

365 [Jepsen](https://jepsen.io) by Kyle Kingsbury

366 [Aphyr](https://aphyr.com/posts) by Kyle Kingsbury

367 [The Paper Trail](https://www.the-paper-trail.org)

368 [Brave new geek](https://bravenewgeek.com/archive/) by Tyler Treat

369 [Highly Available, Seldom Consistent](http://www.bailis.org/blog/) by [Peter Bailis](https://twitter.com/pbailis)

370 [Christopher Meiklejohn](http://christophermeiklejohn.com)

371 [Denis Rystsov](http://rystsov.info)

372 [Metadata](http://muratbuffalo.blogspot.com) by [Murat Demirbas](https://twitter.com/muratdemirbas)

373 [Slash dev slash null](https://simbo1905.blog)

374 [David Turner](https://davecturner.github.io)

375 [Aleksey Charapko](http://charap.co)

376 [Marc Brooker](http://brooker.co.za/blog/)

377 [The Morning Paper](https://blog.acolyer.org/about/) by [Adrian Colyer](https://twitter.com/adriancolyer) (No longer updated)

378 [Hacking, Distributed](http://hackingdistributed.com) by Emin Gün Sirer

379 [All Things Distributed](https://www.allthingsdistributed.com) by [Werner Vogels](https://twitter.com/Werner)

380 [Decentralized Thoughts](https://ittaiab.github.io/) by various authors including [Ittai Abraham](https://twitter.com/ittaia?lang=en)

381 [Micah Lerner](https://www.micahlerner.com)

### Reading lists

382 [Awesome Consensus](https://github.com/dgryski/awesome-consensus) by [Damian Gryski](https://twitter.com/dgryski)

383 [Testing Distributed Systems](https://asatarin.github.io/testing-distributed-systems/) by [Andrey Satarin](https://twitter.com/asatarin)

384 [An introduction to distributed systems](https://github.com/aphyr/distsys-class) by Kyle Kingsbury

385 [Distributed systems theory for the distributed systems engineer](https://www.the-paper-trail.org/post/2014-08-09-distributed-systems-theory-for-the-distributed-systems-engineer/) by [Henry Robinson](https://twitter.com/henryr)

386 [Collective works of Leslie Lamport](http://lamport.azurewebsites.net/pubs/pubs.html)

387 [Paxosmon: Gotta Consensus Them All](https://vadosware.io/post/paxosmon-gotta-concensus-them-all/)

388 [Foundational distributed systems papers](http://muratbuffalo.blogspot.com/2021/02/foundational-distributed-systems-papers.html)

389 [Errors found in distributed protocols](https://github.com/dranov/protocol-bugs-list)

390 [Practical Byzantine Fault Tolerance](https://pmg.csail.mit.edu/bft/)

391 [Blockchain@UBC](https://blockchain.ubc.ca/research/research-papers)

### Academic conferences & symposiums

392 [Networked Systems Design and Implementation (NSDI)](https://www.usenix.org/conference/nsdi20)

393 [File and Storage Technologies (FAST)](https://www.usenix.org/conference/fast20)

394 [European Conference on Computer Systems (EuroSys)](https://www.eurosys2020.org)

395 [Dependable Systems and Networks (DSN)](https://dsn2020.webs.upv.es)

396 [Symposium on Parallelism in Algorithms and Architectures (SPAA)](https://spaa.acm.org)

397 [SIGMOD/PODS](https://sigmod2020.org)

398 [SIGMETRICS / IFIP Performance](http://www.sigmetrics.org/sigmetrics2020/)

399 [Programming Language Design and Implementation (PLDI)](https://conf.researchr.org/home/pldi-2020)

400 [Symposium on Theory of Computing (STOC)](http://acm-stoc.org/stoc2020/)

401 [Principles of Distributed Computing (PODC)](http://www.podc.org)

402 [International Conference on Distributed Computing Systems (ICDCS)](https://icdcs2020.sg)

403 [Annual Technical Conference (ATC)](https://www.usenix.org/conference/atc20)

404 [Special Interest Group on Data Communication (SIGCOMM)](http://sigcomm.org/events/sigcomm-conference)

405 [Very Large Data Bases (VLDB)](https://vldb2020.org)

406 [Operating Systems Design and Implementation (OSDI)](https://www.usenix.org/conferences/byname/179)

407 [Symposium on Reliable Distributed Systems (SRDS)](https://srds-conference.org)

408 [International Symposium on Distributed Computing (DISC)](http://www.disc-conference.org/wp/)

409 [International Conference on Principles of Distributed Systems (OPODIS)](https://opodis2019.unine.ch)

410 [Symposium on Operating Systems Principles (SOSP)](https://sosp19.rcs.uwaterloo.ca)

411 [Symposium on Cloud Computing (SoCC)](https://acmsocc.github.io/2019/)

412 [Conference on Innovative Data Systems Research (CIDR)](http://cidrdb.org/cidr2021/cfp.html)

413 [ACM Advances in Financial Technologies (AFT)](https://aft.acm.org/aft22/index.html)

414 [Computer and Communications Security (CCS)](http://www.sigsac.org/ccs.html)

415 [USENIX Security](https://www.usenix.org/conference/usenixsecurity22)

416 [Security and Privacy (S&P) / Oakland](https://www.ieee-security.org/TC/SP2023/cfpapers.html)

417 [Certified Programs and Proofs (CPP)](https://popl22.sigplan.org/home/CPP-2022#About)

[Dan Tsafrir](http://www.cs.technion.ac.il/~dan/index.html) maintains a useful list of [systems conferences by deadline](http://www.cs.technion.ac.il/~dan/index_sysvenues_deadline.html).

### Academic workshops

418 [Principles and Practice of Consistency for Distributed Data (PaPoC)](https://novasys.di.fct.unl.pt/conferences/papoc19/)

419 [Large-Scale Distributed Systems and Middleware (LADIS)](http://ladisworkshop.org)

420 [Hot Topics in Storage and File Systems (HotStorage)](https://www.hotstorage.org/2022/)

421 [Hot Topics in Operating Systems (HotOS)](https://www.sigops.org/2018/hotos2019/)

422 [Hot Topics in Networks (HotNets)](https://conferences.sigcomm.org/hotnets/2019/)

423 [Hot Topics in Cloud Computing (HotCloud)](https://www.usenix.org/conference/hotcloud19)

424 [Hot Topics in Edge Computing (HotEdge)](https://www.usenix.org/conference/hotedge19)

425 [Distributed Cloud Computing (DCC)](http://www.disc-conference.org/wp/dcc2019/)

426 [High Performance Transaction Systems (HPTS)](http://www.hpts.ws)

### Academic journals & magazines

427 ACM

428 IEEE

429 [Journal of Systems Research (JSys)](http://jsys.org/)