PACELC, Paxos, and Raft: How “Consistency vs Latency” Shows Up in Real Systems
Distributed systems design is mostly about choosing which pain you’re willing to live with—because you can’t eliminate it. The PACELC theorem is a practical lens for those choices, and Paxos and Raft are two of the most important tools engineers use when they decide “we’re going to pay latency to buy correctness.”
This post ties them together:
PACELC tells you what trade-off you’re making
Paxos/Raft are two ways to implement the “consistent” side of that trade-off
You’ll see concrete examples, message flows, and how partitions change behavior
Why CAP isn’t enough, and why PACELC exists
You likely know the CAP-style story:
Partition happens → you must choose Consistency or Availability.
The missing piece is: most of the time you’re not partitioned—you’re just dealing with latency, replication delay, tail latencies, and node slowness.
PACELC adds the everyday reality:
If there is a Partition (P): you choose Availability (A) or Consistency (C)
Else (E) (normal operation, no partition): you choose Latency (L) or Consistency (C)
That “ELC” half is what you feel in production:
“Do we wait for quorum to acknowledge the write?”
“Do we allow stale reads from a local replica for speed?”
“Do we block requests during leader failover?”
Those are ELC decisions.
A simple example PACELC makes obvious
Imagine a service with 3 replicas: A, B, C.
Else case (no partition): latency vs consistency
A client writes balance=100.
You have two broad choices:
Low latency (EL)
Acknowledge after only the local node writes: the write is fast
But another client reading from a different replica might still see the old value for a bit (eventual consistency)
High consistency (EC)
Acknowledge only after a majority confirms (quorum): the write is slower (extra network round trips)
But any read that respects the quorum rules can be made strongly consistent
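The two acknowledgment policies can be sketched with a toy in-memory replica model (all names here are illustrative, not a real client API):

```python
# Hypothetical sketch: EL (ack after the local write) vs EC (ack after
# a quorum of replicas confirm), using an in-memory replica model.

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

def write_el(replicas, key, value):
    """EL: acknowledge after only the local (first) replica writes."""
    replicas[0].apply(key, value)      # fast: one local write
    return "ack"                       # others catch up asynchronously

def write_ec(replicas, key, value, quorum):
    """EC: acknowledge only after a majority of replicas confirm."""
    acks = 0
    for r in replicas:                 # in a real system, each is a network RTT
        r.apply(key, value)
        acks += 1
        if acks >= quorum:
            return "ack"               # slower, but a quorum agrees
    raise RuntimeError("quorum unreachable")

replicas = [Replica(n) for n in "ABC"]
write_el(replicas, "balance", 100)
print(replicas[2].data.get("balance"))          # None: replica C is still stale
write_ec(replicas, "balance", 100, quorum=2)
print(replicas[1].data.get("balance"))          # 100: a majority has the write
```

The staleness window after `write_el` is exactly what a reader hitting replica C observes until async replication catches up.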
Partition case (P): availability vs consistency
Now suppose a partition splits the cluster:
Side 1: {A}
Side 2: {B, C}
If a client reaches A, can A accept writes?
If Availability is the priority: A might accept writes anyway → but B/C might accept conflicting writes → you’ll reconcile later (and you’re no longer strongly consistent).
If Consistency is the priority: A rejects writes because it can’t reach a quorum → system remains consistent, but requests fail.
So PACELC makes your real choice explicit:
During P: do you error out or diverge?
During E: do you wait (latency) or allow staleness?
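The "choose C during P" rule boils down to one check: can this node currently reach a majority? A minimal sketch (function names are illustrative):

```python
# PC behavior during a partition: a node accepts a write only if it
# can currently reach a quorum of the cluster.

def majority(cluster_size):
    return cluster_size // 2 + 1

def can_accept_write(reachable_nodes, cluster_size):
    """Refuse writes when a quorum is unreachable (consistency over availability)."""
    return reachable_nodes >= majority(cluster_size)

# Partition splits {A} from {B, C} in a 3-node cluster:
print(can_accept_write(reachable_nodes=1, cluster_size=3))  # False: A rejects writes
print(can_accept_write(reachable_nodes=2, cluster_size=3))  # True: the {B, C} side proceeds
```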
Where consensus fits: Paxos/Raft are “choose C” engines
Consensus algorithms exist for one main reason:
Make a replicated system behave like there is a single, consistent order of operations—even when nodes crash and networks are unreliable.
Paxos and Raft are typically used to build a replicated log (a sequence of commands). Every replica applies the same commands in the same order → the state stays consistent.
Systems built on them usually land in the PC (during partitions) and EC (in normal operation) region:
P → C: if quorum isn’t possible, stop (or degrade) rather than diverge
E → C over L: pay extra rounds to ensure ordering/commit rules
Paxos: consensus without a permanent leader (but often used with one)
Paxos can look abstract, but the core idea is clean:
Multiple nodes might propose values
The system must pick one value
It must remain safe even with node crashes and message delays
Roles
Proposers suggest values
Acceptors vote; a value is chosen once a majority accepts it
Learners find out which value was chosen (in practice, often folded into the other roles)
Two phases (single-decree Paxos)
Phase 1: Prepare / Promise
Proposer picks a unique, increasing proposal number n.
Proposer sends PREPARE(n) to a majority of acceptors.
Each acceptor replies with a promise: "I won't accept proposals numbered less than n," and includes the highest-numbered proposal/value it has already accepted (if any).
Phase 2: Accept / Accepted
If the proposer gets promises from a majority, it sends ACCEPT(n, value).
If any acceptor reported a previously accepted value, the proposer must use the value with the highest proposal number it heard about.
Acceptors accept it (if they haven't promised a higher n) and reply ACCEPTED.
Once a majority has accepted the same (n, value), the value is chosen.
Why this works (intuitive)
Majorities intersect. Any two majorities share at least one acceptor. That overlap prevents two different values from both being safely “chosen” without contradiction.
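You can brute-force this intuition for a small cluster: enumerate every majority of a 5-node cluster and confirm that every pair overlaps.

```python
# Brute-force check of the quorum-intersection argument: any two
# majorities of a 5-node cluster share at least one node.
from itertools import combinations

nodes = {"A", "B", "C", "D", "E"}
quorum_size = len(nodes) // 2 + 1  # 3 of 5

majorities = [set(c) for c in combinations(nodes, quorum_size)]
assert all(m1 & m2 for m1 in majorities for m2 in majorities)
print("all", len(majorities) ** 2, "majority pairs intersect")
```

That shared acceptor is what carries information about any already-accepted value into later proposal rounds.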
The “real systems” Paxos: Multi-Paxos
Single-decree Paxos chooses one value once. Real systems need a sequence of values (a log).
Multi-Paxos is the common practical variant:
You run Phase 1 once to establish a stable leader (often called a “distinguished proposer”)
Then for each new log entry, you mainly run Phase 2 (accept) repeatedly
This is why people say Paxos “is leader-based in practice” even if the theory doesn’t require a single permanent leader.
Paxos + PACELC
Else (E): Multi-Paxos pays latency to replicate to quorum before commit → EC
Partition (P): if the leader can’t reach quorum, it can’t safely commit → choose PC
Raft: the “engineer-friendly” replicated log
Raft was designed to be easier to understand and implement correctly than Paxos, while achieving the same end goal: a consistent replicated log.
Raft’s big design move is making leadership explicit and central:
There is a Leader
Others are Followers
A Candidate appears during elections
Raft divides the problem into:
Leader election
Log replication
Safety rules
1) Leader election (how the cluster chooses a leader)
Followers expect periodic heartbeats from a leader.
If a follower times out, it becomes a candidate and requests votes.
If it gets a majority, it becomes leader.
Key idea: randomized election timeouts reduce split votes.
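A rough sketch of why randomization helps: each follower draws its own timeout from a range, so one node usually fires first and can win the election before the others even become candidates. (The 150–300 ms range below is illustrative, not a Raft requirement.)

```python
# Illustrative sketch of randomized election timeouts: each follower
# draws a timeout from a range, so ties (split votes) are rare.
import random

random.seed(42)  # deterministic for the example

def election_timeout(low_ms=150, high_ms=300):
    return random.uniform(low_ms, high_ms)

timeouts = {node: election_timeout() for node in "ABCDE"}
first = min(timeouts, key=timeouts.get)
print(f"{first} times out first and requests votes from the others")
```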
2) Log replication (how writes get committed)
When a client sends a write:
Leader appends the command as a new log entry locally.
Leader sends AppendEntries RPCs to followers with the new entry.
Followers append the entry if it's consistent with their previous log.
Once leader sees the entry replicated to a majority, it marks it committed.
Leader tells followers the commit index; everyone applies committed entries to state machine.
3) Safety (why the log stays consistent)
Raft enforces critical rules, like:
Only entries replicated on a majority can be committed
Leaders are elected in a way that prevents a leader with an outdated log from being chosen (log up-to-date check during voting)
Followers reject entries that would create inconsistencies (conflict detection + backtracking)
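The "log up-to-date" check from the voting rule is small enough to sketch directly: compare last terms first, then last indexes. (The function name is mine; Raft specifies the comparison, not this exact code.)

```python
# Sketch of Raft's "log up-to-date" comparison used during voting:
# a voter grants its vote only if the candidate's log is at least as
# up-to-date as its own.

def candidate_log_ok(cand_last_term, cand_last_index,
                     my_last_term, my_last_index):
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term     # higher last term wins
    return cand_last_index >= my_last_index      # same term: longer log wins

print(candidate_log_ok(3, 10, 2, 50))  # True: higher term beats a longer log
print(candidate_log_ok(2, 10, 2, 12))  # False: same term, candidate's log is shorter
```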
A concrete Raft example (5 nodes)
Nodes: A, B, C, D, E
Majority = 3
Client write arrives at leader A:
A appends entry x=7 at index 10
A sends AppendEntries to B, C, D, E
B and C succeed; D is slow; E is down
A has A+B+C = 3 replicas → entry is committed
A can safely respond to client
Later, D catches up; E recovers and catches up
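One common way a leader computes this in practice: sort the per-node match indexes (its own included) and take the median, which is the highest index a majority has replicated. A minimal sketch, assuming a flat list of match indexes:

```python
# Sketch: the commit index is the highest log index replicated on a
# majority, i.e. the median of the sorted match indexes.

def commit_index(match_indexes):
    """Highest index replicated on a majority of the cluster."""
    ranked = sorted(match_indexes, reverse=True)
    return ranked[len(ranked) // 2]

# Leader A at index 10; B and C caught up; D slow; E down (index 0):
print(commit_index([10, 10, 10, 7, 0]))  # 10: a majority (A, B, C) holds index 10
```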
Raft + PACELC
Raft’s default posture is the same as Multi-Paxos:
Else (E): commit waits for quorum replication → you pay network latency → EC
Partition (P): minority partition can’t form quorum, so it can’t accept commits → you lose availability but preserve safety → PC
Paxos vs Raft through the PACELC lens
Under partition (P): what happens?
Both Raft and practical Paxos variants tend to behave like:
Only the partition that can assemble a quorum can make progress
The minority side will reject writes (and often reads that require linearizability)
So they usually choose Consistency over Availability during partitions.
Without partition (E): where do they spend latency?
Both typically spend latency on:
quorum replication
leader coordination
(sometimes) quorum reads for strong read semantics
So by default they choose Consistency over Latency, though you can tune read strategies.
Reads matter: your “ELC” choice often hides in read paths
A lot of teams think “writes go through consensus, so we’re consistent.”
But reads decide whether you’re truly EC or drifting toward EL.
Common patterns:
Strong reads (consistent, higher latency)
Read from leader only (and ensure leader is still valid)
Or use quorum-based read mechanisms (implementation-dependent)
In Raft-like systems, a “read barrier” approach ensures leader has committed up-to-date info before serving linearizable reads
Fast reads (lower latency, potentially stale)
Read from nearest follower
Allow follower reads without strict coordination
Great for latency, but you have to accept staleness windows and tricky edge cases after failover
PACELC interpretation:
Strong reads: E → C
Fast follower reads: E → L
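A toy model makes the trade-off concrete: a follower may lag the leader's committed log, so the two read paths can return different answers for the same key. (This sketch elides the leadership-validity check a real strong read needs.)

```python
# Toy model of the two read paths: a lagging follower trades
# freshness for latency.

class Node:
    def __init__(self):
        self.log = []

leader, follower = Node(), Node()
leader.log = ["x=1", "x=2", "x=3"]   # fully committed
follower.log = ["x=1", "x=2"]        # replication still in flight

def strong_read(leader):
    """E→C: serve from the leader (after confirming leadership, elided here)."""
    return leader.log[-1]

def fast_read(follower):
    """E→L: serve from the nearest replica; may return stale data."""
    return follower.log[-1]

print(strong_read(leader))    # x=3
print(fast_read(follower))    # x=2 (stale)
```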
Minimal pseudo-flows (easy mental model)
Paxos (single-decree) sketch
Proposer picks proposal n
Phase 1 (prepare):
send PREPARE(n) to quorum
receive PROMISE(n, accepted_n?, accepted_value?)
Choose value:
if any accepted_value returned, pick value with highest accepted_n
else pick own proposed value
Phase 2 (accept):
send ACCEPT(n, value) to quorum
if quorum replies ACCEPTED, value is chosen
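The sketch above can be made runnable as a minimal in-process simulation: one proposer, three acceptors, no networking, no failures or retries. All class and function names are illustrative.

```python
# Minimal runnable single-decree Paxos round over three acceptors.

class Acceptor:
    def __init__(self):
        self.promised_n = -1
        self.accepted_n = -1
        self.accepted_value = None

    def prepare(self, n):
        if n > self.promised_n:
            self.promised_n = n
            return True, self.accepted_n, self.accepted_value
        return False, None, None

    def accept(self, n, value):
        if n >= self.promised_n:
            self.promised_n = self.accepted_n = n
            self.accepted_value = value
            return True
        return False

def propose(acceptors, n, value):
    quorum = len(acceptors) // 2 + 1
    # Phase 1: prepare / promise
    promises = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in promises if ok]
    if len(granted) < quorum:
        return None
    # Must adopt the highest-numbered previously accepted value, if any
    prior = max(granted, key=lambda p: p[0])
    if prior[0] >= 0:
        value = prior[1]
    # Phase 2: accept / accepted
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= quorum else None

acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, n=1, value="x=7"))   # x=7 is chosen
print(propose(acceptors, n=2, value="x=9"))   # still x=7: the later proposal adopts it
```

The second call is the safety property in action: a higher-numbered proposal cannot overwrite a value a majority already accepted, because Phase 1 forces the proposer to adopt it.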
Raft append (leader-based) sketch
Client -> Leader: write(command)
Leader:
append command to local log (uncommitted)
send AppendEntries(log entry) to followers
if replicated to majority:
mark committed
apply to state machine
reply success to client
Followers:
if AppendEntries consistent with previous log:
append entry
reply ok
else:
reject and help leader backtrack to matching prefix
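This flow, too, can be made runnable with a minimal in-process model: one leader replicating a single entry, committing once a majority (leader included) holds it. Log entries are (term, command) pairs; names are illustrative.

```python
# Minimal runnable sketch of the Raft append flow above.

class Follower:
    def __init__(self, log=None):
        self.log = log or []

    def append_entries(self, prev_index, prev_term, entry):
        # Consistency check against the entry preceding the new one
        if prev_index >= 0:
            if prev_index >= len(self.log) or self.log[prev_index][0] != prev_term:
                return False  # reject: leader must backtrack to a matching prefix
        self.log = self.log[: prev_index + 1] + [entry]
        return True

def leader_append(leader_log, followers, term, command):
    prev_index = len(leader_log) - 1
    prev_term = leader_log[prev_index][0] if leader_log else 0
    entry = (term, command)
    leader_log.append(entry)
    acks = 1 + sum(  # leader counts itself
        f.append_entries(prev_index, prev_term, entry) for f in followers
    )
    majority = (len(followers) + 1) // 2 + 1
    return acks >= majority  # committed?

leader_log = [(1, "x=1")]
followers = [Follower([(1, "x=1")]), Follower([(1, "x=1")]), Follower([])]
print(leader_append(leader_log, followers, term=1, command="x=7"))  # True: 3 of 4 hold it
```

The third follower (empty log) rejects the append, exactly the "help leader backtrack" case: the leader would retry with an earlier prev_index until the logs match.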
So… when do you pick Paxos vs Raft?
Choose Raft when:
You want a simpler mental model and implementation story
You prefer explicit leadership and straightforward operational behavior
Your main challenge is reliability under crashes, restarts, delays—not adversarial nodes
Choose Paxos (or Multi-Paxos) when:
You’re working in an ecosystem where Paxos is already the foundational primitive
You’re using a system that is Paxos-based under the hood (many are)
You’re comfortable with the more “proofy” model, or you want its conceptual flexibility
In practice: many engineers “choose Raft” because it’s easier to reason about, and “use Paxos” because the platform they depend on already uses it.
Takeaways: what PACELC teaches you about consensus
Consensus is a deliberate choice to “buy C.”
You pay with latency, and with some availability during partitions.
The E-side matters most day-to-day.
Your tail latency, quorum size, cross-AZ RTT, and read strategy define your "ELC" reality.
Raft and Multi-Paxos live in similar PACELC territory.
Both generally choose PC during partitions and EC during normal operation.
Your read strategy is your hidden PACELC dial.
Leader/quorum reads → stronger consistency.
Replica reads → better latency but staleness.