PACELC, Paxos, and Raft: How “Consistency vs Latency” Shows Up in Real Systems

Distributed systems design is mostly about choosing which pain you’re willing to live with—because you can’t eliminate it. The PACELC theorem is a practical lens for those choices, and Paxos and Raft are two of the most important tools engineers use when they decide “we’re going to pay latency to buy correctness.”

This post ties them together:

  • PACELC tells you what trade-off you’re making

  • Paxos/Raft are two ways to implement the “consistent” side of that trade-off

  • You’ll see concrete examples, message flows, and how partitions change behavior


Why CAP isn’t enough, and why PACELC exists

You likely know the CAP-style story:

  • Partition happens → you must choose Consistency or Availability.

The missing piece is: most of the time you’re not partitioned—you’re just dealing with latency, replication delay, tail latencies, and node slowness.

PACELC adds the everyday reality:

  • If there is a Partition (P): you choose Availability (A) or Consistency (C)

  • Else (E) (normal operation, no partition): you choose Latency (L) or Consistency (C)

That “ELC” half is what you feel in production:

  • “Do we wait for quorum to acknowledge the write?”

  • “Do we allow stale reads from a local replica for speed?”

  • “Do we block requests during leader failover?”

Those are ELC decisions.


A simple example PACELC makes obvious

Imagine a service with 3 replicas: A, B, C.

Else case (no partition): latency vs consistency

A client writes balance=100.

You have two broad choices:

  1. Low latency (EL)
    Acknowledge after only the local node writes:

    • Write is fast

    • But another client reading from a different replica might still see the old value for a bit (eventual consistency)

  2. High consistency (EC)
    Acknowledge only after a majority confirms (quorum):

    • Write is slower (extra network round(s))

    • But any read that respects the quorum rules can be made strongly consistent
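The two acknowledgment policies can be sketched in a few lines. This is a toy in-memory model, not a real replication protocol: `replicas`, `local_write`, and `quorum_write` are illustrative names, and the "network round trips" are just a loop.

```python
# Toy model of the EL vs EC write choice (all names illustrative).
replicas = [{"balance": 0} for _ in range(3)]  # replicas A, B, C

def local_write(key, value):
    """EL: acknowledge after only the local replica (index 0) applies the write."""
    replicas[0][key] = value
    return "ack"  # fast, but the other replicas may still serve the old value

def quorum_write(key, value):
    """EC: acknowledge only after a majority of replicas confirm."""
    acks = 0
    for r in replicas:          # in a real system these are network round trips
        r[key] = value
        acks += 1
        if acks >= len(replicas) // 2 + 1:
            return "ack"        # slower in practice, but quorum reads stay consistent
    return "fail"

local_write("balance", 100)
print(replicas[1].get("balance"))  # 0: replica B still has the old value
quorum_write("balance", 100)       # now a majority has balance=100
```

The staleness window in the EL path is exactly what a reader hitting replica B experiences until replication catches up.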

Partition case (P): availability vs consistency

Now suppose a partition splits the cluster:

  • Side 1: {A}

  • Side 2: {B, C}

If a client reaches A, can A accept writes?

  • If Availability is the priority: A might accept writes anyway → but B/C might accept conflicting writes → you’ll reconcile later (and you’re no longer strongly consistent).

  • If Consistency is the priority: A rejects writes because it can’t reach a quorum → system remains consistent, but requests fail.

So PACELC makes your real choice explicit:

  • During P: do you error out or diverge?

  • During E: do you wait (latency) or allow staleness?
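The consistency-first (PC) behavior during a partition reduces to a quorum check. A sketch, where `reachable_nodes` is an illustrative stand-in for the peers a node can currently contact:

```python
# PC behavior during a partition: refuse writes without a quorum.
CLUSTER_SIZE = 3
QUORUM = CLUSTER_SIZE // 2 + 1  # 2 of 3

def accept_write(reachable_nodes):
    """Count ourselves plus reachable peers against the quorum size."""
    if 1 + len(reachable_nodes) >= QUORUM:
        return "accepted"
    return "rejected"   # fail the request rather than diverge

# Partition splits the cluster into {A} and {B, C}:
print(accept_write([]))     # A alone: "rejected"
print(accept_write(["C"]))  # B can still reach C: "accepted"
```

An availability-first (PA) system would return "accepted" in both cases and reconcile the divergent histories later.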


Where consensus fits: Paxos/Raft are “choose C” engines

Consensus algorithms exist for one main reason:

Make a replicated system behave like there is a single, consistent order of operations—even when nodes crash and networks are unreliable.

Paxos and Raft are typically used to build a replicated log (a sequence of commands). Every replica applies the same commands in the same order → the state stays consistent.

This usually places systems using them in the PC (during partitions) and EC (during normal operation) region:

  • P → C: if quorum isn’t possible, stop (or degrade) rather than diverge

  • E → C over L: pay extra rounds to ensure ordering/commit rules


Paxos: consensus without a permanent leader (but often used with one)

Paxos can look abstract, but the core idea is clean:

  • Multiple nodes might propose values

  • The system must pick one value

  • It must remain safe even with node crashes and message delays

Roles

Single-decree Paxos splits the work across three roles (one node often plays all of them):

  • Proposers propose values

  • Acceptors vote on proposals; a majority of acceptors decides

  • Learners find out which value was chosen

Two phases (single-decree Paxos)

Phase 1: Prepare / Promise

  1. Proposer picks a proposal number n (unique, increasing).

  2. Proposer sends PREPARE(n) to a majority of acceptors.

  3. Each acceptor replies with a promise:

    • “I won’t accept proposals less than n

    • and includes the highest-numbered proposal/value it already accepted (if any)

Phase 2: Accept / Accepted

  1. If proposer gets promises from a majority, it sends ACCEPT(n, value):

    • If any acceptor reported a previously accepted value, proposer must use the value with the highest proposal number it heard about.

  2. Acceptors accept it (if they haven’t promised a higher n) and reply ACCEPTED.

Once a majority has accepted the same (n, value), the value is chosen.
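The acceptor side of these two phases fits in a small state machine. A minimal sketch of the rules above (state names like `promised_n` and `accepted_value` are illustrative, and real acceptors must persist this state to disk before replying):

```python
# Minimal single-decree Paxos acceptor (illustrative; no persistence or networking).
class Acceptor:
    def __init__(self):
        self.promised_n = -1      # highest proposal number promised
        self.accepted_n = -1      # highest proposal number accepted
        self.accepted_value = None

    def prepare(self, n):
        """Phase 1: promise not to accept anything below n."""
        if n > self.promised_n:
            self.promised_n = n
            # report any previously accepted proposal to the proposer
            return ("promise", self.accepted_n, self.accepted_value)
        return ("reject",)

    def accept(self, n, value):
        """Phase 2: accept unless we promised a higher number."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return ("accepted",)
        return ("reject",)

a = Acceptor()
assert a.prepare(1)[0] == "promise"
assert a.accept(1, "x=7") == ("accepted",)
assert a.prepare(0) == ("reject",)   # stale proposal number is refused
```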

Why this works (intuitive)

Majorities intersect. Any two majorities share at least one acceptor. That overlap prevents two different values from both being safely “chosen” without contradiction.
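You can check the intersection claim mechanically for a small cluster. Every pair of 3-node majorities in a 5-node cluster shares at least one node:

```python
# Brute-force check: any two majorities of a 5-node cluster intersect.
from itertools import combinations

nodes = {"A", "B", "C", "D", "E"}
majorities = list(combinations(nodes, 3))  # all 3-of-5 majorities

assert all(set(m1) & set(m2) for m1, m2 in combinations(majorities, 2))
```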

The “real systems” Paxos: Multi-Paxos

Single-decree Paxos chooses one value once. Real systems need a sequence of values (a log).

Multi-Paxos is the common practical variant:

  • You run Phase 1 once to establish a stable leader (often called a “distinguished proposer”)

  • Then for each new log entry, you mainly run Phase 2 (accept) repeatedly

This is why people say Paxos “is leader-based in practice” even if the theory doesn’t require a single permanent leader.

Paxos + PACELC

  • Else (E): Multi-Paxos pays latency to replicate to quorum before commit → EC

  • Partition (P): if the leader can’t reach quorum, it can’t safely commit → choose PC


Raft: the “engineer-friendly” replicated log

Raft was designed to be easier to understand and implement correctly than Paxos, while achieving the same end goal: a consistent replicated log.

Raft’s big design move is making leadership explicit and central:

  • There is a Leader

  • Others are Followers

  • A Candidate appears during elections

Raft divides the problem into:

  1. Leader election

  2. Log replication

  3. Safety rules

1) Leader election (how the cluster chooses a leader)

  • Followers expect periodic heartbeats from a leader.

  • If a follower times out, it becomes a candidate and requests votes.

  • If it gets a majority, it becomes leader.

Key idea: randomized election timeouts reduce split votes.
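The timeout rule itself is tiny. A sketch, using the 150–300 ms range the Raft paper suggests (the exact numbers are deployment-specific):

```python
# Randomized election timeout: one follower usually times out first and wins
# its election before the others even become candidates.
import random

def new_election_timeout():
    return random.uniform(0.150, 0.300)  # seconds; range is illustrative

def should_start_election(time_since_heartbeat, timeout):
    """A follower becomes a candidate once the leader has gone quiet too long."""
    return time_since_heartbeat >= timeout

t = new_election_timeout()
assert should_start_election(0.5, t)      # long silence: start an election
assert not should_start_election(0.0, t)  # fresh heartbeat: stay a follower
```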

2) Log replication (how writes get committed)

When a client sends a write:

  1. Leader appends the command as a new log entry locally.

  2. Leader sends AppendEntries RPCs to followers with the new entry.

  3. Followers append if consistent with their previous log.

  4. Once leader sees the entry replicated to a majority, it marks it committed.

  5. Leader tells followers the commit index; everyone applies committed entries to state machine.
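Step 4 — deciding when an entry is committed — is a median-style calculation over replication progress. A sketch, where `match_index` mirrors the Raft paper's matchIndex (how far each follower's log is known to match the leader's):

```python
# Commit rule sketch: the committed index is the highest index replicated
# on a majority, including the leader's own log.
def committed_index(leader_last_index, match_index, cluster_size):
    replicated = sorted(match_index + [leader_last_index], reverse=True)
    majority = cluster_size // 2 + 1
    return replicated[majority - 1]  # the majority-th highest index

# 5 nodes, leader at index 10; followers at 10, 10, 7, and 0 (down):
assert committed_index(10, [10, 10, 7, 0], 5) == 10
# if only one follower has caught up to 10, it is not yet committed:
assert committed_index(10, [10, 5, 5, 0], 5) == 5
```

(Raft adds one more safety condition not shown here: a leader only commits entries from its own term this way.)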

3) Safety (why the log stays consistent)

Raft enforces critical rules, like:

  • Only entries replicated on a majority can be committed

  • Leaders are elected in a way that prevents a leader with an outdated log from being chosen (log up-to-date check during voting)

  • Followers reject entries that would create inconsistencies (conflict detection + backtracking)
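The "log up-to-date" check from the second rule is a two-step comparison, as in the Raft paper: compare last log terms first, then log length:

```python
# Voting rule sketch: grant a vote only if the candidate's log is at least
# as up-to-date as the voter's own.
def candidate_log_ok(my_last_term, my_last_index, cand_last_term, cand_last_index):
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term   # higher last term wins
    return cand_last_index >= my_last_index    # same term: longer log wins

assert candidate_log_ok(2, 5, 3, 1)        # higher last term beats a longer log
assert candidate_log_ok(2, 5, 2, 5)        # equal logs are acceptable
assert not candidate_log_ok(2, 5, 2, 4)    # shorter log in the same term loses
```

This is what prevents a stale leader: any winning candidate has collected votes from a majority, and that majority intersects the majority that committed every entry.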

A concrete Raft example (5 nodes)

Nodes: A, B, C, D, E
Majority = 3

Client write arrives at leader A:

  • A appends entry x=7 at index 10

  • A sends AppendEntries to B, C, D, E

  • B and C succeed; D is slow; E is down

  • The entry is now on A, B, and C → 3 of 5 nodes, a majority → entry is committed

  • A can safely respond to client

  • Later, D catches up; E recovers and catches up

Raft + PACELC

Raft’s default posture is the same as Multi-Paxos:

  • Else (E): commit waits for quorum replication → you pay network latency → EC

  • Partition (P): minority partition can’t form quorum, so it can’t accept commits → you lose availability but preserve safety → PC


Paxos vs Raft through the PACELC lens

Under partition (P): what happens?

Both Raft and practical Paxos variants tend to behave like:

  • Only the partition that can assemble a quorum can make progress

  • The minority side will reject writes (and often reads that require linearizability)

So they usually choose Consistency over Availability during partitions.

Without partition (E): where do they spend latency?

Both typically spend latency on:

  • quorum replication

  • leader coordination

  • (sometimes) quorum reads for strong read semantics

So by default they choose Consistency over Latency, though you can tune read strategies.


Reads matter: your “ELC” choice often hides in read paths

A lot of teams think “writes go through consensus, so we’re consistent.”
But reads decide whether you’re truly EC or drifting toward EL.

Common patterns:

Strong reads (consistent, higher latency)

  • Read from leader only (and ensure leader is still valid)

  • Or use quorum-based read mechanisms (implementation-dependent)

  • In Raft-like systems, a “read barrier” approach ensures leader has committed up-to-date info before serving linearizable reads

Fast reads (lower latency, potentially stale)

  • Read from nearest follower

  • Allow follower reads without strict coordination

  • Great for latency, but you have to accept staleness windows and tricky edge cases after failover

PACELC interpretation:

  • Strong reads: E → C

  • Fast follower reads: E → L


Minimal pseudo-flows (easy mental model)

Paxos (single-decree) sketch

Proposer picks proposal n

Phase 1 (prepare):
  send PREPARE(n) to quorum
  receive PROMISE(n, accepted_n?, accepted_value?)

Choose value:
  if any accepted_value returned, pick value with highest accepted_n
  else pick own proposed value

Phase 2 (accept):
  send ACCEPT(n, value) to quorum
  if quorum replies ACCEPTED, value is chosen

Raft append (leader-based) sketch

Client -> Leader: write(command)

Leader:
  append command to local log (uncommitted)
  send AppendEntries(log entry) to followers
  if replicated to majority:
      mark committed
      apply to state machine
      reply success to client

Followers:
  if AppendEntries consistent with previous log:
      append entry
      reply ok
  else:
      reject and help leader backtrack to matching prefix

So… when do you pick Paxos vs Raft?

Choose Raft when:

  • You want a simpler mental model and implementation story

  • You prefer explicit leadership and straightforward operational behavior

  • Your main challenge is reliability under crashes, restarts, delays—not adversarial nodes

Choose Paxos (or Multi-Paxos) when:

  • You’re working in an ecosystem where Paxos is already the foundational primitive

  • You’re using a system that is Paxos-based under the hood (many are)

  • You’re comfortable with the more “proofy” model, or you want its conceptual flexibility

In practice: many engineers “choose Raft” because it’s easier to reason about, and “use Paxos” because the platform they depend on already uses it.


Takeaways: what PACELC teaches you about consensus

  1. Consensus is a deliberate choice to “buy C.”
    You pay with latency and some availability during partitions.

  2. The E-side matters most day-to-day.
    Your tail latency, quorum size, cross-AZ RTT, and read strategy define your “ELC” reality.

  3. Raft and Multi-Paxos live in similar PACELC territory.
    Both generally choose PC during partitions and EC during normal operation.

  4. Your read strategy is your hidden PACELC dial.
    Leader/quorum reads → stronger consistency.
    Replica reads → better latency but staleness.


