Concurrency Control from First Principles

3 ways on a single node, 3 ways across multiple nodes — with real-life examples

Race conditions aren’t a “database problem” or a “threading problem”. They’re a physics-of-computing problem: two actors try to change the same thing, and time doesn’t give you a single, obvious order.

This post explains concurrency control from first principles, then maps those principles to the three most common strategies on a single node and the three most common strategies in a distributed (multi-node) system, with practical examples you can reuse.


First principles: what causes a race condition?

A race condition exists when all three are true:

  1. Shared mutable state
    Something can be changed (a DB row, cache entry, file, in-memory map).

  2. Concurrent actors
    Two+ threads/processes/nodes can touch it “at the same time”.

  3. Non-atomic read → compute → write
    The update is not a single indivisible step.

The classic shape is:

Read current state
Compute new state
Write new state

If two actors do this concurrently, you can violate invariants like:

  • “version must monotonically increase”

  • “balance can’t go below 0”

  • “a user’s latest phone number should be the last update we accepted”
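The lost update is easy to reproduce once you spell out the interleaving. A minimal sketch (a hypothetical `balance` field standing in for the shared row) where both actors read before either writes:

```java
// Simulates two actors doing read -> compute -> write on shared state.
// Both read the same starting value, so one update is silently lost.
class LostUpdateDemo {
    static int balance = 100;

    public static void main(String[] args) {
        int readA = balance;      // actor A reads 100
        int readB = balance;      // actor B reads 100, before A writes
        balance = readA + 20;     // A writes 120
        balance = readB + 50;     // B writes 150 -- A's +20 is gone
        // Serialized, the result would be 170; interleaved, it is 150.
        System.out.println(balance);
    }
}
```

Every strategy below is a different way of forcing those two read-write pairs into a single order.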

Real-life analogy: two people editing the same Google Doc offline

  • Both download the document (read)

  • Both edit independently (compute)

  • Both upload (write)
    If the system doesn’t reconcile properly, one person’s changes overwrite the other’s.

So the fundamental question becomes:

Where will we enforce a single order of updates (serialization)?
In the DB? In the app? In a coordinator? In a workflow?

Every solution is just a different answer to that question.


A concrete example we’ll use throughout

A USER row with a version column:

USER(id, name, phone, version)

Two requests hit at the same time:

  • Request A: change name = "jating"

  • Request B: change name = "jatink"

Both target id = 1.


Part 1: Single node — 3 ways

On a single node, “actors” are typically threads (or async tasks) inside one process.

1) Row lock + transaction isolation

“Hold the key while you update”

First principle: Make the update mutually exclusive by locking the row, so only one transaction can modify it at a time.

Example (pessimistic row lock)

BEGIN;

SELECT id, name, phone, version
FROM user
WHERE id = 1
FOR UPDATE;   -- lock the row

-- compute changes

UPDATE user
SET name = 'jatink'
WHERE id = 1;

COMMIT;

What you get

  • Correctness: strong (you serialize writers)

  • Behavior under contention: others block

  • Cost: lock waits, potential deadlocks (if you lock multiple rows in different orders)

Real-life analogy: single key for a storage room

If you need to change what’s inside the room, you take the only key.
Others wait until you return it.

When it’s a good fit

  • High contention on a small set of rows (“hot users”)

  • You need strict ordering and can keep the transaction short

When it hurts

  • Long transactions (locks held too long)

  • Many rows locked → deadlocks and throughput collapse

Practical rule: if you choose row locks, keep the critical section tiny (no remote calls, no long computations while holding locks).


2) MVCC / Optimistic locking

“Don’t block; detect conflicts and retry”

First principle: Assume conflicts are rare. Let everyone proceed concurrently, then reject updates that used stale data.

This is usually implemented with a version column.

Example (optimistic version check)

-- Read
SELECT id, name, phone, version
FROM user
WHERE id = 1;          -- suppose version = 7

-- Write (only succeeds if version is still 7)
UPDATE user
SET name = 'jatink',
    version = version + 1
WHERE id = 1
  AND version = 7;

If UPDATE affects 0 rows → conflict happened → retry.

What you get

  • Correctness: strong (prevents lost updates)

  • Behavior under contention: no blocking, but retries

  • Cost: wasted work when conflicts are common

Real-life analogy: “Take a number” at a deli counter

You don’t block others from browsing.
But when it’s your turn, your ticket must still be valid. If the counter advanced, you might need a new ticket.

The hidden pitfall: retry storms

If the same row is updated constantly:

  • many threads read version 7

  • only one succeeds to version 8

  • everyone else retries → more load → more conflicts

This is why optimistic locking is amazing for read-heavy workloads and painful for hot write-heavy keys.

A sane retry policy (important)

Pseudo:

for attempt in 1..MAX:
  try updateWithVersionCheck()
  if success: return OK
  sleep(base * 2^attempt + random_jitter)
return CONFLICT
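The pseudo-policy translates almost directly to Java. A sketch assuming a caller-supplied `updateWithVersionCheck` that returns true when the conditional UPDATE affected one row:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.BooleanSupplier;

class OptimisticRetry {
    static final int MAX_ATTEMPTS = 5;
    static final long BASE_MS = 10;

    // Retries a version-checked update with exponential backoff plus jitter.
    // Jitter matters: it spreads retries out so contenders don't re-collide.
    static boolean withRetry(BooleanSupplier updateWithVersionCheck) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            if (updateWithVersionCheck.getAsBoolean()) return true;  // UPDATE hit 1 row
            try {
                Thread.sleep(BASE_MS * (1L << attempt)                      // base * 2^attempt
                        + ThreadLocalRandom.current().nextLong(BASE_MS));   // random jitter
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false; // attempts exhausted: surface CONFLICT to the caller
    }
}
```

Capping attempts and returning a conflict to the caller is the part people skip; an unbounded retry loop is exactly how a hot key turns into a retry storm.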

3) Application-level pessimistic locking

“Serialize by key before you reach the database”

First principle: If the real contention is “same user ID updated repeatedly”, then enforce order in the app by ensuring only one update for that user runs at a time.

Common patterns:

  • Striped locks (fixed number of locks; hash(userId) → lock)

  • Per-key single-thread executor / queue (actor model)

  • In-memory or distributed queue partitioning

Example: striped lock (single JVM)

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// 64 stripes: constant memory, avoids a per-user lock map
Lock[] stripes = new ReentrantLock[64];
for (int i = 0; i < stripes.length; i++) stripes[i] = new ReentrantLock();

Lock lockFor(long userId) {
  // floorMod handles negative IDs safely; Math.abs(Long.MIN_VALUE) is still negative
  return stripes[(int) Math.floorMod(userId, stripes.length)];
}

void updateUser(long userId, Consumer<User> updater) {
  Lock l = lockFor(userId);
  l.lock();
  try {
    // read -> compute -> write safely for this userId stripe
    User u = repo.get(userId);
    updater.accept(u);
    repo.save(u);
  } finally {
    l.unlock();
  }
}

What you get

  • Correctness: strong (for that app instance)

  • Behavior under contention: one updater runs, others wait in-memory

  • Cost: doesn’t automatically protect you if you have multiple nodes (more on that below)

Real-life analogy: bouncer with a guest list

If you want to change your name on the list, the bouncer handles only one request per person at a time.
You don’t even get to the desk until it’s your turn.

When it’s a great fit

  • Hot-key updates (same user/session/account)

  • You control the write path (all updates go through this service)

  • You want to avoid DB lock pressure and retry storms


Why single node sometimes feels like “3-way contention”

On one node, you can easily have:

  • Thread A

  • Thread B

  • Thread C
    all trying to update the same row.

With optimistic locking, all 3 may do work and then 2 fail → it feels like 3-way contention.

With pessimistic row locks, 1 runs and 2 block → it’s still 3 actors, but only one active at a time.


Part 2: Multi-node — 3 ways

Now the actors aren’t just threads. They are independent processes on different machines. The difficulty increases because:

  • clocks aren’t perfectly aligned

  • networks partition

  • “who holds the lock” is itself a distributed state problem

The first-principles question becomes:

How do we enforce a single order across nodes?

1) Distributed lock

“One lock everyone agrees on”

First principle: Put the “key” in a shared, strongly coordinated place (Redis, ZooKeeper, etcd, DB advisory lock). Only the node holding the lock may update.

Typical shape:

Acquire lock(userId)
  if acquired:
    update
    release lock
  else:
    wait/retry/fail

Real-life analogy: booking a shared meeting room

Multiple offices can try to book the same room.
Everyone checks the same calendar system. Only one booking wins.

The big gotchas (must know)

Distributed locks can lie unless designed carefully:

  • Node gets lock, then pauses (GC / network hiccup)

  • lock expires

  • another node acquires lock

  • first node resumes and still thinks it owns the lock

Mitigation: fencing tokens
You attach a monotonically increasing token to each lock acquisition and make the DB reject writes with old tokens.

If you don’t have fencing (or equivalent), distributed locks are a common source of “it worked in staging” failures.
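Fencing can be sketched in-memory: the lock service hands out a strictly increasing token on each acquisition, and the storage layer rejects any write carrying a token older than the newest it has seen. (Hypothetical classes; in a real system the token check lives in the database's write path, not next to the lock.)

```java
import java.util.concurrent.atomic.AtomicLong;

class FencedStore {
    // Lock-service side: each acquisition gets a strictly larger token.
    static final AtomicLong tokenCounter = new AtomicLong();
    static long acquireLock() { return tokenCounter.incrementAndGet(); }

    // Storage side: remember the highest token seen, reject stale holders.
    private long highestToken = -1;
    private String value;

    synchronized boolean write(long fencingToken, String newValue) {
        if (fencingToken < highestToken) return false; // stale lock holder: reject
        highestToken = fencingToken;
        value = newValue;
        return true;
    }

    synchronized String read() { return value; }
}
```

This reproduces the gotcha above: if node 1 acquires token 1, pauses, and node 2 acquires token 2 and writes, node 1's later write with token 1 is rejected instead of silently clobbering node 2's update.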

When to use

  • You truly need single-writer semantics across nodes

  • You can tolerate extra latency and operational complexity


2) Saga pattern

“Don’t lock globally; coordinate via steps + compensation”

First principle: For multi-service workflows, global locking kills scalability. Instead, make each step local and consistent, and handle failures with compensating actions.

Example saga for “create account + provision wallet + send welcome email”:

  1. Create user in User Service

  2. Provision wallet in Wallet Service

  3. Send welcome email

If step 2 fails → compensate step 1 (delete/disable user), or mark as “pending”.
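A minimal orchestration sketch of this flow (hypothetical step/compensation pairs): run the steps in order, and if one fails, run the compensations of the already-completed steps in reverse.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class Saga {
    // Each step pairs a forward action with its compensating action.
    record Step(String name, Runnable action, Runnable compensation) {}

    // Executes steps in order; on failure, compensates completed steps in reverse.
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) completed.pop().compensation().run();
                return false; // saga aborted and rolled back via compensation
            }
        }
        return true;
    }
}
```

Note what this does not do: it never holds a lock across services. If "provision wallet" fails, "create user" is undone by running its compensation, not by rolling back a global transaction.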

Real-life analogy: planning a trip

You book:

  • flight

  • hotel

  • cab

If the hotel booking fails, you cancel the flight (compensation) instead of trying to hold locks on all providers.

What you get

  • Scalability: high

  • Consistency: eventual (not immediate)

  • Cost: complexity in compensation, idempotency, and observability

When to use

  • Multi-service workflows where immediate, global consistency isn’t required

  • Every step can be made idempotent and has a sensible compensating action

3) Two-phase commit (2PC)

“All participants agree to commit or none do”

First principle: Achieve atomic commit across multiple participants by introducing a coordinator:

  1. Prepare phase: “Can you commit?” (participants lock resources and reply yes/no)

  2. Commit phase: If all said yes, coordinator tells everyone to commit
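The two phases can be sketched with an in-memory coordinator (a hypothetical `Participant` interface; real 2PC also has to persist decisions so a restarted coordinator can finish the protocol):

```java
import java.util.ArrayList;
import java.util.List;

class TwoPhaseCommit {
    interface Participant {
        boolean prepare();  // vote yes/no; a "yes" means resources are locked
        void commit();
        void rollback();
    }

    // Phase 1: collect votes. Phase 2: commit only if every vote was yes.
    static boolean run(List<Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        for (Participant p : participants) {
            if (p.prepare()) {
                prepared.add(p);
            } else {
                prepared.forEach(Participant::rollback); // one "no" aborts everyone
                return false;
            }
        }
        prepared.forEach(Participant::commit);
        return true;
    }
}
```

The sketch also shows the weakness: between `prepare()` and `commit()`, every participant sits locked waiting on the coordinator, which is why a coordinator failure blocks progress.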

Real-life analogy: escrow for a house purchase

The deal closes only when:

  • buyer’s funds are ready

  • seller’s documents are ready

  • legal checks pass
    Everyone commits together, or nobody does.

The trade-offs

  • Strong consistency, but…

  • Coordinator failure can block progress

  • High latency

  • Poor scalability for high-throughput systems

When to use

  • Rare cases where strict atomicity is mandatory and throughput is not the main goal
    (think: some financial/ledger systems)


Why multi-node often feels like “2-way contention”

In many distributed data systems (distributed DBs / caches), each key has a primary owner. Writes effectively serialize at the owner.

So contention looks like:

  • many clients

  • one owner coordinating
    → “clients vs owner” feels like 2-way.


A practical decision guide

Ask these questions in order:

1) Is this a single entity update or a multi-entity workflow?

  • Single row/entity → start with optimistic/pessimistic/app-level

  • Multi-service workflow → saga or (rarely) 2PC

2) What’s the contention profile?

  • Mostly reads, rare conflicts → optimistic (MVCC)

  • Frequent conflicting writes on same key → pessimistic row lock or app-level serialization

  • Hot-key across nodes → distributed lock or partitioned queue/owner model

3) What consistency do users/business require?

  • Must be immediate and strict → locking / 2PC

  • Can be eventual (with compensation) → saga


Cheat sheet

Single node

  • Row lock + transaction isolation: “block others; strong consistency; watch lock time”

  • MVCC / optimistic: “no blocking; retry on conflict; great for read-heavy”

  • App-level lock/queue: “serialize by key; best for hot keys; must consider multi-node”

Multiple nodes

  • Distributed lock: “single key across nodes; add fencing; operationally heavier”

  • Saga: “no global locks; eventual consistency; compensation required”

  • 2PC: “strongest atomicity; slow and can block; use sparingly”


Closing: the one idea to remember

Every race condition solution is just:

Choose where you serialize updates.
Then pay the price (blocking, retries, coordination overhead, or compensation complexity).

If you pick the serialization point intentionally—and align it with your contention profile—you’ll get both correctness and performance. If you don’t, contention will pick it for you… usually at the worst possible place.


