Concurrency Control from First Principles
3 ways on a single node, 3 ways across multiple nodes — with real-life examples
Race conditions aren’t a “database problem” or a “threading problem”. They’re a physics-of-computing problem: two actors try to change the same thing, and time doesn’t give you a single, obvious order.
This blog explains concurrency using first principles, then maps that to the 3 most common strategies on a single node and the 3 most common strategies in a distributed (multi-node) system, with practical examples you can reuse.
First principles: what causes a race condition?
A race condition exists when all three are true:
Shared mutable state
Something can be changed (a DB row, cache entry, file, in-memory map).
Concurrent actors
Two+ threads/processes/nodes can touch it “at the same time”.
Non-atomic read → compute → write
The update is not a single indivisible step.
The classic shape is:
Read current state
Compute new state
Write new state
If two actors do this concurrently, you can violate invariants like:
“version must monotonically increase”
“balance can’t go below 0”
“a user’s latest phone number should be the last update we accepted”
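The lost-update shape is easy to reproduce. Here is a minimal sketch in plain Java that forces the racy interleaving explicitly (an in-memory balance stands in for the shared row):

```java
// Demonstrates a lost update by playing out the racy interleaving by hand:
// both actors read the same starting state, then both write.
public class LostUpdate {
    static int balance = 100; // shared mutable state

    public static void main(String[] args) {
        // Actor A and actor B both read the current state...
        int readA = balance;
        int readB = balance;

        // ...each computes a new state independently...
        int newA = readA - 30; // A withdraws 30
        int newB = readB - 50; // B withdraws 50

        // ...and both write. B's write silently erases A's.
        balance = newA;
        balance = newB;

        // Invariant expected 100 - 30 - 50 = 20, but we get 50: A's update is lost.
        System.out.println(balance); // prints "50"
    }
}
```

Nothing here is exotic: any read → compute → write done without coordination can interleave exactly like this.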
Real-life analogy: two people editing the same Google Doc offline
Both download the document (read)
Both edit independently (compute)
Both upload (write)
If the system doesn’t reconcile properly, one person’s changes overwrite the other’s.
So the fundamental question becomes:
Where will we enforce a single order of updates (serialization)?
In the DB? In the app? In a coordinator? In a workflow?
Every solution is just a different answer to that question.
A concrete example we’ll use throughout
A USER row with a version column:
USER(id, name, phone, version)
Two requests hit at the same time:
Request A: change name = "jating"
Request B: change name = "jatink"
Both target id = 1.
Part 1: Single node — 3 ways
On a single node, “actors” are typically threads (or async tasks) inside one process.
1) Row lock + transaction isolation
“Hold the key while you update”
First principle: Make the update mutually exclusive by locking the row, so only one transaction can modify it at a time.
Example (pessimistic row lock)
BEGIN;
SELECT id, name, phone, version
FROM user
WHERE id = 1
FOR UPDATE; -- lock the row
-- compute changes
UPDATE user
SET name = 'jatink'
WHERE id = 1;
COMMIT;
What you get
Correctness: strong (you serialize writers)
Behavior under contention: others block
Cost: lock waits, potential deadlocks (if you lock multiple rows in different orders)
Real-life analogy: single key for a storage room
If you need to change what’s inside the room, you take the only key.
Others wait until you return it.
When it’s a good fit
High contention on a small set of rows (“hot users”)
You need strict ordering and can keep the transaction short
When it hurts
Long transactions (locks held too long)
Many rows locked → deadlocks and throughput collapse
Practical rule: if you choose row locks, keep the critical section tiny (no remote calls, no long computations while holding locks).
2) MVCC / Optimistic locking
“Don’t block; detect conflicts and retry”
First principle: Assume conflicts are rare. Let everyone proceed concurrently, then reject updates that used stale data.
This is usually implemented with a version column.
Example (optimistic version check)
-- Read
SELECT id, name, phone, version
FROM user
WHERE id = 1; -- suppose version = 7
-- Write (only succeeds if version is still 7)
UPDATE user
SET name = 'jatink',
version = version + 1
WHERE id = 1
AND version = 7;
If UPDATE affects 0 rows → conflict happened → retry.
What you get
Correctness: strong (prevents lost updates)
Behavior under contention: no blocking, but retries
Cost: wasted work when conflicts are common
Real-life analogy: “Take a number” at a deli counter
You don’t block others from browsing.
But when it’s your turn, your ticket must still be valid. If the counter advanced, you might need a new ticket.
The hidden pitfall: retry storms
If the same row is updated constantly:
many threads read version 7
only one succeeds to version 8
everyone else retries → more load → more conflicts
This is why optimistic locking is amazing for read-heavy workloads and painful for hot write-heavy keys.
A sane retry policy (important)
Retry 3–5 times max
Fail fast after threshold (don’t melt your system)
Pseudo:
for attempt in 1..MAX:
    if updateWithVersionCheck() succeeds:
        return OK
    if attempt < MAX:                           # don't sleep after the final attempt
        sleep(base * 2^attempt + random_jitter) # exponential backoff with jitter
return CONFLICT
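The same policy can be made concrete in Java. This is a sketch, not a production implementation: an `AtomicLong` simulates the row's version column, and `versionedUpdate` stands in for the `UPDATE ... WHERE version = ?` statement (it succeeds only when the caller's expected version still matches):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Optimistic update with bounded retries and jittered exponential backoff.
public class OptimisticRetry {
    static final int MAX_ATTEMPTS = 4;
    static final long BASE_MILLIS = 10;

    // Simulated row: only the version column, held in an AtomicLong.
    static final AtomicLong version = new AtomicLong(7);

    // Plays the role of "UPDATE ... WHERE version = ?":
    // returns true iff the row was still at expectedVersion.
    static boolean versionedUpdate(long expectedVersion) {
        return version.compareAndSet(expectedVersion, expectedVersion + 1);
    }

    static boolean updateWithRetry() throws InterruptedException {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            long seen = version.get();       // read
            // ... compute the new state here ...
            if (versionedUpdate(seen)) {     // write iff version unchanged
                return true;
            }
            if (attempt < MAX_ATTEMPTS) {    // back off, but not after the last try
                long jitter = ThreadLocalRandom.current().nextLong(BASE_MILLIS);
                Thread.sleep(BASE_MILLIS * (1L << attempt) + jitter);
            }
        }
        return false; // surface CONFLICT to the caller; don't retry forever
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(updateWithRetry()); // prints "true"
        System.out.println(version.get());     // prints "8"
    }
}
```

The bounded loop plus jitter is what prevents a retry storm from amplifying itself under contention.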
3) Application-level pessimistic locking
“Serialize by key before you reach the database”
First principle: If the real contention is “same user ID updated repeatedly”, then enforce order in the app by ensuring only one update for that user runs at a time.
Common patterns:
Striped locks (fixed number of locks; hash(userId) → lock)
Per-key single-thread executor / queue (actor model)
In-memory or distributed queue partitioning
Example: striped lock (single JVM)
// Needs: java.util.concurrent.locks.Lock, java.util.concurrent.locks.ReentrantLock,
// java.util.function.Consumer
// 64 stripes: constant memory, avoids a per-user lock map
Lock[] stripes = new ReentrantLock[64];
for (int i = 0; i < stripes.length; i++) stripes[i] = new ReentrantLock();

Lock lockFor(long userId) {
    // floorMod, not Math.abs: Math.abs(Long.MIN_VALUE) is still negative,
    // which would yield a negative array index
    return stripes[Math.floorMod(userId, stripes.length)];
}

void updateUser(long userId, Consumer<User> updater) {
    Lock l = lockFor(userId);
    l.lock();
    try {
        // read -> compute -> write safely for this userId's stripe
        User u = repo.get(userId);
        updater.accept(u);
        repo.save(u);
    } finally {
        l.unlock();
    }
}
What you get
Correctness: strong (for that app instance)
Behavior under contention: one updater runs, others wait in-memory
Cost: doesn’t automatically protect you if you have multiple nodes (more on that below)
Real-life analogy: bouncer with a guest list
If you want to change your name on the list, the bouncer only handles one request per person at a time.
You don’t even get to the desk until it’s your turn.
When it’s a great fit
Hot-key updates (same user/session/account)
You control the write path (all updates go through this service)
You want to avoid DB lock pressure and retry storms
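The per-key single-thread executor pattern from the list above can be sketched the same way as the striped lock, but queuing work instead of blocking threads. This is an illustrative sketch (class and method names are mine, not a library API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Striped executor: all tasks for the same key land on the same
// single-threaded executor, so updates for one user run in submission
// order while different users still run in parallel on other stripes.
public class StripedExecutor {
    private final ExecutorService[] stripes;

    StripedExecutor(int n) {
        stripes = new ExecutorService[n];
        for (int i = 0; i < n; i++) stripes[i] = Executors.newSingleThreadExecutor();
    }

    void submit(long key, Runnable task) {
        // floorMod avoids negative indices for negative keys
        stripes[Math.floorMod(key, stripes.length)].submit(task);
    }

    void shutdown() throws InterruptedException {
        for (ExecutorService e : stripes) e.shutdown();
        for (ExecutorService e : stripes) e.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        StripedExecutor ex = new StripedExecutor(8);
        List<Integer> order = new ArrayList<>(); // touched only from key 42's stripe
        for (int i = 0; i < 5; i++) {
            int step = i;
            ex.submit(42L, () -> order.add(step)); // same key => same thread, FIFO
        }
        ex.shutdown();
        System.out.println(order); // prints "[0, 1, 2, 3, 4]": one key, strict order
    }
}
```

Compared with the striped lock, callers never block here; they enqueue and move on, which is why this shape is popular for hot-key write paths.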
Why single node sometimes feels like “3-way contention”
On one node, you can easily have:
Thread A
Thread B
Thread C
all trying to update the same row.
With optimistic locking, all 3 may do work and then 2 fail → it feels like 3-way contention.
With pessimistic row locks, 1 runs and 2 block → it’s still 3 actors, but only one active at a time.
Part 2: Multi-node — 3 ways
Now the actors aren’t just threads. They are independent processes on different machines. The difficulty increases because:
clocks aren’t perfectly aligned
networks partition
“who holds the lock” is itself a distributed state problem
The first-principles question becomes:
How do we enforce a single order across nodes?
1) Distributed lock
“One lock everyone agrees on”
First principle: Put the “key” in a shared, strongly coordinated place (Redis, ZooKeeper, etcd, DB advisory lock). Only the node holding the lock may update.
Typical shape:
Acquire lock(userId)
if acquired:
update
release lock
else:
wait/retry/fail
Real-life analogy: booking a shared meeting room
Multiple offices can try to book the same room.
Everyone checks the same calendar system. Only one booking wins.
The big gotchas (must know)
Distributed locks can lie unless designed carefully:
Node gets lock, then pauses (GC / network hiccup)
lock expires
another node acquires lock
first node resumes and still thinks it owns the lock
Mitigation: fencing tokens
You attach a monotonically increasing token to each lock acquisition and make the DB reject writes with old tokens.
If you don’t have fencing (or equivalent), distributed locks are a common source of “it worked in staging” failures.
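The fencing idea can be sketched without any real lock service: the storage layer remembers the highest token it has accepted and rejects anything older. Everything below is illustrative (a counter stands in for the lock service):

```java
import java.util.concurrent.atomic.AtomicLong;

// Fencing tokens: the lock service hands out a monotonically increasing
// token with each acquisition; storage rejects writes carrying a token
// older than the newest one it has already accepted.
public class FencedStore {
    private long highestTokenSeen = -1;
    private String value;

    // Returns false for a stale token, i.e. a node that lost the lock
    // (expired during a GC pause) but still believes it holds it.
    synchronized boolean write(long fencingToken, String newValue) {
        if (fencingToken < highestTokenSeen) return false; // stale holder: reject
        highestTokenSeen = fencingToken;
        value = newValue;
        return true;
    }

    public static void main(String[] args) {
        AtomicLong tokenCounter = new AtomicLong(); // stands in for the lock service
        FencedStore store = new FencedStore();

        long tokenA = tokenCounter.incrementAndGet(); // node A acquires, token 1
        // ...node A pauses (GC / network hiccup); its lock expires...
        long tokenB = tokenCounter.incrementAndGet(); // node B acquires, token 2

        System.out.println(store.write(tokenB, "from-B")); // prints "true"
        System.out.println(store.write(tokenA, "from-A")); // prints "false": rejected
    }
}
```

The key property is that the check happens at the resource, not at the lock: even a node with a completely wrong belief about lock ownership cannot corrupt the data.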
When to use
You truly need single-writer semantics across nodes
You can tolerate extra latency and operational complexity
2) Saga pattern
“Don’t lock globally; coordinate via steps + compensation”
First principle: For multi-service workflows, global locking kills scalability. Instead, make each step local and consistent, and handle failures with compensating actions.
Example saga for “create account + provision wallet + send welcome email”:
Create user in User Service
Provision wallet in Wallet Service
Send welcome email
If step 2 fails → compensate step 1 (delete/disable user), or mark as “pending”.
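The compensation logic generalizes: run steps in order, and if one fails, run the compensations of the already-completed steps in reverse. A minimal sketch (step names are illustrative; real sagas also need idempotency and persistence, which this omits):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Minimal saga runner: each step pairs an action with a compensation.
// On failure, previously completed steps are compensated in reverse order.
public class SagaRunner {
    record Step(String name, Runnable action, Runnable compensation) {}

    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step); // remember for potential rollback
            } catch (RuntimeException e) {
                // Undo in reverse order; real systems must make these idempotent
                while (!completed.isEmpty()) completed.pop().compensation().run();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean ok = run(List.of(
            new Step("create user",
                     () -> System.out.println("user created"),
                     () -> System.out.println("user deleted")),
            new Step("provision wallet",
                     () -> { throw new RuntimeException("wallet service down"); },
                     () -> System.out.println("wallet removed"))
        ));
        // prints: user created / user deleted / ROLLED BACK
        System.out.println(ok ? "COMPLETED" : "ROLLED BACK");
    }
}
```

Note that the failing step itself is never compensated; only steps that actually completed get undone.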
Real-life analogy: planning a trip
You book:
flight
hotel
cab
If the hotel booking fails, you cancel the flight (compensation) instead of trying to hold locks on all providers.
What you get
Scalability: high
Consistency: eventual (not immediate)
Cost: complexity in compensation, idempotency, and observability
When to use
Cross-service workflows
Long-running processes
You can live with eventual consistency
3) Two-phase commit (2PC)
“All participants agree to commit or none do”
First principle: Achieve atomic commit across multiple participants by introducing a coordinator:
Prepare phase: “Can you commit?” (participants lock resources and reply yes/no)
Commit phase: If all said yes, coordinator tells everyone to commit
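A toy single-process sketch of the two phases (participant names are illustrative, and this deliberately glosses over failure recovery, which is where real 2PC gets hard):

```java
import java.util.List;

// Toy 2PC coordinator: phase 1 asks every participant to prepare (vote);
// phase 2 commits only if every vote was yes, otherwise aborts everyone.
public class TwoPhaseCommit {
    interface Participant {
        boolean prepare(); // lock resources, vote yes/no
        void commit();
        void abort();
    }

    static boolean runTransaction(List<Participant> participants) {
        // Phase 1: collect votes
        for (Participant p : participants) {
            if (!p.prepare()) {
                // Someone voted no: abort everyone, including earlier yes-voters
                participants.forEach(Participant::abort);
                return false;
            }
        }
        // Phase 2: unanimous yes -> everyone commits
        participants.forEach(Participant::commit);
        return true;
    }

    public static void main(String[] args) {
        Participant bank = new Participant() {
            public boolean prepare() { return true; }
            public void commit()     { System.out.println("bank committed"); }
            public void abort()      { System.out.println("bank aborted"); }
        };
        Participant ledger = new Participant() {
            public boolean prepare() { return false; } // cannot guarantee commit
            public void commit()     { System.out.println("ledger committed"); }
            public void abort()      { System.out.println("ledger aborted"); }
        };
        System.out.println(runTransaction(List.of(bank, bank)));   // prints "true"
        System.out.println(runTransaction(List.of(bank, ledger))); // prints "false"
    }
}
```

A prepared participant has promised it *can* commit, so it must hold its locks until the coordinator's decision arrives, which is exactly why a crashed coordinator blocks progress.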
Real-life analogy: escrow for a house purchase
The deal closes only when:
buyer’s funds are ready
seller’s documents are ready
legal checks pass
Everyone commits together, or nobody does.
The trade-offs
Strong consistency, but…
Coordinator failure can block progress
High latency
Poor scalability for high-throughput systems
When to use
Rare cases where strict atomicity is mandatory and throughput is not the main goal
(think: some financial/ledger systems)
Why multi-node often feels like “2-way contention”
In many distributed data systems (distributed DBs / caches), each key has a primary owner. Writes effectively serialize at the owner.
So contention looks like:
many clients
one owner coordinating
→ “clients vs owner” feels like 2-way.
A practical decision guide
Ask these questions in order:
1) Is this a single entity update or a multi-entity workflow?
Single row/entity → start with optimistic/pessimistic/app-level
Multi-service workflow → saga or (rarely) 2PC
2) What’s the contention profile?
Mostly reads, rare conflicts → optimistic (MVCC)
Frequent conflicting writes on same key → pessimistic row lock or app-level serialization
Hot-key across nodes → distributed lock or partitioned queue/owner model
3) What consistency do users/business require?
Must be immediate and strict → locking / 2PC
Can be eventual (with compensation) → saga
Cheat sheet
Single node
Row lock + transaction isolation: “block others; strong consistency; watch lock time”
MVCC / optimistic: “no blocking; retry on conflict; great for read-heavy”
App-level lock/queue: “serialize by key; best for hot keys; must consider multi-node”
Multiple nodes
Distributed lock: “single key across nodes; add fencing; operationally heavier”
Saga: “no global locks; eventual consistency; compensation required”
2PC: “strongest atomicity; slow and can block; use sparingly”
Closing: the one idea to remember
Every race condition solution is just:
Choose where you serialize updates.
Then pay the price (blocking, retries, coordination overhead, or compensation complexity).
If you pick the serialization point intentionally—and align it with your contention profile—you’ll get both correctness and performance. If you don’t, contention will pick it for you… usually at the worst possible place.