Concurrency Control from First Principles

3 Strategies on a Single Node, 3 Across Multiple Nodes — with Real-Life Analogies and Future-Ready Patterns

Credits / Acknowledgements
This article is based on deep technical discussions and whiteboarding sessions with Sourabh Kumar Banka and Jatin Goyal.


Why This Matters (Now and in the Future)

Most real production failures are not caused by wrong business logic. They are caused by incorrect ordering of updates.

As systems scale — microservices, distributed caches, cloud-native deployments, async retries, autoscaling — concurrency issues increase, not decrease.

If you remember only one thing from this article:

Concurrency control is about deciding where updates become ordered (serialized) — and intentionally paying the right trade-off.

Every correct system enforces order somewhere:

  • Database

  • Application

  • Distributed coordinator

  • Event log

  • Workflow engine

If you don’t choose where, contention will choose for you.


First Principles: Why Race Conditions Exist

A race condition requires three ingredients:

  1. Shared mutable state
    A database row, cache entry, account balance, session object.

  2. Concurrent actors
    Threads, processes, containers, retries, background jobs.

  3. Non-atomic update
    The operation happens as:

Read → Compute → Write

When two actors execute this pattern simultaneously, invariants break.


Concrete Example

USER(id, name, phone, version)

Two requests update id = 1:

  • Request A → name = "jating"

  • Request B → name = "jatink"

If both read "jatin" and both write, one update is lost.
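The lost update can be reproduced deterministically by interleaving the Read → Compute → Write steps by hand. A minimal Python sketch, where the `db` dict stands in for the USER table:

```python
# Simulate two requests racing on USER(id=1): both read, then both write.
db = {1: {"name": "jatin", "version": 7}}

def read_user(user_id):
    # Each request gets its own snapshot of the row.
    return dict(db[user_id])

def write_user(user_id, row):
    # Blind write: last writer wins, no version check.
    db[user_id] = row

# Both requests read BEFORE either writes: the racy interleaving.
snapshot_a = read_user(1)
snapshot_b = read_user(1)

snapshot_a["name"] = "jating"
write_user(1, snapshot_a)      # Request A's update lands...

snapshot_b["name"] = "jatink"
write_user(1, snapshot_b)      # ...and Request B silently overwrites it.

print(db[1]["name"])  # "jatink" -- Request A's write is lost
```

Every strategy in the rest of this article is a different answer to where this interleaving gets forbidden.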


Real-Life Analogy

Two people editing the same document offline:

  • Both download it.

  • Both edit.

  • Both upload.

Without coordination or version checks, one set of changes disappears.


Part 1 — Single Node: 3 Core Strategies

On a single node, concurrency usually means multiple threads inside one service instance.


1️⃣ Row Locks + Transaction Isolation

“Hold the key while you update”

First principle: Make the update mutually exclusive by locking the row.

BEGIN;

SELECT *
FROM user
WHERE id = 1
FOR UPDATE;

UPDATE user
SET name = 'jatink'
WHERE id = 1;

COMMIT;

What Happens

  • First transaction locks row.

  • Others wait.

  • Updates are serialized.

Real-Life Analogy

There is only one key to a secure room.
If you want to rearrange the room, you must hold the key. Others wait.

Pros

  • Strong correctness

  • No retries needed

  • Simple mental model

Cons

  • Blocking under contention

  • Deadlocks if multiple rows locked

  • Throughput degrades with long transactions

Future-Ready Guidance

  • Keep critical sections extremely short.

  • Never hold DB locks while calling external systems.

  • Monitor lock wait time as a first-class metric.


2️⃣ MVCC / Optimistic Locking

“Don’t block; detect conflicts”

First principle: Assume conflicts are rare. Allow concurrent execution and detect conflict at write time.

SELECT id, version
FROM user
WHERE id = 1;

UPDATE user
SET name = 'jatink',
    version = version + 1
WHERE id = 1
  AND version = 7;

If zero rows updated → conflict → retry or fail.

Real-Life Analogy

Google Docs warns:

“This file was modified since you opened it.”

You merge or retry.

Pros

  • No blocking

  • High throughput for read-heavy systems

  • Scales well when contention is low

Cons

  • Retry storms under high contention

  • Wasted CPU work

  • Can overload DB if misused

Production-Ready Retry Strategy

  • Max 3–5 retries

  • Exponential backoff

  • Add jitter

  • Return conflict after threshold
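The bullets above can be sketched as a small retry wrapper. This is an illustrative Python sketch; `try_update` and the backoff constants are assumptions, not from a specific library:

```python
import random
import time

def with_retries(try_update, max_retries=5, base_delay=0.01):
    """Run an optimistic update; retry on conflict with exponential backoff + jitter."""
    for attempt in range(max_retries):
        if try_update():   # True = the versioned UPDATE matched a row (no conflict)
            return True
        # 0.01s, 0.02s, 0.04s, ... plus jitter to de-synchronize competing writers.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    return False           # Threshold reached: surface a conflict to the caller.

# Example: an update that conflicts twice, then succeeds on the third attempt.
attempts = {"count": 0}

def flaky_update():
    attempts["count"] += 1
    return attempts["count"] >= 3

result = with_retries(flaky_update)
print(result, attempts["count"])  # True 3
```

The jitter matters: without it, all conflicting writers wake up at the same instant and collide again.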

Future-Ready Guidance

  • Make optimistic locking the default for low-contention entities.

  • Monitor conflict rate; if it climbs, move hot keys to row locks or app-level serialization.

3️⃣ Application-Level Serialization

“Single writer per key”

First principle: Serialize updates before reaching the database.

Approaches:

  • Striped locks (hash(key) → lock)

  • Actor model (per-key queue)

  • Partitioned executor

Conceptually:

hash(userId) → partition → single worker → DB write
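A minimal sketch of the first approach (striped locks), assuming Python threads as the concurrent actors and an in-memory dict as the store:

```python
import threading

NUM_STRIPES = 16
stripes = [threading.Lock() for _ in range(NUM_STRIPES)]

def lock_for(key):
    # hash(key) -> stripe: all updates for one key serialize on the same lock,
    # while different keys mostly proceed in parallel.
    return stripes[hash(key) % NUM_STRIPES]

balances = {"user-1": 0}

def deposit(key, amount):
    with lock_for(key):
        current = balances[key]           # read
        balances[key] = current + amount  # compute + write, now atomic per key

threads = [threading.Thread(target=deposit, args=("user-1", 1)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balances["user-1"])  # 100 -- no lost updates
```

The actor-model variant replaces the lock with a per-key queue and a single consumer, which additionally gives predictable ordering.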

Real-Life Analogy

A dedicated clerk handles all changes for a specific account.
No two clerks edit the same account simultaneously.

Pros

  • Eliminates retry storms

  • Reduces DB lock pressure

  • Predictable ordering

  • Stable latency under contention

Cons

  • Works only per node unless combined with routing

  • Requires architectural discipline

Future-Ready Guidance

To scale across nodes:

  • Use consistent hashing + sticky routing

  • Or partition via an event log (Kafka-style)

  • Or assign key ownership per node

This pattern is highly scalable and often superior to DB locking for hot entities.


Part 2 — Multi-Node Systems: 3 Distributed Strategies

When multiple nodes are involved, concurrency is harder:

  • Nodes crash

  • Networks partition

  • Clocks drift

  • GC pauses happen

Now the question becomes:

How do we enforce ordering across machines?


1️⃣ Distributed Lock

“Shared key everyone agrees on”

Use Redis/etcd/ZooKeeper/DB advisory locks.

Acquire lock(key)
  if success → update
  else → wait/retry/fail

Real-Life Analogy

Shared meeting room booking calendar.
Everyone consults the same system.

Critical Future-Proof Detail: Fencing Tokens

Locks can expire. Nodes can pause.

To avoid stale writers:

  • Each lock acquisition returns a monotonically increasing token.

  • Database rejects writes with older tokens.

Without fencing, distributed locks can corrupt data.
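The token check can be sketched in a few lines. This is a toy in-memory model (the class names are illustrative); in production the token comes from the lock service and the comparison happens in the datastore:

```python
import itertools

class LockService:
    """Issues a monotonically increasing fencing token with each acquisition."""
    def __init__(self):
        self._counter = itertools.count(1)
    def acquire(self, key):
        return next(self._counter)   # token 1, 2, 3, ...

class FencedStore:
    """Rejects writes carrying a token older than the newest one seen per key."""
    def __init__(self):
        self.data = {}
        self.highest_token = {}
    def write(self, key, value, token):
        if token < self.highest_token.get(key, 0):
            return False             # Stale writer: its lock expired long ago.
        self.highest_token[key] = token
        self.data[key] = value
        return True

locks, store = LockService(), FencedStore()
old = locks.acquire("user:1")   # token 1; holder then pauses (GC, network...)
new = locks.acquire("user:1")   # lock expired and was reacquired with token 2
store.write("user:1", "fresh", new)
accepted = store.write("user:1", "stale", old)   # rejected: token 1 < 2
print(store.data["user:1"], accepted)  # fresh False
```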

Use When

  • You need single-writer semantics across nodes.

  • Contention is moderate.

Avoid When

  • Ultra high throughput needed.

  • Lock server becomes bottleneck.


2️⃣ Saga Pattern

“Commit locally, compensate globally”

First principle: Instead of locking everything, break workflow into steps and undo on failure.

Example:

  1. Create user

  2. Provision wallet

  3. Send email

If wallet fails → disable user (compensation).
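The control flow can be sketched as a list of (action, compensation) pairs. A minimal Python sketch; the step and compensation names are illustrative:

```python
def fail_wallet():
    raise RuntimeError("wallet provisioning failed")

def run_saga(steps):
    """Run (action, compensation) pairs in order; on failure, undo completed steps in reverse."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):   # compensate in reverse order
                undo()
            return False
        completed.append(compensation)
    return True

log = []
steps = [
    (lambda: log.append("user created"), lambda: log.append("user disabled")),
    (fail_wallet,                        lambda: log.append("wallet removed")),
    (lambda: log.append("email sent"),   lambda: log.append("email recalled")),
]
ok = run_saga(steps)
print(ok, log)  # False ['user created', 'user disabled']
```

Note that only completed steps are compensated: the wallet step failed before committing, so only the user-creation step is undone.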

Real-Life Analogy

Booking travel.
If hotel fails, cancel flight.

Production-Ready Saga Requirements

  • Idempotent steps

  • Outbox pattern for reliable events

  • Deduplication / inbox pattern

  • Clear state transitions

Use When

  • Cross-service workflows

  • Long-running operations

  • Eventual consistency acceptable


3️⃣ Two-Phase Commit (2PC)

“All commit or none commit”

Coordinator asks all participants:

  1. Prepare: can you commit?

  2. Commit: only if every participant voted yes; otherwise everyone aborts.
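The two phases can be sketched in miniature (illustrative class names, no failure handling for the coordinator itself, which is exactly the protocol's weak point):

```python
class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "pending"
    def prepare(self):
        return self.can_commit       # vote yes or no
    def commit(self):
        self.state = "committed"
    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: collect votes; a single "no" dooms the transaction.
    if all(p.prepare() for p in participants):
        for p in participants:       # Phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:           # Phase 2 (abort branch): abort everywhere
        p.abort()
    return False

ps = [Participant(), Participant(can_commit=False)]
ok = two_phase_commit(ps)
print(ok)  # False -- one "no" vote aborts the whole transaction
```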

Real-Life Analogy

Escrow closing: funds and documents must align.

Pros

  • Strong atomicity

Cons

  • Blocking protocol

  • Coordinator failure risk

  • High latency

  • Scales poorly

Use Sparingly

Only where strict atomicity is legally or financially required.


Making This Future-Ready

Modern distributed systems introduce additional challenges:

1️⃣ Idempotency Everywhere

  • Every external call must be safe to retry.

  • Use idempotency keys.
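An idempotency key turns a retried call into a cache lookup. A minimal in-memory sketch (a real service would persist the seen keys alongside the state change, e.g. in the same transaction):

```python
processed = {}   # idempotency_key -> cached result (the "inbox" of seen requests)

def charge(idempotency_key, amount, ledger):
    """Apply a charge at most once per key; retries return the cached result."""
    if idempotency_key in processed:
        return processed[idempotency_key]     # duplicate request: no double charge
    ledger.append(amount)                     # the actual side effect
    processed[idempotency_key] = {"status": "charged", "amount": amount}
    return processed[idempotency_key]

ledger = []
charge("req-42", 100, ledger)
charge("req-42", 100, ledger)   # client retry after a timeout
print(len(ledger))  # 1 -- the retry was deduplicated
```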

2️⃣ Observability

Track:

  • Lock wait time

  • Conflict rate

  • Retry count

  • Deadlocks

  • Saga compensations

Concurrency problems hide without metrics.

3️⃣ Backpressure and Load Shedding

Unbounded retries destroy systems.
Apply limits and fail gracefully.

4️⃣ Partitioned Ownership (Highly Scalable Model)

Instead of global locking:

  • Assign key ownership.

  • Route updates by consistent hashing.

  • Treat partitions as single-writer streams.

This model scales horizontally and avoids distributed locking.
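The routing step can be sketched with a simple consistent-hash ring (virtual nodes smooth out the key distribution; the node names are illustrative):

```python
import bisect
import hashlib

class HashRing:
    """Maps each key to exactly one owning node; only that node writes the key."""
    def __init__(self, nodes, vnodes=100):
        # Place vnodes points per node on the ring, sorted by hash.
        self.ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def owner(self, key):
        # First ring point clockwise from the key's hash owns the key.
        i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
# The same key always routes to the same single-writer node.
print(ring.owner("user:1") == ring.owner("user:1"))  # True
```

Because ownership is deterministic, each partition behaves like the single-node "single writer per key" pattern from Part 1, just spread across machines.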

5️⃣ Consider CRDTs (When Applicable)

For some data types (counters, sets, collaborative docs), conflict-free replicated data types remove the need for coordination entirely.

But they require careful domain modeling.
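The simplest CRDT, a grow-only counter, shows the idea: replicas update independently and converge when they merge state. A minimal sketch:

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merge = elementwise max."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        # Each replica only ever bumps its own slot, so merges never conflict.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

# Two replicas increment concurrently with no coordination...
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
# ...and converge to the same value once they exchange state.
print(a.value(), b.value())  # 5 5
```

The trade-off: the data type must be designed so that merges commute, which is why CRDTs demand careful domain modeling.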


Decision Framework

Single Node

Situation                        Strategy
Low contention                   Optimistic locking
High contention                  Row lock
Hot key pattern                  App-level serialization

Multiple Nodes

Situation                        Strategy
Need strict mutual exclusion     Distributed lock (+ fencing)
Business workflow                Saga
Strong atomic commit required    2PC

Final Takeaway

Every concurrency solution enforces ordering somewhere:

  • Database

  • Application

  • Coordinator

  • Partitioned log

  • Workflow engine

Your job is not to eliminate contention.

Your job is to decide:

  • Where ordering lives

  • What trade-off you accept

  • How the system behaves under extreme load

Design intentionally — or production traffic will design it for you.


