Hello curious engineers, welcome to the twenty-fifth issue of The Main Thread. In this issue, we are going to look at one of the most useful but least understood patterns in distributed systems: idempotency.
Always remember: “just retry” only works if the operation is idempotent. Over my career, I have lost count of how many production incidents I have traced back to this single misunderstanding.
A network blip causes a timeout → the client retries → the server processes the request twice → the customer gets charged twice, the order ships twice, or a database row is corrupted because an increment ran twice.
“But the request failed, the client had to retry” is not a solid defence, in my opinion. Yes, the client needed to retry, but that retry should have been safe. It was not safe because no one thought about idempotency.
This is not an advanced distributed systems concept. It’s table stakes. If the system handles money, inventory, user data, or anything else that must not be corrupted, idempotency needs to be understood deeply.
Let’s make sure we do.
What Idempotency Actually Means
An operation is idempotent if performing it multiple times has the same effect as performing it once.
Simple, right? It is not. Let me show you some examples that trip up even experienced engineers.
Idempotent
SET user.email = "new@example.com"
If we run it once or run it ten times, user.email will still be new@example.com. Same result regardless of how many times we execute.
Not Idempotent
INCREMENT user.login_count BY 1
If we run it once, the count goes from 5 to 6. Run it twice, and the count is 7. Run it ten times, and the count is 15. Each execution changes the state.
Surprisingly Not Idempotent
INSERT INTO orders (user_id, product_id, quantity) VALUES (123, 54352, 3);
If we run it twice, we get two orders. The second execution isn’t a no-op: it creates a duplicate.
Surprisingly Idempotent
DELETE FROM sessions WHERE user_id = 123
If we run it once, the sessions of user 123 are deleted. If we run it 10 times, the sessions are still deleted. The second through tenth executions are no-ops because there is nothing left to delete.
Remember that idempotency is about the effect on system state, not about the operation itself. HTTP DELETE is idempotent because repeated calls leave the system in the same state, even if the second call returns a 404.
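A quick way to internalise the definition is to apply an operation once, apply it twice, and compare the resulting state. The sketch below does that for the four examples above against a toy in-memory state; the dict layout, the function names, and the example.com addresses are illustrative stand-ins, not a real database API.

```python
def set_email(db):
    db["users"][123]["email"] = "new@example.com"

def increment_logins(db):
    db["users"][123]["login_count"] += 1

def insert_order(db):
    db["orders"].append({"user_id": 123, "product_id": 54352, "quantity": 3})

def delete_sessions(db):
    db["sessions"] = [s for s in db["sessions"] if s["user_id"] != 123]

def fresh_db():
    # A tiny stand-in for the database, rebuilt for every comparison
    return {
        "users": {123: {"email": "old@example.com", "login_count": 5}},
        "orders": [],
        "sessions": [{"user_id": 123}, {"user_id": 456}],
    }

def is_idempotent(op):
    # Same effect after one run as after two runs?
    once, twice = fresh_db(), fresh_db()
    op(once)
    op(twice)
    op(twice)
    return once == twice

for op in (set_email, increment_logins, insert_order, delete_sessions):
    print(op.__name__, is_idempotent(op))
# set_email True
# increment_logins False
# insert_order False
# delete_sessions True
```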
Why This Matters In Distributed Systems
In a distributed system, we cannot distinguish between the following three scenarios:
Request succeeded, response lost
Request failed before processing
Request failed after processing
When a client times out, it doesn’t know which of these happened. The server might have processed the payment, or it might not have. The only way forward is to retry, but retrying is only safe if the operation is idempotent.

The client sees success on the retry. It has no idea the operation ran twice. Without idempotency, we have just doubled the effect of that operation.
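This failure mode is easy to reproduce in a few lines. In the toy simulation below (all class and function names are hypothetical), the server always does the work, but the network loses the first response, so the client times out and retries with the same key. Without deduplication the charge runs twice; with a stored response per key it runs once, and the retry receives the original response.

```python
import uuid


class PaymentServer:
    """Toy payment server; `dedupe` switches idempotency-key handling on or off."""

    def __init__(self, dedupe: bool):
        self.dedupe = dedupe
        self.charges = []    # every charge the server actually executed
        self.responses = {}  # idempotency key -> stored response

    def charge(self, key: str, amount: int) -> dict:
        if self.dedupe and key in self.responses:
            return self.responses[key]  # replay: no new charge
        self.charges.append(amount)
        response = {"charge_id": len(self.charges), "amount": amount}
        self.responses[key] = response
        return response


class LossyNetwork:
    """Delivers every request, but loses the response to the first one."""

    def __init__(self, server: PaymentServer):
        self.server = server
        self.calls = 0

    def send(self, key: str, amount: int) -> dict:
        self.calls += 1
        response = self.server.charge(key, amount)  # the server did the work...
        if self.calls == 1:
            raise TimeoutError("response lost")     # ...but the client never hears
        return response


def client_pay(network: LossyNetwork, amount: int, max_attempts: int = 3) -> dict:
    key = str(uuid.uuid4())  # generated once, reused on every retry
    for _ in range(max_attempts):
        try:
            return network.send(key, amount)
        except TimeoutError:
            continue
    raise RuntimeError("gave up after retries")


for dedupe in (False, True):
    server = PaymentServer(dedupe)
    client_pay(LossyNetwork(server), 100)
    print(f"dedupe={dedupe}: charged {len(server.charges)} time(s)")
# dedupe=False: charged 2 time(s)
# dedupe=True: charged 1 time(s)
```

Note that the key is created once per logical operation, outside the retry loop; generating a fresh key per attempt would defeat the deduplication entirely.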
Idempotency Keys: The Solution
An idempotency key is a unique identifier attached to each operation that lets the server recognize and deduplicate retries.
POST /payments
Idempotency-Key: 8a7b6c5d-4e3f-2g1h-0i9j-8k7l6m5n4o3p
Content-Type: application/json
{
"amount": 100.00,
"currency": "USD",
"customer_id": "cust_123"
}
The server follows this logic:
Receive the request with its idempotency key
Check if this key has been seen before
If yes, return the stored response (don't process again)
If no, process the request, store the response and return it
from datetime import datetime, timedelta, timezone


class IdempotencyStore:
    """
    Store for idempotency keys and their results.
    In production, use Redis or a database, not in-memory.
    """
    def __init__(self, ttl_hours: int = 24):
        self.store = {}
        self.ttl = timedelta(hours=ttl_hours)

    def get(self, key: str) -> dict | None:
        """Return the live entry for a key ({'status', 'result', ...}) or None."""
        entry = self.store.get(key)
        if entry is None:
            return None
        # Check expiration
        if datetime.now(timezone.utc) > entry['expires_at']:
            del self.store[key]
            return None
        return entry

    def set(self, key: str, result: dict) -> None:
        """Store the completed result for an idempotency key."""
        self.store[key] = {
            'status': 'completed',
            'result': result,
            'expires_at': datetime.now(timezone.utc) + self.ttl
        }

    def set_in_progress(self, key: str) -> bool:
        """
        Mark key as in-progress. Returns False if a live entry already exists.
        Prevents concurrent duplicate processing. A plain dict is not atomic
        across threads or processes; a real store needs an atomic primitive
        (for example Redis SET with NX, or a unique constraint).
        """
        if self.get(key) is not None:
            return False
        self.store[key] = {
            'status': 'in_progress',
            'result': None,
            'expires_at': datetime.now(timezone.utc) + timedelta(minutes=5)
        }
        return True

    def delete(self, key: str) -> None:
        """Forget a key so the client can retry with it."""
        self.store.pop(key, None)


def handle_payment(request, idempotency_key: str, store: IdempotencyStore):
    """
    Idempotent payment handler. process_payment() is the actual
    payment logic and is not shown here.
    """
    # Check for an existing entry
    existing = store.get(idempotency_key)
    if existing is not None:
        if existing['status'] == 'in_progress':
            return {"error": "Request in progress"}, 409
        return existing['result'], 200  # Return cached response
    # Mark as in progress (check-and-set; must be atomic in a real store)
    if not store.set_in_progress(idempotency_key):
        return {"error": "Duplicate request"}, 409
    try:
        # Process the payment
        result = process_payment(request)
        # Store the result
        store.set(idempotency_key, result)
        return result, 200
    except Exception:
        # On failure, remove the in-progress marker
        # so the client can retry with the same key
        store.delete(idempotency_key)
        raise
Key Design Principles
1. Who generates the key?
The client generates the key. Always. If the server generates the key, the client can’t safely retry because it won’t know the key for its lost request.
2. What makes a good key?
Here are three common options for generating a good idempotency key:
import hashlib
import uuid

# Option 1: UUID (simple, unique, no semantic meaning)
idempotency_key = str(uuid.uuid4())
# Option 2: Hash of request parameters (same params = same key)
# user_id, action, amount and timestamp_minute come from the request
key_data = f"{user_id}:{action}:{amount}:{timestamp_minute}"
idempotency_key = hashlib.sha256(key_data.encode()).hexdigest()
# Option 3: Client-provided transaction ID (business meaning)
idempotency_key = f"order-{order_id}-payment-attempt"
3. How long to store keys?
Long enough that retries are covered, short enough that storage is manageable. 24-48 hours is typical. After that, we assume the client would have noticed the issue and raised a support ticket.
4. What should we store with each key?
Response (so that we can return it on retries)
Request hash (optional: to detect different requests reusing a key)
Timestamp (for expiration)
Status (in_progress, completed, failed)
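Those four fields fit naturally into a single record. The sketch below is illustrative only: the class and method names are hypothetical, the "failed" status is omitted for brevity, and a production system would keep these records in Redis or a database rather than a dict. It hashes a canonical JSON form of the request body, rejects a key that is reused with a different body, and wraps the check-and-set in a lock so two concurrent requests with the same key cannot both start processing (within a single process).

```python
import hashlib
import json
import threading
import time


class KeyConflict(Exception):
    """Raised when an idempotency key is reused with a different request body."""


class IdempotencyRecords:
    """In-memory sketch; the lock makes check-and-set atomic within one process."""

    def __init__(self, ttl_seconds: float = 24 * 3600):
        self.ttl = ttl_seconds
        self._records = {}
        self._lock = threading.Lock()

    @staticmethod
    def request_hash(body: dict) -> str:
        # Canonical JSON, so key order in the body does not change the hash
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def begin(self, key: str, body: dict):
        """Return ("new", None), ("in_progress", None) or ("completed", response)."""
        digest = self.request_hash(body)
        now = time.time()
        with self._lock:  # the check and the set happen under one lock
            rec = self._records.get(key)
            if rec is not None and now - rec["created_at"] > self.ttl:
                rec = None  # expired: treat the key as unseen
            if rec is None:
                self._records[key] = {"request_hash": digest, "status": "in_progress",
                                      "response": None, "created_at": now}
                return "new", None
            if rec["request_hash"] != digest:
                raise KeyConflict(key)
            return rec["status"], rec["response"]

    def complete(self, key: str, response: dict) -> None:
        with self._lock:
            self._records[key]["status"] = "completed"
            self._records[key]["response"] = response


records = IdempotencyRecords()
body = {"amount": 100, "currency": "USD"}
print(records.begin("k1", body))  # ('new', None)
print(records.begin("k1", body))  # ('in_progress', None)
records.complete("k1", {"charge_id": "ch_1"})
print(records.begin("k1", {"currency": "USD", "amount": 100}))  # ('completed', {'charge_id': 'ch_1'})
```

Canonicalising the body before hashing matters: without `sort_keys=True`, the same logical request serialised with a different key order would look like key reuse with a different payload.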
At-Least-Once vs Exactly-Once
We often hear that distributed systems can only guarantee at-least-once delivery, not exactly-once. This is technically true and practically misleading.
At-Least-Once
Every message will be delivered, possibly multiple times. The sender retries until the message is acknowledged.
Exactly-Once
Every message is delivered exactly one time. No duplicates, no losses.
Pure exactly-once delivery is impossible in an asynchronous distributed system. We cannot distinguish “message lost” from “acknowledgement lost”, so the sender must retry, and retries may cause duplicates.
But exactly-once processing is achievable through idempotency:
At-least-once delivery + Idempotent processing = Exactly-once semantics
We deliver the message multiple times but the receiver processes it only once.
# Consumer with exactly-once semantics via idempotency
def consume_message(message):
    message_id = message['id']
    # Check if already processed
    if database.exists(f"processed:{message_id}"):
        logger.info(f"Duplicate message {message_id}, skipping")
        return
    # Process the message
    result = process(message)
    # Mark as processed (atomically with the processing, ideally)
    database.set(f"processed:{message_id}", True, ttl=SEVEN_DAYS)
    return result
When someone claims “exactly-once delivery”, they almost always mean “at-least-once delivery with deduplication”. I am not criticising that at all; this is how it has to work. The real work is making the deduplication reliable.
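The consumer above checks and marks in two separate steps, so a crash between `process` and `database.set` would reprocess the message. Here is a minimal sketch of the "atomically, ideally" variant, using SQLite from the Python standard library as a stand-in for the real database (the table and column names are hypothetical): the deduplication row and the business effect are written in the same transaction, and a primary-key violation signals a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY);
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    INSERT INTO accounts VALUES (1, 0);
""")


def consume_message(message: dict) -> bool:
    """Apply a deposit once per message id. Returns False for duplicates."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute("INSERT INTO processed_messages (message_id) VALUES (?)",
                         (message["id"],))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (message["amount"], message["account_id"]))
        return True
    except sqlite3.IntegrityError:
        # Primary-key violation: this message id was already processed
        return False


# At-least-once delivery: the broker redelivers msg-1
deliveries = [
    {"id": "msg-1", "account_id": 1, "amount": 50},
    {"id": "msg-1", "account_id": 1, "amount": 50},
    {"id": "msg-2", "account_id": 1, "amount": 25},
]
results = [consume_message(m) for m in deliveries]
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(results, balance)  # [True, False, True] 75
```

This only works when the deduplication record and the effect live in the same database; if the effect is an external call (an email, a payment API), we are back to needing idempotency on that external side.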
Database Patterns For Idempotency
Idempotency ultimately comes down to what we do at the database level. Below are the patterns that work.
Pattern 1: Unique Constraints
In this pattern, the database enforces idempotency for us.
-- Prevent duplicate payments
CREATE TABLE payments (
id UUID PRIMARY KEY,
idempotency_key VARCHAR(255) UNIQUE NOT NULL,
user_id INTEGER NOT NULL,
amount DECIMAL(10, 2) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Inserting with conflict handling (PostgreSQL)
INSERT INTO payments (id, idempotency_key, user_id, amount)
VALUES (gen_random_uuid(), 'key-123', 456, 247.00)
ON CONFLICT (idempotency_key) DO NOTHING
RETURNING *;
If the idempotency key already exists, the insert is silently skipped and RETURNING produces no rows. The first write wins, and duplicates are impossible.
The limitation of this approach is that we only learn that the operation was deduplicated; we don’t get the original result back. We need a follow-up query to fetch it.
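One way to close that gap is to check the affected row count and then always read the row back by its key. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.24 or later is needed for the ON CONFLICT ... DO NOTHING syntax). The table mirrors the one above, except that amounts are stored as integer cents, which is a choice of this sketch; `create_payment` is a hypothetical helper.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payments (
        id TEXT PRIMARY KEY,
        idempotency_key TEXT UNIQUE NOT NULL,
        user_id INTEGER NOT NULL,
        amount_cents INTEGER NOT NULL
    )
""")


def create_payment(idempotency_key: str, user_id: int, amount_cents: int):
    """Insert at most once per key; always return (payment row, created flag)."""
    with conn:
        cur = conn.execute(
            """INSERT INTO payments (id, idempotency_key, user_id, amount_cents)
               VALUES (?, ?, ?, ?)
               ON CONFLICT (idempotency_key) DO NOTHING""",
            (str(uuid.uuid4()), idempotency_key, user_id, amount_cents),
        )
    created = cur.rowcount == 1  # 0 rows changed means the key already existed
    # Follow-up query: fetch whichever row won, new or original
    row = conn.execute(
        "SELECT id, user_id, amount_cents FROM payments WHERE idempotency_key = ?",
        (idempotency_key,),
    ).fetchone()
    return row, created


first, created_1 = create_payment("key-123", 456, 24700)
retry, created_2 = create_payment("key-123", 456, 24700)
print(created_1, created_2, first == retry)  # True False True
```

Because the read always happens after the insert attempt, the retry gets the same payment id as the first call, and the `created` flag tells a handler whether to report a new operation or a replay.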
Pattern 2: Conditional Writes
In this approach, we only perform writes if certain conditions are met.
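-- Only insert if no payment exists for this order
-- (two concurrent transactions can both pass NOT EXISTS under READ COMMITTED,
-- so back this up with a unique constraint on order_id)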
INSERT INTO payments (id, order_id, amount)
SELECT gen_random_uuid(), 'order-456', 100.00
WHERE NOT EXISTS (
SELECT 1 FROM payments WHERE order_id = 'order-456'
);
-- Only update if version matches (optimistic concurrency)
UPDATE account
SET balance = balance - 100, version = version + 1
WHERE id = 123 AND version = 5;
-- If the version changed, no rows are updated: re-read and retry
This pattern is essential when multiple fields interact and we need atomic checks.
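The version check also makes a blind retry harmless: once the first UPDATE bumps the version, the same statement matches no rows. A small SQLite sketch, with the table and values taken from the example above and `withdraw` as a hypothetical helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO account VALUES (123, 500, 5)")
conn.commit()


def withdraw(account_id: int, amount: int, expected_version: int) -> bool:
    """Apply the withdrawal only if the row is still at expected_version."""
    with conn:
        cur = conn.execute(
            """UPDATE account
               SET balance = balance - ?, version = version + 1
               WHERE id = ? AND version = ?""",
            (amount, account_id, expected_version),
        )
    return cur.rowcount == 1


print(withdraw(123, 100, 5))  # True: first attempt applies
print(withdraw(123, 100, 5))  # False: the retry matches no row
print(conn.execute("SELECT balance, version FROM account WHERE id = 123").fetchone())  # (400, 6)
```

One caveat: zero affected rows looks the same whether our own earlier attempt succeeded or another writer got there first, so a real caller would re-read the row to tell the two apart.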
Pattern 3: Transactional Outbox
This pattern is used when we need to update a database AND send a message (for example, to Kafka).
BEGIN TRANSACTION;
-- Make the state change
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 123;
-- Record the event in an outbox table (same transaction)
INSERT INTO outbox (id, event_type, payload, created_at)
VALUES (gen_random_uuid(), 'inventory_updated', '{"product_id": 123}', NOW());
COMMIT;
A separate process polls the outbox and publishes events. If it crashes before publishing, it retries. If it crashes after publishing but before marking the event as sent, it publishes again, and consumers handle the duplicate through their own idempotency. (The poller below assumes the outbox table also has a sent column that defaults to FALSE.)
def poll_outbox():
    # Get unsent events, oldest first
    events = db.query(
        "SELECT * FROM outbox WHERE sent = FALSE ORDER BY created_at LIMIT 100"
    )
    for event in events:
        # Publish (might be a duplicate; consumers must be idempotent,
        # so the published message should carry event.id for deduplication)
        kafka.publish(event.payload)
        # Mark as sent
        db.execute("UPDATE outbox SET sent = TRUE WHERE id = ?", event.id)
API Design For Idempotency
When designing APIs, we should make idempotency explicit and easy to use.
Accept Idempotency Keys
POST /v1/charges HTTP/1.1
Host: api.example.com
Idempotency-Key: 8a7b6c5d-4e3f-2g1h-0i9j-8k7l6m5n4o3p
Content-Type: application/json
{
"amount": 5000,
"currency": "usd",
"customer": "cust_123"
}
The response indicates whether this was a new operation or a replayed result.
{
"id": "ch_1234567890",
"amount": 5000,
"currency": "usd",
"status": "succeeded",
"idempotent_replayed": false
}
Use PUT For Idempotent Updates
PUT semantics say: “Set this resource to this state.” That makes PUT inherently idempotent.
PUT /users/123/email
Content-Type: application/json
{
"email": "new@example.com"
}
Whether we call this once or 10 times, the email ends up as new@example.com.
Compare to:
PATCH /users/123
Content-Type: application/json
{
"op": "append",
"path": "/tags",
"value": "premium"
}
PATCH with append is not idempotent. Each call adds another “premium” tag.
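The difference is easy to see if we simulate the original request plus two retries against a plain dict standing in for the user resource. The function names are hypothetical; the third variant shows that set-like "add if absent" semantics restore idempotency for the tags case.

```python
def put_email(user: dict, email: str) -> None:
    # PUT-style: set the field to the given state
    user["email"] = email


def patch_append_tag(user: dict, tag: str) -> None:
    # Append-style PATCH: every call adds another element
    user["tags"].append(tag)


def patch_add_tag_if_absent(user: dict, tag: str) -> None:
    # Set-style variant: adding a tag that is already present changes nothing
    if tag not in user["tags"]:
        user["tags"].append(tag)


def call_three_times(operation, value) -> dict:
    user = {"email": "old@example.com", "tags": []}
    for _ in range(3):  # the original request plus two retries
        operation(user, value)
    return user


print(call_three_times(put_email, "new@example.com"))
# {'email': 'new@example.com', 'tags': []}
print(call_three_times(patch_append_tag, "premium"))
# {'email': 'old@example.com', 'tags': ['premium', 'premium', 'premium']}
print(call_three_times(patch_add_tag_if_absent, "premium"))
# {'email': 'old@example.com', 'tags': ['premium']}
```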
Return Consistent Responses
This is critical. If we detect a duplicate, we should return the same response as the original.
def handle_idempotent_request(idempotency_key, request):
    existing = get_stored_response(idempotency_key)
    if existing:
        # SAME response as before, including the status code
        return existing['response'], existing['status_code']
    # Process new request
    response, status = process(request)
    # Store for future deduplication
    store_response(idempotency_key, response, status)
    return response, status
If the first request got a 201 Created, the retry must also get 201 Created, even though nothing was created on the retry. Returning 200 OK or 409 Conflict instead confuses clients.
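Here is a runnable version of the same idea, using an in-memory dict instead of a real store (all names are hypothetical). It keeps the stored body and status code untouched and reports the replay as a separate value, which a server could expose as a response header. This is one way to reconcile a replay flag like `idempotent_replayed` with returning an identical response.

```python
import itertools

_ids = itertools.count(1)
_stored = {}  # idempotency key -> (body, status)


def create_charge(idempotency_key: str, request: dict):
    """Return (body, status, replayed); replays reuse the original status code."""
    if idempotency_key in _stored:
        body, status = _stored[idempotency_key]
        return body, status, True  # same body, same status code
    body = {"id": f"ch_{next(_ids)}", "amount": request["amount"]}
    status = 201  # Created
    _stored[idempotency_key] = (body, status)
    return body, status, False


first = create_charge("key-1", {"amount": 5000})
retry = create_charge("key-1", {"amount": 5000})
print(first)  # ({'id': 'ch_1', 'amount': 5000}, 201, False)
print(retry)  # ({'id': 'ch_1', 'amount': 5000}, 201, True)
```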
Partial Failures: The Hard Case
Let’s look at an example scenario where idempotency gets genuinely difficult:
Charge the customer’s card
Update the inventory
Send confirmation email
What happens if step 2 fails after step 1 succeeded?
Request 1:
  Step 1: Charge card (success, $100 charged)
  Step 2: Update inventory (fails, database error)
  Step 3: Never reached
Request 2 (retry):
  Step 1: Charge card... ???
If we charge the card again, we have double-charged the customer. If we skip the whole request because the idempotency key already exists, the customer has paid for an order whose inventory was never updated and whose confirmation was never sent.
Solution 1: Make Each Step Independently Idempotent
We should use separate idempotency keys or checks for each step:
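def process_order(order_id, idempotency_key):
    order = load_order(order_id)  # hypothetical lookup of the order record

    # Step 1: Charge (idempotent via the payment processor's idempotency key)
    payment = stripe.Charge.create(
        amount=order.total,
        currency=order.currency,
        customer=order.customer_id,
        idempotency_key=f"{idempotency_key}:charge"
    )

    # Step 2: Update inventory (idempotent via a per-order reservation row).
    # inventory_reservations.order_id is UNIQUE, and the reservation and the
    # decrement share one transaction, so a retry inserts no reservation and
    # therefore decrements nothing.
    with db.transaction():
        reserved = db.execute("""
            INSERT INTO inventory_reservations (order_id, product_id)
            VALUES (?, ?)
            ON CONFLICT (order_id) DO NOTHING
            RETURNING order_id
        """, order_id, order.product_id)
        if reserved:
            updated = db.execute("""
                UPDATE inventory
                SET quantity = quantity - 1
                WHERE product_id = ? AND quantity > 0
                RETURNING quantity
            """, order.product_id)
            if not updated:
                # Out of stock: raising aborts the transaction,
                # so the reservation row is rolled back too
                raise OutOfStock()
        # If nothing was reserved, inventory was already updated for this order

    # Step 3: Send email (idempotent via deduplication table).
    # A crash between send_email and db.set can still send a second email.
    if not db.exists(f"email_sent:{order_id}"):
        send_email(order.customer_email, order_id)
        db.set(f"email_sent:{order_id}", True)
Each step can be retried independently without corruption.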
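To see the retry behaviour end to end, here is a self-contained simulation with in-memory fakes. `FakeWorld` and its methods are hypothetical stand-ins for the payment processor, the inventory table, and the mailer. The first attempt fails at the inventory step; two retries with the same key then produce exactly one charge, one stock decrement, and one email.

```python
class FakeWorld:
    """In-memory stand-ins for the payment processor, inventory, and mailer."""

    def __init__(self, stock: int):
        self.charges = {}          # processor-side idempotency: key -> charge
        self.stock = stock
        self.reservations = set()  # order ids that already decremented stock
        self.emails_sent = []
        self.fail_inventory_once = True

    def charge(self, key: str, amount: int) -> dict:
        if key not in self.charges:  # the processor dedupes on the key
            self.charges[key] = {"id": f"ch_{len(self.charges) + 1}", "amount": amount}
        return self.charges[key]

    def reserve(self, order_id: str) -> None:
        if self.fail_inventory_once:
            self.fail_inventory_once = False
            raise ConnectionError("database error")  # simulated transient failure
        if order_id in self.reservations:
            return  # already reserved for this order
        if self.stock <= 0:
            raise RuntimeError("out of stock")
        self.stock -= 1
        self.reservations.add(order_id)

    def email_once(self, order_id: str) -> None:
        if order_id not in self.emails_sent:
            self.emails_sent.append(order_id)


def process_order(world: FakeWorld, order_id: str, key: str, amount: int) -> None:
    world.charge(f"{key}:charge", amount)  # step 1
    world.reserve(order_id)                # step 2
    world.email_once(order_id)             # step 3


world = FakeWorld(stock=10)
try:
    process_order(world, "order-456", "key-abc", 100)  # fails at step 2
except ConnectionError:
    pass
process_order(world, "order-456", "key-abc", 100)      # retry
process_order(world, "order-456", "key-abc", 100)      # a duplicate retry
print(len(world.charges), world.stock, world.emails_sent)  # 1 9 ['order-456']
```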
Solution 2: Saga Pattern With Compensation
If we can’t make every step idempotent, we should use compensating transactions:
def process_order_saga(order):
    try:
        payment_id = charge_card(order)
    except PaymentFailed:
        raise  # Nothing to compensate
    try:
        reserve_inventory(order)
    except InventoryError:
        # Compensate: refund the charge
        refund_payment(payment_id)
        raise
    try:
        send_confirmation(order)
    except EmailError:
        # Email failed, but the order is complete
        # Log for manual follow-up, don't roll back
        log_email_failure(order)
    return payment_id  # order complete
Each step either succeeds or is compensated; the confirmation email is the deliberate exception, left for manual follow-up. Retries start from the beginning, but because a failed attempt’s charge has already been refunded, the customer is not double-charged.
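A minimal simulation of the compensation path, again with hypothetical in-memory fakes and only the charge and inventory steps. The ledger records charges as positive and refunds as negative amounts. The first attempt fails at inventory and refunds; the retry starts from the beginning and charges again, so the net amount is a single charge. This sketch runs the compensation in-process, so a crash between the charge and the refund is not handled here, and a real refund call should carry its own idempotency key.

```python
class InventoryError(Exception):
    pass


class FakeServices:
    def __init__(self, inventory_fails: bool):
        self.inventory_fails = inventory_fails
        self.ledger = []  # signed amounts: charges positive, refunds negative

    def charge_card(self, amount: int) -> int:
        self.ledger.append(amount)
        return len(self.ledger)  # payment id = position in the ledger

    def refund_payment(self, payment_id: int) -> None:
        self.ledger.append(-self.ledger[payment_id - 1])

    def reserve_inventory(self) -> None:
        if self.inventory_fails:
            raise InventoryError("no stock")


def process_order_saga(svc: FakeServices, amount: int) -> int:
    payment_id = svc.charge_card(amount)
    try:
        svc.reserve_inventory()
    except InventoryError:
        svc.refund_payment(payment_id)  # compensate step 1
        raise
    return payment_id


svc = FakeServices(inventory_fails=True)
try:
    process_order_saga(svc, 100)
except InventoryError:
    pass
svc.inventory_fails = False
process_order_saga(svc, 100)  # retry from the beginning
print(svc.ledger, sum(svc.ledger))  # [100, -100, 100] 100
```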
Checklist
Before we ship any API or message handler that modifies state:
Design
Each operation has a clear idempotency key (explicit or implicit)
Client generates keys, not server
Keys are stored with a TTL appropriate to our retry windows
Database
Unique constraints prevent duplicate creation
Conditional writes check preconditions
Transactions span all related changes
API
POST endpoints accept Idempotency-Key header
PUT semantics are truly idempotent (set, not append)
Responses are consistent between original and retries
Failures
Each step in multi-step operations is independently idempotent
Or: compensating transactions undo partial progress
In-progress operations are visible and timeout safely
Takeaway
Idempotency is how we build systems that can handle the real world, where unreliability prevails: networks drop packets, servers crash mid-request, and clients retry out of desperation.
Without idempotency, every retry is a gamble. With idempotency, retries are safe by design.
Forget about being clever. We must respect the fundamental unreliability of distributed systems and design accordingly. Every operation that changes state should be idempotent. No exceptions.
What's the worst idempotency failure you have encountered? I am collecting stories of double-charges, duplicate orders, and corrupted data. The patterns are consistent, and the war stories are educational.
Hit reply. I read everything. Namaste!
— Anirudh
If this clicked, forward it to your team. Idempotency is one of those concepts everyone thinks they understand until they don't. And if you want more deep dives like this, subscribe to The Main Thread — practical distributed systems engineering, one essay per week.
