Hello curious engineers, welcome to the twenty-fifth issue of The Main Thread. In this issue, we are going to look at one of the most useful but poorly understood patterns in distributed systems: idempotency.

Always remember, “just retry” only works if the operation is idempotent. Over the course of my career, I have lost count of how many production incidents I have traced back to this single misunderstanding.

A network blip causes a timeout → the client retries → the server processes the request twice → the customer gets charged twice, the order ships twice, or a database row is corrupted because an increment ran twice.

“But the request failed, the client had to retry” is not a solid defence, in my opinion. Yes, the client needed to retry, but that retry should have been safe. It was not safe because no one thought about idempotency.

This is not an advanced distributed systems concept. It’s table stakes. If the system handles money, inventory, user data, or anything else that must not be corrupted, idempotency needs to be understood deeply.

Let’s make sure we do.

What Idempotency Actually Means

❝

An operation is idempotent if performing it multiple times has the same effect as performing it once.

Simple, right? It is not. Let me show you with examples that trip up even experienced engineers.

Idempotent

SET user.email = "[email protected]"

Whether we run it once or ten times, user.email will still be [email protected]. The result is the same regardless of how many times we execute it.

Not Idempotent

INCREMENT user.login_count BY 1

If we run it once, the count goes from 5 to 6. Run it twice and the count is 7. Run it ten times and the count is 15. Each execution changes the state.

Surprisingly Not Idempotent

INSERT INTO orders (user_id, product_id, quantity) VALUES (123, 54352, 3);

If we run it twice, we get two orders. The second execution isn’t a no-op: it creates a duplicate.

Surprisingly Idempotent

DELETE FROM sessions WHERE user_id = 123

If we run it once, the sessions of user 123 are deleted. If we run it ten times, the sessions are still deleted. The second through tenth executions are no-ops because there is nothing left to delete.

Idempotency is about the effect on system state, not about the operation itself. HTTP DELETE is idempotent because repeated calls leave the system in the same state.
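To make the distinction concrete, here is a minimal sketch that applies each kind of operation repeatedly to an in-memory dict and compares the result with applying it once. The `state` layout, the email values, and the helper names are assumptions for illustration, and the check only covers one starting state, so treat it as a demonstration rather than a proof.

```python
import copy

def set_email(state):
    state["email"] = "new@example.com"  # hypothetical value

def increment_logins(state):
    state["login_count"] += 1

def delete_sessions(state):
    state["sessions"] = []

def is_idempotent(operation, initial_state, repeats=10):
    """Return True if applying `operation` `repeats` times equals applying it once."""
    once = copy.deepcopy(initial_state)
    operation(once)

    many = copy.deepcopy(initial_state)
    for _ in range(repeats):
        operation(many)

    return once == many

initial = {"email": "old@example.com", "login_count": 5, "sessions": ["s1", "s2"]}

print(is_idempotent(set_email, initial))         # True
print(is_idempotent(increment_logins, initial))  # False
print(is_idempotent(delete_sessions, initial))   # True
```

The check compares final state only, which mirrors the definition above: what matters is where the system ends up, not how many calls it took.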

Why This Matters In Distributed Systems

In a distributed system, we cannot distinguish between the following three scenarios:

Request succeeded, response lost
Request failed before processing
Request failed after processing

When a client times out, it doesn’t know which happened. The server might have processed the payment or might not have. The only safe option is to retry, but retrying is only safe if the operation is idempotent.

Take the first scenario: the server processed the payment, but the response was lost. The client retries and sees success on the retry. It has no idea the operation ran twice. Without idempotency, we have just doubled the effect of that operation.
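The lost-response scenario can be reproduced in a few lines. This is a toy simulation, not a real payment API: `PaymentServer`, `charge_with_retry`, and the `drop_response` flag are hypothetical names invented for the sketch.

```python
class PaymentServer:
    """Toy server with no deduplication (hypothetical, for illustration)."""

    def __init__(self):
        self.charges = []

    def charge(self, customer_id, amount, drop_response=False):
        # The side effect happens first...
        self.charges.append((customer_id, amount))
        # ...and only then can the response get lost on the way back.
        if drop_response:
            raise TimeoutError("response lost in transit")
        return {"status": "ok"}


def charge_with_retry(server, customer_id, amount, lost_responses):
    """Retry until a response arrives; the first `lost_responses` replies are dropped."""
    attempt = 0
    while True:
        try:
            return server.charge(
                customer_id, amount, drop_response=attempt < lost_responses
            )
        except TimeoutError:
            attempt += 1


server = PaymentServer()
response = charge_with_retry(server, "cust_123", 100, lost_responses=1)

print(response)             # {'status': 'ok'}
print(len(server.charges))  # 2
```

From the client's side everything looks healthy: one timeout, one successful retry. Only the server's ledger shows the duplicate.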

Idempotency Keys: The Solution

An idempotency key is a unique identifier attached to each operation that lets the server recognize and deduplicate retries.

POST /payments
Idempotency-Key: 8a7b6c5d-4e3f-2g1h-0i9j-8k7l6m5n4o3p
Content-Type: application/json

{
  "amount": 100.00,
  "currency": "USD",
  "customer_id": "cust_123"
}

The server follows this logic:

Receive the request with its idempotency key
Check if this key has been seen before
If yes, return the stored response (don't process again)
If no, process the request, store the response, and return it

import threading
from datetime import datetime, timedelta, timezone

class IdempotencyStore:
    """
    Store for idempotency keys and their results.
    In production, use Redis or a database, not in-memory.
    """

    def __init__(self, ttl_hours: int = 24):
        self.store = {}
        self.ttl = timedelta(hours=ttl_hours)
        # Guards the dict so check-and-set is atomic within one process
        self.lock = threading.Lock()

    def get(self, key: str) -> dict | None:
        """Retrieve stored result for idempotency key (None if missing or expired)."""
        with self.lock:
            entry = self.store.get(key)
            if entry is None:
                return None

            # Check expiration
            if datetime.now(timezone.utc) > entry['expires_at']:
                del self.store[key]
                return None

            return entry['result']

    def set(self, key: str, result: dict) -> None:
        """Store result for idempotency key."""
        with self.lock:
            self.store[key] = {
                'result': result,
                'expires_at': datetime.now(timezone.utc) + self.ttl
            }

    def set_in_progress(self, key: str) -> bool:
        """
        Mark key as in-progress. Returns False if a live entry already exists.
        Prevents concurrent duplicate processing.
        """
        now = datetime.now(timezone.utc)
        with self.lock:
            entry = self.store.get(key)
            if entry is not None and now <= entry['expires_at']:
                return False

            self.store[key] = {
                'result': {'status': 'in_progress'},  # Marker read by the handler
                'expires_at': now + timedelta(minutes=5)
            }
            return True

    def delete(self, key: str) -> None:
        """Remove a key so the client can retry with it."""
        with self.lock:
            self.store.pop(key, None)


def handle_payment(request, idempotency_key: str, store: IdempotencyStore):
    """
    Idempotent payment handler.
    process_payment is the application's own payment function, defined elsewhere.
    """
    # Check for existing result
    existing = store.get(idempotency_key)
    if existing is not None:
        if existing.get('status') == 'in_progress':
            return {"error": "Request in progress"}, 409
        return existing, 200  # Return cached response

    # Mark as in progress (atomic check-and-set). A False return means
    # another request claimed the key between the get() above and now.
    if not store.set_in_progress(idempotency_key):
        return {"error": "Duplicate request"}, 409

    try:
        # Process the payment
        result = process_payment(request)

        # Store the result
        store.set(idempotency_key, result)

        return result, 200

    except Exception:
        # On failure, remove the in-progress marker
        # so client can retry with same key
        store.delete(idempotency_key)
        raise

Key Design Principles

1. Who generates the key?

The client generates the key. Always. If the server generates the key, the client can’t safely retry because it won’t know the key for its lost request.
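On the client side, the important detail is that the key is created once, before the first attempt, and reused for every retry of the same logical operation. A minimal sketch, assuming a hypothetical `send(payload, idempotency_key)` transport function and a fake `flaky_send` that times out on its first call:

```python
import uuid

def send_with_retries(send, payload, max_attempts=3):
    """
    Call `send(payload, idempotency_key)` until it returns a response.
    Assumes max_attempts >= 1. The key is generated once, outside the loop,
    so every retry carries the same key.
    """
    idempotency_key = str(uuid.uuid4())
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(payload, idempotency_key)
        except TimeoutError as error:
            last_error = error
    raise last_error


# Fake transport for the demo: records the key it saw, times out once.
seen_keys = []

def flaky_send(payload, idempotency_key):
    seen_keys.append(idempotency_key)
    if len(seen_keys) == 1:
        raise TimeoutError("no response")
    return {"status": "ok"}


result = send_with_retries(flaky_send, {"amount": 100})

print(result)                        # {'status': 'ok'}
print(seen_keys[0] == seen_keys[1])  # True
```

Generating the key inside the loop is the common mistake: each retry would then look like a brand-new request to the server.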

2. What makes a good key?

Here are three ways to generate an idempotency key:

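import hashlib
import uuid

# Option 1: UUID (simple, unique, no semantic meaning)
idempotency_key = str(uuid.uuid4())

# Option 2: Hash of request parameters (same params = same key)
# Note: timestamp_minute buckets requests by minute, so a retry that crosses
# a minute boundary gets a new key, and two genuine identical requests within
# the same minute share one
key_data = f"{user_id}:{action}:{amount}:{timestamp_minute}"
idempotency_key = hashlib.sha256(key_data.encode()).hexdigest()

# Option 3: Client-provided transaction ID (business meaning)
idempotency_key = f"order-{order_id}-payment-attempt"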

3. How long to store keys?

Long enough that retries are covered, short enough that storage is manageable. 24-48 hours is typical. After that, we assume the client would have noticed the issue and raised a support ticket.

4. What to store with the keys?

Response (so that we can return it on retries)
Request hash (optional: to detect different requests reusing a key)
Timestamp (for expiration)
Status (in_progress, completed, failed)
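The fields listed above could be grouped into a single record, with the request hash used to catch a client that reuses a key for a different request. This is a sketch under assumptions: `IdempotencyRecord`, `hash_request`, and `matches_original_request` are hypothetical names, and hashing canonical JSON is just one way to fingerprint a request body.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class IdempotencyRecord:
    request_hash: str                # fingerprint of the first request body
    status: str                      # "in_progress", "completed", or "failed"
    created_at: datetime             # basis for expiration
    response: Optional[dict] = None  # returned as-is on retries

def hash_request(body: dict) -> str:
    """Hash a canonical JSON form so key order does not change the fingerprint."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def matches_original_request(record: IdempotencyRecord, body: dict) -> bool:
    """Return True if `body` is the same request that first used this key."""
    return record.request_hash == hash_request(body)


original = {"amount": 100, "currency": "USD", "customer_id": "cust_123"}
record = IdempotencyRecord(
    request_hash=hash_request(original),
    status="completed",
    created_at=datetime.now(timezone.utc),
    response={"status": "ok"},
)

reordered = {"customer_id": "cust_123", "currency": "USD", "amount": 100}
changed = {**original, "amount": 250}

print(matches_original_request(record, reordered))  # True
print(matches_original_request(record, changed))    # False
```

When the hash does not match, returning the stored response would be wrong, since it answers a different request. Rejecting the call with a client error is the safer choice.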

At-Least-Once vs Exactly-Once

We hear that distributed systems can only guarantee at-least-once delivery, not exactly-once. This is technically true and practically misleading.

At-Least-Once

Every message will be delivered, possibly multiple times. The sender retries until the message is acknowledged.

Exactly-Once

Every message is delivered exactly one time. No duplicates, no losses.

Pure exactly-once delivery is impossible in asynchronous distributed systems. We cannot distinguish “message lost” from “acknowledgement lost”. The sender must retry, and that may cause duplicates.

❝

But exactly-once semantics are achievable through idempotency:
At-least-once delivery + Idempotent processing = Exactly-once semantics

We deliver the message multiple times but the receiver processes it only once.

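import logging

logger = logging.getLogger(__name__)
SEVEN_DAYS = 7 * 24 * 60 * 60  # TTL in seconds

# Consumer with exactly-once semantics via idempotency.
# `database` (a key-value client) and `process` (the message handler)
# are assumed to be defined elsewhere.
def consume_message(message):
    message_id = message['id']

    # Check if already processed
    if database.exists(f"processed:{message_id}"):
        logger.info(f"Duplicate message {message_id}, skipping")
        return

    # Process the message
    result = process(message)

    # Mark as processed (atomically with the processing, ideally:
    # a crash between these two steps means the message is processed again)
    database.set(f"processed:{message_id}", True, ttl=SEVEN_DAYS)

    return result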

When someone claims “exactly-once delivery”, they almost always mean “at-least-once delivery with deduplication”. I am not criticising it at all. This is how it has to work. The main focus should be on making deduplication reliable.
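One way to make deduplication reliable is to record the message ID and apply the side effect in the same database transaction, so neither can happen without the other. The sketch below uses Python's built-in `sqlite3` module. The table names, the `consume` function, and the message shape are assumptions for illustration, and the approach only works when the side effect lives in the same database as the dedup table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('acct_1', 0)")
conn.commit()

def consume(conn, message):
    """Apply a credit at most once per message id. Returns True if it was applied."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # The primary key rejects a second insert of the same message id
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message["id"],),
            )
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (message["amount"], message["account"]),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: marker already exists, nothing applied


message = {"id": "msg-1", "account": "acct_1", "amount": 50}
first = consume(conn, message)
second = consume(conn, message)  # redelivery of the same message
balance = conn.execute(
    "SELECT amount FROM balances WHERE account = 'acct_1'"
).fetchone()[0]

print(first, second, balance)  # True False 50
```

Inserting the marker first means a duplicate fails before any work is done, and a crash mid-transaction rolls back both the marker and the update, so the message can be safely redelivered.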

Database Patterns For Idempotency

Idempotency ultimately comes down to what we do at the database level. Below are the patterns that work.

Pattern 1: Unique Constraints