Hello, fellow engineers and scale chasers 😄
We are diving into the sixth issue of The Main Thread, building on our caching series from Caches Lie: Consistency Isn’t Free. In the last issue, we explored the foundational patterns that let caches deliver speed while flirting with consistency. Today, we shift focus to the practical realities of running a cache: how to manage precious memory, decide what gets evicted when the space tightens, and navigate the pitfalls that can turn our optimizations into a headache. These aren’t abstract concepts - they are the levers that determine whether our cache saves our system or silently sabotages it.
Memory isn’t infinite, and in the heat of production, poor management can lead to thrashing, latency spikes, or wasted resources. I will walk you through eviction strategies, expiration techniques, and those all-too-common traps, with code you can tweak and insights drawn from real-world battles. My goal is to make it engaging and straightforward, so you walk away with tools that deliver real value - whether you are tuning a Redis cluster or debugging a custom in-memory store.
Let’s get into it.
Memory Management: Keep Your Cache Lean And Mean
At its core, memory management in caching is about efficiency: you have a fixed budget of RAM, and every entry competes for space. When the cache fills up, you need smart rules to evict old data without losing the hot items that drive your performance gains. This is where eviction policies come in - they are the algorithms that decide what to kick out, based on how the data is accessed.
Take Least Recently Used (LRU), for example: it removes the data that hasn’t been touched in the longest time, assuming that recent accesses are a good predictor of what is needed next. This shines in workloads with temporal locality, like user sessions where recent activity matters most. On the flip side, it might evict bulky items that are accessed infrequently but critically.
Least Frequently Used (LFU) flips the script by tracking access counts and evicting the least popular entries, which is great for data with stable popularity but can falter during sudden spikes, like viral content overwhelming the system. (There’s a quick LFU sketch after the LRU code below.)
Then there is TTL-aware eviction, which prioritizes clearing expired items first. This is perfect for time-bound data, though it relies on setting accurate TTLs to avoid holding onto useless garbage.
Below is a simple implementation of LRU in practice. It’s educational and easy to adapt for prototyping:
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # Promote to most recent
            return self.cache[key]
        return None

    def set(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        elif len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # Remove least recent
        self.cache[key] = value

# Try it out
cache = LRUCache(capacity=3)
cache.set("user:1", {"name": "Alice"})
cache.set("user:2", {"name": "Bob"})
cache.set("user:3", {"name": "Carol"})
cache.get("user:1")  # Alice is now most recent
cache.set("user:4", {"name": "Dave"})  # Bob gets evicted

Running this, you will notice how it naturally favours recency, helping you realize why it works for many apps, and where it needs tuning.
Expiration: Timing Out the Old to Welcome the New
Eviction handles overflow, but expiration proactively clears data based on time, ensuring nothing lingers indefinitely and becomes dangerously stale. TTL is the star here: we set a time after which the entry automatically expires.
For fast-changing data like live scores or trending posts, keep TTLs short - say one to five minutes - to minimize inconsistency windows. Semi-static content, such as user profiles, might warrant 30 to 60 minutes. For rarely updated items, like app configurations, stretch it to hours or days.
A clever approach is sliding TTLs, where each access resets the clock - ideal for sessions that should persist as long as the user is active but vanish during inactivity. To internalize this, check out this basic TTL cache in Python, which uses lazy expiration (cleaning only on access) for simplicity and efficiency - we’ll extend it into a sliding version right after:
import time
from typing import Any, Optional

class SimpleCache:
    def __init__(self, default_ttl: float = 60.0):
        self._cache: dict[str, dict[str, Any]] = {}
        self._default_ttl = default_ttl

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        expiration_time = time.time() + (ttl if ttl is not None else self._default_ttl)
        self._cache[key] = {"value": value, "expires_at": expiration_time}

    def get(self, key: str) -> Optional[Any]:
        if key not in self._cache:
            return None
        entry = self._cache[key]
        if time.time() > entry["expires_at"]:
            del self._cache[key]
            return None
        return entry["value"]

# Quick test
cache = SimpleCache(default_ttl=2.0)
cache.set("user:123", {"name": "Alice"})
print(cache.get("user:123"))  # Should return the value
time.sleep(1.5)
print(cache.get("user:123"))  # Still there if under 2s

In real systems, add background threads for proactive cleanup to reclaim memory faster. Tuning TTL is not guesswork - base it on your data’s change rate and tolerance for staleness.
Common Pitfalls: Traps That Catch Even Pros
No cache is immune to mishaps, and understanding these can save you hours of debugging.
Stale Reads
These are the classic: the cache serves outdated info, which might be benign (like an old “last seen” timestamp) or catastrophic (wrong pricing leading to revenue loss). We can fight them by keeping short TTLs for dynamic data, invalidating keys explicitly on updates, or embedding versions in keys so readers can spot and refresh mismatches.
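Explicit invalidation is the step teams most often forget to wire up, so here is the shape of it - a rough sketch where update_price, db.save_price, and cache.delete stand in for whatever your persistence layer and cache client actually expose:

def update_price(product_id: int, new_price: float, db, cache) -> None:
    # Hypothetical write path: persist to the source of truth first...
    db.save_price(product_id, new_price)
    # ...then invalidate, so the next read repopulates with fresh data
    cache.delete(f"product:price:{product_id}")

Deleting instead of overwriting keeps the cache from racing the database - the next reader pays one miss and gets the truth.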
Thundering Herd
Then there’s thundering herd, or cache stampede: a hot key expires, and a swarm of requests all miss at once, hammering our backing store. Manage it with request coalescing - one request loads while the others wait - or add jitter to TTLs to spread out expirations (quick sketch below). Race conditions on writes are sneaky too: concurrent updates can leave the cache in an inconsistent state. Use compare-and-set operations or sequence numbers to ensure only the latest version sticks.
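Here’s the jitter idea in code - a minimal sketch, and the 10% spread is just a number I picked, not a magic constant:

import random

def jittered_ttl(base_ttl: float, spread: float = 0.1) -> float:
    # Randomize +/- spread so keys cached together don't all expire together
    return base_ttl * (1 + random.uniform(-spread, spread))

# Reusing the SimpleCache from earlier: expires somewhere around 270-330s instead of exactly 300s
cache.set("trending:posts", ["post-1", "post-2"], ttl=jittered_ttl(300))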
Bad Key Design
This also often trips teams up - raw keys without structure make invalidation messy. Build standardized keys with versions or timestamps, like user:profile:123:v2, to make busting caches intuitive (a tiny builder helper, sketched below, keeps everyone honest). Finally, watch for eviction churn: if eviction happens too often, your cache is undersized or misconfigured, costing memory without delivering speed. Scale up or refine your keys to fix it.
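The builder could look something like this - profile_key and SCHEMA_VERSION are hypothetical names, but the idea is that bumping the version busts every profile entry in one move:

SCHEMA_VERSION = 2  # Bump when the cached shape changes

def profile_key(user_id: int) -> str:
    # namespace : entity : id : version
    return f"user:profile:{user_id}:v{SCHEMA_VERSION}"

profile_key(123)  # -> 'user:profile:123:v2'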
Key Metrics: What to Watch to Stay Ahead
Metrics turn guesswork into science. Hit rate is a start - aim for 90% or better - but it is meaningless without freshness. Track staleness ratio by sampling cache values against the source to catch inconsistencies. Miss penalty shows the real cost of each miss in latency and load.
Keep an eye on evictions per second and memory utilization to spot capacity crunches early. For write-heavy patterns, monitor amplification - the extra operations a single write triggers.
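You don’t need fancy tooling to start collecting these - a couple of counters wrapped around your cache calls will do. A minimal sketch, assuming you’d ship the numbers to whatever metrics system you already run rather than print them:

class CacheStats:
    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def record_get(self, found: bool) -> None:
        if found:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

stats = CacheStats()
stats.record_get(cache.get("user:123") is not None)
print(f"hit rate: {stats.hit_rate:.1%}")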
Remember the basics: caching works because memory is fast and access is localized. But if writes are frequent or access patterns are uniform, the overhead might outweigh the benefits - reassess then.
Looking Forward: Tackling Inconsistencies Head-On
We are now equipped to manage and troubleshoot a cache’s day-to-day operations. In the next issue, we’ll confront inconsistency - its origins, real impacts, and how to measure it with hands-on simulations.
Got a caching pitfall story that changed how you build? Hit reply - your experiences could inspire the community.
Keep subscribing for these deep dives. Namaste,
Anirudh
P.S. All examples are in my Cache Mechanics repo - fork and experiment.
