May 20, 2026

Caching: trading freshness for speed

Caches make reads fast by letting the system be wrong in controlled ways. This post covers keys, TTLs, invalidation, and stampede protection.

Caching is a common technique for reducing latency: put Redis in front of the database, put a CDN in front of the app, or put a memoized function in front of expensive code. Latency drops and load on the source of truth decreases.

The tradeoffs show up in specific failure modes: a user updates their profile and sees the old name for five minutes, a hot key expires and 20,000 requests hit the database at the same time, or a private response gets cached under a public key. These are the cases where caching becomes an architectural concern rather than a simple optimization.

A cache is not just a faster place to put data. It is an agreement about how stale, shared, and recoverable a value is allowed to be.

Start with the stale promise

The first caching question is not "what should we put in Redis?" but rather:

How wrong is this value allowed to be, and for how long?

This is the stale promise. A product logo can be stale for a day, a user display name for a minute, and a home feed for a few seconds. A checkout price should not be stale unless it is revalidated before purchase, and a bank balance might be cached for display but cannot be used by the ledger to decide whether money can move.

Caching without a stale promise produces correctness that is incidental rather than designed, and tends to surface when two screens disagree.

The cache is a read model

A cache is a derived read model with a short memory, which means the same questions from the database post apply:

What is the read path?
What key identifies the answer?
How much stale data is acceptable?
What writes make the answer wrong?
Can the answer be rebuilt?
What happens when many callers rebuild it at once?

The last question is the one most often skipped. A cache improves the normal path but can make the failure path significantly worse: if a hot key serves 10,000 QPS and the cached value expires, every request can become a database request unless something stops the herd.

Cache layers

Caching is not one thing. The layer decides who shares the value and who owns invalidation.

Layer	Typical latency	Good for	Main risk
Browser or mobile cache	0-10ms	Static assets, user-local state, repeated reads	Hard to revoke quickly
CDN or edge cache	10-80ms	Public pages, images, API responses with clear variance	Caching private data by mistake
App process memory	sub-ms	Small hot config, per-instance memoization	Inconsistent across instances
Redis or Memcached	1-5ms inside region	Shared hot reads, sessions, expensive aggregates	Hot keys, stampedes, serialization cost
Database buffer/cache	varies	Indexed reads and repeated pages	Treating it like an explicit product cache

Use the highest layer that can safely answer the request. A static image belongs on a CDN. A public product catalog page belongs on a CDN with purge or a short TTL. An authenticated account dashboard belongs in an application cache keyed by tenant and user permissions. An expensive analytics number should be precomputed, stored, and given an age.

The closer the cache is to the user, the faster it is, and the harder it is to coordinate.

Cache keys are data models

The key is the schema.

Bad key:

profile:123

Better key:

profile:v3:tenant:88:user:123:viewer:456:locale:en-US

The second key looks unwieldy because it reflects all the dimensions that change the answer. If the response varies by tenant, viewer permissions, locale, feature flag, or API version, the key must include that dimension. Without those dimensions, the cache mixes answers rather than storing one.

Common key dimensions:

Dimension	Include it when
Version	The serialized shape or meaning can change
Tenant id	Data is scoped to an organization
User id	Response is personalized
Viewer id or role	Permissions affect visible fields
Locale	Text, currency, dates, or formatting changes
Page and sort	Lists are paginated or ordered
Feature flag cohort	Experiments change the response

Versioned keys are the simplest deploy-time invalidation strategy:

CACHE_VERSION = 3


def profile_cache_key(tenant_id, user_id, viewer_id, locale):
    return (
        f"profile:v{CACHE_VERSION}:"
        f"tenant:{tenant_id}:"
        f"user:{user_id}:"
        f"viewer:{viewer_id}:"
        f"locale:{locale}"
    )

When the representation changes, bump the version. New reads miss and rebuild under the new key, while old entries age out through their TTL or eviction, so no explicit purge is needed.

TTL is not invalidation

TTL is a fuse. It guarantees the cache entry dies eventually, but it does not guarantee the entry is correct until then. If a user changes their display name and the cached profile has a 10-minute TTL, the old name can remain visible for 10 minutes unless the write path invalidates it. That may be acceptable, but it should be a product decision.

There are three basic choices:

Strategy	What happens	Use it when
TTL only	Data expires after time passes	Staleness is harmless and writes are hard to track
Invalidate on write	Delete affected keys after canonical write	User-visible freshness matters
Update on write	Write DB and cache together	Read-after-write matters and write latency can afford it

A reasonable default is TTL plus invalidate on write: TTL protects against missed invalidations, and invalidation protects the user from waiting for TTL.
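A minimal sketch of that default, assuming a toy in-memory cache (`TTLCache`) and a plain dict standing in for the database. Both are illustrative stand-ins rather than a real client API, and the 600-second TTL simply mirrors the 10-minute example above.

```python
import time


class TTLCache:
    """Toy in-memory cache with per-entry expiry (a stand-in for Redis)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # TTL is the fuse: the entry dies even if no write deleted it.
            del self._entries[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def delete(self, key):
        self._entries.pop(key, None)


db = {"user:1": "Ada"}  # hypothetical source of truth
cache = TTLCache()


def read_name(user_id):
    key = f"name:v1:user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db[f"user:{user_id}"]
    cache.set(key, value, ttl_seconds=600)
    return value


def write_name(user_id, new_name):
    db[f"user:{user_id}"] = new_name         # canonical write first
    cache.delete(f"name:v1:user:{user_id}")  # then invalidate
```

A write that goes through `write_name` is visible on the next read. A write that bypasses it stays hidden until the TTL fires, which is the gap the TTL exists to bound.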

Cache-aside

Cache-aside is the common pattern:

read:
  check cache
  if hit, return
  if miss, load from source of truth
  write cache
  return

write:
  update source of truth
  delete affected cache keys

Here is the small version:

def get_profile(user_id):
    key = f"profile:v1:user:{user_id}"

    cached = cache.get_json(key)
    if cached is not None:
        return cached

    profile = db.get_profile(user_id)
    cache.set_json(key, profile, ttl_seconds=300)
    return profile

This works for cold or low-traffic keys, but a hot key that is missing requires stampede protection. A more defensive version looks like this:

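import random
import time


def ttl_with_jitter(base_seconds, jitter_ratio=0.15):
    # Spread expirations across +/- jitter_ratio of the base TTL.
    spread = base_seconds * jitter_ratio
    return int(base_seconds + random.uniform(-spread, spread))


def get_profile(user_id):
    key = f"profile:v2:user:{user_id}"
    stale_key = f"{key}:stale"
    lock_key = f"lock:{key}"

    # Fast path: the fresh value is present.
    cached = cache.get_json(key)
    if cached is not None:
        return cached

    # Miss: only the caller that wins the lock rebuilds the value.
    lock_acquired = cache.set_nx(lock_key, "1", ttl_seconds=10)

    if not lock_acquired:
        # Another caller is rebuilding; prefer the longer-lived stale copy.
        stale = cache.get_json(stale_key)
        if stale is not None:
            return stale

        # No stale copy: wait briefly for the rebuild, then re-check.
        time.sleep(0.050)
        cached = cache.get_json(key)
        if cached is not None:
            return cached

        # Last resort: read the source of truth directly.
        return db.get_profile(user_id)

    try:
        profile = db.get_profile(user_id)
        cache.set_json(key, profile, ttl_seconds=ttl_with_jitter(300))
        # The stale copy outlives the fresh one so waiters have a fallback.
        cache.set_json(stale_key, profile, ttl_seconds=1800)
        return profile
    finally:
        cache.delete(lock_key)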

This code is still simplified. In a real Redis implementation, the lock should have a unique token so one caller does not delete another caller's lock after a timeout.
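The token idea can be sketched as follows. In Redis the acquire step is typically `SET key token NX EX seconds`, and the release step is a compare-and-delete that has to run atomically (commonly a small server-side script). The `FakeLockStore` class and the `acquire` and `release` helpers below are hypothetical in-memory stand-ins used only to show the logic.

```python
import time
import uuid


class FakeLockStore:
    """In-memory stand-in for the lock operations; not a real Redis client."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # key -> (expires_at, token)

    def set_nx(self, key, token, ttl_seconds):
        held = self._locks.get(key)
        if held is not None and self._clock() < held[0]:
            return False  # someone else holds an unexpired lock
        self._locks[key] = (self._clock() + ttl_seconds, token)
        return True

    def delete_if_token_matches(self, key, token):
        # In Redis this compare-and-delete must run atomically.
        held = self._locks.get(key)
        if held is not None and held[1] == token:
            del self._locks[key]
            return True
        return False


def acquire(store, key, ttl_seconds=10):
    """Return a fresh token if the lock was taken, else None."""
    token = uuid.uuid4().hex
    return token if store.set_nx(key, token, ttl_seconds) else None


def release(store, key, token):
    """Delete the lock only if this caller still owns it."""
    return store.delete_if_token_matches(key, token)
```

If caller A's lock times out and caller B acquires it, A's late `release` finds B's token and leaves the lock alone.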

The general shape is:

One request rebuilds the missing value.
Other requests use stale data or wait briefly.
TTL jitter prevents many keys from expiring at the same second.
The database is the fallback, not the first victim.

The stampede calculation

The arithmetic is straightforward:

hot key traffic:       8,000 requests/sec
database read time:    250ms
cache entry expires:   now

concurrent DB reads = 8,000 x 0.250 = 2,000

One expired key can create 2,000 concurrent database reads. Because each query holds a connection for the full 250ms, any connection pool smaller than that is exhausted, unrelated endpoints start waiting, clients begin retrying, and the failure cascades into territory covered by the API reliability post. This is why "just add a five minute TTL" is not enough on its own.
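The same arithmetic as a small helper, which is just Little's law (in-flight work equals arrival rate times time in the system). The pool size of 100 used in the usage note is an assumed figure for illustration, not one from this post.

```python
def concurrent_reads(requests_per_second, read_seconds):
    """In-flight reads = arrival rate x time each read spends in the database."""
    return requests_per_second * read_seconds


def pool_shortfall(requests_per_second, read_seconds, pool_size):
    """How many in-flight reads exceed the connection pool (0 if none)."""
    in_flight = concurrent_reads(requests_per_second, read_seconds)
    return max(0, in_flight - pool_size)
```

With the numbers above, `concurrent_reads(8000, 0.250)` gives 2,000, and against an assumed pool of 100 connections `pool_shortfall(8000, 0.250, 100)` leaves 1,900 reads waiting.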

Stampede defenses

Useful defenses:

Defense	Idea	Tradeoff
Single-flight lock	One caller rebuilds while others wait	Lock bugs can block refresh
Stale-while-revalidate	Serve old value while one caller refreshes	Users may see stale data
TTL jitter	Spread expirations over time	Values expire less predictably
Probabilistic early refresh	Refresh before expiry under load	More cache writes
Prewarming	Fill cache before traffic arrives	Needs deployment or job discipline
Negative caching	Cache "not found" briefly	Can hide newly created data for the TTL

Probabilistic early refresh looks like this:

import random


def should_refresh_early(ttl_remaining, original_ttl):
    age_ratio = 1.0 - (ttl_remaining / original_ttl)
    return random.random() < max(0.0, age_ratio ** 3)

Early in the TTL, refresh is unlikely. Near the end, one of the readers is likely to refresh before the key fully expires. This is not needed everywhere; use it for hot keys where an expiry cliff is dangerous.
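To get a feel for the cubic curve, the sketch below prints the per-request refresh probability at a few points in a 300-second TTL, plus the chance that at least one of many readers triggers a refresh. The helper names and the reader count are illustrative assumptions.

```python
def early_refresh_probability(ttl_remaining, original_ttl):
    """Per-request probability of refreshing, using the same cubic curve."""
    age_ratio = 1.0 - (ttl_remaining / original_ttl)
    return max(0.0, age_ratio ** 3)


def chance_any_refreshes(per_request_probability, readers):
    """Probability that at least one of `readers` independent reads refreshes."""
    return 1.0 - (1.0 - per_request_probability) ** readers


for remaining in (270, 150, 60, 30):
    p = early_refresh_probability(remaining, 300)
    print(f"{remaining:>3}s left: per-request {p:.3f}, "
          f"any of 1000 readers {chance_any_refreshes(p, 1000):.3f}")
```

Under heavy traffic even a small per-request probability adds up, so a very hot key tends to refresh well before its TTL ends. A steeper exponent pushes refreshes closer to expiry.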

Write-through, write-behind, and friends

The pattern names matter less than where the acknowledgment boundary sits.

Pattern	Read path	Write path	Risk
Cache-aside	App checks cache, then DB	App writes DB, then deletes cache	Miss stampede and stale reads
Read-through	Cache loads from DB on miss	App writes DB, cache loads later	Cache library owns more behavior
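Write-through	App checks cache, then DB on miss	App writes cache and DB before acknowledging; user waits for both	Higher write latency
Write-behind	App checks cache, which holds the newest value	App writes cache and acknowledges; DB is updated later	Data loss if cache fails before flush
Refresh-ahead	App reads cache, which is refreshed in the background before expiry	App writes DB; the next refresh picks it up	Extra work for unused keys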

For most product systems, cache-aside is a reasonable default. Write-through is useful when read-after-write matters and the write rate is manageable. Write-behind is risky for canonical data but can work for metrics, counters, or logs where loss and replay rules are explicit. Avoid letting the cache become the only place the truth exists unless the system is intentionally designed as an in-memory store with its own durability story.
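The acknowledgment boundary is easier to see side by side. This is a minimal sketch in which plain dicts stand in for the database and the cache; the class names and the in-process `pending` queue are illustrative assumptions.

```python
from collections import deque


class WriteThrough:
    """Acknowledge only after both the store and the cache are written."""

    def __init__(self, db, cache):
        self.db, self.cache = db, cache

    def write(self, key, value):
        self.db[key] = value      # durable write happens before the ack
        self.cache[key] = value
        return "ok"


class WriteBehind:
    """Acknowledge after the cache write; the store catches up on flush()."""

    def __init__(self, db, cache):
        self.db, self.cache = db, cache
        self.pending = deque()    # lost if the process dies before flush

    def write(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))
        return "ok"

    def flush(self):
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value
```

Both return "ok", but only the first has reached the source of truth at that point. Everything still in `pending` is the loss window.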

Invalidation is part of the write path

The write path post described the flow as:

command -> fact -> effects -> projections

Cache invalidation is one of those effects, which means it needs the same seriousness as search indexing or notifications.

For a profile update:

def update_profile(user_id, display_name):
    with db.transaction() as tx:
        tx.update_profile(
            user_id=user_id,
            display_name=display_name,
        )
        tx.insert_outbox_event(
            event_type="profile.updated",
            aggregate_id=user_id,
            payload={"user_id": user_id},
        )

    return {"status": "updated"}

Then a worker invalidates derived keys:

def handle_profile_updated(event):
    user_id = event.payload["user_id"]

    cache.delete(f"profile:v2:user:{user_id}")
    cache.delete(f"profile:v2:user:{user_id}:stale")
    cache.delete(f"feed_header:v4:user:{user_id}")

Invalidation can be synchronous if freshness is critical, but with many derived keys, doing it inside the user request can turn a simple update into a slow integration path. The choice comes back to the freshness promise.

Delete usually beats update

On writes, deleting the cached value is often simpler than updating it because one write can affect many read shapes. Changing a display name may affect the user's profile, their posts, comment rows, notification previews, search snippets, and team member lists. Updating all of those cache entries correctly is difficult, while deleting the known keys and letting reads rebuild is generally safer.

The catch is key discovery: if the affected keys cannot be named, they cannot be invalidated precisely. That leaves three options:

Use shorter TTLs.
Use versioned namespace keys.
Maintain an index of dependent keys.

Namespace versioning is a practical middle ground:

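def get_profile_namespace(user_id):
    # Default to 0, not 1: a Redis-style incr on a missing key returns 1,
    # so a default of 1 would make the first invalidation a no-op.
    version = cache.get_int(f"profile_namespace:user:{user_id}") or 0
    return f"profile:v{version}:user:{user_id}"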


def invalidate_profile_namespace(user_id):
    cache.incr(f"profile_namespace:user:{user_id}")

Instead of deleting every possible profile key, increment the namespace version. New reads use new keys, and old keys expire later. This trades memory for simpler invalidation.
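A small end-to-end sketch of the namespace idea, assuming a dict-backed `FakeCache` whose `incr` mimics Redis INCR (a missing key counts as 0). One detail worth checking in any implementation: the default version used on reads has to match what the counter starts from, or the first invalidation changes nothing. The names here are hypothetical.

```python
class FakeCache:
    """Dict-backed stand-in; incr mimics Redis INCR (missing key counts as 0)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value

    def incr(self, key):
        self.data[key] = int(self.data.get(key, 0)) + 1
        return self.data[key]


cache = FakeCache()


def profile_prefix(user_id):
    # Default to 0 so the first incr (which yields 1) really changes the prefix.
    version = cache.get(f"profile_namespace:user:{user_id}") or 0
    return f"profile:ns{version}:user:{user_id}"


def invalidate_profile_namespace(user_id):
    cache.incr(f"profile_namespace:user:{user_id}")


# Two derived entries live under the same namespace prefix.
old_prefix = profile_prefix(7)
cache.set(old_prefix + ":card", {"name": "Ada"})
cache.set(old_prefix + ":header", {"name": "Ada"})

# One increment makes both unreachable without naming either key.
invalidate_profile_namespace(7)
new_prefix = profile_prefix(7)
```

The old entries are still in memory under `old_prefix` until their TTL or eviction removes them, which is the memory cost mentioned above.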

What I notice

Caching bugs are often ambiguity bugs. The team knows a value is cached, but nobody knows:

Whether the cached value is public or personalized.
Whether the key includes permissions.
Which write invalidates it.
Whether stale data is acceptable.
What happens when the cache is cold.

The cache makes the system faster by hiding work, which also makes it easier to hide ownership. The remedy is to document the contract alongside the cache itself.

HTTP caching is a contract too

CDN and browser caches use headers as their API.

For a public static asset:

Cache-Control: public, max-age=31536000, immutable

This works when the filename is content-hashed:

/assets/app.7f3a21c.css

If the content changes, the URL changes.

For an authenticated API response:

Cache-Control: private, max-age=30
Vary: Authorization, Accept-Language

private tells shared caches such as CDNs not to store the response, and Vary tells any cache that does store it (here, the browser) which request headers change the answer.

For sensitive data:

Cache-Control: no-store

Vague headers around private content should be avoided. The most severe class of cache bug is not stale data; it is one user seeing another user's data because the cache key ignored identity.
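One way to keep headers from drifting is to pick them from a small, explicit set of response classes. The sketch below is an assumption about how that could look; the class names ("static", "authenticated", "sensitive") are illustrative, and anything unrecognized falls back to `no-store`.

```python
def cache_headers(kind):
    """Return response headers for a named response class."""
    if kind == "static":
        # Safe only when the URL is content-hashed, so changes get a new URL.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if kind == "authenticated":
        return {
            "Cache-Control": "private, max-age=30",
            "Vary": "Authorization, Accept-Language",
        }
    # "sensitive" and any unknown class fail closed: nothing is stored.
    return {"Cache-Control": "no-store"}
```

Defaulting to `no-store` means a forgotten or misspelled class costs latency rather than leaking one user's response to another.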

Negative caching

Caching misses can also help. If a crawler requests the same nonexistent product id 10,000 times, the database should not have to prove absence 10,000 times.

def get_product(product_id):
    key = f"product:v1:{product_id}"
    cached = cache.get_json(key)

    if cached == {"missing": True}:
        return None

    if cached is not None:
        return cached

    product = db.get_product(product_id)
    if product is None:
        cache.set_json(key, {"missing": True}, ttl_seconds=30)
        return None

    cache.set_json(key, product, ttl_seconds=300)
    return product

Keep negative TTLs short. If the product is created one second after the miss is cached, users may keep seeing "not found" until the negative entry expires. Negative caching is a tool for reducing pressure, not a substitute for the source of truth.

Hot keys

A cache can be healthy overall and still fail on one key. Examples:

global leaderboard
homepage config
latest exchange rate
celebrity profile
tenant with 80 percent of traffic

A single Redis key can become the bottleneck even if the cluster has plenty of total capacity.

Symptoms:

High p99 on one endpoint.
Redis CPU or network spikes on one shard.
One key dominates command stats.
Database spikes whenever that key expires.

Mitigations:

Replicate the value under several keys and pick one randomly for reads.
Cache locally in each app process for a very short TTL.
Precompute the value and push updates.
Split the value into smaller pieces if reads do not need the whole object.
Give the key stronger stampede protection than normal keys.

Local memory caching is often the right choice for small, frequently accessed data. Even a two-second per-process cache can remove significant Redis pressure, with the tradeoff that every app instance has a slightly different view of the world.
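Two of these mitigations are small enough to sketch: a per-process cache with a very short TTL, and a helper that picks one of several replicated copies of a hot key. `LocalTTLCache`, `replica_key`, and the `load` callback are hypothetical names. The sketch is not thread-safe, and it assumes the writer keeps every replica up to date.

```python
import random
import time


class LocalTTLCache:
    """Per-process cache; each instance may be up to ttl_seconds behind."""

    def __init__(self, ttl_seconds=2.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, load):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = load(key)  # for example, a read from the shared cache
        self._entries[key] = (now + self._ttl, value)
        return value


def replica_key(key, replicas=4, rng=random):
    """Pick one of several copies of a hot key to spread reads across shards."""
    return f"{key}:r{rng.randrange(replicas)}"
```

With a two-second local TTL, each process sends the shared cache at most one read per key every two seconds, regardless of how many requests that process serves.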

Counters are special

Counters look cache-friendly: likes, views, followers, notifications. They are also where correctness arguments become slippery.

If the number is decorative, approximate is fine:

1,204 likes
about 1.2K views

If the number controls access or money, approximate is not fine:

remaining credits
available inventory
account balance

For decorative counters, batch writes:

def record_view(post_id):
    cache.incr(f"views_buffer:post:{post_id}")


def flush_view_counts(limit=1000):
    for key, count in cache.scan_counts("views_buffer:post:*", limit=limit):
        post_id = key.rsplit(":", 1)[1]
        db.increment_post_views(post_id, count)
        cache.delete(key)

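This reduces database write pressure but creates two loss windows. If Redis loses the buffer before a flush, those views disappear. And as written, views recorded between the scan and the delete are dropped as well; an atomic read-and-delete such as Redis GETDEL closes that second gap. Either loss is acceptable for views and unacceptable for payments.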

When not to cache

Caching should not be a reflex. Avoid caching when:

The read is already cheap and low volume.
The value changes constantly.
The key would need too many personalization dimensions.
The response contains sensitive data and the cache layer is shared.
The miss path is more dangerous than the hit path is useful.
The team cannot name the invalidation trigger.

The simplest cache is the one that is not added. Sometimes the right fix is an index, a smaller response, pagination, a read model, or a better query. Caching is a pressure valve rather than a substitute for modeling the data correctly.

Metrics that actually help

Hit rate is useful but not sufficient on its own. A more complete set of metrics:

Metric	Why it matters
Hit rate by key family	Shows whether the cache is doing useful work
Miss rate by key family	Shows source-of-truth pressure
Miss latency	Shows how expensive cold reads are
Stampede lock wait time	Shows herd pressure
Stale responses served	Shows how often degraded freshness happens
Invalidation count	Shows write-driven churn
Evictions	Shows memory pressure or bad sizing
Hot key distribution	Shows whether one key dominates
Serialized payload size	Shows network and memory cost
Source-of-truth QPS saved	Shows the actual value of the cache

Hit rate can be misleading on its own: a cache with 99 percent hit rate can still cause problems if the 1 percent misses all happen on one hot path with expensive queries.
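Breaking the numbers out by key family is what exposes that kind of hidden hot path. A minimal sketch, assuming keys follow the `name:version:...` shape used in this post; `key_family` and `CacheStats` are hypothetical names, and a real system would send these counts to its metrics pipeline instead of holding them in memory.

```python
from collections import Counter


def key_family(key):
    """'profile:v2:user:123' -> 'profile:v2' (assumes a name:version prefix)."""
    return ":".join(key.split(":")[:2])


class CacheStats:
    """Counts hits and misses per key family."""

    def __init__(self):
        self.hits = Counter()
        self.misses = Counter()

    def record(self, key, hit):
        (self.hits if hit else self.misses)[key_family(key)] += 1

    def hit_rate(self, family):
        total = self.hits[family] + self.misses[family]
        return self.hits[family] / total if total else None
```

A global hit rate would blend a healthy `profile:v2` family with a `report:v1` family that misses every time; per-family rates keep them apart.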

What I think

The practical sequence is:

Fix the obvious query or data model first.
Name the stale promise.
Choose the highest safe cache layer.
Design the key like a schema.
Add TTL with jitter.
Invalidate on writes that break the promise.
Protect hot misses with single-flight or stale-while-revalidate.
Measure misses, not just hits.
Keep a repair path back to the source of truth.

Caching is reasonable to underengineer only when the contract is small and explicit:

This value may be 60 seconds stale.
This write invalidates it.
This key includes tenant and viewer.
One caller rebuilds it.
Others get stale data.

That is a design. Without something like it, the cache is operating on assumptions rather than guarantees.
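One way to keep such a contract from living only in people's heads is to write it down as data next to the cache code. The sketch below is one possible shape; the `CacheContract` fields and the checks in `problems` are illustrative assumptions rather than a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheContract:
    name: str
    max_staleness_seconds: int
    invalidated_by: tuple   # event names that make the value wrong
    key_dimensions: tuple   # everything that changes the answer
    stampede_defense: str

    def problems(self):
        """Return a list of gaps; an empty list means the contract is explicit."""
        found = []
        if self.max_staleness_seconds <= 0:
            found.append("no stale promise")
        if not self.invalidated_by:
            found.append("no invalidation trigger named")
        if not self.key_dimensions:
            found.append("no key dimensions named")
        if not self.stampede_defense:
            found.append("no stampede defense")
        return found


profile_card = CacheContract(
    name="profile card",
    max_staleness_seconds=60,
    invalidated_by=("profile.updated",),
    key_dimensions=("tenant", "viewer"),
    stampede_defense="single-flight with stale fallback",
)
```

A contract with gaps can then fail a review or a test instead of surfacing as a production surprise.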

Tutorial checklist

For any cache, fill this out:

Question	Example answer
Cached value	User profile card
Source of truth	`users` and `profile_settings` tables
Cache layer	Redis plus 2-second app memory for hot users
Key	`profile:v2:tenant:{tenant_id}:user:{user_id}:viewer:{viewer_id}:locale:{locale}`
Freshness promise	Up to 60 seconds stale, except self-view after edit
TTL	300 seconds with 15 percent jitter
Write invalidation	`profile.updated` event deletes profile and feed header keys
Stampede defense	Single-flight Redis lock and stale fallback
Negative cache	Missing users cached for 30 seconds
Sensitive fields	Email hidden unless the viewer has permission; the viewer id is part of the key
Metrics	hit rate, miss latency, stale served, lock waits, hot keys
Repair path	Bump the namespace version or flush the key family

Then ask:

If this cache is empty during peak traffic, what breaks first?

If the answer is "the database," the cache is not just an optimization; it is part of capacity planning.

Summary

Caching is an agreement about acceptable staleness.
A cache key is a data model; include every dimension that changes the answer.
TTL is a safety fuse, not a complete invalidation strategy.
Cache-aside is a good default, but hot misses need stampede protection.
Serve stale data intentionally when it is better than taking down the source of truth.
Invalidation belongs in the write path.
Delete or version cached values on writes unless updating every affected key is truly simple.
CDN and browser caching depend on precise Cache-Control and Vary headers.
Do not cache sensitive or highly personalized data without a precise isolation key.
Measure misses, hot keys, stale responses, lock waits, evictions, and source-of-truth load saved.
