Caching: trading freshness for speed

Caches make reads fast by letting the system be wrong in controlled ways. Keys, TTLs, invalidation, and stampede protection explored.

Caching is a common technique for reducing latency: put Redis in front of the database, put a CDN in front of the app, or put a memoized function in front of expensive code. Latency drops and load on the source of truth decreases.

The tradeoffs show up in specific failure modes: a user updates their profile and sees the old name for five minutes, a hot key expires and 20,000 requests hit the database at the same time, or a private response gets cached under a public key. These are the cases where caching becomes an architectural concern rather than a simple optimization.

A cache is not just a faster place to put data. It is an agreement about how stale, shared, and recoverable a value is allowed to be.

[Figure: Cache layers and stampede protection. Animated cache rings show fast cache hits, a cache miss reaching the database, an expiry stampede, and lock protection during refill. Caption: "Cache layers are freshness boundaries: fast when the answer is nearby, costly when every layer misses together." Labels: client cache, CDN/app cache, DB; hit 5 ms, miss 500 ms; stampede after expiry; one loader, others wait. States: warm, miss, herd, refill.]

Start with the stale promise

The first caching question is not "what should we put in Redis?" but rather:

How wrong is this value allowed to be, and for how long?

This is the stale promise. A product logo can be stale for a day, a user display name for a minute, and a home feed for a few seconds. A checkout price should not be stale unless it is revalidated before purchase, and a bank balance might be cached for display but cannot be used by the ledger to decide whether money can move.

Caching without a stale promise produces correctness that is incidental rather than designed, and tends to surface when two screens disagree.
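
One way to make the stale promise explicit is to write it down as data. The sketch below is illustrative only: `STALE_BUDGET_SECONDS` and `is_fresh_enough` are hypothetical names, and the five-second feed budget is an assumed reading of "a few seconds".

```python
# Hypothetical staleness budgets, in seconds, based on the examples above.
STALE_BUDGET_SECONDS = {
    "product_logo": 24 * 60 * 60,  # a day
    "display_name": 60,            # a minute
    "home_feed": 5,                # "a few seconds" (assumed value)
    "checkout_price": 0,           # must be revalidated before purchase
}


def is_fresh_enough(kind: str, age_seconds: float) -> bool:
    """Return True if a cached value of this kind may still be served."""
    budget = STALE_BUDGET_SECONDS.get(kind)
    if budget is None:
        # No stated promise means the value is not served from cache.
        return False
    return age_seconds <= budget
```

A value with no entry in the table is never served from cache, which forces the promise to be named before the cache is used.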

The cache is a read model

A cache is a derived read model with a short memory, which means the same questions from the database post apply:

  1. What is the read path?
  2. What key identifies the answer?
  3. How much stale data is acceptable?
  4. What writes make the answer wrong?
  5. Can the answer be rebuilt?
  6. What happens when many callers rebuild it at once?

The last question is the one most often skipped. A cache improves the normal path but can make the failure path significantly worse: if a hot key serves 10,000 QPS and the cached value expires, every request can become a database request unless something stops the herd.

Cache layers

Caching is not one thing. The layer decides who shares the value and who owns invalidation.

| Layer | Typical latency | Good for | Main risk |
| --- | --- | --- | --- |
| Browser or mobile cache | 0-10 ms | Static assets, user-local state, repeated reads | Hard to revoke quickly |
| CDN or edge cache | 10-80 ms | Public pages, images, API responses with clear variance | Caching private data by mistake |
| App process memory | sub-ms | Small hot config, per-instance memoization | Inconsistent across instances |
| Redis or Memcached | 1-5 ms inside region | Shared hot reads, sessions, expensive aggregates | Hot keys, stampedes, serialization cost |
| Database buffer/cache | varies | Indexed reads and repeated pages | Treating it like an explicit product cache |

Use the highest layer, meaning the one closest to the user, that can safely answer the request. A static image belongs on a CDN. A public product catalog page belongs on a CDN with purge or a short TTL. An authenticated account dashboard belongs in an application cache keyed by tenant and user permissions. An expensive analytics number should be precomputed, stored, and given an age.

The closer the cache is to the user, the faster it is, and the harder it is to coordinate.

Cache keys are data models

The key is the schema.

Bad key:

profile:123

Better key:

profile:v3:tenant:88:user:123:viewer:456:locale:en-US

The second key looks unwieldy because it reflects all the dimensions that change the answer. If the response varies by tenant, viewer permissions, locale, feature flag, or API version, the key must include that dimension. Without those dimensions, the cache mixes answers rather than storing one.

Common key dimensions:

| Dimension | Include it when |
| --- | --- |
| Version | The serialized shape or meaning can change |
| Tenant id | Data is scoped to an organization |
| User id | Response is personalized |
| Viewer id or role | Permissions affect visible fields |
| Locale | Text, currency, dates, or formatting changes |
| Page and sort | Lists are paginated or ordered |
| Feature flag cohort | Experiments change the response |

Versioned keys are the simplest deploy-time invalidation strategy:

CACHE_VERSION = 3


def profile_cache_key(tenant_id, user_id, viewer_id, locale):
    return (
        f"profile:v{CACHE_VERSION}:"
        f"tenant:{tenant_id}:"
        f"user:{user_id}:"
        f"viewer:{viewer_id}:"
        f"locale:{locale}"
    )

When the representation changes, bump the version. New code reads and writes only the new keys, and entries under the old version age out through their TTL or eviction.

TTL is not invalidation

TTL is a fuse. It guarantees the cache entry dies eventually, but it does not guarantee the entry is correct until then. If a user changes their display name and the cached profile has a 10-minute TTL, the old name can remain visible for 10 minutes unless the write path invalidates it. That may be acceptable, but it should be a product decision.

There are three basic choices:

| Strategy | What happens | Use it when |
| --- | --- | --- |
| TTL only | Data expires after time passes | Staleness is harmless and writes are hard to track |
| Invalidate on write | Delete affected keys after canonical write | User-visible freshness matters |
| Update on write | Write DB and cache together | Read-after-write matters and write latency can afford it |

A reasonable default is TTL plus invalidate on write: TTL protects against missed invalidations, and invalidation protects the user from waiting for TTL.
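
The combination can be sketched with an in-memory stand-in. Everything here is an assumption for illustration: `TTLCache`, `rename_user`, and `read_name` are hypothetical names, the "database" is a plain dict, and the injectable clock exists only so expiry can be demonstrated without sleeping.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a shared cache (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # TTL is the fuse: the entry dies even if no write deleted it.
            del self._entries[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._entries.pop(key, None)


def rename_user(db, cache, user_id, display_name):
    # Write the source of truth first, then invalidate the derived value.
    db[user_id] = display_name
    cache.delete(f"profile:v1:user:{user_id}")


def read_name(db, cache, user_id):
    key = f"profile:v1:user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db[user_id]
    cache.set(key, value, ttl_seconds=600)
    return value
```

A write that goes through `rename_user` is visible on the next read. A write that bypasses it stays hidden until the 600-second TTL fires, which is the case the fuse exists for.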

Cache-aside

Cache-aside is the common pattern:

read:
  check cache
  if hit, return
  if miss, load from source of truth
  write cache
  return

write:
  update source of truth
  delete affected cache keys

Here is the small version:

def get_profile(user_id):
    key = f"profile:v1:user:{user_id}"

    cached = cache.get_json(key)
    if cached is not None:
        return cached

    profile = db.get_profile(user_id)
    cache.set_json(key, profile, ttl_seconds=300)
    return profile

This works for cold or low-traffic keys, but a hot key that is missing requires stampede protection. A version closer to production looks like this:

import random
import time


def ttl_with_jitter(base_seconds, jitter_ratio=0.15):
    spread = base_seconds * jitter_ratio
    return int(base_seconds + random.uniform(-spread, spread))


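def get_profile(user_id):
    key = f"profile:v2:user:{user_id}"
    stale_key = f"{key}:stale"
    lock_key = f"lock:{key}"

    # Fast path: a fresh entry exists.
    cached = cache.get_json(key)
    if cached is not None:
        return cached

    # Single-flight: only the caller that wins the lock rebuilds the value.
    # The lock TTL keeps a crashed holder from blocking refresh forever.
    lock_acquired = cache.set_nx(lock_key, "1", ttl_seconds=10)

    if not lock_acquired:
        # Another caller is rebuilding; prefer the longer-lived stale copy.
        stale = cache.get_json(stale_key)
        if stale is not None:
            return stale

        # No stale copy: wait briefly for the rebuild, then re-check.
        time.sleep(0.050)
        cached = cache.get_json(key)
        if cached is not None:
            return cached

        # Last resort: read the source of truth directly.
        return db.get_profile(user_id)

    try:
        profile = db.get_profile(user_id)
        cache.set_json(key, profile, ttl_seconds=ttl_with_jitter(300))
        # The stale copy outlives the fresh one so waiters have a fallback.
        cache.set_json(stale_key, profile, ttl_seconds=1800)
        return profile
    finally:
        cache.delete(lock_key)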

This code is still simplified. In a real Redis implementation, the lock should have a unique token so one caller does not delete another caller's lock after a timeout.
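
A minimal sketch of the token idea follows, using an in-memory stand-in rather than a real Redis client. `FakeCache`, `delete_if_equals`, `acquire_lock`, and `release_lock` are hypothetical names. On a real shared cache the compare-and-delete step has to be atomic, which this single-process sketch does not attempt to show.

```python
import uuid


class FakeCache:
    """In-memory stand-in; a real cache would also expire locks by TTL."""

    def __init__(self):
        self._data = {}

    def set_nx(self, key, value):
        # Set only if the key is absent; report whether this caller won.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

    def delete_if_equals(self, key, expected):
        # Assumption: on a shared cache this check-and-delete must run
        # atomically on the server, not as two separate client calls.
        if self._data.get(key) == expected:
            del self._data[key]
            return True
        return False


def acquire_lock(cache, lock_key):
    """Return a unique token if the lock was taken, else None."""
    token = uuid.uuid4().hex
    return token if cache.set_nx(lock_key, token) else None


def release_lock(cache, lock_key, token):
    """Release the lock only if this caller still owns it."""
    return cache.delete_if_equals(lock_key, token)
```

If the first holder's lock times out and a second caller takes it, the first holder's late `release_lock` call finds a different token and leaves the second caller's lock in place.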

The general shape is:

  1. One request rebuilds the missing value.
  2. Other requests use stale data or wait briefly.
  3. TTL jitter prevents many keys from expiring at the same second.
  4. The database is the fallback, not the first victim.

The stampede calculation

The arithmetic is straightforward:

hot key traffic:       8,000 requests/sec
database read time:    250ms
cache entry expires:   now

concurrent DB reads = 8,000 x 0.250 = 2,000

One expired key can create 2,000 concurrent database reads. If the query holds a connection for 250ms, the connection pool is exhausted, unrelated endpoints start waiting, clients begin retrying, and the failure cascades into territory covered by the API reliability post. This is why "just add a five minute TTL" is not enough on its own.
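
The same arithmetic as a small helper, so other traffic and latency numbers can be plugged in. The function name and the pool size of 100 are assumptions for illustration, not figures from a real system.

```python
def concurrent_reads(requests_per_second: float, read_seconds: float) -> float:
    """Average in-flight reads = arrival rate x time each read takes."""
    return requests_per_second * read_seconds


in_flight = concurrent_reads(8_000, 0.250)
pool_size = 100  # assumed connection pool size, for illustration only

print(in_flight)              # 2000.0
print(in_flight / pool_size)  # 20.0
```

With these numbers the herd asks for twenty times the assumed pool, so most callers queue for a connection while the first batch is still running.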

Stampede defenses

Useful defenses:

| Defense | Idea | Tradeoff |
| --- | --- | --- |
| Single-flight lock | One caller rebuilds while others wait | Lock bugs can block refresh |
| Stale-while-revalidate | Serve old value while one caller refreshes | Users may see stale data |
| TTL jitter | Spread expirations over time | Values expire less predictably |
| Probabilistic early refresh | Refresh before expiry under load | More cache writes |
| Prewarming | Fill cache before traffic arrives | Needs deployment or job discipline |
| Negative caching | Cache "not found" briefly | Can hide newly created data for the TTL |

Probabilistic early refresh looks like this:

import random


def should_refresh_early(ttl_remaining, original_ttl):
    age_ratio = 1.0 - (ttl_remaining / original_ttl)
    return random.random() < max(0.0, age_ratio ** 3)

Early in the TTL, refresh is unlikely. Near the end, one of the readers is likely to refresh before the key fully expires. This is not needed everywhere; use it for hot keys where an expiry cliff is dangerous.
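
To get a feel for the curve, the sketch below evaluates the same `age_ratio ** 3` rule at a few points in a 300-second TTL. `refresh_probability` and `chance_someone_refreshes` are hypothetical helpers, and the second one assumes readers decide independently.

```python
def refresh_probability(ttl_remaining: float, original_ttl: float) -> float:
    """Per-read chance of an early refresh, using age_ratio ** 3."""
    age_ratio = 1.0 - (ttl_remaining / original_ttl)
    return max(0.0, age_ratio ** 3)


def chance_someone_refreshes(p: float, readers: int) -> float:
    """Chance that at least one of `readers` independent reads refreshes."""
    return 1.0 - (1.0 - p) ** readers


for remaining in (270, 150, 30):
    p = refresh_probability(remaining, 300)
    # Columns: seconds remaining, per-read chance, chance across 100 reads.
    print(remaining, round(p, 3), round(chance_someone_refreshes(p, 100), 3))
```

Halfway through the TTL a single read refreshes with probability 0.125, but across 100 reads a refresh is close to certain. For very hot keys this means the refresh lands well before expiry; a steeper exponent would push it later at the cost of a narrower safety margin.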

Write-through, write-behind, and friends

The pattern names matter less than where the acknowledgment boundary sits.

| Pattern | Read path | Write path | Risk |
| --- | --- | --- | --- |
| Cache-aside | App checks cache, then DB | App writes DB, then deletes cache | Miss stampede and stale reads |
| Read-through | Cache loads from DB on miss | App writes DB, cache loads later | Cache library owns more behavior |
| Write-through | App reads from cache | App writes cache and DB before success; user waits for both | Higher write latency |
| Write-behind | App reads from cache | App writes cache, DB is updated later; fast write response | Data loss if cache fails before flush |
| Refresh-ahead | Cache refreshes before expiry, in the background | Unchanged; refresh is a read-side behavior | Extra work for unused keys |

For most product systems, cache-aside is a reasonable default. Write-through is useful when read-after-write matters and the write rate is manageable. Write-behind is risky for canonical data but can work for metrics, counters, or logs where loss and replay rules are explicit. Avoid letting the cache become the only place the truth exists unless the system is intentionally designed as an in-memory store with its own durability story.

Invalidation is part of the write path

The write path post described the flow as:

command -> fact -> effects -> projections

Cache invalidation is one of those effects, which means it needs the same seriousness as search indexing or notifications.

For a profile update:

def update_profile(user_id, display_name):
    with db.transaction() as tx:
        tx.update_profile(
            user_id=user_id,
            display_name=display_name,
        )
        tx.insert_outbox_event(
            event_type="profile.updated",
            aggregate_id=user_id,
            payload={"user_id": user_id},
        )

    return {"status": "updated"}

Then a worker invalidates derived keys:

def handle_profile_updated(event):
    user_id = event.payload["user_id"]

    cache.delete(f"profile:v2:user:{user_id}")
    cache.delete(f"profile:v2:user:{user_id}:stale")
    cache.delete(f"feed_header:v4:user:{user_id}")

Invalidation can be synchronous if freshness is critical, but with many derived keys, doing it inside the user request can turn a simple update into a slow integration path. The choice comes back to the freshness promise.
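
The asynchronous variant can be sketched as a worker loop that drains pending outbox events. This is a simplified, in-memory illustration: `OutboxEvent`, `keys_for_profile`, and `drain_outbox` are hypothetical names, the cache is a plain dict, and a real worker would read and mark events in the database.

```python
from dataclasses import dataclass


@dataclass
class OutboxEvent:
    event_id: int
    event_type: str
    payload: dict
    processed: bool = False


def keys_for_profile(user_id):
    # Hypothetical list of derived keys; mirrors the handler above.
    return [
        f"profile:v2:user:{user_id}",
        f"profile:v2:user:{user_id}:stale",
        f"feed_header:v4:user:{user_id}",
    ]


def drain_outbox(outbox, cache):
    """Process pending events in order and return how many were handled."""
    handled = 0
    for event in outbox:
        if event.processed:
            continue
        if event.event_type == "profile.updated":
            for key in keys_for_profile(event.payload["user_id"]):
                # Deleting a missing key is a no-op, so retries are safe.
                cache.pop(key, None)
        event.processed = True
        handled += 1
    return handled
```

Because deletes are idempotent, an event that is delivered twice does no harm. The cost of this design is the delay between the commit and the worker run, which has to fit inside the freshness promise.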

Delete usually beats update

On writes, deleting the cached value is often simpler than updating it because one write can affect many read shapes. Changing a display name may affect the user's profile, their posts, comment rows, notification previews, search snippets, and team member lists. Updating all of those cache entries correctly is difficult, while deleting the known keys and letting reads rebuild is generally safer.

The catch is key discovery: if the affected keys cannot be named, they cannot be invalidated precisely. That leaves three options:

  1. Use shorter TTLs.
  2. Use versioned namespace keys.
  3. Maintain an index of dependent keys.

Namespace versioning is a practical middle ground:

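def get_profile_namespace(user_id):
    # Treat a missing counter as 0. Assuming Redis-style incr, the first
    # increment of a missing key yields 1, so a default of 1 would leave
    # the namespace unchanged after the first invalidation.
    version = cache.get_int(f"profile_namespace:user:{user_id}") or 0
    return f"profile:v{version}:user:{user_id}"


def invalidate_profile_namespace(user_id):
    cache.incr(f"profile_namespace:user:{user_id}")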

Instead of deleting every possible profile key, increment the namespace version. New reads use new keys, and old keys expire later. This trades memory for simpler invalidation.
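
A small in-memory demonstration of the effect follows. `CounterCache` and `profile_namespace` are hypothetical stand-ins, and the sketch assumes Redis-style increment semantics, where a missing counter becomes 1 on the first increment. For that reason it treats a missing counter as version 0.

```python
class CounterCache:
    """In-memory stand-in with Redis-like INCR semantics (illustration)."""

    def __init__(self):
        self._data = {}

    def get_int(self, key):
        return self._data.get(key)

    def incr(self, key):
        # A missing key counts as 0, so the first incr yields 1.
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


def profile_namespace(cache, user_id):
    # Default to 0 so that the first incr (to 1) actually changes the key.
    version = cache.get_int(f"profile_namespace:user:{user_id}") or 0
    return f"profile:v{version}:user:{user_id}"


cache = CounterCache()
before = profile_namespace(cache, 123)
cache.incr("profile_namespace:user:123")
after = profile_namespace(cache, 123)

print(before)  # profile:v0:user:123
print(after)   # profile:v1:user:123
```

One design caveat: if the counter itself is evicted, the namespace falls back to version 0 and old entries under that prefix could become readable again, so the counter should live at least as long as the entries it guards.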

What I notice

Caching bugs are often ambiguity bugs. The team knows a value is cached, but nobody knows:

  1. Whether the cached value is public or personalized.
  2. Whether the key includes permissions.
  3. Which write invalidates it.
  4. Whether stale data is acceptable.
  5. What happens when the cache is cold.

The cache makes the system faster by hiding work, which also makes it easier to hide ownership. The remedy is to document the contract alongside the cache itself.
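
One possible way to keep that contract next to the code is a small record type that answers the five questions above. The `CacheContract` class and its field names are hypothetical; this is a sketch of the idea rather than an established pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheContract:
    """Hypothetical record of what one cached value promises."""

    name: str
    personalized: bool
    key_includes_permissions: bool
    invalidated_by: tuple   # event names that make the value wrong
    max_stale_seconds: int
    cold_behavior: str      # what happens when the cache is empty

    def problems(self):
        """Return the ambiguities this contract still leaves open."""
        issues = []
        if self.personalized and not self.key_includes_permissions:
            issues.append("personalized value without permissions in the key")
        if not self.invalidated_by:
            issues.append("no write is named as the invalidation trigger")
        if not self.cold_behavior:
            issues.append("cold-cache behavior is undefined")
        return issues
```

A check like `problems()` could run in a test or at startup, so a cache that nobody owns shows up as a failing assertion instead of a production incident.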

HTTP caching is a contract too

CDN and browser caches use headers as their API.

For a public static asset:

Cache-Control: public, max-age=31536000, immutable

This works when the filename is content-hashed:

/assets/app.7f3a21c.css

If the content changes, the URL changes.

For an authenticated API response:

Cache-Control: private, max-age=30
Vary: Authorization, Accept-Language

private tells shared caches such as CDNs not to store the response, and Vary tells any cache that does store it (here, the browser) which request headers change the answer.

For sensitive data:

Cache-Control: no-store

Vague headers around private content should be avoided. The most severe class of cache bug is not stale data; it is one user seeing another user's data because the cache key ignored identity.
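
One way to avoid vague headers is to choose them in a single place and fall back to the strictest option. The `cache_headers` helper and its `kind` labels are hypothetical; the header values are the ones shown in this section.

```python
def cache_headers(kind: str) -> dict:
    """Map a response class to the Cache-Control policy shown above."""
    if kind == "static_asset":
        # Only safe when the filename is content-hashed.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if kind == "authenticated_api":
        return {
            "Cache-Control": "private, max-age=30",
            "Vary": "Authorization, Accept-Language",
        }
    # Anything unrecognized, including sensitive data, is never stored.
    return {"Cache-Control": "no-store"}
```

Defaulting to `no-store` means a new endpoint that nobody classified is slow rather than leaked.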

Negative caching

Caching misses can also help. If a crawler requests the same nonexistent product id 10,000 times, the database should not have to prove absence 10,000 times.

def get_product(product_id):
    key = f"product:v1:{product_id}"
    cached = cache.get_json(key)

    if cached == {"missing": True}:
        return None

    if cached is not None:
        return cached

    product = db.get_product(product_id)
    if product is None:
        cache.set_json(key, {"missing": True}, ttl_seconds=30)
        return None

    cache.set_json(key, product, ttl_seconds=300)
    return product

Keep negative TTLs short. If the product is created one second after the miss is cached, users may keep seeing "not found" until the negative entry expires. Negative caching is a tool for reducing pressure, not a substitute for the source of truth.

Hot keys

A cache can be healthy overall and still fail on one key. Examples:

global leaderboard
homepage config
latest exchange rate
celebrity profile
tenant with 80 percent of traffic

A single Redis key can become the bottleneck even if the cluster has plenty of total capacity.

Symptoms:

  1. High p99 on one endpoint.
  2. Redis CPU or network spikes on one shard.
  3. One key dominates command stats.
  4. Database spikes whenever that key expires.

Mitigations:

  1. Replicate the value under several keys and pick one randomly for reads.
  2. Cache locally in each app process for a very short TTL.
  3. Precompute the value and push updates.
  4. Split the value into smaller pieces if reads do not need the whole object.
  5. Give the key stronger stampede protection than normal keys.

Local memory caching is often the right choice for small, frequently accessed data. Even a two-second per-process cache can remove significant Redis pressure, with the tradeoff that every app instance has a slightly different view of the world.
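
The first mitigation in the list, replicating a hot value under several keys, can be sketched as follows. The cache is a plain dict, the `:r{i}` suffix scheme is an assumption, and the function names are hypothetical.

```python
import random


def replica_keys(base_key: str, replicas: int) -> list:
    """All physical keys that hold a copy of one logical hot value."""
    return [f"{base_key}:r{i}" for i in range(replicas)]


def write_hot_value(cache: dict, base_key: str, value, replicas: int = 8):
    # Writes fan out to every replica so any of them can serve a read.
    for key in replica_keys(base_key, replicas):
        cache[key] = value


def read_hot_value(cache: dict, base_key: str, replicas: int = 8, rng=random):
    # Reads pick one replica at random, spreading load across keys
    # (and, on a sharded cache, likely across shards).
    key = f"{base_key}:r{rng.randrange(replicas)}"
    return cache.get(key)
```

The tradeoff is on the write side: every update and every invalidation must touch all replicas, so the replica count has to be known to both readers and writers.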

Counters are special

Counters look cache-friendly: likes, views, followers, notifications. They are also where correctness arguments become slippery.

If the number is decorative, approximate is fine:

1,204 likes
about 1.2K views

If the number controls access or money, approximate is not fine:

remaining credits
available inventory
account balance

For decorative counters, batch writes:

def record_view(post_id):
    cache.incr(f"views_buffer:post:{post_id}")


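def flush_view_counts(limit=1000):
    for key, count in cache.scan_counts("views_buffer:post:*", limit=limit):
        if count <= 0:
            continue
        post_id = key.rsplit(":", 1)[1]
        db.increment_post_views(post_id, count)
        # Subtract what was flushed instead of deleting the key, so views
        # recorded between the scan and this line are not lost.
        cache.decr_by(key, count)  # assumed wrapper around Redis DECRBY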

This reduces database write pressure but creates a loss window: if Redis loses the buffer before flush, those views disappear. This is acceptable for views and unacceptable for payments.

When not to cache

Caching should not be a reflex. Avoid caching when:

  1. The read is already cheap and low volume.
  2. The value changes constantly.
  3. The key would need too many personalization dimensions.
  4. The response contains sensitive data and the cache layer is shared.
  5. The miss path is more dangerous than the hit path is useful.
  6. The team cannot name the invalidation trigger.

The simplest cache is the one that is not added. Sometimes the right fix is an index, a smaller response, pagination, a read model, or a better query. Caching is a pressure valve rather than a substitute for modeling the data correctly.

Metrics that actually help

Hit rate is useful but not sufficient on its own. A more complete set of metrics:

| Metric | Why it matters |
| --- | --- |
| Hit rate by key family | Shows whether the cache is doing useful work |
| Miss rate by key family | Shows source-of-truth pressure |
| Miss latency | Shows how expensive cold reads are |
| Stampede lock wait time | Shows herd pressure |
| Stale responses served | Shows how often degraded freshness happens |
| Invalidation count | Shows write-driven churn |
| Evictions | Shows memory pressure or bad sizing |
| Hot key distribution | Shows whether one key dominates |
| Serialized payload size | Shows network and memory cost |
| Source-of-truth QPS saved | Shows the actual value of the cache |

Hit rate can be misleading on its own: a cache with 99 percent hit rate can still cause problems if the 1 percent misses all happen on one hot path with expensive queries.
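
A short worked example shows how an overall number can hide a bad key family. The counters below are made up for illustration, and `hit_rate` is a hypothetical helper.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there is no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


# Hypothetical counters per key family: (hits, misses).
families = {
    "profile": (980_000, 0),
    "leaderboard": (10_000, 10_000),
}

overall_hits = sum(h for h, _ in families.values())
overall_misses = sum(m for _, m in families.values())

print(round(hit_rate(overall_hits, overall_misses), 3))
for name, (hits, misses) in families.items():
    print(name, round(hit_rate(hits, misses), 3))
```

Here the overall hit rate is 99 percent while the leaderboard family sits at 50 percent, and every one of the 10,000 misses lands on the same expensive query.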

What I think

The practical sequence is:

  1. Fix the obvious query or data model first.
  2. Name the stale promise.
  3. Choose the highest safe cache layer.
  4. Design the key like a schema.
  5. Add TTL with jitter.
  6. Invalidate on writes that break the promise.
  7. Protect hot misses with single-flight or stale-while-revalidate.
  8. Measure misses, not just hits.
  9. Keep a repair path back to the source of truth.

Keeping a cache simple is reasonable only when the contract is small and explicit:

This value may be 60 seconds stale.
This write invalidates it.
This key includes tenant and viewer.
One caller rebuilds it.
Others get stale data.

That is a design. Without something like it, the cache is operating on assumptions rather than guarantees.

Tutorial checklist

For any cache, fill this out:

| Question | Example answer |
| --- | --- |
| Cached value | User profile card |
| Source of truth | users and profile_settings tables |
| Cache layer | Redis plus 2-second app memory for hot users |
| Key | profile:v2:tenant:{tenant_id}:user:{user_id}:viewer:{viewer_id}:locale:{locale} |
| Freshness promise | Up to 60 seconds stale, except self-view after edit |
| TTL | 300 seconds with 15 percent jitter |
| Write invalidation | profile.updated event deletes profile and feed header keys |
| Stampede defense | Single-flight Redis lock and stale fallback |
| Negative cache | Missing users cached for 30 seconds |
| Sensitive fields | Email hidden unless viewer has permission, included in key by viewer |
| Metrics | Hit rate, miss latency, stale served, lock waits, hot keys |
| Repair path | Bump namespace version or flush key family |

Then ask:

If this cache is empty during peak traffic, what breaks first?

If the answer is "the database," the cache is not just an optimization; it is part of capacity planning.

Summary

  1. Caching is an agreement about acceptable staleness.
  2. A cache key is a data model; include every dimension that changes the answer.
  3. TTL is a safety fuse, not a complete invalidation strategy.
  4. Cache-aside is a good default, but hot misses need stampede protection.
  5. Serve stale data intentionally when it is better than taking down the source of truth.
  6. Invalidation belongs in the write path.
  7. Delete or version cached values on writes unless updating every affected key is truly simple.
  8. CDN and browser caching depend on precise Cache-Control and Vary headers.
  9. Do not cache sensitive or highly personalized data without a precise isolation key.
  10. Measure misses, hot keys, stale responses, lock waits, evictions, and source-of-truth load saved.
