Caching: trading freshness for speed
Caches make reads fast by letting the system be wrong in controlled ways. This post explores keys, TTLs, invalidation, and stampede protection.
Caching is a common technique for reducing latency: put Redis in front of the database, put a CDN in front of the app, or put a memoized function in front of expensive code. Latency drops and load on the source of truth decreases.
The tradeoffs show up in specific failure modes: a user updates their profile and sees the old name for five minutes, a hot key expires and 20,000 requests hit the database at the same time, or a private response gets cached under a public key. These are the cases where caching becomes an architectural concern rather than a simple optimization.
A cache is not just a faster place to put data. It is an agreement about how stale, shared, and recoverable a value is allowed to be.
Start with the stale promise
The first caching question is not "what should we put in Redis?" but rather:
How wrong is this value allowed to be, and for how long?
This is the stale promise. A product logo can be stale for a day, a user display name for a minute, and a home feed for a few seconds. A checkout price should not be stale unless it is revalidated before purchase, and a bank balance might be cached for display but cannot be used by the ledger to decide whether money can move.
Without a stale promise, correctness is incidental rather than designed, and the gap tends to surface when two screens disagree.
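One way to make the stale promise concrete is to write it down as data. The sketch below is illustrative only: the names `STALE_BUDGET_SECONDS` and `may_serve` are hypothetical, and the numbers are taken loosely from the examples above (with "a few seconds" assumed to mean 5).

```python
# Hypothetical staleness budgets in seconds, based on the examples in the text.
STALE_BUDGET_SECONDS = {
    "product_logo": 24 * 60 * 60,  # a day
    "display_name": 60,            # a minute
    "home_feed": 5,                # "a few seconds" (assumed value)
    "checkout_price": 0,           # must be revalidated before purchase
}


def may_serve(kind: str, age_seconds: float) -> bool:
    """Return True if a cached value of this kind is still within its promise."""
    budget = STALE_BUDGET_SECONDS.get(kind)
    if budget is None:
        # No written promise means nobody has decided how stale is acceptable.
        return False
    return age_seconds <= budget
```

Refusing to serve values that have no declared budget is a design choice here: it forces the freshness conversation to happen before the cache ships.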
The cache is a read model
A cache is a derived read model with a short memory, which means the same questions from the database post apply:
- What is the read path?
- What key identifies the answer?
- How much stale data is acceptable?
- What writes make the answer wrong?
- Can the answer be rebuilt?
- What happens when many callers rebuild it at once?
The last question is the one most often skipped. A cache improves the normal path but can make the failure path significantly worse: if a hot key serves 10,000 QPS and the cached value expires, every request can become a database request unless something stops the herd.
Cache layers
Caching is not one thing. The layer decides who shares the value and who owns invalidation.
| Layer | Typical latency | Good for | Main risk |
|---|---|---|---|
| Browser or mobile cache | 0-10ms | Static assets, user-local state, repeated reads | Hard to revoke quickly |
| CDN or edge cache | 10-80ms | Public pages, images, API responses with clear variance | Caching private data by mistake |
| App process memory | sub-ms | Small hot config, per-instance memoization | Inconsistent across instances |
| Redis or Memcached | 1-5ms inside region | Shared hot reads, sessions, expensive aggregates | Hot keys, stampedes, serialization cost |
| Database buffer/cache | varies | Indexed reads and repeated pages | Treating it like an explicit product cache |
Use the highest layer, meaning the one closest to the user, that can safely answer the request. A static image belongs on a CDN. A public product catalog page belongs on a CDN with purge or a short TTL. An authenticated account dashboard belongs in an application cache keyed by tenant and user permissions. An expensive analytics number should be precomputed, stored, and given an age.
The closer the cache is to the user, the faster it is, and the harder it is to coordinate.
Cache keys are data models
The key is the schema.
Bad key:
profile:123
Better key:
profile:v3:tenant:88:user:123:viewer:456:locale:en-US
The second key looks unwieldy because it reflects all the dimensions that change the answer. If the response varies by tenant, viewer permissions, locale, feature flag, or API version, the key must include that dimension. Without those dimensions, the cache mixes answers rather than storing one.
Common key dimensions:
| Dimension | Include it when |
|---|---|
| Version | The serialized shape or meaning can change |
| Tenant id | Data is scoped to an organization |
| User id | Response is personalized |
| Viewer id or role | Permissions affect visible fields |
| Locale | Text, currency, dates, or formatting changes |
| Page and sort | Lists are paginated or ordered |
| Feature flag cohort | Experiments change the response |
Versioned keys are the simplest deploy-time invalidation strategy:
CACHE_VERSION = 3


def profile_cache_key(tenant_id, user_id, viewer_id, locale):
    return (
        f"profile:v{CACHE_VERSION}:"
        f"tenant:{tenant_id}:"
        f"user:{user_id}:"
        f"viewer:{viewer_id}:"
        f"locale:{locale}"
    )
When the representation changes, bump the version: new reads build fresh entries under the new key, and old entries expire naturally through their TTL.
TTL is not invalidation
TTL is a fuse. It guarantees the cache entry dies eventually, but it does not guarantee the entry is correct until then. If a user changes their display name and the cached profile has a 10-minute TTL, the old name can remain visible for 10 minutes unless the write path invalidates it. That may be acceptable, but it should be a product decision.
There are three basic choices:
| Strategy | What happens | Use it when |
|---|---|---|
| TTL only | Data expires after time passes | Staleness is harmless and writes are hard to track |
| Invalidate on write | Delete affected keys after canonical write | User-visible freshness matters |
| Update on write | Write DB and cache together | Read-after-write matters and write latency can afford it |
A reasonable default is TTL plus invalidate on write: TTL protects against missed invalidations, and invalidation protects the user from waiting for TTL.
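The default of TTL plus invalidate on write can be sketched with a tiny in-memory cache. Everything here is a stand-in for illustration: `TTLCache`, `read_name`, `write_name`, and the `db` dict are hypothetical names, not a real cache client.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a shared cache (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # TTL is the fuse: the entry dies even if nobody invalidated it.
            del self._entries[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def delete(self, key):
        self._entries.pop(key, None)


db = {"user:1": "Ada"}  # hypothetical source of truth
cache = TTLCache()


def read_name(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db[key]
    cache.set(key, value, ttl_seconds=600)
    return value


def write_name(user_id, name):
    key = f"user:{user_id}"
    db[key] = name     # canonical write first
    cache.delete(key)  # then invalidate, so readers do not wait for the TTL
```

If the `cache.delete` call is ever missed, the 10-minute TTL still bounds how long the old name can survive, which is the division of labor described above.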
Cache-aside
Cache-aside is the common pattern:
read:
    check cache
    if hit, return
    if miss, load from source of truth
    write cache
    return
write:
    update source of truth
    delete affected cache keys
Here is the small version:
def get_profile(user_id):
    key = f"profile:v1:user:{user_id}"
    cached = cache.get_json(key)
    if cached is not None:
        return cached
    profile = db.get_profile(user_id)
    cache.set_json(key, profile, ttl_seconds=300)
    return profile
This works for cold or low-traffic keys, but a hot key that is missing requires stampede protection. A production version looks like this:
import random
import time


def ttl_with_jitter(base_seconds, jitter_ratio=0.15):
    # Spread expirations so keys written together do not expire together.
    spread = base_seconds * jitter_ratio
    return int(base_seconds + random.uniform(-spread, spread))


def get_profile(user_id):
    key = f"profile:v2:user:{user_id}"
    stale_key = f"{key}:stale"
    lock_key = f"lock:{key}"

    cached = cache.get_json(key)
    if cached is not None:
        return cached

    # Only one caller wins the lock and rebuilds the value.
    lock_acquired = cache.set_nx(lock_key, "1", ttl_seconds=10)
    if not lock_acquired:
        # Losers prefer a stale copy, then a brief wait, then the database.
        stale = cache.get_json(stale_key)
        if stale is not None:
            return stale
        time.sleep(0.050)
        cached = cache.get_json(key)
        if cached is not None:
            return cached
        return db.get_profile(user_id)

    try:
        profile = db.get_profile(user_id)
        cache.set_json(key, profile, ttl_seconds=ttl_with_jitter(300))
        # The stale copy outlives the fresh one so it can serve as a fallback.
        cache.set_json(stale_key, profile, ttl_seconds=1800)
        return profile
    finally:
        cache.delete(lock_key)
This code is still simplified. In a real Redis implementation, the lock should have a unique token so one caller does not delete another caller's lock after a timeout.
The general shape is:
- One request rebuilds the missing value.
- Other requests use stale data or wait briefly.
- TTL jitter prevents many keys from expiring at the same second.
- The database is the fallback, not the first victim.
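The unique-token caveat mentioned above can be sketched without a real Redis server. `FakeLockStore` is a hypothetical in-memory stand-in; it uses a mutex to make the compare-and-delete atomic, whereas a Redis implementation would need a single atomic step for the same check (commonly a small server-side script).

```python
import threading
import uuid


class FakeLockStore:
    """In-memory stand-in for the cache's lock keys (illustration only)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owners = {}  # lock_key -> token of the current holder

    def acquire(self, lock_key):
        """Return a fresh token if the lock was free, else None."""
        token = uuid.uuid4().hex
        with self._guard:
            if lock_key in self._owners:
                return None
            self._owners[lock_key] = token
            return token

    def expire(self, lock_key):
        """Simulate the lock's TTL elapsing."""
        with self._guard:
            self._owners.pop(lock_key, None)

    def release(self, lock_key, token):
        """Delete the lock only if this caller still owns it."""
        with self._guard:
            if self._owners.get(lock_key) != token:
                return False  # expired, or now held by another caller
            del self._owners[lock_key]
            return True


locks = FakeLockStore()
token_a = locks.acquire("lock:profile:1")  # caller A starts a slow rebuild
locks.expire("lock:profile:1")             # A's lock times out mid-rebuild
token_b = locks.acquire("lock:profile:1")  # caller B takes over
released_by_a = locks.release("lock:profile:1", token_a)  # A must not free B's lock
released_by_b = locks.release("lock:profile:1", token_b)
```

With a fixed value such as "1", caller A's late `delete` would have removed caller B's lock and let a third caller start another rebuild.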
The stampede calculation
The arithmetic is straightforward:
hot key traffic: 8,000 requests/sec
database read time: 250ms
cache entry expires: now
concurrent DB reads = 8,000 x 0.250 = 2,000
One expired key can create 2,000 concurrent database reads. Each query holds a connection for 250ms, so any connection pool smaller than that is exhausted, unrelated endpoints start waiting, clients begin retrying, and the failure cascades into territory covered by the API reliability post. This is why "just add a five-minute TTL" is not enough on its own.
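The arithmetic above is an instance of Little's law (work in flight equals arrival rate times time in the system). A minimal sketch, where `concurrent_reads` is a hypothetical helper and the pool size of 100 is an assumed figure for illustration:

```python
def concurrent_reads(requests_per_second: float, read_seconds: float) -> float:
    """Little's law: in-flight work = arrival rate x time each item takes."""
    return requests_per_second * read_seconds


in_flight = concurrent_reads(8_000, 0.250)
pool_size = 100  # assumed connection pool size, not from the text
queued = max(0.0, in_flight - pool_size)

print(f"in-flight reads: {in_flight:.0f}")
print(f"requests waiting behind a pool of {pool_size}: {queued:.0f}")
```

Plugging in a service's own traffic and query time gives a quick estimate of how far a cold hot key would overshoot the pool.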
Stampede defenses
Useful defenses:
| Defense | Idea | Tradeoff |
|---|---|---|
| Single-flight lock | One caller rebuilds while others wait | Lock bugs can block refresh |
| Stale-while-revalidate | Serve old value while one caller refreshes | Users may see stale data |
| TTL jitter | Spread expirations over time | Values expire less predictably |
| Probabilistic early refresh | Refresh before expiry under load | More cache writes |
| Prewarming | Fill cache before traffic arrives | Needs deployment or job discipline |
| Negative caching | Cache "not found" briefly | Can hide newly created data for the TTL |
Probabilistic early refresh looks like this:
import random


def should_refresh_early(ttl_remaining, original_ttl):
    age_ratio = 1.0 - (ttl_remaining / original_ttl)
    return random.random() < max(0.0, age_ratio ** 3)
Early in the TTL, refresh is unlikely. Near the end, one of the readers is likely to refresh before the key fully expires. This is not needed everywhere; use it for hot keys where an expiry cliff is dangerous.
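To get a feel for the cubic curve, it can help to print the per-read refresh probability at a few points in a 300-second TTL. `refresh_probability` is a hypothetical helper that exposes the same expression as the function above without the random draw.

```python
def refresh_probability(ttl_remaining: float, original_ttl: float) -> float:
    """Per-read chance of an early refresh, using the cubic curve above."""
    age_ratio = 1.0 - (ttl_remaining / original_ttl)
    return max(0.0, age_ratio ** 3)


for remaining in (300, 150, 60, 30, 0):
    p = refresh_probability(remaining, 300)
    print(f"{remaining:>3}s left -> refresh chance per read {p:.3f}")
```

The probability applies to each read, so a very hot key will usually refresh well before the end of its TTL. If that produces more refreshes than intended, a steeper exponent is one knob to try.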
Write-through, write-behind, and friends
The pattern names matter less than where the acknowledgment boundary sits.
| Pattern | Read path | Write path | Risk |
|---|---|---|---|
| Cache-aside | App checks cache, then DB | App writes DB, then deletes cache | Miss stampede and stale reads |
| Read-through | Cache loads from DB on miss | App writes DB, cache loads later | Cache library owns more behavior |
| Write-through | App reads cache, then DB on miss | App writes cache and DB before success; user waits for both | Higher write latency |
| Write-behind | App reads cache | App writes cache and responds fast; DB is updated later | Data loss if cache fails before flush |
| Refresh-ahead | Cache refreshes before expiry in the background | App writes DB; the next refresh picks up the change | Extra work for unused keys |
For most product systems, cache-aside is a reasonable default. Write-through is useful when read-after-write matters and the write rate is manageable. Write-behind is risky for canonical data but can work for metrics, counters, or logs where loss and replay rules are explicit. Avoid letting the cache become the only place the truth exists unless the system is intentionally designed as an in-memory store with its own durability story.
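The difference in acknowledgment boundary between write-through and write-behind can be shown with two dicts and a buffer. This is a toy model under stated assumptions: `Store` and its methods are hypothetical, and a real write-behind system would also need ordering and retry rules.

```python
class Store:
    """Hypothetical pairing of a cache dict and a database dict."""

    def __init__(self):
        self.cache = {}
        self.db = {}
        self.pending = []  # write-behind buffer of (key, value) pairs

    def write_through(self, key, value):
        # Acknowledge only after both stores have the value.
        self.db[key] = value
        self.cache[key] = value
        return "ok"

    def write_behind(self, key, value):
        # Acknowledge as soon as the cache has the value.
        self.cache[key] = value
        self.pending.append((key, value))
        return "ok"

    def flush(self):
        # Later, a background job drains the buffer into the database.
        while self.pending:
            key, value = self.pending.pop(0)
            self.db[key] = value

    def cache_crash(self):
        # Losing the cache also loses any writes that were not flushed yet.
        self.cache.clear()
        self.pending.clear()


store = Store()
store.write_through("plan", "pro")  # durable before the caller sees "ok"
store.write_behind("views", 41)     # caller sees "ok" before the DB has it
store.cache_crash()                 # the buffered write is gone
store.flush()
```

After this sequence the database holds `plan` but never receives `views`, which is the loss window the table describes.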
Invalidation is part of the write path
The write path post described the flow as:
command -> fact -> effects -> projections
Cache invalidation is one of those effects, which means it needs the same seriousness as search indexing or notifications.
For a profile update:
def update_profile(user_id, display_name):
    with db.transaction() as tx:
        tx.update_profile(
            user_id=user_id,
            display_name=display_name,
        )
        tx.insert_outbox_event(
            event_type="profile.updated",
            aggregate_id=user_id,
            payload={"user_id": user_id},
        )
    return {"status": "updated"}
Then a worker invalidates derived keys:
def handle_profile_updated(event):
    user_id = event.payload["user_id"]
    cache.delete(f"profile:v2:user:{user_id}")
    cache.delete(f"profile:v2:user:{user_id}:stale")
    cache.delete(f"feed_header:v4:user:{user_id}")
Invalidation can be synchronous if freshness is critical, but with many derived keys, doing it inside the user request can turn a simple update into a slow integration path. The choice comes back to the freshness promise.
Delete usually beats update
On writes, deleting the cached value is often simpler than updating it because one write can affect many read shapes. Changing a display name may affect the user's profile, their posts, comment rows, notification previews, search snippets, and team member lists. Updating all of those cache entries correctly is difficult, while deleting the known keys and letting reads rebuild is generally safer.
The catch is key discovery: if the affected keys cannot be named, they cannot be invalidated precisely. That leaves three options:
- Use shorter TTLs.
- Use versioned namespace keys.
- Maintain an index of dependent keys.
Namespace versioning is a practical middle ground:
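def get_profile_namespace(user_id):
    # Default to 0: a Redis-style INCR on a missing key yields 1, so a default
    # of 1 would leave the key unchanged after the first invalidation.
    version = cache.get_int(f"profile_namespace:user:{user_id}") or 0
    return f"profile:v{version}:user:{user_id}"


def invalidate_profile_namespace(user_id):
    cache.incr(f"profile_namespace:user:{user_id}")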
Instead of deleting every possible profile key, increment the namespace version. New reads use new keys, and old keys expire later. This trades memory for simpler invalidation.
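A short, self-contained run shows the effect of a namespace bump on the key prefix. `FakeCache` is a hypothetical in-memory stand-in that copies the Redis INCR convention of treating a missing key as 0; this sketch starts the namespace at 0 so that the very first increment already changes the prefix.

```python
class FakeCache:
    """Minimal in-memory stand-in with Redis-like get/incr semantics."""

    def __init__(self):
        self._data = {}

    def get_int(self, key):
        return self._data.get(key)

    def incr(self, key):
        # Like Redis INCR: a missing key counts as 0 before incrementing.
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


cache = FakeCache()


def profile_namespace(user_id):
    # Default to 0 so the first increment (to 1) yields a different prefix.
    version = cache.get_int(f"profile_namespace:user:{user_id}") or 0
    return f"profile:v{version}:user:{user_id}"


def invalidate_profile_namespace(user_id):
    cache.incr(f"profile_namespace:user:{user_id}")


before = profile_namespace(123)
invalidate_profile_namespace(123)
after = profile_namespace(123)
print(before, "->", after)
```

Every key built on the old prefix becomes unreachable at once, without the writer needing to know which viewer or locale variants exist.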
What I notice
Caching bugs are often ambiguity bugs. The team knows a value is cached, but nobody knows:
- Whether the cached value is public or personalized.
- Whether the key includes permissions.
- Which write invalidates it.
- Whether stale data is acceptable.
- What happens when the cache is cold.
The cache makes the system faster by hiding work, which also makes it easier to hide ownership. The remedy is to document the contract alongside the cache itself.
HTTP caching is a contract too
CDN and browser caches use headers as their API.
For a public static asset:
Cache-Control: public, max-age=31536000, immutable
This works when the filename is content-hashed:
/assets/app.7f3a21c.css
If the content changes, the URL changes.
For an authenticated API response:
Cache-Control: private, max-age=30
Vary: Authorization, Accept-Language
private tells shared caches such as CDNs not to store the response, and Vary tells any cache that does store it, here the browser, which request headers change the answer.
For sensitive data:
Cache-Control: no-store
Vague headers around private content should be avoided. The most severe class of cache bug is not stale data; it is one user seeing another user's data because the cache key ignored identity.
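One way to avoid vague headers is to route every response through a single function that picks from the three header sets shown above. The function name `cache_headers` and the response class labels are hypothetical; the header values are the ones from this section.

```python
def cache_headers(kind: str) -> dict:
    """Map a response class to one of the header sets shown above."""
    if kind == "hashed_static_asset":
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if kind == "authenticated_api":
        return {
            "Cache-Control": "private, max-age=30",
            "Vary": "Authorization, Accept-Language",
        }
    # Anything unclassified is treated as sensitive: fail closed.
    return {"Cache-Control": "no-store"}


print(cache_headers("hashed_static_asset"))
print(cache_headers("authenticated_api"))
print(cache_headers("billing_export"))
```

Defaulting to `no-store` means a new endpoint has to opt in to caching explicitly, which is slower by default but avoids sharing private data by accident.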
Negative caching
Caching misses can also help. If a crawler requests the same nonexistent product id 10,000 times, the database should not have to prove absence 10,000 times.
def get_product(product_id):
    key = f"product:v1:{product_id}"
    cached = cache.get_json(key)
    if cached == {"missing": True}:
        return None
    if cached is not None:
        return cached
    product = db.get_product(product_id)
    if product is None:
        cache.set_json(key, {"missing": True}, ttl_seconds=30)
        return None
    cache.set_json(key, product, ttl_seconds=300)
    return product
Keep negative TTLs short. If the product is created one second after the miss is cached, users may keep seeing "not found" until the negative entry expires. Negative caching is a tool for reducing pressure, not a substitute for the source of truth.
Hot keys
A cache can be healthy overall and still fail on one key. Examples:
global leaderboard
homepage config
latest exchange rate
celebrity profile
tenant with 80 percent of traffic
A single Redis key can become the bottleneck even if the cluster has plenty of total capacity.
Symptoms:
- High p99 on one endpoint.
- Redis CPU or network spikes on one shard.
- One key dominates command stats.
- Database spikes whenever that key expires.
Mitigations:
- Replicate the value under several keys and pick one randomly for reads.
- Cache locally in each app process for a very short TTL.
- Precompute the value and push updates.
- Split the value into smaller pieces if reads do not need the whole object.
- Give the key stronger stampede protection than normal keys.
Local memory caching is often the right choice for small, frequently accessed data. Even a two-second per-process cache can remove significant Redis pressure, with the tradeoff that every app instance has a slightly different view of the world.
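The first mitigation in the list above, replicating a hot value under several keys, might look like the sketch below. A plain dict stands in for the shared cache, and `REPLICAS`, `replica_keys`, `write_hot_value`, and `read_hot_value` are hypothetical names. In a sharded cache, differently named keys usually land on different shards, which is what spreads the load.

```python
import random

REPLICAS = 8  # assumed fan-out; tune to the number of shards


def replica_keys(base_key, replicas=REPLICAS):
    return [f"{base_key}:r{i}" for i in range(replicas)]


def write_hot_value(cache, base_key, value, replicas=REPLICAS):
    # Writes pay for the fan-out so that reads can spread out.
    for key in replica_keys(base_key, replicas):
        cache[key] = value


def read_hot_value(cache, base_key, replicas=REPLICAS):
    # Each read picks one replica at random, spreading load across keys.
    key = f"{base_key}:r{random.randrange(replicas)}"
    return cache.get(key)


cache = {}
write_hot_value(cache, "leaderboard:v1", ["ada", "grace", "linus"])
print(read_hot_value(cache, "leaderboard:v1"))
```

The cost is that every write and every invalidation must touch all replicas, so this fits values that are read far more often than they change.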
Counters are special
Counters look cache-friendly: likes, views, followers, notifications. They are also where correctness arguments become slippery.
If the number is decorative, approximate is fine:
1,204 likes
about 1.2K views
If the number controls access or money, approximate is not fine:
remaining credits
available inventory
account balance
For decorative counters, batch writes:
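def record_view(post_id):
    cache.incr(f"views_buffer:post:{post_id}")


def flush_view_counts(limit=1000):
    for key, count in cache.scan_counts("views_buffer:post:*", limit=limit):
        post_id = key.rsplit(":", 1)[1]
        db.increment_post_views(post_id, count)
        # Views recorded between the scan and this delete are dropped; an
        # atomic read-and-delete (for example Redis GETDEL) would close that gap.
        cache.delete(key)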
This reduces database write pressure but creates a loss window: if Redis loses the buffer before flush, those views disappear. This is acceptable for views and unacceptable for payments.
When not to cache
Caching should not be a reflex. Avoid caching when:
- The read is already cheap and low volume.
- The value changes constantly.
- The key would need too many personalization dimensions.
- The response contains sensitive data and the cache layer is shared.
- The miss path is more dangerous than the hit path is useful.
- The team cannot name the invalidation trigger.
The simplest cache is the one that is not added. Sometimes the right fix is an index, a smaller response, pagination, a read model, or a better query. Caching is a pressure valve rather than a substitute for modeling the data correctly.
Metrics that actually help
Hit rate is useful but not sufficient on its own. A more complete set of metrics:
| Metric | Why it matters |
|---|---|
| Hit rate by key family | Shows whether the cache is doing useful work |
| Miss rate by key family | Shows source-of-truth pressure |
| Miss latency | Shows how expensive cold reads are |
| Stampede lock wait time | Shows herd pressure |
| Stale responses served | Shows how often degraded freshness happens |
| Invalidation count | Shows write-driven churn |
| Evictions | Shows memory pressure or bad sizing |
| Hot key distribution | Shows whether one key dominates |
| Serialized payload size | Shows network and memory cost |
| Source-of-truth QPS saved | Shows the actual value of the cache |
Hit rate can be misleading on its own: a cache with 99 percent hit rate can still cause problems if the 1 percent misses all happen on one hot path with expensive queries.
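Tracking hits and misses per key family, rather than globally, is what makes the problem above visible. A minimal sketch, assuming the family is the part of the key before the first colon; `key_family` and `CacheStats` are hypothetical names.

```python
from collections import Counter


def key_family(key: str) -> str:
    """Assume the family is everything before the first ':' (e.g. 'profile')."""
    return key.split(":", 1)[0]


class CacheStats:
    def __init__(self):
        self.hits = Counter()
        self.misses = Counter()

    def record(self, key: str, hit: bool) -> None:
        family = key_family(key)
        (self.hits if hit else self.misses)[family] += 1

    def hit_rate(self, family: str) -> float:
        total = self.hits[family] + self.misses[family]
        return self.hits[family] / total if total else 0.0


stats = CacheStats()
for _ in range(99):
    stats.record("profile:v2:user:1", hit=True)
stats.record("profile:v2:user:1", hit=False)
stats.record("report:v1:tenant:88", hit=False)

for family in ("profile", "report"):
    print(family, stats.hit_rate(family), stats.misses[family])
```

A blended hit rate over both families would look healthy, while the per-family view shows that `report` never hits at all.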
What I think
The practical sequence is:
- Fix the obvious query or data model first.
- Name the stale promise.
- Choose the highest safe cache layer.
- Design the key like a schema.
- Add TTL with jitter.
- Invalidate on writes that break the promise.
- Protect hot misses with single-flight or stale-while-revalidate.
- Measure misses, not just hits.
- Keep a repair path back to the source of truth.
It is reasonable to keep a cache simple only when the contract is small and explicit:
This value may be 60 seconds stale.
This write invalidates it.
This key includes tenant and viewer.
One caller rebuilds it.
Others get stale data.
That is a design. Without something like it, the cache is operating on assumptions rather than guarantees.
Tutorial checklist
For any cache, fill this out:
| Question | Example answer |
|---|---|
| Cached value | User profile card |
| Source of truth | users and profile_settings tables |
| Cache layer | Redis plus 2-second app memory for hot users |
| Key | profile:v2:tenant:{tenant_id}:user:{user_id}:viewer:{viewer_id}:locale:{locale} |
| Freshness promise | Up to 60 seconds stale, except self-view after edit |
| TTL | 300 seconds with 15 percent jitter |
| Write invalidation | profile.updated event deletes profile and feed header keys |
| Stampede defense | Single-flight Redis lock and stale fallback |
| Negative cache | Missing users cached for 30 seconds |
| Sensitive fields | Email hidden unless viewer has permission, included in key by viewer |
| Metrics | hit rate, miss latency, stale served, lock waits, hot keys |
| Repair path | Bump namespace version or flush key family |
Then ask:
If this cache is empty during peak traffic, what breaks first?
If the answer is "the database," the cache is not just an optimization; it is part of capacity planning.
Summary
- Caching is an agreement about acceptable staleness.
- A cache key is a data model; include every dimension that changes the answer.
- TTL is a safety fuse, not a complete invalidation strategy.
- Cache-aside is a good default, but hot misses need stampede protection.
- Serve stale data intentionally when it is better than taking down the source of truth.
- Invalidation belongs in the write path.
- Delete or version cached values on writes unless updating every affected key is truly simple.
- CDN and browser caching depend on precise Cache-Control and Vary headers.
- Do not cache sensitive or highly personalized data without a precise isolation key.
- Measure misses, hot keys, stale responses, lock waits, evictions, and source-of-truth load saved.