Back to blog

Databases don't scale, data models do

The database is rarely the first bottleneck. The schema, indexes, and read model decide how much work every request creates.

A slow database usually looks like a database problem first.

CPU is high. Queries queue. The app times out. Someone suggests a larger instance, read replicas, sharding, or a different database.

Sometimes that is correct. More often, the data model is forcing the database to do too much work per request.

This post is about the cheaper path: model the data around the access pattern before changing the storage engine.

Normalized and denormalized query pathsAnimated split-screen comparison between a normalized schema with join paths and a denormalized feed cache with a direct index scan.NormalizedDenormalizeduserspostscommentslikesfeed_cacheQuery time450ms890ms1200msQuery time12ms15ms18ms

Start with the read path

A schema is not good or bad in isolation. It is good or bad for a workload.

Before adding tables or indexes, write down the read path:

  1. What endpoint or job is reading?
  2. Which columns does it filter by?
  3. Which column decides sort order?
  4. How many rows can match before LIMIT applies?
  5. Does the read need fresh data, or is stale data acceptable?
  6. Is this path hit 10 times per minute, or 10,000 times per second?

The answer changes the model.

An admin report can join six tables and run for two seconds. A feed endpoint cannot. A nightly billing job can scan a large table. A signup request cannot wait on the same scan.

The normalized version

Start with a normal schema:

CREATE TABLE users (
  id bigint PRIMARY KEY,
  display_name text NOT NULL
);

CREATE TABLE posts (
  id bigint PRIMARY KEY,
  user_id bigint NOT NULL REFERENCES users (id),
  title text NOT NULL,
  created_at timestamptz NOT NULL
);

CREATE TABLE comments (
  id bigint PRIMARY KEY,
  post_id bigint NOT NULL REFERENCES posts (id),
  author_id bigint NOT NULL REFERENCES users (id),
  content text NOT NULL,
  created_at timestamptz NOT NULL
);

This is clean. Each fact lives in one place.

Now add the read:

SELECT
  u.display_name AS post_author,
  p.title AS post_title,
  c.id AS comment_id,
  c.content,
  c.created_at
FROM users u
JOIN posts p ON p.user_id = u.id
JOIN comments c ON c.post_id = p.id
WHERE u.id = $1
ORDER BY c.created_at DESC
LIMIT 100;

The product question is simple: "Show the latest comments on this user's posts."

The database question is bigger:

  1. Find the user's posts.
  2. Find comments for those posts.
  3. Order comments across all matching posts.
  4. Return the newest 100 rows.
  5. Fetch display data from related tables.

LIMIT 100 does not make this automatically cheap. If the user has 50,000 posts and 80M comments across those posts (Most prolific user in history), the planner still needs a path to find the newest 100 without doing work proportional to the full match set.

Indexes help:

CREATE INDEX posts_user_id_idx
  ON posts (user_id, id);

CREATE INDEX comments_post_created_idx
  ON comments (post_id, created_at DESC);

This is better than a full scan. It is still not the same as one direct lookup. The query shape requires a join boundary between posts and comments, then an ordering problem across many post_id values.

What I notice

The expensive part is rarely the join keyword itself.

The expensive part is the amount of candidate data created before the database can return the first page. A join over 100 rows is fine. A join that creates 4M candidate rows and sorts them to return 100 rows is not fine.

This is where teams often mislabel the problem.

"Postgres is slow" is usually too broad.

"This endpoint needs latest comments by user, but comments are keyed by post" is specific enough to fix.

Build the read model

For a hot feed path, create a read-optimized table:

CREATE TABLE comment_feed (
  user_id bigint NOT NULL,
  created_at timestamptz NOT NULL,
  comment_id bigint NOT NULL,
  post_id bigint NOT NULL,
  post_title text NOT NULL,
  post_author_name text NOT NULL,
  comment_author_id bigint NOT NULL,
  comment_author_name text NOT NULL,
  content text NOT NULL,
  PRIMARY KEY (user_id, created_at, comment_id)
);

CREATE INDEX comment_feed_user_recent_idx
  ON comment_feed (user_id, created_at DESC, comment_id);

The read becomes:

SELECT
  post_title,
  post_author_name,
  comment_author_name,
  content,
  created_at
FROM comment_feed
WHERE user_id = $1
ORDER BY created_at DESC
LIMIT 100;

Now the database can walk one index range in the same order the endpoint needs.

That is the main idea. Not "avoid joins always." Not "duplicate everything." The idea is to make the common read path proportional to the page size, not proportional to the historical data behind the page.

The cost of denormalization

Denormalization moves work from read time to write time.

When a comment is created, more than one thing may happen:

BEGIN;

INSERT INTO comments (
  id,
  post_id,
  author_id,
  content,
  created_at
) VALUES (
  $1,
  $2,
  $3,
  $4,
  now()
);

INSERT INTO comment_feed (
  user_id,
  created_at,
  comment_id,
  post_id,
  post_title,
  post_author_name,
  comment_author_id,
  comment_author_name,
  content
)
SELECT
  p.user_id,
  now(),
  $1,
  p.id,
  p.title,
  post_author.display_name,
  comment_author.id,
  comment_author.display_name,
  $4
FROM posts p
JOIN users post_author ON post_author.id = p.user_id
JOIN users comment_author ON comment_author.id = $3
WHERE p.id = $2;

COMMIT;

One logical write now creates multiple physical writes. That is write amplification.

It is acceptable when:

  1. Reads dominate writes.
  2. The read path has strict latency needs.
  3. The duplicated fields are small and stable enough.
  4. You have a repair path for rebuilding the derived table.

It is risky when:

  1. The duplicated fields change constantly.
  2. Writes are already the bottleneck.
  3. The read model needs to update thousands of rows synchronously.
  4. No one owns backfills and consistency checks.

Optimize for reads, batch writes

In many product systems, read paths outnumber write paths. That does not mean writes are unimportant. It means the write path can often absorb extra work asynchronously.

For example:

comment.created
  -> insert canonical comment
  -> enqueue feed projection update
  -> enqueue notification update
  -> enqueue search indexing

The request path can commit the canonical row and enqueue durable work. Workers can update read models in batches.

This creates an explicit product decision:

  1. If the user must see the new data immediately, update the read model synchronously.
  2. If a delay of a few seconds is acceptable, update it asynchronously.
  3. If the read model is only for analytics, update it in larger batches.

Freshness is part of schema design. Treat it as a requirement, not an implementation detail.

Indexes are data models too

An index is a stored projection of the table. It has a shape. It has maintenance cost. It is not a general speed switch.

Use the query to choose the index order.

For this query:

SELECT *
FROM comment_feed
WHERE user_id = $1
ORDER BY created_at DESC
LIMIT 100;

This index matches the access pattern:

CREATE INDEX comment_feed_user_recent_idx
  ON comment_feed (user_id, created_at DESC);

This one looks similar but is weaker for the same query:

CREATE INDEX comment_feed_recent_user_idx
  ON comment_feed (created_at DESC, user_id);

The first index groups one user's rows together, already sorted by time. The second index starts with time across all users. The database may still need to skip over many rows from other users.

Column order matters.

B-tree, hash, covering, partial

Most everyday Postgres indexes are B-tree indexes. They support equality, range scans, and ordered reads. That makes them the default choice for feed queries, lookup pages, account lists, and most filters.

Hash indexes are narrower. They are for equality lookup only. They do not support order. If you need ORDER BY, a hash index is the wrong shape.

Covering indexes use INCLUDE columns to avoid extra table fetches for small, commonly returned fields:

CREATE INDEX comment_feed_covering_idx
  ON comment_feed (user_id, created_at DESC)
  INCLUDE (comment_id, post_id, post_title, comment_author_name);

Do not put large text blobs into covering indexes by default. You are copying data into the index. That can increase storage, cache pressure, and write cost.

Partial indexes index only the rows a query actually needs:

CREATE INDEX comments_visible_recent_idx
  ON comments (post_id, created_at DESC)
  WHERE deleted_at IS NULL;

This is useful when most queries ignore deleted, archived, private, or expired rows.

The wrong index can be worse than no index because every index adds write cost. Each insert, update, and delete has more structures to maintain. Too many unused indexes slow writes and make maintenance heavier.

When to add a column vs. a table

Add a column when:

  1. The value is one-to-one with the row.
  2. It is read with the row most of the time.
  3. It changes at the same lifecycle as the row.
  4. It has bounded size.

Create another table when:

  1. The relationship is one-to-many.
  2. The data has a different lifecycle.
  3. The data changes at a much higher rate.
  4. The data is optional, sparse, or large.

Example:

-- Usually fine: one post has one current moderation state.
ALTER TABLE posts
ADD COLUMN moderation_state text NOT NULL DEFAULT 'pending';

-- Better as a table: one post has many moderation events.
CREATE TABLE post_moderation_events (
  id bigint PRIMARY KEY,
  post_id bigint NOT NULL REFERENCES posts (id),
  actor_id bigint NOT NULL REFERENCES users (id),
  action text NOT NULL,
  created_at timestamptz NOT NULL
);

The decision is not about purity. It is about read cost, write cost, and lifecycle.

Partitioning

Partitioning splits one logical table into smaller physical pieces.

Horizontal partitioning splits by rows:

comments_2026_05
comments_2026_06
comments_2026_07

Vertical partitioning splits by columns:

posts
post_bodies
post_embeddings

Horizontal partitioning helps when queries can exclude partitions. A query for May 2026 comments does not need to inspect June 2026 partitions.

Vertical partitioning helps when a small hot row carries large cold fields. A list page may need post_id, title, and created_at, but not a 40 KB body or a 1,536-dimension embedding.

Partitioning helps only if the query includes the partition key.

-- Good partition fit: includes created_at.
SELECT *
FROM comments
WHERE created_at >= '2026-05-01'
  AND created_at < '2026-06-01';

-- Poor partition fit: no partition key.
SELECT *
FROM comments
WHERE author_id = $1;

If the table is partitioned by month, the second query may still touch many partitions.

Hot partitions

Partitioning can create a new bottleneck.

If you partition writes by created_at, almost every new row goes into the current partition. Historical partitions are quiet. The current partition is hot.

If you partition by tenant_id, one large tenant can dominate one partition.

If you hash by user_id, writes spread more evenly, but range queries by date become harder.

There is no free partition key. Pick the key that matches the dominant access pattern and the dominant failure mode.

Sharding decision point

Sharding is partitioning across database servers. It is operationally expensive, so delay it until the numbers justify it.

Consider sharding when:

  1. One primary cannot sustain the write rate.
  2. One database cannot hold the working set.
  3. Backups, restores, vacuum, or migrations no longer fit operational windows.
  4. Most critical queries include a shard key.
  5. Cross-shard transactions are rare or can be redesigned.

Avoid sharding when:

  1. The slow query is missing an index.
  2. The app has N+1 queries.
  3. One hot customer or hot key is the real issue.
  4. The product needs frequent global ordering, global uniqueness, or cross-tenant reports.

Sharding does not remove data modeling work. It makes bad data models harder to fix.

What I think

The practical sequence is:

  1. Measure the slow path.
  2. Write the query shape in plain language.
  3. Add the obvious index.
  4. Check the plan with EXPLAIN (ANALYZE, BUFFERS).
  5. If the endpoint is still doing work proportional to history, build a read model.
  6. If write amplification becomes the bottleneck, batch or queue derived updates.
  7. Partition for operations and query exclusion.
  8. Shard only when one database is no longer a reasonable unit.

This is why "database scaling" is often a misleading phrase. The database executes the work your model asks for. Change the work first.

Tutorial checklist

For any hot read path, fill this out:

QuestionExample answer
Product readLatest comments on a user's posts
Filteruser_id = $1
Sortcreated_at DESC
Page size100 rows
Candidate setUp to tens of millions for large users
FreshnessA few seconds stale is acceptable
Source of truthcomments, posts, users
Read modelcomment_feed
Primary index(user_id, created_at DESC)
Repair pathRebuild feed rows from canonical comments

Then choose the lowest-complexity design that makes the common read proportional to the page size.

Summary

  1. Schema design is where scale begins.
  2. Normalization is good for source-of-truth data, but hot reads often need derived models.
  3. Denormalization trades read speed for write amplification and consistency work.
  4. Index order should match filter columns, sort columns, and page limits.
  5. Partial and covering indexes help when they match specific query shapes.
  6. Partitioning helps when queries include the partition key.
  7. Sharding is a last step, not a first fix.

Pop quiz

Interactive quiz

Data modeling for scale

A randomized review of schema, index, denormalization, and partitioning decisions from this post.

4of 10 questions
Question 1 of 425%
What should you write down before changing the schema for a hot endpoint?