May 12, 2026

Scale lives in the data model

The schema, indexes, and read model quietly decide how much work each request creates. Shape the work, and the database follows.

A slow database tends to draw all the attention to itself.

CPU is high. Queries queue. The app times out. The first ideas that arrive are usually about the database itself: a larger instance, read replicas, sharding, a different storage engine.

Any of those can be the right move in the right situation. But the move that helps most often sits a little upstream. The data model is asking the database to do a large amount of work for every request, and reshaping the request shrinks the work before it ever reaches the engine.

That is the path this post follows. Model the data around the way it is actually read, and the storage engine gets to carry a lighter load.

Start with the read path

A schema only ever fits a particular workload, and the workload is what reveals the fit. So it helps to begin there, with how the data is read, rather than with the tables themselves.

Before adding tables or indexes, write the read path down:

What endpoint or job is reading?
Which columns does it filter by?
Which column decides sort order?
How many rows can match before LIMIT applies?
Does the read need fresh data, or is stale data acceptable?
Is this path hit 10 times per minute, or 10,000 times per second?

Each answer reshapes the model.

An admin report has room to join six tables and run for two seconds. A feed endpoint has to answer in tens of milliseconds, so it needs a model that fits inside that budget. A nightly billing job has hours to scan a large table. A signup request is answered while the user waits, so it needs a read path sized for that wait. Different budgets, different shapes. None of them wrong.
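
One way to answer the "how often is this path hit" question with numbers rather than guesses is to ask Postgres which statements it runs most. The sketch below assumes the pg_stat_statements extension is installed and preloaded, and that the server is PostgreSQL 13 or newer (older versions name the timing columns differently). Treat it as a starting point, not a full profiling setup.

```sql
-- Assumption: pg_stat_statements is listed in shared_preload_libraries
-- and has been created in this database (CREATE EXTENSION pg_stat_statements).
-- Column names below are the PostgreSQL 13+ names.
SELECT
  calls,                                              -- how often the statement ran
  round(mean_exec_time::numeric, 2)  AS mean_ms,      -- average time per call
  round(total_exec_time::numeric, 2) AS total_ms,     -- calls * mean, the real load
  rows,                                               -- total rows returned
  left(query, 80)                    AS query_start   -- normalized statement text
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```

The sort is on total time because a statement with a modest mean and a very large call count can outweigh a slow report that runs twice a day.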

The normalized version

Start with an ordinary, normalized schema:

CREATE TABLE users (
  id bigint PRIMARY KEY,
  display_name text NOT NULL
);

CREATE TABLE posts (
  id bigint PRIMARY KEY,
  user_id bigint NOT NULL REFERENCES users (id),
  title text NOT NULL,
  created_at timestamptz NOT NULL
);

CREATE TABLE comments (
  id bigint PRIMARY KEY,
  post_id bigint NOT NULL REFERENCES posts (id),
  author_id bigint NOT NULL REFERENCES users (id),
  content text NOT NULL,
  created_at timestamptz NOT NULL
);

This is clean. Each fact lives in exactly one place, which is a good place for facts to live.

Now add the read:

SELECT
  u.display_name AS post_author,
  p.title AS post_title,
  c.id AS comment_id,
  c.content,
  c.created_at
FROM users u
JOIN posts p ON p.user_id = u.id
JOIN comments c ON c.post_id = p.id
WHERE u.id = $1
ORDER BY c.created_at DESC
LIMIT 100;

The product question is simple: "Show the latest comments on this user's posts."

The question the database hears is larger:

Find the user's posts.
Find comments for those posts.
Order comments across all matching posts.
Return the newest 100 rows.
Fetch display data from related tables.

LIMIT 100 does not make this cheap on its own. If the user has 50,000 posts and 80M comments across them, the planner still needs some path to the newest 100 that does not first touch the whole match set.

Indexes help here:

CREATE INDEX posts_user_id_idx
  ON posts (user_id, id);

CREATE INDEX comments_post_created_idx
  ON comments (post_id, created_at DESC);

With these indexes the planner walks a narrow range to find the user's posts, then walks comment ranges keyed by each post. The query still carries a join boundary between posts and comments, and after that an ordering problem across many post_id values. A direct lookup, by contrast, walks a single index range in the order the page already needs.

What I notice

The cost lives in the amount of candidate data created before the database can return the first page.

A join over 100 rows returns at the cost of 100 rows. A join that builds 4M candidate rows has to produce and rank all four million just to surface the first 100, even when the sort itself only keeps the top 100 in memory. Same query, same LIMIT; the work simply scales with the candidate set.
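
A rough way to see the candidate set for one user is to count it, then ask the planner what it does with the page query. This is a sketch against the normalized schema above; the user id 42 is a placeholder. EXPLAIN with ANALYZE actually runs the statement, and the count can be heavy for a large user, so a replica or a staging copy is the kinder place to try it.

```sql
-- How many rows does the join create before LIMIT can apply?
SELECT count(*) AS candidate_rows
FROM posts p
JOIN comments c ON c.post_id = p.id
WHERE p.user_id = 42;          -- placeholder user id

-- What does the planner do to return the newest 100 of them?
-- ANALYZE executes the query; BUFFERS reports pages read.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  u.display_name AS post_author,
  p.title AS post_title,
  c.id AS comment_id,
  c.content,
  c.created_at
FROM users u
JOIN posts p ON p.user_id = u.id
JOIN comments c ON c.post_id = p.id
WHERE u.id = 42                -- same placeholder user id
ORDER BY c.created_at DESC
LIMIT 100;
```

If the row counts on the join and sort nodes are close to candidate_rows rather than to 100, the endpoint is paying for history, not for the page.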

And the words a team reaches for tend to shape the fix it finds.

"Postgres is slow" describes the whole system at once, so the repair stays vague and the conversation goes in circles.

"This endpoint needs latest comments by user, and comments are keyed by post" names the exact shape. Now the repair is a concrete schema change, and the path forward is clear.

Build the read model

For a hot feed path, give the read its own table, shaped for exactly how it is read:

CREATE TABLE comment_feed (
  user_id bigint NOT NULL,
  created_at timestamptz NOT NULL,
  comment_id bigint NOT NULL,
  post_id bigint NOT NULL,
  post_title text NOT NULL,
  post_author_name text NOT NULL,
  comment_author_id bigint NOT NULL,
  comment_author_name text NOT NULL,
  content text NOT NULL,
  PRIMARY KEY (user_id, created_at, comment_id)
);

CREATE INDEX comment_feed_user_recent_idx
  ON comment_feed (user_id, created_at DESC, comment_id);

The read becomes:

SELECT
  post_title,
  post_author_name,
  comment_author_name,
  content,
  created_at
FROM comment_feed
WHERE user_id = $1
ORDER BY created_at DESC
LIMIT 100;

Now the database walks one index range, in the same order the endpoint hands back to the user.

That is the whole idea, really: make the common read proportional to the page size, so the work scales with the page returned rather than with all of history.
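
The same property can hold for later pages. A minimal sketch, assuming the client sends back the created_at and comment_id of the last row it saw: the next page starts just past that row, so page 50 costs about the same as page 1. The prepared statement name and the sample values are hypothetical.

```sql
-- Keyset ("seek") pagination over comment_feed.
-- $1 = user id, $2 and $3 = created_at and comment_id of the last row
-- on the previous page.
PREPARE feed_next_page (bigint, timestamptz, bigint) AS
SELECT
  comment_id,
  post_title,
  post_author_name,
  comment_author_name,
  content,
  created_at
FROM comment_feed
WHERE user_id = $1
  AND (created_at, comment_id) < ($2, $3)    -- row comparison: strictly older
ORDER BY created_at DESC, comment_id DESC    -- comment_id breaks timestamp ties
LIMIT 100;

-- Hypothetical cursor values taken from the last row of the previous page.
EXECUTE feed_next_page(42, '2026-05-12 09:30:00+00', 9001);
```

This ordering is the primary key (user_id, created_at, comment_id) read backward, so the query does not need an extra index to stay proportional to the page.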

The cost of denormalization

Denormalization does not remove work. It moves it from read time to write time, where there is often more room for it.

When a comment is created, several writes now happen together:

BEGIN;

INSERT INTO comments (
  id,
  post_id,
  author_id,
  content,
  created_at
) VALUES (
  $1,
  $2,
  $3,
  $4,
  now()
);

INSERT INTO comment_feed (
  user_id,
  created_at,
  comment_id,
  post_id,
  post_title,
  post_author_name,
  comment_author_id,
  comment_author_name,
  content
)
SELECT
  p.user_id,
  now(),
  $1,
  p.id,
  p.title,
  post_author.display_name,
  comment_author.id,
  comment_author.display_name,
  $4
FROM posts p
JOIN users post_author ON post_author.id = p.user_id
JOIN users comment_author ON comment_author.id = $3
WHERE p.id = $2;

COMMIT;

One logical write becomes several physical writes. That is write amplification, and it is worth naming plainly rather than discovering by surprise later.

This design earns its keep when:

Reads dominate writes.
The read path has strict latency needs.
The duplicated fields are small and stable enough.
You have a repair path for rebuilding the derived table.

And it asks for more care, and closer watching, when:

The duplicated fields change constantly.
Writes already carry the heaviest load.
The read model needs to update thousands of rows synchronously.
Backfills and consistency checks lack an owner.
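
The repair path mentioned above can start small. The sketch below rebuilds the feed rows for one user from the canonical tables and then looks for comments that never reached the feed. It assumes the schema from this post and uses 42 as a placeholder user id; a real rebuild for a large user would likely run in batches.

```sql
-- Rebuild one user's feed from the source of truth.
BEGIN;

DELETE FROM comment_feed
WHERE user_id = 42;              -- placeholder: owner of the posts

INSERT INTO comment_feed (
  user_id, created_at, comment_id, post_id, post_title,
  post_author_name, comment_author_id, comment_author_name, content
)
SELECT
  p.user_id,
  c.created_at,
  c.id,
  p.id,
  p.title,
  post_author.display_name,
  comment_author.id,
  comment_author.display_name,
  c.content
FROM comments c
JOIN posts p ON p.id = c.post_id
JOIN users post_author ON post_author.id = p.user_id
JOIN users comment_author ON comment_author.id = c.author_id
WHERE p.user_id = 42;

COMMIT;

-- Consistency check: canonical comments with no matching feed row.
SELECT c.id AS missing_comment_id
FROM comments c
JOIN posts p ON p.id = c.post_id
WHERE NOT EXISTS (
  SELECT 1
  FROM comment_feed f
  WHERE f.user_id = p.user_id
    AND f.created_at = c.created_at
    AND f.comment_id = c.id
)
LIMIT 100;
```

Because the rebuild reads display names at rebuild time, it also refreshes any names that changed since the feed rows were first written.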

Optimize for reads, batch writes

In many product systems reads arrive far more often than writes, sometimes hundreds of reads for every write. That imbalance is an opening: the write path usually has slack to absorb extra work, and much of that work can happen asynchronously.

For example:

comment.created
  -> insert canonical comment
  -> enqueue feed projection update
  -> enqueue notification update
  -> enqueue search indexing

The request path commits the canonical row and enqueues durable work, then returns. Workers update the read models in batches, on their own time.

This turns freshness into an explicit product decision rather than an accident of implementation:

If the user must see the new data immediately, update the read model synchronously.
If a delay of a few seconds is acceptable, update it asynchronously.
If the read model is only for analytics, update it in larger batches.

Freshness is part of schema design, not separate from it. It is worth deciding on purpose.
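
One way to make the asynchronous option durable without a separate queueing system is a small queue table written in the same transaction as the comment. This is a sketch under assumptions: the table name feed_projection_queue, the batch size of 500, and the sample ids are all hypothetical, and user 42 and post 7 are assumed to exist.

```sql
-- Hypothetical queue table: one row per comment awaiting projection.
CREATE TABLE feed_projection_queue (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  comment_id bigint NOT NULL REFERENCES comments (id),
  enqueued_at timestamptz NOT NULL DEFAULT now()
);

-- Request path: canonical row and queue row commit together, then return.
BEGIN;
INSERT INTO comments (id, post_id, author_id, content, created_at)
VALUES (9001, 7, 42, 'Nice post', now());
INSERT INTO feed_projection_queue (comment_id) VALUES (9001);
COMMIT;

-- Worker: claim up to 500 queued comments and project them in one statement.
-- SKIP LOCKED lets several workers run without waiting on each other.
WITH batch AS (
  DELETE FROM feed_projection_queue
  WHERE id IN (
    SELECT id
    FROM feed_projection_queue
    ORDER BY id
    LIMIT 500
    FOR UPDATE SKIP LOCKED
  )
  RETURNING comment_id
)
INSERT INTO comment_feed (
  user_id, created_at, comment_id, post_id, post_title,
  post_author_name, comment_author_id, comment_author_name, content
)
SELECT
  p.user_id, c.created_at, c.id, p.id, p.title,
  post_author.display_name, comment_author.id,
  comment_author.display_name, c.content
FROM batch b
JOIN comments c ON c.id = b.comment_id
JOIN posts p ON p.id = c.post_id
JOIN users post_author ON post_author.id = p.user_id
JOIN users comment_author ON comment_author.id = c.author_id
ON CONFLICT (user_id, created_at, comment_id) DO NOTHING;  -- safe to retry
```

The delete and the insert are one statement, so a worker that fails midway leaves the queue rows in place for the next attempt.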

Indexes are data models too

An index is itself a small data model: a stored projection of the table. It has a shape, it has a maintenance cost, and it speeds up only the query shapes it was built for. Nothing more, nothing less.

So let the query choose the index order.

For this query:

SELECT *
FROM comment_feed
WHERE user_id = $1
ORDER BY created_at DESC
LIMIT 100;

This index matches the access pattern:

CREATE INDEX comment_feed_user_recent_idx
  ON comment_feed (user_id, created_at DESC);

This index has the same columns in a different order:

CREATE INDEX comment_feed_recent_user_idx
  ON comment_feed (created_at DESC, user_id);

The first index groups one user's rows together, already sorted by time, so the query walks one contiguous range and stops at the page limit. The second orders rows by time across all users, so the query reads down that shared timeline and steps past everyone else's rows along the way. Same columns, very different amount of work.

Column order shapes the work.
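
The difference is easy to see on a throwaway table. The sketch below builds a temporary table with generated rows (the name feed_demo and the data sizes are made up for the demo), tries each index in turn, and prints the plan. Exact plans and timings depend on the Postgres version and settings, so the useful comparison is the rows and buffers each plan reports.

```sql
-- Throwaway data: 200,000 rows spread across 1,000 users.
CREATE TEMP TABLE feed_demo (
  user_id bigint NOT NULL,
  created_at timestamptz NOT NULL,
  comment_id bigint NOT NULL
);

INSERT INTO feed_demo (user_id, created_at, comment_id)
SELECT
  g % 1000,
  timestamptz '2026-01-01 00:00:00+00' + g * interval '1 second',
  g
FROM generate_series(1, 200000) AS g;

-- Filter column first, then sort column.
CREATE INDEX feed_demo_user_recent_idx
  ON feed_demo (user_id, created_at DESC);
ANALYZE feed_demo;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM feed_demo
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 100;

-- Same columns, sort column first.
DROP INDEX feed_demo_user_recent_idx;
CREATE INDEX feed_demo_recent_user_idx
  ON feed_demo (created_at DESC, user_id);

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM feed_demo
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 100;
```

With the second index in place the planner has to either walk the shared timeline past other users' entries or fall back to scanning and sorting, and the buffer counts tend to show it.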

B-tree, hash, covering, partial

Most everyday Postgres indexes are B-tree indexes. They handle equality, range scans, and ordered reads, which is why they are the sensible default for feed queries, lookup pages, account lists, and most filters.

Hash indexes serve equality lookup. They answer "find the row where this column equals this value" directly. When a query needs rows in sorted order, that is a job for a B-tree, since a B-tree keeps its entries ordered and a hash index does not.

Covering indexes use INCLUDE columns to avoid extra table fetches for small, commonly returned fields:

CREATE INDEX comment_feed_covering_idx
  ON comment_feed (user_id, created_at DESC)
  INCLUDE (comment_id, post_id, post_title, comment_author_name);

Keep covering indexes to small, frequently read fields. An INCLUDE column copies its data into the index, so a large text blob placed there grows storage, adds cache pressure, and raises write cost. Small and well-chosen is the whole point.

Partial indexes store only the rows a query actually needs:

CREATE INDEX comments_visible_recent_idx
  ON comments (post_id, created_at DESC)
  WHERE deleted_at IS NULL;

This pays off when most queries quietly ignore deleted, archived, private, or expired rows anyway.
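
Two details are easy to miss here. The comments table defined earlier has no deleted_at column, so the sketch below assumes a nullable soft-delete column is added before the index is created. And the planner only considers a partial index when the query's WHERE clause implies the index predicate. The post id 7 is a placeholder.

```sql
-- Assumption: soft deletes are recorded in a nullable timestamp column.
-- This has to exist before comments_visible_recent_idx can be created.
ALTER TABLE comments
  ADD COLUMN IF NOT EXISTS deleted_at timestamptz;

-- Can use the partial index: the predicate is repeated in the query.
SELECT id, content, created_at
FROM comments
WHERE post_id = 7
  AND deleted_at IS NULL
ORDER BY created_at DESC
LIMIT 50;

-- Cannot use the partial index: nothing tells the planner that
-- deleted rows are excluded, so the index might be missing rows it needs.
SELECT id, content, created_at
FROM comments
WHERE post_id = 7
ORDER BY created_at DESC
LIMIT 50;
```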

It is worth remembering that every index carries a write cost. Each insert adds an entry to every index that covers the new row, and so does each update that cannot take Postgres's heap-only (HOT) path; the entries left behind by updates and deletes are cleaned up later by vacuum. An index that no query ever reads still charges that maintenance on every write. A drawer full of unused indexes slowly becomes slower writes and heavier maintenance. Indexes are easy to add and easy to forget; the occasional review is kind to your future self.
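
A review can start from the statistics Postgres already keeps. This sketch lists indexes that have not been scanned since statistics were last reset, largest first. It is a list of candidates, not a verdict: the counters are per server, so an index that looks idle on the primary may be busy on a replica, and unique or primary key indexes are skipped because they enforce constraints whether or not queries read them.

```sql
-- Indexes with zero scans since the last statistics reset.
SELECT
  s.schemaname,
  s.relname      AS table_name,
  s.indexrelname AS index_name,
  s.idx_scan,
  pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique      -- keep constraint-enforcing indexes
  AND NOT i.indisprimary
ORDER BY pg_relation_size(s.indexrelid) DESC;
```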

Choosing a column or a table

Add a column when:

The value is one-to-one with the row.
It is read with the row most of the time.
It changes at the same lifecycle as the row.
It has bounded size.

Create another table when:

The relationship is one-to-many.
The data has a different lifecycle.
The data changes at a much higher rate.
The data is optional, sparse, or large.

Example:

-- Usually fine: one post has one current moderation state.
ALTER TABLE posts
ADD COLUMN moderation_state text NOT NULL DEFAULT 'pending';

-- Better as a table: one post has many moderation events.
CREATE TABLE post_moderation_events (
  id bigint PRIMARY KEY,
  post_id bigint NOT NULL REFERENCES posts (id),
  actor_id bigint NOT NULL REFERENCES users (id),
  action text NOT NULL,
  created_at timestamptz NOT NULL
);

The decision comes down to three quiet questions: read cost, write cost, and lifecycle.

Partitioning

Partitioning splits one logical table into smaller physical pieces.

Horizontal partitioning splits by rows:

comments_2026_05
comments_2026_06
comments_2026_07

Vertical partitioning splits by columns:

posts
post_bodies
post_embeddings

Horizontal partitioning helps when queries can exclude whole partitions. A query for May 2026 comments has no reason to inspect the June 2026 partitions, so it simply does not.

Vertical partitioning helps when a small hot row drags along large cold fields. A list page may need post_id, title, and created_at, but not a 40 KB body or a 1,536-dimension embedding riding next to them.
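
As a sketch, the vertical split for posts might look like this. The post_bodies columns are assumptions, since the posts table earlier in this post has no body column, and the ids are placeholders.

```sql
-- Hypothetical cold table: one optional, large row per post.
CREATE TABLE post_bodies (
  post_id bigint PRIMARY KEY REFERENCES posts (id) ON DELETE CASCADE,
  body text NOT NULL
);

-- List page: reads only the small hot table.
SELECT id, title, created_at
FROM posts
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20;

-- Detail page: pays for the body only when it is actually shown.
SELECT p.id, p.title, p.created_at, b.body
FROM posts p
JOIN post_bodies b ON b.post_id = p.id
WHERE p.id = 7;
```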

For queries, horizontal partitioning only helps when the query filters on the partition key.

-- Good partition fit: includes created_at.
SELECT *
FROM comments
WHERE created_at >= '2026-05-01'
  AND created_at < '2026-06-01';

-- Poor partition fit: no partition key.
SELECT *
FROM comments
WHERE author_id = $1;

If the table is partitioned by month, the second query has to look in every partition to find its answer.
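
With Postgres declarative partitioning, the monthly layout can be sketched as follows. The parent table is named comments_partitioned here only to avoid clashing with the comments table defined earlier, and the foreign keys are left out to keep the example standalone. Note that the primary key has to include the partition key.

```sql
CREATE TABLE comments_partitioned (
  id bigint NOT NULL,
  post_id bigint NOT NULL,
  author_id bigint NOT NULL,
  content text NOT NULL,
  created_at timestamptz NOT NULL,
  PRIMARY KEY (id, created_at)      -- must contain the partition key
) PARTITION BY RANGE (created_at);

CREATE TABLE comments_2026_05 PARTITION OF comments_partitioned
  FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

CREATE TABLE comments_2026_06 PARTITION OF comments_partitioned
  FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');

CREATE TABLE comments_2026_07 PARTITION OF comments_partitioned
  FOR VALUES FROM ('2026-07-01') TO ('2026-08-01');

-- Filters on created_at: the plan should list only comments_2026_05.
EXPLAIN
SELECT *
FROM comments_partitioned
WHERE created_at >= '2026-05-01'
  AND created_at < '2026-06-01';

-- No partition key in the filter: the plan lists every partition.
EXPLAIN
SELECT *
FROM comments_partitioned
WHERE author_id = 42;
```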

Hot partitions

Partitioning can also quietly create a new bottleneck, so it helps to see it coming.

Partition by created_at, and almost every new row lands in the current partition. The historical partitions go quiet; the current one runs hot.

Partition by tenant_id, and one large tenant can dominate a single partition.

Hash by user_id, and writes spread out more evenly, but range queries by date become harder.

Every partition key carries its own trade-off; there is no key without one. The honest move is to pick the key that matches the dominant access pattern and the failure mode you most want to avoid.
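
The hash option makes the trade-off visible in two EXPLAIN calls. A sketch, assuming a hypothetical comments_by_user table split four ways: a lookup by user_id is pruned to one partition, while a date range has to visit all four.

```sql
CREATE TABLE comments_by_user (
  id bigint NOT NULL,
  user_id bigint NOT NULL,
  content text NOT NULL,
  created_at timestamptz NOT NULL,
  PRIMARY KEY (id, user_id)         -- must contain the partition key
) PARTITION BY HASH (user_id);

CREATE TABLE comments_by_user_p0 PARTITION OF comments_by_user
  FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE comments_by_user_p1 PARTITION OF comments_by_user
  FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE comments_by_user_p2 PARTITION OF comments_by_user
  FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE comments_by_user_p3 PARTITION OF comments_by_user
  FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Equality on the hash key: one partition in the plan.
EXPLAIN
SELECT * FROM comments_by_user
WHERE user_id = 42;

-- Date range only: all four partitions in the plan.
EXPLAIN
SELECT * FROM comments_by_user
WHERE created_at >= '2026-05-01'
  AND created_at < '2026-06-01';
```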

Sharding decision point

Sharding is partitioning across database servers. It is genuinely expensive to operate, so it is worth waiting until the numbers ask for it rather than reaching for it early.

Sharding fits when:

One primary cannot sustain the write rate.
One database cannot hold the working set.
Backups, restores, vacuum, or migrations no longer fit operational windows.
Most critical queries include a shard key.
Cross-shard transactions are rare or can be redesigned.

And these situations call for a smaller fix first, while sharding waits:

The slow query is missing an index, so add the index.
The app has N+1 queries, so collapse them into one.
One hot customer or hot key is the real load, so isolate that key.
The product needs frequent global ordering, global uniqueness, or cross-tenant reports, which a single database serves cleanly.

Sharding does not replace the data modeling work; it carries that work along with it. A clean data model stays the quiet prerequisite that keeps sharding manageable.
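
Of the smaller fixes above, the N+1 case is the quickest to sketch. Instead of one query per post, the application passes the post ids it already has as an array and reads all the comments in one round trip. The statement name and the ids are placeholders.

```sql
-- N+1 shape (what the app does today, once per post):
--   SELECT id, content, created_at FROM comments WHERE post_id = 101;
--   SELECT id, content, created_at FROM comments WHERE post_id = 102;
--   ...

-- Collapsed shape: one statement for the whole page of posts.
PREPARE comments_for_posts (bigint[]) AS
SELECT post_id, id, content, created_at
FROM comments
WHERE post_id = ANY ($1)
ORDER BY post_id, created_at DESC;

EXECUTE comments_for_posts(ARRAY[101, 102, 103]::bigint[]);
```

The application then groups the rows by post_id in memory, which is usually far cheaper than the extra round trips.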

What I think

In practice, the sequence tends to settle into this:

Measure the slow path.
Write the query shape in plain language.
Add the obvious index.
Check the plan with EXPLAIN (ANALYZE, BUFFERS).
If the endpoint is still doing work proportional to history, build a read model.
If write amplification becomes the bottleneck, batch or queue derived updates.
Partition for operations and query exclusion.
Shard once one database has stopped being a reasonable unit.

This is where the phrase "database scaling" quietly points back at the model. The database only executes the work the model asks of it, so scale begins, calmly, by shaping that work.

Tutorial checklist

For any hot read path, it helps to fill this out before changing anything:

Question	Example answer
Product read	Latest comments on a user's posts
Filter	`user_id = $1`
Sort	`created_at DESC`
Page size	100 rows
Candidate set	Up to tens of millions for large users
Freshness	A few seconds stale is acceptable
Source of truth	`comments`, `posts`, `users`
Read model	`comment_feed`
Primary index	`(user_id, created_at DESC)`
Repair path	Rebuild feed rows from canonical comments

Then choose the simplest design that makes the common read proportional to the page size. The simplest one that works is usually the right one.

Summary

Scale is shaped by the data model first, and the storage engine carries that shape.
Normalization keeps source-of-truth data clean; hot reads run on derived models built for the access pattern.
Denormalization buys read speed at the cost of write amplification and consistency work.
Index column order should follow the query: filter columns first, then sort columns, so the scan can stop at the page limit.
Partial and covering indexes help when they match specific query shapes.
Partitioning helps when queries include the partition key.
Sharding is a last step, not a first fix.
