Case Study: Scaling a Web App to Millions of Users¶

From one box to millions of users, in eight stages.

The hook¶

When you launch, one server runs everything. The web app, the database, the background jobs, the static files — all on the same box. It's beautiful. It fits in your head.

Then growth happens, and the bottlenecks reveal themselves in a predictable order: the shared box first (split the web tier from the DB), then the single web server (put a load balancer in front of several), then static assets (push them to a CDN), then hot reads and sessions (move them into a cache), then database reads (add replicas), then writes (shard, eventually).

This is the canonical scaling progression. Every architecture interview asks it because every real growth story walks it. The trick isn't memorizing the stages — it's knowing which one you're in and resisting the next one until the data forces your hand.

The concept¶

Scaling isn't a destination, it's a sequence of forced moves. Each stage is triggered by a specific bottleneck and unlocks the next horizon of growth.

Stage 1 — One box. Web server, database, files, cron jobs all together. Cheapest possible setup. Works fine up to roughly 10K monthly users for most CRUD apps.

Stage 2 — Split web tier from DB. Two boxes. The web server and the database stop fighting each other for RAM and CPU. Same code, double the headroom.

Stage 3 — Load balancer + replicated web tier. Web servers go horizontal. A load balancer fronts them. Now a deploy doesn't take the site down, and a single dead box doesn't either.

Stage 4 — CDN for static assets. Images, CSS, JS, video — push them to the edge. Your origin servers stop serving bytes they shouldn't be serving.

Stage 5 — Cache layer (Redis). Hot reads stop hammering the DB. Sessions move out of the web servers so any box can handle any request.

Stage 6 — Database read replicas. Reads fan out to replicas. The primary handles writes, plus any reads that can't tolerate replica lag. Read-heavy workloads (most of them) gain read capacity roughly in proportion to the number of replicas.

Stage 7 — Sharding. When the write primary itself can't keep up, the data gets split across multiple primaries by some key — user ID, tenant ID, region. Joins get awkward. You earned this.

Stage 8 — Microservices and multi-region. The monolith breaks into services owned by different teams. Traffic gets routed across regions for latency and failover. You now have an infra team.

Each stage is forced by a bottleneck the previous stage exposes. Skipping ahead is how startups burn runway.
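
Stage 7's "split across multiple primaries by some key" can be sketched as a hash-based router. This is a minimal sketch, not a production design: the shard names and the `shard_for` helper are hypothetical, and it assumes the shard key is a user ID. It uses `hashlib` rather than Python's built-in `hash()` so the mapping stays stable across processes.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to connection strings.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(user_id: int, shards: list[str] = SHARDS) -> str:
    """Pick a shard from a stable hash of the shard key."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer, then take it modulo the shard count.
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]
```

Plain modulo routing remaps most keys when the shard count changes, which is one reason sharding arrives with rebalancing tooling attached. Every query that lacks the shard key has to ask every shard, which is where the awkward joins come from.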

Diagram¶

flowchart TB
    subgraph Stage1[Stage 1 — One box]
        S1[Web + DB + files]
    end

    subgraph Stage4[Stage 4 — Web tier scaled]
        U1[Users] --> CDN1[CDN]
        U1 --> LB1[Load Balancer]
        LB1 --> W1[Web 1]
        LB1 --> W2[Web 2]
        LB1 --> W3[Web 3]
        W1 --> DB1[(Primary DB)]
        W2 --> DB1
        W3 --> DB1
    end

    subgraph Stage8[Stage 8 — Multi-region, sharded, microservices]
        U2[Users] --> CDN2[CDN edge]
        CDN2 --> GLB[Global LB / DNS]
        GLB --> R1[Region A]
        GLB --> R2[Region B]
        R1 --> GW[API Gateway]
        GW --> SVC1[User svc]
        GW --> SVC2[Order svc]
        GW --> SVC3[Inventory svc]
        SVC1 --> CACHE[Redis]
        SVC1 --> SH1[(Shard 1)]
        SVC1 --> SH2[(Shard 2)]
        SVC2 --> SH3[(Orders DB)]
        SVC3 --> SH4[(Inventory DB)]
    end

    Stage1 --> Stage4 --> Stage8

The arrows between subgraphs aren't network paths — they're the timeline. You don't deploy stage 8 on day one. You arrive there one forced move at a time.
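
The `LB1` fan-out in the Stage 4 subgraph can be sketched as a round-robin picker that skips dead boxes. This is an illustrative sketch only: the `RoundRobinBalancer` class and the server names are hypothetical, and a real load balancer (a managed one, or nginx or HAProxy) also handles health probes, connection draining, and TLS.

```python
import itertools


class RoundRobinBalancer:
    """Rotate through servers, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call so a fully dead pool fails fast.
        for _ in range(len(self.servers)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers")
```

Marking a box down before a deploy and back up afterwards is the mechanism behind "a deploy doesn't take the site down."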

The startup at each milestone¶

1K users — one Heroku dyno fits fine.

You're shipping features, not infra. Postgres add-on, one web dyno, maybe a worker. The whole thing costs $50/month. The bottleneck at this stage is you, not the architecture. Every minute spent on Kubernetes is a minute not spent on the product.

What broke first: nothing yet. What fixed it: shipping more.

100K users — web tier split, Postgres read replicas.

Page loads got sluggish during peak hours. The database was eating most of the CPU and RAM on the shared box, leaving the Rails workers starved and swapping. You moved the DB to its own host, then added a read replica because dashboards and feed reads were dominating traffic. The app code learned the difference between Model.read and Model.write.

What broke first: DB and web fighting for the same CPU. What fixed it: separation, then a read replica.
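
The read/write split the app code had to learn can be sketched as a small router. This is a hedged sketch in Python rather than the Rails code the story implies: `ReadWriteRouter`, its method names, and the connection labels are hypothetical. It assumes asynchronous replication, so replicas may lag the primary.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def for_write(self):
        return self.primary

    def for_read(self, needs_fresh_data=False):
        # Replicas can lag the primary, so a read that must see a
        # just-committed write goes to the primary instead.
        if needs_fresh_data:
            return self.primary
        return next(self._replicas)
```

Dashboards and feeds would call `for_read()`. A "show me the record I just saved" page would pass `needs_fresh_data=True`.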

1M users — CDN for assets, Redis cache, sharding decisions loom.

Marketing put a 4MB hero image on the homepage. Origin bandwidth bill spiked. You moved static assets to CloudFront. Then session lookups started hitting the DB hard, so sessions and hot user data moved into Redis. Your team started arguing about sharding — you held the line and added more replicas instead. Sharding was still a year out.

What broke first: origin bandwidth, then session DB load. What fixed it: a CDN and a cache, in that order.
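
Moving hot user data into a cache usually means some form of cache-aside. The sketch below shows the read and invalidate paths under stated assumptions: `TTLCache` is an in-memory stand-in for Redis so the example runs without a server, and `get_user`, `update_user`, `load_from_db`, and `save_to_db` are hypothetical names. The 300-second TTL is likewise illustrative.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis: get/set with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (self._clock() + ttl_seconds, value)

    def delete(self, key):
        self._data.pop(key, None)


def get_user(user_id, cache, load_from_db, ttl_seconds=300):
    """Cache-aside read: try the cache, fall back to the DB, then populate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = load_from_db(user_id)
    if user is not None:
        cache.set(key, user, ttl_seconds)
    return user


def update_user(user_id, fields, cache, save_to_db):
    """Write to the DB first, then invalidate so the next read refetches."""
    save_to_db(user_id, fields)
    cache.delete(f"user:{user_id}")
```

The TTL is the safety net: even if an invalidation is missed, stale data ages out on its own.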

10M+ users — microservices, regional failover, full infra team.

The monolith got too big for any one team to own. Deploys were scary. The checkout team wanted to ship independently of the search team. You broke out services along team lines: team boundaries, not theoretical bounded contexts. You added a second region for failover after a single-region outage cost you four hours of revenue. You hired an infra team because someone has to be on call for the shared platform now.

What broke first: deploy coordination across teams, then single-region risk. What fixed it: service boundaries that match team boundaries, plus multi-region.

Mechanics: the stages, side by side¶

Stage	Trigger (signal)	What you add	What you don't add yet	Common over-engineering
1. One box	Launch day	App + DB on one host	Anything else	"Microservices from day one"
2. Split tiers	DB and web fighting for RAM/CPU	DB on its own host	Replicas, cache, LB	Adding Redis before checking whether the split alone helps
3. LB + web cluster	One web box can't take peak load; deploys cause downtime	Load balancer, 2-3 web servers	Read replicas, sharding	Kubernetes for two web servers
4. CDN	Bandwidth bill from origin, slow asset loads for distant users	CloudFront/Fastly/Cloudflare	Multi-region origin	Edge compute before edge caching works
5. Cache (Redis)	Repeated hot reads, session lookup latency	Redis for sessions + hot data	Sharded Redis, write-through	Caching everything; cache invalidation hell
6. Read replicas	Read load saturates DB primary; reports kill production	1-3 read replicas, read/write split in app	Sharding	Six replicas when two would do
7. Sharding	Write primary saturated even with cache; storage limit looming	Shard key, shard router, rebalancing tooling	Cross-shard joins	Sharding before you've maxed vertical Postgres
8. Microservices + multi-region	Team coordination cost > infra cost; single-region risk is unacceptable	Service boundaries, API gateway, regional failover	Service mesh, fancy choreography	30 services for 30 engineers

The pattern: each stage adds one thing and only when the previous stage stops working. The over-engineering column is what kills startups.
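
The "one thing at a time, in order" rule can be written down as a tiny guard. This is a deliberately strict sketch: the `next_move` function is hypothetical, stage names are taken from the table, and deciding whether a trigger has actually fired is left to you and your metrics. Real progressions bend the order sometimes, for example a CDN arriving early on bandwidth-heavy apps.

```python
STAGES = [
    "One box",
    "Split tiers",
    "LB + web cluster",
    "CDN",
    "Cache (Redis)",
    "Read replicas",
    "Sharding",
    "Microservices + multi-region",
]


def next_move(current_stage: int, fired_triggers: set[int]) -> str:
    """Return the next stage only if its own trigger has fired.

    Stages are numbered 1-8 as in the table. Triggers for later stages are
    ignored until the stage directly ahead is forced.
    """
    if not 1 <= current_stage <= len(STAGES):
        raise ValueError(f"unknown stage {current_stage}")
    upcoming = current_stage + 1
    if upcoming <= len(STAGES) and upcoming in fired_triggers:
        return f"Move to stage {upcoming}: {STAGES[upcoming - 1]}"
    return f"Stay on stage {current_stage}: {STAGES[current_stage - 1]}"
```

A team on stage 2 that is worried about sharding gets told to stay put: `next_move(2, {7})` returns the "Stay on stage 2" message, because stage 3's trigger has not fired.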

Concept	Why it matters here
load-balancers	Stage 3. The first piece of infra most teams add when one box stops being enough.
caching-strategies	Stage 5. Redis (or Memcached) is the workhorse of this transition. Write-through, write-back, cache-aside — pick deliberately.
database-sharding	Stage 7. The hard one. Most apps never need it; the ones that do regret waiting.
cdn	Stage 4. Cheapest scaling win in the entire progression. Should arguably move earlier on bandwidth-heavy apps.
case-netflix	What stage 8+ looks like at planet scale. The microservices + multi-region pattern, fully expressed.
case-stack-overflow	The counter-example. Stack Overflow famously stayed on a small number of beefy servers. Vertical scaling, fewer moving parts.
microservices	The architectural shift in stage 8. Worth it when team coordination > service coordination. Not before.
distributed-patterns	Circuit breakers, retries, idempotency keys — the survival kit you need once you're past stage 6.
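
Two items from that survival kit, retries and idempotency keys, only work as a pair. The sketch below shows why, under stated assumptions: `TransientError`, `call_with_retries`, and the toy `PaymentService` are all hypothetical names, and jitter is left out for brevity.

```python
import time
import uuid


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, 503s)."""


def call_with_retries(send, payload, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Retry `send` with exponential backoff, reusing one idempotency key."""
    # One key for all attempts, so the receiver can recognise duplicates.
    key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload, idempotency_key=key)
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 0.1s, 0.2s, 0.4s with the defaults


class PaymentService:
    """Toy receiver that applies each idempotency key at most once."""

    def __init__(self):
        self._results = {}
        self.charges = 0

    def charge(self, payload, idempotency_key):
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges += 1
        result = {"charged": payload["amount"]}
        self._results[idempotency_key] = result
        return result
```

If the first attempt succeeds on the server but the response is lost, the retry carries the same key and the customer is charged once, not twice.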

When (and when not) to scale ahead¶

Scale ahead when:

You have real evidence — load tests with realistic traffic shape, a measured growth curve, a known marketing event that will 10x demand
A specific bottleneck is close and measurable: your current p99 is 1.8s and trending up, and you'll hit timeout territory before next quarter
The cost of being late is concrete — losing the front page of Hacker News at the wrong moment, or a contracted SLA you'll breach
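
"Trending up" is worth turning into a number. The sketch below fits a straight line to weekly p99 samples and projects when it crosses a timeout threshold. It is a rough estimate under stated assumptions: the `weeks_until_threshold` helper is hypothetical, growth is assumed to be linear, and the sample values and 3.0s threshold in the usage note are made up for illustration.

```python
def weeks_until_threshold(p99_samples, threshold_s):
    """Fit a straight line to weekly p99 samples and project the crossing.

    Returns weeks from the latest sample, 0.0 if already over the threshold,
    or None if latency is flat or improving.
    """
    n = len(p99_samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if p99_samples[-1] >= threshold_s:
        return 0.0
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(p99_samples) / n
    # Ordinary least-squares slope, in seconds of p99 per week.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, p99_samples)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_week = (threshold_s - intercept) / slope
    return max(0.0, crossing_week - (n - 1))
```

With weekly samples of `[1.2, 1.4, 1.6, 1.8]` and a 3.0s threshold, the slope is 0.2s per week and the projection lands about six weeks out. That is evidence. "It feels slow" is not.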

Don't scale ahead when:

You're guessing — "we might need to handle 10M users someday" is not evidence
You're copying someone else's architecture — Netflix's stack is right for Netflix, not for your 5K-user SaaS
The product hasn't found its shape yet — premature infrastructure freezes a design you should still be free to change
You're bored — the most expensive scaling work in the industry is engineers solving problems they haven't earned

Premature scaling kills startups by burning the runway you need to actually find the customers who would have created the scaling problem. Walk the stages. Let bottlenecks force the moves.

Key takeaway¶

Scale when bottlenecks force you, not when blog posts suggest you should.
Order matters. Split tiers, then LB, then CDN, then cache, then replicas, then shard. Skipping creates rework.
Each stage adds one thing. The over-engineering at every stage is doing two stages at once.
Vertical scaling is underrated. A bigger box buys you a year. Use that year to ship product, not infra.
Microservices are a team-size problem, not a scale problem. Break out services when team coordination hurts more than service coordination would.
