# Cloud Networking: VPC, Subnets, Security Groups

Your cloud workload lives in a VPC. Understanding that is non-negotiable.

## The hook

Every EC2 instance, every RDS database and every container task you run on AWS lives inside a Virtual Private Cloud. Lambda functions join one as soon as they need to reach private resources. Get the networking wrong and your services can't talk to each other. Get it really wrong and they can talk to the entire internet.

The networking concepts you skipped in your CCNA class come back as VPCs, subnets, and security groups. CIDR blocks, route tables, firewall rules — same fundamentals, new wrappers. The good news: cloud networking is software-defined, so you build it with config, not cables.

## The concept

A VPC is your isolated slice of the cloud provider's network. You pick the IP range. You decide what's reachable from the internet and what isn't. Five primitives do most of the work.

| Primitive | What it is |
|---|---|
| VPC | An isolated network in one region. You assign a CIDR block (e.g., `10.0.0.0/16` = 65,536 addresses). |
| Subnet | A chunk of the VPC tied to one Availability Zone. Public subnets route to an internet gateway; private subnets don't. |
| Route Table | Rules that say "traffic for this destination goes that way" (e.g., `0.0.0.0/0` → IGW). |
| Security Group | A stateful firewall at the instance level. Inbound is denied unless you allow it; outbound is open by default. |
| NACL | A stateless firewall at the subnet level. You'll rarely customize these; start with security groups. |

The mental model: the VPC is the building, subnets are the floors, the route table is the elevator system, security groups are the keycards on each office door.

## Diagram

```mermaid
flowchart TB
    Internet((Internet)) --> IGW[Internet Gateway]

    subgraph VPC["VPC 10.0.0.0/16 (us-east-1)"]
        subgraph AZ1["AZ us-east-1a"]
            PubA[Public Subnet<br/>10.0.1.0/24]
            PrivA[Private Subnet<br/>10.0.10.0/24]
        end
        subgraph AZ2["AZ us-east-1b"]
            PubB[Public Subnet<br/>10.0.2.0/24]
            PrivB[Private Subnet<br/>10.0.11.0/24]
        end
        ALB[Application<br/>Load Balancer]
        NAT[NAT Gateway]
        App1[App EC2]
        App2[App EC2]
        DB[(RDS Postgres)]

        PubA --- ALB
        PubB --- ALB
        PubA --- NAT
        PrivA --- App1
        PrivB --- App2
        ALB --> App1
        ALB --> App2
        App1 --> DB
        App2 --> DB
        App1 -.outbound.-> NAT
        App2 -.outbound.-> NAT
    end

    IGW --> VPC
```

Public subnets host the load balancer and NAT gateway — anything that needs to face the internet. Private subnets host the app servers and the database. Outbound calls (package updates, third-party APIs) hop through the NAT gateway. Nothing on the internet can dial the app or DB directly.

## Example: a production web app on AWS

This is the layout you'll see in most production AWS accounts. Walk through it once, and you'll recognize most VPCs you meet.

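VPC: `10.0.0.0/16` in us-east-1. Plenty of room (65,536 addresses). It's also a very common choice, which is exactly why it can collide with an office, on-prem, or partner network, so check for overlap before you commit.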
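Overlap with networks you'll connect to later is the one thing worth checking before you commit to a VPC range. A minimal sketch using Python's standard `ipaddress` module; the office, on-prem, and partner ranges below are hypothetical placeholders, not anything from this example.

```python
import ipaddress

# Proposed VPC range from the example above.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")

# Hypothetical networks you may need to reach later (VPN, peering,
# Direct Connect). Replace these with your real ranges.
OTHER_NETWORKS = {
    "office-lan": ipaddress.ip_network("192.168.1.0/24"),
    "on-prem-dc": ipaddress.ip_network("10.0.128.0/20"),
    "partner-vpc": ipaddress.ip_network("172.16.0.0/16"),
}


def find_overlaps(vpc, others):
    """Return the names of networks whose address ranges overlap the VPC CIDR."""
    return [name for name, net in others.items() if vpc.overlaps(net)]


print("VPC addresses:", VPC_CIDR.num_addresses)
print("Overlapping networks:", find_overlaps(VPC_CIDR, OTHER_NETWORKS))
```

Here the hypothetical `on-prem-dc` range sits inside `10.0.0.0/16`, so it gets flagged. That's the cue to pick a different VPC range now, while nothing depends on it yet.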

Subnets across two AZs:

| Subnet | CIDR | AZ | Type | Hosts |
|---|---|---|---|---|
| `public-a` | `10.0.1.0/24` | us-east-1a | Public | ALB, NAT Gateway |
| `public-b` | `10.0.2.0/24` | us-east-1b | Public | ALB |
| `app-a` | `10.0.10.0/24` | us-east-1a | Private | EC2 app servers |
| `app-b` | `10.0.11.0/24` | us-east-1b | Private | EC2 app servers |
| `db-a` | `10.0.20.0/24` | us-east-1a | Private | RDS Postgres (primary) |
| `db-b` | `10.0.21.0/24` | us-east-1b | Private | RDS Postgres (standby) |
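You can sanity-check a subnet plan like this one before creating anything. The sketch below uses the standard `ipaddress` module to confirm every subnet sits inside the VPC, that no two overlap, and that the plan spans at least two AZs. It also counts usable addresses, given that AWS reserves five per subnet. `check_plan` is a hypothetical helper, not an AWS API.

```python
import ipaddress
from itertools import combinations

VPC = ipaddress.ip_network("10.0.0.0/16")

# Subnet plan from the table above: name -> (CIDR, Availability Zone).
SUBNETS = {
    "public-a": ("10.0.1.0/24", "us-east-1a"),
    "public-b": ("10.0.2.0/24", "us-east-1b"),
    "app-a": ("10.0.10.0/24", "us-east-1a"),
    "app-b": ("10.0.11.0/24", "us-east-1b"),
    "db-a": ("10.0.20.0/24", "us-east-1a"),
    "db-b": ("10.0.21.0/24", "us-east-1b"),
}

# AWS reserves five addresses in every subnet: the first four and the last one.
RESERVED_PER_SUBNET = 5


def check_plan(vpc, subnets):
    """Validate a subnet plan and return usable IPs per subnet.

    Raises ValueError if a subnet falls outside the VPC, two subnets overlap,
    or the plan uses fewer than two AZs.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, (cidr, _) in subnets.items()}
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            raise ValueError(f"{name} ({net}) is outside the VPC {vpc}")
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            raise ValueError(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
    azs = {az for _, az in subnets.values()}
    if len(azs) < 2:
        raise ValueError(f"only one AZ in use: {azs}")
    return {name: net.num_addresses - RESERVED_PER_SUBNET for name, net in nets.items()}


usable = check_plan(VPC, SUBNETS)
print(usable["app-a"])  # 251 usable addresses in a /24
```

The six `/24`s use 1,536 of the VPC's 65,536 addresses. That leaves plenty of room to add subnets later without re-addressing anything.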

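Two AZs because if us-east-1a goes down (it has, more than once), the ALB keeps sending traffic to healthy app servers in us-east-1b and RDS fails over to the standby. One catch: with a single NAT Gateway in `public-a`, the app servers in us-east-1b lose outbound internet access along with us-east-1a. That's why production setups usually run one NAT Gateway per AZ.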

Security groups — the firewall layer:

```text
sg-alb        inbound:  443  from  0.0.0.0/0
sg-app        inbound:  80   from  sg-alb
sg-db         inbound:  5432 from  sg-app
```

Read those rules out loud. The internet talks to the ALB on 443. The ALB talks to the app servers on 80. The app servers talk to the database on 5432. Nothing else is allowed inbound, anywhere. Even if an attacker learns the database's address, they can't connect from their laptop: the DB security group only admits traffic from instances in `sg-app`, and their IP isn't one of them.
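Here's a toy model of how those three rules decide whether a new inbound connection gets in. It's a simplification, assuming each instance carries a single security group and ignoring protocols, outbound rules, NACLs, and routing. The source IP addresses are made-up examples.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    port: int
    source: str  # a CIDR such as "0.0.0.0/0", or a security group ID such as "sg-alb"


# Inbound rules from the example. Security groups have allow rules only.
SECURITY_GROUPS = {
    "sg-alb": [Rule(443, "0.0.0.0/0")],
    "sg-app": [Rule(80, "sg-alb")],
    "sg-db": [Rule(5432, "sg-app")],
}


def is_allowed(target_sg: str, port: int, src_ip: str, src_sg: Optional[str] = None) -> bool:
    """Return True if any inbound rule on target_sg admits a new connection.

    A security-group source matches only traffic from an instance in that
    group; a CIDR source matches by IP. No matching rule means denied.
    """
    for rule in SECURITY_GROUPS[target_sg]:
        if rule.port != port:
            continue
        if rule.source.startswith("sg-"):
            if src_sg == rule.source:
                return True
        elif ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule.source):
            return True
    return False


print(is_allowed("sg-alb", 443, "203.0.113.7"))           # internet to ALB: True
print(is_allowed("sg-db", 5432, "203.0.113.7"))           # laptop to DB: False
print(is_allowed("sg-db", 5432, "10.0.10.25", "sg-app"))  # app server to DB: True
print(is_allowed("sg-app", 22, "10.0.1.10", "sg-alb"))    # SSH to app server: False
```

Notice the model never checks the reply traffic. Security groups are stateful, so once AWS admits a connection, the responses flow back without a matching rule.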

Route tables:

- Public subnets: `0.0.0.0/0` → Internet Gateway
- Private subnets: `0.0.0.0/0` → NAT Gateway (in `public-a`)
- All subnets: `10.0.0.0/16` → local (AWS adds this route automatically; it's what lets subnets talk to each other)
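When more than one route matches a destination, the most specific one (the longest prefix) wins. That's why traffic between subnets stays `local` even though `0.0.0.0/0` matches every address. Below is a minimal sketch of that lookup using the two route tables above. The target strings are illustrative labels, not real AWS resource IDs.

```python
import ipaddress

# The two route tables from the example. Targets are plain labels, not AWS resource IDs.
PUBLIC_ROUTES = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "internet-gateway",
}
PRIVATE_ROUTES = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "nat-gateway",
}


def next_hop(routes, dest_ip):
    """Return the target of the most specific (longest-prefix) route matching dest_ip."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [ipaddress.ip_network(cidr) for cidr in routes if dest in ipaddress.ip_network(cidr)]
    if not matches:
        return None  # no matching route: the packet is dropped
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]


print(next_hop(PRIVATE_ROUTES, "10.0.20.15"))     # app server to DB subnet: local
print(next_hop(PRIVATE_ROUTES, "198.51.100.20"))  # app server to external API: nat-gateway
print(next_hop(PUBLIC_ROUTES, "198.51.100.20"))   # public subnet to internet: internet-gateway
```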

The result:

- Users hit `https://app.example.com` → DNS resolves to the ALB → ALB load-balances to a healthy app server in either AZ → app server queries RDS → response flows back.
- App servers can run `apt update` and call third-party APIs because the NAT Gateway gives them outbound internet access, while inbound connections from the internet stay blocked.
- The database has no public IP. Period. The only way in is through the chain of security groups.

That's the standard production layout: public subnets host things the internet talks to; private subnets host what shouldn't be reachable from outside; security groups are the in-between firewall.

## Mechanics: VPC building blocks reference

| Block | What it does | When to use | Common gotcha |
|---|---|---|---|
| VPC | Isolated network in one region with a CIDR block | Every workload; even Lambdas live in one if they need to reach private resources | Your CIDR can't overlap with on-prem networks or with VPCs you'll peer with later. Pick wisely on day one. |
| Subnet | A CIDR slice of the VPC bound to one AZ | Two per AZ minimum (public + private), across at least two AZs | A subnet lives in exactly one AZ. "Multi-AZ" means multiple subnets, not one big subnet. |
| Route Table | Tells subnets where to send traffic | Customize when you add an IGW, NAT, peering, or VPN | Forgetting to associate the route table with the subnet, which then silently falls back to the VPC's main route table. A classic "why doesn't this work" moment. |
| Internet Gateway (IGW) | Lets a VPC reach the internet | One per VPC for public-facing workloads | A subnet isn't public until its route table points `0.0.0.0/0` at the IGW, and instances in it still need a public IP. |
| NAT Gateway | Outbound-only internet for private subnets | When private resources need to call out (updates, APIs) but shouldn't be reachable | Costs ~$32/month per gateway in us-east-1, plus per-GB data processing charges; running one per AZ multiplies that. NAT instances are cheaper, but you maintain them. |
| Security Group | Stateful firewall at the instance level | Always; this is your default firewall | Stateful means return traffic is automatically allowed. You don't add an outbound rule for the response. |
| NACL | Stateless firewall at the subnet level | Compliance scenarios, blocking specific IPs broadly | Stateless means you must allow both directions. Easy to misconfigure. Leave defaults unless you have a reason. |
| VPC Endpoint | Private connection to AWS services (S3, DynamoDB, etc.) | When you don't want traffic to S3 leaving the VPC | Saves NAT data charges and keeps traffic off the public internet. Gateway endpoints (S3, DynamoDB) are free; interface endpoints for other services bill by the hour. |
| VPC Peering | Connects two VPCs so they can route to each other | Linking two VPCs, even across accounts or regions, without going over the internet | No transitive routing. A↔B and B↔C does not mean A↔C. Use Transit Gateway for that. |
| Transit Gateway | A hub for connecting many VPCs and on-prem networks | Three or more VPCs, hybrid setups, multi-account orgs | Costs add up fast. Worth it past ~3 VPCs; overkill for two. |
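The no-transitive-routing gotcha in the table reads more easily as a graph. This sketch, with hypothetical VPC names, models peering as direct links only and a Transit Gateway as a hub. Real connectivity also depends on route table entries and security groups on both sides, which it ignores.

```python
# Peering connections are pairwise links between exactly two VPCs.
PEERINGS = {("vpc-a", "vpc-b"), ("vpc-b", "vpc-c")}

# VPCs attached to a single Transit Gateway hub.
TGW_ATTACHMENTS = {"vpc-a", "vpc-b", "vpc-c"}


def can_route_via_peering(src, dst, peerings):
    """Peering carries traffic only between the two VPCs it directly connects;
    it never forwards through a third VPC."""
    return (src, dst) in peerings or (dst, src) in peerings


def can_route_via_tgw(src, dst, attachments):
    """Through a Transit Gateway, any two attached VPCs can reach each other,
    assuming the TGW and VPC route tables send traffic through the hub."""
    return src != dst and src in attachments and dst in attachments


print(can_route_via_peering("vpc-a", "vpc-b", PEERINGS))     # True
print(can_route_via_peering("vpc-a", "vpc-c", PEERINGS))     # False: no transitive hop
print(can_route_via_tgw("vpc-a", "vpc-c", TGW_ATTACHMENTS))  # True
```

To get A↔C over peering, you'd add a third connection. A full mesh of N VPCs needs N*(N-1)/2 peerings (45 for ten VPCs), which is roughly where a hub starts to pay for itself.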

A few rules worth memorizing:

- Security groups are allow-only. There's no "deny" rule. If it's not allowed, it's denied.
- A security group can reference another security group as the source. That's how you say "the app servers can talk to the DB" without hardcoding IPs.
- The default VPC every AWS account ships with is fine for prototypes, sketchy for production. It has public subnets in every AZ by default.
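Those rules are specific to security groups. NACLs from the reference table work differently. Their rules are numbered and evaluated lowest number first, they can explicitly deny, and the first match wins. Anything unmatched hits a final catch-all deny. Because NACLs are stateless, replies need their own rule, typically an allow on an ephemeral port range (1024-65535 here; the right range depends on your clients). A sketch with hypothetical rules that tracks ports only, not protocols or source CIDRs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int
    port_from: int
    port_to: int
    action: str  # "allow" or "deny"


# Hypothetical inbound NACL for a public subnet (protocol and source CIDR omitted).
INBOUND_RULES = [
    NaclRule(120, 1024, 65535, "allow"),  # replies to connections opened from inside the subnet
    NaclRule(100, 443, 443, "allow"),     # HTTPS from clients
    NaclRule(110, 8080, 8080, "deny"),    # explicit deny; checked before rule 120 because 110 < 120
]


def evaluate(rules, port):
    """Check rules in ascending rule-number order; the first match decides.
    If nothing matches, the final catch-all rule denies the traffic."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"


for port in (443, 8080, 50000, 80):
    print(port, evaluate(INBOUND_RULES, port))
```

Port 8080 falls inside rule 120's allow range, but rule 110 is checked first and denies it. A security group has no way to express that, which is the niche NACLs fill.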

## Related concepts

| Concept | What it is | How it relates |
|---|---|---|
| Cloud IAM | Identity and access for cloud APIs | Controls who can change networking. A bad IAM policy lets a junior engineer delete your VPC. |
| Load Balancers | Traffic routing across backends | Internet-facing load balancers live in public subnets and gate traffic into private ones. ALB is the most common front door for a VPC. |
| DNS & Routing (Route 53) | Public DNS plus VPC-internal name resolution | Route 53 maps `app.example.com` to your ALB; the VPC's internal DNS resolves private hostnames between services. |
| Cloud Security: Shared Responsibility | Provider secures the network fabric; you secure how you wire it up | Misconfigured security groups are your fault. AWS gives you the tools, not the policy. |
| Networking Fundamentals (System Design course) | OSI / TCP-IP / how packets actually move | The layer below the cloud abstractions. Knowing the basics makes VPC behavior less mysterious; see Networking Fundamentals. |
| VPN & Direct Connect | Private links from on-prem networks into your VPC | Hybrid cloud connectivity. Site-to-site VPN over the internet, or Direct Connect for a dedicated line. |
| mTLS / Service Mesh | Mutual TLS between services | Application-layer trust on top of network-layer isolation. Belt-and-suspenders for sensitive workloads. |
| PrivateLink | Expose a service into another VPC without peering | How SaaS vendors deliver "private" endpoints into your VPC without giving you their network. |

## When (and when not) to design VPCs deeply

Design it carefully when:

- You're building a production workload that will outlive your patience for redesigns.
- You have compliance requirements (HIPAA, PCI, SOC 2) that demand network isolation and audit trails.
- You need hybrid connectivity to on-prem (VPN, Direct Connect). Bad CIDR choices early bite you forever.
- You're running multi-region failover and need consistent network design across regions.
- You have multiple accounts and need a hub-and-spoke topology with Transit Gateway.

Skip the deep design when:

- You're prototyping. The default VPC is fine. Ship the feature, refactor the network later.
- Your platform abstracts networking: Vercel, Cloudflare Workers, Render, AWS App Runner. They handle the VPC; you ship code.
- You don't need custom routing. If "public load balancer in front, private app and DB behind it" covers you, the standard layout is the answer. Don't invent a new one.
- You're a solo developer on a side project. A custom three-tier VPC for your weekend app is overkill that you'll regret maintaining.

The honest take: most teams should copy the standard two-AZ public/private layout, parameterize it in Terraform, and move on. Network design is the wrong place to be clever.

## Key takeaway

- Public subnets host things the internet talks to (load balancers, NATs); private subnets host what it shouldn't reach (app servers, databases); security groups are the in-between firewall.
- Pick CIDRs carefully on day one. They're hard to change, and they collide with peering, VPNs, and on-prem networks later.
- Security groups are stateful and allow-only. Reference other security groups as sources instead of hardcoding IPs.
- Two AZs minimum for anything you'd be paged about at 3 AM. One AZ is one outage from a bad night.
- Default VPC is fine for prototypes, not production. Custom VPC, public/private split, locked-down security groups: that's the bar.

Quiz available in the SLAM OG app — three questions on public vs private subnets, security group debugging, and when to skip custom VPC design.