Cloud Storage Services (Block, File, Object)

Three storage families, three jobs — pick by access pattern, not by default.

The hook

"Where do I store this?" is one of the first questions in any cloud architecture, and most engineers default to S3 for everything. That works until it doesn't.

Cloud storage actually splits into three families — block, file, object — each optimized for a specific access pattern. Pick the wrong one and you'll pay 10x for half the performance, or worse, ship something that won't work at all (you can't run Postgres on S3).

The good news: the rules for picking are simple once you know the families.

The concept

Cloud providers offer three storage families that map to three access patterns.

Block storage — raw disks attached to a VM. The VM's filesystem reads and writes blocks. Single-host, low-latency. AWS EBS, Azure Managed Disks, GCP Persistent Disk.
File storage — networked filesystem with POSIX semantics, mounted by multiple VMs over NFS or SMB. AWS EFS, Azure Files, GCP Filestore.
Object storage — HTTP API for blobs. Effectively unlimited scale, key-value access, no filesystem semantics. AWS S3, Azure Blob Storage, GCP Cloud Storage.
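The practical difference between "filesystem semantics" and "key-value access" shows up when you change a few bytes. The sketch below is a toy comparison, not any provider's API: `ToyObjectStore` is a hypothetical in-memory stand-in for a bucket, and a temp file stands in for a block or file volume.

```python
import os
import tempfile


class ToyObjectStore:
    """Hypothetical in-memory stand-in for a bucket: whole-object PUT and GET by key."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # A PUT replaces the entire object; there is no partial update.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


def patch_file_in_place(path: str, offset: int, patch: bytes) -> None:
    """Block/file-style access: seek to an offset and overwrite a few bytes."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(patch)


def patch_object(store: ToyObjectStore, key: str, offset: int, patch: bytes) -> None:
    """Object-style access: download everything, modify, upload everything."""
    data = bytearray(store.get(key))
    data[offset:offset + len(patch)] = patch
    store.put(key, bytes(data))


# File semantics: change 5 bytes without touching the rest of the file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"hello world")
    patch_file_in_place(path, 6, b"block")
    with open(path, "rb") as f:
        file_result = f.read()

# Object semantics: the same change costs a full GET plus a full PUT.
store = ToyObjectStore()
store.put("data.bin", b"hello world")
patch_object(store, "data.bin", 6, b"block")
object_result = store.get("data.bin")

print(file_result, object_result)
```

Both paths end with the same bytes, but the object path moved the whole blob twice. That is harmless for a photo and ruinous for a database page.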

Two more tiers sit alongside the big three:

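Archive storage for cold data, priced at roughly one to a few dollars per terabyte per month. Retrieval time depends on the tier: milliseconds for S3 Glacier Instant Retrieval and GCP Coldline / Archive, minutes to hours for S3 Glacier Flexible Retrieval, S3 Glacier Deep Archive, and Azure Archive.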
Instance / ephemeral storage — local SSD on the VM itself. Fastest possible I/O, but the data dies when the VM stops. Good for scratch space, bad for anything you need to keep.

The conceptual deep-dive on object storage lives in the System Design course's Object Storage page. This page is about choosing between the three families.

Diagram

flowchart LR
    subgraph Block["Block storage — one VM"]
        VM1[VM] -->|disk I/O| EBS[(EBS volume)]
    end
    subgraph File["File storage — many VMs"]
        VM2[VM] -->|NFS mount| EFS[(EFS share)]
        VM3[VM] -->|NFS mount| EFS
        VM4[VM] -->|NFS mount| EFS
    end
    subgraph Object["Object storage — anything with HTTP"]
        APP1[App] -->|PUT / GET| S3[(S3 bucket)]
        APP2[Browser] -->|PUT / GET| S3
        CDN[CDN] -->|GET| S3
    end

Three patterns: one VM owns a disk, many VMs share a filesystem, anything on the internet talks to a bucket.

Example — a SaaS app picking storage for three different jobs

A typical SaaS app ends up using all three families in the same architecture. Here's what that looks like for an imaginary photo-sharing platform.

Database files — block storage (EBS)

The Postgres data directory sits on an EBS gp3 volume attached to the DB instance. Why block:

Postgres needs low-latency random I/O. A single millisecond per read matters.
Only one host writes to the disk at a time — block storage's single-attach model is fine.
Snapshots are a one-line API call, and they're incremental.

You wouldn't put this on S3 (wrong API, no filesystem) or EFS (NFS adds latency Postgres hates).
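To see why incremental snapshots are cheap, here is a toy model. It assumes a fixed-size volume represented as a list of blocks; real providers track changed blocks internally, and `changed_blocks` and `restore` are illustrative names, not anyone's API.

```python
from typing import Dict, List, Optional

Block = bytes


def changed_blocks(volume: List[Block], last_state: Optional[List[Block]]) -> Dict[int, Block]:
    """Return only the blocks that differ from the state at the last snapshot."""
    if last_state is None:
        return dict(enumerate(volume))  # first snapshot copies everything
    return {i: b for i, b in enumerate(volume) if b != last_state[i]}


def restore(snapshots: List[Dict[int, Block]], size: int) -> List[Block]:
    """Rebuild the volume by replaying snapshots oldest to newest."""
    volume: List[Block] = [b""] * size
    for snap in snapshots:
        for index, block in snap.items():
            volume[index] = block
    return volume


volume = [b"A", b"B", b"C", b"D"]
snap1 = changed_blocks(volume, None)             # full copy of all blocks
state_at_snap1 = list(volume)

volume[2] = b"X"                                 # the database rewrites one block
snap2 = changed_blocks(volume, state_at_snap1)   # only the rewritten block

restored = restore([snap1, snap2], size=4)
print(len(snap1), len(snap2), restored)
```

The second snapshot stores one block instead of four, yet replaying both rebuilds the full volume.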

User uploads — object storage (S3)

User-uploaded photos go to a bucket — call it my-app-uploads. The app stores a row in Postgres pointing at the S3 key:

photos
-------
id        | uuid
owner_id  | uuid
s3_key    | text   -- e.g. "uploads/abc123.jpg"
created   | timestamp

CloudFront sits in front of the bucket so users stream from an edge node, not from S3 directly. Storage is a few cents per GB per month, and capacity scales to petabytes with no provisioning on your part.

You wouldn't put this on EBS (single-attach, capped size) or EFS (10x the cost, no built-in CDN integration).
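The "row points at a key" pattern can be sketched in a few lines. This is a minimal illustration under loud assumptions: `sqlite3` stands in for Postgres, a plain dict stands in for the my-app-uploads bucket, and `save_photo` / `load_photo` are hypothetical helper names.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

bucket = {}  # hypothetical stand-in for the my-app-uploads bucket

db = sqlite3.connect(":memory:")  # stand-in for Postgres
db.execute(
    """
    CREATE TABLE photos (
        id       TEXT PRIMARY KEY,
        owner_id TEXT NOT NULL,
        s3_key   TEXT NOT NULL,
        created  TEXT NOT NULL
    )
    """
)


def save_photo(owner_id: str, image_bytes: bytes) -> str:
    """Write the bytes to the bucket, then record a pointer row in the database."""
    photo_id = str(uuid.uuid4())
    s3_key = f"uploads/{photo_id}.jpg"
    bucket[s3_key] = image_bytes  # a real app would issue an object-store PUT here
    db.execute(
        "INSERT INTO photos (id, owner_id, s3_key, created) VALUES (?, ?, ?, ?)",
        (photo_id, owner_id, s3_key, datetime.now(timezone.utc).isoformat()),
    )
    db.commit()
    return photo_id


def load_photo(photo_id: str) -> bytes:
    """Look up the key in the database, then fetch the bytes by key."""
    row = db.execute("SELECT s3_key FROM photos WHERE id = ?", (photo_id,)).fetchone()
    if row is None:
        raise KeyError(photo_id)
    return bucket[row[0]]


owner = str(uuid.uuid4())
photo_id = save_photo(owner, b"\xff\xd8fake-jpeg-bytes")
print(load_photo(photo_id))
```

The database never holds the image, only the key. Queries and ownership checks run against small rows, and the bytes stay in the cheap, scalable store.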

Shared CMS content — file storage (EFS)

The marketing site runs on five web servers behind a load balancer, all running WordPress. The wp-content/uploads folder needs to be readable and writable by every server. EFS mounts on all five hosts as a normal NFS filesystem. WordPress doesn't know it's not local disk.

You wouldn't put this on S3 (would require rewriting WordPress to use the S3 API) or EBS (single-attach — only one server could see the files).

The lesson: each job has a natural storage type. "Use S3 for everything" looks fine until you try to host a database on it (you can't) or share a directory across VMs (it's painful and slow).

Mechanics — storage tier comparison

The three big clouds offer the same three families under different names. Here's the cheat sheet.

Family	AWS	Azure	GCP	Typical latency	Max scale	Pick when
Block	EBS	Managed Disks	Persistent Disk	Sub-ms to low single-digit ms, depending on volume type	~64 TB / volume	One VM owns the disk; need low-latency I/O
File	EFS	Azure Files	Filestore	1–10 ms	Petabytes	Multiple VMs share a POSIX filesystem
Object	S3	Blob Storage	Cloud Storage	10–100 ms first byte	Effectively unlimited	Blobs accessed by key, served at scale
Archive	S3 Glacier / Deep Archive	Archive tier	Coldline / Archive	Milliseconds to hours, depending on tier	Effectively unlimited	Compliance, long-term backup
Ephemeral	Instance Store	Temp Disk	Local SSD	<0.1 ms	Tied to VM	Scratch space, caches, anything disposable

All three providers advertise eleven nines of durability on object storage — losing a file is a write-a-postmortem event, not a routine risk.
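A quick back-of-the-envelope calculation shows what eleven nines means in practice. It assumes the advertised figure can be read as an annual per-object loss probability of 1e-11, and the object count is a hypothetical example.

```python
annual_loss_probability = 1e-11    # 1 - 0.99999999999, i.e. eleven nines
objects_stored = 10_000_000        # hypothetical bucket size

expected_losses_per_year = objects_stored * annual_loss_probability
years_per_lost_object = 1 / expected_losses_per_year

print(f"expected objects lost per year: {expected_losses_per_year:.0e}")
print(f"roughly one object every {years_per_lost_object:,.0f} years")
```

Durability says nothing about accidental deletes or overwrites, so versioning and backups still matter.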

Storage classes within object storage

Object storage isn't one price. Each provider has tiers based on how often you read the data:

Class (AWS naming)	Use for	Trade-off
S3 Standard	Hot data, served constantly	Highest storage cost, lowest retrieval cost
S3 Standard-IA (Infrequent Access)	Backups read once a quarter	Cheaper at rest, retrieval fee per GB
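S3 Glacier Instant Retrieval / Flexible Retrieval	Compliance archives, rare retrieval	Cheap at rest; milliseconds to retrieve (Instant) or minutes to hours (Flexible), with per-GB retrieval fees
S3 Glacier Deep Archive	"Never read but legally required"	Roughly a dollar per TB per month, hours to retrieve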

Lifecycle policies move objects between classes automatically: an object uploaded today lands in Standard, drops to IA after 30 days, and drops to Glacier after 180 days. On data that has gone cold, this one setting can cut at-rest storage costs by roughly 80–90%, before retrieval and transition fees.
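The 30-day and 180-day schedule can be written down as a rule in the shape S3's lifecycle configuration uses (`Rules`, `Filter`, `Status`, `Transitions`). This is a sketch: the rule ID and prefix are hypothetical, the prices are illustrative assumptions rather than current list prices, and `storage_class_at` is a helper invented here to show what the rule does.

```python
# Lifecycle rule in the shape S3's lifecycle configuration uses.
# The rule ID and the "uploads/" prefix are hypothetical.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-down-uploads",
            "Filter": {"Prefix": "uploads/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# Illustrative at-rest prices in USD per GB-month. Assumptions only: check current pricing.
PRICE_PER_GB_MONTH = {"STANDARD": 0.023, "STANDARD_IA": 0.0125, "GLACIER": 0.0036}


def storage_class_at(age_days: int, rule: dict) -> str:
    """Return the class an object of this age sits in under the rule."""
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (1, 45, 365):
    cls = storage_class_at(age, rule)
    print(age, cls, PRICE_PER_GB_MONTH[cls])

saving = 1 - PRICE_PER_GB_MONTH["GLACIER"] / PRICE_PER_GB_MONTH["STANDARD"]
print(f"at-rest saving once in GLACIER: {saving:.0%}")
```

The calculation ignores retrieval fees, per-request transition charges, and minimum storage durations, all of which eat into the saving if the data turns out not to be cold.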

Concept	What it is	How it relates
Object Storage (System Design)	Conceptual deep-dive on blob storage, immutability, and the database/CDN pairing	The "why" behind the object storage column on this page. Read it for the mental model.
Cloud Cost Management	The discipline of tracking and trimming cloud spend	Storage looks cheap per GB but egress and cross-region replication are the surprise lines on the bill.
Database Types	Relational, document, key-value, graph	Some databases run on block storage (Postgres, MySQL). Some are managed services that hide their storage entirely (DynamoDB, BigQuery).
Cloud Networking	VPCs, subnets, security groups	Mounting EFS or Azure Files requires VPC + security group setup so VMs can reach the filesystem. Object storage is reachable over the public API endpoint by default.
Backups & Snapshots	Point-in-time copies of disks or databases	Block-storage snapshots are incremental and typically land in provider-managed object storage, but they run only when you trigger or schedule them. Object storage is the natural archive destination for everything else.
CDN	Geographic cache of static assets at edge nodes	Pairs with object storage. Bytes live in S3, get cached at the edge, never round-trip to the bucket on a hot read.
Pre-signed URLs	Time-limited URLs for direct browser uploads to a bucket	How you let users upload straight to S3 without proxying through your app servers.
Encryption at rest	All three providers encrypt by default	Free, on, no excuse to skip it. Bring-your-own-key (KMS, Key Vault) is one toggle further.

The recurring theme: storage is one piece of a system. Pair it with networking, databases, and a CDN.
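The pre-signed URL row deserves a concrete picture. The sketch below shows the general idea of a time-limited, signed URL using a plain HMAC. It is not AWS Signature Version 4 or any provider's real scheme; the host name, query parameter names, and secret are all hypothetical. In practice the provider's SDK generates and verifies these for you.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"server-side-signing-key"  # hypothetical; never shipped to the browser


def _signature(method: str, key: str, expires: int) -> str:
    message = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, key: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build a URL that authorizes one method on one key until it expires."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _signature(method, key, expires)})
    return f"https://storage.example.com/{key}?{query}"


def verify(method: str, url: str, now: Optional[float] = None) -> bool:
    """What the storage service does: check the expiry, then check the signature."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    key = parsed.path.lstrip("/")
    expires = int(params["expires"][0])
    if (time.time() if now is None else now) > expires:
        return False
    return hmac.compare_digest(_signature(method, key, expires), params["signature"][0])


url = presign("PUT", "uploads/abc123.jpg", ttl_seconds=300, now=1_000)
ok_in_window = verify("PUT", url, now=1_100)
ok_after_expiry = verify("PUT", url, now=2_000)
ok_wrong_method = verify("GET", url, now=1_100)
print(ok_in_window, ok_after_expiry, ok_wrong_method)
```

The browser gets a URL that works for one key, one method, and a few minutes. The app server never touches the upload bytes.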

When (and when not) to use each

The decision tree is short.

Pick block storage when:

One VM owns the disk and you need low-latency I/O — databases, OS root volumes, anything filesystem-heavy on a single host.
You want fast snapshots and easy restore.

Pick file storage when:

Multiple VMs need a shared POSIX filesystem — legacy apps that expect a local directory, content sharing across a fleet, build artifacts.
You don't want to rewrite the app to talk to an HTTP API.

Pick object storage when:

You're storing blobs accessed by key — user uploads, backups, logs, ML datasets, static assets.
You need durability, scale, and cheap-per-GB more than you need filesystem semantics.
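The three "pick when" lists above collapse into a very small function. This is one possible encoding, with hypothetical field names, and it also folds in the ephemeral and archive tiers from earlier on the page.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    needs_filesystem: bool              # app expects POSIX paths, not an HTTP API
    shared_across_vms: bool             # more than one VM reads or writes the data
    must_survive_vm_stop: bool = True   # False for scratch space and caches
    rarely_read: bool = False           # e.g. compliance archives


def pick_storage(w: Workload) -> str:
    """Map an access pattern to a storage family."""
    if not w.must_survive_vm_stop:
        return "ephemeral"
    if w.needs_filesystem:
        return "file" if w.shared_across_vms else "block"
    return "archive" if w.rarely_read else "object"


postgres = Workload(needs_filesystem=True, shared_across_vms=False)
wordpress_uploads = Workload(needs_filesystem=True, shared_across_vms=True)
user_photos = Workload(needs_filesystem=False, shared_across_vms=True)

for name, workload in [("postgres", postgres),
                       ("wordpress_uploads", wordpress_uploads),
                       ("user_photos", user_photos)]:
    print(name, "->", pick_storage(workload))
```

Real decisions also weigh cost, throughput, and compliance, so treat this as a first filter rather than a verdict.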

The anti-patterns:

Don't use block for "shared storage." A single EBS volume can't be mounted by two VMs (with rare multi-attach exceptions for clustering). Use file storage.
Don't use file for "I just need to dump some data." EFS is roughly 10x more expensive than S3 per GB. If the access pattern is "write once, read by key," use object.
Don't use object for "I need a database." No transactions, no queries, immutable on write. Use a database. Use object storage to hold the bytes the database points at.
Don't ignore egress. Storing data is cheap. Moving it out — to users, to another region, to another cloud — is where the bill bites. Plan egress before you commit.
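A rough calculation makes the egress point concrete. Both prices below are illustrative assumptions, not quotes from any provider's price list, and the traffic figures are hypothetical.

```python
STORAGE_USD_PER_GB_MONTH = 0.023   # assumed at-rest price
EGRESS_USD_PER_GB = 0.09           # assumed internet egress price


def monthly_cost(stored_gb: float, egress_gb: float) -> dict:
    """Split a month's bill into the at-rest part and the data-transfer part."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    egress = egress_gb * EGRESS_USD_PER_GB
    return {
        "storage": round(storage, 2),
        "egress": round(egress, 2),
        "total": round(storage + egress, 2),
    }


# 1 TB stored, with each byte served to users about three times per month.
bill = monthly_cost(stored_gb=1_000, egress_gb=3_000)
print(bill)
```

Under these assumptions, the transfer line is more than ten times the storage line. That is why a CDN in front of the bucket, and keeping traffic in-region, matter as much as the storage class.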

Key takeaway

Block for one VM, file for many VMs, object for the internet — match storage to access pattern, not to defaults.
S3 isn't a database. Pair it with one. Database holds the metadata, object storage holds the bytes.
Lifecycle policies are close to free money. Move cold data to a colder tier and the at-rest bill drops; check retrieval fees and minimum storage durations first.
Egress is the surprise line. Storage at rest is pennies; getting bytes out costs real money.
Pick your VM disk type with the workload in mind. Block storage has multiple performance tiers (gp3, io2, premium SSD) — defaults are rarely optimal for databases.
