
Object Storage

Cheap, durable, infinite — and the wrong tool for half the things people use it for.

The hook

Your app lets users upload videos. You're sitting at 100TB and growing. You're not putting that in Postgres — your DBA would quit.

Object storage is the default. S3, Google Cloud Storage, Azure Blob, MinIO if you're self-hosting. Cheap per GB, effectively infinite, and durable enough that losing a file is a once-in-a-lifetime story. Nearly every cloud-native app uses it.

The trap: engineers also reach for it as a slow, expensive database. Updating tiny JSON files on every request. Trying to "list all objects where user_id = 42." That's where the system starts hurting.

The concept

Object storage is blob storage with an HTTP API. You PUT bytes at a key. You GET them back. You DELETE when done. That's the whole interface.

No queries. No transactions. No schema. No joins. Just keys mapped to bytes.
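To make the size of that interface concrete, here is a minimal in-memory sketch. `ToyObjectStore` is a hypothetical stand-in written for illustration, not a real client library; a real store speaks HTTP and adds authentication, but the shape of the API is the same three verbs.

```python
class ToyObjectStore:
    """In-memory stand-in for a bucket: keys mapped to bytes, nothing else."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # PUT stores the bytes at the key, replacing whatever was there.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        # GET returns the exact bytes stored; a missing key raises KeyError.
        return self._objects[key]

    def delete(self, key: str) -> None:
        # DELETE removes the key; deleting a missing key is a no-op here.
        self._objects.pop(key, None)


store = ToyObjectStore()
store.put("videos/abc123.mp4", b"\x00\x01fake-video-bytes")
video = store.get("videos/abc123.mp4")
store.delete("videos/abc123.mp4")
```

Notice what is missing: there is no method that takes a filter, a field name, or a join. If you need one, that logic lives somewhere else.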

What you trade away in features, you get back in scale:

  • Durability. S3 advertises 11 nines (99.999999999%). Lose a file, write a postmortem.
  • Cost. S3 standard runs around $0.023/GB/month. Glacier deep archive drops to ~$0.00099/GB/month. Compare to managed Postgres at 10x+.
  • Scale. S3 stores trillions of objects across millions of customers. There is no "running out of room."
  • Consistency. Strongly consistent for read-after-write since 2020. Older docs still say "eventually consistent" — ignore them.
  • Immutability. Objects don't get patched. To "update," you write a new object at the same key and the old bytes get replaced wholesale.

That last point is the one engineers keep forgetting.
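Here is a rough sketch of what forgetting it costs. The `bucket` dict and `increment_view_count` helper are hypothetical stand-ins for a real bucket and a real request handler; the point is the read-modify-write cycle, not the exact numbers.

```python
import json

bucket: dict[str, bytes] = {}  # hypothetical in-memory stand-in for a bucket


def increment_view_count(key: str) -> int:
    """Bump one counter inside a JSON object; returns the bytes written."""
    doc = json.loads(bucket[key])           # 1. GET the whole object
    doc["views"] += 1                       # 2. change one field in memory
    body = json.dumps(doc).encode("utf-8")
    bucket[key] = body                      # 3. PUT the whole object back
    return len(body)


profile = {"user_id": 42, "views": 0, "bio": "x" * 2000}
bucket["profiles/user-42.json"] = json.dumps(profile).encode("utf-8")

# 100 requests, each one changing a single integer.
bytes_written = sum(
    increment_view_count("profiles/user-42.json") for _ in range(100)
)
views = json.loads(bucket["profiles/user-42.json"])["views"]
```

One hundred single-integer updates rewrite more than 200 KB here, and each one is a full GET plus a full PUT. The cycle is also not atomic: without some form of conditional write, two concurrent callers can read the same value and one update gets lost.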

Diagram

flowchart LR
    App[Your App] -->|PUT /videos/abc123.mp4| API[S3 API]
    App -->|GET /videos/abc123.mp4| API
    API --> B[(Bucket: my-bucket)]
    B --> O1[videos/abc123.mp4]
    B --> O2[videos/def456.mp4]
    B --> O3[avatars/user-42.jpg]
    B --> O4[backups/2026-05-08.sql.gz]

Compared to a relational DB:

|                  | Object store                  | Relational DB              |
|------------------|-------------------------------|----------------------------|
| Lookup by        | Key (string)                  | Any indexed column         |
| Query            | None; fetch by key            | SQL                        |
| Item size        | Bytes to terabytes per object | Bytes to megabytes per row |
| Update semantics | Replace whole object          | Patch a field              |
| Cost per GB      | Cents                         | Dollars                    |
| Best for         | Files, blobs, archives        | Queryable state            |

Object storage is a hash table the size of the internet. Don't ask it to be more than that.

Example — Netflix on S3

Netflix runs one of the largest S3 footprints on the planet. The last public number was tens of exabytes; call it "more than you can fit in your head." The architecture is the lesson.

What lives in S3

  • Encoded video. Every title is encoded into dozens of bitrates, resolutions, and codecs. A single movie can produce 1,000+ output files. All of those go to S3.
  • Source masters. The pristine pre-encode files studios deliver.
  • Logs and analytics events. Petabytes of viewing data feeding their data platform.
  • ML training data. Recommendation models train on data sitting in S3.

What does NOT live in S3

  • The "what should I watch next" lookup. That's a recommendation service backed by a real database.
  • User accounts, viewing history, billing. Cassandra and other operational stores.
  • Anything on the request path where you can't tolerate ~100ms latency.

The pairing

A user hits play. Cassandra resolves the user, the title, and the entitlement. A metadata service points at the right encoded file in S3 — by key. That key gets handed to Open Connect, Netflix's purpose-built CDN, which long ago pre-positioned the bytes on appliances inside ISPs. The user streams from a box a few miles away.

S3 is the source of truth. The CDN is the delivery layer. The database is the brain that knows which key to ask for.

That's the pattern: object storage holds the bytes, the database holds the metadata, the CDN delivers it close to the user. Drop any one of those and the system breaks.
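A small sketch of the database-plus-object-store half of that pattern, using the "user_id = 42" question from earlier. Everything here is a hypothetical stand-in: a dict plays the bucket, SQLite plays the metadata database, and the `videos` table and its columns are invented for illustration. The CDN is left out.

```python
import sqlite3

# Hypothetical stand-ins: a dict for the bucket, SQLite for the metadata DB.
bucket: dict[str, bytes] = {
    "videos/abc123.mp4": b"video-bytes-1",
    "videos/def456.mp4": b"video-bytes-2",
}

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE videos (object_key TEXT PRIMARY KEY, user_id INTEGER, title TEXT)"
)
db.executemany(
    "INSERT INTO videos VALUES (?, ?, ?)",
    [
        ("videos/abc123.mp4", 42, "Launch demo"),
        ("videos/def456.mp4", 7, "Team offsite"),
    ],
)


def keys_for_user(user_id: int) -> list[str]:
    # The database answers the question the object store cannot.
    rows = db.execute(
        "SELECT object_key FROM videos WHERE user_id = ? ORDER BY object_key",
        (user_id,),
    )
    return [row[0] for row in rows]


user_keys = keys_for_user(42)
payloads = [bucket[key] for key in user_keys]  # fetch bytes by key only
```

The query runs against the database; the object store only ever sees exact keys. That split is the design choice: small, indexed rows where you need to ask questions, large opaque blobs where you only need to fetch.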

Mechanics — object stores, warehouses, and lakes

People confuse these three constantly. They're not the same thing.

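| Type           | Examples                       | What it is                                | Query model                    |
|----------------|--------------------------------|-------------------------------------------|--------------------------------|
| Object store   | S3, GCS, Azure Blob, MinIO     | Generic blob storage                      | By key only, no search         |
| Data warehouse | Snowflake, BigQuery, Redshift  | Structured analytics on transformed data  | SQL, schema-on-write           |
| Data lake      | S3 + Athena, Iceberg, Delta Lake | Raw data queried in place               | SQL over files, schema-on-read |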

The relationship: a data lake is built on an object store. You dump raw files (JSON, Parquet, CSV) into S3, then layer a query engine on top so you can run SQL without ever moving the bytes. A warehouse, by contrast, ingests data through an ETL pipeline into its own optimized storage — you pay more, you get faster queries.
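To show what "queried in place" and schema-on-read mean, here is a toy version of a lake query. The `bucket` contents, the event fields, and `seconds_watched_by_title` are all made up for illustration; a real engine such as Athena or Spark does the same kind of scan with far more machinery.

```python
import json

# Hypothetical raw event files as they might sit under a prefix in a bucket.
bucket: dict[str, bytes] = {
    "events/2026-05-07.jsonl": (
        b'{"title": "abc123", "seconds": 120}\n'
        b'{"title": "def456", "seconds": 30}\n'
    ),
    "events/2026-05-08.jsonl": (
        b'{"title": "abc123", "seconds": 60, "device": "tv"}\n'
    ),
    "avatars/user-42.jpg": b"\xff\xd8not-json",
}


def seconds_watched_by_title(prefix: str) -> dict[str, int]:
    """Scan every object under the prefix and aggregate at read time."""
    totals: dict[str, int] = {}
    for key in sorted(bucket):
        if not key.startswith(prefix):
            continue  # only files under the prefix are part of this "table"
        for line in bucket[key].decode("utf-8").splitlines():
            event = json.loads(line)  # the schema is applied here, on read
            totals[event["title"]] = totals.get(event["title"], 0) + event["seconds"]
    return totals


totals = seconds_watched_by_title("events/")
```

This is roughly `SELECT title, SUM(seconds) ... GROUP BY title`, run over files that were never loaded anywhere. The second file carries an extra `device` field and nothing breaks, because no schema was enforced at write time. The price is that every query re-reads the raw bytes.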

When to use which:

  • Object store — files that get read as files. Videos, images, backups, ML inputs.
  • Data lake — large volumes of raw or semi-structured data you'll query occasionally. Cost-optimized for storage.
  • Warehouse — clean, structured data you query constantly for dashboards and BI. Cost-optimized for queries.

A common stack uses all three: raw events land in the lake, get transformed into the warehouse, and feed dashboards. The object store is the foundation under everything.

Related concepts

| Concept | What it is | How it relates |
|---------|------------|----------------|
| CDN | Geographic cache of static assets at edge nodes | Object store + CDN is the canonical pattern for serving images and video. Bytes live in S3, get cached at the edge. |
| Database types | Relational, document, key-value, graph | Object store handles blobs. Database handles queryable state. Use both. |
| Backups | Point-in-time snapshots of databases or filesystems | Object storage is the natural destination: durable, cheap, and easy to lifecycle into colder tiers. |
| Replication | Copying data to multiple physical locations | Object stores handle this internally. S3 stores every object across multiple availability zones by default. Cross-region is opt-in. |
| Versioning | Keeping old copies of an object after overwrite or delete | One toggle on the bucket. Saves you from the "wait, who deleted that?" call. |
| Lifecycle policies | Automatic rules to transition or delete objects over time | Move objects from S3 Standard → Infrequent Access → Glacier as they age. Cuts storage cost 90%+ for cold data. |
| Pre-signed URLs | Time-limited URLs that grant temporary read or write access | How you let a browser upload directly to S3 without proxying through your server. |
| Search index | Inverted-index store for full-text or attribute search | Object stores can't search. If you need to find blobs by content, pair S3 with Elasticsearch or OpenSearch. |

The recurring theme: object storage is one piece of a system, not the whole system.
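The lifecycle savings are easy to check with the per-GB prices quoted earlier. This is back-of-the-envelope arithmetic only: it assumes 100 TB of cold data in decimal units and ignores request charges, retrieval fees, and minimum storage durations, all of which apply in practice.

```python
STANDARD_PER_GB_MONTH = 0.023        # S3 Standard, figure quoted earlier
DEEP_ARCHIVE_PER_GB_MONTH = 0.00099  # Glacier Deep Archive, figure quoted earlier


def monthly_cost(gigabytes: float, price_per_gb: float) -> float:
    """Storage-only cost for one month at a flat per-GB price."""
    return gigabytes * price_per_gb


cold_gb = 100_000  # assumed: 100 TB, using 1 TB = 1,000 GB

standard = monthly_cost(cold_gb, STANDARD_PER_GB_MONTH)
deep_archive = monthly_cost(cold_gb, DEEP_ARCHIVE_PER_GB_MONTH)
savings_pct = 100 * (1 - deep_archive / standard)

print(f"Standard:     ${standard:,.2f}/month")
print(f"Deep Archive: ${deep_archive:,.2f}/month")
print(f"Savings:      {savings_pct:.1f}%")
```

At these prices the same 100 TB goes from about $2,300 a month to about $99 a month, a storage saving of roughly 95.7%. The catch is on the read side: the colder the tier, the slower and more expensive it is to get the bytes back.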

When (and when not) to use it

Use object storage for:

  • User uploads — images, videos, documents, anything a user hands you as a file
  • Backups — database dumps, filesystem snapshots, application archives
  • Logs — long-term retention beyond what your log platform keeps hot
  • ML training data — datasets too large to fit anywhere else, read in batches
  • Static assets — images, JS bundles, fonts (paired with a CDN)
  • Data lake foundations — raw event data queried by Athena, Spark, or similar

Skip it for:

  • Anything you need to query. No SELECT, no WHERE, no joins. Use a database.
  • Small, frequently-updated state. Counters, session data, feature flags. Object stores rewrite the whole object on every change — that's slow and expensive. Use Redis or a real DB.
  • Transactional data. No ACID, no row-level locking. Money moves through Postgres, not S3.
  • Searching by content. "Find all images tagged 'sunset'" needs a search index. The object store holds the bytes; a separate index holds the searchable metadata.
  • Latency-critical reads. First-byte latency to S3 is typically tens to low hundreds of milliseconds. Fine for video, terrible for a hot path. Cache it in front, or pick a faster store.

The rule of thumb: if you'd describe the data as a file, S3 is right. If you'd describe it as a record, use a database.

Key takeaway

  • Object storage is for blobs, not state. Files yes, counters no.
  • Pair it with a database for metadata. S3 holds the video. Postgres holds who owns it, who can see it, and when it was uploaded.
  • Objects are immutable. "Updating" means rewriting the whole thing. Build for that or pick a different store.
  • Lifecycle policies are nearly free money. Move cold data to Glacier and watch the storage bill drop 90%+. Just account for retrieval fees and minimum storage durations on the cold tiers.
  • Use pre-signed URLs for direct uploads. Don't proxy 100MB videos through your app server.
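If pre-signed URLs feel like magic, this simplified sketch shows the idea: the server signs "this method, this key, until this time" with a secret, and the storage side checks the signature before accepting the request. This is an illustration only, not the actual S3 signing scheme (S3 uses Signature Version 4, and SDKs generate these URLs for you, for example `generate_presigned_url` in boto3). `SECRET`, `presign`, `is_valid`, and the `storage.example.com` host are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"server-side-secret"  # hypothetical; never shipped to the browser


def _signature(method: str, key: str, expires: int) -> str:
    message = f"{method}\n{key}\n{expires}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, key: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return a URL that authorizes one method on one key until it expires."""
    issued = now if now is not None else time.time()
    expires = int(issued) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(method, key, expires)})
    return f"https://storage.example.com/{key}?{query}"


def is_valid(method: str, url: str, now: Optional[float] = None) -> bool:
    """What the storage side would check before accepting the request."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = now if now is not None else time.time()
    if current > expires:
        return False  # the link has timed out
    expected = _signature(method, parsed.path.lstrip("/"), expires)
    return hmac.compare_digest(sig, expected)


# The app server hands this URL to the browser; the browser PUTs the file directly.
upload_url = presign("PUT", "videos/abc123.mp4", ttl_seconds=900, now=1_000_000)
```

The URL is good for a PUT to that one key for 15 minutes. Change the method, the key, or wait past the expiry and the signature check fails, which is why the app server can hand it to an untrusted browser without sharing credentials.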
