Serverless & FaaS¶

Stop running servers. Bring code, leave operations.

The hook¶

You write a function. You upload it. You never log into a server again.

When a request comes in, the platform runs your code. When no requests come in, it runs nothing — and charges you nothing. Auto-scaling, patching, OS updates, capacity planning: all the provider's problem.

That's the pitch behind AWS Lambda, Google Cloud Functions, Azure Functions, and Cloudflare Workers. Serverless flipped the operating model — you pay per millisecond of execution instead of per hour of idle VM. The catch: cold starts, vendor lock-in, and a hard ceiling on how long any single invocation can run.

The concept¶

Function-as-a-Service (FaaS) is the purest form of serverless. You write a single function with a defined input and output, deploy it, and the provider runs it on demand.

Three things matter:

Trigger — what causes the function to run (HTTP request, queue message, file upload, cron schedule)
Code — your function, usually <100 lines, stateless, single-purpose
Output — return value, side effect (DB write, message published), or both

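The runtime, the OS, the scaling decisions, the patching, the monitoring agents: none of that is yours. The platform scales from zero to thousands of concurrent instances, subject to per-account concurrency quotas (Lambda's default is 1,000 per region, raisable on request). Lambda bills in 1ms increments. When traffic dies, instances die with it.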

The mental shift: you're not running a server that handles requests. You're handing the platform a piece of logic and a list of triggers, and saying run this when those happen.
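Here's what that piece of logic can look like. This is a minimal sketch, assuming an API Gateway REST proxy event in front of a Lambda-style Python handler. The function name `get_user_handler` and the hard-coded user record are hypothetical; a real function would query a database instead.

```python
import json


def get_user_handler(event, context):
    """Lambda-style handler: one input (the event), one output (a response dict)."""
    # With API Gateway's proxy integration, path parameters arrive under this key.
    params = event.get("pathParameters") or {}
    user_id = params.get("id")
    if not user_id:
        return {"statusCode": 400, "body": json.dumps({"error": "missing id"})}

    # Hypothetical lookup: stands in for a DynamoDB read.
    user = {"id": user_id, "name": "example"}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(user),
    }
```

No server loop, no port binding, no process management. The platform calls the function once per request and discards nothing you should care about between calls.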

Diagram¶

flowchart LR
    API[API Gateway] -->|HTTP| F1[Lambda: getUser]
    F1 --> DDB[(DynamoDB)]

    S3[S3 bucket] -->|object created| F2[Lambda: makeThumbnail]
    F2 --> S3o[(S3 thumbnails)]

    SQS[SQS queue] -->|message| F3[Lambda: processOrder]
    F3 --> DDB

    CRON[EventBridge cron] -->|every 5min| F4[Lambda: cleanup]
    F4 --> DDB

Same function shape every time. The trigger source is what changes — HTTP, object event, queue message, schedule. The platform handles the plumbing.
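To make "the trigger source is what changes" concrete, here is a rough sketch that tells the four triggers in the diagram apart by the shape of the event payload. The field names follow the documented AWS event formats (REST proxy events for API Gateway); the function `classify_trigger` is hypothetical, and in practice each function is usually wired to a single trigger rather than sniffing the event.

```python
def classify_trigger(event):
    """Guess which trigger produced a Lambda event, based on its shape."""
    records = event.get("Records")
    if records:
        # S3 and SQS both deliver a "Records" list tagged with an eventSource.
        source = records[0].get("eventSource")
        if source == "aws:s3":
            return "s3"
        if source == "aws:sqs":
            return "sqs"
    if "httpMethod" in event:
        # API Gateway REST proxy integration.
        return "http"
    if event.get("source") == "aws.events":
        # EventBridge scheduled rule.
        return "schedule"
    return "unknown"
```

The handler signature stays `(event, context)` in every case. Only the dictionary you receive differs.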

Example — a real serverless web app¶

A modern startup builds their backend without provisioning a single server.

The stack:

CloudFront CDN — caches static assets globally
S3 — hosts the React frontend (HTML, JS, CSS)
API Gateway — routes /api/* requests to Lambda functions
Lambda — one function per endpoint: getUser, createOrder, listProducts
DynamoDB — serverless NoSQL for app data
Cognito — managed auth (also serverless)

What the bill looks like at 1M requests/month:

Service	Free tier	Cost beyond free tier
Lambda	1M requests + 400K GB-sec	~$0 (covered)
API Gateway	1M REST calls	~$3.50
DynamoDB	25 GB + on-demand reads/writes	~$1–5
S3 + CloudFront	5 GB storage + 1 TB CloudFront egress	~$1

You're paying single-digit dollars per month for an app that can absorb a 100x traffic spike without you touching anything.
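The Lambda row is easy to check by hand. The sketch below assumes us-east-1 x86 list prices ($0.20 per million requests, $0.0000166667 per GB-second) and the free tier from the table; the 100 ms duration and 128 MB memory size are illustrative assumptions, not figures from this page. Check current pricing before relying on any of it.

```python
# Assumed list prices (us-east-1, x86) and monthly free tier.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000


def lambda_monthly_cost(requests, avg_duration_ms, memory_mb):
    """Estimate the monthly Lambda bill in USD after the free tier."""
    # Compute is billed as duration (seconds) times memory (GB).
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_requests = max(0, requests - FREE_REQUESTS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    return (
        billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
        + billable_gb_seconds * PRICE_PER_GB_SECOND
    )
```

Under those assumptions, `lambda_monthly_cost(1_000_000, 100, 128)` uses 12,500 GB-seconds, well inside the free tier, so the Lambda line item comes out at zero.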

The trade-offs that bite:

Cold start latency: the first invocation after an idle period typically takes 100–1000ms while the runtime boots, and longer for JVM and .NET functions. Bad for user-facing APIs that get sparse traffic.
15-minute hard timeout: Lambda kills any invocation that runs past 15 minutes. Long jobs need Fargate or Batch, or must be split into shorter steps orchestrated by Step Functions.
Vendor lock-in: your handler signature, the SDK quirks, the IAM model are all AWS-specific. Porting to GCP isn't a recompile, it's a rewrite.
Cost inversion at scale: at roughly 100M+ steady requests/month, an always-on container fleet usually beats Lambda on raw compute price. The exact crossover depends on function duration and memory size.
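The cost inversion is just a break-even calculation. This sketch assumes the same us-east-1 x86 Lambda list prices and ignores the free tier; the fleet cost, duration, and memory size you feed it are your own assumptions, and it does not check whether the fleet can actually carry the load.

```python
# Assumed list prices (us-east-1, x86). Check current pricing before relying on them.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.0000166667


def lambda_cost_per_request(avg_duration_ms, memory_mb):
    """Marginal cost in USD of one invocation, ignoring the free tier."""
    compute = (avg_duration_ms / 1000) * (memory_mb / 1024) * PRICE_PER_GB_SECOND
    return PRICE_PER_REQUEST + compute


def break_even_requests(fleet_monthly_cost, avg_duration_ms, memory_mb):
    """Monthly request count above which a fixed-cost fleet is cheaper than Lambda."""
    return fleet_monthly_cost / lambda_cost_per_request(avg_duration_ms, memory_mb)
```

With a hypothetical $200/month fleet and requests that run 200 ms at 512 MB, `break_even_requests(200, 200, 512)` lands a little over 100 million requests per month. Shorter or smaller functions push the crossover higher; heavier ones pull it lower.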

Who actually runs this way: most modern startups for backend APIs (the "serverless-first" cohort), IFTTT for cross-service integrations, A Cloud Guru for their entire learning platform. Cloudflare runs serverless at the edge for millions of sites, which is this same architecture moved closer to the user.

Mechanics — picking a serverless platform¶

The big four, plus the frontend-friendly wrappers.

Platform	Cold start	Max execution	Languages	Pick when
AWS Lambda	100–1000ms (10s+ for Java/.NET)	15 min	Node, Python, Java, Go, Ruby, .NET, custom runtimes	You're already on AWS, want the biggest ecosystem
Google Cloud Functions	100–500ms	60 min (gen2 HTTP) / 9 min (gen2 event-driven, gen1)	Node, Python, Go, Java, .NET, Ruby, PHP	You're on GCP, integrating with BigQuery/Pub/Sub
Azure Functions	200–2000ms	10 min (consumption) / unlimited (premium)	C#, JS, Python, Java, PowerShell	Enterprise .NET shops, deep AD integration
Cloudflare Workers	<5ms	10ms CPU (free) / 30s CPU default (paid)	JS/TS, Rust/WASM	Edge latency matters, request → response is fast
Vercel / Netlify Functions	100–500ms	10–60s	Node, Python, Go (mostly Lambda underneath)	Frontend-first, Next.js/Nuxt apps

Lambda is the default if you're not sure. Most mature, biggest community, most triggers, most language support. The downside: cold starts hurt for user-facing APIs, especially on heavier runtimes such as Java and .NET.

Cloudflare Workers is the outlier. Workers run on V8 isolates (the same engine that runs Chrome) instead of containers. One V8 process, many isolated JS contexts. No container boot, no per-function language runtime startup. Result: cold starts measured in milliseconds, not seconds. Trade-off: tighter CPU limits and a smaller runtime API (only partial Node.js compatibility, no native binaries).

The cold start problem¶

A cold start happens when the platform has no warm instance ready. It has to:

Download your code package or container image
Start the language runtime (JVM, .NET CLR, Python interpreter)
Load your dependencies
Run your initialization code
Then finally invoke your handler
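Steps 3 and 4 are the part you control. In a Python Lambda-style function, module-level code runs once per execution environment (during the cold start), while the handler body runs on every invocation. The sketch below illustrates that split; `_CONFIG` stands in for any expensive setup such as creating SDK clients, and all names are hypothetical.

```python
import time

# Module-level code: runs once, when the platform starts a new instance.
_INIT_AT = time.monotonic()
_CONFIG = {"table": "users"}  # hypothetical stand-in for expensive setup
_STATE = {"invocations": 0}


def handler(event, context):
    """Runs on every invocation and reuses whatever init already built."""
    _STATE["invocations"] += 1
    return {
        # True only for the first call served by this instance.
        "cold_start": _STATE["invocations"] == 1,
        "instance_age_s": time.monotonic() - _INIT_AT,
        "table": _CONFIG["table"],
    }
```

Anything you move out of the handler and into module scope is paid for once per instance instead of once per request. The flip side: everything in module scope makes the cold start itself longer.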

That's where the 100ms–10s tail comes from. Three workarounds:

Provisioned concurrency (Lambda) / minimum instances (Cloud Functions): pay to keep N instances warm. Undercuts the "pay only when running" pitch, but removes cold starts for traffic that fits within those N instances.
SnapStart (Lambda; launched for Java, since extended to Python and .NET): snapshot a pre-initialized runtime and restore it instead of booting from scratch. Can cut Java cold starts by up to ~10x.
Pick a faster runtime: Node and Python typically cold-start in around 100ms; Java and .NET can hit seconds. Switch language for the latency-critical paths.
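If you go the keep-N-warm route, the first question is what N should be. A common rule of thumb is that concurrency equals average requests per second times average request duration in seconds. The sketch below applies that rule; the function name is hypothetical and it adds no headroom for bursts, which you would want in practice.

```python
import math


def required_concurrency(requests_per_second, avg_duration_seconds):
    """Estimate how many instances are busy at once under steady load."""
    return math.ceil(requests_per_second * avg_duration_seconds)
```

For example, 50 requests per second at 500 ms each keeps about 25 instances busy, so `required_concurrency(50, 0.5)` returns 25. Traffic beyond the warm pool still hits cold starts.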

If your function is on a hot path with strict latency, FaaS may be the wrong tool. Reach for Workers (edge isolates), Fargate (always-on container), or just a regular service.

Concept	What it is	How it relates
Edge computing	Running code at CDN POPs near users	Workers and Lambda@Edge are FaaS at the edge, covered on the next page
Cloud AI services	Managed AI APIs (Bedrock, Vertex, OpenAI)	Lambda is the glue — function triggers an AI call, formats the response, stores the result
Cloud cost management	Tracking and controlling cloud spend	Serverless can be cheaper at low/spiky scale and more expensive at steady high scale — the cost model isn't intuitive
Event sourcing	Storing state as a log of immutable events	Serverless is the natural compute layer for event-driven systems — events trigger functions, functions emit events
Microservices	Decomposing apps into small services	FaaS pushes microservices to the limit: every function is a microservice. See also the microservices page in System Design.
API Gateway	The front door for HTTP/REST APIs	The most common Lambda trigger — routes URLs to functions, handles auth, rate limiting, throttling
Step Functions / Workflows	Orchestration for multi-step serverless jobs	What you reach for when one function isn't enough and you need state machines, retries, and >15min execution
Queues (SQS, Pub/Sub)	Async messaging between services	The decoupling layer — queue absorbs bursts, Lambda drains it at its own pace
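The queue row deserves a sketch, because batch handling is where queue-triggered functions usually go wrong. The example below assumes an SQS trigger with partial batch responses (`ReportBatchItemFailures`) enabled on the event source mapping, so only failed messages are retried. `process_order` is a hypothetical placeholder for real business logic.

```python
import json


def process_order(order):
    """Hypothetical business logic: rejects orders without an ID."""
    if "order_id" not in order:
        raise ValueError("missing order_id")


def process_orders_handler(event, context):
    """Process an SQS batch and report only the messages that failed."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_order(json.loads(record["body"]))
        except Exception:
            # Returning the messageId tells Lambda to retry just this message.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without partial batch responses, one bad message fails the whole batch and every message in it is redelivered.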

When (and when not) to use serverless¶

Use serverless when:

Spiky or unpredictable traffic — auto-scale-to-zero means you don't pay for idle capacity at 3am
Event-driven workloads — file uploads, queue messages, cron jobs, webhooks. The trigger model is the whole point.
Glue code between services — "when X happens, call Y, write to Z" — write 30 lines instead of provisioning a service
Prototyping speed matters — get a functioning backend in an afternoon without thinking about infra
Background jobs — image processing, log aggregation, async ETL
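A glue function really can be this small. The sketch below covers the "when X happens" half of a thumbnail job: it reads an S3 object-created event and works out where each thumbnail should go. The destination bucket name and `thumbs/` prefix are hypothetical, and the actual image resizing and upload are left out.

```python
from urllib.parse import unquote_plus


def thumbnail_targets(event, dest_bucket="thumbnails-bucket"):
    """Map each uploaded object in an S3 event to a thumbnail destination."""
    targets = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        targets.append(
            {"source": (bucket, key), "dest": (dest_bucket, f"thumbs/{key}")}
        )
    return targets
```

Writing thumbnails to a separate bucket or prefix matters: if the function writes back to the location that triggers it, it invokes itself in a loop.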

Skip serverless when:

Steady high traffic: at roughly 100M+ requests/month with stable load, an always-on Fargate or EKS fleet is usually cheaper. Lambda's per-invocation pricing stops being a deal.
Long-running compute: anything past 15 minutes (Lambda) or 60 minutes (Cloud Functions gen2 HTTP functions). Use Batch, Fargate, or VMs.
Tight latency requirements: cold starts can hit 1s+. If your SLO is p99 < 200ms and traffic is bursty, you'll either pay for provisioned concurrency or pick a different platform.
Heavy local state or large in-memory caches: functions are stateless and short-lived. Loading a 2GB model on every cold start is a non-starter. Use a long-running service.
Strict portability requirements: every FaaS platform has its own handler shape, trigger model, and SDK. Multi-cloud abstraction is hard.
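One common way to soften the portability problem is to keep business logic in plain functions and confine the platform-specific event shape to a thin adapter. The sketch below assumes an API Gateway proxy event on the Lambda side; `generic_http_adapter` is a hypothetical stand-in for whatever another platform hands you. It reduces the rewrite surface, it does not remove lock-in from triggers, IAM, or managed services.

```python
import json


def get_greeting(name):
    """Platform-neutral core logic: no cloud SDKs, no event shapes."""
    return {"message": f"Hello, {name}"}


def lambda_adapter(event, context):
    """Thin AWS Lambda adapter (API Gateway proxy event assumed)."""
    params = event.get("queryStringParameters") or {}
    result = get_greeting(params.get("name", "world"))
    return {"statusCode": 200, "body": json.dumps(result)}


def generic_http_adapter(query):
    """Hypothetical adapter for a platform that passes a dict of query params."""
    result = get_greeting(query.get("name", "world"))
    return 200, json.dumps(result)
```

Moving clouds then means rewriting the adapters, not `get_greeting`. The core function is also trivially unit-testable without any cloud emulator.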

The honest take: serverless isn't free, isn't latency-free, isn't lock-in-free. It's the right tool for spiky, event-driven, glue work, and usually the wrong tool for steady high-volume, long-running, or latency-critical workloads.

Key takeaway¶

Serverless wins on spiky workloads + dev speed. It loses on steady high traffic + tight latency.
Cold starts are the price of scale-to-zero. Use provisioned concurrency, SnapStart, or Workers if you can't tolerate them.
Lambda's 15-minute timeout is a hard ceiling. Long jobs go to Batch, Fargate, or Step Functions.
Lock-in is real. Handler signatures, IAM, triggers — all platform-specific. Pick the cloud you're already on.
The cost crossover bites. Cheap at low scale and burst. Watch the bill once traffic goes steady and high.
