Serverless & FaaS
Stop running servers. Bring code, leave operations.
The hook
You write a function. You upload it. You never log into a server again.
When a request comes in, the platform runs your code. When no requests come in, it runs nothing — and charges you nothing. Auto-scaling, patching, OS updates, capacity planning: all the provider's problem.
That's the pitch behind AWS Lambda, Google Cloud Functions, Azure Functions, and Cloudflare Workers. Serverless flipped the operating model — you pay per millisecond of execution instead of per hour of idle VM. The catch: cold starts, vendor lock-in, and a hard ceiling on how long any single invocation can run.
The concept
Function-as-a-Service (FaaS) is the purest form of serverless. You write a single function with a defined input and output, deploy it, and the provider runs it on demand.
Three things matter:
- Trigger — what causes the function to run (HTTP request, queue message, file upload, cron schedule)
- Code — your function, usually <100 lines, stateless, single-purpose
- Output — return value, side effect (DB write, message published), or both
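The runtime, the OS, the scaling decisions, the patching, the monitoring agents: none of that is yours. The platform scales from zero to thousands of concurrent instances (Lambda's default quota is 1,000 concurrent executions per region, raisable on request), and Lambda bills duration in 1ms increments. When traffic dies, instances die with it.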
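A useful way to size that scaling is Little's law, which AWS also uses to explain Lambda concurrency: in steady state, the number of instances busy at once is roughly the arrival rate times the average duration. With illustrative numbers of 1,000 requests/s at 200ms each:

```latex
\text{concurrency} \approx \text{requests per second} \times \text{average duration (s)}
\qquad
1000\,\tfrac{\text{req}}{\text{s}} \times 0.2\,\text{s} = 200 \text{ concurrent instances}
```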
The mental shift: you're not running a server that handles requests. You're handing the platform a piece of logic and a list of triggers, and saying "run this when those happen."
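A minimal sketch of the trigger/code/output split, assuming Python on Lambda behind an API Gateway proxy integration. The `(event, context)` handler signature and the `statusCode`/`body` response shape are Lambda's; the `USERS` dict and a route like `/api/users/{id}` are hypothetical stand-ins for a DynamoDB table and real routing.

```python
import json

# Hypothetical in-memory stand-in for a DynamoDB table; a real function
# would query the table through the AWS SDK instead.
USERS = {"42": {"id": "42", "name": "Ada"}}


def handler(event, context):
    """Trigger: API Gateway proxy event. Output: an HTTP response dict."""
    # pathParameters is absent (or None) when the route has no path variables.
    user_id = (event.get("pathParameters") or {}).get("id")
    user = USERS.get(user_id)
    if user is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(user),
    }
```

Locally, `handler({"pathParameters": {"id": "42"}}, None)` returns a 200 response; nothing in the function keeps state between calls, which is what lets the platform run as many copies as traffic demands.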
Diagram
```mermaid
flowchart LR
    API[API Gateway] -->|HTTP| F1[Lambda: getUser]
    F1 --> DDB[(DynamoDB)]
    S3[S3 bucket] -->|object created| F2[Lambda: makeThumbnail]
    F2 --> S3o[(S3 thumbnails)]
    SQS[SQS queue] -->|message| F3[Lambda: processOrder]
    F3 --> DDB
    CRON[EventBridge cron] -->|every 5min| F4[Lambda: cleanup]
    F4 --> DDB
```
Same function shape every time. The trigger source is what changes — HTTP, object event, queue message, schedule. The platform handles the plumbing.
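To make "same shape, different trigger" concrete, here is a sketch of two of the diagram's handlers in Python. The event fields (`Records`, `s3.bucket.name`, `s3.object.key`, `body`, `messageId`) follow the S3 and SQS event formats Lambda delivers; the order-validation rule is a hypothetical placeholder, and the `batchItemFailures` return only takes effect if partial batch responses are enabled on the SQS trigger.

```python
import json
from urllib.parse import unquote_plus


def make_thumbnail_handler(event, context):
    """Trigger: S3 "object created" notification (one or more records)."""
    keys = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event payloads (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        keys.append(f"{bucket}/{key}")  # a real function would resize the image here
    return keys


def process_order_handler(event, context):
    """Trigger: a batch of SQS messages; failed messages are reported individually."""
    failures = []
    for record in event["Records"]:
        try:
            order = json.loads(record["body"])  # JSONDecodeError is a ValueError
            if "order_id" not in order:  # hypothetical validation rule
                raise ValueError("missing order_id")
            # ... write the order to the database here ...
        except ValueError:
            failures.append({"itemIdentifier": record["messageId"]})
    # Partial batch response: only the listed messages return to the queue.
    return {"batchItemFailures": failures}
```

Both functions take the same `(event, context)` pair; only the shape of `event` changes with the trigger.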
Example — a real serverless web app
A modern startup builds their backend without provisioning a single server.
The stack:
- CloudFront CDN — caches static assets globally
- S3 — hosts the React frontend (HTML, JS, CSS)
- API Gateway — routes `/api/*` requests to Lambda functions
- Lambda — one function per endpoint: `getUser`, `createOrder`, `listProducts`
- DynamoDB — serverless NoSQL for app data
- Cognito — managed auth (also serverless)
What the bill looks like at 1M requests/month:
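| Service | Free tier (monthly) | Est. monthly cost |
|---|---|---|
| Lambda | 1M requests + 400K GB-seconds | ~$0 (covered) |
| API Gateway | 1M REST API calls (12-month offer) | ~$3.50 once the 12-month offer ends ($3.50 per million REST calls) |
| DynamoDB | 25 GB storage + 25 provisioned RCU/WCU (on-demand requests are billed) | ~$1–5 |
| S3 + CloudFront | 5 GB S3 storage + 1 TB CloudFront egress | ~$1 |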
You're paying single-digit dollars per month for an app that can absorb a 100x traffic spike without you touching anything.
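The Lambda line of that bill can be checked with a small cost model, which also shows where the crossover in the trade-offs below comes from. This is a sketch assuming Lambda's x86 us-east-1 list prices ($0.20 per million requests, about $0.0000166667 per GB-second, with 1M requests and 400,000 GB-seconds free per month); the 100ms/128MB function profile and the $200/month container fleet are illustrative assumptions, and API Gateway, DynamoDB, and data transfer are left out.

```python
# Assumed Lambda list prices (x86, us-east-1); verify against current AWS pricing.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000


def lambda_monthly_cost(requests, avg_duration_ms, memory_mb, free_tier=True):
    """Estimated monthly Lambda bill in USD: request charges plus compute (GB-seconds)."""
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_requests = requests
    if free_tier:
        billable_requests = max(0, requests - FREE_REQUESTS)
        gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return request_cost + gb_seconds * PRICE_PER_GB_SECOND


def break_even_requests(fleet_monthly_usd, avg_duration_ms, memory_mb):
    """Monthly requests at which Lambda (free tier ignored) costs as much as a flat-priced fleet."""
    cost_per_request = lambda_monthly_cost(1, avg_duration_ms, memory_mb, free_tier=False)
    return fleet_monthly_usd / cost_per_request


print(lambda_monthly_cost(1_000_000, avg_duration_ms=100, memory_mb=128))  # 0.0
print(round(break_even_requests(200, avg_duration_ms=100, memory_mb=512) / 1e6))  # 194
```

At 1M requests of 100ms at 128MB, the compute (12,500 GB-seconds) sits well inside the free tier, so the Lambda line is zero. With the illustrative fleet price, the crossover lands a bit under 200M requests/month; longer or memory-heavier functions pull it lower.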
The trade-offs that bite:
- Cold start latency — first invocation after idle takes 100–1000ms while the runtime boots. Bad for user-facing APIs that get sparse traffic.
- 15-minute hard timeout — Lambda kills any invocation past 15 minutes. Long jobs need Fargate, Batch, or Step Functions.
- Vendor lock-in — your handler signature, the SDK quirks, the IAM model — all AWS-specific. Porting to GCP isn't a recompile, it's a rewrite.
- Cost inversion at scale — at 100M+ steady requests/month, an always-on container fleet usually beats Lambda on raw compute price; exactly where the crossover falls depends on each invocation's duration and memory size.
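One common way to live under the 15-minute ceiling is to make a long job resumable: the function watches its remaining time and returns a checkpoint instead of being killed mid-work. A Python sketch, assuming an orchestrator such as a Step Functions loop re-invokes it with the returned `next` value. `context.get_remaining_time_in_millis()` is part of Lambda's Python context object; `process_item`, the event fields, and the 30-second margin are hypothetical.

```python
SAFETY_MARGIN_MS = 30_000  # hypothetical: leave time to checkpoint before the hard timeout


def process_item(i):
    """Hypothetical per-item work (a real job might transform one file or row)."""
    return i * i


def handler(event, context):
    """Work through items [next, end) until the time budget runs low."""
    i, end = event.get("next", 0), event["end"]
    results = []
    while i < end:
        # Stop early rather than let the platform kill the invocation.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break
        results.append(process_item(i))
        i += 1
    # The caller re-invokes with {"next": ..., "end": ...} until done is True.
    return {"next": i, "end": end, "done": i >= end, "results": results}
```

Each invocation stays short; the orchestrator carries the state between them, which is the Step Functions pattern the trade-off above points to.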
Who actually runs this way: many startups for backend APIs (the "serverless-first" cohort), IFTTT for cross-service integrations, and A Cloud Guru for their entire learning platform. Cloudflare brings the same model to the edge for the millions of sites on its network, moving this kind of architecture closer to the user.
Mechanics — picking a serverless platform
The big four, plus the frontend-friendly wrappers.
| Platform | Cold start | Max execution | Languages | Pick when |
|---|---|---|---|---|
| AWS Lambda | 100–1000ms (up to several seconds for Java/.NET) | 15 min | Node, Python, Java, Go, Ruby, .NET, custom runtimes | You're already on AWS, want the biggest ecosystem |
| Google Cloud Functions (now Cloud Run functions) | 100–500ms | 60 min (gen2 HTTP; 9 min event-driven) | Node, Python, Go, Java, .NET, Ruby, PHP | You're on GCP, integrating with BigQuery or Pub/Sub |
| Azure Functions | 200–2000ms | 10 min (consumption) / unlimited (premium) | C#, JS, Python, Java, PowerShell | Enterprise .NET shops, deep Entra ID (Azure AD) integration |
| Cloudflare Workers | <5ms | 10ms CPU (free) / 30s CPU by default (paid) | JS/TS, Rust/WASM | Edge latency matters, request → response is fast |
| Vercel / Netlify Functions | 100–500ms | 10–60s | Node, Python, Go (mostly Lambda underneath) | Frontend-first, Next.js/Nuxt apps |
Lambda is the default if you're not sure: it's the most mature, with the biggest community, the most triggers, and the broadest language support. The downside: cold starts hurt for user-facing APIs in heavyweight runtimes like Java and .NET.
Cloudflare Workers is the outlier. Workers run on V8 isolates — same engine that runs Chrome — instead of containers. One V8 process, many isolated JS contexts. No container boot, no language VM init. Result: cold starts measured in milliseconds, not seconds. Trade-off: tighter CPU limits and a smaller runtime API (no full Node.js, no native binaries).
The cold start problem
A cold start happens when the platform has no warm instance ready. It has to:
- Download your code package or pull your container image
- Start the language runtime (JVM, .NET CLR, Python interpreter)
- Load your dependencies
- Run your initialization code
- Then finally invoke your handler
Those steps are where the 100ms–10s tail comes from. Three workarounds:
- Provisioned concurrency (Lambda) / Minimum instances (Cloud Functions) — pay to keep N instances warm. Defeats the "pay only when running" pitch, but kills cold starts.
- SnapStart (Lambda; Java first, now also Python and .NET) — snapshot a pre-initialized runtime and restore it instead of booting from scratch. AWS cites up to 10x faster startup for Java.
- Pick a faster runtime — Node and Python cold-start in ~100ms; Java and .NET can hit seconds. Switch language for the latency-critical paths.
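Why sparse traffic gets hit hardest, and why provisioned concurrency fixes it, shows up even in a toy model. The sketch below assumes requests arrive one at a time and that an idle instance is reclaimed after a fixed window; real platforms don't publish that window and it varies, so the 10-minute figure used here is purely illustrative.

```python
def count_cold_starts(arrivals_s, idle_timeout_s, min_warm=0):
    """Count cold starts for sequential requests hitting a single function.

    Simplified model (assumption): one request at a time, an instance stays
    warm for idle_timeout_s after its last request, and min_warm > 0 stands
    for provisioned/minimum instances that are never reclaimed.
    """
    cold = 0
    last_used = None
    for t in sorted(arrivals_s):
        warm = min_warm > 0 or (last_used is not None and t - last_used <= idle_timeout_s)
        if not warm:
            cold += 1
        last_used = t
    return cold


sparse = [0, 900, 1800, 2700]  # one request every 15 minutes
busy = range(0, 3600, 10)      # one request every 10 seconds for an hour
print(count_cold_starts(sparse, idle_timeout_s=600))              # 4
print(count_cold_starts(busy, idle_timeout_s=600))                # 1
print(count_cold_starts(sparse, idle_timeout_s=600, min_warm=1))  # 0
```

With a request every 15 minutes, every call is cold; steady traffic pays once; a single always-warm instance removes the problem at the price of paying for it around the clock.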
If your function is on a hot path with strict latency, FaaS may be the wrong tool. Reach for Workers (edge isolates), Fargate (always-on container), or just a regular service.
Related concepts
| Concept | What it is | How it relates |
|---|---|---|
| Edge computing | Running code at CDN POPs near users | Workers and Lambda@Edge are FaaS at the edge (covered on the next page) |
| Cloud AI services | Managed AI APIs (Bedrock, Vertex, OpenAI) | Lambda is the glue — function triggers an AI call, formats the response, stores the result |
| Cloud cost management | Tracking and controlling cloud spend | Serverless can be cheaper at low/spiky scale and more expensive at steady high scale — the cost model isn't intuitive |
| Event sourcing | Storing state as a log of immutable events | Serverless is the natural compute layer for event-driven systems — events trigger functions, functions emit events |
| Microservices | Decomposing apps into small services | FaaS pushes microservices to the limit: every function is a microservice. See System Design's microservices page. |
| API Gateway | The front door for HTTP/REST APIs | The most common Lambda trigger: routes URLs to functions and handles auth, throttling, and per-client quotas |
| Step Functions / Workflows | Orchestration for multi-step serverless jobs | What you reach for when one function isn't enough and you need state machines, retries, and >15min execution |
| Queues (SQS, Pub/Sub) | Async messaging between services | The decoupling layer — queue absorbs bursts, Lambda drains it at its own pace |
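The queue-as-shock-absorber row is easiest to see as arithmetic: a burst lands in the queue all at once, consumers remove it at a bounded rate, and the backlog (not the downstream database) takes the spike. A deliberately simple sketch; the per-tick model and the numbers are illustrative assumptions, loosely mirroring the batch size and maximum concurrency settings on an SQS trigger.

```python
def drain(arrivals_per_tick, batch_size, max_concurrency):
    """Return the queue backlog after each tick.

    Assumed model: each tick, up to max_concurrency consumers each take
    one batch of up to batch_size messages.
    """
    capacity = batch_size * max_concurrency  # messages consumed per tick, at most
    backlog, history = 0, []
    for arriving in arrivals_per_tick:
        backlog += arriving                # the queue absorbs the whole burst
        backlog -= min(backlog, capacity)  # consumers drain at a fixed pace
        history.append(backlog)
    return history


print(drain([100, 0, 0, 0, 0], batch_size=10, max_concurrency=2))  # [80, 60, 40, 20, 0]
```

Downstream load never exceeds 20 messages per tick, however large the burst; the cost is latency, since the last message waits several ticks.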
When (and when not) to use serverless
Use serverless when:
- Spiky or unpredictable traffic — auto-scale-to-zero means you don't pay for idle capacity at 3am
- Event-driven workloads — file uploads, queue messages, cron jobs, webhooks. The trigger model is the whole point.
- Glue code between services — "when X happens, call Y, write to Z" — write 30 lines instead of provisioning a service
- Prototyping speed matters — get a functioning backend in an afternoon without thinking about infra
- Background jobs — image processing, log aggregation, async ETL
Skip serverless when:
- Steady high traffic — at 100M+ requests/month with stable load, an always-on Fargate or EKS fleet is cheaper. Lambda's per-invocation pricing stops being a deal.
- Long-running compute — anything past 15 minutes (Lambda) or 60 minutes (Cloud Functions). Use Batch, Fargate, or VMs.
- Tight latency requirements — cold starts can hit 1s+. If your SLO is p99 < 200ms and traffic is bursty, you'll either pay for provisioned concurrency or pick a different platform.
- Heavy local state or large in-memory caches — functions are stateless and short-lived. Loading a 2GB model on every cold start is a non-starter. Use a long-running service.
- Strict portability requirements — every FaaS platform has its own handler shape, trigger model, and SDK. Multi-cloud abstraction is hard.
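Keeping business logic free of platform types doesn't remove lock-in (triggers, IAM, and SDKs stay platform-specific), but it confines it to thin per-platform adapters, so a port means rewriting the adapters rather than the logic. A Python sketch assuming a Lambda function behind API Gateway and a Google Cloud Functions HTTP function, which receives a Flask request object; `order_total` and the JSON payload shape are hypothetical.

```python
import json


def order_total(items):
    """Platform-neutral business logic: plain data in, plain data out."""
    return round(sum(item["price"] * item["qty"] for item in items), 2)


def lambda_handler(event, context):
    """AWS Lambda adapter: API Gateway proxy event in, response dict out."""
    items = json.loads(event.get("body") or "[]")
    return {"statusCode": 200, "body": json.dumps({"total": order_total(items)})}


def gcf_handler(request):
    """Google Cloud Functions adapter: HTTP functions receive a Flask request
    and may return a (body, status) tuple."""
    items = request.get_json(silent=True) or []
    return json.dumps({"total": order_total(items)}), 200
```

Only `order_total` needs real tests; each adapter is a few lines of translation that gets rewritten per platform.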
The honest take: serverless isn't free, isn't latency-free, isn't lock-in-free. It's the right tool for spiky, event-driven, glue work — and the wrong tool for everything else.
Key takeaway
- Serverless wins on spiky workloads + dev speed. It loses on steady high traffic + tight latency.
- Cold starts are the price of scale-to-zero. Provisioned concurrency, SnapStart, or Workers if you can't tolerate them.
- 15-minute timeout is a hard ceiling. Long jobs go to Batch, Fargate, or Step Functions.
- Lock-in is real. Handler signatures, IAM, triggers — all platform-specific. Pick the cloud you're already on.
- The cost crossover bites. Cheap at low scale and burst. Watch the bill once traffic goes steady and high.