AWS Lambda MicroVMs: run untrusted code with VM-level isolation (no infra to manage)
AWS just shipped Lambda MicroVMs, a new serverless primitive that gives each user or session a VM-level isolated sandbox, with near-instant launch and state preserved for up to 8 hours, all on Firecracker. Here is what it is, when to reach for it instead of a plain Lambda Function, and how to architect on top of it.

Staff Engineer @Serverless Guru | AWS Community Builder | Specialist in Serverless, AWS & Event-Driven Architectures | Speaker & Content Creator @willpeixoto.dev
Let me put you in a situation. You need to run a piece of code you did not write. Maybe it is the script your user pasted into your platform, maybe it is the snippet an AI agent just generated and wants to execute. And then comes the question that keeps anyone working on multi-tenant platforms up at night: how do I run this without handing a stranger the keys to the house?
Until last week you had three paths, each with a catch. A VM gives you strong isolation but takes minutes to boot. A container starts in seconds but shares a kernel, so running untrusted code there takes a pile of hardening. And the Lambda Function was built for short request-response, not for a session that has to keep live state between one interaction and the next (externalizing it to DynamoDB stores the data, not the live runtime: the running process, the loaded packages, the memory). In the end you chose between performance and isolation. No way around it. Or so it seemed.
Container, VM, or Lambda: the trade-off none of them solved alone
This pattern got common: AI coding assistants, interactive code environments, analytics, vulnerability scanners, game servers running player scripts. They all need the same thing: give each user their own environment to run code the team did not write, safely and without lag.
The knot is that real isolation and low latency pull in opposite directions. From a security angle you want a hard boundary between tenants (the Security pillar of the Well-Architected Framework: isolate what is not trusted). From an experience angle you want that environment up the instant the user shows up. Reconciling the two was the expensive work.
And there is a nice irony in this story. We spent years learning to build stateless apps, and now state is a requirement again.
The solution to the future was hiding in the past.
That is a line a friend dropped in a conversation, and it has not left my head since. Ever felt that way? Because I have. And it is roughly what Lambda MicroVMs does: it brings state back, without handing you the weight of a full VM.
What Lambda MicroVMs is
Lambda MicroVMs is a new primitive inside Lambda, built exactly for that gap. Each MicroVM gives a single user or session its own isolated environment that boots fast, keeps memory and disk for the whole session, and pauses to a low cost when the user steps away.
The magic comes from Firecracker, the same lightweight virtualization that already runs over 15 trillion Lambda invocations a month. This is not raw new tech, it is the mature foundation of Lambda itself, exposed in a new way.
The model is image-then-launch:
You build the image once (AWS runs your Dockerfile, initializes the app, and takes a snapshot of memory and disk). After that, every MicroVM you launch resumes from that snapshot instead of cold-booting. That is why launch and resume are near-instant, even for a multi-gigabyte session.
What it is actually for (with examples you will recognize)
The main cue: this only enters the picture if you are building a platform that runs third-party code. If your app does not execute outside code, you do not need it. It is a building block for people who build that kind of product:
Replit, CodeSandbox, "VS Code in the browser": the user types code in the browser and it runs isolated, per user, holding state while the tab is open. That "runs isolated" is the MicroVM.
Code interpreter (like ChatGPT's or Claude's): you ask "plot this CSV", the AI writes Python and runs it to answer you. The runtime that executes that generated code, isolated per conversation, is the use case.
CI/CD runner (and relatives): a job runs the code of a Pull Request that may come from any stranger's fork, untrusted by definition, so you want an isolated, disposable runner per job. Same family: a scanner that runs a suspicious binary, a coding-interview platform (the candidate's code runs isolated), an AI agent that runs shell commands.
The thread tying it all together: each user, session, or job needs its own isolated environment, and the code running there is not code you wrote. That is the cue to use a MicroVM instead of a Lambda Function.
Lambda Function or Lambda MicroVM?
They do not compete, they complete each other. The official comparison:
| | Lambda Functions | Lambda MicroVMs |
|---|---|---|
| Best for | request-response or event-driven (APIs, data processing, automation) | persistent environments running user or AI-produced untrusted code |
| Programming model | function handler invoked in a supported runtime | any application: run your own binaries, listen on ports, use Linux OS capabilities |
| Duration | up to 15 min per invocation; multi-step workflows up to a year with Lambda Durable Functions | up to 8 hours per session; suspend and resume across sessions |
| Runtime | service-provided runtimes (or customer-provided) | customer-provided MicroVM images |
| Inbound networking | direct invocations or event-source integrations; response streaming | inbound access to any port using OSI Layer 7 protocols |
| Concurrency | one request per execution environment at a time | multiple concurrent connections per MicroVM |
| Environment state | warm starts may reuse the environment, but state may not persist across invocations | memory and disk state preserved on suspend, restored on resume |
| Scaling | automatic: Lambda creates and destroys environments in response to traffic | developer-controlled: you create, suspend, resume, and terminate via API |
| Lifecycle | fully managed by Lambda | developer-controlled, with optional idle policies |
| Pricing | per-request + GB-seconds | per-second of compute while running + snapshot storage while suspended |
The most common confusion: people assume the duration is the same as Lambda's. The startup is similar (both resume from a snapshot), but a Function dies at 15 minutes while a MicroVM holds a session for up to 8 hours with state intact. The real design: your app keeps Lambda Functions for the event-driven backbone, and calls MicroVMs only for the steps that need to run untrusted code in isolation.
How it works in practice: from endpoint to orchestration
Three things that trip people up at first, together.
The endpoint has a status. When you call run-microvm, you get an ID and a dedicated HTTPS endpoint for that MicroVM. But it is not ready instantly: it goes through states, from launch to RUNNING (about 2 seconds), and when idle it moves to suspended, coming back on resume. The endpoint is per MicroVM, per session.
One image, many MicroVMs. You build the image once (create-microvm-image) and each MicroVM is a run-microvm. Want two? Call it twice, and you get two independent instances. Idle behavior is governed by the idle-policy: maxIdleDurationSeconds (suspend after that many seconds of idleness) and autoResumeEnabled (the next request wakes the MicroVM on its own, in about 1s, no manual restart). When you are done, terminate-microvm releases everything.
You become the orchestrator. Since the endpoint is per session, something has to decide when to launch and where to route. Typically a Lambda Function in the backbone does it: it keeps a session -> MicroVM map (a store like DynamoDB in production), calls RunMicrovm on a user's first access, stores the ID and endpoint, mints a short-lived token with CreateMicrovmAuthToken, and proxies the request to the MicroVM's endpoint with the X-aws-proxy-auth header. If the instance is suspended and autoResume is on, the request itself wakes it. Add a routine to terminate orphan MicroVMs and you have the skeleton. The backbone code is in the next post in the series. And do not confuse this with Step Functions: MicroVM is the execution environment, Step Functions is an orchestrator, different layers.
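To make that routing flow concrete, here is a minimal sketch in Python. It is not the backbone code from the series: the MicrovmClient interface, its method signatures, and the response keys (microvmId, endpoint) are assumptions modeled on the operation names above, and an in-memory dict plus a fake client stand in for DynamoDB and the real service so the sketch runs without AWS access.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class MicrovmClient(Protocol):
    """Hypothetical client interface. Method names mirror the operations
    named in the post; signatures and return shapes are assumptions."""

    def run_microvm(self, image_id: str) -> Dict[str, str]: ...
    def create_microvm_auth_token(self, microvm_id: str) -> str: ...
    def terminate_microvm(self, microvm_id: str) -> None: ...


@dataclass
class Session:
    microvm_id: str
    endpoint: str


class Orchestrator:
    def __init__(self, client: MicrovmClient, image_id: str) -> None:
        self.client = client
        self.image_id = image_id
        # In production this map would live in a store such as DynamoDB.
        self.sessions: Dict[str, Session] = {}

    def route(self, session_id: str) -> Dict[str, object]:
        """Return the URL and headers to proxy one request with."""
        session = self.sessions.get(session_id)
        if session is None:
            # First access for this session: launch a MicroVM and remember it.
            launched = self.client.run_microvm(self.image_id)
            session = Session(launched["microvmId"], launched["endpoint"])
            self.sessions[session_id] = session
        # Mint a fresh short-lived token for every proxied request.
        token = self.client.create_microvm_auth_token(session.microvm_id)
        return {"url": session.endpoint, "headers": {"X-aws-proxy-auth": token}}

    def end_session(self, session_id: str) -> None:
        """Terminate the MicroVM and forget the session, if it exists."""
        session = self.sessions.pop(session_id, None)
        if session is not None:
            self.client.terminate_microvm(session.microvm_id)


class FakeClient:
    """In-memory stand-in for the service, for illustration only."""

    def __init__(self) -> None:
        self.launched = 0
        self.terminated: List[str] = []

    def run_microvm(self, image_id: str) -> Dict[str, str]:
        self.launched += 1
        vm_id = f"mvm-{self.launched}"
        return {"microvmId": vm_id, "endpoint": f"https://{vm_id}.example.invalid"}

    def create_microvm_auth_token(self, microvm_id: str) -> str:
        return f"token-for-{microvm_id}"

    def terminate_microvm(self, microvm_id: str) -> None:
        self.terminated.append(microvm_id)
```

Calling route("alice") twice reuses the same MicroVM, while route("bob") launches a second one. An orphan sweeper could be built on end_session by walking the map and ending sessions that have not been touched for a while.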
Cost, limits, and what is still missing
Cost is a decision, not a detail. Werner Vogels keeps hammering in the Frugal Architect that cost is an architecture requirement, not a number you discover on the bill. The suspend is exactly that in practice: you pay a lot for VM-level isolation, but only while the user is active. When they leave, the MicroVM suspends and the cost drops, with no loss of state. Designing your idle-policy on purpose is a cost decision. The model, from the official table: you pay per second of compute while it runs, and only snapshot storage while it is suspended. Unit prices are on the Lambda pricing page.
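A quick way to see the lever is to put the billing model into arithmetic. The sketch below assumes compute is billed per running second and snapshot storage per GB-second while suspended. The billing granularity for storage is my assumption, and every rate used with it is a placeholder, not a real price.

```python
def session_cost(
    active_seconds: float,
    suspended_seconds: float,
    compute_rate_per_second: float,
    snapshot_gb: float,
    storage_rate_per_gb_second: float,
) -> float:
    """Estimate one session's cost under the running/suspended model.

    All rates are caller-supplied placeholders. Take real unit prices
    from the Lambda pricing page.
    """
    values = (
        active_seconds,
        suspended_seconds,
        compute_rate_per_second,
        snapshot_gb,
        storage_rate_per_gb_second,
    )
    if any(v < 0 for v in values):
        raise ValueError("durations, sizes, and rates must be non-negative")
    compute = active_seconds * compute_rate_per_second
    storage = suspended_seconds * snapshot_gb * storage_rate_per_gb_second
    return compute + storage


# Placeholder rates, chosen only to make the comparison visible.
COMPUTE_RATE = 0.0001     # hypothetical price per running second
STORAGE_RATE = 0.000001   # hypothetical price per GB-second of snapshot
SNAPSHOT_GB = 4           # hypothetical snapshot size

# Same 8-hour window: always running vs. 1 active hour and 7 suspended.
always_on = session_cost(8 * 3600, 0, COMPUTE_RATE, SNAPSHOT_GB, STORAGE_RATE)
with_suspend = session_cost(3600, 7 * 3600, COMPUTE_RATE, SNAPSHOT_GB, STORAGE_RATE)
```

With any rates where storage is much cheaper than compute, with_suspend comes out well below always_on. The gap between the two is what your idle-policy is buying.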
Limits: ARM64, up to 16 vCPUs, 32 GB of memory, and 32 GB of disk per MicroVM, and up to 8 hours of total runtime. Provisioning is flexible: you set a baseline and burst up to 4x at peak, paying the baseline while it runs.
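Those limits are easy to encode as a pre-flight check before you ever call the service. This is a sketch under assumptions: the MicrovmSpec fields are hypothetical names, not the real API shape, and capping the 4x burst at the 16 vCPU maximum is my reading, since the interaction between the two is not spelled out above.

```python
from dataclasses import dataclass
from typing import List

# Limits as listed in the post.
MAX_VCPUS = 16
MAX_MEMORY_GB = 32
MAX_DISK_GB = 32
MAX_RUNTIME_HOURS = 8
BURST_FACTOR = 4


@dataclass(frozen=True)
class MicrovmSpec:
    """Hypothetical description of a requested MicroVM."""
    architecture: str
    baseline_vcpus: float
    memory_gb: float
    disk_gb: float
    runtime_hours: float


def validate(spec: MicrovmSpec) -> List[str]:
    """Return a list of limit violations; an empty list means the spec fits."""
    problems: List[str] = []
    if spec.architecture.lower() != "arm64":
        problems.append(f"architecture must be ARM64, got {spec.architecture}")
    if spec.baseline_vcpus > MAX_VCPUS:
        problems.append(f"vCPUs exceed {MAX_VCPUS}")
    if spec.memory_gb > MAX_MEMORY_GB:
        problems.append(f"memory exceeds {MAX_MEMORY_GB} GB")
    if spec.disk_gb > MAX_DISK_GB:
        problems.append(f"disk exceeds {MAX_DISK_GB} GB")
    if spec.runtime_hours > MAX_RUNTIME_HOURS:
        problems.append(f"runtime exceeds {MAX_RUNTIME_HOURS} hours")
    return problems


def burst_ceiling(baseline_vcpus: float) -> float:
    """Peak vCPUs at 4x burst, capped at the per-MicroVM maximum (assumed)."""
    return min(baseline_vcpus * BURST_FACTOR, MAX_VCPUS)
```

For example, a baseline of 2 vCPUs gives a burst ceiling of 8, while a baseline of 8 is capped at 16 under this reading.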
IaC: you can use the console, CloudFormation, and CDK.
Why Dockerfile + zip, and not a prebuilt ECR image? Aidan Steele dug into it: Lambda builds two copies of the image, one for Graviton 3 and one for Graviton 4, so it needs the source to recompile. The base comes from ECR Public, but pushing your own prebuilt image from a private ECR as the artifact is not the path. One thing that confuses people coming from containers: ECR does not leave your life. You do not deliver the MicroVM image via ECR, but inside the running MicroVM you can run Docker and docker pull your private ECR images at runtime. ECR is for consumption inside, not for delivering the image itself.
Networking and region: inbound traffic on configurable ports (HTTP/2, gRPC, WebSockets), service-provided JWE auth, outbound to the internet or your VPC. And it is available so far only in US East (N. Virginia, Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo).
When NOT to use it
If the workload is short request-response with no state, it stays a Lambda Function. A MicroVM there is a cannon for a mosquito. And if you just need more than 15 minutes with your own (trusted) code, a MicroVM is also overkill: for a long job, look at Fargate; for a multi-step workflow, Lambda Durable Functions (up to a year, as the table shows). MicroVMs are for when the differentiator is isolating untrusted code, not just going past 15 minutes.
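The decision in that paragraph fits in a few lines. The sketch below is only my encoding of the rule of thumb above, with hypothetical field names. It is not an official selection guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload, for illustration only."""
    runs_untrusted_code: bool
    max_duration_minutes: float
    is_multi_step_workflow: bool = False


def recommend(workload: Workload) -> str:
    """Map a workload to the service the post suggests for it."""
    if workload.runs_untrusted_code:
        # The differentiator: isolating code you did not write.
        return "Lambda MicroVM"
    if workload.is_multi_step_workflow:
        return "Lambda Durable Functions"
    if workload.max_duration_minutes > 15:
        # Long-running but trusted: a MicroVM would be overkill.
        return "Fargate"
    return "Lambda Function"
```

Note the order of the checks: untrusted code is tested first, because duration alone is never the reason to pick a MicroVM.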
There is also a gotcha AWS itself flags, and it rhymes with the determinism conversation: since the MicroVM boots from a pre-initialized snapshot (the equivalent of Lambda SnapStart, as Aidan Steele confirmed by testing), apps that generate unique content, open connections, or load ephemeral data at init may diverge. The snapshot froze a moment; whatever needs to be fresh per session cannot be frozen along with it. The fix has a name: lifecycle hooks to re-initialize randomness when each MicroVM is created. Map that out before assuming it just works.
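The failure mode is easy to reproduce in miniature. The toy below does not touch any MicroVM API: it simulates the snapshot by copying a random generator's state, and on_restore is a hypothetical stand-in for whatever lifecycle hook the service exposes. It shows two "MicroVMs" restored from the same snapshot producing the same "unique" token unless the hook reseeds.

```python
import random


class App:
    """Toy app that seeds a random generator at init, as many real apps do."""

    def __init__(self) -> None:
        self.rng = random.Random()  # seeded from OS entropy at init time

    def new_session_token(self) -> str:
        return f"{self.rng.getrandbits(64):016x}"

    def on_restore(self) -> None:
        """Hypothetical lifecycle hook: reseed from fresh OS entropy."""
        self.rng.seed()


def take_snapshot(app: App) -> tuple:
    """Simulate the image snapshot by capturing the generator state."""
    return app.rng.getstate()


def restore(snapshot: tuple, run_hook: bool) -> App:
    """Simulate launching a MicroVM from the snapshot."""
    app = App()
    app.rng.setstate(snapshot)  # every restore starts from the frozen state
    if run_hook:
        app.on_restore()
    return app


snapshot = take_snapshot(App())

# Without the hook, both instances continue the same random sequence.
vm_a = restore(snapshot, run_hook=False)
vm_b = restore(snapshot, run_hook=False)
same_token = vm_a.new_session_token() == vm_b.new_session_token()

# With the hook, each instance reseeds and the sequences diverge.
vm_c = restore(snapshot, run_hook=True)
vm_d = restore(snapshot, run_hook=True)
different_token = vm_c.new_session_token() != vm_d.new_session_token()
```

The same reasoning applies to anything else captured at init: open connections, cached credentials, timestamps. If it must be fresh per session, create it in the hook, not before the snapshot.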
Does it kill the container? No, and the reason is even better.
The hype of the week is "containers are obsolete." They are not. Quite the opposite: Aidan Steele tested it and you can run Docker inside a MicroVM, with OS capabilities enabled. So the MicroVM does not kill the container, it is more isolated and still runs containers inside. The honest cut is different: there is one specific spot, running untrusted code in isolation, where you will no longer want to harden a container by hand. There the MicroVM wins. Everywhere else, the container is still king.
The details the docs leave out
Aidan Steele spent launch day poking at the service and found some really interesting things that are not in the official docs. I read it and figured it was worth bringing here:
You can get a shell into the MicroVM via the CreateMicrovmShellAuthToken API, with pty as a first-class citizen (Lambda Functions do not have it). Gold for IDE and coding-agent use cases.
Outbound UDP is blocked by default and DNS is a local stub, so DNS inside a container falls back to 8.8.8.8 and fails. The fix is to run with Lambda's DNS: docker run --dns 169.254.169.253, or go via VPC.
Lambda network connectors: a reified VPC config (subnets, security groups, an IAM role for the ENI) with its own lifecycle. The network team creates it, the developer just consumes it.
Performance (his tests): image build 2-3 min; RunMicrovm to RUNNING about 2s, plus 2s to serve; suspend and resume about 1s each.
What you take away
Lambda MicroVMs fills a real gap: VM-level isolation with near-instant launch and per-session state, which no single service delivered together.
It does not replace the Lambda Function, it complements it. Function in the backbone, MicroVM for the untrusted code.
The idle suspend is a deliberate cost lever: design your idle-policy on purpose.
Before locking in architecture: check the region (no São Paulo yet), the limits (ARM64, 16 vCPU, 32 GB, 8h), and the snapshot caveat.
This post was the map. In the next one in the series I actually spin up a MicroVM and we prove the isolation in practice, launching two MicroVMs and testing whether one can reach the other, with the repo on GitHub for you to run along.
Got a case where you run user or AI code that today is duct-taped onto a container or a hand-rolled VM? Does this primitive fit? Drop a like, share it with whoever is building a multi-tenant platform, and let's talk. Cheers! =D
Originally published on willpeixoto.dev.





