Reference architecture
This is the architecture we ran at Prism before building Herm, and it’s a reasonable starting point:| Component | Role |
|---|---|
| Host server | A long-lived instance (we used EC2) running an HTTP server (we used Hono) |
| Agent containers | One Docker container per customer, each running its own Hermes instance |
| Reverse proxy | The same server proxies messages between your app and each container’s Hermes gateway |
| Transport | A WebSocket connection per customer between your backend and their Hermes gateway |
| Storage | A persistent volume per container for the agent’s workspace, memory, and skills |
What you take on
The harness is free; the runtime is the work. Self-hosting means owning everything Herm’s API otherwise provides:- Provisioning and lifecycle — creating, recycling, and tearing down containers as customers come and go
- Durable execution — runs that survive your deploys, crashes, and host migrations
- Credential management — injecting connector secrets without putting them inside the sandbox, plus rotation
- Event transport — bridging each Hermes gateway’s WebSocket to whatever your product consumes (we expose SSE), with reconnection and replay
- Scheduling — keeping automations firing reliably, including retry semantics for runs that fail at 3am
- Observability — capturing per-run traces and surfacing agent activity to your team
- Upgrades — tracking Hermes releases across your whole container fleet, for every new harness feature your customers ask about
When self-hosting makes sense
- Data residency and compliance — customer data can’t leave your infrastructure or region
- Air-gapped or VPC-only environments — the agent must reach internal systems that no managed service can
- An existing platform team — you already operate container fleets and the runtime work lands on paved roads
- Deep harness customization — you’re patching Hermes itself, not just configuring it
deployments.create call.
Moving between self-hosted and managed
Nothing in Herm is a black box, so migration in either direction is file transfer, not re-engineering:- Memory and workspace are plain files. Export them with the Files API (managed → self-hosted) or copy them into a new deployment’s workspace (self-hosted → managed).
- Skills are markdown you already own — point
source: "file"orsource: "url"at them. - System prompts and MCP servers are configuration you bring in either case; your MCP servers don’t change at all.
- No model lock-in — the same
provider/modelchoice applies in both worlds.

