Skip to main content
Hermes is open source, and there’s no lock-in anywhere in Herm’s design — so self-hosting the whole stack is a legitimate option. This page describes the reference architecture, what you take on by running it yourself, and how to move between self-hosted and managed without losing anything.

Reference architecture

This is the architecture we ran at Prism before building Herm, and it’s a reasonable starting point:
ComponentRole
Host serverA long-lived instance (we used EC2) running an HTTP server (we used Hono)
Agent containersOne Docker container per customer, each running its own Hermes instance
Reverse proxyThe same server proxies messages between your app and each container’s Hermes gateway
TransportA WebSocket connection per customer between your backend and their Hermes gateway
StorageA persistent volume per container for the agent’s workspace, memory, and skills
The per-customer container is the load-bearing decision: it gives you data isolation, credential scoping, and blast-radius control by construction. Resist the temptation to multiplex customers onto one Hermes instance.

What you take on

The harness is free; the runtime is the work. Self-hosting means owning everything Herm’s API otherwise provides:
  • Provisioning and lifecycle — creating, recycling, and tearing down containers as customers come and go
  • Durable execution — runs that survive your deploys, crashes, and host migrations
  • Credential management — injecting connector secrets without putting them inside the sandbox, plus rotation
  • Event transport — bridging each Hermes gateway’s WebSocket to whatever your product consumes (we expose SSE), with reconnection and replay
  • Scheduling — keeping automations firing reliably, including retry semantics for runs that fail at 3am
  • Observability — capturing per-run traces and surfacing agent activity to your team
  • Upgrades — tracking Hermes releases across your whole container fleet, for every new harness feature your customers ask about
None of this is exotic engineering, but all of it is mandatory before the agent is production-grade — budget a quarter for a solid first version.

When self-hosting makes sense

  • Data residency and compliance — customer data can’t leave your infrastructure or region
  • Air-gapped or VPC-only environments — the agent must reach internal systems that no managed service can
  • An existing platform team — you already operate container fleets and the runtime work lands on paved roads
  • Deep harness customization — you’re patching Hermes itself, not just configuring it
If none of these apply, the managed API gets you the same stack in one deployments.create call.

Moving between self-hosted and managed

Nothing in Herm is a black box, so migration in either direction is file transfer, not re-engineering:
  • Memory and workspace are plain files. Export them with the Files API (managed → self-hosted) or copy them into a new deployment’s workspace (self-hosted → managed).
  • Skills are markdown you already own — point source: "file" or source: "url" at them.
  • System prompts and MCP servers are configuration you bring in either case; your MCP servers don’t change at all.
  • No model lock-in — the same provider/model choice applies in both worlds.

Talk to us

Running a hybrid setup, or self-hosting with a path to managed later? We’re happy to compare notes on the architecture either way — book a call or email rajit@prismvideos.com.