Sessions - Herm

A session is a conversation with a customer’s agent. Sessions are exposed in the API as threads: each thread maintains its own conversation history, and every unit of work the agent does on it is a run. Threads belong to a single deployment, so one customer’s sessions never share state with another’s.

Creating a thread

Every deployment is created with a default thread — for most products, that’s the customer’s ongoing conversation with their agent, and you never need another. Create additional threads for parallel or scoped conversations:

// Assumes `herm` is an already-initialized client and "dep_7xK9s2" is an existing deployment ID.
const thread = await herm.threads.create("dep_7xK9s2", {
  title: "Holiday campaign",
});

A new thread starts with empty conversation history, but memory and files carry over — the agent still knows the customer, it just isn’t continuing the previous chat. Within a thread, Hermes manages context and compaction automatically, so long conversations don’t degrade or overflow the model’s context window. See the Sessions API for listing, inspecting, renaming, and deleting threads.

Starting work

Creating a thread doesn’t start any work. Send a message to begin a run:

// Sending a message on the thread is what starts a run.
const run = await herm.deployments.messages.send("dep_7xK9s2", {
  thread_id: thread.thread_id,
  content: "Plan the holiday campaign hero video.",
});

The call returns immediately; the agent’s responses arrive on the Server-Sent Events (SSE) stream. See Send Message for attachments and threading options.
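
Because responses arrive as Server-Sent Events, a client has to split the stream into events before it can react to them. The following is a minimal sketch of that parsing step. The `event:` / `data:` framing is standard SSE and the `run_status` event name comes from this page, but the JSON payload fields (`run_id`, `status`) and the sample text are assumptions for illustration, not Herm's documented schema.

```typescript
// One parsed Server-Sent Event: the `event:` field and the joined `data:` lines.
interface SseEvent {
  event: string;
  data: string;
}

// Parse a block of SSE text into events. Events are separated by a blank line;
// an event with no explicit `event:` field defaults to "message" per the SSE spec.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        data.push(line.slice(5).replace(/^ /, "")); // drop one optional leading space
      }
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

// Hypothetical stream excerpt; the JSON fields are assumed, not documented.
const sample =
  'event: run_status\ndata: {"run_id":"run_1","status":"running"}\n\n' +
  'event: run_status\ndata: {"run_id":"run_1","status":"completed"}\n\n';

// Keep only run_status events and pull out the (assumed) status field.
const statuses = parseSse(sample)
  .filter((e) => e.event === "run_status")
  .map((e) => JSON.parse(e.data).status as string);
```

This sketch parses a complete string. A real client also has to buffer partial chunks across network reads, which the browser's built-in `EventSource` already does.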

Run lifecycle

A run starts when a message arrives or an automation fires, and moves through these statuses (delivered as run_status events):

Status	Meaning
`running`	The agent is actively reasoning and calling tools
`waiting_on_human`	The run paused on a steering request and is waiting for input
`completed`	The agent finished the task and went idle
`error`	The run failed — details arrive on the event stream
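
On the client side, the four statuses in the table can be modeled as a string union so the compiler checks that every case is handled. This is a sketch, not part of the SDK: the type and helper names are hypothetical, and treating `completed` and `error` as the only terminal statuses is an assumption drawn from the meanings in the table.

```typescript
// The four statuses listed in the table.
type RunStatus = "running" | "waiting_on_human" | "completed" | "error";

// Assumption: a run is over once it has completed or failed; the other two
// statuses mean more events may still arrive.
function isTerminal(status: RunStatus): boolean {
  return status === "completed" || status === "error";
}

// Map each status to something a client UI might do (illustrative only).
// The switch is exhaustive, so adding a status to RunStatus is a compile error here.
function describeStatus(status: RunStatus): string {
  switch (status) {
    case "running":
      return "show a progress indicator";
    case "waiting_on_human":
      return "prompt the customer for input";
    case "completed":
      return "show the result";
    case "error":
      return "show the failure details from the event stream";
  }
}
```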

Durability

Runs survive the failures that kill naive agent loops:

Crashes and deploys. An interrupted run resumes from where it stopped — it doesn’t replay from zero or lose the tool calls it already paid for.
Long waits. A run waiting on steering input genuinely sleeps. It can wait minutes or days without holding a worker or a connection, then continues as if no time passed.
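
The idea behind resuming without replaying can be illustrated with a result journal: each finished step records its result, and a resumed run skips anything already recorded. The sketch below is a generic illustration of that technique, not a description of how Herm implements durability. All names are hypothetical, and an in-memory `Map` stands in for durable storage.

```typescript
// Result journal keyed by step name. In a real system this would live in
// durable storage; a Map stands in for it here.
type Journal = Map<string, string>;

interface Step {
  name: string;
  run: () => string; // stands in for a tool call that costs time or money
}

// Run the steps in order, skipping any whose result is already journaled.
// `failAt` simulates a crash just before that step runs.
function execute(steps: Step[], journal: Journal, failAt?: string): string[] {
  for (const step of steps) {
    if (journal.has(step.name)) continue; // finished before the interruption
    if (step.name === failAt) throw new Error(`crashed before ${step.name}`);
    journal.set(step.name, step.run());
  }
  return steps.map((s) => journal.get(s.name) ?? "");
}

// Count how many times a "tool call" actually executes.
let toolCalls = 0;
const planSteps: Step[] = ["search", "draft", "render"].map((name) => ({
  name,
  run: () => {
    toolCalls += 1;
    return `${name}:done`;
  },
}));

const runJournal: Journal = new Map();
try {
  execute(planSteps, runJournal, "render"); // first attempt dies before "render"
} catch {
  // The journal still holds the results of "search" and "draft".
}
const results = execute(planSteps, runJournal); // resume: only "render" runs
```

In this sketch the three steps execute three tool calls in total across both attempts, because the resumed attempt reuses the two journaled results.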

Mid-run messages

Customers don’t wait their turn — they send corrections while the agent is still working. A message sent during an active run is folded into the run as steering input rather than corrupting state or being dropped. There’s nothing to configure; it’s how the runtime behaves.
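
One way to picture "folded into the run" is an inbox that the run drains between steps. The sketch below simulates that behavior with plain arrays. It is an assumption about the general pattern, not Herm's runtime, and the function, types, and arrival timing are all hypothetical.

```typescript
// A message that arrived while the run was active.
interface Steering {
  content: string;
}

// Simulate a run that works through planned steps and, between steps, drains
// any messages that arrived in the meantime into its working context.
// `arrivals[i]` holds the messages that arrive while step i is executing.
function runWithSteering(
  plan: string[],
  arrivals: Record<number, Steering[]>,
): string[] {
  const context: string[] = [];
  const inbox: Steering[] = [];
  plan.forEach((stepName, i) => {
    // Fold pending messages into the context before starting the next step.
    while (inbox.length > 0) {
      const msg = inbox.shift();
      if (msg) context.push(`steer: ${msg.content}`);
    }
    context.push(`step: ${stepName}`);
    inbox.push(...(arrivals[i] ?? []));
  });
  // Drain anything that arrived during the final step.
  for (const msg of inbox) context.push(`steer: ${msg.content}`);
  return context;
}

// A correction arrives while the first step ("outline") is running.
const transcript = runWithSteering(["outline", "script", "storyboard"], {
  0: [{ content: "Keep it under 30 seconds." }],
});
```

Here the correction lands in the context after `outline` and before `script`, so the later steps see it without the run restarting.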
