Home Automation Without the Workflow Tax Link to heading
I’ve been running a homelab for a while now. A Beelink mini PC, Podman, Caddy, the usual stack. I have Claude Code as an agent and recently connected it to Slack - I can ask it to restart pods, update configs, whatever. It works because there’s no UI between me and the machine. Just intent, expressed in natural language, executed by an agent that has tool access.
So when I turned to home automation recently, I had one requirement: no workflow builders.
The n8n Problem Link to heading
Every time you ask about home automation orchestration, someone pitches n8n. Or Node-RED. Or Home Assistant’s YAML automations. They all share the same assumption - that you need a visual canvas to wire triggers to actions.
I don’t. I have an agent that can reason about intent. I have a codebase I control. And most importantly, I have my most favoured thing in the wonderful world of software:
State Machines.
A garage door isn’t a workflow. It’s a finite state machine with five states, a handful of transitions, guards for safety, and timeouts for fault detection. Likewise, a thermostat is a state machine (off, idle, heating, and cooling). A security system is a state machine (unarmed, armed, triggered and alarming). Modelling these as drag-and-drop workflows is a category error.
The question is: what sits between the state machines and the consumers? The agent in Slack, a dashboard on my phone, sensors reporting over MQTT - they all need to talk to the same thing. I need an API contract.
One Envelope, Every Consumer Link to heading
I’d already been working on OpenCALL - an operation-based API spec designed for a world where humans and agents are equal consumers. The core idea: one endpoint (POST /call), one envelope, one contract. No REST resource hierarchies, no separate agent protocol.
It turns out home automation is a perfect domain for it. Here’s why.
An agent doesn’t think in PUT /devices/garage-door. It thinks: open the garage door. OpenCALL models that directly:
{
"op": "v1:garage.open",
"args": {},
"ctx": {
"requestId": "a1b2c3d4-...",
"sessionId": "slack-daniel-20260218",
"idempotencyKey": "garage-open-20260218T143022Z"
}
}
The operation name carries the intent. The args carry the parameters (if needed). The context handles correlation, idempotency, and tracing. Same envelope whether it’s Claude calling from Slack, a button press on a dashboard, or a routine triggered by a location update from my phone.
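The envelope can be captured as a small TypeScript type. This is a sketch inferred from the examples in this post - any field not shown in the JSON above is an assumption, not the spec's word:

```typescript
// Hypothetical TypeScript shape for the OpenCALL envelope, inferred from the
// examples in this post. Optionality of ctx fields is an assumption.
interface CallContext {
  requestId: string;       // correlation across the async lifecycle
  sessionId?: string;      // e.g. "slack-daniel-20260218"
  idempotencyKey?: string; // required for side-effecting operations
}

interface CallEnvelope<Args = Record<string, unknown>> {
  op: string;   // versioned operation name, e.g. "v1:garage.open"
  args: Args;   // operation parameters, validated against the registry schema
  ctx: CallContext;
}

// The Slack-triggered garage open from above, as a typed value:
const openGarage: CallEnvelope = {
  op: "v1:garage.open",
  args: {},
  ctx: {
    requestId: "a1b2c3d4-0000-0000-0000-000000000000",
    sessionId: "slack-daniel-20260218",
    idempotencyKey: "garage-open-20260218T143022Z",
  },
};
```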
State Machines as Controllers Link to heading
The plan is that every device in the house becomes an XState actor. Take the garage door:
States: closed → opening → open → closing → closed
any state → fault
Events: OPEN_REQUESTED, CLOSE_REQUESTED, SENSOR_OPEN, SENSOR_CLOSED, OBSTRUCTION_DETECTED
Guards: notObstructed
Timeout: opening/closing → fault after 30s
open → auto-close after 10 min
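That list translates almost mechanically into an XState v5 machine config. Here it is as plain data - in the real system it would be passed to `setup({ guards, actions }).createMachine(...)`. The state, event, guard, and action names come from this post; the `RESET` recovery event is my own assumption:

```typescript
// Garage door machine config in XState v5 shape, shown as plain data.
// States, events, guards, and timeouts match the list above; RESET is
// a hypothetical manual-recovery path out of fault.
const garageDoorConfig = {
  id: "garageDoor",
  initial: "closed",
  context: { obstructionDetected: false },
  states: {
    closed: {
      on: {
        // Only start opening when the notObstructed guard passes.
        OPEN_REQUESTED: { target: "opening", guard: "notObstructed", actions: "activateRelay" },
      },
    },
    opening: {
      on: { SENSOR_OPEN: { target: "open" }, OBSTRUCTION_DETECTED: { target: "fault" } },
      after: { 30_000: { target: "fault" } }, // fault if the sensor never fires
    },
    open: {
      on: { CLOSE_REQUESTED: { target: "closing", guard: "notObstructed", actions: "activateRelay" } },
      after: { 600_000: { target: "closing" } }, // auto-close after 10 min
    },
    closing: {
      on: { SENSOR_CLOSED: { target: "closed" }, OBSTRUCTION_DETECTED: { target: "fault" } },
      after: { 30_000: { target: "fault" } },
    },
    fault: {
      on: { RESET: { target: "closed" } }, // manual recovery (assumption)
    },
  },
} as const;
```

Because it is structured data, this is exactly the kind of definition an agent can generate, diff, and reason about.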
The OpenCALL server dispatches operations to these actors. The mapping is direct - an op name routes to an actor and an event:
const controllers = {
"v1:garage.open": {
actor: "garageDoor",
event: "OPEN_REQUESTED",
executionModel: "async",
completionState: "open",
},
"v1:garage.close": {
actor: "garageDoor",
event: "CLOSE_REQUESTED",
executionModel: "async",
completionState: "closed",
},
"v1:garage.getState": {
actor: "garageDoor",
executionModel: "sync",
handler: (actor) => ({
state: actor.getSnapshot().value,
obstructionDetected: actor.getSnapshot().context.obstructionDetected,
}),
},
};
Sync operations read state and return immediately. Async operations send an event to the actor, return 202 Accepted, and the caller polls until the state machine reaches its target state or faults.
This is where OpenCALL’s execution lifecycle maps directly to the state machine. The spec defines a forward-only operation lifecycle: accepted → pending → complete | error. An XState machine only permits the transitions you declare, so a device can no more jump to an illegal state than an operation can. They’re the same idea expressed at different layers.
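The dispatch logic that sits on top of the controller mapping can be sketched in a few lines. The actor and persistence interfaces here are stubs of my own - the post doesn’t pin them down - but the sync/async split and the accepted-response shape follow the examples:

```typescript
// Minimal dispatch sketch. DeviceActor is an illustrative stand-in for an
// XState actor; persistence of the operation instance is elided.
type Snapshot = { value: string; context: Record<string, unknown> };
interface DeviceActor {
  send(event: { type: string }): void;
  getSnapshot(): Snapshot;
}

type Controller =
  | { actor: string; event: string; executionModel: "async"; completionState: string }
  | { actor: string; executionModel: "sync"; handler: (a: DeviceActor) => unknown };

function dispatch(
  controllers: Record<string, Controller>,
  actors: Record<string, DeviceActor>,
  envelope: { op: string; ctx: { requestId: string } },
) {
  const controller = controllers[envelope.op];
  if (!controller) return { state: "error", error: { code: "UNKNOWN_OP" } };
  const actor = actors[controller.actor];

  if (controller.executionModel === "sync") {
    // Sync: read state and return immediately.
    return { state: "complete", result: controller.handler(actor) };
  }

  // Async: fire the event at the actor and hand back a pollable instance.
  actor.send({ type: controller.event });
  return {
    requestId: envelope.ctx.requestId,
    state: "accepted",
    location: { uri: `/ops/${envelope.ctx.requestId}` },
    retryAfterMs: 1000,
  };
}
```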
The Garage Door, End to End Link to heading
I already have a wifi-connected garage door, along with an AC controller, solar inverter, and a couple of smart locks. These are the starting point. Here’s how the flow will work once the OpenCALL server is up, using the garage door as the example.
I type @bot open the garage door in Slack (and yes, voice control is on the roadmap).
1. Agent discovers capabilities
On startup, Claude fetches GET /.well-known/ops and gets the full operation registry - every operation, every argument schema, every execution model. It knows v1:garage.open exists, that it’s side-effecting, that it requires an idempotency key, and that it’s async.
2. Agent invokes the operation
Claude sends POST /call with the envelope above.
3. Server dispatches to the state machine
The server validates the envelope against the registry schema, looks up the controller for v1:garage.open, and sends OPEN_REQUESTED to the garage door actor. The actor checks the notObstructed guard, transitions from closed to opening, and fires its activateRelay action - which publishes an MQTT message to the relay topic.
The server creates an operation instance, persists it, and returns:
{
"requestId": "a1b2c3d4-...",
"state": "accepted",
"location": { "uri": "/ops/a1b2c3d4-..." },
"retryAfterMs": 1000
}
Claude polls GET /ops/a1b2c3d4-... every second and gets:
{
"requestId": "a1b2c3d4-...",
"state": "pending",
"result": { "state": "opening", "triggeredBy": "agent:claude-slack" }
}
4. Physical world happens
The MQTT message hits Mosquitto, the relay fires, the garage door motor starts. After a short time the contact sensor detects the door is open and publishes back to MQTT. The bridge translates that into a SENSOR_OPEN event and sends it to the actor.
The actor transitions from opening to open. The operation instance is updated to complete.
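The bridge’s translation step is just a mapping from MQTT topics to actor events. A sketch - the topic names are my own placeholders, since the post doesn’t specify them; only the event names come from the machine definition:

```typescript
// Sketch: translate inbound MQTT sensor messages into state machine events.
// Topic names are hypothetical; the events match the garage door machine.
const sensorTopicMap: Record<string, { actor: string; event: string }> = {
  "sensors/garage/contact/open":   { actor: "garageDoor", event: "SENSOR_OPEN" },
  "sensors/garage/contact/closed": { actor: "garageDoor", event: "SENSOR_CLOSED" },
  "sensors/garage/obstruction":    { actor: "garageDoor", event: "OBSTRUCTION_DETECTED" },
};

// Returns the actor/event pair for a topic, or null for unmapped topics.
function translate(topic: string): { actor: string; event: string } | null {
  return sensorTopicMap[topic] ?? null;
}
```

Adding a new sensor means adding a row to this map - which, again, is structured data an agent can write.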
5. Agent polls and responds
Claude polls GET /ops/a1b2c3d4-... and gets:
{
"requestId": "a1b2c3d4-...",
"state": "complete",
"result": { "state": "open", "triggeredBy": "agent:claude-slack" }
}
Claude responds in Slack: “Garage door is open.”
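The poll-until-settled loop in steps 3 to 5 reduces to a few lines. Here’s a sketch with the HTTP GET abstracted behind an injected function, so the loop itself stays transport-agnostic; the `getOp` signature is my assumption:

```typescript
// Poll an operation instance until it settles (complete or error).
// `getOp` stands in for GET /ops/{requestId}.
type OpInstance = {
  requestId: string;
  state: "accepted" | "pending" | "complete" | "error";
  result?: unknown;
  error?: unknown;
};

async function pollUntilSettled(
  getOp: (requestId: string) => Promise<OpInstance>,
  requestId: string,
  retryAfterMs = 1000, // honour the server's retryAfterMs hint
  maxAttempts = 60,
): Promise<OpInstance> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const op = await getOp(requestId);
    if (op.state === "complete" || op.state === "error") return op;
    await new Promise((resolve) => setTimeout(resolve, retryAfterMs));
  }
  throw new Error(`Operation ${requestId} did not settle in time`);
}
```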
6. Fault handling
If the contact sensor never fires within 30 seconds, the XState timeout transitions the actor to fault. The operation instance becomes state: "error":
{
"requestId": "a1b2c3d4-...",
"state": "error",
"error": {
"code": "DEVICE_TIMEOUT",
"message": "Garage door did not reach open state within 30s",
"cause": { "device": "garageDoor", "expectedState": "open" }
}
}
Claude tells me something went wrong. Structured, machine-readable, no HTTP status code guessing. At least I’ll know it’s not open when I come up the driveway in the car.
MQTT Is Already a First-Class Binding Link to heading
This is where the spec earns its keep. OpenCALL defines an MQTT transport binding - envelope published to ops/{op}, responses on ops/{requestId}/response. The sensors and devices on my network won’t be second-class citizens adapted into an HTTP world. They’ll speak the same protocol natively.
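Under that binding, deriving the topics from an envelope is purely mechanical - a tiny sketch using only the two topic templates described above:

```typescript
// Topic construction for the MQTT binding as described:
// envelopes published to ops/{op}, responses on ops/{requestId}/response.
function requestTopic(op: string): string {
  return `ops/${op}`;
}

function responseTopic(requestId: string): string {
  return `ops/${requestId}/response`;
}
```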
A temperature sensor can report in by publishing to ops/v1:sensors.report:
{
"op": "v1:sensors.report",
"args": {
"sensorId": "living-room-temp",
"zone": "living_room",
"type": "temperature",
"value": 23.5,
"unit": "celsius"
},
"auth": {
"iss": "homelab.local",
"sub": "device:arduino-livingroom",
"credentialType": "apiKey",
"credential": "..."
}
}
Same envelope. Same validation. The auth block is in the envelope because MQTT doesn’t have an Authorization header - exactly as the spec prescribes. The server receives it, updates state machines, pushes to any active stream subscriptions.
The Dashboard Generates Itself Link to heading
Here’s the thing about /.well-known/ops - it’s not just for agents. It’s a complete schema for every operation in the system. A dashboard doesn’t need its own API. It fetches the registry on load and renders controls from it.
Device controls come from operations with sideEffecting: true. Sensor feeds come from operations with executionModel: "stream". The argsSchema tells you what inputs to render. The resultSchema tells you what to display.
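Partitioning the registry into dashboard sections is a one-pass filter. A sketch - the field names (`sideEffecting`, `executionModel`, the schemas) come from this post, while the overall registry entry shape is my assumption:

```typescript
// Sketch: derive dashboard sections from the operation registry.
type OpEntry = {
  name: string;
  sideEffecting?: boolean;
  executionModel: "sync" | "async" | "stream";
  argsSchema?: unknown;   // drives which inputs to render
  resultSchema?: unknown; // drives what to display
};

function partitionRegistry(ops: OpEntry[]) {
  return {
    controls: ops.filter((o) => o.sideEffecting === true),   // buttons, forms
    feeds: ops.filter((o) => o.executionModel === "stream"), // live charts
    reads: ops.filter((o) => !o.sideEffecting && o.executionModel !== "stream"),
  };
}
```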
A POST /call with v1:sensors.subscribe returns a stream object pointing to a WebSocket. The dashboard connects and renders frames as they arrive - raw encoded data, no per-frame envelope overhead. That’s defined in the spec too. Streams are one-way, server to client. If the dashboard needs to send a command, that’s a separate POST /call.
The dashboard and the agent become functionally identical consumers. They discover capabilities the same way, invoke operations the same way, and receive results the same way. The only difference is one renders pixels and the other renders sentences.
Routines Are Just Orchestrator Actors Link to heading
A “leaving home” routine isn’t a workflow. It’s a parent actor that coordinates child actors:
{
"op": "v1:routine.execute",
"args": { "routine": "leaving_home" },
"ctx": {
"requestId": "...",
"idempotencyKey": "leaving-home-20260218T180000Z"
}
}
The routine actor sends CLOSE_REQUESTED to the garage door, ARM to the alarm, ALL_OFF to the lighting controller. It waits for all child actors to reach their target states. If one faults, the parent knows - because XState actors communicate. The final result includes the status of every step.
This gives you something that n8n and friends struggle with: proper fault handling across coordinated operations. If the garage door is obstructed, the routine doesn’t silently continue. It reports the failure in a structured result, and the agent can tell you which step failed and why.
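The coordination logic can be sketched as follows. In the real system this would be an XState parent actor; here the children are stubbed behind a minimal interface of my own, and the fan-out/wait is a plain async function:

```typescript
// Routine coordinator sketch (stand-in for an XState parent actor).
// Fires every step's event, then waits for every child's target state.
type Step = { actor: string; event: string; targetState: string };

interface Child {
  send(event: { type: string }): void;
  waitFor(state: string, timeoutMs: number): Promise<boolean>; // true = reached
}

async function runRoutine(
  steps: Step[],
  children: Record<string, Child>,
  timeoutMs = 30_000,
) {
  const results = await Promise.all(
    steps.map(async (step) => {
      const child = children[step.actor];
      child.send({ type: step.event });
      const reached = await child.waitFor(step.targetState, timeoutMs);
      return { actor: step.actor, event: step.event, ok: reached };
    }),
  );
  // Structured result: every step's outcome, not just pass/fail.
  return { ok: results.every((r) => r.ok), steps: results };
}
```

If the garage door faults, `ok` is false and the `steps` array names exactly which child failed - the structured result the agent reads back to you.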
The Agent Builds the House Link to heading
Here’s the part that ties it all together. Because every device is a state machine and every interaction is an OpenCALL operation, Claude Code can build the entire system for me. The machine definitions are structured data. The controller mappings are declarative. Adding a new device is a well-defined task: define the states, define the transitions, register the operations, wire up the MQTT topics. Once Claude has done it for the first device, it can see the pattern and do it again for every device after that.
If I buy a new motion sensor for the back door, I don’t need to open a workflow editor and start wiring nodes. I tell Claude I want the back landing light to come on when someone walks past, but only after dark. Claude generates the machine definition, the controller mapping, and the MQTT bridge config. It works out the sunset time for the current day of the year, or better yet, reads the solar irradiance from the panels to determine whether there’s actually enough daylight. I don’t need to go and look up which feeds provide that data. Claude has access to the registry; it can discover the operations, read the schemas, and wire everything together. One conversation, one commit, done.
But it goes further than just building what I ask for. Because the agent has tool access to external data, it can make decisions that a static automation never would.
Take the home battery. The battery state of charge is a stream feed, and the state machine can have a hook that fires when the charge drops below 20%. In a traditional setup, that hook would send me a notification. But why tell me? The agent has more data than I do and can reason about it faster. So instead, the hook calls Claude.
Claude checks tomorrow’s weather forecast. Wall-to-wall cloud and rain - no solar coming to fill the battery. It checks the current tariff schedule - off-peak runs until 6am. It checks the forecast high for tomorrow - 18 degrees, so the heater is probably going to be working hard. Without intervention, I’ll be pulling from the grid at peak rate after 4pm when I start cooking dinner.
So the agent sends v1:battery.charge with a target state of charge and a deadline, and the battery controller schedules charging on the overnight off-peak tariff. Not because someone built a “check weather then charge battery” workflow, but because an agent with access to a weather API and the operation registry reasoned about the situation and acted on my behalf. The same agent that opens my garage door can also save me money on my electricity bill, using the same protocol, the same envelope, the same state machines.
That’s the difference between a workflow and an agent with a contract. A workflow does what you told it to do. An agent does what makes sense. And in reality, that’s probably why Peter Steinberger wrote Clawdbot/OpenClaw - Your assistant. Your machine. Your rules.
The Stack Link to heading
One pod. One Bun service to start with.
- XState v5 - state machine runtime and actor system
- Hono - HTTP server, handles POST /call, GET /ops/{id}, GET /.well-known/ops
- MQTT.js + Mosquitto container - device communication via the MQTT binding
- Zigbee2MQTT - Zigbee sensor and device bridge (once I get the Zigbee modules set up)
- ESP32/Arduino/Raspberry Pi - custom sensors and relays
- PostgreSQL - operation instance persistence and event log (SQLite works, but I have a DB running in the homelab)
- React SPA - dashboard, served from the same container behind Caddy
The OpenCALL server is probably 2,000-3,000 lines of TypeScript. Most of that is machine definitions, which are structured data - JSON that Claude can read, reason about, and generate.
Why This Works Link to heading
The reason this architecture holds together is that OpenCALL doesn’t care who’s calling. The spec was designed from the start for a world where humans and agents are equal consumers. Home automation sits squarely in that world, bridging the physical and the digital.
I already have the pieces: a bot that works, a Slack connection that works, wifi devices that can be controlled, and sensors that can report. What I’m building now is the layer that connects them all through a single contract. Once the first device is wired up through OpenCALL, every device after that follows the same pattern. And once the Zigbee modules are running, the number of things that can participate in this system grows dramatically - motion sensors, contact sensors, environmental monitors, smart plugs, the lot.
When I say @bot open the garage door, Claude will discover the operation from the registry, construct a valid envelope, send it, poll for completion, and report back. No custom integration required - just one tool, call, and a self-describing registry to ground against.
One protocol. One contract. Every consumer. That’s the pitch of OpenCALL, and home automation is where I intend to prove it works outside of a web page.
OpenCALL is an open specification. The full spec, test suite, and reference implementations are at github.com/dbryar/call-api.