Software Will Eat the World

There’s a narrative floating around tech circles that goes something like this:

AI agents are going to replace software developers. Boards are cutting headcount. Junior roles are disappearing. The robots are coming.

It’s wrong. Not because agents aren’t capable - they absolutely are - but because the framing misunderstands what developers actually do, and more importantly, what they should be doing.

The SDLC Has Always Been Backwards

Every software team has a pecking order. At the top, a handful of senior developers and architects get to do the genuinely interesting work - designing new features, making architectural decisions, shaping the product. Below them, everyone else is punching tickets.

Bug fixes. Style changes. Tech debt. QA rework because someone forgot to mention a requirement in the original spec. “Make the button more round.” “The banner text wraps on mobile.” “This endpoint returns 500 when the input is empty.” You know the work. You’ve done the work. We’ve all done the work.

Every team has that repo. The one no-one wants to touch. The one with 200 tech debt tickets in the backlog that have been there so long they’ve become load-bearing. The one where a simple change triggers a cascade of test failures in code that hasn’t been meaningfully reviewed in two years, sometimes five.

This is the reality of running a software product. Roughly 80% of the development effort goes into maintenance, bug fixes, and incremental changes. The remaining 20% - the part that actually moves the product forward - is rationed out to whoever earned the privilege.

That’s backwards. Humans are creative problem solvers. We’re good at ambiguity, trade-offs, design thinking, and knowing when a requirement doesn’t make sense. We’re terrible at repetitive, well-defined tasks that require precision and patience. We get bored. We cut corners. We introduce regressions because we’re thinking about the interesting problem we’d rather be solving.

Agents don’t get bored.

Realigning the Human Developer

The question isn’t “can agents replace developers?” It’s “what should developers be doing in the first place?”

If you accept that 80% of the SDLC is well-defined, repeatable work - work that follows known patterns, has clear acceptance criteria, and can be validated against a test suite - then you have to ask why humans are doing it at all. Not because humans can’t do it, but because it’s a misallocation of the most expensive and creative* resource on the team.

This is what my co-founder and I have been designing for the past few weeks. Not a tool that replaces developers, but a system that realigns them. We’re calling it the Forge.

The pitch is simple: the Forge handles “run the product” so humans can focus on “build the product.”

Twelve Agents, One Team

The Forge is a software development team in a box: twelve autonomous agents, each with a specific role, orchestrated through a state machine that models the standard SDLC. The agents are built on the OpenClaw/Nanoclaw pattern - lightweight, containerised agents that interact with frontier models via the Claude Code SDK.

There are five roles in the Forge: the boss, planners, writers, reviewers, and testers. In practice these roles are executed by twelve containerised agents configured with different skill sets depending on their current assignment.

Nine agents operate as general workers, dynamically assigned as planners or writers depending on the stage of the workflow. One agent acts as a dedicated reviewer, one as a tester, and a single boss agent coordinates the system and enforces workflow transitions.

The real innovation in the Forge isn’t the agents themselves. It’s that the SDLC itself is encoded as a state machine that the agents must obey.

The State Machine

The entire flow - from issue creation to completed PR - is a state machine. Not a loose pipeline with manual handoffs, but a formally defined automation of the standard SDLC.

```mermaid
flowchart TD
    A([triggered]) --> B{triage}
    B -->|rejected| R([respond])
    B -->|accepted| C([specify])
    C --> D([design])
    D --> E([execute])
    E --> F([blocked])
    F -->|response| E
    E --> G{review}
    G -->|failed| H([rework])
    G -->|passed| I{test}
    I -->|failed| H
    I -->|passed| J([complete])
    H --> D
    style A fill:#4a9eff,color:#fff
    style B fill:#f5a623,color:#fff
    style C fill:#4a9eff,color:#fff
    style D fill:#4a9eff,color:#fff
    style E fill:#4a9eff,color:#fff
    style F fill:#e74c3c,color:#fff
    style G fill:#f5a623,color:#fff
    style H fill:#e74c3c,color:#fff
    style I fill:#f5a623,color:#fff
    style J fill:#2ecc71,color:#fff
    style R fill:#95a5a6,color:#fff
```
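To make the idea concrete, here is a minimal sketch of how a state machine like this could be encoded and enforced. The names mirror the diagram; none of this is the Forge's actual implementation:

```python
# Hypothetical encoding of the SDLC state machine from the diagram.
# Each state maps to the set of states it may legally transition into.
VALID_TRANSITIONS = {
    "triggered": {"triage"},
    "triage": {"respond", "specify"},  # rejected -> respond, accepted -> specify
    "specify": {"design"},
    "design": {"execute"},
    "execute": {"blocked", "review"},
    "blocked": {"execute"},            # resumes once a response arrives
    "review": {"rework", "test"},
    "rework": {"design"},
    "test": {"rework", "complete"},
}


class SDLCStateMachine:
    def __init__(self) -> None:
        self.state = "triggered"
        self.log: list[tuple[str, str]] = []  # every transition is recorded

    def transition(self, new_state: str) -> None:
        """Move to new_state, or raise if the SDLC doesn't allow it."""
        if new_state not in VALID_TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.log.append((self.state, new_state))
        self.state = new_state
```

The point of the formal encoding is that an agent cannot skip a gate: a writer cannot jump from `execute` to `complete` without passing through `review` and `test`, because the machine simply refuses the transition.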

The boss monitors the state machine. Every transition is logged. Every agent interaction is recorded. When the state machine reaches complete, the PR is added to the issue and the human is notified.

The issue is now fully reviewed, fully tested, ready for final (human) approval and merge to staging.

The Boss

The boss is the “team leader”. It is triggered by events from three sources: state transitions, Slack (@forge), and the Git issue board. As soon as an issue lands - a bug report, a feature request, a QA failure - the boss is notified.

The boss reads the issue and runs a triage skill. It responds directly on the issue with one of three outcomes: accepted, rejected, or needs more information. If the request contradicts existing documentation or represents a significant architectural change, the boss escalates to a human for a decision. It doesn’t guess. It asks.

Once an issue is accepted, the boss sets up the business outcomes needed to deliver it. For a bug fix, that’s a reproduction case and an expected behaviour. For a feature, that’s acceptance criteria derived from the request. Then the issue hits the queue.

From here, the boss orchestrates. It assigns workers, monitors progress, handles blockers, and manages the token burn rate across all active agents. That last part matters - it’s the guardrail against runaway or looping agents. If a worker is burning through tokens without making progress, the boss notices and intervenes.
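A guardrail like that can be sketched as a simple burn monitor: track tokens spent per worker since its last progress event, and flag anyone over a budget. The class, threshold, and event names here are invented for illustration:

```python
# Illustrative runaway-agent guardrail: flag a worker whose token spend
# keeps climbing while its task makes no progress. Threshold is invented.
class BurnMonitor:
    def __init__(self, max_tokens_without_progress: int = 50_000) -> None:
        self.max_tokens = max_tokens_without_progress
        self.spent: dict[str, int] = {}  # worker_id -> tokens since last progress

    def record_usage(self, worker_id: str, tokens: int) -> None:
        self.spent[worker_id] = self.spent.get(worker_id, 0) + tokens

    def record_progress(self, worker_id: str) -> None:
        self.spent[worker_id] = 0  # the task advanced; reset the meter

    def runaway_workers(self) -> list[str]:
        """Workers burning tokens past budget with nothing to show for it."""
        return [w for w, t in self.spent.items() if t > self.max_tokens]
```

The boss polls `runaway_workers()` on each state transition and intervenes - reassigning the task, or pausing the worker - before the loop gets expensive.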

Because agents are containers, the boss has a neat trick for resource management. When a worker is blocked - waiting for another agent to unblock a dependency, or waiting for a human decision - the boss can pause the container and recover the compute resources. When the blocker is resolved, the container resumes exactly where it left off.

The Workers

Workers are the hands. Each one is capable of checking out a branch, working with a frontier model via API calls, and filling the role of a junior developer. They have all the tools they need for development tasks and a skill set that is 100% focussed on delivery.

A worker can be configured in one of three modes:

Planners read the requirements and existing documentation, then produce a delivery plan. They update the docs, break the work into discrete tasks, and create the task list for execution. Critically, planners also write the tests - acceptance tests derived from the business outcomes, written before any implementation begins. TDD or bust. A planner’s output is a set of well-defined, ordered tickets with clear acceptance criteria, and a test suite that defines “done” before a single line of production code is written.

Writers execute the tasks. One branch, one task, one focused delivery. They write code, run it locally, and push when they believe the task is complete. If they hit a problem they can’t solve, they don’t spin in circles. They post a message back to the issue - a real, readable message explaining what they tried and what they need. The boss picks that up and dispatches help.

Reviewers are the quality gate. When a writer marks a task as complete, the reviewer runs a fast-fail pipeline:

  1. Lint and compile - does the code pass the basics?
  2. Unit tests and coverage - is the new code covered by unit tests, and does the full suite pass?
  3. Acceptance criteria - does the output match what was asked for?
  4. Security review - are there any obvious vulnerabilities?
  5. Code quality - does it meet the project’s standards?

The gates are ordered by cost, in both CPU time and token count. If the code doesn’t compile, there’s no point checking acceptance criteria. If it fails security review, there’s no point running a quality pass. The reviewer exits at the first failure and tells the writer to try again. Fast feedback, minimal token waste.
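The fast-fail pipeline is just an ordered list of gates with a short-circuit. A sketch, with stand-in checks in place of the real lint/test/review tooling:

```python
# Sketch of a fast-fail review pipeline: gates run in cost order and the
# first failure short-circuits everything after it. Checks are stand-ins.
def review(change: dict, gates: list) -> tuple[bool, str]:
    """Run gates in order; return (passed, feedback from first failure)."""
    for name, check in gates:
        ok, feedback = check(change)
        if not ok:
            return False, f"{name}: {feedback}"  # stop; don't burn tokens on later gates
    return True, "all gates passed"


# Cheapest checks first: compiling is CPU-only, quality review costs tokens.
GATES = [
    ("lint",       lambda c: (c["compiles"], "fix compile errors")),
    ("unit tests", lambda c: (c["tests_pass"], "suite is red")),
    ("acceptance", lambda c: (c["meets_criteria"], "output does not match the spec")),
    ("security",   lambda c: (c["secure"], "obvious vulnerability found")),
    ("quality",    lambda c: (c["clean"], "below project standards")),
]
```

The feedback string from the first failing gate is what gets posted back to the writer, so the rework message is always specific to the cheapest thing that's wrong.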

The Testers

Once a task passes review, the tester takes over. Testers are a different breed of agent - they have browser automation tooling for frontend changes and can observe all backend calls and logs during test execution.

The tester runs two rounds:

  1. Business outcome tests - does the change actually deliver what the issue asked for? These are the tests written specifically for this piece of work.
  2. Regression tests - does the change break anything else? This is the full suite.

If both pass, the task is marked as done and sent back to the boss. If either fails, the tester reports what failed and the cycle continues.
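The two-round structure is deliberately asymmetric: the cheap, targeted outcome tests run before the expensive full suite. A sketch (function and field names are illustrative):

```python
# Illustrative two-round tester: business-outcome tests first, then the
# full regression suite. Either failure sends the task back into the cycle.
def run_tester(outcome_tests: list, regression_tests: list) -> dict:
    failed = [t.__name__ for t in outcome_tests if not t()]
    if failed:
        return {"status": "failed", "round": "business outcomes", "failures": failed}
    failed = [t.__name__ for t in regression_tests if not t()]
    if failed:
        return {"status": "failed", "round": "regression", "failures": failed}
    return {"status": "done", "round": None, "failures": []}
```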

Example Issue

Incoming issue from Slack

“@forge Customer portal banner text wraps incorrectly on mobile when viewed on iPhone”

Forge flow:

  1. Boss triages issue, accepts as bug
  2. Boss writes an issue in Git since the message originated from Slack
  3. Planner specifies design, writes mobile viewport acceptance test
  4. Writer updates SCSS
  5. Reviewer fails build due to hallucinated CSS variable
  6. Writer corrects
  7. Tester runs browser automation with mobile view set to iPhone dimensions
  8. PR generated, link posted in Slack as a response to original request
  9. Human approves PR, pipeline runs, issue resolved.

Agents That Talk to Each Other

Here’s the part that makes the Forge more than just a pipeline with AI steps. The agents communicate through a durable message store - specifically, through comments on the issue itself.

When a writer hits a blocker, they don’t silently fail. They post a message:

“I’m blocked. The banner messages can’t be upgraded until the BFF API contract supports a flag in error messages for customerVisible.”

That message is visible to everyone - agents and humans alike. The boss picks it up, reads it, understands the dependency, and dispatches a worker to address the underlying issue. Meanwhile, the blocked worker’s container is paused. No wasted compute, no busy-waiting.

This is what makes the system debuggable. Every decision, every blocker, every handoff is recorded in the issue thread. If a human wants to understand why a task took three attempts, they can read the conversation. It’s the same visibility you’d get from a well-run human team that communicates in writing - except the agents actually do it every time, without exception.

There’s an unexpected side benefit here: agents are relentlessly polite. Review feedback between humans often degrades into terse, unhelpful shorthand - or worse. An agent reviewer doesn’t write “this is broken, where did you learn CSS?” It writes “the requirement calls for a pill-shaped button, however on XL buttons the fixed radius does not fully cover the entire side - the radius should be a calc based on the button’s height.” Fewer messages, more substance, zero ego. The rework cycle actually gets faster because the feedback is precise enough to act on without interpretation.

The dialogue is also how the boss manages priorities. If three workers are blocked on the same upstream change, the boss can see that and prioritise accordingly. If a reviewer keeps failing the same writer on the same issue, the boss can read the feedback loop and decide whether to reassign or escalate.
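Because blockers live in a durable store, prioritisation can be as simple as counting them. A sketch of how the boss might surface the dependency blocking the most workers - the message shape here is invented:

```python
# Sketch: the boss scans blocker messages across open issues and ranks
# upstream dependencies by how many workers they block. Format is invented.
from collections import Counter


def top_blockers(blocker_messages: list[dict]) -> list[tuple[str, int]]:
    """Return (dependency, blocked-worker count) pairs, most-blocking first."""
    counts = Counter(m["blocked_on"] for m in blocker_messages)
    return counts.most_common()
```

A dependency blocking three paused workers jumps the queue ahead of a fresh feature request, because resolving it resumes three containers at once.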

What’s Left for Humans

Everything that matters.

The Forge doesn’t touch architecture decisions. It doesn’t redesign your data model. It doesn’t decide whether to migrate from REST to GraphQL or choose between a monolith and microservices. It doesn’t negotiate requirements with stakeholders or push back on a product decision that doesn’t make technical sense.

What it does is eliminate the grind. The tech debt tickets. The “make this button more round” changes. The QA failures from incomplete specs. The bug fixes in code nobody wants to maintain. The trivial but time-consuming churn that eats 80% of a team’s capacity.

What’s left is the actual interesting part of the SDLC - the feature development and architecture work that improves the product. The work that requires human judgement, creativity, and the ability to say “this requirement doesn’t make sense and here’s why.”

The Forge gives senior developers their time back. But “time back” doesn’t just mean more hours for product work. Part of being a senior developer is making sure your juniors are on their way to mid, and that your mids are capable of becoming seniors. You can’t do that when everyone on the team is punching tickets and triaging customer complaints, hunting down a missing order that CS has raised for the tenth time this month.

Time is far more valuable invested in raising capability than in shipping another patch. The Forge frees up the space for mentoring, for pair programming on the hard problems, for the kind of knowledge transfer that turns a good team into a great one. Instead of reviewing twenty PRs for minor fixes, you review the one PR that changes the authentication architecture - and you do it with the developer who wrote it, so they understand the trade-offs next time.

That’s realignment, not replacement. Developers in 2026 should be positioned where they create the most value: in the product, and in each other. You can’t vibe code if the vibe around the office is dead.

What’s Next

The pieces are built and tested individually - Slack integration, Git automation, the agents, the dialogue layer, the execution skills. What’s left is wiring the end-to-end flow into a single autonomous pipeline and letting it loose on a real backlog.

There’s more to come. Shared memory across agents is the killer feature - a lesson learnt by one worker is immediately available to every other. This is on the roadmap and will change the game again. But that’s a post for another day.

The point of all of this isn’t to build a machine that replaces developers. It’s to stop wasting them. The industry has spent years asking “how do we get more output from our team?” when the better question was “what should our team actually be working on?”

Software teams were never supposed to spend most of their time fixing yesterday’s code. The Forge exists so they don’t have to anymore. The Forge unlocks the potential of your team that’s been buried under a backlog of tickets no-one wanted to touch in the first place.


The Forge is built on the Nanoclaw agent pattern and uses the Claude Code SDK for frontier model interactions because of its cost advantages, but it can work with any LLM API. Local LLM inference can also reduce API calls for the Boss and QA roles, but frontier models are critical for the workers and reviewers.

*No disrespect to the actual creatives in marketing. You know who you are.