From Prompts to Specifications
Back in September, I wrote about Claude Code and the end of story points - how AI was transforming software engineering from implementation-focused work to requirements-focused orchestration. Five months on, with the release of frontier models like Claude Opus 4.5, Claude Sonnet 4.5, and OpenAI’s GPT-5 series, that transformation has accelerated beyond what I anticipated. The way we interact with AI has fundamentally changed, and with it, what it means to be a software engineer.
The Simple Prompt Era
Cast your mind back to early 2025. Working with AI tools like GitHub Copilot or the early Claude API-based integrations, we would write prompts like:
You are a technical expert in peer-to-peer communications using
IP technology stacks and cloud infrastructure.
Build me a peer-to-peer chat interface to run on serverless AWS
using Aurora as the backend, including an OpenTofu deployment
pipeline for test and prod environments.
After a bit of back and forth, maybe some polishing of the prompt, it would work. Sort of. The AI would generate something functional, we would iterate on it, fix the obvious issues, and eventually ship something. The prompt was the starting point, not the specification.
This worked because:
- The scope was often limited enough for the model to infer intent
- We expected to review and refine every line of code
- The AI was a productivity tool, not an autonomous worker
- Human engineers filled in the gaps
What Changed with Frontier Models
The release of agents powered by Opus 4.5 and GPT-5.x changed the equation. These models don’t just understand code; they understand systems, constraints, trade-offs, and business context. They can implement complex features autonomously, maintain consistency across large codebases, and make architectural decisions that align with project conventions.
But this capability created a new problem: the models became good enough that vague prompts produced confident but inconsistent results. Tell three different sessions to “build a user authentication system” and you will get three different architectures, three different security models, and three different approaches to session management.
The bottleneck shifted. It was no longer “can the AI implement this?” but “did we tell the AI exactly what we wanted?”
The Specification-First Workflow
Today, my workflow looks nothing like it did 12 months ago. Before I write a single line of code - or more accurately, before any agent writes a single line of code - I spend hours crafting specifications.
Here’s what a modern specification looks like for the same chat interface:
# Project: Peer-to-Peer Chat Interface
## Architecture Specification
### Infrastructure Requirements
- Cloud Provider: AWS
- Compute: Lambda functions (Node.js 20.x runtime)
- Database: Aurora Serverless v2 (PostgreSQL 15.x)
- Real-time: API Gateway WebSocket API for message delivery
- CDN: CloudFront with S3 origin for static assets
### Deployment Pipeline
- IaC Tool: OpenTofu 1.6.x (not Terraform)
- Environments: dev, staging, prod
- State Backend: S3 with DynamoDB locking
- Secrets: AWS Secrets Manager (not SSM Parameter Store)
- CI/CD: GitHub Actions with OIDC authentication
### Code Conventions
- Style: Prettier with project .prettierrc
- Linting: ESLint with typescript-eslint
- Naming: camelCase for functions, PascalCase for types
- Imports: Absolute paths via tsconfig paths
- Error Handling: Result<T, E> pattern, no thrown exceptions
- Logging: Structured JSON via pino
### Security Requirements
- Authentication: JWT with RS256 signing
- Token Storage: httpOnly secure cookies (not localStorage)
- Rate Limiting: 100 messages/minute per user
- Input Validation: Zod schemas at API boundary
- Message Encryption: End-to-end encryption for private chats
- OWASP Compliance: A01-A10 mitigations documented
### Database Schema Constraints
- Soft deletes only (deleted_at timestamp)
- All tables require created_at, updated_at
- UUIDs for primary keys (not auto-increment)
- Foreign key constraints enforced
- Indexes required for all foreign keys
- Message partitioning by conversation_id
### Testing Requirements
- Unit tests: Vitest with >80% coverage
- Integration tests: Supertest against local Lambda
- WebSocket tests: Mock connections for real-time features
- E2E tests: Playwright for critical user flows
- Test data: Factories with @faker-js/faker
This is just the infrastructure section. A complete specification includes API contracts, data models, user flows, error handling matrices, and acceptance criteria for every feature.
The Rise of Specification Tools
The industry has caught up with this shift. Tools like OpenSpec have emerged specifically for writing machine-readable specifications that AI agents can consume. Instead of free-form markdown, you write structured documents that define not just what you want, but the constraints, conventions, and quality gates.
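To make the idea concrete, here is a hypothetical TypeScript shape for such a structured specification document. The field names are illustrative only and do not reflect OpenSpec's actual schema:

```typescript
// A hypothetical shape for a machine-readable specification:
// requirements, conventions, and quality gates as structured data
// an agent can consume, rather than free-form prose.
interface Requirement {
  id: string;                // e.g. "MSG-001"
  statement: string;         // GIVEN/WHEN/THEN phrasing
  acceptanceTests: string[]; // test files that verify it
}

interface QualityGate {
  name: string;      // e.g. "coverage"
  threshold: number; // e.g. 80 (percent)
}

interface Specification {
  project: string;
  requirements: Requirement[];
  conventions: Record<string, string>;
  qualityGates: QualityGate[];
}

const spec: Specification = {
  project: "p2p-chat",
  requirements: [
    {
      id: "MSG-001",
      statement:
        "GIVEN a user sends over 100 messages/minute THEN further messages are rejected",
      acceptanceTests: ["tests/rate-limit.spec.ts"],
    },
  ],
  conventions: { naming: "camelCase", errors: "Result<T, E>" },
  qualityGates: [{ name: "coverage", threshold: 80 }],
};

console.log(spec.requirements[0].id); // prints "MSG-001"
```

The point is that every requirement carries its own acceptance tests and every quality gate is machine-checkable, so an agent can verify its own work against the document.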
IDEs have evolved too. Amazon’s Kiro represents a new generation of development environments that enforce a specification-first workflow:
- Specify: Write detailed requirements in a structured format
- Design: Generate architecture documents and interface definitions
- Allocate: Break work into discrete tasks with dependencies
- Execute: Let agents implement against the specification
You literally cannot skip steps. Try to write code before completing your specification and the IDE blocks you. It feels restrictive at first - like training wheels - but it forces the discipline that produces consistent, high-quality output.
We Don’t Review Code Anymore
That headline is provocative, but it captures a real shift. With the volume of code being generated by AI agents, traditional line-by-line code review has become impractical. Instead, we have review bots.
These automated reviewers check for:
- Security gaps: OWASP Top 10 violations, injection vulnerabilities, authentication bypasses
- Logic holes: Unreachable code, impossible conditions, missing edge cases
- Consistency: Adherence to project conventions, import patterns, naming standards
- Performance: N+1 queries, unbounded loops, memory leaks
- Test coverage: Missing tests for new code paths
The security scanner flags potential SQL injection. The logic analyser identifies a race condition in the caching layer. The style checker ensures consistent formatting across 50 files. These run automatically on every commit, before any human eyes see the code.
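A toy sketch gives the flavour of one such check - scanning changed files for string-concatenated SQL. Real review bots use proper parsers and data-flow analysis; the regex and file contents here are purely illustrative:

```typescript
// Toy review-bot check: flag string-concatenated SQL queries in a
// changeset. The file contents below are invented examples.
const changedFiles: Record<string, string> = {
  "src/db.ts": `db.query("SELECT * FROM users WHERE id = " + userId);`,
  "src/safe.ts": `db.query("SELECT * FROM users WHERE id = $1", [userId]);`,
};

// Matches query(...) calls whose string literal is concatenated
// with `+` - a common SQL-injection smell.
const sqlConcat = /query\(\s*["'][^"']*["']\s*\+/;

const findings = Object.entries(changedFiles)
  .filter(([, source]) => sqlConcat.test(source))
  .map(([file]) => `${file}: possible SQL injection (string-concatenated query)`);

console.log(findings.length ? findings.join("\n") : "no findings");
// prints "src/db.ts: possible SQL injection (string-concatenated query)"
```

A production bot would report these findings as review comments on the commit; the parameterised query in `src/safe.ts` passes untouched.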
Human review still happens, but it has moved up the abstraction ladder. We review:
- Architectural decisions against system requirements
- API contracts against business needs
- Test coverage against acceptance criteria
- Security model against threat model
We are no longer asking “is this code correct?” We are asking “does this solution solve the right problem in the right way?”
TDD for Requirements, Not Code
Test-Driven Development has evolved. In the old model, we wrote tests for code:
// Traditional TDD: Test the implementation
describe("calculateDiscount", () => {
  it("should apply 10% discount for orders over $100", () => {
    expect(calculateDiscount(150)).toBe(15);
  });
});
In the specification-first model, we write tests for requirements before any code exists:
// Specification TDD: Test the requirement
describe("Discount Policy", () => {
  describe("Requirement: Volume discounts encourage larger orders", () => {
    it("GIVEN an order over $100 WHEN discount is calculated THEN 10% is applied", async () => {
      const order = await createOrder({ subtotal: 150 });
      const discount = await discountService.calculate(order);
      expect(discount.percentage).toBe(10);
      expect(discount.reason).toBe("volume_discount");
    });

    it("GIVEN an order under $100 WHEN discount is calculated THEN no discount is applied", async () => {
      const order = await createOrder({ subtotal: 50 });
      const discount = await discountService.calculate(order);
      expect(discount.percentage).toBe(0);
    });

    it("GIVEN a volume discount AND a promotional code WHEN discounts are calculated THEN only the higher discount applies", async () => {
      const order = await createOrder({ subtotal: 150, promoCode: "SAVE5" });
      const discount = await discountService.calculate(order);
      expect(discount.percentage).toBe(10); // Volume > promo
      expect(discount.reason).toBe("volume_discount");
    });
  });
});
These tests define the requirements. They document business rules. They serve as acceptance criteria. The agent then implements code that makes these tests pass.
This is a fundamental inversion. We used to write code, then write tests to verify it. Now we write tests that define what we want, then let agents write code to satisfy them.
Validating Specifications Through Parallel Implementation
There is an emerging technique for validating whether a specification is actually complete: parallel implementation.
The process works like this:
- Write your specification with full requirements and test suite
- Have one agent implement it in Language A (say, TypeScript)
- Have a separate agent - with no access to the first implementation - implement the same spec in Language B (say, Go)
- Run both implementations against the test suite
If both implementations pass all tests and meet the business objectives, your specification is probably correct. The logic is simple: if two independent implementations, in different languages with different idioms and constraints, can both satisfy the requirements, then those requirements must be sufficiently clear and complete.
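The idea can be sketched as a single requirement-level test harness run against independent implementations. Here the `DiscountService` interface and both implementations are hypothetical stand-ins for agent-produced code in different languages:

```typescript
// One specification, multiple independent implementations.
interface DiscountService {
  calculate(subtotal: number): { percentage: number; reason: string };
}

// Implementation A (stand-in for one agent's output).
const implA: DiscountService = {
  calculate: (subtotal) =>
    subtotal > 100
      ? { percentage: 10, reason: "volume_discount" }
      : { percentage: 0, reason: "none" },
};

// Implementation B (stand-in for a second, independent agent's output).
const implB: DiscountService = {
  calculate: (subtotal) => {
    if (subtotal > 100) return { percentage: 10, reason: "volume_discount" };
    return { percentage: 0, reason: "none" };
  },
};

// The same requirement-level tests run against every implementation;
// a divergence signals a gap in the spec, not just a bug in the code.
function runSpec(name: string, svc: DiscountService): string[] {
  const failures: string[] = [];
  if (svc.calculate(150).percentage !== 10) failures.push(`${name}: volume discount`);
  if (svc.calculate(50).percentage !== 0) failures.push(`${name}: no discount under $100`);
  return failures;
}

const failures = [...runSpec("implA", implA), ...runSpec("implB", implB)];
console.log(failures.length === 0 ? "spec validated" : failures.join("\n"));
```

In practice the implementations would live in separate repositories and languages, with the shared test suite as the only common artifact.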
If they diverge - one passes tests the other fails, or they handle edge cases differently - then your specification has gaps. The ambiguity that a human engineer might fill in with assumptions becomes visible when two agents make different assumptions.
This is expensive in compute terms, but cheap compared to shipping a production system built on an ambiguous spec. It is also a powerful feedback loop: each failure teaches you where your specifications need more precision.
Some teams are taking this further, using three or four languages to increase confidence. Others use the same language but different agents (Claude vs GPT vs Gemini) to test for model-specific interpretations of ambiguous requirements.
The underlying principle is sound: if your specification cannot survive independent implementation, it is not ready for production.
The Fundamental Shift
Here is what people get wrong about AI in software development: they think it means “the AI writes all the code now.”
That misses the point entirely.
An engineer still needs to understand how software works. You cannot write a specification for a distributed system if you do not understand eventual consistency, network partitions, and the CAP theorem. You cannot define security requirements if you do not understand authentication flows, token management, and common attack vectors. You cannot specify performance requirements if you do not understand caching strategies, database indexing, and algorithmic complexity.
The knowledge requirement has not decreased; it has shifted. We need to understand systems at a higher level of abstraction. We need to think about outcomes rather than implementations. We need to communicate precisely about complex technical concepts.
Managing Agents, Not Writing Code
The real change is this: we are moving up the chain towards management. Not management of people, but management of agents.
Think about what a good engineering manager does:
- Sets clear expectations and success criteria
- Provides context about business goals and constraints
- Reviews work against requirements
- Identifies blockers and removes them
- Coordinates between different workstreams
- Ensures quality standards are maintained
This is exactly what we now do with AI agents. We are not writing code; we are managing the entities that write code. We define the requirements, provide the context, review the output, and ensure the work aligns with business objectives.
The job title might still say “Software Engineer” but the actual work looks more like “Agent Orchestrator” or “Technical Requirements Architect.”
What This Means for Engineers
If you are a software engineer in 2026, you need to develop new skills:
Specification Writing: The ability to express technical requirements with precision. Vague specifications produce vague implementations.
System Thinking: Understanding how components interact, where failures occur, and how to design for resilience. You cannot specify what you do not understand.
Quality Assessment: Rapidly evaluating whether an implementation meets requirements. This requires deep technical knowledge applied at a higher level.
Business Translation: Converting stakeholder needs into technical specifications. The gap between “what the business wants” and “what we build” is now the critical bottleneck.
Agent Orchestration: Understanding how to structure work for AI agents, when to intervene, and how to provide effective feedback.
The Uncomfortable Truth
Not everyone will make this transition successfully. The engineers who thrived in the old paradigm - those who took pride in crafting elegant implementations, who knew every library and framework intimately, who could debug assembly code - may struggle in a world where that knowledge has been commoditised.
The engineers who will thrive are those who can think at higher levels of abstraction, who can communicate technical concepts precisely, and who can manage complex systems without being mired in implementation details.
This is not about AI “taking jobs.” It is about the nature of the job changing fundamentally. The same thing happened when we moved from assembly to high-level languages, from on-premises to cloud, from monoliths to microservices. Each transition made some skills less valuable and created demand for new ones.
Looking Forward
We are still early in this transition. The tools are maturing rapidly. Specification languages are becoming more sophisticated. Review bots are getting smarter. Agent orchestration platforms are emerging.
In another 12 months, I expect:
- Specification tools will become as essential as IDEs
- Most code reviews will be automated, with humans reviewing only edge cases
- Agent orchestration will be a formal discipline with its own methodologies
- Universities may start to teach specification writing as a core skill - though institutional inertia means curriculum changes take years, not months
The engineers who recognise this shift and adapt their skills accordingly will find themselves with unprecedented leverage. Those who insist that “real engineers write code” will find that definition increasingly quaint.
We have moved from prompts to specifications. From writing code to managing agents. From implementation to orchestration.
This is the new reality of software engineering. The question is not whether to adapt, but how quickly.
How has your workflow changed over the past year? Are you spending more time on specifications? Have you adopted any of the new specification-first tools? I would be interested to hear how others are navigating this transition.