The most profound shift in software engineering is transitioning from writing code to expressing intent, trusting intelligent systems to translate intent into working software.

Made with Rinto — analyse your own content free

Introduction to AI-Driven Software Development

Software engineering is undergoing its most significant transformation since high-level programming languages were introduced.

New paradigm for developers

Developers now express what to build, not how to build it, with machines handling the implementation.

Evolution of AI assistance

AI assistance evolved from autocomplete to autonomous agents that can clone repositories and submit pull requests without human keystrokes.

AI Automation Progression

AI automation in coding has progressed through distinct generations, each preserving prior capabilities while raising the ceiling on what an engineer could accomplish.

Stage	Description
~2021 Autocomplete	Simple token predictions; editor guesses next few characters.
~2022 Inline Code Suggestions	Complete entire functions from signature; model understands patterns, not just tokens.
~2023 Chat-Based Generation	Describe features in natural language, receive working implementation; conversation becomes the interface.
~2024-25 Coding Agents	Multi-file edits, tool calling, test execution, iterative self-correction; agent runs its own loop.
~2025-26 Autonomous Agents	Clone repositories, plan architecture, execute in sandboxes, run full test suites, submit pull requests; no human keystrokes required.

Paper's Purpose

This paper traces the spectrum from vibe coding to agentic engineering, examines the evolving developer role, and outlines adoption strategies for dependable software.

Target Audience

The paper targets software engineers, engineering managers, architects, and technical leaders seeking to understand AI's impact on SDLC.

AI Agents Overview

An AI agent is a software system that perceives a goal, plans steps to reach it, takes actions through tools, observes the results, and iterates until the goal is met or stopped.

What is Vibe Coding

Vibe coding is an approach where developers describe what they want in natural language, accept AI output, and fix errors by prompting the AI with error messages.

Vibe Coding to Agentic Engineering Spectrum

Vibe coding and agentic engineering are endpoints on a spectrum, differentiated by the structure, verification, and human judgment surrounding AI output.

Vibe Coding vs Agentic Engineering Spectrum

The spectrum highlights varying levels of structure, verification, and human judgment from casual vibe coding to disciplined agentic engineering.

Dimension	Vibe Coding	Structured AI-Assisted Coding	Agentic Engineering
Intent specification	Casual natural language prompts	Detailed prompts with examples and constraints	Formal specs, architecture docs, memory files
Verification	"Does it seem to work?"	Manual testing, spot-checking	Automated test suites, CI/CD gates, LM judges
Codebase understanding	Minimal; developer may not read the generated code	Selective review of critical paths	Comprehensive review of architecture; AI handles implementation details
Error handling	Copy-paste error messages back to the AI	Developer diagnoses root cause, AI implements fix	Agents self-diagnose within defined bounds; humans handle architectural issues
Appropriate scope	Prototypes, scripts, personal projects, hackathons	Features within established codebases	Production systems, team-scale development
Risk profile	High; acceptable for disposable code	Moderate; human judgment at key checkpoints	Low; systematic verification at every stage

Verification is key differentiator

The biggest differentiator between vibe coding and agentic engineering is how outputs are verified, with agentic engineering using both tests and evaluations.

Test and Evaluation Verification

Tests verify deterministic parts by checking input-output, while evaluations verify non-deterministic parts like agent trajectory and quality.

Context Engineering

Context engineering is the practice of providing AI agents with rich, structured information about a codebase, architecture, conventions, and intent.

Quality depends on context, not prompts

The quality of AI-generated code depends more on the quality of provided context than on clever prompts.

Six Primary Types of Context

Developers must consider six primary types of context when working with AI agents: Instructions, Knowledge, Memory, Examples, Tools, and Guardrails.

Static vs Dynamic Context

Context engineering balances elements possessed upfront (static) versus those retrieved on demand (dynamic), creating a critical separation.

Static vs Dynamic Context

Static context is always loaded and expensive, while dynamic context is loaded on demand and is more efficient.

Context Type	Loading Mechanism	Token Cost	Characteristics	Examples
Static Context	Always loaded, every interaction	High	Expensive but reliable; agent never forgets	System instructions, rule files (AGENTS.md), global memory, core guardrails
Dynamic Context	Loaded on demand, per task	Low per turn	Efficient and scalable; pay only for what you use	Agent Skills (triggered by task match), tool results, retrieved documents (RAG)

Architectural Decision for Context

The decision of what belongs in static versus dynamic context is an engineering trade-off that should be treated as a first-class architectural decision.

Agent Skills for Dynamic Context

Agent Skills are structured, portable packages of procedural knowledge loaded by the agent only when a task requires them, enabling lightweight generalists to act as specialists.

Context Engineering vs Prompt Engineering

The shift from "prompt engineering" to "context engineering" means models need human-developer-level context, not cleverly worded instructions.

The New Software Development Life Cycle

The shift from syntax to context engineering fundamentally changes software creation bottlenecks, necessitating a complete reimagining of the traditional SDLC.

Traditional SDLC Under Pressure

The traditional SDLC has evolved from waterfall to iterative models, but AI dramatically compresses the cycle, especially implementation.

Traditional vs AI-Driven SDLC

The AI-driven SDLC drastically shortens iteration cycles and shifts bottlenecks compared to the traditional iterative SDLC.

Phase	Traditional Iterative SDLC	AI-Driven SDLC
Requirements	2-3 days	Specs become eval criteria
Design	1-2 days	Architecture decisions amplified at scale
Implementation	1-3 weeks	Minutes to hours (Agent self-corrects)
Testing	3-5 days	Output Eval (Verify what it built AND how it got there), Trajectory Eval
Review & Deploy	2-3 days	Review & Deploy
Maintenance	Ongoing	Continuous automation
Sprint Cycle	Weeks	Minutes to hours

Pace of SDLC Change

The AI-driven SDLC's phase-by-phase picture (mid-2026) is rapidly shifting, with human judgment for verifying AI output remaining constant.

AI Transformation of SDLC Phases

AI capabilities are reshaping every phase of the software development lifecycle, from requirements gathering to maintenance.

Requirements and planning

AI tools directly participate in requirements refinement, generating user stories, identifying edge cases, producing API schemas, and interactive prototypes, collapsing the feedback loop to near zero.

Design and architecture

Architecture remains human-centric due to trade-offs dependent on business context and strategic considerations that AI cannot fully grasp.

Implementation

Modern coding agents can generate entire features from natural-language descriptions, implement complex algorithms, and produce multi-file changes that work together correctly.

Testing and quality assurance

Testing AI-generated code requires both output evaluation (final artifact) and trajectory evaluation (sequence of tool calls and reasoning) for comprehensive verification.

Code review and deployment

AI augments the review process as a first-pass reviewer, identifying bugs, style violations, security vulnerabilities, and performance issues before human review.

The Factory Model

The factory model describes software production where the developer's primary output is the system that builds code, not the code itself.

Harness Engineering

Harness Engineering is about building the scaffolding wrapped around an AI model that enables it to perceive goals, plan steps, execute actions, and iterate.

Harness as critical infrastructure

A raw AI model is not an agent; a harness provides state, tool execution, feedback loops, and enforceable constraints to make it an agent.

Harness Components

The harness includes instructions, tools, sandboxes, orchestration logic, guardrails/hooks, and observability.

Harness is the team's surface area

The extensive components of the harness represent the development team's surface area, not the model provider's.

Harness in SDLC Phases

The harness is present in every SDLC phase where an AI agent operates, providing scaffolding for tools, sandboxes, and orchestration.

Configuring the Harness (Requirements, Planning, Architecture)

The harness is configured and calibrated, with developers setting up the agent's environment and defining its tools and fundamental rules before AI writes production code.

Running the Harness (Implementation)

During active coding, the harness acts as a boundary that keeps the AI model focused, secure, and productive.

The Feedback Loop (Testing & QA)

Testing in an agentic workflow relies heavily on the harness to facilitate autonomous self-correction.

Observing the Harness (Code Review, Deployment, Maintenance)

Even after the code is written, the harness ensures the agent behaves safely in live or near-live environments.

Developer's Evolving Role

As AI takes over implementation, the developer's role transforms from writing code to exercising judgment, moving between 'conductor' and 'orchestrator' modes.

Harness configuration defines agentic engineering

The transition from vibe coding to agentic engineering is defined by how deliberately the harness is configured and applied, not just the tools used.

Measurable Impact of Harness Configuration

Public benchmarks show the harness effect is concrete; one team moved a coding agent from outside the Top 30 to Top 5 on Terminal Bench 2.0 by changing only the harness.

Agent failures are configuration failures

When an agent fails, the issue often traces back to a missing tool, vague rule, absent guardrail, or noisy context, not inherently the model.

Conductor vs Orchestrator Modes

Developers fluidly move between conductor (real-time, synchronous, in-IDE) and orchestrator (asynchronous, high-level, multi-agent) modes depending on the task.

Dimension	Conductor	Orchestrator
Interaction	Real-time, Synchronous, In-IDE	Asynchronous, High-level, Multi-agent
Developer's Role	Prompt, reviews inline, refines	Defines specific task, reviews PR/output, approves or corrects
Control Level	Keystroke-level control, immediate feedback, single-file scope, developer always in loop	Goal-level control, delayed feedback, multi-file scope, reviews outcomes not keystrokes
Best For	Exploratory coding, prototyping, learning new API	Feature implementation, migrations, test generation
Leverage	Fine-grained control	High-leverage delegation

Conductor Mode

In conductor mode, developers work in real-time with an AI pair-programmer, guiding AI with prompts and corrections while maintaining fine-grained control.

Orchestrator Mode

In orchestrator mode, developers define goals, assign them to agents, and review results at a higher abstraction level, without line-by-line code observation.

The 80% Problem

AI agents can rapidly generate approximately 80% of code for a feature, but the remaining 20% (edge cases, error handling, integration, correctness) demands deep contextual knowledge that current models often lack.

Coding Agents in Practice

Modern developers building an agent primarily work from a terminal, often in natural language, with other coding agents handling the typing.

Vibe Coding Production-ready Agents

This section addresses building production-ready agents, not just using coding agents to build software.

Building production agents

Tasks like building a customer support bot or research assistant require agents with their own tools, memory, evaluation, and deployment infrastructure.

Streamlined production agent workflow

The terminal-based workflow now supports building, evaluating, and deploying real agents at scale with persistent memory and observability.

Google's Agents CLI

Google's Agents CLI is a command-line tool that bundles skills for building agents on Google Cloud, working with the developer's preferred coding agent.

Scaling Production Agents

The same workflow scales from one agent to many using ADK for graph-based, multi-agent workflows, shared session state, and LLM-driven delegation.

The Economics of AI Development

Evaluating AI's impact on SDLC requires considering Total Cost of Ownership (TCO) and how workflows shift financial burdens between Capital Expenditure (CapEx) and Operational Expenditure (OpEx).

OpEx dictated by token economy

In the AI era, Operational Expenditure (OpEx) is heavily dictated by the token economy.

Economics of Vibe Coding vs Agentic Engineering

Vibe coding has low CapEx but high OpEx due to compounding hidden debt, while agentic engineering requires higher upfront CapEx for low OpEx and sustainable scale.

Metric	Vibe Coding	Agentic Engineering
CapEx	Minimal Investment (12%)	Upfront Platform Design (12%)
OpEx	High Running Costs	Low Marginal Running Costs
Characteristics	Rapid prototyping, slow scaling, high friction for long-term maintenance, economic dead-end for complex systems	Controlled iteration, fast scaling, low friction for automatic updates, economically sustainable for mature codebases

Hidden Debt of Vibe Coding

Vibe coding appears cost-effective initially (low CapEx) but incurs a massive, compounding OpEx burden due to its underlying practices.

Investment of Agentic Engineering

Agentic engineering requires deliberate upfront investment in engineering time and resources before production code is generated, which flips the economic model.

Context Engineering as a Financial Lever

Context engineering is a financial strategy because LLMs charge per token, making it unviable to pass large repositories into every prompt.

Scaling Efficiency via Dynamic Context and Skills

Advanced agentic engineering optimizes OpEx through dynamic context via "skills" or tool calling (e.g., Model Context Protocol servers).

Intelligent Model Routing

Agentic engineering enables intelligent model routing, using large models for complex tasks and smaller, cheaper models for deterministic, lower-complexity tasks to drive down operational token cost.

Where to Start with AI-Driven Development

The shift from syntax to intent is a present reality, and AI amplifies existing engineering culture; these practices translate the principle into action.

For individual developers

Individual developers should set up agent configurations, install skills, automate repetitive workflows, write tests/evals first, and review all AI-generated code.

1Set up AGENTS.md→2Install coding agent skills→3Make repetitive workflow an agent→4Write tests and evals first→5Review all shipped AI-generated code

For engineering leaders

Engineering leaders should make context engineering a first-class practice, set eval bars, reshape code review, distinguish prototyping from production, and invest in harness components as shared assets.

1Make context engineering first-class→2Set the bar at the eval→3Re-shape code review for AI-generated code→4Distinguish prototyping from production work→5Invest in shared harness components

For organizations

Organizations should treat AI development as an investment, invest in production substrate, adopt open standards, plan for hybrid teams, and reframe hiring around judgment.

1Treat AI development as engineering investment→2Invest in production substrate before scale→3Adopt open standards for inter-agent communication→4Plan for hybrid human-agent teams→5Reframe hiring around judgment

Conclusion: Intent as the New Interface

The transition from syntax to intent is a present reality, with AI transforming the SDLC; the key is how effectively individuals, teams, and organizations navigate this shift.

▸ 15 Expand

APEX

Software engineering shift: intent to working software

The most profound shift in software engineering is transitioning from writing code to expressing intent, trusting intelligent systems to translate intent into working software.

Made with Rinto — analyse your own content free

▸ 4 Expand

SECT

Introduction to AI-Driven Software Development

Software engineering is undergoing its most significant transformation since high-level programming languages were introduced.

▸ 1 Expand

SUP

New paradigm for developers

Developers now express what to build, not how to build it, with machines handling the implementation.

DATA

AI Coding Agent usage (early 2026)

As of early 2026, 85% of professional developers regularly use AI Coding Agents, 51% use them daily, and 41% of new code is AI-generated.

▸ 1 Expand

SUP

Evolution of AI assistance

AI assistance evolved from autocomplete to autonomous agents that can clone repositories and submit pull requests without human keystrokes.

FRMW

AI Automation Progression

AI automation in coding has progressed through distinct generations, each preserving prior capabilities while raising the ceiling on what an engineer could accomplish.

Stage	Description
~2021 Autocomplete	Simple token predictions; editor guesses next few characters.
~2022 Inline Code Suggestions	Complete entire functions from signature; model understands patterns, not just tokens.
~2023 Chat-Based Generation	Describe features in natural language, receive working implementation; conversation becomes the interface.
~2024-25 Coding Agents	Multi-file edits, tool calling, test execution, iterative self-correction; agent runs its own loop.
~2025-26 Autonomous Agents	Clone repositories, plan architecture, execute in sandboxes, run full test suites, submit pull requests; no human keystrokes required.

SUP

Paper's Purpose

This paper traces the spectrum from vibe coding to agentic engineering, examines the evolving developer role, and outlines adoption strategies for dependable software.

SUP

Target Audience

The paper targets software engineers, engineering managers, architects, and technical leaders seeking to understand AI's impact on SDLC.

▸ 2 Expand

SECT

AI Agents Overview

An AI agent is a software system that perceives a goal, plans steps to reach it, takes actions through tools, observes the results, and iterates until the goal is met or stopped.

FRMW

The Agent Loop

An AI agent operates in a continuous self-correcting loop: perceive goal, plan steps, act via tools, and observe results.

▸ 5 Expand

SUP

Five Parts of an AI Agent

Every AI agent, regardless of complexity, is built from five fundamental parts: the model, tools, memory, orchestration, and deployment.

COMP

Model

The model is the reasoning engine, deciding next steps, tool calls, or messages based on context.

COMP

Tools

Tools connect the model to the world, including APIs, executable code, queryable databases, and other delegatable agents.

COMP

Memory

Memory maintains the agent's state, recalling past interactions, retrieving project-specific rules, and retaining context across sessions.

COMP

Orchestration

Orchestration is the code that runs the agent's loop, assembling context, dispatching tools, capturing results, and deciding continuation.

COMP

Deployment

Deployment transforms a prototype into a service, covering hosting, identity, observability, and production infrastructure.

▸ 3 Expand

SECT

What is Vibe Coding

Vibe coding is an approach where developers describe what they want in natural language, accept AI output, and fix errors by prompting the AI with error messages.

EVID

Origin of "Vibe Coding"

Andrej Karpathy described "vibe coding" in February 2025, which resonated widely in the software engineering community.

EVID

Term Confusion

The term "vibe coding" became a common descriptor for any AI-assisted workflow, leading to confusion and loss of precise meaning.

SUP

Agentic Engineering Introduction

By early 2026, Karpathy introduced "agentic engineering" to describe the more disciplined end of the AI development spectrum.

▸ 3 Expand

SECT

Vibe Coding to Agentic Engineering Spectrum

Vibe coding and agentic engineering are endpoints on a spectrum, differentiated by the structure, verification, and human judgment surrounding AI output.

CMPR

Vibe Coding vs Agentic Engineering Spectrum

The spectrum highlights varying levels of structure, verification, and human judgment from casual vibe coding to disciplined agentic engineering.

Dimension	Vibe Coding	Structured AI-Assisted Coding	Agentic Engineering
Intent specification	Casual natural language prompts	Detailed prompts with examples and constraints	Formal specs, architecture docs, memory files
Verification	"Does it seem to work?"	Manual testing, spot-checking	Automated test suites, CI/CD gates, LM judges
Codebase understanding	Minimal; developer may not read the generated code	Selective review of critical paths	Comprehensive review of architecture; AI handles implementation details
Error handling	Copy-paste error messages back to the AI	Developer diagnoses root cause, AI implements fix	Agents self-diagnose within defined bounds; humans handle architectural issues
Appropriate scope	Prototypes, scripts, personal projects, hackathons	Features within established codebases	Production systems, team-scale development
Risk profile	High; acceptable for disposable code	Moderate; human judgment at key checkpoints	Low; systematic verification at every stage

▸ 1 Expand

INSG

Verification is key differentiator

The biggest differentiator between vibe coding and agentic engineering is how outputs are verified, with agentic engineering using both tests and evaluations.

TIP

Task-dependent approach

The appropriate approach (vibe coding vs. agentic engineering) depends on the stakes of the task.

EVID

Test and Evaluation Verification

Tests verify deterministic parts by checking input-output, while evaluations verify non-deterministic parts like agent trajectory and quality.

▸ 5 Expand

SECT

Context Engineering

Context engineering is the practice of providing AI agents with rich, structured information about a codebase, architecture, conventions, and intent.

INSG

Quality depends on context, not prompts

The quality of AI-generated code depends more on the quality of provided context than on clever prompts.

▸ 6 Expand

FRMW

Six Primary Types of Context

Developers must consider six primary types of context when working with AI agents: Instructions, Knowledge, Memory, Examples, Tools, and Guardrails.

COMP

Instructions

Instructions define the agent's core role, goals, and operational boundaries.

COMP

Knowledge

Knowledge includes retrieved documents, architectural diagrams, and domain-specific data.

COMP

Memory

Memory encompasses short-term session logs (what just happened) and long-term persistent state (what the project is).

COMP

Examples

Examples provide few-shot behavioral demonstrations and codebase reference patterns.

COMP

Tools

Tools are precise definitions of APIs, scripts, and external services the agent can invoke.

COMP

Guardrails

Guardrails impose hard constraints, formatting rules, and safety validations.

▸ 2 Expand

SUP

Static vs Dynamic Context

Context engineering balances elements possessed upfront (static) versus those retrieved on demand (dynamic), creating a critical separation.

CMPR

Static vs Dynamic Context

Static context is always loaded and expensive, while dynamic context is loaded on demand and is more efficient.

Context Type	Loading Mechanism	Token Cost	Characteristics	Examples
Static Context	Always loaded, every interaction	High	Expensive but reliable; agent never forgets	System instructions, rule files (AGENTS.md), global memory, core guardrails
Dynamic Context	Loaded on demand, per task	Low per turn	Efficient and scalable; pay only for what you use	Agent Skills (triggered by task match), tool results, retrieved documents (RAG)

DCSN

Architectural Decision for Context

The decision of what belongs in static versus dynamic context is an engineering trade-off that should be treated as a first-class architectural decision.

▸ 1 Expand

SUP

Agent Skills for Dynamic Context

Agent Skills are structured, portable packages of procedural knowledge loaded by the agent only when a task requires them, enabling lightweight generalists to act as specialists.

SUP

Problems Solved by Agent Skills

Agent Skills solve issues of context rot, lack of procedural memory, multi-agent operational overhead, and portability across tools and vendors.

INSG

Context Engineering vs Prompt Engineering

The shift from "prompt engineering" to "context engineering" means models need human-developer-level context, not cleverly worded instructions.

▸ 3 Expand

SECT

The New Software Development Life Cycle

The shift from syntax to context engineering fundamentally changes software creation bottlenecks, necessitating a complete reimagining of the traditional SDLC.

▸ 1 Expand

SUP

Traditional SDLC Under Pressure

The traditional SDLC has evolved from waterfall to iterative models, but AI dramatically compresses the cycle, especially implementation.

INSG

AI's Uneven Compression of SDLC

AI compresses implementation from weeks to hours, but requirements, architecture, and verification remain human-paced, creating a different workflow.

CMPR

Traditional vs AI-Driven SDLC

The AI-driven SDLC drastically shortens iteration cycles and shifts bottlenecks compared to the traditional iterative SDLC.

Phase	Traditional Iterative SDLC	AI-Driven SDLC
Requirements	2-3 days	Specs become eval criteria
Design	1-2 days	Architecture decisions amplified at scale
Implementation	1-3 weeks	Minutes to hours (Agent self-corrects)
Testing	3-5 days	Output Eval (Verify what it built AND how it got there), Trajectory Eval
Review & Deploy	2-3 days	Review & Deploy
Maintenance	Ongoing	Continuous automation
Sprint Cycle	Weeks	Minutes to hours

INSG

Pace of SDLC Change

The AI-driven SDLC's phase-by-phase picture (mid-2026) is rapidly shifting, with human judgment for verifying AI output remaining constant.

▸ 5 Expand

SECT

AI Transformation of SDLC Phases

AI capabilities are reshaping every phase of the software development lifecycle, from requirements gathering to maintenance.

SUP

Requirements and planning

▸ 2 Expand

SUP

Design and architecture

Architecture remains human-centric due to trade-offs dependent on business context and strategic considerations that AI cannot fully grasp.

EVID

AI assists architectural implementation

AI excels at implementing architectural decisions by scaffolding applications, generating consistent patterns, and ensuring code conforms to conventions.

SUP

Developer role shift in design

The developer's role shifts from writing boilerplate to making and documenting structural decisions that AI implements.

▸ 3 Expand

SUP

Implementation

Modern coding agents can generate entire features from natural-language descriptions, implement complex algorithms, and produce multi-file changes that work together correctly.

DATA

Productivity gains in implementation

Industry surveys report 25 to 39% productivity improvements from AI in implementation, with some tasks seeing larger gains.

OPP

Nuanced productivity gains

A METR study found experienced developers using AI assistants took 19% longer on some tasks due to time spent verifying, debugging, and correcting AI output.

INSG

AI transforms implementation work

AI transforms implementation work from writing code to reviewing, guiding, and verifying AI-generated output.

▸ 2 Expand

SUP

Testing and quality assurance

Testing AI-generated code requires both output evaluation (final artifact) and trajectory evaluation (sequence of tool calls and reasoning) for comprehensive verification.

SUP

AI transforms test generation

AI agents can produce test cases, including edge cases and property-based tests, and tests/evals become the primary mechanism for communicating intent to AI.

SUP

Continuous quality flywheel

These practices are most effective when integrated into a continuous quality flywheel for evaluation, diagnosis, optimization, and monitoring.

▸ 2 Expand

SUP

Code review and deployment

AI augments the review process as a first-pass reviewer, identifying bugs, style violations, security vulnerabilities, and performance issues before human review.

INSG

AI reduces cognitive burden in reviews

AI significantly reduces the cognitive burden on human reviewers, though human judgment for design, maintainability, and strategic alignment remains crucial.

SUP

AI-aware deployment pipelines

Deployment pipelines are becoming AI-aware, with agents monitoring deployment health, automating rollbacks, and predicting risks based on changes.

▸ 2 Expand

SECT

The Factory Model

The factory model describes software production where the developer's primary output is the system that builds code, not the code itself.

▸ 5 Expand

FRMW

Factory Model System Components

The factory model system that produces code includes specifications, agents, tests, feedback loops, and guardrails.

COMP

Specifications and context

Define what needs to be built.

COMP

Agents

Translate specifications into implementation.

COMP

Tests and quality gates

Verify correctness.

COMP

Feedback loops

Route failures back to agents for correction.

COMP

Guardrails

Constrain agents to safe, predictable behavior.

INSG

Developer as factory manager

The modern developer acts as a factory manager, designing the development system and ensuring quality control by giving agents success criteria.

▸ 4 Expand

SECT

Harness Engineering

Harness Engineering is about building the scaffolding wrapped around an AI model that enables it to perceive goals, plan steps, execute actions, and iterate.

▸ 1 Expand

SUP

Harness as critical infrastructure

A raw AI model is not an agent; a harness provides state, tool execution, feedback loops, and enforceable constraints to make it an agent.

INSG

Harness dominates agent behavior

The behavior experienced when working with AI coding tools is dominated by the harness, not just the underlying model.

▸ 6 Expand

FRMW

Harness Components

The harness includes instructions, tools, sandboxes, orchestration logic, guardrails/hooks, and observability.

COMP

Instructions and Rule Files

Text defining the agent's identity, concerns, and forbidden actions, including AGENTS.md, skill files, and sub-agent prompts.

COMP

Tools

Functions, MCP servers, and APIs the agent can call, along with descriptions of when and how to call them.

COMP

Sandboxes and execution environments

Where the agent's code runs and what it has access to, or is restricted from.

COMP

Orchestration logic

Manages sub-agent spawning, model routing, specialist hand-offs, and rules governing execution.

COMP

Guardrails or Hooks

Deterministic code running at specific lifecycle points to ensure the agent follows critical rules.

COMP

Observability

Logs, traces, evaluations, cost, and latency metering to assess agent performance.

INSG

Harness is the team's surface area

The extensive components of the harness represent the development team's surface area, not the model provider's.

▸ 4 Expand

SUP

Harness in SDLC Phases

The harness is present in every SDLC phase where an AI agent operates, providing scaffolding for tools, sandboxes, and orchestration.

▸ 1 Expand

STEP

Configuring the Harness (Requirements, Planning, Architecture)

The harness is configured and calibrated, with developers setting up the agent's environment and defining its tools and fundamental rules before AI writes production code.

EVID

Harness Configuration Actions

Developers provide instructions and rule files (e.g., AGENTS.md, architectural constraints) and define tools (APIs, database schemas) the agent can access.

▸ 2 Expand

STEP

Running the Harness (Implementation)

During active coding, the harness acts as a boundary that keeps the AI model focused, secure, and productive.

EVID

Harness Components in Implementation

Sandboxes, execution environments, and tools are used during the implementation phase.

EVID

Harness Action in Implementation

The model executes code in an isolated sandbox and uses harness-provided tools for tasks like file reading or web searching.

▸ 2 Expand

STEP

The Feedback Loop (Testing & QA)

Testing in an agentic workflow relies heavily on the harness to facilitate autonomous self-correction.

EVID

Harness Components in Testing

Orchestration Logic and Guardrails are used in Testing & QA.

EVID

Harness Action in Testing

The harness provides the execution environment for automated tests, capturing error output from failed tests and routing it back to the model for iteration.

▸ 2 Expand

STEP

Observing the Harness (Code Review, Deployment, Maintenance)

Even after the code is written, the harness ensures the agent behaves safely in live or near-live environments.

EVID

Harness Components in Maintenance

Hooks and Observability are used in Code Review, Deployment, & Maintenance.

EVID

Harness Action in Maintenance

The harness runs deterministic hooks (e.g., blocking commits) and its observability layer tracks metrics like token costs and latency for auditing agent decisions.

▸ 7 Expand

SECT

Developer's Evolving Role

As AI takes over implementation, the developer's role transforms from writing code to exercising judgment, moving between 'conductor' and 'orchestrator' modes.

INSG

Harness configuration defines agentic engineering

The transition from vibe coding to agentic engineering is defined by how deliberately the harness is configured and applied, not just the tools used.

▸ 1 Expand

EVID

Measurable Impact of Harness Configuration

Public benchmarks show the harness effect is concrete; one team moved a coding agent from outside the Top 30 to Top 5 on Terminal Bench 2.0 by changing only the harness.

DATA

LangChain benchmark improvement

A LangChain study raised a coding agent's score by 13.7 points on the same benchmark by tweaking only the system prompt, tools, and middleware.

INSG

Agent failures are configuration failures

When an agent fails, the issue often traces back to a missing tool, vague rule, absent guardrail, or noisy context, not inherently the model.

CMPR

Conductor vs Orchestrator Modes

Developers fluidly move between conductor (real-time, synchronous, in-IDE) and orchestrator (asynchronous, high-level, multi-agent) modes depending on the task.

Dimension	Conductor	Orchestrator
Interaction	Real-time, Synchronous, In-IDE	Asynchronous, High-level, Multi-agent
Developer's Role	Prompt, reviews inline, refines	Defines specific task, reviews PR/output, approves or corrects
Control Level	Keystroke-level control, immediate feedback, single-file scope, developer always in loop	Goal-level control, delayed feedback, multi-file scope, reviews outcomes not keystrokes
Best For	Exploratory coding, prototyping, learning new API	Feature implementation, migrations, test generation
Leverage	Fine-grained control	High-leverage delegation

▸ 2 Expand

SUP

Conductor Mode

In conductor mode, developers work in real-time with an AI pair-programmer, guiding AI with prompts and corrections while maintaining fine-grained control.

EVID

Conductor Mode Use Cases

This mode is typical for complex logic, debugging tricky issues, or working in unfamiliar codebases requiring deep understanding.

RISK

Conductor Mode Bottleneck

Conductor mode can become a bottleneck if the developer personally directs every keystroke, limiting AI's throughput improvement.

▸ 2 Expand

SUP

Orchestrator Mode

In orchestrator mode, developers define goals, assign them to agents, and review results at a higher abstraction level, without line-by-line code observation.

EVID

Orchestrator Mode Use Cases

This mode is typical for well-defined tasks like bug fixes, feature implementations against established patterns, codebase migrations, and test generation.

▸ 4 Expand

REQ

Orchestrator Mode Skill Set

Orchestrator mode requires strong skills in specification, decomposition, evaluation, and system design, rather than deep syntax expertise.

REQ

Specification Skill

Defining tasks precisely enough that an agent can execute them without ambiguity.

REQ

Decomposition Skill

Breaking large tasks into appropriately sized units for agent execution.

REQ

Evaluation Skill

Quickly assessing whether agent output meets quality standards.

REQ

System Design Skill

Designing the constraints, tests, and feedback loops that keep agents productive.

▸ 2 Expand

RISK

The 80% Problem

INSG

Evolution of AI errors

AI errors have evolved from simple syntax mistakes to insidious conceptual failures, such as wrong business logic assumptions or missing edge cases, which are harder to detect.

TIP

Effectively navigating the 80% problem

Developers effectively navigate this problem by using AI for rapid implementation of well-specified tasks and reserving their expertise for AI's weaknesses.

▸ 2 Expand

SECT

Coding Agents in Practice

Modern developers building an agent primarily work from a terminal, often in natural language, with other coding agents handling the typing.

▸ 3 Expand

SUP

Where coding agents fit in developer's day

Coding agents show up in three main places in everyday work: in the editor, in the terminal, and in the background.

▸ 1 Expand

COMP

In the editor

Provides inline completion, chat panels to explain/modify code, and whole-codebase awareness within the IDE, keeping work in flow.

EXMP

Editor Agent Examples

Examples include GitHub Copilot, Cursor, Windsurf, JetBrains AI Assistant.

▸ 1 Expand

COMP

In the terminal

Agents launched from the command line with a goal in plain language, working across the codebase with full file system access and iteration capabilities.

EXMP

Terminal Agent Examples

Examples include Antigravity CLI, Claude Code, Codex CLI, Open Code, and Cline.

▸ 1 Expand

COMP

In the background

Autonomous agents running in cloud-hosted sandboxes, often for hours, producing pull requests for later developer review.

EXMP

Background Agent Examples

Examples include Google Jules, GitHub Copilot agent mode, Cursor's background agents, and Google's specialized AlphaEvolve agent.

INSG

Task-dependent starting point

The right starting point for using coding agents depends on the specific task, not a rigid autonomy hierarchy.

▸ 4 Expand

SECT

Vibe Coding Production-ready Agents

This section addresses building production-ready agents, not just using coding agents to build software.

SUP

Building production agents

Tasks like building a customer support bot or research assistant require agents with their own tools, memory, evaluation, and deployment infrastructure.

SUP

Streamlined production agent workflow

The terminal-based workflow now supports building, evaluating, and deploying real agents at scale with persistent memory and observability.

▸ 1 Expand

EVID

Google's Agents CLI

Google's Agents CLI is a command-line tool that bundles skills for building agents on Google Cloud, working with the developer's preferred coding agent.

▸ 1 Expand

EXMP

Agents CLI Workflow

A coding agent can scaffold a project, write ADK code, generate evalsets, run them, deploy to Agent Runtime, and report back, all from a single instruction.

EVID

Agents CLI Snippet

Example command sequence: "uvx google-agents-cli setup", then "> Build a support agent that answers questions from our docs. > evaluate it on the FAQ dataset > Deploy it to Agent Engine".

▸ 2 Expand

SUP

Scaling Production Agents

The same workflow scales from one agent to many using ADK for graph-based, multi-agent workflows, shared session state, and LLM-driven delegation.

EVID

Anthropic C compiler experiment

Anthropic's engineering team built a working C compiler in early 2026 using agent teams on this architecture within two weeks.

INSG

Bottleneck shifts to specification

The bottleneck shifted from writing code to specifying what it should do and verifying that agents performed correctly.

▸ 7 Expand

SECT

The Economics of AI Development

Evaluating AI's impact on SDLC requires considering Total Cost of Ownership (TCO) and how workflows shift financial burdens between Capital Expenditure (CapEx) and Operational Expenditure (OpEx).

INSG

OpEx dictated by token economy

In the AI era, Operational Expenditure (OpEx) is heavily dictated by the token economy.

CMPR

Economics of Vibe Coding vs Agentic Engineering

Vibe coding has low CapEx but high OpEx due to compounding hidden debt, while agentic engineering requires higher upfront CapEx for low OpEx and sustainable scale.

Metric	Vibe Coding	Agentic Engineering
CapEx	Minimal Investment (12%)	Upfront Platform Design (12%)
OpEx	High Running Costs	Low Marginal Running Costs
Characteristics	Rapid prototyping, slow scaling, high friction for long-term maintenance, economic dead-end for complex systems	Controlled iteration, fast scaling, low friction for automatic updates, economically sustainable for mature codebases

▸ 3 Expand

RISK

Hidden Debt of Vibe Coding

Vibe coding appears cost-effective initially (low CapEx) but incurs a massive, compounding OpEx burden due to its underlying practices.

COMP

The Token Burn Rate

Every LLM interaction costs based on tokens; vibe coding's practice of dumping unstructured files and repeatedly asking for fixes leads to an expensive "prompting loop" with low success rates.

COMP

Maintenance Tax

Ad-hoc prompted code lacks structural consistency, leading to significant time spent reverse-engineering AI-generated "spaghetti" code for maintenance.

COMP

Security Remediation

Rapid code generation without an automated evaluation harness leads to rapid vulnerability generation, with production fixes being exponentially more expensive than design-phase fixes.

▸ 1 Expand

SUP

Investment of Agentic Engineering

Agentic engineering requires deliberate upfront investment in engineering time and resources before production code is generated, which flips the economic model.

EVID

CapEx in Agentic Engineering

CapEx includes designing API schemas, building deterministic test suites, and structuring the agent's context, leading to dramatically lower marginal costs for features.

▸ 1 Expand

SUP

Context Engineering as a Financial Lever

Context engineering is a financial strategy because LLMs charge per token, making it unviable to pass large repositories into every prompt.

JUST

Ensures high-signal payload

Effective context engineering ensures the model receives a dense, high-signal payload (like AGENTS.md and guardrails), increasing first-pass success rates and avoiding costly trial-and-error loops.

SUP

Scaling Efficiency via Dynamic Context and Skills

Advanced agentic engineering optimizes OpEx through dynamic context via "skills" or tool calling (e.g., Model Context Protocol servers).

SUP

Intelligent Model Routing

Agentic engineering enables intelligent model routing, using large models for complex tasks and smaller, cheaper models for deterministic, lower-complexity tasks to drive down operational token cost.

▸ 3 Expand

SECT

Where to Start with AI-Driven Development

The shift from syntax to intent is a present reality, and AI amplifies existing engineering culture; these practices translate the principle into action.

▸ 6 Expand

CHKL

For individual developers

Individual developers should set up agent configurations, install skills, automate repetitive workflows, write tests/evals first, and review all AI-generated code.

1Set up AGENTS.md→2Install coding agent skills→3Make repetitive workflow an agent→4Write tests and evals first→5Review all shipped AI-generated code

STEP

Set up AGENTS.md

Set up an AGENTS.md (or equivalent) for the project, starting with ten lines and adding rules when the agent misbehaves.

STEP

Install coding agent skills

Install a set of skills for coding agents (e.g., Agents CLI) to build, evaluate, deploy, and optimize agents.

STEP

Make repetitive workflow an agent

Pick one repetitive workflow (e.g., research, code review) and make it the first agent, prototyping with a coding agent and graduating to production via Agents CLI.

STEP

Write tests and evals first

Write tests and evaluations before generating code; they serve as the contract with AI and communicate intent more precisely than prompts.

STEP

Review all shipped AI-generated code

Review every line the agent produces that will ship, being skeptical and verifying error handling and package imports to avoid debugging costs.

TIP

Maintain developer skills

Maintain foundational developer skills like debugging, system design, and architectural discussions, as AI handles routine tasks, allowing focus on challenges.

▸ 5 Expand

CHKL

For engineering leaders

1Make context engineering first-class→2Set the bar at the eval→3Re-shape code review for AI-generated code→4Distinguish prototyping from production work→5Invest in shared harness components

STEP

Make context engineering first-class

Treat context engineering artifacts like AGENTS.md, system prompts, eval suites, and skill libraries as versioned, owned code that is reviewed in pull requests.

STEP

Set the bar at the eval

Require passing eval suites with explicit rubrics (task success, tool use quality, trajectory compliance, hallucination) as a precondition for agent deployment, not just working demos.

STEP

Re-shape code review for AI-generated code

Scrutinize AI-generated code for hallucinated dependencies, inadequate error handling, and subtle correctness gaps, training reviewers on failure modes and tuning checklists.

STEP

Distinguish prototyping from production work

Make the boundary explicit between vibe coding (exploration) and agentic engineering (production) for projects, branches, and environments to avoid accidental prototype shipments.

STEP

Invest in shared harness components

Treat reusable system prompts, skill libraries, MCP server connections, and evaluation harnesses as documented, maintained, and improved infrastructure to compound value across projects.

▸ 5 Expand

CHKL

For organizations

Organizations should treat AI development as an investment, invest in production substrate, adopt open standards, plan for hybrid teams, and reframe hiring around judgment.

STEP

Treat AI development as engineering investment

Treat AI-assisted development as an engineering investment, not merely a productivity feature, pairing AI tooling with eval coverage, observability, and architectural standards.

STEP

Invest in production substrate before scale

Build the operational discipline (evals, traces, permissions, security review) for production systems before shipping the first production agent, not after.

STEP

Adopt open standards for inter-agent communication

Adopt open standards like Model Context Protocol (MCP) for tool access and Agent2Agent (A2A) for cross-agent delegation to enable mixing vendors and frameworks.

STEP

Plan for hybrid human-agent teams

Plan for hybrid teams where humans set direction and agents implement, with clear handoff protocols, evolving code review, on-call rotations, and team structures.

STEP

Reframe hiring around judgment

Reframe hiring and skill development around judgment, specification, evaluation, and architectural review, as these are the new bottlenecks, not just implementation.

▸ 4 Expand

SECT

Conclusion: Intent as the New Interface

The transition from syntax to intent is a present reality, with AI transforming the SDLC; the key is how effectively individuals, teams, and organizations navigate this shift.

▸ 3 Expand

SUP

Durable Principles of AI-Driven SDLC

The paper presents a framework of mental models to understand the evolving landscape, with three durable principles: structure scales, AI amplifies culture, and the human role is evolving.

SUP

Structure scales, vibes don't

Vibe coding is for exploration, but for production software, agentic engineering's discipline (specs, tests, guardrails, human oversight) is critical to avoid outages.

SUP

AI amplifies engineering culture

Organizations with strong testing, architectural standards, and code review get more value from AI, as AI multiplies both strengths and weaknesses.

SUP

Human role is evolving, not diminishing

Builders understanding architecture, defining specs, evaluating output, and designing systems of constraints/feedback loops are more valuable as skills shift from implementation to judgment.

INSG

Broader impact of AI transformation

This transformation reshapes how software is built and what kind of software is possible, enabling smaller teams to tackle larger problems and reducing the barrier to creating software.

INSG

Future of software engineering

Thriving teams will embrace AI as a powerful tool while maintaining engineering discipline, understanding that the future is about designing systems where humans and AI contribute unique strengths.

INSG

New craft: verification, judgment, direction

Generation is solved; the new craft for developers is verification, judgment, and direction.