Artificial intelligence is undergoing a paradigm shift from passive, discrete tasks to autonomous problem-solving and task execution by AI agents.

Made with Rinto — analyse your own content free

AI Agents as LM Evolution

Agents represent the natural evolution of Language Models, made useful in software by combining an LM's reasoning with practical action capabilities.

Document Purpose

This document is the first in a five-part series, guiding developers, architects, and product leaders in transitioning to robust, production-grade agentic systems.

Introduction to AI Agents

An AI Agent combines models, tools, an orchestration layer, and runtime services, using a Language Model in a loop to accomplish a goal.

Anthropomorphizing AI

Existing words are insufficient to describe how humans and AI interact, so we tend to anthropomorphize AI with human terms like 'think,' 'reason,' and 'know.'

Model (The Brain)

The core language or foundation model serves as the agent's central reasoning engine to process information, evaluate options, and make decisions.

Tools (The Hands)

Tools connect the agent's reasoning to the outside world, enabling actions beyond text generation, including API extensions, code functions, and data stores.

Orchestration Layer (The Nervous System)

This layer governs the agent's operational loop, managing planning, memory (state), and reasoning strategy execution using prompting frameworks.

Deployment (The Body and Legs)

Production deployment ensures the agent is a reliable and accessible service, involving hosting on a secure, scalable server with monitoring and management.

Developer Role Shift

Building a generative AI agent shifts the developer's role from a 'bricklayer' defining explicit logic to a 'director' setting the scene and guiding autonomous actors.

LM Flexibility Challenge

A Large Language Model's incredible flexibility, its greatest strength, also makes it difficult to reliably compel it to do one specific thing perfectly.

Agent Ops for Debugging

'Agent Ops' redefines the debugging cycle of measurement, analysis, and system optimization by monitoring the agent's 'thought process' through traces and logs.

Agentic Problem-Solving Process

An AI agent operates on a continuous, cyclical 5-step process to achieve objectives, integrating a reasoning model, actionable tools, and a governing orchestration layer.

5 Fundamental Steps

The agentic problem-solving loop can be broken down into five fundamental steps, detailed in the book Agentic System Design.

Step 1: Get the Mission

The process begins with a specific, high-level goal provided by a user or an automated trigger.

Step 2: Scan the Scene

The agent gathers context by perceiving its environment, accessing available resources like memory, user guidance, tools, calendars, databases, or APIs.

Step 3: Think It Through

The agent's core 'think' loop, driven by the reasoning model, analyzes the Mission against the Scene and devises a plan.

Step 4: Take Action

The orchestration layer executes the first concrete step of the plan by selecting and invoking an appropriate tool, such as calling an API or querying a database.

Step 5: Observe and Iterate

The agent observes the outcome of its action, adds new information to its context or memory, and repeats the loop by returning to Step 3.

Continuous Cycle Management

The 'Think, Act, Observe' cycle continues, managed by the Orchestration Layer, reasoned by the Model, and executed by the Tools until the Mission is achieved.
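The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. `think` stands in for the reasoning model, `tools` is a plain dictionary of callables, and the names `Decision`, `run_agent`, and `scripted_think` are hypothetical. The `max_steps` budget is an added assumption that stops a confused model from looping forever.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Decision:
    """What the reasoning model returns on each 'think' step (hypothetical shape)."""
    tool: Optional[str] = None               # tool to call next, or None
    args: Dict[str, Any] = field(default_factory=dict)
    final_answer: Optional[str] = None       # set once the mission is achieved


def run_agent(mission: str,
              think: Callable[[List[str]], Decision],
              tools: Dict[str, Callable[..., Any]],
              max_steps: int = 10) -> str:
    """Minimal Think-Act-Observe loop owned by the orchestration layer."""
    context: List[str] = [f"MISSION: {mission}"]              # Steps 1-2: mission + context
    for _ in range(max_steps):
        decision = think(context)                              # Step 3: think it through
        if decision.final_answer is not None:
            return decision.final_answer
        if decision.tool not in tools:
            context.append(f"ERROR: unknown tool {decision.tool!r}")
            continue
        result = tools[decision.tool](**decision.args)         # Step 4: take action
        context.append(f"OBSERVATION from {decision.tool}: {result}")  # Step 5: observe
    return "Stopped: step budget exhausted before the mission was achieved."


def scripted_think(context: List[str]) -> Decision:
    """Stand-in for a model call: use a tool once, then answer."""
    observations = [c for c in context if c.startswith("OBSERVATION")]
    if not observations:
        return Decision(tool="add", args={"a": 2, "b": 3})
    return Decision(final_answer=f"Done after {len(observations)} tool call(s): {observations[-1]}")


print(run_agent("Add 2 and 3", scripted_think, {"add": lambda a, b: a + b}))
```

In a real agent, `think` would send the accumulated context to a Language Model and parse its tool choice. The scripted version exists only so the loop can run without a model.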

Customer Support Agent Example

A customer support agent responding to 'Where is my order #12345?' demonstrates the 5-step problem-solving cycle.

Taxonomy of Agentic Systems

Agentic systems can be classified into broad levels, each building on the capabilities of the last, scaling in complexity.

Scoping Agent Type

For architects or product leaders, a key initial decision is scoping what kind of agent to build based on complexity.

Level 0: Core Reasoning System

This level starts with the Language Model as the reasoning engine, operating in isolation based on pre-trained knowledge without external tools or memory.

Level 1: Connected Problem-Solver

At this level, the reasoning engine connects to and utilizes external tools, allowing problem-solving beyond static, pre-trained knowledge.

Level 2: Strategic Problem-Solver

Level 2 expands capabilities from simple tasks to strategically planning complex, multi-part goals, with context engineering as a key skill.

Context Engineering

Context engineering is the agent's ability to actively select, package, and manage the most relevant information for each step of its plan.
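One way to picture context engineering is as a selector. It ranks candidate snippets (memories, documents, earlier tool results) by relevance to the current plan step and packs the best ones into a fixed budget. The sketch below makes two simplifying assumptions: relevance is naive keyword overlap, and a word count serves as a rough proxy for tokens. Production systems more commonly use embeddings and a real tokenizer. `build_context` and its parameters are hypothetical.

```python
import re
from typing import List, Set


def terms(text: str) -> Set[str]:
    """Lower-cased alphanumeric words; a crude stand-in for real relevance signals."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_context(step_goal: str, candidates: List[str], budget_words: int) -> List[str]:
    """Select the snippets most relevant to the current plan step within a word budget."""
    goal = terms(step_goal)
    # Rank by keyword overlap with the goal; sorted() keeps ties in original order.
    ranked = sorted(candidates, key=lambda s: len(goal & terms(s)), reverse=True)
    selected: List[str] = []
    used = 0
    for snippet in ranked:
        if not (goal & terms(snippet)):
            break                      # everything from here on shares no terms with the goal
        cost = len(snippet.split())    # word count as a rough token proxy
        if used + cost > budget_words:
            continue                   # too large for what is left; try the next one
        selected.append(snippet)
        used += cost
    return selected


memory = [
    "User prefers window seats on flights.",
    "Order 12345 shipped with tracking number ZYX987.",
    "The team offsite is in March.",
    "Carrier API returns status strings such as Out for Delivery.",
]
print(build_context("check shipping status for order 12345", memory, budget_words=20))
```

Only the two order-related snippets reach the model. The travel and offsite memories are left out, which is the point of curating the model's limited attention.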

Level 2 Example Mission

Find a good coffee shop halfway between two addresses, demonstrating multi-step strategic planning and tool use.

Proactive Assistance

Strategic planning enables proactive assistance, such as an agent reading a flight confirmation email and adding key context to a calendar.

Level 3: Collaborative Multi-Agent System

This level shifts the paradigm from a single 'super-agent' to a 'team of specialists' working in concert, mirroring human organizations with division of labor.

Level 4: Self-Evolving System

Level 4 is a profound leap from delegation to autonomous creation and adaptation, where an agentic system dynamically expands its capabilities.

Core Agent Architecture

Building agents involves the specific architectural design of its three core components: Model, Tools, and Orchestration, transitioning from concept to code.

Model: The Brain

The Language Model is the reasoning core, and its selection is a critical architectural decision dictating cognitive capabilities, operational cost, and speed.

Model Selection Approach

Treating model choice as simply picking the highest benchmark score is a common path to failure; success in production is rarely determined by generic academic benchmarks.

Agentic Fundamentals

Real-world success requires a model excelling at agentic fundamentals: superior reasoning for complex problems and reliable tool use.

Optimal Model Selection

The 'best' model is at the optimal intersection of quality, speed, and price for specific tasks, determined by defining the business problem and testing against direct metrics.

Multiple Models

You may use more than one model as a 'team of specialists,' for example Gemini 2.5 Pro for planning and Gemini 2.5 Flash for simpler tasks.

Model Routing Strategy

Model routing, either automatic or hard-coded, is a key strategy for optimizing both performance and cost.
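A hard-coded router can be as simple as a lookup on task type. The sketch assumes the Gemini 2.5 Pro / Flash split described above. The model identifier strings, the task kinds, and the `route` function are illustrative assumptions, so check your provider's model list for the exact names.

```python
from dataclasses import dataclass

# Illustrative model identifiers; confirm the exact names your platform exposes.
PLANNER_MODEL = "gemini-2.5-pro"
WORKER_MODEL = "gemini-2.5-flash"

# Hypothetical task kinds that need the stronger reasoning model.
COMPLEX_KINDS = {"plan", "multi_step_reasoning"}


@dataclass
class Task:
    kind: str    # e.g. "plan", "summarize", "classify"
    prompt: str


def route(task: Task) -> str:
    """Hard-coded routing: planning goes to the stronger model,
    everything else goes to the faster, cheaper one."""
    return PLANNER_MODEL if task.kind in COMPLEX_KINDS else WORKER_MODEL


print(route(Task("plan", "Organize the team's conference travel")))
print(route(Task("summarize", "Summarize this flight confirmation email")))
```

Automatic routing would replace the lookup with a classifier that estimates task difficulty; that classifier could itself be a small model. Either way, evaluating both paths helps ensure that the cost savings do not quietly reduce quality.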

Multimodal Data Handling

Natively multimodal models like Gemini live mode streamline image/audio processing, while specialized tools like Cloud Vision API or Speech-to-Text API convert data to text for language-only models.

AI Landscape Evolution

The AI landscape evolves rapidly, rendering a 'set it and forget it' mindset unsustainable; models chosen today will be superseded in six months.

Agent Ops Practice

Building for reality means investing in a nimble operational framework, an 'Agent Ops' practice, with a robust CI/CD pipeline that continuously evaluates new models.

Agent Ops Benefits

Agent Ops de-risks and accelerates upgrades, ensuring agents are powered by the best models without complete architectural overhaul.

Tools: The Hands

Tools connect the agent's reasoning (brain) to reality, enabling it to retrieve real-time information and take action beyond static training data.

Three-Part Tool Loop

A robust tool interface involves defining what a tool can do, invoking it, and observing the result.
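The three parts map naturally onto code:

1. A declaration the model can read.
2. A dispatcher that validates and runs the requested call.
3. A structured observation returned to the model, whether the call succeeded or failed.

In the sketch below, `get_weather`, its stub readings, and the simplified parameter schema (Python types rather than JSON Schema) are all hypothetical.

```python
import json
from typing import Any, Callable, Dict

# 1. Define: what the model is told about the tool (simplified schema:
#    parameter name -> expected Python type, instead of full JSON Schema).
TOOL_DECLARATIONS: Dict[str, Dict[str, Any]] = {
    "get_weather": {
        "description": "Return the current temperature in Celsius for a city.",
        "parameters": {"city": str},
    }
}


def get_weather(city: str) -> Dict[str, Any]:
    """Hypothetical stub; a real tool would call a weather API."""
    readings = {"Paris": 18, "Tokyo": 24}
    if city not in readings:
        raise LookupError(f"no reading for {city}")
    return {"city": city, "temp_c": readings[city]}


TOOL_IMPLS: Dict[str, Callable[..., Dict[str, Any]]] = {"get_weather": get_weather}


def invoke(call_json: str) -> Dict[str, Any]:
    """2. Invoke and 3. observe: validate the model's requested call, run it,
    and always hand back a structured observation."""
    call = json.loads(call_json)
    name, args = call.get("name"), call.get("args", {})
    decl = TOOL_DECLARATIONS.get(name)
    if decl is None:
        return {"status": "error", "error": f"unknown tool {name!r}"}
    for param, expected in decl["parameters"].items():
        if not isinstance(args.get(param), expected):
            return {"status": "error", "error": f"bad or missing argument {param!r}"}
    try:
        return {"status": "ok", "result": TOOL_IMPLS[name](**args)}
    except Exception as exc:  # report tool failures to the model instead of crashing
        return {"status": "error", "error": str(exc)}


print(invoke('{"name": "get_weather", "args": {"city": "Paris"}}'))
print(invoke('{"name": "get_weather", "args": {"city": "Oslo"}}'))
```

Returning errors as observations, rather than raising them, lets the model see what went wrong and try again on the next turn of the loop.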

Retrieving Information

Retrieving up-to-date information is the most foundational use of tools: it grounds the agent in reality and dramatically reduces hallucinations.

Executing Actions

Agents realize their true power when they move from reading information to actively performing actions, becoming autonomous actors.

Human in the Loop (HITL)

HITL tools allow agents to pause workflows and ask for human confirmation or specific information, ensuring human involvement in critical decisions.
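A simple HITL pattern wraps a sensitive tool so that the agent must obtain approval before the side effect happens. In this sketch, `send_email` is a hypothetical tool. `ask_human` is any callable that returns True or False; in practice it might be a console prompt, a chat button, or an approval queue.

```python
from typing import Callable


def send_email(to: str, body: str) -> str:
    """Hypothetical side-effecting tool."""
    return f"sent to {to}"


def with_human_approval(action: Callable[..., str],
                        ask_human: Callable[[str], bool]) -> Callable[..., str]:
    """Wrap a sensitive tool so the agent pauses for human confirmation first."""
    def guarded(**kwargs: str) -> str:
        summary = f"{action.__name__}({kwargs})"
        if not ask_human(f"Approve {summary}?"):
            # The rejection becomes an observation the agent can reason about.
            return f"REJECTED by human: {summary}"
        return action(**kwargs)
    return guarded


# These lambdas simulate a person answering yes and no.
print(with_human_approval(send_email, lambda q: True)(to="ops@example.com", body="Deploy finished"))
print(with_human_approval(send_email, lambda q: False)(to="ops@example.com", body="Deploy finished"))
```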

Function Calling

For agents to reliably do 'function calling' and use tools, clear instructions, secure connections, and orchestration are required.

Orchestration Layer

The orchestration layer, acting as the central nervous system, is the engine that runs the 'Think, Act, Observe' loop and governs agent behavior.

Core Design Choices

Core architectural decisions for an agent include how much autonomy to grant it, which implementation method to use, and how to ensure a production-grade framework.

Agent Autonomy Spectrum

The first architectural decision is determining the agent's degree of autonomy, which exists on a spectrum from deterministic workflows to dynamically adaptive LMs.

Implementation Method

No-code builders offer speed for structured tasks and simple agents, while code-first frameworks like Google's Agent Development Kit (ADK) provide deep control for complex systems.

Production-Grade Framework

A production-grade framework must be open, allowing plug-in of any model or tool to prevent vendor lock-in.

Instruct with Domain Knowledge and Persona

Developers' most powerful lever is instructing the agent with domain knowledge and a distinct persona, using a system prompt or core instructions.

Augment with Context

The agent's 'memory' is orchestrated into the LM context window at runtime.

Multi-Agent Systems and Design Patterns

As tasks grow in complexity, a 'team of specialists' approach, mirroring human organizations, is more effective than a single 'super-agent'.

Agent Deployment and Services

Deploying a local agent to a server turns it into a reliable, accessible service; doing so effectively requires several supporting services.

Agent Ops: A Structured Approach to the Unpredictable

Building agents requires a new operational philosophy called 'Agent Ops' due to the stochastic nature of agentic systems and probabilistic responses.

Testing Generative AI

Traditional deterministic software unit tests cannot simply assert output == expected for generative AI, as agent responses are probabilistic by design.

LM for Quality Evaluation

Evaluating agent 'quality' usually requires an LM to assess if the response fulfills requirements, avoids extra content, and maintains proper tone.

Agent Ops Definition

Agent Ops is a disciplined, structured approach managing the unique challenges of building, deploying, and governing AI agents, evolving from DevOps and MLOps.

Measure What Matters: Instrumenting Success

Define 'better' in a business context: frame observability like an A/B test and identify Key Performance Indicators (KPIs) that measure real-world impact.

Quality Instead of Pass/Fail: LM Judge

Since simple pass/fail evaluation is impossible for agents, quality is assessed using an 'LM as Judge' against a predefined rubric.
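An LM judge is usually driven by a prompt that spells out the rubric and asks for machine-readable scores. The sketch below makes several assumptions:

- The rubric is the three qualities named earlier (fulfilling the request, avoiding extra content, proper tone).
- Scores use a 1 to 5 scale.
- The judge replies in JSON.

`call_llm` is a placeholder for whatever model client you use; here it is stubbed with a fixed reply.

```python
import json
from typing import Callable, Dict

# Assumed rubric, based on the qualities an LM judge is asked to check.
RUBRIC: Dict[str, str] = {
    "fulfills_request": "Does the response do what the user asked?",
    "no_extra_content": "Does it avoid unrequested or irrelevant content?",
    "tone": "Is the tone appropriate and professional?",
}


def build_judge_prompt(user_request: str, agent_response: str) -> str:
    criteria = "\n".join(f"- {key}: {question}" for key, question in RUBRIC.items())
    return (
        "Score the RESPONSE on each criterion from 1 (poor) to 5 (excellent).\n"
        f"Criteria:\n{criteria}\n"
        "Reply with a JSON object mapping each criterion name to its score.\n\n"
        f"REQUEST: {user_request}\nRESPONSE: {agent_response}"
    )


def judge(user_request: str, agent_response: str,
          call_llm: Callable[[str], str]) -> float:
    """Ask a judge model for per-criterion scores and return their mean (1-5)."""
    scores = json.loads(call_llm(build_judge_prompt(user_request, agent_response)))
    missing = set(RUBRIC) - set(scores)
    if missing:
        raise ValueError(f"judge omitted criteria: {sorted(missing)}")
    return sum(scores[key] for key in RUBRIC) / len(RUBRIC)


# Stub judge so the example runs offline; replace with a real model call.
fake_llm = lambda prompt: '{"fulfills_request": 5, "no_extra_content": 4, "tone": 3}'
print(judge("Where is my order #12345?", "Your order #12345 is out for delivery.", fake_llm))
```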

Metrics-Driven Development: Go/No-Go

Automating dozens of evaluation scenarios and establishing trusted quality scores lets teams test changes to a development agent with confidence and make a clear go/no-go decision.
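Once each scenario has a trusted score, the go/no-go decision can be automated in the CI/CD pipeline. The sketch below assumes per-scenario average judge scores for the current (baseline) agent and the changed (candidate) agent. The thresholds `min_mean` and `max_regression` and the scenario names are illustrative values to tune, not recommendations from the paper.

```python
from statistics import mean
from typing import Dict


def go_no_go(baseline: Dict[str, float], candidate: Dict[str, float],
             min_mean: float = 4.0, max_regression: float = 0.5) -> bool:
    """Approve the candidate agent only if its mean judge score clears a floor
    and no single scenario drops by more than max_regression versus baseline."""
    if set(baseline) != set(candidate):
        raise ValueError("both runs must cover the same evaluation scenarios")
    if mean(candidate.values()) < min_mean:
        return False
    return all(baseline[s] - candidate[s] <= max_regression for s in baseline)


baseline = {"order_status": 4.5, "refund_policy": 4.0, "greeting": 5.0}
candidate = {"order_status": 4.8, "refund_policy": 3.2, "greeting": 5.0}
print(go_no_go(baseline, candidate))
```

Here the candidate improves `order_status` but regresses `refund_policy` by 0.8, so the gate returns False even though the average still clears 4.0. Checking per-scenario regressions catches changes that an average alone would hide.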

Debug with OpenTelemetry Traces

OpenTelemetry traces provide a high-fidelity, step-by-step recording of the agent's entire execution path, essential for understanding 'why' metrics dip or bugs occur.

Cherish Human Feedback

Human feedback is the most valuable and data-rich resource for improving agents, serving as 'gifts' for new real-world edge cases.

Agent Interoperability

Once agents are high quality, they must be interconnected with users and other agents to join a wider ecosystem; this interconnection is, in effect, the 'face' of the agent.

Agents and Humans

The most common form of agent-human interaction is through a user interface, ranging from chatbots to rich, dynamic front-end experiences.

Agents and Agents

As enterprises scale AI, agents must connect with each other, requiring a common standard for discovery and communication.

Agents and Money

As AI agents perform more tasks, some involve buying, selling, or facilitating transactions, creating a trust crisis if something goes wrong.

Securing a Single Agent: Trust Trade-Off

When creating an AI agent, there's a fundamental tension between utility and security, as granting power introduces risk.

Agent Power and Risk

To be useful, agents need autonomy to make decisions and tools to perform actions like sending emails or querying databases.

Primary Security Concerns

Primary security concerns are rogue actions—unintended or harmful behaviors—and sensitive data disclosure.

Defense-in-Depth Approach

Managing agent security requires a hybrid, defense-in-depth approach rather than relying solely on the AI model's judgment, which can be manipulated.

Deterministic Guardrails

The first security layer consists of traditional, deterministic guardrails—hardcoded rules acting as a security chokepoint outside the model's reasoning.

Reasoning-Based Defenses

The second layer uses AI to secure AI, training models to be resilient to attacks and employing specialized 'guard models' as security analysts.

Agent Identity: New Principal Class

Agent identity represents a new class of principal beyond human users and services, requiring its own verifiable identity.

IAM Paradigm Shift

This is a fundamental shift in how Identity and Access Management (IAM) must be approached in the enterprise.

Bedrock of Agent Security

Verifying each identity and having access controls is the bedrock of agent security.

Verifiable Digital Passport

An agent needs a cryptographically verifiable identity, often using standards like SPIFFE, analogous to an employee ID badge.

Least-Privilege Permissions

Once identified, an agent can be granted specific, least-privilege permissions: for example, a SalesAgent may have CRM access while an HRonboardingAgent is denied it.
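Least privilege can be expressed as a default-deny lookup from an authenticated identity to its allowed permissions. The policy table, the permission strings, and the `is_allowed` function below are hypothetical. In production, this check would live in an IAM system or gateway rather than in application code.

```python
from typing import Dict, Set

# Hypothetical default-deny policy table: agent identity -> allowed permissions.
POLICIES: Dict[str, Set[str]] = {
    "SalesAgent": {"crm:read", "crm:write"},
    "HRonboardingAgent": {"hr_records:read"},
}


def is_allowed(agent_id: str, permission: str) -> bool:
    """Authorization (AuthZ) check; assumes agent_id was already authenticated (AuthN)."""
    return permission in POLICIES.get(agent_id, set())


print(is_allowed("SalesAgent", "crm:read"))          # granted
print(is_allowed("HRonboardingAgent", "crm:read"))   # denied: not in its policy
print(is_allowed("UnknownAgent", "crm:read"))        # denied: default deny
```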

Containment of Compromise

Granular control ensures that even if a single agent is compromised, the potential blast radius is contained.

Delegated Authority

Without an agent identity construct, agents cannot work on behalf of humans with limited delegated authority.

Authentication Categories

Different principal entities—users, agents, and service accounts—have distinct authentication and verification methods.

Principal entity	Authentication / Verification	Notes
Users	Authenticated with OAuth or SSO	Human actors with full autonomy and responsibility for their actions
Agents (new category of principals)	Verified with SPIFFE	Agents have delegated authority, taking actions on behalf of users
Service accounts	Integrated into IAM	Applications and containers; fully deterministic; not responsible for their actions

Policies to Constrain Access

Policies are a form of authorization (AuthZ), distinct from authentication (AuthN), used to limit a principal's capabilities.

Securing an ADK Agent

Securing an agent built with the Agent Development Kit (ADK) involves practical application of identity and policy concepts through code and configuration.

Identity Definition

The process requires clear definition of user accounts (OAuth), service accounts (to run code), and agent identities (to use delegated authority).

Policy Enforcement

After authentication, policies constrain access to services, often done at the API governance layer with MCP and A2A services.

Guardrails in Tools/Models

The next layer involves building guardrails into tools, models, and sub-agents to enforce policies.

Predictable Security Baseline

This ensures that tool logic refuses unsafe or out-of-policy actions, providing a predictable and auditable security baseline through concrete, reliable code.

Callbacks and Plugins

For dynamic security, ADK provides Callbacks and Plugins; a before_tool_callback inspects parameters of a tool call before it runs.
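The sketch below mimics the idea of a before-tool callback: inspect the tool name and arguments, then either let the call proceed or return a substitute result. It is deliberately framework-free, using plain strings and dicts in place of ADK's tool and context objects, so consult the ADK documentation for the real `before_tool_callback` signature. The tools `issue_refund` and `run_sql` and the refund limit are hypothetical.

```python
from typing import Any, Dict, Optional

MAX_REFUND_USD = 100.0  # hypothetical business limit


def before_tool_check(tool_name: str, args: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Inspect a pending tool call. Return None to let it run, or a dict that
    is used in place of the tool's result when the call is blocked."""
    if tool_name == "issue_refund":
        amount = float(args.get("amount_usd", 0))
        if amount > MAX_REFUND_USD:
            return {"status": "blocked",
                    "reason": f"refund of {amount:.2f} USD exceeds {MAX_REFUND_USD:.2f} USD limit"}
    if tool_name == "run_sql":
        # Naive read-only check for illustration; prefer read-only DB credentials.
        if not str(args.get("query", "")).lstrip().lower().startswith("select"):
            return {"status": "blocked", "reason": "only SELECT queries are allowed"}
    return None


print(before_tool_check("issue_refund", {"amount_usd": 40}))
print(before_tool_check("issue_refund", {"amount_usd": 250}))
print(before_tool_check("run_sql", {"query": "DROP TABLE orders"}))
```

The prefix check on SQL is easy to bypass (for example, a SELECT followed by a second statement), so it should complement least-privilege credentials rather than replace them.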

Gemini as a Judge

A common plugin pattern is 'Gemini as a Judge', using a fast, inexpensive model like Gemini Flash-Lite to screen inputs and outputs for prompt injections or harmful content.

Model Armor Service

Model Armor is an optional managed service for dynamic checks, screening prompts and responses for prompt injection, jailbreak attempts, PII leakage, and malicious URLs.

Hybrid Security Approach

Combining strong identity, deterministic in-tool logic, dynamic AI-powered guardrails, and managed services like Model Armor builds a powerful and trustworthy single agent.

Scaling Up to Enterprise Fleet

Scaling AI agents from a single triumph to a fleet of hundreds across an enterprise presents architectural challenges beyond primary security concerns.

Security and Privacy: Agentic Frontier

An enterprise-grade platform must address unique security and privacy challenges inherent to generative AI, even with a single agent.

Agent Governance: Control Plane

Managing agent sprawl requires a higher-order architectural approach: a central gateway serving as a control plane for all agentic activity.

Gateway as Control System

The gateway approach creates a control system, establishing a mandatory entry point for all agentic traffic, including user-to-agent prompts, agent-to-tool calls, and direct LM inference requests.

Two Interconnected Functions

This control plane serves two primary, interconnected functions: Runtime Policy Enforcement and Centralized Governance.

Runtime Policy Enforcement

The gateway acts as the architectural chokepoint for security, handling authentication ('Do I know who this actor is?') and authorization ('Do they have permission to do this?').

Centralized Governance

A central registry, an enterprise app store for agents and tools, provides a source of truth to enforce policies effectively.

Managed, Secure, Efficient Ecosystem

Combining a runtime gateway with a central governance registry transforms chaotic sprawl into a managed, secure, and efficient ecosystem.

Cost and Reliability: Infrastructure Foundation

Enterprise-grade agents must be both reliable and cost-effective, requiring underlying infrastructure to manage these trade-offs securely and compliantly.

How Agents Evolve and Learn

Agents deployed in dynamic environments must adapt to changing policies, technologies, and data formats to avoid performance degradation.

How Agents Learn and Self-Evolve

Agents learn from experience and external signals, much like humans, using this information to optimize future behavior.

Runtime Experience

Agents learn from session logs, traces, memory, tool interactions, and decision trajectories, including Human-in-the-Loop (HITL) feedback for guidance.

External Signals

Learning is driven by new external documents, such as updated enterprise policies, public regulatory guidelines, or critiques from other agents.

Enhanced Context Engineering

Advanced systems continuously refine prompts, few-shot examples, and retrieved memory information to optimize context for each task.

Tool Optimization and Creation

Agent reasoning identifies capability gaps, leading to gaining access to new tools, creating tools on the fly (e.g., Python scripts), or modifying existing ones.

Additional Optimization Techniques

Dynamically reconfiguring multi-agent design patterns or using Reinforcement Learning from Human Feedback (RLHF) are active research areas.

Learning New Compliance Guidelines

An enterprise agent operating in a heavily regulated industry can learn new compliance guidelines using a multi-agent workflow.

Simulation and Agent Gym

More advanced approaches involve a dedicated platform, an Agent Gym, engineered to optimize multi-agent systems in offline processes with advanced tooling.

Examples of Advanced Agents

Examples demonstrate advanced agent capabilities in scientific research and algorithm optimization.

Google Co-Scientist

Co-Scientist is an advanced AI agent designed as a virtual research collaborator to accelerate scientific discovery by systematically exploring complex problem spaces.

AlphaEvolve Agent

AlphaEvolve is an advanced agentic system that discovers and optimizes algorithms for complex problems in mathematics and computer science.

Conclusion

Generative AI agents represent a pivotal evolution, shifting artificial intelligence from a passive tool to an active, autonomous partner in problem-solving.

Formal Framework Provided

This document provided a formal framework for understanding and building these systems, moving beyond prototypes to establish reliable, production-grade architecture.

Three Essential Components

An agent is deconstructed into its three essential components: the reasoning Model ('Brain'), actionable Tools ('Hands'), and governing Orchestration Layer ('Nervous System').

Agent's True Potential

Seamless integration of these parts, operating in a continuous 'Think, Act, Observe' loop, unlocks an agent's true potential.

Classifying Agentic Systems

Classifying agentic systems from Level 1 Problem-Solver to Level 3 Multi-Agent System helps architects scope ambitions to task complexity.

New Developer Paradigm

The central challenge lies in a new developer paradigm where developers become 'architects' and 'directors' rather than 'bricklayers' defining explicit logic.

Source of Unreliability

The flexibility that makes LMs powerful is also the source of their unreliability.

Success in Engineering Rigor

Success is found in engineering rigor applied to the entire system, including robust tool contracts, resilient error handling, sophisticated context management, and comprehensive evaluation.

Foundational Blueprint

The outlined principles and architectural patterns serve as a foundational blueprint for navigating this new software frontier.

Harnessing Agentic AI Power

This disciplined architectural approach will be the deciding factor in harnessing the full power of agentic AI, building collaborative, capable, and adaptable team members.

From Predictive AI to Autonomous Agents

Traditional AI Focus

For years, AI focused on passive, discrete tasks like answering questions, translating text, or generating images from prompts, requiring constant human direction.

Paradigm Shift to Autonomous Agents

There is a paradigm shift from AI that predicts or creates content to new software capable of autonomous problem-solving and task execution.

Agent Definition

An AI agent is a complete application that makes plans and takes actions to achieve goals, combining an LM's ability to reason with the ability to act.

Agents Handle Complex Tasks

Agents can handle complex, multi-step tasks independently, figuring out necessary steps to reach a goal without constant human guidance.

Building Agent Prototypes

Building a simple prototype is straightforward, but ensuring security, quality, and reliability for production is a significant challenge.

Comprehensive Foundation

This paper provides a comprehensive foundation for building, deploying, and managing intelligent applications that reason, act, and observe to accomplish goals.

Core Anatomy

Deconstructs an agent into its three essential components: the reasoning Model, actionable Tools, and the governing Orchestration Layer.

Taxonomy of Capabilities

Classifies agents from simple, connected problem-solvers to complex, collaborative multi-agent systems.

Architectural Design

Dives into practical design considerations for each component, from model selection to tool implementation.

Building for Production

Establishes the Agent Ops discipline needed to evaluate, debug, secure, and scale agentic systems from single instance to enterprise fleet.

Model Type Capabilities

The type of model, whether general-purpose, fine-tuned, or multimodal, dictates the agent's cognitive capabilities.

Context Window Curator

An agentic system ultimately curates the input context window for the Language Model.

Tool Usage Process

An agentic system allows an LM to plan tool usage, execute the tool, and incorporate results into the input context window of the next LM call.

Reasoning Techniques

The orchestration layer uses reasoning techniques like Chain-of-Thought or ReAct to break down complex goals and decide when to think versus use a tool.

Memory Management

The orchestration layer is also responsible for providing agents with the memory to 'remember' information.

Accessing Deployed Agents

Once deployed, agents can be accessed by users through a graphical interface or programmatically via an Agent-to-Agent (A2A) API.

Mission Example

An example mission is 'Organize my team's travel for the upcoming conference' or 'A new high-priority customer ticket has arrived'.

Chain of Reasoning

Thinking it through involves a chain of reasoning, such as planning to use 'get_team_roster' and then 'calendar_api' to book travel.

Example: Agent Strategy

Asked 'Where is my order #12345?', the agent does not act immediately; it enters its 'Think It Through' phase to devise a multi-step plan for providing a delivery status.

Example: Plan - Identify

The agent identifies the need to find the order in the internal database to confirm existence and retrieve details.

Example: Plan - Track

From order details, the agent extracts the shipping carrier's tracking number and queries the external carrier's API for live status.

Example: Plan - Report

Finally, the agent synthesizes the gathered information into a clear, helpful response for the user.

Example: Act - Step 1

In its first 'Act' phase, the agent calls the find_order("12345") tool, observing a full order record with tracking number 'ZYX987'.

Example: Act - Step 2

The orchestration layer then calls the get_shipping_status("ZYX987") tool, observing the result 'Out for Delivery'.

Example: Act - Report

With all the data gathered, the agent plans the final message and generates the response: "Your order #12345 is 'Out for Delivery'!"
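The customer support walkthrough can be reproduced with two stub tools. In this sketch the plan is hard-coded for clarity, whereas a real agent's model would choose each step. The in-memory `ORDERS` and `CARRIER_STATUS` tables are hypothetical stand-ins for the internal database and the carrier API, and `answer_order_question` is an illustrative name.

```python
from typing import Dict

# Hypothetical stand-ins for the internal order database and the carrier API.
ORDERS: Dict[str, Dict[str, str]] = {"12345": {"tracking_number": "ZYX987"}}
CARRIER_STATUS: Dict[str, str] = {"ZYX987": "Out for Delivery"}


def find_order(order_id: str) -> Dict[str, str]:
    return ORDERS[order_id]


def get_shipping_status(tracking_number: str) -> str:
    return CARRIER_STATUS[tracking_number]


def answer_order_question(order_id: str) -> str:
    # Plan - Identify: confirm the order exists and fetch its details.
    if order_id not in ORDERS:
        return f"I couldn't find order #{order_id}. Could you double-check the number?"
    order = find_order(order_id)
    # Plan - Track: the observation from the first call supplies the input for the second.
    status = get_shipping_status(order["tracking_number"])
    # Plan - Report: synthesize the observations into a reply.
    return f"Your order #{order_id} is '{status}'!"


print(answer_order_question("12345"))
```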

Strength of Level 0

A Level 0 system's strength lies in its extensive training, which allows it to explain concepts and plan problem-solving in depth.

Trade-off of Level 0

The trade-off is a complete lack of real-time awareness, being 'blind' to facts outside its training data.

Level 0 Example

A Level 0 agent can explain baseball rules or Yankees history but not the score of last night's game, as it's outside its training data.

Level 1 Capability

The fundamental ability to interact with the world through tools such as search or APIs is the core capability of a Level 1 agent.

Level 1 Example

Given the mission 'What was the final score of the Yankees game last night?', a Level 1 agent uses a Google Search API to find and synthesize the answer 'Yankees won 5-3'.

Importance of Context Engineering

Context engineering curates the model's limited attention to prevent overload and ensure efficient performance, thus impacting agent accuracy.

Example: Step 1 Think

For the coffee shop mission, the agent begins by thinking, 'I must first find the halfway point.'

Example: Step 1 Act

The agent calls the Maps tool with both addresses.

DETL

Example: Step 1 Observe

The agent observes 'The halfway point is Millbrae, CA'.

DETL

Example: Step 2 Think

The agent then thinks 'Now I must find coffee shops in Millbrae with a rating of 4 stars or higher'.

DETL

Example: Step 2 Act

The agent calls the google_places tool with query='coffee shop in Millbrae, CA' and min_rating=4.0. This is context engineering: the agent builds a new, focused query from the previous step's output.

DETL

Example: Step 2 Observe

The agent observes the search returns 'Millbrae Coffee' and 'The Daily Grind'.

DETL

Example: Step 3 Think

The agent thinks 'I will synthesize these results and present them to the user'.
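
The three steps can be sketched as a Think-Act-Observe loop. This is an illustration, not a real agent. The `think` function is a hard-coded stand-in for the LM's reasoning. `maps_halfway` and `google_places` are hypothetical stub tools that return the values from the example. The addresses in the usage line are made up.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stub tools standing in for real Maps and Places APIs.
def maps_halfway(addr_a: str, addr_b: str) -> str:
    return "Millbrae, CA"

def google_places(query: str, min_rating: float) -> List[str]:
    return ["Millbrae Coffee", "The Daily Grind"]

TOOLS: Dict[str, Callable] = {"maps_halfway": maps_halfway, "google_places": google_places}

def think(state: dict) -> Optional[Tuple[str, dict]]:
    """Stand-in for the LM's reasoning: choose the next tool call, or None when done."""
    if "halfway" not in state:
        return "maps_halfway", {"addr_a": state["addr_a"], "addr_b": state["addr_b"]}
    if "shops" not in state:
        # Context engineering: a focused query built from the previous observation.
        return "google_places", {"query": f"coffee shop in {state['halfway']}", "min_rating": 4.0}
    return None

def run_agent(addr_a: str, addr_b: str, max_steps: int = 5) -> str:
    state = {"addr_a": addr_a, "addr_b": addr_b}
    for _ in range(max_steps):
        step = think(state)                     # Think
        if step is None:
            break
        tool_name, args = step
        observation = TOOLS[tool_name](**args)  # Act
        state["halfway" if tool_name == "maps_halfway" else "shops"] = observation  # Observe
    if "shops" not in state:
        return "Could not complete the mission within the step budget."
    # Final 'Think': synthesize the results for the user.
    return f"Coffee shops near {state['halfway']}: " + ", ".join(state["shops"])

print(run_agent("123 Main St, San Francisco", "456 First St, San Jose"))
```

The `max_steps` cap is a simple safeguard against a loop that never converges. Production orchestrators usually enforce a similar limit.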

DETL

Proactive Assistance

Strategic planning also enables proactive assistance. For example, an agent can read a flight confirmation email, extract the key context (flight number and date), and add it to a calendar.

SUBC

Level 3: Collaborative Multi-Agent System

This level shifts the paradigm from a single 'super-agent' to a 'team of specialists' working in concert, mirroring human organizations with division of labor.

DETL

Agents as Tools

Here, agents treat other agents as tools, exemplified by a 'Project Manager' agent delegating tasks to specialized team members.

EXMP

Level 3 Example Mission

A 'Project Manager' agent receives the mission "Launch our new 'Solaris' headphones" and delegates sub-tasks.

DETL

Example: Market Research

The Project Manager delegates to a MarketResearchAgent to 'Analyze competitor pricing for noise-canceling headphones' and return a summary by tomorrow.

DETL

Example: Marketing Task

The Project Manager delegates to a MarketingAgent to 'Draft three versions of a press release' using the 'Solaris' product spec sheet.

DETL

Example: Web Development

The Project Manager delegates to a WebDevAgent to 'Generate the new product page HTML' based on design mockups.
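
The delegation can be sketched with each specialist exposed to the Project Manager as an ordinary callable tool. The agent names mirror the example. Their bodies are hypothetical stubs in place of LM-backed agents. The fixed plan stands in for the manager's own reasoning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class SubTask:
    agent: str        # name of the specialist agent to delegate to
    instruction: str  # what the specialist should do

# Hypothetical specialists; in a real system each would wrap its own LM and tools.
def market_research_agent(instruction: str) -> str:
    return f"[MarketResearchAgent] summary for: {instruction}"

def marketing_agent(instruction: str) -> str:
    return f"[MarketingAgent] drafts for: {instruction}"

def web_dev_agent(instruction: str) -> str:
    return f"[WebDevAgent] HTML for: {instruction}"

# To the manager, other agents are just tools it can call by name.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "MarketResearchAgent": market_research_agent,
    "MarketingAgent": marketing_agent,
    "WebDevAgent": web_dev_agent,
}

def project_manager(mission: str) -> List[str]:
    """Delegate sub-tasks to specialists and collect their results.

    A real manager would derive the plan from `mission`; it is hard-coded here.
    """
    plan = [
        SubTask("MarketResearchAgent", "Analyze competitor pricing for noise-canceling headphones"),
        SubTask("MarketingAgent", "Draft three versions of a press release"),
        SubTask("WebDevAgent", "Generate the new product page HTML"),
    ]
    return [SPECIALISTS[task.agent](task.instruction) for task in plan]

for result in project_manager("Launch our new 'Solaris' headphones"):
    print(result)
```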

INSG

Automating Complex Workflows

This collaborative model represents the frontier of automating entire, complex business workflows from start to finish. However, it is still constrained by the reasoning limitations of today's LMs.

SUBC

Level 4: Self-Evolving System

Level 4 is a profound leap from delegation to autonomous creation and adaptation, where an agentic system dynamically expands its capabilities.

DETL

Dynamic Capability Expansion

At this level, an agent can identify gaps in its own capabilities and dynamically create new tools or even new agents to fill them.

EXMP

Level 4 Example Mission

A 'Project Manager' agent tasked with the 'Solaris' launch realizes it needs to monitor social media sentiment but has no tool for it.

DETL

Example: Think (Meta-Reasoning)

The agent thinks 'I must track social media buzz for 'Solaris,' but I lack the capability'.

DETL

Example: Act (Autonomous Creation)

Instead of failing, it invokes an AgentCreator tool to build a new agent. The new agent monitors social media for the keywords 'Solaris headphones', performs sentiment analysis, and reports daily summaries.

DETL

Example: Observe

A new, specialized SentimentAnalysisAgent is created, tested, and added to the team on the fly, contributing to the original mission.

INSG

Learning and Evolving Organization

This level of autonomy, where a system dynamically expands its own capabilities, transforms a team of agents into a truly learning and evolving organization.

CONC

Core Agent Architecture

Building an agent means moving from concept to code through the architectural design of its three core components: Model, Tools, and Orchestration.

SUBC

Model: The Brain

The Language Model is the reasoning core, and its selection is a critical architectural decision dictating cognitive capabilities, operational cost, and speed.

DCSN

Model Selection Approach

Treating model choice as simply picking the highest benchmark score is a common path to failure; success in production is rarely determined by generic academic benchmarks.

DETL

Agentic Fundamentals

Real-world success requires a model excelling at agentic fundamentals: superior reasoning for complex problems and reliable tool use.

JUST

Optimal Model Selection

The 'best' model is at the optimal intersection of quality, speed, and price for specific tasks, determined by defining the business problem and testing against direct metrics.

DETL

Multiple Models

You may choose more than one model and treat them as a 'team of specialists'. For example, you might use Gemini 2.5 Pro for complex planning and Gemini 2.5 Flash for simpler tasks.

INSG

Model Routing Strategy

Model routing, either automatic or hard-coded, is a key strategy for optimizing both performance and cost.
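
A hard-coded router can be as simple as a rule: planning-heavy or long requests go to a larger model, and everything else goes to a faster, cheaper one. The model identifier strings and the keyword heuristic below are assumptions for illustration. An automatic router might instead use a classifier or a small LM to make the call.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    model: str
    reason: str

# Assumed model identifiers; substitute whatever your provider actually exposes.
PLANNER_MODEL = "gemini-2.5-pro"
WORKER_MODEL = "gemini-2.5-flash"

# Hypothetical keywords signalling a task that needs deeper reasoning.
PLANNING_HINTS = ("plan", "strategy", "multi-step", "compare", "analyze")

def route(task: str, max_worker_words: int = 50) -> Route:
    """Hard-coded routing heuristic: long or planning-style tasks go to the larger model."""
    text = task.lower()
    if any(hint in text for hint in PLANNING_HINTS):
        return Route(PLANNER_MODEL, "planning keyword")
    if len(text.split()) > max_worker_words:
        return Route(PLANNER_MODEL, "long input")
    return Route(WORKER_MODEL, "simple task")

print(route("Plan the product launch for Solaris"))
print(route("Summarize this user comment"))
```

Returning a `reason` alongside the model makes routing decisions visible in logs. That visibility matters when you later tune the rule for cost or quality.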

DETL

Multimodal Data Handling

Natively multimodal models like Gemini live mode streamline image/audio processing, while specialized tools like Cloud Vision API or Speech-to-Text API convert data to text for language-only models.

INSG

AI Landscape Evolution

The AI landscape evolves rapidly, so a 'set it and forget it' mindset is unsustainable: the model you choose today is likely to be superseded within six months.

DCSN

Agent Ops Practice

Building for reality means investing in a nimble operational framework, an 'Agent Ops' practice, with a robust CI/CD pipeline that continuously evaluates new models.

JUST

Agent Ops Benefits

Agent Ops de-risks and accelerates upgrades, ensuring agents are powered by the best available models without a complete architectural overhaul.

SUBC

Tools: The Hands

Tools connect the agent's reasoning (brain) to reality, enabling it to retrieve real-time information and take action beyond static training data.

DETL

Three-Part Tool Loop

A robust tool interface involves defining what a tool can do, invoking it, and observing the result.

CONC

Retrieving Information

The most foundational tools retrieve up-to-date information, grounding the agent in reality and dramatically reducing hallucinations.

DETL

RAG for External Knowledge

Retrieval-Augmented Generation (RAG) enables agents to query external knowledge stored in Vector Databases or Knowledge Graphs.

DETL

NL2SQL for Structured Data

Natural Language to SQL (NL2SQL) tools allow agents to query databases to answer analytic questions, like 'What were our top-selling products last quarter?'.
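
One practical safeguard for an NL2SQL tool is to let the LM propose SQL but execute only read-only statements. The sketch below uses Python's built-in `sqlite3`. `generate_sql` is a hypothetical stub standing in for the LM call. The `sales` table and its rows are made up.

```python
import sqlite3

def generate_sql(question: str) -> str:
    """Hypothetical stand-in for an LM that translates a question into SQL."""
    return (
        "SELECT product, SUM(units) AS total FROM sales "
        "WHERE quarter = 'Q2' GROUP BY product ORDER BY total DESC LIMIT 2"
    )

def run_readonly(conn: sqlite3.Connection, sql: str) -> list:
    """Execute a single SELECT statement; refuse anything else."""
    statement = sql.strip().rstrip(";")
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError("Only single SELECT statements are allowed")
    return conn.execute(statement).fetchall()

# Made-up sample data for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, quarter TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Solaris", "Q2", 120), ("Nova", "Q2", 80), ("Orbit", "Q2", 95), ("Solaris", "Q1", 50)],
)
print(run_readonly(conn, generate_sql("What were our top-selling products last quarter?")))
```

A string check is only a shallow guard. Connecting with a read-only database role, or opening SQLite with a `mode=ro` URI, enforces the same rule at a lower level.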

CONC

Executing Actions

Agents unleash true power when they move from reading information to actively performing actions, transforming into autonomous actors.

DETL

Wrapping APIs/Code

Existing APIs and code functions can be wrapped as tools, allowing agents to send emails, schedule meetings, or update customer records.

DETL

Writing/Executing Code

For dynamic tasks, an agent can write and execute code on the fly in a secure sandbox, generating SQL queries or Python scripts.

CONC

Human in the Loop (HITL)

HITL tools allow agents to pause workflows and ask for human confirmation or specific information, ensuring human involvement in critical decisions.

EXMP

HITL Implementation

HITL could be implemented via SMS text messaging or a task in a database.
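
The 'task in a database' option can be sketched as a pending-approval table. The agent writes to the table, then polls it and continues only once a human has decided. The sketch uses an in-memory `sqlite3` database, and the table layout and function names are hypothetical. In a real deployment the reviewer would be notified (for example, by SMS) rather than approving directly in code.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE approvals (id INTEGER PRIMARY KEY, action TEXT, status TEXT DEFAULT 'pending')"
)

def request_approval(action: str) -> int:
    """Agent side: record the proposed action and pause until a human decides."""
    cur = conn.execute("INSERT INTO approvals (action) VALUES (?)", (action,))
    conn.commit()
    return cur.lastrowid

def decide(task_id: int, approved: bool) -> None:
    """Human side: approve or reject a pending task (e.g., from a review UI)."""
    conn.execute(
        "UPDATE approvals SET status = ? WHERE id = ?",
        ("approved" if approved else "rejected", task_id),
    )
    conn.commit()

def check_approval(task_id: int) -> Optional[bool]:
    """Agent side: None while pending, otherwise the human's decision."""
    (status,) = conn.execute("SELECT status FROM approvals WHERE id = ?", (task_id,)).fetchone()
    return None if status == "pending" else status == "approved"

task = request_approval("Send refund of $250 to customer 42")
print(check_approval(task))  # still pending, so the agent waits
decide(task, approved=True)
print(check_approval(task))
```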

SUBC

Function Calling

For agents to reliably do 'function calling' and use tools, clear instructions, secure connections, and orchestration are required.

DETL

OpenAPI Specification

Longstanding standards like the OpenAPI specification provide a structured contract describing a tool's purpose, parameters, and expected response.
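
A tool contract in the spirit of an OpenAPI or function-calling schema describes a tool's purpose, parameters, and types. With that contract, the model's proposed arguments can be checked before the tool runs. The schema below borrows JSON Schema vocabulary (`type`, `properties`, `required`, `enum`). The `get_weather` tool is hypothetical. The validator is a deliberately tiny stand-in, not a full JSON Schema implementation.

```python
from typing import Any, Dict, List

# Hypothetical tool contract using JSON Schema vocabulary.
GET_WEATHER_SCHEMA: Dict[str, Any] = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

_PY_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}

def validate_args(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Return contract violations for a proposed call (empty list means valid)."""
    params = schema["parameters"]
    errors = [f"missing required '{k}'" for k in params.get("required", []) if k not in args]
    for key, value in args.items():
        spec = params["properties"].get(key)
        if spec is None:
            errors.append(f"unexpected argument '{key}'")
            continue
        if not isinstance(value, _PY_TYPES[spec["type"]]):
            errors.append(f"'{key}' should be {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"'{key}' must be one of {spec['enum']}")
    return errors

print(validate_args(GET_WEATHER_SCHEMA, {"city": "Millbrae", "unit": "celsius"}))
print(validate_args(GET_WEATHER_SCHEMA, {"unit": "kelvin", "days": 3}))
```

Returning the violations as text lets the orchestrator send them back to the model as an observation, so it can correct the call instead of failing outright.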

DETL

Model Context Protocol (MCP)

Open standards like the Model Context Protocol (MCP) have become popular because they make discovering and connecting to tools simpler and more convenient.

DETL

Native Tools

A few models, like Gemini with native Google Search, invoke functions as part of the LM call itself.

SUBC

Orchestration Layer

The orchestration layer, acting as the central nervous system, is the engine that runs the 'Think, Act, Observe' loop and governs agent behavior.

INSG

Orchestration Layer Role

This layer is not just plumbing but the conductor of the agentic symphony, deciding when the model reasons, which tool acts, and how results inform the next movement.

CONC

Core Design Choices

Architectural decisions for agents involve determining autonomy, implementation methods, and ensuring a production-grade framework.

DCSN

Agent Autonomy Spectrum

The first architectural decision is determining the agent's degree of autonomy, which exists on a spectrum from deterministic workflows to dynamically adaptive LMs.

DCSN

Implementation Method

No-code builders offer speed for structured tasks and simple agents, while code-first frameworks like Google's Agent Development Kit (ADK) provide deep control for complex systems.

DCSN

Production-Grade Framework

A production-grade framework must be open, allowing plug-in of any model or tool to prevent vendor lock-in.

DETL

Precise Control

The framework must provide precise control, enabling a hybrid approach where LM reasoning is governed by hard-coded business rules.

DETL

Observability

The framework must be built for observability. It should generate detailed traces and logs that expose the agent's entire reasoning trajectory, so unexpected behavior can be diagnosed.

CONC

Instruct with Domain Knowledge and Persona

A developer's most powerful lever is instructing the agent with domain knowledge and a distinct persona, through a system prompt or core instructions.

INSG

Agent Constitution

This instruction isn't just a simple command; it serves as the agent's constitution, defining constraints, desired output, rules of engagement, and tone.

CONC

Augment with Context

The agent's 'memory' is orchestrated into the LM context window at runtime.

SUBC

Short-Term Memory

This is the agent's active 'scratchpad,' maintaining the running history of the current conversation and action-observation pairs for immediate context.

SUBC

Long-Term Memory

Long-term memory provides persistence across sessions and is typically implemented as a RAG system connected to a vector database or search engine.

INSG

Personalized Continuous Experience

The orchestrator enables the agent to pre-fetch and query its history, allowing it to 'remember' user preferences or past task outcomes for personalization.
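
The split between short-term and long-term memory can be sketched as a per-session scratchpad plus a persistent store. The orchestrator queries the store before each turn. Here, keyword overlap stands in for the vector search a real RAG-backed memory would use. All class and method names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

@dataclass
class LongTermMemory:
    """Persists facts across sessions (keyword stand-in for a vector database)."""
    facts: Dict[str, List[str]] = field(default_factory=dict)

    def save(self, user_id: str, fact: str) -> None:
        self.facts.setdefault(user_id, []).append(fact)

    def search(self, user_id: str, query: str, k: int = 2) -> List[str]:
        q = _tokens(query)
        ranked = sorted(self.facts.get(user_id, []), key=lambda f: len(q & _tokens(f)), reverse=True)
        return [f for f in ranked[:k] if q & _tokens(f)]

@dataclass
class Session:
    """Short-term scratchpad: the running history of the current conversation."""
    user_id: str
    history: List[str] = field(default_factory=list)

def build_prompt(session: Session, memory: LongTermMemory, user_msg: str) -> str:
    """Orchestrator step: pre-fetch relevant long-term memories and add recent history."""
    session.history.append(f"user: {user_msg}")
    recalled = memory.search(session.user_id, user_msg)
    return "\n".join(["# Remembered:"] + recalled + ["# Conversation:"] + session.history[-6:])

memory = LongTermMemory()
memory.save("u1", "Prefers window seats on flights")
memory.save("u1", "Vegetarian meals only")
print(build_prompt(Session("u1"), memory, "Book me flights to Tokyo"))
```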

CONC

Multi-Agent Systems and Design Patterns

As tasks grow in complexity, a 'team of specialists' approach, mirroring human organizations, is more effective than a single 'super-agent'.

JUST

Benefits of Division of Labor

This division of labor makes each specialized agent simpler, more focused, and easier to build, test, and maintain. That is especially valuable for dynamic or long-running processes.

DETL

Coordinator Pattern

For dynamic or non-linear tasks, a 'manager' agent segments complex requests and intelligently routes sub-tasks to specialist agents, then aggregates responses.

DETL

Sequential Pattern

For linear workflows, the Sequential pattern acts like a digital assembly line where one agent's output becomes the next agent's input.

DETL

Iterative Refinement Pattern

This pattern creates a feedback loop, using a 'generator' agent to create content and a 'critic' agent to evaluate it against quality standards.
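
The generator-critic loop can be sketched as follows. Both 'agents' are hypothetical deterministic stubs; in practice each would be an LM call with its own instructions. The quality standard is a single made-up rule: the draft must mention a price.

```python
from typing import List, Optional, Tuple

def generator(brief: str, feedback: Optional[str]) -> str:
    """Hypothetical generator agent: drafts copy, revising when given feedback."""
    draft = f"Introducing {brief}: premium noise-canceling sound."
    if feedback:
        draft += " Available now for $199."
    return draft

def critic(draft: str) -> Optional[str]:
    """Hypothetical critic agent: returns feedback, or None if the draft passes."""
    return None if "$" in draft else "Mention the launch price."

def refine(brief: str, max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Alternate generation and critique until the critic approves or rounds run out."""
    draft, feedback, log = "", None, []
    for _ in range(max_rounds):
        draft = generator(brief, feedback)
        feedback = critic(draft)
        log.append(feedback or "approved")
        if feedback is None:
            break
    return draft, log  # if never approved, a real system might escalate to a human

final, trail = refine("Solaris headphones")
print(final)
print(trail)
```

The round limit keeps a disagreeing generator and critic from looping forever.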

DETL

Human-in-the-Loop (HITL) Pattern

For high-stakes tasks, the Human-in-the-Loop (HITL) pattern is critical, creating a deliberate pause for human approval before significant agent action.

CONC

Agent Deployment and Services

Deploying a local agent to a server makes it a reliable, accessible service, requiring several supporting services for effectiveness.

DETL

Essential Services

An agent requires session history, memory persistence, and other services for effective operation in production.

DETL

Agent Builder Responsibilities

Agent builders are responsible for logging, security measures, data privacy, data residency, and regulation compliance when deploying to production.

DETL

Deployment Options

Agent builders can rely on application hosting infrastructure, including purpose-built platforms like Vertex AI Agent Engine or industry standard runtimes like Cloud Run or GKE.

CONC

Agent Ops: A Structured Approach to the Unpredictable

Building agents requires a new operational philosophy called 'Agent Ops' due to the stochastic nature of agentic systems and probabilistic responses.

DETL

Testing Generative AI

Traditional deterministic software unit tests cannot simply assert output == expected for generative AI, as agent responses are probabilistic by design.

INSG

LM for Quality Evaluation

Evaluating agent 'quality' usually requires an LM to assess if the response fulfills requirements, avoids extra content, and maintains proper tone.

DETL

Agent Ops Definition

Agent Ops is a disciplined, structured approach managing the unique challenges of building, deploying, and governing AI agents, evolving from DevOps and MLOps.

CONC

Measure What Matters: Instrumenting Success

Define 'better' in business context by framing observability like an A/B test and identifying Key Performance Indicators (KPIs) for real-world impact.

DETL

KPI Examples

KPIs should go beyond technical correctness to include goal completion rates, user satisfaction scores, task latency, operational cost per interaction, and business goals like revenue or retention.

JUST

Metrics-Driven Development

A top-down view of KPIs guides testing, enables metrics-driven development, and allows calculation of return on investment.

CONC

Quality Instead of Pass/Fail: LM Judge

Since simple pass/fail evaluation is impossible for agents, quality is assessed using an 'LM as Judge' against a predefined rubric.

DETL

LM Judge Rubric

The LM Judge evaluates if the agent's output is correct, factually grounded, and follows instructions.

DETL

Automated Evaluation

This automated evaluation, run against a golden dataset of prompts, provides a consistent measure of quality.
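
An automated evaluation run over a golden dataset might look like the sketch below. `judge` is a hypothetical stand-in for an LM-as-Judge call that would receive the rubric, prompt, and response. Here it applies toy string checks so the example runs on its own. The rubric criteria mirror the ones above; the dataset entries and the agent stub are invented.

```python
from statistics import mean
from typing import Callable, Dict, List

RUBRIC = ["correct", "grounded", "follows_instructions"]

# Invented golden dataset entries.
GOLDEN_SET: List[Dict[str, str]] = [
    {"prompt": "Capital of France? Answer in one word.", "expected": "Paris"},
    {"prompt": "2 + 2? Answer with a number only.", "expected": "4"},
]

def judge(prompt: str, response: str, expected: str) -> Dict[str, int]:
    """Hypothetical LM judge: scores each rubric criterion 0 or 1.

    A real judge would be an LM prompted with the rubric and `prompt`;
    these checks are toy proxies so the sketch runs offline.
    """
    correct = int(expected.lower() in response.lower())
    return {
        "correct": correct,
        "grounded": correct,  # toy proxy: treat a correct answer as grounded
        "follows_instructions": int(len(response.split()) == 1),
    }

def evaluate(agent: Callable[[str], str]) -> float:
    """Average rubric score across the golden dataset, in [0, 1]."""
    per_case = []
    for case in GOLDEN_SET:
        scores = judge(case["prompt"], agent(case["prompt"]), case["expected"])
        per_case.append(mean(scores[c] for c in RUBRIC))
    return mean(per_case)

def verbose_agent(prompt: str) -> str:
    return "Paris" if "France" in prompt else "The answer is 4"

print(evaluate(verbose_agent))
```

The verbose agent is correct on both cases but loses credit for ignoring the "number only" instruction. That kind of quality difference a plain `output == expected` test would report only as a failure.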

DETL

Evaluation Dataset Creation

Creating evaluation datasets involves sampling scenarios from existing production or development interactions, covering the full breadth of use cases and unexpected ones.

DCSN

Evaluation Review

Evaluation results should always be reviewed by a domain expert before acceptance as valid.

INSG

Product Manager Responsibility

Curation and maintenance of these evaluations are increasingly a key responsibility for Product Managers with support from domain experts.

CONC

Metrics-Driven Development: Go/No-Go

Automating dozens of evaluation scenarios and establishing trusted quality scores lets you test changes to a development agent with confidence.

DETL

Deployment Process

Run new versions against the evaluation dataset, compare scores to existing production versions, and use A/B deployments for maximum safety.

DETL

Important Factors

Beyond automated evaluations, important factors include latency, cost, and task success rates, which should be compared in A/B deployments.
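
A go/no-go gate can be expressed as a comparison of the candidate's metrics against production before any A/B rollout. The metric names and tolerance values below are illustrative assumptions, not recommended thresholds.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class EvalReport:
    quality: float        # mean LM-judge score on the golden dataset, 0..1
    p95_latency_s: float  # 95th percentile task latency in seconds
    cost_per_task: float  # average cost per interaction in dollars
    success_rate: float   # fraction of tasks completed, 0..1

def go_no_go(candidate: EvalReport, production: EvalReport,
             max_latency_increase: float = 0.10,
             max_cost_increase: float = 0.10) -> Tuple[bool, List[str]]:
    """Return (ship?, reasons). Quality and success must not regress;
    latency and cost may grow only within the given relative tolerance."""
    reasons = []
    if candidate.quality < production.quality:
        reasons.append("quality regressed")
    if candidate.success_rate < production.success_rate:
        reasons.append("task success regressed")
    if candidate.p95_latency_s > production.p95_latency_s * (1 + max_latency_increase):
        reasons.append("latency above tolerance")
    if candidate.cost_per_task > production.cost_per_task * (1 + max_cost_increase):
        reasons.append("cost above tolerance")
    return (not reasons, reasons)

prod = EvalReport(quality=0.82, p95_latency_s=4.0, cost_per_task=0.020, success_rate=0.91)
cand = EvalReport(quality=0.86, p95_latency_s=4.2, cost_per_task=0.025, success_rate=0.92)
print(go_no_go(cand, prod))
```

Here the candidate is better on quality but 25% more expensive per task, so the gate blocks it and reports why.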

CONC

Debug with OpenTelemetry Traces

OpenTelemetry traces provide a high-fidelity, step-by-step recording of the agent's entire execution path, essential for understanding 'why' metrics dip or bugs occur.

DETL

Trace Details

Traces expose the exact prompt, model's internal reasoning, chosen tool, parameters, and raw observation data.

JUST

Debugging Utility

Trace details are used to diagnose and fix the root causes of issues. They provide deep insight for debugging rather than high-level performance overviews.

DETL

Trace Data Collection

Trace data can be collected seamlessly in platforms like Google Cloud Trace, streamlining root cause analysis by visualizing and searching traces.

CONC

Cherish Human Feedback

Human feedback is the most valuable and data-rich resource for improving agents, serving as 'gifts' for new real-world edge cases.

DETL

Feedback Collection

Collecting and aggregating bug reports or 'thumbs down' feedback is critical to generate insights and trigger alerts for operational issues.

JUST

Closing the Loop

An effective Agent Ops process 'closes the loop' by capturing feedback, replicating the issue, and converting it into a new, permanent test case.
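
Closing the loop can be as simple as turning each replicated feedback report into a permanent regression case in the evaluation dataset. The record fields and the deduplicate-by-prompt rule below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Feedback:
    prompt: str             # what the user asked
    bad_response: str       # what the agent said
    expected_behavior: str  # what a domain expert says it should have done

@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str
    source: str = "user_feedback"

def close_the_loop(feedback: Feedback, dataset: List[EvalCase]) -> List[EvalCase]:
    """Add the reported failure to the golden dataset unless a case for that prompt exists."""
    if any(case.prompt == feedback.prompt for case in dataset):
        return dataset
    return dataset + [EvalCase(feedback.prompt, feedback.expected_behavior)]

golden: List[EvalCase] = []
report = Feedback("Cancel my order #123", "Order cancelled.", "Ask for confirmation before cancelling.")
golden = close_the_loop(report, golden)
golden = close_the_loop(report, golden)  # a duplicate report does not add a second case
print(len(golden), golden[0].expected)
```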

INSG

System Vaccination

Closing the loop ensures the bug is fixed and the system is 'vaccinated' against that entire class of error recurring.

CONC

Agent Interoperability

Interconnecting high-quality agents with users and with other agents is crucial for bringing them into a wider ecosystem. This interoperability layer is, in effect, the agent's 'face'.

DETL

Agents Are Not Tools

There is a distinction between connecting to agents and connecting agents with data and APIs; agents are not tools themselves.

CONC

Agents and Humans

The most common form of agent-human interaction is through a user interface, ranging from chatbots to rich, dynamic front-end experiences.

DETL

HITL Interaction Patterns

Human-in-the-Loop (HITL) patterns include intent refinement, goal expansion, confirmation, and clarification requests.

DETL

LM Control of UI

Computer use is a category of tool in which the LM takes control of a user interface to navigate pages or pre-fill forms, often with human interaction and oversight.

DETL

Dynamic UI Adaptation

The LM can change the UI to meet needs via Tools controlling UI (MCP UI), specialized messaging (AG UI), or generating bespoke interfaces (A2UI).

DETL

Multimodal Communication

Advanced agents are breaking the text barrier with real-time, multimodal communication in 'live mode' for a natural connection.

DETL

Gemini Live API

Technologies like the Gemini Live API enable bidirectional streaming, allowing users to speak to and interrupt agents as in natural conversation.

INSG

Enhanced Agent-Human Collaboration

With camera and microphone access, agents can see and hear users, responding with generated speech at human-like latency, fundamentally changing collaboration.

CONC

Agents and Agents

As enterprises scale AI, agents must connect with each other, requiring a common standard for discovery and communication.

DETL

Challenges of Agent Interconnection

Without a common standard, connecting different specialized agents from various teams would create a tangled web of brittle, custom API integrations.

SUBC

Agent2Agent (A2A) Protocol

The Agent2Agent (A2A) protocol is an open standard designed to solve agent communication, acting as a universal handshake for the agentic economy.

DETL

Agent Card

A2A allows any agent to publish a digital 'business card' (an Agent Card). This is a simple JSON file advertising the agent's capabilities, its network endpoint, and the security credentials required to interact with it.
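
An Agent Card might look roughly like the JSON produced below. The field names are modeled on those used in the A2A specification (`name`, `description`, `url`, `version`, `capabilities`, `securitySchemes`, `skills`). Treat the exact schema, and the placeholder endpoint, as assumptions to check against the current spec. The `find_agents` helper is a hypothetical client-side discovery step.

```python
import json
from typing import Any, Dict, List

# Field names modeled on the A2A Agent Card; verify against the current specification.
market_research_card: Dict[str, Any] = {
    "name": "MarketResearchAgent",
    "description": "Analyzes competitor pricing and market trends.",
    "url": "https://agents.example.com/market-research",  # placeholder endpoint
    "version": "1.0.0",
    "capabilities": {"streaming": True},
    "defaultInputModes": ["text/plain"],
    "defaultOutputModes": ["text/plain"],
    # Advertises HOW a client must authenticate, never the secret itself.
    "securitySchemes": {"bearer": {"type": "http", "scheme": "bearer"}},
    "security": [{"bearer": []}],
    "skills": [
        {
            "id": "competitor-pricing",
            "name": "Competitor pricing analysis",
            "description": "Summarizes competitor prices for a product category.",
            "tags": ["pricing", "market-research"],
        }
    ],
}

# The card is published as a plain JSON document.
card_json = json.dumps(market_research_card, indent=2)

def find_agents(cards: List[Dict[str, Any]], tag: str) -> List[str]:
    """Hypothetical discovery step: endpoints of agents that advertise a skill tag."""
    return [
        card["url"]
        for card in cards
        if any(tag in skill.get("tags", []) for skill in card.get("skills", []))
    ]

print(find_agents([json.loads(card_json)], "pricing"))
```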

JUST

Standardized Discovery

Agent Cards make discovery simple and standardized for agent communication.

DETL

Agent Communication Distinction

Unlike MCP, which focuses on transactional requests, Agent2Agent communication is typically used for additional problem solving.

SUBC

Task-Oriented Architecture

Once discovered, agents communicate using a task-oriented architecture, framing interactions as asynchronous 'tasks' instead of simple request-response.

DETL

Streaming Updates

A client agent sends a task request to a server agent, which can provide streaming updates over a long-running connection.
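
The task-oriented exchange can be sketched with an `asyncio` async generator standing in for the server agent's stream. The state names (`submitted`, `working`, `completed`, plus the terminal `failed` and `canceled`) follow the task lifecycle vocabulary used by A2A. The transport, however, is simulated in-process: no network or A2A SDK is involved. The function names and the price in the final message are hypothetical.

```python
import asyncio
from typing import AsyncIterator, Dict, List

async def server_agent(task: str) -> AsyncIterator[Dict[str, str]]:
    """Hypothetical server agent: streams status updates for a long-running task."""
    yield {"state": "submitted", "message": f"Received: {task}"}
    for step in ("collecting prices", "summarizing"):
        await asyncio.sleep(0)  # stand-in for real work
        yield {"state": "working", "message": step}
    yield {"state": "completed", "message": "Summary: average competitor price is $249"}

async def client_agent(task: str) -> List[Dict[str, str]]:
    """Client agent: consumes updates as they arrive instead of blocking on one response."""
    updates = []
    async for update in server_agent(task):
        updates.append(update)
        if update["state"] in ("completed", "failed", "canceled"):
            break  # terminal state reached
    return updates

for u in asyncio.run(client_agent("Analyze competitor pricing")):
    print(u["state"], "-", u["message"])
```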

INSG

Interoperable Ecosystem

This robust, standardized communication protocol enables collaborative, Level 3 multi-agent systems and transforms isolated agents into an interoperable ecosystem.

CONC

Agents and Money

As AI agents perform more tasks, some involve buying, selling, or facilitating transactions, creating a trust crisis if something goes wrong.

INSG

Trust Crisis in Agentic Economy

If an autonomous agent clicks 'buy,' it creates a crisis of trust regarding fault, authorization, authenticity, and accountability.

JUST

Unlocking Agentic Economy

To unlock a true agentic economy, new standards are needed that allow agents to transact securely and reliably on behalf of their users.

SUBC

Agent Payments Protocol (AP2)

AP2 is an open protocol designed as the definitive language for agentic commerce, extending A2A by introducing cryptographically-signed digital 'mandates'.

DETL

Verifiable User Intent

Digital mandates act as verifiable proof of user intent, creating a non-repudiable audit trail for every transaction.
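
The core idea of a signed mandate is a tamper-evident record of exactly what the user authorized. It can be illustrated with an HMAC over a canonical JSON payload. This is a simplified stand-in, not AP2. AP2 defines its own mandate formats and relies on public-key signatures. An HMAC uses a shared secret, so it cannot provide true non-repudiation. The field names and key handling here are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict

USER_KEY = b"demo-secret-held-by-user-device"  # hypothetical; never hard-code real keys

def sign_mandate(mandate: Dict[str, Any], key: bytes) -> str:
    """Sign a canonical (sorted-key) JSON encoding so any field change breaks the signature."""
    payload = json.dumps(mandate, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_mandate(mandate: Dict[str, Any], signature: str, key: bytes) -> bool:
    """Constant-time comparison of the expected and presented signatures."""
    return hmac.compare_digest(sign_mandate(mandate, key), signature)

mandate = {"user": "alice", "item": "Solaris headphones", "max_price_usd": 200, "merchant": "shop.example"}
sig = sign_mandate(mandate, USER_KEY)
print(verify_mandate(mandate, sig, USER_KEY))

tampered = dict(mandate, max_price_usd=2000)  # an agent or attacker inflates the limit
print(verify_mandate(tampered, sig, USER_KEY))
```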

JUST

Global Transaction Capability

This allows agents to securely browse, negotiate, and transact on a global scale based on delegated authority from the user.

SUBC

x402 Protocol

x402 is an open internet payment protocol using the standard HTTP 402 'Payment Required' status code.

JUST

Frictionless Micropayments

It enables frictionless machine-to-machine micropayments, allowing agents to pay for API access or digital content on a pay-per-use basis without complex accounts.
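
The pay-per-use flow can be simulated without a network. The server answers with HTTP 402 and its payment terms; the client pays and then retries with proof of payment. The 402 status code is standard HTTP. The header name `X-PAYMENT`, the JSON fields, and the `pay` function are assumptions for illustration. Consult the x402 specification for the real message formats.

```python
from http import HTTPStatus
from typing import Dict, Tuple

PRICE_USD = 0.01
_paid_receipts = set()

def pay(amount_usd: float, pay_to: str) -> str:
    """Hypothetical wallet call: settles the payment and returns a receipt id."""
    receipt = f"rcpt-{pay_to}-{amount_usd}"
    _paid_receipts.add(receipt)
    return receipt

def server(path: str, headers: Dict[str, str]) -> Tuple[int, Dict[str, object]]:
    """Simulated paid API endpoint."""
    receipt = headers.get("X-PAYMENT")  # assumed header name
    if receipt not in _paid_receipts:
        return HTTPStatus.PAYMENT_REQUIRED, {"amount_usd": PRICE_USD, "pay_to": "api.example"}
    return HTTPStatus.OK, {"data": f"content for {path}"}

def client_get(path: str) -> Tuple[int, Dict[str, object]]:
    """Agent-side client: on 402, pay the quoted amount and retry once."""
    status, body = server(path, {})
    if status == HTTPStatus.PAYMENT_REQUIRED:
        receipt = pay(body["amount_usd"], body["pay_to"])
        status, body = server(path, {"X-PAYMENT": receipt})
    return status, body

print(client_get("/weather/millbrae"))
```

A real implementation would verify each payment cryptographically and reject reused receipts. This simulation accepts any receipt it has seen before.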

INSG

Foundational Trust Layer

Together, AP2 and x402 protocols are building the foundational trust layer for the agentic web.

CONC

Securing a Single Agent: Trust Trade-Off

When creating an AI agent, there's a fundamental tension between utility and security, as granting power introduces risk.

DETL

Agent Power and Risk

To be useful, agents need autonomy to make decisions and tools to perform actions like sending emails or querying databases.

DETL

Primary Security Concerns

Primary security concerns are rogue actions—unintended or harmful behaviors—and sensitive data disclosure.

DCSN

Defense-in-Depth Approach

Managing agent security requires a hybrid, defense-in-depth approach, rather than relying solely on the AI model's judgment due to manipulation risks.

SUBC

Deterministic Guardrails

The first security layer consists of traditional, deterministic guardrails—hardcoded rules acting as a security chokepoint outside the model's reasoning.

EXMP

Guardrail Example

For example, a policy engine might block any purchase over $100, or require explicit user confirmation before the agent interacts with an external API.
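
A policy engine like this can be sketched as plain code that sits between the agent's proposed action and its execution, so the model cannot talk its way past it. The $100 limit comes from the example. The action format, field names, and the fail-closed rule for a missing amount are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_CONFIRMATION = "needs_confirmation"

@dataclass
class ProposedAction:
    kind: str                          # hypothetical kinds: "purchase", "external_api", ...
    amount_usd: Optional[float] = None

PURCHASE_LIMIT_USD = 100.0

def check_policy(action: ProposedAction, user_confirmed: bool = False) -> Decision:
    """Deterministic checks applied outside the model's reasoning."""
    if action.kind == "purchase":
        # Fail closed: an unknown amount is treated as over the limit.
        if action.amount_usd is None or action.amount_usd > PURCHASE_LIMIT_USD:
            return Decision.DENY
    if action.kind == "external_api" and not user_confirmed:
        return Decision.NEEDS_CONFIRMATION
    return Decision.ALLOW

print(check_policy(ProposedAction("purchase", 250.0)))
print(check_policy(ProposedAction("external_api")))
print(check_policy(ProposedAction("external_api"), user_confirmed=True))
```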

JUST

Guardrail Benefit

This layer provides predictable, auditable hard limits on the agent's power.

SUBC

Reasoning-Based Defenses

The second layer uses AI to secure AI, training models to be resilient to attacks and employing specialized 'guard models' as security analysts.

DETL

Guard Model Function

Guard models examine the agent's proposed plan before execution, flagging potentially risky or policy-violating steps for review.

JUST

Robust Security Posture

This hybrid model, combining rigid code certainty with contextual AI awareness, creates a robust security posture ensuring agent power aligns with its purpose.

SUBC

Agent Identity: New Principal Class

Agent identity represents a new class of principal beyond human users and services, requiring its own verifiable identity.

INSG

IAM Paradigm Shift

This is a fundamental shift in how Identity and Access Management (IAM) must be approached in the enterprise.

JUST

Bedrock of Agent Security

Verifying each identity and enforcing access controls on it is the bedrock of agent security.

DETL

Verifiable Digital Passport

An agent needs a cryptographically verifiable identity, often using standards like SPIFFE, analogous to an employee ID badge.

DETL

Least-Privilege Permissions

Once identified, an agent can be granted specific, least-privilege permissions. For example, a SalesAgent is granted access to the CRM, while an HRonboardingAgent is denied it.

JUST

Containment of Compromise

Granular control ensures that even if a single agent is compromised, the potential blast radius is contained.

DETL

Delegated Authority

Without an agent identity construct, agents cannot work on behalf of humans with limited delegated authority.

CMPR

Authentication Categories

Different principal entities—users, agents, and service accounts—have distinct authentication and verification methods.

Principal entity	Authentication / Verification	Notes
Users	Authenticated with OAuth or SSO	Human actors with full autonomy and responsibility for their actions
Agents (new category of principals)	Verified with SPIFFE	Agents have delegated authority, taking actions on behalf of users
Service accounts	Integrated into IAM	Applications and containers; fully deterministic, not responsible for actions

SUBC

Policies to Constrain Access

Policies are a form of authorization (AuthZ), distinct from authentication (AuthN), used to limit a principal's capabilities.

EXMP

Policy Example

An example policy is 'Users in Marketing can only access these 27 API endpoints and cannot execute DELETE commands'.

JUST

Principle of Least Privilege

The recommended approach for agents is to constrain access to only the capabilities required for their jobs, applying the principle of least privilege.
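
Per-principal policies can be sketched as a default-deny allowlist keyed by identity. The agents echo the earlier SalesAgent and HRonboardingAgent example. The SPIFFE-style IDs, endpoint paths, and policy structure are hypothetical.

```python
from typing import Dict, FrozenSet, NamedTuple

class Policy(NamedTuple):
    endpoints: FrozenSet[str]        # API endpoints this principal may call
    allowed_methods: FrozenSet[str]  # HTTP methods this principal may use

# Hypothetical SPIFFE-style identities mapped to least-privilege policies.
POLICIES: Dict[str, Policy] = {
    "spiffe://example.org/agents/sales": Policy(
        endpoints=frozenset({"/crm/contacts", "/crm/opportunities"}),
        allowed_methods=frozenset({"GET", "POST"}),
    ),
    "spiffe://example.org/agents/hr-onboarding": Policy(
        endpoints=frozenset({"/hr/new-hires"}),
        allowed_methods=frozenset({"GET"}),
    ),
}

def is_allowed(principal: str, method: str, endpoint: str) -> bool:
    """Default deny: unknown principals, endpoints, or methods are rejected."""
    policy = POLICIES.get(principal)
    if policy is None:
        return False
    return endpoint in policy.endpoints and method.upper() in policy.allowed_methods

print(is_allowed("spiffe://example.org/agents/sales", "GET", "/crm/contacts"))
print(is_allowed("spiffe://example.org/agents/hr-onboarding", "GET", "/crm/contacts"))
print(is_allowed("spiffe://example.org/agents/sales", "DELETE", "/crm/contacts"))
```

Because nothing is allowed unless listed, a compromised agent can reach only its own short list of endpoints. That is the contained blast radius described above.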

SUBC

Securing an ADK Agent

Securing an agent built with the Agent Development Kit (ADK) involves practical application of identity and policy concepts through code and configuration.

DETL

Identity Definition

The process requires clear definition of user accounts (OAuth), service accounts (to run code), and agent identities (to use delegated authority).

DETL

Policy Enforcement

After authentication, policies constrain access to services. This is often enforced at the API governance layer alongside MCP and A2A services.

DETL

Guardrails in Tools/Models

The next layer involves building guardrails into tools, models, and sub-agents to enforce policies.

JUST

Predictable Security Baseline

This ensures that tool logic refuses unsafe or out-of-policy actions, providing a predictable and auditable security baseline through concrete, reliable code.

DETL

Callbacks and Plugins

For dynamic security, ADK provides Callbacks and Plugins; a before_tool_callback inspects parameters of a tool call before it runs.
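
The shape of such a callback can be sketched in plain Python. It is modeled on our understanding of ADK's documented convention: a `before_tool_callback` receives the tool, its arguments, and a tool context. Returning a dictionary skips the tool and uses that dictionary as the result; returning `None` lets the call proceed. The stub classes, the `transfer_funds` tool, and the limit are hypothetical. Check the real ADK signature and types against its documentation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class StubTool:
    """Stand-in for an ADK tool object; only the name is used here."""
    name: str

@dataclass
class StubToolContext:
    """Stand-in for ADK's tool context; carries session state such as a spending limit."""
    state: Dict[str, Any] = field(default_factory=dict)

def before_tool_callback(tool: StubTool, args: Dict[str, Any],
                         tool_context: StubToolContext) -> Optional[Dict[str, Any]]:
    """Inspect arguments before execution: return a dict to block, None to allow."""
    if tool.name == "transfer_funds":  # hypothetical tool
        limit = tool_context.state.get("transfer_limit_usd", 0)
        amount = args.get("amount_usd")
        if amount is None or amount > limit:  # fail closed on a missing amount
            return {"status": "blocked", "reason": f"amount exceeds limit of ${limit}"}
    return None

def run_tool(tool: StubTool, args: Dict[str, Any], ctx: StubToolContext) -> Dict[str, Any]:
    """Minimal runner showing where the callback sits relative to the real tool call."""
    override = before_tool_callback(tool, args, ctx)
    if override is not None:
        return override
    return {"status": "ok", "tool": tool.name, "args": args}

ctx = StubToolContext(state={"transfer_limit_usd": 500})
print(run_tool(StubTool("transfer_funds"), {"amount_usd": 900}, ctx))
print(run_tool(StubTool("transfer_funds"), {"amount_usd": 50}, ctx))
```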

DETL

Gemini as a Judge

A common plugin pattern is 'Gemini as a Judge', using a fast, inexpensive model like Gemini Flash-Lite to screen inputs and outputs for prompt injections or harmful content.

SUBC

Model Armor Service

Model Armor is an optional managed service for dynamic checks, screening prompts and responses for prompt injection, jailbreak attempts, PII leakage, and malicious URLs.

JUST

Model Armor Benefits

Offloading complex security tasks to Model Armor ensures consistent, robust protection without developers needing to build and maintain guardrails.

INSG

Hybrid Security Approach

Combining strong identity, deterministic in-tool logic, dynamic AI-powered guardrails, and managed services like Model Armor builds a powerful and trustworthy single agent.

CONC

Scaling Up to Enterprise Fleet

Scaling AI agents from a single triumph to a fleet of hundreds across an enterprise presents architectural challenges beyond primary security concerns.

INSG

Agent Sprawl

When agents and tools proliferate across an organization, they create complexity akin to 'API sprawl'. Systems must then handle much more than individual agent security.

CONC

Security and Privacy: Agentic Frontier

An enterprise-grade platform must address unique security and privacy challenges inherent to generative AI, even with a single agent.

DETL

New Attack Vectors

The agent itself becomes a new attack vector vulnerable to prompt injection, data poisoning, and inadvertent leakage of sensitive data.

DETL

Defense-in-Depth Strategy

A robust platform provides a defense-in-depth strategy. It starts with a guarantee that proprietary data is never used to train base models, backed by controls such as VPC Service Controls.

DETL

Input and Output Filtering

The strategy requires input and output filtering, acting like a firewall for prompts and responses.

DETL

Contractual Protections

The platform must offer contractual protections, like intellectual property indemnity for training data and generated output, giving enterprises legal and technical confidence.

CONC

Agent Governance: Control Plane

Managing agent sprawl requires a higher-order architectural approach: a central gateway serving as a control plane for all agentic activity.

INSG

Gateway as Control System

The gateway approach creates a control system, establishing a mandatory entry point for all agentic traffic, including user-to-agent prompts, agent-to-tool calls, and direct LM inference requests.

DETL

Two Interconnected Functions

This control plane serves two primary, interconnected functions: Runtime Policy Enforcement and Centralized Governance.

SUBC

Runtime Policy Enforcement

The gateway acts as the architectural chokepoint for security, handling authentication ('Do I know who this actor is?') and authorization ('Do they have permission to do this?').

JUST

Observability Benefit

Centralizing enforcement provides a 'single pane of glass' for observability, creating common logs, metrics, and traces for every transaction.

INSG

Transparent and Auditable System

This transforms disparate agents and workflows into a transparent and auditable system.

SUBC

Centralized Governance

A central registry, an enterprise app store for agents and tools, provides a source of truth to enforce policies effectively.

DETL

Registry Benefits

The registry allows developers to discover and reuse assets, preventing redundant work, and gives administrators a complete inventory.

DETL

Formal Lifecycle

It enables a formal lifecycle for agents and tools, allowing security reviews before publication, versioning, and creation of fine-grained policies.
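
A central registry with a formal lifecycle can be sketched as versioned entries that become discoverable only after a security review. The lifecycle stages and class names are hypothetical. They simply encode the publish-after-review rule described above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

class Stage(Enum):
    DRAFT = "draft"
    REVIEWED = "reviewed"
    PUBLISHED = "published"

@dataclass
class Entry:
    name: str
    version: str
    kind: str                  # "agent" or "tool"
    stage: Stage = Stage.DRAFT

@dataclass
class Registry:
    entries: Dict[Tuple[str, str], Entry] = field(default_factory=dict)

    def register(self, name: str, version: str, kind: str) -> Entry:
        entry = Entry(name, version, kind)
        self.entries[(name, version)] = entry  # each version is a separate entry
        return entry

    def approve_security_review(self, name: str, version: str) -> None:
        self.entries[(name, version)].stage = Stage.REVIEWED

    def publish(self, name: str, version: str) -> None:
        entry = self.entries[(name, version)]
        if entry.stage is not Stage.REVIEWED:
            raise PermissionError(f"{name}@{version} has not passed security review")
        entry.stage = Stage.PUBLISHED

    def discover(self, kind: str) -> List[str]:
        """Developers see only published assets, so reuse favors vetted ones."""
        return sorted(f"{e.name}@{e.version}" for e in self.entries.values()
                      if e.kind == kind and e.stage is Stage.PUBLISHED)

reg = Registry()
reg.register("google_places", "1.2.0", "tool")
reg.register("SentimentAnalysisAgent", "0.1.0", "agent")
reg.approve_security_review("google_places", "1.2.0")
reg.publish("google_places", "1.2.0")
print(reg.discover("tool"))
print(reg.discover("agent"))
```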

INSG

Managed, Secure, Efficient Ecosystem

Combining a runtime gateway with a central governance registry transforms chaotic sprawl into a managed, secure, and efficient ecosystem.

CONC

Cost and Reliability: Infrastructure Foundation

Enterprise-grade agents must be both reliable and cost-effective, requiring underlying infrastructure to manage these trade-offs securely and compliantly.

INSG

Negative ROI Factors

An agent that frequently fails or provides slow results has a negative Return on Investment.

INSG

Scaling Challenges

A prohibitively expensive agent cannot scale to meet business demands, hindering its utility.

DETL

Infrastructure Options

Infrastructure needs range from scale-to-zero for irregular traffic to dedicated capacity like Provisioned Throughput for LM services or 99.9% SLAs for runtimes like Cloud Run.

JUST

Predictable Performance

These infrastructure options, coupled with comprehensive monitoring, ensure predictable performance and keep mission-critical agents responsive even under heavy load.

INSG

Foundation for AI Scaling

This establishes the final, essential foundation for scaling AI agents from innovation into a core, reliable enterprise component.

CONC

How Agents Evolve and Learn

Agents deployed in dynamic environments must adapt to changing policies, technologies, and data formats to avoid performance degradation.

DETL

Agent Aging

Without adaptability, an agent's performance degrades over time, a process called 'aging', leading to loss of utility and trust.

JUST

Scalable Learning Solution

Manually updating a large fleet of agents is uneconomical; a more scalable solution is designing agents that learn and evolve autonomously.

CONC

How Agents Learn and Self Evolve

Agents learn from experience and external signals, much like humans, using this information to optimize future behavior.

SUBC

Runtime Experience

Agents learn from session logs, traces, memory, tool interactions, and decision trajectories, including Human-in-the-Loop (HITL) feedback for guidance.

SUBC

External Signals

Learning is driven by new external documents, such as updated enterprise policies, public regulatory guidelines, or critiques from other agents.

SUBC

Enhanced Context Engineering

Advanced systems continuously refine prompts, few-shot examples, and retrieved memory information to optimize context for each task.

SUBC

Tool Optimization and Creation

Agent reasoning can identify capability gaps and close them in three ways: gaining access to new tools, creating tools on the fly (e.g., Python scripts), or modifying existing ones.

DETL

Additional Optimization Techniques

Dynamically reconfiguring multi-agent design patterns or using Reinforcement Learning from Human Feedback (RLHF) are active research areas.

EXMP

Learning New Compliance Guidelines

An enterprise agent operating in a heavily regulated industry can learn new compliance guidelines using a multi-agent workflow.

DETL

Querying Agent Role

A Querying Agent retrieves raw data in response to a user request.

DETL

Reporting Agent Role

A Reporting Agent synthesizes retrieved data into a draft report.

DETL

Critiquing Agent Role

A Critiquing Agent, armed with the compliance guidelines, reviews the report and escalates to a human expert when ambiguity arises or final sign-off is needed.

DETL

Learning Agent Role

A Learning Agent observes the interaction, paying particular attention to feedback from the human expert, and generalizes that feedback into new, reusable guidelines.

INSG

Autonomous Adaptation Loop

For example, if the human expert flags data that should have been anonymized, the Learning Agent records this as a new rule. The Critiquing Agent then applies the rule in future reviews, reducing the need for human intervention.
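The learning loop can be sketched as follows. Every agent here is a plain Python stub standing in for an LM-backed agent. `human_expert`, the `anonymize:<field>` rule format, and the sample record are all hypothetical. The point is the feedback path: a rule learned from one escalation is applied automatically by the Critiquing Agent on the next run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, object]


@dataclass
class Guidelines:
    """Compliance rules the Critiquing Agent enforces, e.g. 'anonymize:contact'."""
    rules: Set[str] = field(default_factory=set)


def querying_agent(request: str) -> Record:
    # Stub retrieval: a real agent would query databases or APIs for `request`.
    return {"customer": "Acme Corp", "contact": "ops@acme.example", "revenue": 1200}


def reporting_agent(data: Record) -> Record:
    # Stub synthesis: the draft report is just a copy of the raw fields.
    return dict(data)


def critiquing_agent(report: Record, guidelines: Guidelines) -> Record:
    # Apply every learned rule before anything is escalated to a human.
    reviewed = dict(report)
    for name in reviewed:
        if f"anonymize:{name}" in guidelines.rules:
            reviewed[name] = "[REDACTED]"
    return reviewed


def learning_agent(flagged: List[str], guidelines: Guidelines) -> None:
    # Generalize each expert correction into a reusable rule.
    for name in flagged:
        guidelines.rules.add(f"anonymize:{name}")


def human_expert(report: Record) -> List[str]:
    # Hypothetical expert: flags any field that still looks like an email address.
    return [k for k, v in report.items() if isinstance(v, str) and "@" in v]


def run_pipeline(request: str, guidelines: Guidelines,
                 expert: Callable[[Record], List[str]] = human_expert) -> Tuple[Record, List[str]]:
    report = critiquing_agent(reporting_agent(querying_agent(request)), guidelines)
    flagged = expert(report)             # escalation for sign-off
    learning_agent(flagged, guidelines)  # Learning Agent records the correction
    for name in flagged:                 # fix the current report as well
        report[name] = "[REDACTED]"
    return report, flagged
```

If the same `Guidelines` object is reused across two runs, the expert flags the `contact` field on the first run. On the second run nothing is left to flag, because the Critiquing Agent already redacted it.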

CONC

Simulation and Agent Gym

More advanced approaches involve a dedicated platform, an Agent Gym, engineered to optimize multi-agent systems in offline processes with advanced tooling.

DETL

Agent Gym Attribute: Standalone

It sits outside the execution path, functioning as a standalone, off-production platform that can draw on any LM and on offline tools.

DETL

Agent Gym Attribute: Simulation

It offers a simulation environment where agents can 'exercise' on new data and learn, which makes it well suited to trial-and-error across many optimization pathways.

DETL

Agent Gym Attribute: Synthetic Data

It can call advanced synthetic data generators to make simulations as realistic as possible and to pressure-test agents, using techniques such as red-teaming and critiquing agents.

DETL

Agent Gym Attribute: Adaptable Tools

The arsenal of optimization tools is not fixed: the gym can adopt new tools through open protocols, or learn new concepts and craft its own tools.

DETL

Agent Gym Attribute: Human Connection

An Agent Gym can also connect to human domain experts, consulting them on outcomes to guide optimizations for edge cases that depend on 'tribal knowledge'.
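The following is a toy version of the gym's offline trial-and-error, under heavy simplifying assumptions. The "agent" is a keyword-based ticket classifier, and each candidate configuration is a set of keywords and negation phrases. `synthetic_tickets` plays the role of a synthetic data generator, and `RED_TEAM` holds a few adversarial cases. All names and data are hypothetical. Nothing here touches production: every candidate is scored offline, and only the winner would be promoted.

```python
import random
from typing import Dict, List, Tuple

Case = Tuple[str, str]  # (ticket text, expected label)


def synthetic_tickets(n: int, seed: int = 0) -> List[Case]:
    """Hypothetical synthetic data generator producing labeled support tickets."""
    rng = random.Random(seed)
    templates = {
        "refund": ["I want my money back", "Please refund order {}", "Charge me back for order {}"],
        "other": ["Where is order {}?", "How do I reset my password?", "Update address for order {}"],
    }
    cases = []
    for _ in range(n):
        label = rng.choice(["refund", "other"])
        # str.format ignores the order number for templates without a placeholder.
        text = rng.choice(templates[label]).format(rng.randint(100, 999))
        cases.append((text, label))
    return cases


# Hand-written adversarial cases, standing in for a red-teaming agent.
RED_TEAM: List[Case] = [
    ("Not asking for a refund, just tracking order 5", "other"),
    ("money back please", "refund"),
]

# Candidate configurations of the toy agent under optimization.
CANDIDATES: Dict[str, Dict[str, Tuple[str, ...]]] = {
    "v1": {"keywords": ("refund",), "negations": ()},
    "v2": {"keywords": ("refund", "money back", "charge me back"), "negations": ()},
    "v3": {"keywords": ("refund", "money back", "charge me back"),
           "negations": ("not asking for a refund",)},
}


def classify(text: str, config) -> str:
    lowered = text.lower()
    if any(neg in lowered for neg in config["negations"]):
        return "other"
    return "refund" if any(kw in lowered for kw in config["keywords"]) else "other"


def accuracy(config, cases: List[Case]) -> float:
    return sum(classify(t, config) == label for t, label in cases) / len(cases)


def run_gym(n_synthetic: int = 200, seed: int = 0):
    """Score every candidate offline and return (best name, all scores)."""
    cases = synthetic_tickets(n_synthetic, seed) + RED_TEAM
    scores = {name: accuracy(cfg, cases) for name, cfg in CANDIDATES.items()}
    return max(scores, key=scores.get), scores
```

The red-team cases separate the candidates. `v2` handles the paraphrases but misreads the negated request, so only `v3` scores perfectly on the combined set.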

CONC

Examples of Advanced Agents

Examples demonstrate advanced agent capabilities in scientific research and algorithm optimization.

SUBC

Google Co-Scientist

Co-Scientist is an advanced AI agent designed as a virtual research collaborator to accelerate scientific discovery by systematically exploring complex problem spaces.

DETL

Co-Scientist Goal

It enables researchers to define a goal, ground the agent in knowledge sources, and generate and evaluate novel hypotheses.

DETL

Ecosystem of Agents

To achieve its goals, Co-Scientist spawns an ecosystem of collaborating agents.

DETL

Supervisor Agent Role

A Supervisor Agent acts as a research project manager: it creates a detailed plan, delegates tasks to specialized agents, and distributes resources among them.

JUST

Scalability and Improvement

This structure ensures the project can scale easily and that the team's methods improve as they work toward the final goal.

INSG

Continuous Improvement

The agents work for hours or even days, continuously improving the generated hypotheses and running loops that refine both the ideas and the methods used to judge them.
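The supervisor pattern can be sketched as below, assuming a fixed plan and stub specialists. `Supervisor`, the three agent functions, and the largest-remainder `allocate` helper are hypothetical and are not Co-Scientist's actual design. A real supervisor would ask an LM to draft the plan and its weights, and the specialists would be LM-backed agents.

```python
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str, int], str]  # (subtask, compute budget) -> finding


def literature_agent(subtask: str, budget: int) -> str:
    return f"[literature] {subtask} ({budget} units)"


def hypothesis_agent(subtask: str, budget: int) -> str:
    return f"[hypotheses] {subtask} ({budget} units)"


def critique_agent(subtask: str, budget: int) -> str:
    return f"[critique] {subtask} ({budget} units)"


def allocate(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split an integer budget in proportion to weights (largest-remainder method)."""
    weight_sum = sum(weights.values())
    exact = {k: total * w / weight_sum for k, w in weights.items()}
    shares = {k: int(v) for k, v in exact.items()}
    leftover = total - sum(shares.values())
    # Give the remaining units to the keys with the largest fractional parts.
    for k in sorted(exact, key=lambda name: exact[name] - shares[name], reverse=True)[:leftover]:
        shares[k] += 1
    return shares


class Supervisor:
    """Plans a goal, delegates subtasks to specialists, and distributes the budget."""

    def __init__(self, agents: Dict[str, Agent], total_budget: int) -> None:
        self.agents = agents
        self.total_budget = total_budget

    def plan(self, goal: str) -> List[Tuple[str, str, int]]:
        # Hypothetical fixed plan of (subtask, agent name, relative weight);
        # a real supervisor would have an LM draft this.
        return [
            (f"survey prior work on {goal}", "literature", 1),
            (f"propose hypotheses about {goal}", "hypothesis", 2),
            (f"critique hypotheses about {goal}", "critique", 1),
        ]

    def run(self, goal: str) -> List[str]:
        steps = self.plan(goal)
        budgets = allocate(self.total_budget, {sub: w for sub, _, w in steps})
        return [self.agents[name](sub, budgets[sub]) for sub, name, _ in steps]
```

With a total budget of 100, the hypothesis-generation step gets 50 units and the other two steps get 25 each. Rerunning `run` with reweighted plans is one simple way to model the iterative refinement described above.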

SUBC

AlphaEvolve Agent

AlphaEvolve is an advanced agentic system that discovers and optimizes algorithms for complex problems in mathematics and computer science.

DETL

AlphaEvolve Mechanism

It combines the creative code generation of Gemini language models with an automated evaluation system in an evolutionary process.

DETL

Evolutionary Process

The AI generates potential solutions, an evaluator scores them, and promising ideas inspire the next generation of code.
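The generate, score, and select loop can be illustrated with a toy problem. Here a "program" is just an integer coefficient triple (a, b, c) for a*x**2 + b*x + c, and the evaluator measures squared error against observed data. Random coefficient nudges in `mutate` stand in for an LM proposing code edits. This is a generic evolutionary sketch, not AlphaEvolve's actual implementation, and all names are hypothetical.

```python
import random
from typing import List, Tuple

Program = Tuple[int, int, int]  # coefficients (a, b, c) of a*x**2 + b*x + c

# Hypothetical task: reproduce these observed input/output pairs.
DATA = [(x, 2 * x * x - 3 * x + 1) for x in range(-3, 4)]


def evaluate(p: Program) -> int:
    """Automated evaluator: negative squared error, so higher is better."""
    a, b, c = p
    return -sum((a * x * x + b * x + c - y) ** 2 for x, y in DATA)


def mutate(p: Program, rng: random.Random) -> Program:
    """Stand-in for an LM proposing an edited program: nudge each coefficient."""
    step = (-1, 0, 1)
    a, b, c = p
    return (a + rng.choice(step), b + rng.choice(step), c + rng.choice(step))


def evolve(generations: int = 200, pop_size: int = 20, keep: int = 5, seed: int = 0):
    rng = random.Random(seed)
    population: List[Program] = [(0, 0, 0)] * pop_size
    history: List[int] = []
    for _ in range(generations):
        # Score every candidate; the most promising become parents.
        parents = sorted(population, key=evaluate, reverse=True)[:keep]
        history.append(evaluate(parents[0]))
        # Parents survive unchanged; children are mutations that seed the next generation.
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - keep)]
        population = parents + children
    best = max(population, key=evaluate)
    return best, history
```

The top-scoring parents always survive into the next generation, so the best score recorded in `history` never decreases. Whether a given run reaches the exact coefficients (2, -3, 1) depends on the seed and the generation budget.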

DETL

Breakthroughs

This approach has led to significant breakthroughs, including improving Google's data centers, chip design, and AI training.

DETL

Matrix Multiplication

AlphaEvolve has discovered faster matrix multiplication algorithms.

DETL

Mathematical Problems

AlphaEvolve has found new solutions to open mathematical problems.

JUST

AlphaEvolve Strength

AlphaEvolve excels at problems where verifying solution quality is easier than finding the solution itself.

DETL

Iterative Human-AI Partnership

AlphaEvolve is designed for a deep, iterative partnership between humans and AI, working in two main ways.

SUBC

Transparent Solutions

The AI generates solutions as human-readable code, allowing users to understand the logic, gain insights, trust the results, and modify the code directly.

SUBC

Expert Guidance

Human expertise is essential for defining the problem, refining evaluation metrics, and steering the exploration, which prevents the system from exploiting unintended loopholes.

INSG

Continuous Code Improvement

The agent continuously improves the code, enhancing metrics specified by humans.

CONC

Conclusion

Generative AI agents represent a pivotal evolution, shifting artificial intelligence from a passive tool to an active, autonomous partner in problem-solving.

INSG

Formal Framework Provided

This document provided a formal framework for understanding and building these systems, moving beyond prototypes to establish reliable, production-grade architecture.

DETL

Three Essential Components

An agent is deconstructed into its three essential components: the reasoning Model ('Brain'), actionable Tools ('Hands'), and governing Orchestration Layer ('Nervous System').

JUST

Agent's True Potential

Seamless integration of these parts, operating in a continuous 'Think, Act, Observe' loop, unlocks an agent's true potential.
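The following is a minimal sketch of the 'Think, Act, Observe' loop. It assumes a scripted function in place of the LM and a single stubbed tool. `scripted_model`, `check_stock`, and the `Action` type are hypothetical. The loop structure and the step budget are the parts that carry over to real orchestration layers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    tool: Optional[str]  # None means the model is giving its final answer
    argument: str


def scripted_model(goal: str, observations: List[str]) -> Action:
    """Hypothetical stand-in for the LM: call a tool once, then answer."""
    if not observations:
        return Action(tool="check_stock", argument="A-100")
    return Action(tool=None, argument=f"{goal}: {observations[-1]}")


TOOLS: Dict[str, Callable[[str], str]] = {
    # Hypothetical inventory tool backed by a hard-coded table.
    "check_stock": lambda sku: {"A-100": "17 units"}.get(sku, "unknown SKU"),
}


def run_agent(goal: str, model: Callable[[str, List[str]], Action],
              tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    observations: List[str] = []
    for _ in range(max_steps):
        action = model(goal, observations)           # Think: choose the next step
        if action.tool is None:
            return action.argument                    # goal reached
        result = tools[action.tool](action.argument)  # Act: call the chosen tool
        observations.append(result)                   # Observe: feed the result back
    raise RuntimeError("step budget exhausted before the goal was reached")
```

`run_agent("Stock for A-100", scripted_model, TOOLS)` takes one tool call and then answers `"Stock for A-100: 17 units"`. The `max_steps` bound keeps a model that never converges from looping forever.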

DETL

Classifying Agentic Systems

Classifying agentic systems, from the Level 1 Problem-Solver to the Level 3 Multi-Agent System, helps architects match their ambitions to the complexity of the task.

INSG

New Developer Paradigm

The central challenge lies in a new developer paradigm where developers become 'architects' and 'directors' rather than 'bricklayers' defining explicit logic.

JUST

Source of Unreliability

The flexibility that makes LMs powerful is also the source of their unreliability.

JUST

Success in Engineering Rigor

Success is found in engineering rigor applied to the entire system, including robust tool contracts, resilient error handling, sophisticated context management, and comprehensive evaluation.
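To make 'robust tool contracts' and 'resilient error handling' concrete, the sketch below does three things. It validates arguments against a declared contract before execution, retries transient failures with exponential backoff, and returns errors as structured observations the model can reason about instead of raising them. `ToolContract`, `TransientError`, `invoke`, and the flaky order-lookup backend are hypothetical. `backoff_s` defaults to zero only so the example runs instantly.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolContract:
    name: str
    params: Dict[str, type]  # required parameter name -> expected type
    fn: Callable[..., Any]


class TransientError(Exception):
    """Raised by a tool for failures worth retrying, such as timeouts."""


def invoke(contract: ToolContract, args: Dict[str, Any],
           retries: int = 2, backoff_s: float = 0.0) -> Dict[str, Any]:
    """Call a tool under its contract; always return a structured observation."""
    # 1. Validate arguments before touching the backend.
    missing = [p for p in contract.params if p not in args]
    unexpected = [a for a in args if a not in contract.params]
    wrong_type = [p for p, t in contract.params.items()
                  if p in args and not isinstance(args[p], t)]
    if missing or unexpected or wrong_type:
        return {"ok": False, "error": "invalid_arguments", "missing": missing,
                "unexpected": unexpected, "wrong_type": wrong_type}
    # 2. Execute, retrying transient failures with exponential backoff.
    last_error = ""
    for attempt in range(retries + 1):
        try:
            return {"ok": True, "result": contract.fn(**args)}
        except TransientError as exc:
            last_error = str(exc)
            if attempt < retries:
                time.sleep(backoff_s * 2 ** attempt)
    return {"ok": False, "error": "transient_failure", "detail": last_error}


def make_flaky_lookup(failures: int) -> Callable[[str], str]:
    """Hypothetical order backend that times out `failures` times, then answers."""
    calls = {"n": 0}

    def get_order_status(order_id: str) -> str:
        calls["n"] += 1
        if calls["n"] <= failures:
            raise TransientError("timeout")
        return f"order {order_id}: shipped"

    return get_order_status
```

Returning a dictionary instead of raising means the orchestration layer can pass the failure back to the model as an observation. The model can then correct its arguments or choose another path.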

INSG

Foundational Blueprint

The outlined principles and architectural patterns serve as a foundational blueprint for navigating this new software frontier.

INSG

Harnessing Agentic AI Power

This disciplined architectural approach will be the deciding factor in harnessing the full power of agentic AI, building collaborative, capable, and adaptable team members.