Economy of Minds (EOM) Explained

Economy of Minds (EOM) is a decentralized multi-agent system that uses market-like mechanisms for agents to coordinate, improve, and achieve emergent multi-step reasoning and strong performance on various tasks.

EOM System Overview

EOM is a new multi-agent system from Harvard that leverages auctions, payments, and wealth accumulation for decentralized coordination.

System Design

Agents coordinate and improve over time through market-like mechanisms: auctions, payments, and wealth accumulation.

Reported Performance

This setup reportedly leads to emergent multi-step reasoning and strong performance on several agentic tasks.

Relevance for Multi-Agent System Builders

EOM is for developers building multi-agent systems to accomplish specific tasks, addressing limitations of hand-designed orchestration.

Current Limitations

Most multi-agent stacks rely on hand-designed orchestration, where developers manually define explicit prompts and state-machine graphs.

Task Requirements

Long tasks require switching roles according to the state and progress of the task.

Optimal Design Goal

Ideally, the system switches system prompts at the right moments so that the task keeps making progress.

EOM's Core Goal

Given a task, EOM aims to generate an optimized population of agents, each with specific instructions on how and when to act.

Mechanism for Optimization

EOM simulates a market system that externally controls how agents evolve.

Optimization Result

The end result is a group of specialized agents and an intelligent routing mechanism to select how they solve a task.

Emergent Complex Behaviors

Complex behaviors emerge automatically when simple agents optimize their actions around uncertainties posed by other agents.

Theory Origin

The idea that behaviors emerge organically in multi-agent settings is not new.

Prior Work Example

Older pre-LLM multi-agent work, such as the OpenAI Hide-and-Seek paper, showed similar emergent behaviors.

EOM Clarifications and Caveats

This paper introduces a new algorithm to optimize agents on verifiable environments, not for financial independence or trading.

Not Financial Training

The paper is NOT training agents to be financially independent or to perform trades or auctions.

Algorithm Purpose

This is an algorithm to optimize agents on common verifiable environments.

Target Environments

Target environments include Math, optimizing accelerator code, deep search, and scientific research.

Agent Awareness

For the most part, the agents don't even know they are inside this market simulator.

External Control System

The market is an external system that controls which agents evolve and which do not.

Auction Mechanism

Agents bid in the auction to win the right to take a step in one of these target environments.

Winning Actions

Winning the auction deducts the bid amount from the agent's wallet and lets it enter the environment and take an action.

Payment Flow

Each later agent that acts in the same environment pays its bid to the previous agent (the last winner).

Policy Development

Over time, the wealthiest agents end up with the best policies for the target environment.

Credit Assignment Approach

This is an interesting take on long-horizon credit assignment and evolutionary prompt optimization.

EOM Agent Definition

In EOM, an agent is not a separately trained neural network but essentially a prompted LLM policy.

Agent Components

Each agent is characterized by a prompt, a trigger condition, a frozen bid value, and a wealth variable.

Prompt/Role Definition

A prompt (a system prompt or instruction template) defines the agent's 'role' and procedure.

Role Adaptability

The set of roles depends on the target environment being optimized for.

MATH Task Roles

For MATH tasks, roles assigned include planner, executor, and verifier.

Accelerator Design Roles

For the accelerator design task, roles include historian, planner, and executor.

Trigger Condition

A trigger or 'wake-up' condition determines when an agent is eligible to bid in the auction.

Frozen Bid Value

A frozen bid value, fixed during initialization, is used in auctions.

Wealth Variable

A wealth variable changes over time and drives agent selection and evolution.
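
The four components above map naturally onto a small record type. The following is a minimal sketch, not the paper's implementation: the class name `Agent`, the field names, and the example values are all assumptions, and the trigger is a plain Python predicate standing in for what the source describes as a prompt-based wake-up check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    """Hypothetical container for the four components of an EOM agent."""
    name: str
    prompt: str                      # system prompt defining role and procedure
    trigger: Callable[[str], bool]   # wake-up condition over the current observation
    bid: float                       # frozen at initialization, never updated
    wealth: float                    # changes over time and drives selection


# Illustrative planner that wakes only while no plan exists yet (assumed rule).
planner = Agent(
    name="planner-0",
    prompt="You are a planner. Propose the next high-level step.",
    trigger=lambda obs: "plan" not in obs,
    bid=2.0,
    wealth=10.0,
)

print(planner.trigger("problem: solve x + 2 = 5"))
```

Only `wealth` is meant to change during training in this sketch; `prompt` changes only when a new agent is created from a parent.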

EOM's Two Coupled Loops

EOM operates through two coupled loops: Planning within an episode and Adaptation across episodes.

Loop 1: Planning (within episode)

Within an episode, agents auction for the right to act at each step, and wealth is updated via a bucket-brigade payment rule.

Loop 2: Adaptation (across episodes)

Across episodes, the population evolves prompts using exploration/exploitation driven purely by wealth.

EOM's Final Deliverable

The goal of EOM is a group of agents, each with its own system prompt and a policy of when to act.

Problem Solving Process

Given a new problem, agents bid to decide who acts, the winner performs its action, and the process repeats until a solution is reached.

Loop 1: Collect Experiences + Run Auction

This loop defines the within-episode dynamics, including agent bidding, action execution, and wealth transfer.

Agent Activation

At each environment step, agents run a prompt to determine if they should 'wake up' and participate in the auction.

Bid Submission

Woken agents automatically submit their frozen bids, which are fixed during initialization.

Auction Winner

The agent with the highest bid wins the auction, immediately pays the bid amount, and gains control of the environment.

Action Execution

The winning agent samples an action in the target environment, advancing the state from s_t to s_{t+1}.

Environment Reward

The environment transitions and produces a reward r_t.
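
The wake-up and winner-selection steps above can be sketched as a single function. This is an illustrative sketch under stated assumptions: `Agent`, `wakes`, and `run_auction` are hypothetical names, the wake-up check is a plain predicate rather than an LLM prompt, and ties are broken by list order because the source does not say how ties are resolved.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Agent:
    name: str
    bid: float                     # frozen bid
    wakes: Callable[[str], bool]   # stand-in for the prompt-based wake-up check


def run_auction(agents: List[Agent], observation: str) -> Optional[Agent]:
    """Return the awake agent with the highest frozen bid, or None if nobody wakes."""
    awake = [a for a in agents if a.wakes(observation)]
    if not awake:
        return None
    # max() keeps the first agent among equal bids (assumed tie-break).
    return max(awake, key=lambda a: a.bid)


agents = [
    Agent("planner", 2.0, lambda obs: "draft" not in obs),
    Agent("verifier", 3.0, lambda obs: "draft" in obs),
]

first = run_auction(agents, "problem: solve x + 2 = 5")
later = run_auction(agents, "draft: x = 3")
print(first.name, later.name)
```

The verifier has the higher bid but only wins once a draft exists, which is how a fixed bid combined with a state-dependent trigger can still produce an ordering of roles.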

Wealth Transfer and Credit Assignment

Wealth transfer happens with bucket-brigade credit assignment, involving payments between agents and environment rewards.

Payment to Previous Winner

The new winner pays its bid to the previous winner.

Environment Reward Collection

The new winner also collects the environment reward r_t into its wallet.

First Winner Payment

For the very first winner in an episode, payment goes to the 'house' instead of another agent.

Loop Repetition

The loop repeats on the updated environment, with agents waking up based on the latest observation.

Agent Bankruptcy

If an agent goes bankrupt (wealth drops to zero or below), it is removed from the population.

Passive Agent Penalization

If an agent sits on its wallet and declines to participate, the wallet degrades over time, eventually leading to bankruptcy.

Urgency for Participation

The wealth degradation mechanism adds urgency to agent participation in the system.
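
The degradation and bankruptcy rules can be illustrated with a few lines of code. The source does not specify how wallets degrade, so the flat per-step charge below is an assumption, as are the names `apply_rent` and `rent`. The bankruptcy threshold follows the "zero or below" wording above.

```python
from typing import Dict


def apply_rent(wealth: Dict[str, float], rent: float) -> Dict[str, float]:
    """Charge every agent a flat rent (assumed form) and drop bankrupt agents."""
    charged = {name: w - rent for name, w in wealth.items()}
    # An agent whose wealth is zero or below is removed from the population.
    return {name: w for name, w in charged.items() if w > 0}


wallets = {"active": 5.0, "passive": 1.0}
after_one_step = apply_rent(wallets, rent=1.0)
print(after_one_step)
```

An agent that never wins an auction has no income to offset the charge, so under this sketch it is removed after a number of steps proportional to its starting wealth.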

Addressing Credit Assignment Problem

This method addresses the credit assignment problem common in environments without intermediate rewards.

Solution for Credit Assignment

The 'pay your bid to the last auction winner' rule provides a solution for long-horizon credit assignment.

Backward Flow of Value

A key consequence of this design is that value flows backward along the trajectory: each bid is paid to the agent that acted just before.

Agent Profit Mechanism

An agent can profit by moving the system into states where downstream agents are willing to 'pay their bid' to take over.

Decentralized Credit Assignment

This mechanism amounts to decentralized credit assignment across the trajectory.

Reward for Enabling Actions

If an action enables valuable future actions, later agents 'buy' the continuation via bids, rewarding the enabling agent even when it receives no direct r_t.
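
A small worked example makes the backward flow of value concrete. The sketch below applies the payment rules described above to a three-step episode. The agent names, bids, rewards, starting wealth, and the `house` account key are illustrative assumptions, not values from the paper.

```python
from typing import Dict, Optional

HOUSE = "house"  # hypothetical account that receives the first bid of an episode


def settle(wealth: Dict[str, float], winner: str, prev_winner: Optional[str],
           bid: float, reward: float) -> None:
    """Apply one step of bucket-brigade payments in place."""
    wealth[winner] -= bid                                  # winner pays its bid
    payee = HOUSE if prev_winner is None else prev_winner
    wealth[payee] += bid                                   # bid goes to the previous winner
    wealth[winner] += reward                               # winner collects r_t


wealth = {HOUSE: 0.0, "planner": 10.0, "executor": 10.0, "verifier": 10.0}
# (winner, bid, environment reward) for each step; only the last step is rewarded.
episode = [("planner", 2.0, 0.0), ("executor", 3.0, 0.0), ("verifier", 1.0, 5.0)]

prev = None
for name, bid, reward in episode:
    settle(wealth, name, prev, bid, reward)
    prev = name

print(wealth)
```

In this run the planner ends at 11.0 despite receiving no environment reward, because the executor paid 3.0 to take over the state the planner produced. The executor ends at 8.0 because the verifier's bid of 1.0 did not cover its own payment of 3.0.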

Loop 2: Evolve Agents

After episode rollouts finish, the population of agent policies is updated using economic selection and prompt mutation.

Population Update Mechanism

Low-wealth agents are pruned, and rich agents are mutated for the next round.

Reasons for Low Wealth

Low-wealth agents either did not participate in the auction (too passive) or took actions leading to bad future states.

Population Replenishment

New agents are added until the population reaches its size limit, using two sources: exploitation and exploration.

Exploitation

Exploitation involves picking wealthy 'parent' agents and mutating their prompts slightly to produce children.

Exploitation Benefits

This preserves useful behaviors, amplifies successful strategies, and promotes specialization.

Exploration

Exploration replaces bankrupt or weak agents with new variants.

Exploration Benefits

New variants are created by amending prompts to correct failure modes or explore different behavior regions.
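
The update described above can be sketched as one function that prunes, then refills the population from both sources. Everything here is an assumption made for illustration: the names `evolve` and `rewrite_prompt`, the strict alternation between exploitation and exploration, children inheriting the parent's bid, and the starting wealth of new agents. In the real system the prompt rewrite would be done by an LLM; a string tag stands in for it here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    prompt: str
    bid: float
    wealth: float


def rewrite_prompt(prompt: str, note: str) -> str:
    """Stand-in for an LLM-driven prompt mutation or amendment."""
    return f"{prompt} [{note}]"


def evolve(population: List[Agent], size: int, start_wealth: float = 10.0) -> List[Agent]:
    """Prune bankrupt agents, then refill up to `size` with new children."""
    ranked = sorted(population, key=lambda a: a.wealth, reverse=True)
    if not ranked:
        return []
    survivors = [a for a in ranked if a.wealth > 0]
    children: List[Agent] = []
    while len(survivors) + len(children) < size:
        if len(children) % 2 == 0:
            parent, note = ranked[0], "mutated"    # exploitation: richest parent
        else:
            parent, note = ranked[-1], "amended"   # exploration: weakest agent
        children.append(Agent(rewrite_prompt(parent.prompt, note), parent.bid, start_wealth))
    return survivors + children


population = [Agent("planner", 2.0, 15.0), Agent("executor", 3.0, 4.0), Agent("historian", 1.0, -2.0)]
next_generation = evolve(population, size=4)
print([a.prompt for a in next_generation])
```

Here the bankrupt historian is removed, the planner lineage gains a mutated child, and the historian's prompt is amended into a fresh variant that gets another chance.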

Inference and Shipping

EOM trains and ships a society of agents, not a single winner, with market simulation used only during training.

Shipped Entity

What is 'trained' and then 'shipped' to solve tasks is a society or population of agents.

Agent Autonomy

Each agent has its own prompts and local 'when to act' logic.

Evaluation Process

At evaluation time, a thread-local copy of the trained population is used, with the wake-up policy selecting the acting agent.

Frozen Population

During evaluation, the population is 'frozen', meaning no further training occurs.

Train-Time Market Simulation

The market-simulation machinery, such as wallets and wealth transfer, is used only at train time.

Inference Bid System

The bid system is still used during inference to determine which agent acts when multiple want to 'wake up'.
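
At inference time the loop reduces to wake-up checks plus a highest-bid tie-break, with no wallets involved. The sketch below illustrates that reduced loop under stated assumptions: `FrozenAgent`, `solve`, `is_done`, and `max_steps` are hypothetical names, and each agent's `act` is a toy string transformation standing in for an LLM call followed by an environment step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class FrozenAgent:
    name: str
    bid: float
    wakes: Callable[[str], bool]
    act: Callable[[str], str]   # stand-in for LLM action plus environment transition


def solve(agents: List[FrozenAgent], observation: str,
          is_done: Callable[[str], bool], max_steps: int = 10) -> Tuple[str, List[str]]:
    """Run the frozen population; no wealth is read or updated."""
    trace: List[str] = []
    for _ in range(max_steps):
        if is_done(observation):
            break
        awake = [a for a in agents if a.wakes(observation)]
        if not awake:
            break
        actor = max(awake, key=lambda a: a.bid)   # bids only break ties among awake agents
        observation = actor.act(observation)
        trace.append(actor.name)
    return observation, trace


agents = [
    FrozenAgent("planner", 2.0, lambda o: "plan" not in o, lambda o: o + " plan"),
    FrozenAgent("executor", 3.0, lambda o: "plan" in o and "draft" not in o, lambda o: o + " draft"),
    FrozenAgent("verifier", 1.0, lambda o: "draft" in o, lambda o: o + " verified"),
]

final, trace = solve(agents, "task", is_done=lambda o: "verified" in o)
print(final, trace)
```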

Case Study: Accelerator Design

The Accelerator Design task illustrates EOM's 'Economy of Minds' idea, showcasing role-specialized agents and wealth dynamics.

Role-Specialized Agents

Agents are specialized into roles like Historian, Planner, and Executor for the accelerator design task.

Historian Role

The Historian summarizes previous trials and keeps memory of promising or failed directions.

Planner Role

The Planner proposes high-level search directions.

Executor Role

The Executor runs fine-grained local evaluations.

Environment Reward Metric

The environment reward is based on improving EDP (energy-delay product) on GEMMINI ResNet-50 kernels, where lower EDP is better.

Wealth as Scoreboard

Each role-specialized agent carries wealth, which acts as a live scoreboard of usefulness as episodes progress.

Wealth Accumulation

Agents that help produce new best records accumulate wealth.

Agent Penalties

A periodic rent steadily penalizes everyone, causing mediocre agents to slowly die out.

Agent Removal

Once wealth drops below zero, an agent goes bankrupt and is removed.

Agent Reproduction

The richest agents spawn mutated 'good-birth' descendants (exploitation).

Agent Exploration

The weakest agents spawn amended 'bad-birth' descendants (exploration).

Discovery of Valuable Lineages

Across different kernels, market pressure automatically discovers which specialist lineage is actually valuable.

Lineage Examples

Sometimes Historian-style memory collapses due to inherited bias, or Planner lineages reproduce because search direction is the bottleneck.

Co-existence of Roles

Sometimes multiple roles co-exist because they are complementary.

Emergent Coordination and Credit

Coordination and credit assignment emerge from simple incentives like wealth flow, rent, birth, and bankruptcy.

Adaptive Population Without Central Control

This mechanism produces an adaptive population without requiring a central control system.

Emerging Behaviors / 'Aha Moments'

The paper highlights several 'aha moments' or emerging behaviors, revealing how economic rules lead to self-organization.

Role Seeding

For specific environments like MATH, agents are seeded with roles such as Planners, Executors, and Verifiers during initialization.

Intuitive Bidding Behavior

Planners likely bid early, while verifiers likely make bids after a draft solution is in place.

Economic Rules Self-Organization

EOM doesn't hard-code workflows, instead setting up economic rules that lead to self-organizing behaviors resembling learned algorithms.

1) Credit Assignment as Market Signal

Performance improves because the economy selects useful action chains, reproduces them, and deletes agents that don’t contribute.

Emergent Coordination

Coordination is an emergent property of selection, not an engineered protocol.

Sharpening Interaction Topology

The system gets better at choosing which sequences of agents act, meaning the interaction topology sharpens over time.

Similarity to OpenAI Paper

This behavior is similar to that observed in the OpenAI Hide-and-Seek paper.

2) Non-monotonic Learning Curves

On Finance-Agent-Bench, EOM performance dips early during exploration, later recovering and surpassing initial performance.

Reason for Early Dip

The early dip is due to exploration testing alternative specialists.

Market-like Phenomenon

This is a 'market-like' phenomenon, where early turnover and reallocation temporarily hurt headline performance.

3) Wealth Trajectories Show Dominant Lineages

In accelerator design, useful lineages persist, spawn offspring, and dominate auctions, while failed variants go bankrupt.

Unit of Learning

The unit of learning is not one agent prompt, but an evolving family tree of prompts under wealth selection pressure.

4) Discovery of Reusable Domain Structure

On the hardest accelerator kernels, the society repeatedly converges on a specific tiling/dataflow motif without templates.

No Template Provided

The system is not given the motif as a template, and the reward signals only 'EDP record-breaks', without specific labels.

Learned Design Heuristic

The system learns a reusable design heuristic through selection.
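
To make the sparse 'record-break' signal concrete, here is a minimal sketch. EDP is the product of energy and delay, and lower is better. The reward magnitude of 1.0, treating the first evaluation as a record, and the function names are assumptions, since the source only says the reward is based on EDP record-breaks.

```python
from typing import List


def edp(energy: float, delay: float) -> float:
    """Energy-delay product; lower is better."""
    return energy * delay


def record_break_reward(edp_history: List[float], new_edp: float) -> float:
    """Return 1.0 (assumed magnitude) only when new_edp beats every earlier value."""
    best_so_far = min(edp_history) if edp_history else float("inf")
    return 1.0 if new_edp < best_so_far else 0.0


history = [10.0, 8.0]
print(record_break_reward(history, edp(2.5, 3.0)))  # 7.5 beats the best of 8.0
print(record_break_reward(history, edp(2.0, 4.0)))  # 8.0 only ties, so no reward
```

A signal like this carries no information about which tiling or dataflow choice caused the improvement, which is why any recurring motif has to come from selection rather than from labels.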

5) Prompts Evolve into Reasoning Routines

In scientific research, prompts evolve into compact multi-step reasoning routines, with executors internalizing roles and adding self-checks.

Executor Internalization

An Executor internalizes what previously required other roles.

Self-Check Mutations

Mutations add increasingly explicit self-checks, such as principle-first reasoning, symmetry checks, feasibility checks, and substitution to falsify.

Procedural Module Behavior

An agent becomes less of a generic text generator and more like a procedural module that runs a learned scientific derivation routine.

6) Action Discipline: Learning When Not to Spend

In the CloudCast task, the economy selects different workflow shapes based on the workspace state, showing emergent resource-awareness.

CloudCast Task Description

CloudCast is an iterative code-optimization task where agents improve a Python program to minimize total data-transfer cost.

Workflow Shapes vs Workspace State

The economy selects different workflow shapes depending on whether the workspace is near a high score or uncertain/regressed.

Workspace State	Workflow Shape
Near a high score	short 'read-edit-evaluate-commit'
Uncertain/Regressed	longer 'edit-build-evaluate' loops

Emergent Resource Awareness

This is an emergent resource-awareness behavior, demonstrating a society-level policy of cautious versus aggressive action.