{"sourceUrl":"https://x.com/neural_avb/status/2062930609861976454","sourceType":"url","contentType":"Explainer","apex":{"id":"n1","type":"APEX","label":"Economy of Minds (EOM) Explained","text":"Economy of Minds (EOM) is a decentralized multi-agent system that uses market-like mechanisms for agents to coordinate, improve, and achieve emergent multi-step reasoning and strong performance on various tasks.","children":[{"id":"n2","type":"CONC","label":"EOM System Overview","text":"EOM is a new multi-agent system from Harvard that leverages auctions, payments, and wealth accumulation for decentralized coordination.","parentId":"n1","children":[{"id":"n3","type":"DETL","label":"System Design","text":"Agents coordinate and improve over time using market-like mechanisms like auctions, payments, and wealth accumulation.","parentId":"n2","children":[]},{"id":"n4","type":"STAT","label":"Reported Performance","text":"Such an environment has led to emergent multi-step reasoning and strong performance on several agentic tasks.","parentId":"n2","children":[]}]},{"id":"n5","type":"INSG","label":"Relevance for Multi-Agent System Builders","text":"EOM is for developers building multi-agent systems to accomplish specific tasks, addressing limitations of hand-designed orchestration.","parentId":"n1","children":[{"id":"n6","type":"JUST","label":"Current Limitations","text":"Most multi-agent stacks rely on hand-designed orchestration, where developers manually define explicit prompts and state machine graphs.","parentId":"n5","children":[]},{"id":"n7","type":"DETL","label":"Task Requirements","text":"Long tasks require different role switches according to the state and progress of the task.","parentId":"n5","children":[]},{"id":"n8","type":"DETL","label":"Optimal Design Goal","text":"Systems should optimally switch system prompts for continuous task progress.","parentId":"n5","children":[]}]},{"id":"n9","type":"CONC","label":"EOM's Core Goal","text":"Given a task, EOM aims to generate 
an optimized population of multi-agents, each with specific instructions on how and when to act.","parentId":"n1","children":[{"id":"n10","type":"DETL","label":"Mechanism for Optimization","text":"EOM simulates a market system that externally controls how agents evolve.","parentId":"n9","children":[]},{"id":"n11","type":"DETL","label":"Optimization Result","text":"The end result is a group of specialized agents and an intelligent routing mechanism to select how they solve a task.","parentId":"n9","children":[]}]},{"id":"n12","type":"INSG","label":"Emergent Complex Behaviors","text":"Complex behaviors emerge automatically when simple agents optimize their actions around uncertainties posed by other agents.","parentId":"n1","children":[{"id":"n13","type":"DETL","label":"Theory Origin","text":"This theory of behaviors organically emerging from multi-agent scenarios is not a new concept.","parentId":"n12","children":[]},{"id":"n14","type":"EXMP","label":"Prior Work Example","text":"Older pre-LLM multi-agent works, such as the OpenAI Hide and Seek paper, indicated similar emergent behaviors.","parentId":"n12","children":[]}]},{"id":"n15","type":"CONC","label":"EOM Clarifications and Caveats","text":"This paper introduces a new algorithm to optimize agents on verifiable environments, not for financial independence or trading.","parentId":"n1","children":[{"id":"n16","type":"DETL","label":"Not Financial Training","text":"The paper is NOT training agents to be financially independent or perform trades or auctions.","parentId":"n15","children":[]},{"id":"n17","type":"DETL","label":"Algorithm Purpose","text":"This is an algorithm to optimize agents on common verifiable environments.","parentId":"n15","children":[]},{"id":"n18","type":"EXMP","label":"Target Environments","text":"Target environments include Math, optimizing accelerator code, deep search, and scientific research.","parentId":"n15","children":[]},{"id":"n19","type":"DETL","label":"Agent Awareness","text":"For 
the most part, the agents don't even know they are inside this market simulator.","parentId":"n15","children":[]},{"id":"n20","type":"DETL","label":"External Control System","text":"This is an external system that determines which agents evolve and which do not.","parentId":"n15","children":[]},{"id":"n21","type":"DETL","label":"Auction Mechanism","text":"Agents bid in the auction to win the right to take a step in one of these target environments.","parentId":"n15","children":[]},{"id":"n22","type":"DETL","label":"Winning Actions","text":"Winning the auction deducts the bid amount from the winner's wallet and lets that agent visit the environment and take an action.","parentId":"n15","children":[]},{"id":"n23","type":"DETL","label":"Payment Flow","text":"Each later agent that acts in the same environment pays its bid to the previous agent (the last winner).","parentId":"n15","children":[]},{"id":"n24","type":"INSG","label":"Policy Development","text":"Over time, the wealthiest agents end up with the best policies to perform in the target environment.","parentId":"n15","children":[]},{"id":"n25","type":"INSG","label":"Credit Assignment Approach","text":"This is a super interesting take on long-horizon credit assignment and evolutionary prompt optimization algorithms.","parentId":"n15","children":[]}]},{"id":"n26","type":"CONC","label":"EOM Agent Definition","text":"In EOM, an agent is not a separately trained neural network but essentially a prompted LLM policy.","parentId":"n1","children":[{"id":"n27","type":"SUBC","label":"Agent Components","text":"Each agent is characterized by a prompt, a trigger condition, a frozen bid value, and a wealth variable.","parentId":"n26","children":[{"id":"n28","type":"DETL","label":"Prompt/Role Definition","text":"A prompt, which is a system prompt or instruction template, defines the agent's 'role' and procedure.","parentId":"n27","children":[]},{"id":"n29","type":"DETL","label":"Role Adaptability","text":"This role changes 
depending on the target environment being optimized for.","parentId":"n27","children":[]},{"id":"n30","type":"EXMP","label":"MATH Task Roles","text":"For MATH tasks, roles assigned include planner, executor, and verifier.","parentId":"n27","children":[]},{"id":"n31","type":"EXMP","label":"Accelerator Design Roles","text":"For the accelerator design task, roles include historian, planner, and executor.","parentId":"n27","children":[]},{"id":"n32","type":"DETL","label":"Trigger Condition","text":"A trigger or 'wake-up' condition determines when an agent is eligible to bid in the auction.","parentId":"n27","children":[]},{"id":"n33","type":"DETL","label":"Frozen Bid Value","text":"A frozen bid value is used in auctions, fixed during initialization.","parentId":"n27","children":[]},{"id":"n34","type":"DETL","label":"Wealth Variable","text":"A wealth variable changes over time and drives agent selection and evolution.","parentId":"n27","children":[]}]}]},{"id":"n35","type":"CONC","label":"EOM's Two Coupled Loops","text":"EOM operates through two coupled loops: Planning within an episode and Adaptation across episodes.","parentId":"n1","children":[{"id":"n36","type":"SUBC","label":"Loop 1: Planning (within episode)","text":"Within an episode, agents auction for the right to act at each step, and wealth is updated via a bucket-brigade payment rule.","parentId":"n35","children":[]},{"id":"n37","type":"SUBC","label":"Loop 2: Adaptation (across episodes)","text":"Across episodes, the population evolves prompts using exploration/exploitation driven purely by wealth.","parentId":"n35","children":[]}]},{"id":"n38","type":"CONC","label":"EOM's Final Deliverable","text":"The goal of EOM is a group of agents, each with its own system prompt and a policy of when to act.","parentId":"n1","children":[{"id":"n39","type":"DETL","label":"Problem Solving Process","text":"Given a new problem, agents bid on who will act, perform the action, and repeat the process until the solution is 
reached.","parentId":"n38","children":[]}]},{"id":"n40","type":"CONC","label":"Loop 1: Collect Experiences + Run Auction","text":"This loop defines the within-episode dynamics, including agent bidding, action execution, and wealth transfer.","parentId":"n1","children":[{"id":"n41","type":"DETL","label":"Agent Activation","text":"At each environment step, agents run a prompt to determine if they should 'wake up' and participate in the auction.","parentId":"n40","children":[]},{"id":"n42","type":"DETL","label":"Bid Submission","text":"Woken agents automatically submit their frozen bids, which are fixed during initialization.","parentId":"n40","children":[]},{"id":"n43","type":"DETL","label":"Auction Winner","text":"The agent with the highest bid wins the auction, immediately loses the bid amount, and gains control of the environment.","parentId":"n40","children":[]},{"id":"n44","type":"DETL","label":"Action Execution","text":"The winning agent samples an action in the target environment, advancing the clock from s_t to s_t+1.","parentId":"n40","children":[]},{"id":"n45","type":"DETL","label":"Environment Reward","text":"The environment transitions and produces a reward r_t.","parentId":"n40","children":[]},{"id":"n46","type":"CONC","label":"Wealth Transfer and Credit Assignment","text":"Wealth transfer happens with bucket-brigade credit assignment, involving payments between agents and environment rewards.","parentId":"n40","children":[{"id":"n47","type":"DETL","label":"Payment to Previous Winner","text":"The new winner pays its bid to the previous winner.","parentId":"n46","children":[]},{"id":"n48","type":"DETL","label":"Environment Reward Collection","text":"The new winner also collects the environment reward r_t into their wallet.","parentId":"n46","children":[]},{"id":"n49","type":"DETL","label":"First Winner Payment","text":"For the very first winner in an episode, payment goes to the 'house' instead of another 
agent.","parentId":"n46","children":[]},{"id":"n50","type":"DETL","label":"Loop Repetition","text":"The loop repeats on the updated environment, with agents waking up based on the latest observation.","parentId":"n46","children":[]},{"id":"n51","type":"DETL","label":"Agent Bankruptcy","text":"If an agent goes bankrupt (wealth drops to zero or below), they are thrown out.","parentId":"n46","children":[]},{"id":"n52","type":"DETL","label":"Passive Agent Penalization","text":"If an agent sits on their wallet and declines participation, their wallet degrades over time, leading to bankruptcy.","parentId":"n46","children":[]},{"id":"n53","type":"INSG","label":"Urgency for Participation","text":"The wealth degradation mechanism adds urgency to agent participation in the system.","parentId":"n46","children":[]},{"id":"n54","type":"DETL","label":"Addressing Credit Assignment Problem","text":"This method addresses the credit assignment problem common in environments without intermediate rewards.","parentId":"n46","children":[]},{"id":"n55","type":"JUST","label":"Solution for Credit Assignment","text":"The 'pay your bid to the last auction winner' rule provides a solution for long-horizon credit assignment.","parentId":"n46","children":[]},{"id":"n56","type":"INSG","label":"Backward Flow of Value","text":"The design decision has a key consequence related to the backward flow of value.","parentId":"n46","children":[]},{"id":"n57","type":"JUST","label":"Agent Profit Mechanism","text":"An agent can profit by moving the system into states where downstream agents are willing to 'pay their bid' to take over.","parentId":"n46","children":[]},{"id":"n58","type":"INSG","label":"Decentralized Credit Assignment","text":"This mechanism becomes decentralized credit assignment across the trajectory.","parentId":"n46","children":[]},{"id":"n59","type":"JUST","label":"Reward for Enabling Actions","text":"If an action enables valuable future actions, later agents 'buy' the continuation via 
bids, rewarding the agent even without direct r_t.","parentId":"n46","children":[]}]}]},{"id":"n60","type":"CONC","label":"Loop 2: Evolve Agents","text":"After episode rollouts finish, the population of agent policies is updated using economic selection and prompt mutation.","parentId":"n1","children":[{"id":"n61","type":"DETL","label":"Population Update Mechanism","text":"Low-wealth agents are pruned, and wealthy agents are mutated for the next round.","parentId":"n60","children":[]},{"id":"n62","type":"JUST","label":"Reasons for Low Wealth","text":"Low-wealth agents either did not participate in the auction (too passive) or took actions leading to bad future states.","parentId":"n60","children":[]},{"id":"n63","type":"DETL","label":"Population Replenishment","text":"New agents are added until the population reaches size constraints, using two sources: exploitation and exploration.","parentId":"n60","children":[]},{"id":"n64","type":"SUBC","label":"Exploitation","text":"Exploitation involves picking wealthy 'parent' agents and mutating their prompts slightly to produce children.","parentId":"n60","children":[{"id":"n65","type":"JUST","label":"Exploitation Benefits","text":"This preserves useful behaviors, amplifies successful strategies, and promotes specialization.","parentId":"n64","children":[]}]},{"id":"n66","type":"SUBC","label":"Exploration","text":"Exploration replaces bankrupt or weak agents with new variants.","parentId":"n60","children":[{"id":"n67","type":"JUST","label":"Exploration Benefits","text":"New variants are created by amending prompts to correct failure modes or explore different behavior regions.","parentId":"n66","children":[]}]}]},{"id":"n68","type":"CONC","label":"Inference and Shipping","text":"EOM trains and ships a society of agents, not a single winner, with market simulation used only during training.","parentId":"n1","children":[{"id":"n69","type":"DETL","label":"Shipped Entity","text":"What is 'trained' and then 'shipped' to solve 
tasks is a society or population of agents.","parentId":"n68","children":[]},{"id":"n70","type":"DETL","label":"Agent Autonomy","text":"Each agent has its own prompts and local 'when to act' logic.","parentId":"n68","children":[]},{"id":"n71","type":"DETL","label":"Evaluation Process","text":"At evaluation time, a thread-local copy of the trained population is used, with the wake-up policy selecting the acting agent.","parentId":"n68","children":[]},{"id":"n72","type":"DETL","label":"Frozen Population","text":"During evaluation, the population is 'frozen' meaning no further training occurs.","parentId":"n68","children":[]},{"id":"n73","type":"DETL","label":"Train-Time Market Simulation","text":"All market simulation antics, such as wallets and wealth transfer, are solely for train-time.","parentId":"n68","children":[]},{"id":"n74","type":"DETL","label":"Inference Bid System","text":"The bid system is still used during inference to determine which agent acts when multiple want to 'wake up'.","parentId":"n68","children":[]}]},{"id":"n75","type":"EXMP","label":"Case Study: Accelerator Design","text":"The Accelerator Design task illustrates EOM's 'Economy of Minds' idea, showcasing role-specialized agents and wealth dynamics.","parentId":"n1","children":[{"id":"n76","type":"SUBC","label":"Role-Specialized Agents","text":"Agents are specialized into roles like Historian, Planner, and Executor for the accelerator design task.","parentId":"n75","children":[{"id":"n77","type":"DETL","label":"Historian Role","text":"The Historian summarizes previous trials and keeps memory of promising or failed directions.","parentId":"n76","children":[]},{"id":"n78","type":"DETL","label":"Planner Role","text":"The Planner proposes high-level search directions.","parentId":"n76","children":[]},{"id":"n79","type":"DETL","label":"Executor Role","text":"The Executor runs fine-grained local evaluations.","parentId":"n76","children":[]}]},{"id":"n80","type":"DETL","label":"Environment Reward 
Metric","text":"The environment reward is about improving EDP (energy-delay product) on GEMMINI ResNet-50 kernels, where lower EDP is better.","parentId":"n75","children":[]},{"id":"n81","type":"DETL","label":"Wealth as Scoreboard","text":"Each role-specialized agent carries wealth, which acts as a live scoreboard of usefulness as episodes progress.","parentId":"n75","children":[]},{"id":"n82","type":"DETL","label":"Wealth Accumulation","text":"Agents that help produce new best records accumulate wealth.","parentId":"n75","children":[]},{"id":"n83","type":"DETL","label":"Agent Penalties","text":"A periodic rent steadily penalizes everyone, causing mediocre agents to slowly die out.","parentId":"n75","children":[]},{"id":"n84","type":"DETL","label":"Agent Removal","text":"Once wealth drops below zero, an agent goes bankrupt and is removed.","parentId":"n75","children":[]},{"id":"n85","type":"DETL","label":"Agent Reproduction","text":"The richest agents spawn mutated 'good-birth' descendants (exploitation).","parentId":"n75","children":[]},{"id":"n86","type":"DETL","label":"Agent Exploration","text":"The weakest agents spawn amended 'bad-birth' descendants (exploration).","parentId":"n75","children":[]},{"id":"n87","type":"INSG","label":"Discovery of Valuable Lineages","text":"Across different kernels, market pressure automatically discovers which specialist lineage is actually valuable.","parentId":"n75","children":[]},{"id":"n88","type":"EXMP","label":"Lineage Examples","text":"Sometimes Historian-style memory collapses due to inherited bias, or Planner lineages reproduce because search direction is the bottleneck.","parentId":"n75","children":[]},{"id":"n89","type":"EXMP","label":"Co-existence of Roles","text":"Sometimes multiple roles co-exist because they are complementary.","parentId":"n75","children":[]},{"id":"n90","type":"INSG","label":"Emergent Coordination and Credit","text":"Coordination and credit assignment emerge from simple incentives like wealth 
flow, rent, birth, and bankruptcy.","parentId":"n75","children":[]},{"id":"n91","type":"JUST","label":"Adaptive Population Without Central Control","text":"This mechanism produces an adaptive population without requiring a central control system.","parentId":"n75","children":[]}]},{"id":"n92","type":"CONC","label":"Emerging Behaviors / 'Aha Moments'","text":"The paper highlights several 'aha moments' or emerging behaviors, revealing how economic rules lead to self-organization.","parentId":"n1","children":[{"id":"n93","type":"DETL","label":"Role Seeding","text":"For specific environments like MATH, agents are seeded with roles such as Planners, Executors, and Verifiers during initialization.","parentId":"n92","children":[]},{"id":"n94","type":"JUST","label":"Intuitive Bidding Behavior","text":"Planners likely bid early, while verifiers likely make bids after a draft solution is in place.","parentId":"n92","children":[]},{"id":"n95","type":"INSG","label":"Economic Rules Self-Organization","text":"EOM doesn't hard-code workflows, instead setting up economic rules that lead to self-organizing behaviors resembling learned algorithms.","parentId":"n92","children":[]},{"id":"n96","type":"SUBC","label":"1) Credit Assignment as Market Signal","text":"Performance improves because the economy selects useful action chains, reproduces them, and deletes agents that don’t contribute.","parentId":"n92","children":[{"id":"n97","type":"INSG","label":"Emergent Coordination","text":"Coordination is an emergent property of selection, not an engineered protocol.","parentId":"n96","children":[]},{"id":"n98","type":"INSG","label":"Sharpening Interaction Topology","text":"The system gets better at which sequences of agents act, meaning the interaction topology sharpens over time.","parentId":"n96","children":[]},{"id":"n99","type":"EXMP","label":"Similarity to OpenAI Paper","text":"This behavior is similar to that observed in the OpenAI Hide-and-Seek 
paper.","parentId":"n96","children":[]}]},{"id":"n100","type":"SUBC","label":"2) Non-monotonic Learning Curves","text":"On Finance-Agent-Bench, EOM performance dips early during exploration, later recovering and surpassing initial performance.","parentId":"n92","children":[{"id":"n101","type":"JUST","label":"Reason for Early Dip","text":"The early dip is due to exploration testing alternative specialists.","parentId":"n100","children":[]},{"id":"n102","type":"INSG","label":"Market-like Phenomenon","text":"This is a 'market-like' phenomenon, where early turnover and reallocation temporarily hurt headline performance.","parentId":"n100","children":[]}]},{"id":"n103","type":"SUBC","label":"3) Wealth Trajectories Show Dominant Lineages","text":"In accelerator design, useful lineages persist, spawn offspring, and dominate auctions, while failed variants go bankrupt.","parentId":"n92","children":[{"id":"n104","type":"INSG","label":"Unit of Learning","text":"The unit of learning is not one agent prompt, but an evolving family tree of prompts under wealth selection pressure.","parentId":"n103","children":[]}]},{"id":"n105","type":"SUBC","label":"4) Discovery of Reusable Domain Structure","text":"On the hardest accelerator kernels, the society repeatedly converges on a specific tiling/dataflow motif without templates.","parentId":"n92","children":[{"id":"n106","type":"JUST","label":"No Template Provided","text":"The system is not given the motif as a template, and reward is only 'EDP record-breaks' without specific labels.","parentId":"n105","children":[]},{"id":"n107","type":"INSG","label":"Learned Design Heuristic","text":"The system learns a reusable design heuristic through selection.","parentId":"n105","children":[]}]},{"id":"n108","type":"SUBC","label":"5) Prompts Evolve into Reasoning Routines","text":"In scientific research, prompts evolve into compact multi-step reasoning routines, with executors internalizing roles and adding 
self-checks.","parentId":"n92","children":[{"id":"n109","type":"DETL","label":"Executor Internalization","text":"An Executor internalizes what previously required other roles.","parentId":"n108","children":[]},{"id":"n110","type":"DETL","label":"Self-Check Mutations","text":"Mutations add increasingly explicit self-checks such as principle-first, symmetry checks, feasibility checks, and substitution to falsify.","parentId":"n108","children":[]},{"id":"n111","type":"INSG","label":"Procedural Module Behavior","text":"An agent becomes less of a generic text generator and more like a procedural module that runs a learned scientific derivation routine.","parentId":"n108","children":[]}]},{"id":"n112","type":"SUBC","label":"6) Action Discipline: Learning When Not to Spend","text":"In the CloudCast task, the economy selects different workflow shapes based on the workspace state, showing emergent resource-awareness.","parentId":"n92","children":[{"id":"n113","type":"EXMP","label":"CloudCast Task Description","text":"CloudCast is an iterative code-optimization task where agents improve a Python program to minimize total data-transfer cost.","parentId":"n112","children":[]},{"id":"n114","type":"CMPR","label":"Workflow Shapes vs Workspace State","text":"The economy selects different workflow shapes depending on whether the workspace is near a high score or uncertain/regressed.","table":{"cols":["Workspace State","Workflow Shape"],"rows":[{"label":"Near a high score","cells":["short 'read-edit-evaluate-commit'"]},{"label":"Uncertain/Regressed","cells":["longer 'edit-build-evaluate' loops"]}]},"parentId":"n112","children":[]},{"id":"n115","type":"INSG","label":"Emergent Resource Awareness","text":"This is an emergent resource-awareness behavior, demonstrating a society-level policy of cautious versus aggressive 
action.","parentId":"n112","children":[]}]}]}]},"slug":"avb-on-x-economy-of-minds-multiagent-pro-c4cd7e","sharedAt":{"_seconds":1780805168,"_nanoseconds":992000000},"title":"AVB on X: \"https://t.co/kEVNDAXkOC\" / X"}