Antonio Gullí is an Engineering Director at Google. He wrote a 453-page book that breaks AI Agent development down into 21 design patterns. But this is not a book review. My motivation for reading it was very specific: I’ve written about Harness Engineering, the pitfalls of Clawdbot, and, in “AI Agents Are Not Magic,” the seven turning points between burning tokens and actually being useful. Each time I finished writing, one question remained that I hadn’t fully thought through: is there a reusable underlying logic behind all of this? This book gave me the answer, and it goes deeper than I expected.

You may not be writing an Agent at all

The harshest judgment in the book is tucked into the prologue. Most people are using “AI” that is only Level 0: a bare LLM, without tools, without memory, and unable to act. You ask it which film won Best Picture at the 2025 Oscars, and it guesses. The book says it bluntly: Level 0 is not an Agent. Real Agents start one level up:

Level 1: Tool User. The Agent starts using tools: search, APIs, databases. But this is more than “being able to call interfaces”; the Agent judges for itself when to call a tool, which one to call, and how to use the results. The book gives a very specific example: a user asks “What new dramas have come out recently?” The Agent realizes on its own that this information is not in its training data, actively uses the search tool to find it, and then synthesizes the results. The key step is that it “realizes it itself.” No human tells it “go search”; it judges that it needs to search. This judgment is the threshold of Level 1.
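The Level 1 control flow can be sketched in a few lines. This is only an illustration, not the book's code: `fake_model`, `web_search`, and `run_agent` are hypothetical names, and the keyword check merely imitates the judgment a real LLM would make about whether its training data is enough.

```python
from typing import Callable

def web_search(query: str) -> str:
    """Hypothetical search tool; a real one would call a search API."""
    return f"[search results for: {query}]"

TOOLS: dict[str, Callable[[str], str]] = {"web_search": web_search}

def fake_model(prompt: str) -> dict[str, str]:
    """Stand-in for an LLM that either answers or requests a tool.

    The keyword check only imitates the model's judgment that the
    question concerns recent events it cannot know from training data.
    """
    if "TOOL RESULT: " in prompt:
        evidence = prompt.split("TOOL RESULT: ", 1)[1]
        return {"type": "answer", "text": f"Summary based on {evidence}"}
    if any(word in prompt.lower() for word in ("recent", "latest", "new ")):
        return {"type": "tool_call", "tool": "web_search", "query": prompt}
    return {"type": "answer", "text": "Answered from training data."}

def run_agent(question: str,
              model: Callable[[str], dict[str, str]] = fake_model) -> str:
    decision = model(question)
    if decision["type"] == "tool_call":
        # The runtime only executes what the model asked for.
        result = TOOLS[decision["tool"]](decision["query"])
        decision = model(f"{question}\nTOOL RESULT: {result}")
    return decision["text"]

print(run_agent("What new dramas have come out recently?"))
print(run_agent("What is the capital of France?"))
```

The point of the sketch is where the decision lives: the caller never says “go search.” The model's reply determines whether a tool runs at all.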

Level 2: Strategic Thinker. Two things are added: planning and Context Engineering. The book defines Context Engineering not as piling up information but as carefully selecting, trimming, and packaging the context. The example is wonderful: the user wants to find a coffee shop between two locations. The Agent first calls the map tool and gets back a pile of data, then judges for itself that “the next step only needs street names,” trims the map output into a short list, and feeds that list to the local search tool. Every step reduces noise in the information. There is a sentence in the book that I read several times: “To get AI to the highest accuracy, you must give it short, focused, and powerful context.” That is what Context Engineering does. At this level, the Agent can also self-reflect: after finishing the work, it reviews the result itself and, if it finds a problem, fixes it itself.
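The trimming step in the coffee shop example might look roughly like this. It is a sketch under assumptions: the shape of `raw_route` and the helper `trim_route_to_streets` are invented for illustration and do not correspond to any real map API.

```python
from typing import Any

def trim_route_to_streets(route: dict[str, Any], limit: int = 5) -> list[str]:
    """Keep only unique street names, in travel order, from a verbose map response."""
    streets: list[str] = []
    for step in route.get("steps", []):
        name = step.get("street")
        if name and name not in streets:
            streets.append(name)
    return streets[:limit]

# Hypothetical raw output from a map tool: far more than the next step needs.
raw_route = {
    "distance_m": 2300,
    "steps": [
        {"street": "Main St", "instruction": "Head north", "polyline": "abc123"},
        {"street": "Main St", "instruction": "Continue", "polyline": "def456"},
        {"street": "Oak Ave", "instruction": "Turn left", "polyline": "ghi789"},
    ],
}

streets = trim_route_to_streets(raw_route)
local_search_query = "coffee shops near " + ", ".join(streets)
print(local_search_query)  # coffee shops near Main St, Oak Ave
```

Only the short street list reaches the next tool; distances, instructions, and polylines never enter the following prompt.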

Level 3: Multi-Agent Collaboration. The book’s position is very clear: don’t always think about building an all-powerful super agent. The truly reliable approach is to build a team, such as a project manager Agent + researcher Agent + designer Agent + copywriter Agent. The example in the book is a new product release: a “project manager Agent” does the overall scheduling and issues tasks to the “market research Agent,” “product design Agent,” and “marketing Agent.” The key is communication: how Agents transmit data, how they synchronize states, and how they handle conflicts. This chapter draws six communication topologies, from the simplest single Agent to the most flexible custom hybrid, and explains what scenarios each is suitable for.

After reading these four levels, I suddenly understood why so many people say “my Agent is not easy to use.” The model is not the problem. The problem is that you are using it as a chatbot, and it may not even be at Level 1.

Context Engineering: The most underestimated concept in the book

I wrote a Harness Engineering article about how the design of the track is more important than the horsepower of the engine. After reading this book, I realized that Context Engineering is the mapping of Harness Engineering at the prompt level. Traditional Prompt Engineering only cares about “how you ask.” The Context Engineering in the book cares about “what is in front of the Agent before asking.” It includes four layers of information:

The first layer, system prompt. Define who the Agent is, what tone it uses, and what its boundaries are. Most people only write this layer.

The second layer, external data. RAG-retrieved documents, tool call return values, real-time API data. This is where most people get stuck: they know they need to feed data, but they don’t know how to feed it without overwhelming the model.

The third layer, implicit data. User identity, interaction history, environmental status. Things you didn’t say explicitly but the Agent should know. For example, if you tell the Agent “Help me send an email to John to confirm tomorrow’s meeting,” it should know what tomorrow’s meeting is in your calendar and what your relationship with John is.

The fourth layer, feedback loop. After each output of the Agent, automatically evaluate the quality and adjust the context strategy for the next time. The book calls this “automated context optimization,” and Google’s Vertex AI Prompt Optimizer is the engineering implementation of this idea.

When I read this, I remembered the article I wrote before, “AI Agents Are Not Magic,” in which one of the lessons was “Your agent needs rules, and a lot of rules.” Looking back now, those rules are essentially the manual version of Context Engineering, and the book systematizes it.
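One way to make the four layers concrete is to assemble the prompt from four separate inputs. The following is a minimal sketch, not the book's implementation: `ContextBuilder`, its field names, the section titles, and the sample data (Dana, John Lee, the calendar entry) are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ContextBuilder:
    system_prompt: str                                       # layer 1
    external: list[str] = field(default_factory=list)        # layer 2: retrieved/tool data
    implicit: dict[str, str] = field(default_factory=dict)   # layer 3: identity, history
    feedback: list[str] = field(default_factory=list)        # layer 4: lessons from past outputs
    max_snippets: int = 3  # crude guard against flooding the model with retrieved data

    def build(self, user_request: str) -> str:
        sections = [
            ("SYSTEM", self.system_prompt),
            ("KNOWN CONTEXT", "\n".join(f"{k}: {v}" for k, v in self.implicit.items())),
            ("RETRIEVED DATA", "\n".join(self.external[: self.max_snippets])),
            ("LESSONS FROM EARLIER OUTPUTS", "\n".join(self.feedback)),
            ("REQUEST", user_request),
        ]
        # Empty layers are skipped so the prompt stays short and focused.
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)

builder = ContextBuilder(
    system_prompt="You are an email assistant. Be brief and polite.",
    external=["Calendar: 'Q3 review' tomorrow 10:00 with John Lee"],
    implicit={"user": "Dana", "relationship_to_John": "direct manager"},
)
ctx = builder.build("Send John an email confirming tomorrow's meeting.")
print(ctx)
```

In this sketch the feedback layer is just a list that some evaluator would append to after each run, for example `builder.feedback.append("Last email was too long; keep it under 80 words.")`. Automating that evaluation is the part the book attributes to tools such as a prompt optimizer.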

Reflection: Two Agents are really better than one

This is the Pattern with the most practical value to me in the whole book. The core of Reflection is very simple: after the Agent finishes its work, it reviews the result itself and, if it finds a problem, fixes it itself. The subtlety is in the implementation. The book clearly states that the Producer and the Critic must be two different Agents with different system prompts. The same persona reviewing its own work will have blind spots. If you let the same LLM write code and then review the code it wrote, it will most likely say “it’s pretty good.”

The book gives a complete code example. The Producer’s prompt is “You are a Python developer, write a function to calculate the factorial, and handle boundary conditions and exceptions.” The Critic’s prompt is “You are a picky senior engineer, review the code line by line, check for bugs, style, missing boundary conditions, and areas for improvement. If it’s perfect, output CODE_IS_PERFECT, otherwise list all the problems.” Then there is a for loop: Producer writes code → Critic reviews → Producer revises according to the comments → Critic reviews again → until the Critic says CODE_IS_PERFECT or the maximum number of iterations is reached. It’s that simple.

But the book reminds us of a cost issue that is easily overlooked: each reflection cycle is a new LLM call, so the more iterations, the more expensive it gets. And as the dialogue history grows, the context window fills up with previous versions and critiques, and the space actually available for reasoning shrinks. The best practice for Reflection is therefore to set a reasonable maximum number of iterations (the book uses 3) and to stop as soon as the Critic is satisfied, rather than chasing perfection.

The uses go far beyond writing code. Writing articles, making plans, summarizing documents, and solving logic problems can all be done with the Producer-Critic model. The book lists seven application scenarios, and the core logic is the same: produce first, then review, and then revise.
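The loop described above is short enough to sketch. This is not the book's listing: the prompts are paraphrased from the description, `reflect` and the `LLM` callable type are hypothetical, and `fake_llm` is a scripted stand-in so the example runs without any API key.

```python
from typing import Callable

LLM = Callable[[str, str], str]  # (system_prompt, user_message) -> reply

PRODUCER_PROMPT = ("You are a Python developer. Write the requested function "
                   "and handle boundary conditions and exceptions.")
CRITIC_PROMPT = ("You are a picky senior engineer. Review the code line by line. "
                 "If it is perfect, reply CODE_IS_PERFECT; otherwise list every problem.")

def reflect(task: str, llm: LLM, max_iterations: int = 3) -> tuple[str, int]:
    """Return the final draft and the number of Critic reviews that were run."""
    draft = llm(PRODUCER_PROMPT, task)
    for iteration in range(1, max_iterations + 1):
        critique = llm(CRITIC_PROMPT, draft)
        if "CODE_IS_PERFECT" in critique:
            return draft, iteration
        # Pass only the latest draft and critique, not the whole history,
        # so the context does not grow with every cycle.
        draft = llm(PRODUCER_PROMPT,
                    f"{task}\n\nPrevious draft:\n{draft}\n\nReviewer comments:\n{critique}")
    return draft, max_iterations

def fake_llm(system_prompt: str, message: str) -> str:
    """Scripted stand-in for a real model, for demonstration only."""
    if system_prompt == CRITIC_PROMPT:
        return "CODE_IS_PERFECT" if "ValueError" in message else "Negative input is not handled."
    if "Reviewer comments" in message:
        return ("def factorial(n):\n    if n < 0:\n        raise ValueError('n must be >= 0')\n"
                "    return 1 if n < 2 else n * factorial(n - 1)")
    return "def factorial(n):\n    return 1 if n < 2 else n * factorial(n - 1)"

code, reviews = reflect("Write a factorial function.", fake_llm)
print(reviews)  # 2
print(code)
```

Two design choices are worth noting. The Producer and the Critic differ only in their system prompt, which is the separation the book insists on. And a run that never satisfies the Critic costs at most `1 + 2 * max_iterations` model calls, which makes the iteration cap a direct cost control.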

Multi-Agent: more complex is not better

What I like most about the Multi-Agent Collaboration chapter is the six communication topology diagrams. Many people start with the complex ones, but in fact three are enough for most scenarios:

Single Agent (independent execution): The task can be broken down into independent sub-problems, and each Agent handles its own. Simple and easy to maintain.

Peer-to-Peer: Agents communicate directly with each other, without a central control node. Decentralized, with good fault tolerance: one Agent going down does not bring down the whole system. But the coordination cost is high and things easily become chaotic.

Supervisor (central scheduling): A Supervisor Agent manages a group of Worker Agents. It assigns tasks, collects results, and resolves conflicts. The hierarchy is clear and easy to manage. But the Supervisor is a single point of failure and also a performance bottleneck.

The other three (Supervisor-as-Tool, hierarchical, custom hybrid) are variants and combinations of the first three. The book is very realistic about this: the topology you need depends on the complexity of your task. The more fragmented the task, the higher the communication cost. Up to a point, the Supervisor model is more efficient than the hierarchical model.

My experience is that many people spend 80% of their time on communication protocols when building Multi-Agent systems and forget to ask a more basic question: does this task really need multiple Agents? The book states very clearly that a Level 2 single Agent plus Reflection is often enough. Level 3 is for scenarios that a single Agent really cannot handle.
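To show how little machinery the Supervisor topology needs, here is a minimal sketch. Everything in it is illustrative: `supervisor`, the worker names, and the hard-coded `plan` are assumptions. In a real system the plan would come from the Supervisor's own model call, and each worker would be an Agent rather than a lambda.

```python
from typing import Callable

Worker = Callable[[str], str]

def supervisor(goal: str,
               workers: dict[str, Worker],
               plan: list[tuple[str, str]]) -> dict[str, str]:
    """Dispatch each (worker_name, subtask) pair and collect results by subtask."""
    results: dict[str, str] = {}
    for worker_name, subtask in plan:
        worker = workers.get(worker_name)
        if worker is None:
            # Conflict handling reduced to its simplest form: record the gap.
            results[subtask] = f"unassigned: no worker named {worker_name!r}"
            continue
        results[subtask] = worker(f"{goal} -> {subtask}")
    return results

# Stand-ins for specialized Agents.
workers: dict[str, Worker] = {
    "research": lambda task: f"research notes for [{task}]",
    "marketing": lambda task: f"campaign draft for [{task}]",
}
plan = [
    ("research", "size the market"),
    ("marketing", "draft launch copy"),
    ("design", "sketch packaging"),
]

results = supervisor("Launch product X", workers, plan)
for subtask, outcome in results.items():
    print(f"{subtask}: {outcome}")
```

All traffic passes through one function, which is what makes this topology easy to reason about and also what makes it a single point of failure and a bottleneck.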

Memory’s three-layer model: I had a vague feel for it but never named it

The Memory chapter resonated with me most, because when I wrote those two articles about Obsidian + Claude, I kept coming back to one question: how should the Agent’s memory be layered? The book gives the answer:

Session: The context window of the current conversation. This is the shortest-lived memory, and it disappears when the conversation ends. Long-context models just enlarge this window; it is still temporary, and processing the entire window on every inference is expensive and slow.

State: Temporary data for the current task. For example, “what task is being done,” “which step has been completed,” and “what intermediate data has been generated.” The book presents it as longer-lived than Session, but it is cleared when the task is over. The book uses Google ADK’s State mechanism to give a complete example.

Memory: Long-term memory across sessions and tasks. User preferences, learned experience, and important past decisions are stored in a database or vector database and fetched by semantic retrieval.

The book emphasizes a very important point: Memory is not just about storing things, but about designing a complete strategy for “what to store, when to store, and how to retrieve.” Store too much and you get noise; store too little and there is not enough to work with. In the Clawdbot article I wrote before, I mentioned “state files” and “workspace documents,” which are essentially a hand-crafted State layer and Memory layer, and the book gives this a framework.
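The three layers differ mainly in lifetime, which a small data structure can make explicit. This sketch is not Google ADK's API: `AgentMemory` and its methods are hypothetical, and the keyword-overlap `recall` is a deliberately crude substitute for the semantic (embedding-based) retrieval the book describes. It follows the book's ordering, so ending a session also clears task state.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    session: list[str] = field(default_factory=list)        # turns of the current conversation
    state: dict[str, object] = field(default_factory=dict)  # scratchpad for the current task
    long_term: list[str] = field(default_factory=list)      # survives sessions and tasks

    def end_task(self) -> None:
        self.state.clear()

    def end_session(self) -> None:
        self.session.clear()
        self.state.clear()

    def remember(self, fact: str) -> None:
        """'What to store': here, only explicit facts, with duplicates dropped."""
        if fact not in self.long_term:
            self.long_term.append(fact)

    def recall(self, query: str, k: int = 2) -> list[str]:
        """'How to retrieve': rank by shared words (stand-in for vector search)."""
        words = set(query.lower().split())
        scored = [(len(words & set(fact.lower().split())), fact) for fact in self.long_term]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [fact for score, fact in ranked if score > 0][:k]

memory = AgentMemory()
memory.remember("User prefers concise answers")
memory.remember("Project uses Python 3.12")
memory.session.append("user: please refactor the parser")
memory.state["step"] = 2

memory.end_session()  # session and state are gone, long-term memory stays
print(memory.recall("which python version"))  # ['Project uses Python 3.12']
```

Returning nothing when no fact overlaps with the query is a small instance of the “storing too much causes noise” point: irrelevant memories are better left out of the context than padded in.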

Five hypotheses, and the fifth is the wildest

At the end, the book offers five hypotheses about the future of Agents. The first four are still within the scope of reasonable extrapolation: general-purpose Agents that go from writing code to managing projects, deep personalization that proactively discovers your needs, embodied intelligence that leaves the screen and enters the physical world, and Agents that become independent economic actors.

The fifth one shocked me: the metamorphic Multi-Agent system. You only declare the goal, such as “create an e-commerce business that sells boutique coffee.” The system decides on its own to first create a “market research Agent” and a “brand Agent.” After a round of data comes in, it judges that the brand Agent is no longer needed and splits it into three new ones: a “Logo Design Agent,” a “Website Building Agent,” and a “Supply Chain Agent.” If the Website Building Agent becomes a bottleneck, the system automatically replicates it into three parallel Agents that work on different pages at the same time. Throughout the process, the system keeps optimizing each Agent’s prompt and keeps reorganizing the team structure.

The book calls this a “goal-driven, metamorphic multi-agent system.” It is not executing a plan you wrote, but generating its own plan, adjusting that plan, and reorganizing its own execution team. This reminds me of Karpathy’s AutoResearch: write a program.md, define goals, metrics, and boundaries, and press “start.” Humans are outside the loop. But this book pushes it further: even how the Agent team is formed and reorganized is left to the system. Humans only declare “what they want.”

Three things you can do right away

After reading this book, I have three actions that I can implement immediately:

First, add a Critic to your current Agent. Whether you are using Claude Code, CrewAI, or your own framework, add a step to the end of your existing workflow: let another Agent (with a different system prompt) review the output of the previous step. Add code review to code generation, fact-checking to article writing, and feasibility review to planning. It costs one more LLM call, but in my experience the quality gain is usually well worth it. The Producer-Critic model in the book is plug-and-play.

Second, start doing Context Engineering, not just Prompt Engineering. Look back at the instruction file you wrote for the Agent. If it is all rules about “how you should do it” and lacks the context of “what environment you are facing now,” add that context. Tell the Agent what project it is in, what decisions have been made before, and what the user’s preferences are. The Context Engineering chapter in the book and your AGENTS.md are two expressions of the same thing.

Third, don’t rush into Multi-Agent. Get your single Agent to Level 2 first: give it tools, Reflection, and Memory. The book repeatedly emphasizes that a Level 2 single Agent plus Producer-Critic and Context Engineering covers the vast majority of practical scenarios. Level 3 is for tasks that are truly cross-domain, multi-stage, and in need of parallel division of labor. The problem for most people is not that they have too few Agents, but that they have not tuned a single Agent well.

This book is 453 pages long and will be published by Springer in 2025. The code examples cover LangChain/LangGraph, Google ADK, CrewAI, and the OpenAI API. The foreword was written by the Google Cloud AI VP, and there is also an endorsement from the Goldman Sachs CIO, which is surprisingly good. But my reason for recommending it is not that it is “comprehensive.” It is that you will realize one thing after reading it: the pitfalls you have run into with Agents over the past six months have all been organized into patterns. You don’t need to invent Reflection again, you don’t need to guess how to layer Memory again, and you don’t need to find out by trial and error which communication topology your Multi-Agent system should use. Someone has drawn the map for you, and all that’s left is to walk. Are you building with AI Agents? What level is your current Agent at?

[Yanhua]

RichSilo Exclusive Analysis:

AI Agent Design Patterns: Implications for the Crypto Market

The publication of Antonio Gullí’s “Agentic Design Patterns” represents a significant milestone in understanding AI agent architecture, with profound implications for the crypto market. Gullí, an Engineering Director at Google, outlines a four-level hierarchy of agent capabilities, from bare LLMs (Level 0) through tool users and strategic thinkers to multi-agent collaboration systems (Level 3). Crypto projects leveraging AI must reassess their technical approaches against it to remain competitive.

Market Impact: Beyond Hype to Structured Implementation

The crypto market has been saturated with AI-driven projects claiming revolutionary capabilities, yet most remain at Level 0 (bare LLMs without tools or memory). This book provides a framework for distinguishing genuine agent innovation from superficial AI integration. For crypto investors, this creates a valuable lens for evaluating projects:

Level 1+ Agents: Projects implementing tool-using capabilities (Level 1) or strategic thinking with context engineering (Level 2) will demonstrate clear advantages in DeFi, trading algorithms, and governance. These agents can autonomously recognize when to search blockchain data, use oracles, or adjust strategies based on market conditions.
Multi-Agent Systems (Level 3): Truly sophisticated crypto projects will implement the recommended communication topologies – particularly the supervisor model for coordinating specialized agents in areas like cross-chain arbitrage, risk management, and compliance monitoring.

Token Price Implications: Differentiating Value Creation

The book’s emphasis on structured design patterns directly impacts token valuation mechanics:

Infrastructure Tokens: Projects providing compute resources for advanced AI agents (particularly those supporting reflection cycles and multi-agent coordination) will see increased demand. The book’s warning about the high cost of multiple LLM calls translates to sustained token burn mechanisms for AI infrastructure providers.
Protocol Tokens: DeFi protocols implementing Level 2 agents with context engineering and reflection capabilities can demonstrate superior risk-adjusted returns, potentially creating upward pressure on their governance tokens as institutional capital flows toward demonstrably superior strategies.
Application Tokens: NFT platforms and gaming projects utilizing producer-critic models for content generation will likely experience enhanced user engagement and secondary market activity, directly benefiting application-layer tokens.

Key Investment Opportunities

Based on the design patterns outlined in the book, three specific categories of crypto projects present compelling investment cases:

Context Engineering-First Oracles: Projects that implement the four-layer context model (system prompt, external data, implicit data, feedback loop) will provide higher-quality on-chain data for AI agents. These oracles can charge premium fees for their superior information filtering capabilities.
Agent-Centric DAOs: Implementing the supervisor communication topology for coordinating specialized agents in governance, treasury management, and development creates more efficient and valuable decentralized organizations. Projects like this could see significant treasury value appreciation.
Reflection-Enabled DeFi Protocols: Platforms incorporating the producer-critic model for automated market making and risk assessment can demonstrate superior performance metrics, attracting liquidity and yield-seeking investors.

Significant Risks to Monitor

The book’s insights also highlight critical risks for crypto AI projects:

Over-Engineering Premium: Many projects will attempt complex multi-agent systems when Level 2 single agents with reflection capabilities would suffice. This creates a “complexity tax” that may not justify the additional costs, particularly in gas-constrained environments.
Centralization Risk: Advanced AI agents, particularly supervisor models, create single points of failure that contradict blockchain’s decentralized ethos. Projects must carefully balance sophistication with decentralization.
Cost Inflation: The book explicitly warns that every reflection cycle is an additional LLM call, so cost rises with each iteration while the growing history crowds the context window. Crypto projects must implement innovative tokenomics models to support these computational demands without making their services prohibitively expensive.
Security Blind Spots: The producer-critic model requires distinct agents to avoid confirmation bias. Crypto projects failing to implement this properly may create exploitable vulnerabilities in their AI systems.

Strategic Recommendations for Investors

Prioritize Level 2 Implementations: Focus on projects that have mastered context engineering and reflection before considering multi-agent systems. The book suggests these simpler implementations cover most practical scenarios.
Evaluate Context Architecture: Scrutinize how projects structure their four-layer context model. The most successful will have explicit strategies for each layer, particularly implicit data handling (user history, environmental status) which is often overlooked.
Assess Agent Communication Topologies: For multi-agent projects, verify they’re using appropriate communication patterns. The supervisor model is generally recommended over more complex hierarchies for most crypto use cases.
Monitor Tokenomics for AI Operations: Ensure projects have sustainable token models that account for the high computational costs of advanced AI operations, particularly reflection cycles.

The “Agentic Design Patterns” framework provides crypto investors with an unprecedented tool for evaluating AI-integrated projects beyond superficial hype. As the market matures, the ability to distinguish between Level 0 chatbots and genuine Level 2+ agents will become increasingly critical for identifying sustainable value creation in the intersection of AI and blockchain.

