Wujia / Haibo Wu

Blog post

If the Agent Loop Is Tiny, What Are the Other Tens of Thousands of Lines For?

A production-engineering answer from Claude Code and Memex: the core ReAct loop is small, but real agents need tools, permissions, UI, state recovery, observability, and product structure.

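Drawing on the leaked Claude Code source and our work on Memex, an agent that runs on the phone, this post explains why a production-grade agent takes far more engineering than its core loop.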

English edition adapted for native English readers; the second half follows the original Zhihu answer.

After Claude Code's source leaked, I ran cloc and saw roughly 400,000 lines. I then asked Claude Code to look at the code with me. The conclusion was not that the core agent idea is complicated. It was that a production agent product contains a huge amount of surrounding engineering.

From the source, it is a full product-grade application: 40+ tool implementations with validation, permission models, progress tracking, and error handling; 140+ Ink/React terminal UI components; IDE bridge layers for VS Code and JetBrains; OAuth, JWT, macOS Keychain integration, organization policy controls; multi-agent coordination; plugin, skill, and memory systems; slash commands; session recovery; remote mode; voice input; Vim mode; themes; feature flags; telemetry; analytics; and a lot of TypeScript type definitions.
Online discussion about Claude Code source size
A screenshot from the original answer showing people reacting to the code size.

This probably explains why the Claude Code team can say the product was built with Claude while Anthropic is still hiring engineers aggressively. AI can help write code, but real product engineering does not disappear.

The core logic of an agent can indeed be a simple ReAct-style loop, but making that loop work in the real world takes a large amount of engineering. Since many answers focused on coding agents, I'll use Memex as the example. Memex is the personal life-logging agent we recently open-sourced. Its agent logic runs locally on the phone, and the codebase had already grown to nearly 80,000 lines.
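
To make "simple" concrete, here is roughly what that core loop looks like. This is a minimal Python sketch; Memex itself is written in Dart. The `Step` shape, `run_agent`, and the scripted `fake_model` are hypothetical stand-ins for a real model API and real tools. Almost everything else in this post is about what this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    """One model decision. Hypothetical shape; real APIs return richer tool-call objects."""
    thought: str
    tool: Optional[str]          # None means the model is giving its final answer
    args: dict
    answer: Optional[str] = None

def run_agent(model: Callable[[list], Step], tools: dict,
              task: str, max_steps: int = 10) -> str:
    """Reason -> act -> observe until the model answers or the step budget runs out."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        step = model(history)                          # Reason: ask the model what to do next
        history.append(f"Thought: {step.thought}")
        if step.tool is None:                          # The model says it is done
            return step.answer or ""
        observation = tools[step.tool](**step.args)    # Act: run the chosen tool
        history.append(f"Observation: {observation}")  # Observe: feed the result back
    return "stopped: step budget exhausted"

# Usage with a scripted stand-in for the model (hypothetical):
def fake_model(history):
    if not any(h.startswith("Observation:") for h in history):
        return Step("I need the note count", "count_notes", {})
    return Step("I have what I need", None, {},
                answer=history[-1].removeprefix("Observation: "))

print(run_agent(fake_model, {"count_notes": lambda: "3"}, "How many notes?"))  # 3
```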

Memex codebase size screenshot
Memex had already grown into a substantial local-agent codebase.

1. Connecting multiple LLM providers

OpenAI's API format is a de facto standard, but not every model provider or cloud vendor is fully compatible. To support multiple providers, you need wrappers for streaming output, token accounting, error handling, and edge cases.

Because the mobile Dart/Flutter ecosystem lacked a mature foundation for this, we open-sourced dart_agent_core to unify these interfaces. That layer alone reached around 7,000 lines.
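
Here is a minimal sketch of the normalization such a layer performs, using simplified response dictionaries. OpenAI-compatible chat APIs report `prompt_tokens`/`completion_tokens` and stream text in `choices[0].delta.content`. Anthropic's Messages API reports `input_tokens`/`output_tokens` and streams text in `content_block_delta` events. The function names and provider keys are hypothetical, and this is not dart_agent_core's actual API.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Usage:
    input_tokens: int
    output_tokens: int

def normalize_usage(provider: str, raw: dict) -> Usage:
    """Map provider-specific usage fields onto one internal shape."""
    u = raw.get("usage") or {}
    if provider == "openai_compatible":
        return Usage(u.get("prompt_tokens", 0), u.get("completion_tokens", 0))
    if provider == "anthropic":
        return Usage(u.get("input_tokens", 0), u.get("output_tokens", 0))
    raise ValueError(f"unsupported provider: {provider}")

def collect_stream(provider: str, chunks: Iterable[dict]) -> str:
    """Concatenate streamed text deltas; the chunk shapes here are simplified."""
    parts = []
    for c in chunks:
        if provider == "openai_compatible":
            choices = c.get("choices") or []          # a final usage-only chunk may have none
            delta = choices[0].get("delta", {}) if choices else {}
            parts.append(delta.get("content") or "")  # role-only deltas carry no content
        elif provider == "anthropic":
            if c.get("type") == "content_block_delta":
                parts.append(c.get("delta", {}).get("text", ""))
        else:
            raise ValueError(f"unsupported provider: {provider}")
    return "".join(parts)
```

Every other part of the agent then works with `Usage` and plain strings, so provider quirks stay inside this layer.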

dart_agent_core code size screenshot
The LLM provider layer alone contains nontrivial engineering work.

2. Rebuilding the toolbox on mobile

Coding agents such as Devin can rely on a Linux shell and mature command-line tools. A phone does not have that environment. If an agent needs Grep, Find, or Edit-like capabilities, you have to implement those tool behaviors locally.
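
As an illustration, here is how Grep- and Edit-like tools over a folder of local markdown files might look. This is a hypothetical Python sketch, not Memex's code. The hit cap and the exactly-one-match rule for edits are design assumptions: they keep tool output small and edits unambiguous.

```python
import re
from pathlib import Path

def grep_tool(root: str, pattern: str, max_hits: int = 50) -> list:
    """Grep-like tool: regex search over *.md files under root.

    Returns 'relative/path:line: text' strings, capped at max_hits so one
    broad search cannot flood the model's context.
    """
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*.md")):
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if regex.search(line):
                    rel = path.relative_to(root).as_posix()
                    hits.append(f"{rel}:{lineno}: {line.rstrip()}")
                    if len(hits) >= max_hits:
                        return hits
    return hits

def edit_tool(path: str, old: str, new: str) -> str:
    """Edit-like tool: replace exactly one occurrence of `old`, otherwise refuse.

    Requiring a unique match stops the model from silently editing the wrong spot.
    """
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    count = text.count(old)
    if count != 1:
        return f"error: expected exactly 1 match for the old text, found {count}"
    p.write_text(text.replace(old, new, 1), encoding="utf-8")
    return "ok"
```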

Memex also handles heterogeneous personal input: images, voice, and text. The agent needs to turn those fragments into structured markdown for management. We deliberately did not provide web search or generic HTTP tools. Memex is meant to focus on personal records and internal logic, not become a general-purpose OpenDevin-like system.
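
Here is a sketch of what the structured output could look like. It assumes captions and transcripts have already been produced upstream, for example by a multimodal model or a speech-to-text step. The `Fragment` type, the front-matter fields, and the voice-note blockquote convention are all hypothetical, not Memex's actual record format.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Fragment:
    kind: str               # "text", "image", or "voice"
    content: str            # text body, image path, or audio path
    derived_text: str = ""  # caption or transcript, assumed to come from a model upstream

def to_markdown(created: datetime, fragments: list, tags: list) -> str:
    """Render one record as markdown with a small front-matter header."""
    lines = ["---", f"created: {created.isoformat()}",
             f"tags: [{', '.join(tags)}]", "---", ""]
    for frag in fragments:
        if frag.kind == "text":
            lines.append(frag.content)
        elif frag.kind == "image":
            lines.append(f"![{frag.derived_text}]({frag.content})")  # caption as alt text
        elif frag.kind == "voice":
            lines.append(f"> Voice note ({frag.content}): {frag.derived_text}")
        else:
            raise ValueError(f"unknown fragment kind: {frag.kind}")
        lines.append("")
    return "\n".join(lines)
```

Keeping every record as plain markdown lets the same text tools used for search and editing work across all input types.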

3. Knowledge management without simple RAG

Coding agents benefit from code's strong structure. Personal records are much messier, and simple retrieval-augmented generation is often not enough. We wrote substantial code and prompts to make the agent behave like a file manager: classifying, indexing, and organizing local records.

We also set strict read-write granularity limits for the knowledge base, so the agent cannot operate on huge files or directories at once and blow up the context. In some organization tasks, we add adversarial review logic: if the agent proposes a knowledge structure that violates rules, code checks reject it and ask the agent to redo the work.
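
The reject-and-redo loop can be sketched as follows. The rules are hypothetical examples, not Memex's actual rule set: every file must be placed exactly once, and each folder may hold at most `max_files_per_dir` files, as a stand-in for a granularity limit. `propose` stands in for an agent call that receives the previous round's violations as feedback.

```python
from typing import Callable

Structure = dict  # folder name -> list of files placed in it

def validate_structure(structure: Structure, all_files: set,
                       max_files_per_dir: int = 20) -> list:
    """Return rule violations; an empty list means the proposal passes."""
    errors = []
    seen = set()
    for folder, files in structure.items():
        if not folder.strip():
            errors.append("empty folder name")
        if len(files) > max_files_per_dir:
            errors.append(f"{folder}: {len(files)} files exceeds limit {max_files_per_dir}")
        for f in files:
            if f in seen:
                errors.append(f"{f} placed in more than one folder")
            seen.add(f)
    if all_files - seen:
        errors.append(f"unplaced files: {sorted(all_files - seen)}")
    if seen - all_files:
        errors.append(f"unknown files: {sorted(seen - all_files)}")
    return errors

def organize_with_review(propose: Callable[[list], Structure],
                         all_files: set, max_rounds: int = 3) -> Structure:
    """Ask the agent for a structure; send violations back until it passes."""
    feedback = []
    for _ in range(max_rounds):
        proposal = propose(feedback)
        feedback = validate_structure(proposal, all_files)
        if not feedback:
            return proposal
    raise RuntimeError(f"no valid structure after {max_rounds} rounds: {feedback}")
```

Because the checks are plain code rather than another prompt, a bad proposal is rejected every time the same rule is broken, no matter how the model phrases it.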

4. Permissions and safety boundaries

An agent that can call tools and read memory is risky by default. Every tool call needs its own permission checks. The system also needs memory isolation: which data is visible to the agent, and how the agent verifies that it is not operating outside the intended boundary.
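
Here is a minimal sketch of a per-tool check plus memory isolation, assuming memory is addressed by path-like keys. `ToolPolicy`, `check_permission`, and the tool names are hypothetical. The sketch makes two design choices. Unregistered tools are denied by default. Paths are normalized before comparison, so `records/../secrets` cannot get past a `records` prefix.

```python
import posixpath
from dataclasses import dataclass
from typing import Callable

class PermissionDenied(Exception):
    pass

@dataclass(frozen=True)
class ToolPolicy:
    allowed_roots: tuple              # memory areas (path prefixes) this tool may touch
    needs_confirmation: bool = False  # ask the user first, e.g. for deletes

def within(path: str, root: str) -> bool:
    """True if path equals root or lies under it, after collapsing '..' segments."""
    p = posixpath.normpath("/" + path.lstrip("/"))
    r = posixpath.normpath("/" + root.lstrip("/"))
    return p == r or p.startswith(r.rstrip("/") + "/")

def check_permission(tool: str, path: str, policies: dict,
                     confirm: Callable[[str], bool]) -> None:
    """Run before every tool call; raises PermissionDenied on any violation."""
    policy = policies.get(tool)
    if policy is None:  # deny by default: unknown tools get no access
        raise PermissionDenied(f"{tool}: no policy registered")
    if not any(within(path, root) for root in policy.allowed_roots):
        raise PermissionDenied(f"{tool}: {path} is outside its visible memory")
    if policy.needs_confirmation and not confirm(f"Allow {tool} on {path}?"):
        raise PermissionDenied(f"{tool}: user declined")
```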

5. Engineering generative UI

We did not want the agent to output only text. Memex experiments with generative UI: a library of common app templates, plus a fallback path where the agent generates HTML and renders it in a WebView. The routing and rendering logic itself takes real engineering.
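
The routing idea fits in a few lines: prefer a native template and fall back to generated HTML. The template ids, their required fields, and the last-resort escaped-text view are assumptions for illustration. In a real app the generated page would also need to be loaded in a suitably restricted WebView.

```python
import html
from dataclasses import dataclass
from typing import Callable

@dataclass
class UISpec:
    kind: str     # "template" (native widget) or "webview" (generated HTML)
    payload: str  # template id, or an HTML document

# Hypothetical template library: template id -> fields the template needs.
TEMPLATES = {
    "checklist": {"title", "items"},
    "timeline": {"title", "events"},
}

def route_ui(intent: str, data: dict,
             generate_html: Callable[[str, dict], str]) -> UISpec:
    required = TEMPLATES.get(intent)
    if required is not None and required.issubset(data):
        return UISpec("template", intent)       # preferred: native template
    try:
        page = generate_html(intent, data)      # fallback: model-generated HTML
    except Exception:
        # Last resort: show the raw data as escaped text rather than a blank screen
        page = "<pre>" + html.escape(repr(data)) + "</pre>"
    return UISpec("webview", page)
```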

6. Process scheduling and state recovery on mobile

Mobile operating systems manage memory aggressively. The app process can be killed at any time. If an agent task is halfway done, the system has to save progress and recover instantly when the user reopens the app.
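
One common way to make progress survive a killed process is to checkpoint after every step and write the file atomically. This Python sketch assumes the agent state is JSON-serializable. The `AgentCheckpoint` fields are hypothetical and are not Memex's actual schema.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Optional

@dataclass
class AgentCheckpoint:
    task_id: str
    step: int                                    # index of the next step to run
    history: list = field(default_factory=list)  # messages so far
    pending_tool: Optional[str] = None           # tool call issued but not yet finished

def save_checkpoint(cp: AgentCheckpoint, path: str) -> None:
    """Write to a temp file, fsync, then rename over the old checkpoint.

    If the OS kills the process mid-write, the old checkpoint is still intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(asdict(cp), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX file systems
    except BaseException:
        os.unlink(tmp)
        raise

def load_checkpoint(path: str) -> Optional[AgentCheckpoint]:
    """Return the saved checkpoint, or None if this task never saved one."""
    try:
        with open(path, encoding="utf-8") as f:
            return AgentCheckpoint(**json.load(f))
    except FileNotFoundError:
        return None
```

On resume, a non-empty `pending_tool` means the process died mid-call. Whether to re-run that tool depends on whether it has side effects.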

7. Observability for model calls

Agent execution produces many model calls. To make the system transparent and controllable, you need observability: how many calls a task used, how many tokens it spent, how much it cost, and where failures occurred.
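
A per-task usage ledger is the simplest form of this. The sketch below assumes token counts arrive already normalized by the provider layer described in section 1. The model names and prices in `PRICES` are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens (input, output); real prices vary by model and change often.
PRICES = {"model-a": (0.50, 1.50), "model-b": (3.00, 15.00)}

@dataclass
class TaskStats:
    calls: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    failures: int = 0

class UsageTracker:
    """Aggregate every model call by task so cost and failure rates are visible."""

    def __init__(self) -> None:
        self.tasks = defaultdict(TaskStats)

    def record(self, task_id: str, model: str, input_tokens: int,
               output_tokens: int, ok: bool = True) -> None:
        stats = self.tasks[task_id]
        price_in, price_out = PRICES[model]
        stats.calls += 1
        stats.input_tokens += input_tokens
        stats.output_tokens += output_tokens
        stats.cost_usd += (input_tokens * price_in + output_tokens * price_out) / 1_000_000
        if not ok:
            stats.failures += 1
```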

You also need error tracking. If the agent enters a loop or produces invalid output, logs and automatic interception prevent wasted API spend and make debugging possible.
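
Automatic interception can start with two cheap heuristics:

  • Stop when the same tool call with identical arguments repeats within a short window.
  • Stop after several malformed outputs in a row.

The thresholds, the assumption that the model should emit JSON, and the `LoopGuard` class are all hypothetical. A real system would tune these per task.

```python
import json
import logging
from collections import deque
from typing import Optional

log = logging.getLogger("agent.guard")

class LoopGuard:
    """Intercept runaway runs before they burn API budget."""

    def __init__(self, window: int = 6, max_repeats: int = 3,
                 max_calls: int = 50, max_format_errors: int = 3) -> None:
        self.recent = deque(maxlen=window)  # last few tool calls as canonical strings
        self.max_repeats = max_repeats
        self.max_calls = max_calls
        self.max_format_errors = max_format_errors
        self.calls = 0
        self.format_errors = 0

    def _stop(self, reason: str) -> str:
        log.warning("intercepted agent run: %s", reason)
        return reason

    def check_tool_call(self, tool: str, args: dict) -> Optional[str]:
        """Return a stop reason, or None if the call may proceed."""
        self.calls += 1
        if self.calls > self.max_calls:
            return self._stop(f"call budget of {self.max_calls} exceeded")
        key = tool + ":" + json.dumps(args, sort_keys=True)
        self.recent.append(key)
        if self.recent.count(key) >= self.max_repeats:
            return self._stop(f"{tool} repeated {self.max_repeats} times with identical arguments")
        return None

    def check_output(self, raw: str) -> Optional[str]:
        """Count consecutive unparseable model outputs (JSON expected here)."""
        try:
            json.loads(raw)
        except json.JSONDecodeError:
            self.format_errors += 1
            if self.format_errors >= self.max_format_errors:
                return self._stop(f"{self.format_errors} malformed outputs in a row")
            return None
        self.format_errors = 0
        return None
```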

The core loop is the ideal. The tens of thousands of lines are reality: the code that lets the ideal survive real devices, real data, real users, and real failure modes.

Memex is still only an early prototype. If you are interested in this direction, the project is open at github.com/memex-lab/memex.

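Today Claude Code's source leaked. I ran cloc on it and got about 400,000 lines, then asked Claude Code to go through the code. Its take:

Judging from the source, it is really a complete product-grade application, packed with a lot of things:
  • 40+ tool implementations, each with its own input validation, permission model, progress tracking, and error handling; these alone are heavy
  • 140+ Ink/React terminal UI components forming a full interactive terminal interface, not just console.log
  • An IDE bridge layer (two-way communication with VS Code and JetBrains), which is itself a small RPC system
  • A complete OAuth 2.0 flow, JWT, macOS Keychain integration, and organization-level policy restrictions
  • A multi-agent coordination system, team collaboration, and Agent Swarms
  • A plugin system, a skills system, and a persistent memory system
  • 50+ slash commands, many with their own full UI
  • Session management, resume, sharing, and remote mode
  • Voice input, Vim mode, and a theme system
  • GrowthBook feature flags, OpenTelemetry telemetry, and an analytics system
There are also engineering reasons for the bloat: TypeScript type definitions take up many lines by themselves; Tool.ts alone is 29K lines, much of it type declarations; the permission system has separate logic branches for every tool; and there is a lot of conditionally compiled code (internal and public builds are distinguished via feature() and USER_TYPE).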

Plenty of people showed up online to check in:

Online discussion about Claude Code source size

This is probably why the Claude Code team says Claude Code was written with Claude, while at the same time Anthropic is hiring engineers aggressively.

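This is a great question. As other answerers have said, the core logic of an agent really is a simple loop based on the ReAct idea, yet it still takes a large amount of engineering to make the system work. Since most answers here focus on coding agents, I'll share our experience building a personal life-recording agent that we recently open-sourced. Its agent logic runs entirely locally on the phone, and the codebase is already approaching 80,000 lines.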

Memex codebase size screenshot

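1. Unified access to multiple LLM providers

Although OpenAI's format is the industry standard, not every model vendor or cloud provider is fully compatible with it.

  • Adaptation work: to connect different LLM providers, we had to write a lot of wrapper code to handle streaming output, token accounting, and error capture.
  • Our own foundation: because the mobile side (the Dart/Flutter ecosystem) lacked a mature framework, we open-sourced dart_agent_core to unify these interfaces: https://github.com/memex-lab/dart_agent_core. That layer alone is about 7,000 lines of code.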
dart_agent_core code size screenshot

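2. Rebuilding the "toolbox" on mobile

Mainstream coding agents such as Devin rely on a Linux Bash environment and can use all kinds of mature command-line tools. Phones have no such environment.

  • Tool definitions: we had to hand-implement Grep-, Find-, and Edit-like tool logic locally on the phone.
  • Heterogeneous data: user input includes images, voice, and text, and the agent has to turn these fragments into structured Markdown for management.
  • We deliberately provide no outward-facing tools such as web search or HTTP requests. This keeps Memex an agent focused on personal records and their internal logic, rather than an all-purpose general tool like OpenDevin. Digging deep into the value of local data is far more complex than calling a search API.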

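3. Knowledge management without RAG

Coding agents are powerful because code itself is highly structured. Personal records are very messy, and simple RAG (retrieval-augmented generation) often performs poorly.

  • Organizing: we wrote a lot of code and prompts so the agent plays the role of a "file manager", classifying, indexing, and organizing the user's local files.
  • Granularity limits: we set strict read/write granularity limits on knowledge-base files and directories. These stop the agent from operating on too much data at once, which would overflow the context or derail its reasoning.
  • Adversarial review: when the agent organizes knowledge, a set of review logic pushes back against it. For example, when the agent proposes a knowledge-base structure, predefined code checks whether the structure is sound and sends it back to be redone if it breaks the rules.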

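4. Permission checks and safety boundaries

An agent that can call tools and read memory carries significant security risk.

  • Verification: every tool call needs its own independent permission check.
  • Memory isolation: we must clearly define which data the agent can see and how it checks itself, so it does not take actions it should not.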

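5. Engineering generative UI (Generative UI)

To improve the experience, we did not output text alone; we built generative UI into the product.

  • Template system: we pre-built a library of UI templates modeled on mainstream apps.
  • Dynamic rendering: the agent first tries to match a preset template; if none fits, it generates HTML on the fly and renders it in a WebView. This dispatch logic takes a fair amount of code.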

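6. Process scheduling and state recovery on mobile

Phone operating systems manage memory very strictly, and the system can terminate the app process at any time.

  • State persistence: so that an agent task killed halfway through is not lost, we needed a lot of logic to save run progress and restore state instantly when the user reopens the app.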

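7. Observability for model calls

An agent run generates a large number of model calls. To keep the system transparent and controllable, we built a full set of observability infrastructure:

  • Cost and call-frequency monitoring: we record in detail how many API calls the agent made to finish a task, how many tokens it consumed, and what that cost.
  • Error tracking: when the agent falls into an infinite loop or produces malformed output, complete logs and automatic interception are needed to avoid wasted API spend.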

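There are many, many more features still to build. The Memex we open-sourced today is only a prototype, and we warmly invite anyone interested in this direction to join the project: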

GitHub: https://github.com/memex-lab/memex