S03 todo write

March 31, 2026


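When handling multi-step tasks, large language models often appear to "lose their memory": they go around in circles, skip steps, and sometimes drift off task entirely. The root cause is "context pollution": over a long conversation, the logs produced by tool calls steadily dilute the original system prompt.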

Harness layer: planning.

Changes

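Feature	s02 tool_use	s03 todo_writer
Tools	4 basic tools (`bash`, `read`, `write`, `edit`).	Adds a 5th tool, `todo`, for updating state in multi-step tasks.
Task Planning	The Agent runs in a stateless, one-step-at-a-time mode.	Introduces `TodoManager`, which validates statuses (`pending`, `in_progress`, `completed`) and forces the Agent to lay out its plan.
System Prompt	Only constrains the OS boundary and the completion conditions.	Explicitly instructs the Agent to plan with the `todo` tool before starting a multi-step task and to mark statuses as it goes.
Control flow	Blocks every repeated call whose arguments are exactly identical.	Whitelist: `todo` and `read_file`, which are side-effect-free or need to be called frequently, are exempt from the block so the Agent's normal retry/update logic is not interrupted by mistake.
Agent Reminder	Relies entirely on the model to call tools on its own initiative.	Adds a supervisor mechanism: if the Agent goes 3 consecutive rounds without updating the todo list, the system injects `<reminder>Update your todos.</reminder>` into the context to keep it from drifting off plan.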
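To make the "5th tool" row concrete, here is a minimal sketch of what a definition for the `todo` tool might look like. It assumes the `name` / `description` / `input_schema` layout used by Anthropic-style tool definitions. The field names (`id`, `text`, `status`) and the limit of 15 items are taken from the `TodoManager` code in this post; the description text and the schema as a whole are assumptions, not the repository's actual definition.

```python
# Hypothetical definition of the `todo` tool (a sketch, not the repo's schema).
TODO_TOOL = {
    "name": "todo",
    "description": (
        "Replace the whole todo list. Use it to plan a multi-step task "
        "and to update the status of each item."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "todos": {
                "type": "array",
                "maxItems": 15,  # mirrors the "Max 15 todos" rule
                "items": {
                    "type": "object",
                    "properties": {
                        "id": {"type": "string"},
                        "text": {"type": "string"},
                        "status": {
                            "type": "string",
                            "enum": ["pending", "in_progress", "completed"],
                        },
                    },
                    "required": ["text", "status"],
                },
            }
        },
        "required": ["todos"],
    },
}
```

A JSON schema cannot express the "at most one `in_progress` item" rule, so that check still has to live in code.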

Figure: todo list 1

Figure: todo list 2

Core class `TodoManager`: a state machine with hard constraints

class TodoManager:
    def __init__(self):
        self.todos = []

    def update(self, todos: list) -> str:
        if len(todos) > 15:
            raise ValueError("Max 15 todos allowed")

        validated = []
        in_progress_num = 0
        for i, todo in enumerate(todos):  # validate each todo item
            text = str(todo.get("text", "")).strip()
            status = str(todo.get("status", "")).strip().lower()
            todo_id = str(todo.get("id", f"todo_{i}")).strip()
            if not text:
                raise ValueError(f"Todo {todo_id}: text required")
            if status not in ["pending", "in_progress", "completed"]:
                raise ValueError(f"Todo {todo_id}: invalid status '{status}'")
            if status == "in_progress":
                in_progress_num += 1
            validated.append({
                "id": todo_id,
                "text": text,
                "status": status,
            })
        if in_progress_num > 1:
            raise ValueError("Max 1 in_progress todo allowed")
        self.todos = validated
        return self.render_todo()
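`update` raises `ValueError` whenever a rule is broken, so something has to turn that exception into text the model can read and react to. The sketch below shows one possible way to do that. `check_todos` is a condensed restatement of the same three rules, and `run_todo_tool` is a hypothetical handler; neither name comes from the original code, and the "Error: ..." / "OK: ..." reply format is an assumption.

```python
def check_todos(todos: list) -> list:
    """Condensed restatement of the rules enforced by TodoManager.update."""
    if len(todos) > 15:
        raise ValueError("Max 15 todos allowed")
    validated = []
    for i, todo in enumerate(todos):
        text = str(todo.get("text", "")).strip()
        status = str(todo.get("status", "")).strip().lower()
        todo_id = str(todo.get("id", f"todo_{i}")).strip()
        if not text:
            raise ValueError(f"Todo {todo_id}: text required")
        if status not in ("pending", "in_progress", "completed"):
            raise ValueError(f"Todo {todo_id}: invalid status '{status}'")
        validated.append({"id": todo_id, "text": text, "status": status})
    # Only one item may be worked on at a time.
    if sum(t["status"] == "in_progress" for t in validated) > 1:
        raise ValueError("Max 1 in_progress todo allowed")
    return validated


def run_todo_tool(todos: list) -> str:
    """Hypothetical handler: report validation failures as plain text."""
    try:
        validated = check_todos(todos)
    except ValueError as exc:
        return f"Error: {exc}"
    return f"OK: {len(validated)} todos stored"


# Two items marked in_progress at once violate the single-focus rule.
bad_plan = [
    {"id": "1", "text": "Write tests", "status": "in_progress"},
    {"id": "2", "text": "Fix bug", "status": "in_progress"},
]
print(run_todo_tool(bad_plan))  # Error: Max 1 in_progress todo allowed
```

Returning the error as a tool result, rather than crashing the loop, gives the model a chance to resubmit a corrected list on the next round.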

Feedback mechanism: `render_todo`

    def render_todo(self) -> str:
        if not self.todos:
            return "No todos"
        colors = {
            "pending": "\033[33m",
            "in_progress": "\033[36m",
            "completed": "\033[32m",
            "reset": "\033[0m",
        }
        lines = []
        for todo in self.todos:
            status = todo["status"]
            marker = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}[status]
            color = colors[status]
            lines.append(f"{color}{marker} - {todo['id']}: {todo['text']}{colors['reset']}\n")

        # ...

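By returning structured text such as [>] - todo_1: Write tests and Done: 2/5, the tool in effect gives the model a sense of position and a progress bar, which reinforces its memory of the long-running task.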
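The snippet above elides how the rendered lines and the `Done: 2/5` summary are assembled. The following is a minimal sketch of one way to produce that output, not the original implementation: `render_todos` is a hypothetical standalone function, the ANSI colors are left out for readability, and the exact placement of the summary line is an assumption.

```python
def render_todos(todos: list) -> str:
    """Render a checklist plus a 'Done: x/y' progress line (sketch)."""
    if not todos:
        return "No todos"
    markers = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}
    lines = [f"{markers[t['status']]} - {t['id']}: {t['text']}" for t in todos]
    # Count completed items to build the progress summary.
    done = sum(1 for t in todos if t["status"] == "completed")
    lines.append(f"Done: {done}/{len(todos)}")
    return "\n".join(lines)


sample = [
    {"id": "todo_0", "text": "Read code", "status": "completed"},
    {"id": "todo_1", "text": "Write tests", "status": "in_progress"},
    {"id": "todo_2", "text": "Refactor", "status": "pending"},
]
print(render_todos(sample))
# [x] - todo_0: Read code
# [>] - todo_1: Write tests
# [ ] - todo_2: Refactor
# Done: 1/3
```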

The supervisor in the Agent Loop

s03 adds a "supervisor mechanism" to agent_loop:

# Count how many rounds have passed since the todo tool was last called
rounds_since_todo = 0 if used_todo else rounds_since_todo + 1

if rounds_since_todo >= 3: 
    results.insert(0, {"type": "text", "text": "<reminder>Update your todos.</reminder>"})

This reminds the model to update its task statuses.
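To see when the reminder fires, the counter logic can be wrapped in a small function and driven with a simulated sequence of rounds. This is a sketch under assumptions: `apply_reminder` is a hypothetical helper, and the `tool_result` placeholder stands in for whatever the real loop collects in `results`.

```python
REMINDER = {"type": "text", "text": "<reminder>Update your todos.</reminder>"}


def apply_reminder(results: list, used_todo: bool, rounds_since_todo: int) -> int:
    """Update the counter and prepend the reminder once it reaches 3."""
    rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
    if rounds_since_todo >= 3:
        results.insert(0, dict(REMINDER))
    return rounds_since_todo


# Simulate six rounds; True means the model called `todo` in that round.
counter = 0
history = []
for used_todo in [True, False, False, False, False, True]:
    results = [{"type": "tool_result", "content": "..."}]  # placeholder result
    counter = apply_reminder(results, used_todo, counter)
    reminded = results[0]["type"] == "text"
    history.append((counter, reminded))

print(history)
# [(0, False), (1, False), (2, False), (3, True), (4, True), (0, False)]
```

Because the counter is only reset by a `todo` call, the reminder is repeated every round after the third until the model actually updates its list.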

Tool whitelist

In s02, repeated tool calls were strictly blocked to prevent infinite loops. In s03, the todo tool needs to be let through.

# todo and read_file are low-risk repeated calls; skip the abort check
if block.name not in {"todo", "read_file"} and tool_call_sig == last_tool_call:
    # ... trigger the infinite-loop block ...

The model sometimes needs to read the todo list repeatedly to confirm its next step, so the infinite-loop block gets a whitelist exception.
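The whitelist check can be sketched as a small, self-contained function. How the original code builds `tool_call_sig` is not shown, so the signature used here (tool name plus the JSON-serialized input with sorted keys) is an assumption, as are the names `SKIP_DUP_CHECK` and `is_blocked_repeat`.

```python
import json

# Tools that may be called repeatedly with identical arguments.
SKIP_DUP_CHECK = {"todo", "read_file"}


def is_blocked_repeat(name: str, tool_input: dict, last_sig):
    """Return (blocked, signature) for the current tool call."""
    # Assumed signature: tool name + canonical JSON of the arguments.
    sig = (name, json.dumps(tool_input, sort_keys=True))
    blocked = name not in SKIP_DUP_CHECK and sig == last_sig
    return blocked, sig


last = None
outcomes = []
calls = [
    ("bash", {"command": "ls"}),
    ("bash", {"command": "ls"}),        # identical repeat: blocked
    ("read_file", {"path": "a.py"}),
    ("read_file", {"path": "a.py"}),    # identical repeat: allowed
]
for name, args in calls:
    blocked, last = is_blocked_repeat(name, args, last)
    outcomes.append(blocked)

print(outcomes)  # [False, True, False, False]
```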

Full code

>>> agents-demo/agents at main · knight02-bit/agents-demo