workspace_autoresearch
Claude Code-driven iterative kernel optimization workspace inside
akg_agents/. Plan → edit → eval → keep/discard loop against a
measurable metric. Verifier / DSL adapters / patches reuse
akg_agents.op.verifier and akg_agents.op.utils; phase machine +
hooks + slash command are workspace-local.
Quick Start
# Drop sources into workspace/<op_name>_ref.py and workspace/<op_name>_kernel.py,
# then start a task. --dsl required; pick --devices N XOR --worker-url.
/autoresearch --ref workspace/<op_name>_ref.py --kernel workspace/<op_name>_kernel.py \
--op-name <op_name> --dsl triton_cuda --devices 0
# Resume later
/autoresearch --resume
# Monitor in a separate terminal
python .autoresearch/scripts/dashboard.py <task_dir> --watch
Full operational details in .claude/commands/autoresearch.md.
Remote Worker
For eval on remote hardware (e.g. Ascend NPU), pass
--worker-url 127.0.0.1:9111 to /autoresearch on init, or set in
task.yaml:
worker:
urls:
- 127.0.0.1:9111
Skills Library
Skills live at akg_agents/python/akg_agents/op/resources/skills/, organized
by DSL/backend. The workspace points at them via AKG_AGENTS_AR_SKILLS_ROOT (set in
.claude/settings.json). During PLAN, follow the [AR Phase: …] hint — it
embeds the resolved Glob pattern. Read SKILL.md files whose frontmatter
matches your direction; cite SKILL ids in plan rationales.
Invariants (hook-driven flow)
-
.ar_state/plan.mdis the source of truth. Onlycreate_plan.pyandsettle.pywrite it (both viaworkflow.PlanStore). Never hand-edit. TodoWrite is a UI mirror, not a substitute. -
Plan IDs are globally monotonic.
p1, p2, ...fromprogress.json.next_pid. Never reuse, never skip. -
Every
pNeither settles (KEEP / DISCARD / FAIL inhistory.jsonl) or is silently dropped at a REPLAN/DIAGNOSE boundary — pid counter still advances, no synthetic DISCARD row written. -
Phase transitions are owned by
workflow.PhaseController. Never write.ar_state/.phasemanually. The hook (hooks/post_bash.py) triggers the controller after activation and aftercreate_plan.pyvalidates; the engine scripts (workflow.run_baseline_initinsideengine/baseline.py,_post_settleinsideengine/pipeline.py) trigger it after baseline / round settlement. Either way every write goes throughPhaseController.on_*. Listen to the[AR Phase: ...]messages on stderr; don't pokephase_machinedirectly (it's a library, not a CLI —hooks/guard_bash.pyrejects direct invocation). -
Editable files are scoped by
task.yaml.editable_files. Editing anything else is rejected byhooks/guard_edit.py. -
After a session break, resume with
/autoresearch --resume. Do not patch state files to recover. -
create_plan.pyrejects mean the plan has a real problem (diversity, repeated failure keywords, short rationale). Read stderr and rewrite — don't retry the same XML payload. -
TodoWrite sync is mandatory. When a hook emits
additionalContextwith a TodoWrite payload, call TodoWrite with it verbatim next turn. -
AR scripts run as direct top-level Bash invocations only. To invoke a blessed CLI the command must be a single foreground call:
python .autoresearch/scripts/engine/<name>.py <task_dir> [args...](pipeline, baseline, create_plan, eval_wrapper, quick_check, settle, parse_args). The top-level lifecycle scripts use the flat path:python .autoresearch/scripts/<name>.py(scaffold, resume, dashboard). Env-var prefixes, Python flags, and FD redirection (> log 2>&1) are fine. Wrappers (nohup,bash -lc,sh -c, subshells,$(...)), chains (&&,||,;,|), and backgrounding (&) are unsupported and rejected byhooks/guard_bash.py. Run multiple AR scripts as separate Bash tool calls.Reading AR scripts (e.g.
cat .autoresearch/scripts/engine/pipeline.py,git diff -- .autoresearch/scripts/engine/settle.py) is allowed because the classifier sees those heads as read-only and the args don't execute. The Read tool is still preferred — it's the idiomatic way to inspect file contents in Claude Code. -
DIAGNOSE phase ends with a new plan. Two paths to that end:
- Preferred (subagent route). Call
Task(subagent_type='ar-diagnosis'); the subagent's prompt asks it to Write a structured artifact at<task_dir>/.ar_state/diagnose_v<plan_version>.mdcontaining three sections (Root cause/Fix directions/What to avoid), useful citations of recent FAIL rounds byR<n>, and the marker line[AR DIAGNOSE COMPLETE marker_v<plan_version>]. The host gates on file presence, marker, and section names; then writeplan_items.xmland runcreate_plan.py. - Fallback (manual planning). After 5 failed Task attempts on the
same
plan_version, the artifact gate is relaxed: writeplan_items.xmlyourself usinghistory.jsonl+plan.md, then runcreate_plan.py. Further Task calls are blocked at this point.
While the artifact is invalid AND attempts < cap, Bash is locked to
read-only / lifecycle ops (no AR scripts beyond create_plan.py,
which is itself gated on artifact validity).
Provenance note: hook payloads do NOT distinguish main agent from subagent, so the host validates the artifact's CONTENT only — not who wrote it. The subagent path is preferred because the prompt and read-only-by-default tool isolation produce a more reliable diagnosis, not because the host can prove the subagent wrote the file.
- Stop is only legal at phase FINISH.
hooks/stop_save.pyblocks early Stop in every other phase; the block message embedsget_guidance(task_dir)so the agent sees the next action.max_rounds+ auto-DIAGNOSE-on-3-fails are the budget. If stuck, go through DIAGNOSE (#10), not Stop.
Dependencies
- Python >= 3.10
pip install pyyaml fastapi uvicornakg_agentsimportable (pip install -e akg_agents/orPYTHONPATH)- Claude Code CLI