README.md

workspace_autoresearch

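A Claude Code-driven workspace for iterative operator optimization, integrated under akg_agents/. Given a (reference.py, kernel.py) pair, Claude automatically runs a plan → edit → eval → KEEP/DISCARD loop to tune kernel performance, enters DIAGNOSE automatically after consecutive failures, and wraps up with a report once the budget is exhausted. The whole phase machine is strictly enforced by hooks: Claude cannot skip steps, cannot edit plan.md, and cannot write the phase by hand.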
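The loop described above can be pictured with a small, self-contained sketch. It is not the workspace's phase machine (that lives in the phase_machine/ code and is enforced by hooks). The names `LoopState` and `run_loop`, the `diagnose_after` threshold of 3, and the rule that a "failure" is any round that does not improve a lower-is-better latency are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    best: float                      # best latency seen so far (lower is better, by assumption)
    rounds: int = 0                  # rounds consumed from the budget
    consecutive_failures: int = 0    # non-improving rounds since the last KEEP or DIAGNOSE
    log: list = field(default_factory=list)


def run_loop(baseline, candidates, max_rounds, diagnose_after=3):
    """Each candidate latency stands in for one plan -> edit -> eval round."""
    state = LoopState(best=baseline)
    for latency in candidates:
        if state.rounds >= max_rounds:       # budget exhausted: stop iterating
            break
        state.rounds += 1
        if latency < state.best:             # improvement: keep the edit
            state.best = latency
            state.consecutive_failures = 0
            state.log.append("KEEP")
        else:                                # no improvement: roll the edit back
            state.consecutive_failures += 1
            state.log.append("DISCARD")
            if state.consecutive_failures >= diagnose_after:
                state.log.append("DIAGNOSE")  # hypothetical threshold reached
                state.consecutive_failures = 0
    state.log.append("FINISH")               # wrap-up always runs last
    return state


result = run_loop(10.0, [9.0, 9.5, 9.6, 9.7, 8.0], max_rounds=4)
print(result.best, result.log)
```

With a budget of four rounds, the fifth candidate is never evaluated, which mirrors how an exhausted budget ends the run regardless of what remains in the plan.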

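Supported DSLs (triton_ascend is the reference for end-to-end validation status; the other DSLs go through the same code path as the akg main body, but real-hardware NPU/CUDA coverage is whatever this repo's CI reports):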

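| DSL | Backend | skills tree (op/resources/skills/<dsl>/) |
| --- | --- | --- |
| triton_ascend | Ascend | |
| triton_cuda | CUDA | |
| tilelang_cuda | CUDA | |
| tilelang_npuir | Ascend | |
| ascendc | Ascend | |
| cuda_c | CUDA | |
| cpp | CPU | |
| pypto | Ascend | |
| swft | Ascend | |
| torch | CPU | |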

Relationship to akg_agents

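This directory is the Claude Code surface layer. The core implementation underneath (verifier, DSL adapters, patches, and so on) comes directly from akg_agents.op.verifier / akg_agents.op.utils and is reused through Python imports. The skills knowledge base is shared at akg_agents/python/akg_agents/op/resources/skills/ (located through the AKG_AGENTS_AR_SKILLS_ROOT environment variable in .claude/settings.json).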

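What is unique to this directory: phase machine + workflow + hooks + slash command, that is, the interactive optimization orchestration specific to Claude Code. The LangGraph workflow on the akg mainline does not reuse this layer.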
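As a rough illustration of how a script might locate the shared skills tree, the sketch below reads the AKG_AGENTS_AR_SKILLS_ROOT variable and appends the DSL name. The helper name `resolve_skills_dir` is hypothetical, and it assumes the variable points at the skills/ directory itself with one subdirectory per DSL; the workspace's real lookup may differ.

```python
import os
from pathlib import Path


def resolve_skills_dir(dsl, env=None):
    """Return <skills root>/<dsl>, assuming the env var points at the skills/ directory."""
    env = os.environ if env is None else env
    root = env.get("AKG_AGENTS_AR_SKILLS_ROOT")
    if not root:
        # Fail loudly rather than guessing a default location.
        raise RuntimeError("AKG_AGENTS_AR_SKILLS_ROOT is not set")
    return Path(root) / dsl


# Example with an explicit mapping instead of the real environment:
print(resolve_skills_dir("triton_ascend", {"AKG_AGENTS_AR_SKILLS_ROOT": "/opt/skills"}))
```

Passing the environment as a parameter keeps the helper easy to test without touching the process environment.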

Dependencies

  • Python ≥ 3.10
  • A git executable (some slim images do not ship one; the workspace relies on git for per-round commit/rollback)
  • pip install pyyaml fastapi uvicorn
  • akg_agents must be importable: run pip install -e . inside akg_agents/, or make sure akg_agents/python/ is on PYTHONPATH
  • Claude Code CLI (or the VS Code extension)
  • Additional, per DSL: torch_npu + triton + CANN (Ascend); triton + CUDA runtime (CUDA); msprof on PATH (ascendc); nsys on PATH (cuda_c)
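The dependency list above can be checked up front with a short preflight script. This is a minimal sketch, not something shipped with the workspace: the function name `preflight` and its default arguments are assumptions. It uses only standard-library calls (`importlib.util.find_spec`, `shutil.which`).

```python
import importlib.util
import shutil
import sys


def preflight(modules=("yaml", "fastapi", "uvicorn", "akg_agents"), tools=("git",)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python >= 3.10 required")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"executable not on PATH: {tool}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

For a DSL with profiler requirements, the tool list can be extended, for example `preflight(tools=("git", "msprof"))` for ascendc or `preflight(tools=("git", "nsys"))` for cuda_c.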

Quick Start

cd akg_agents/workspace_autoresearch
# put (<op>_ref.py, <op>_kernel.py) into workspace/
claude

Paste the slash command into Claude:

/autoresearch --ref workspace/sinkhorn_ref.py --kernel workspace/sinkhorn_kernel.py \
  --op-name sinkhorn --dsl triton_ascend --devices 5 --max-rounds 200

Scaffold + first-round baseline complete atomically → enter PLAN → iterate automatically until FINISH.

Monitoring:

python .autoresearch/scripts/dashboard.py

Worker

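Remote NPU / CUDA machines are attached through an SSH tunnel, and evals are submitted to run on the remote side. The worker uses akg's akg_cli directly (the worker side only needs akg_agents installed):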

# on the worker machine
akg_cli worker --start --backend ascend --arch <your-ascend-arch> --devices 0 --port 9111

# set up the tunnel locally
ssh -f -N -L 127.0.0.1:9111:127.0.0.1:9111 npu_host
curl http://127.0.0.1:9111/api/v1/status

Set worker.urls: [127.0.0.1:9111] in task.yaml, and the workspace's eval is sent to the remote side automatically through akg_agents.core.worker.RemoteWorker.
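Before starting a long run it can help to confirm that the tunnel answers, the same check the curl command performs. The sketch below is an assumption-laden convenience, not part of the workspace: `status_url` and `worker_reachable` are hypothetical helpers, and since the response body of /api/v1/status is not documented here, only the HTTP status code is inspected.

```python
import urllib.error
import urllib.request


def status_url(worker_url):
    """Turn a worker.urls entry such as '127.0.0.1:9111' into its status endpoint URL."""
    base = worker_url if "://" in worker_url else f"http://{worker_url}"
    return base.rstrip("/") + "/api/v1/status"


def worker_reachable(worker_url, timeout=3.0):
    """Return True if the status endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(status_url(worker_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status all count as unreachable.
        return False


if __name__ == "__main__":
    print(status_url("127.0.0.1:9111"), worker_reachable("127.0.0.1:9111"))
```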

Output

Each task lands in ar_tasks/<op>_<ts>_<uuid>/:

ar_tasks/<op>_<ts>_<uuid>/
├── kernel.py          ← the performance-optimized kernel
├── reference.py       ← the ref copied over by scaffold
├── task.yaml          ← dsl / arch / metric / editable_files
└── .ar_state/
    ├── .phase         ← current phase (FINISH at the end)
    ├── progress.json
    ├── plan.md        ← the agent's optimization history (authoritative state)
    ├── history.jsonl
    └── report.md      ← includes an SVG convergence curve
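A finished task directory can be inspected with a few lines of Python. This is a minimal sketch under stated assumptions: `summarize_task` is a hypothetical helper, .phase is assumed to be a plain-text file holding the phase name, and history.jsonl is assumed to hold one JSON object per line. The record fields are not documented here, so the sketch only counts them.

```python
import json
from pathlib import Path


def summarize_task(task_dir):
    """Read the current phase and count history records under <task_dir>/.ar_state/."""
    state = Path(task_dir) / ".ar_state"
    phase = (state / ".phase").read_text(encoding="utf-8").strip()
    records = []
    history = state / "history.jsonl"
    if history.exists():
        for line in history.read_text(encoding="utf-8").splitlines():
            if line.strip():                      # skip blank lines
                records.append(json.loads(line))  # one JSON object per line (assumed)
    return {"phase": phase, "rounds_logged": len(records)}
```

Calling `summarize_task` on a directory under ar_tasks/ gives a quick way to tell whether a run reached FINISH without opening report.md.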

Internal entry points

| To learn about | Look at |
| --- | --- |
| Phase transition rules / Bash gate | phase_machine/phase_policy.py |
| Hook wiring | .claude/settings.json + hooks/ |
| Plan / history / progress writes | phase_machine/state_store.py |
| Eval bridge (KernelVerifier + WorkerManager wrapped as a sync dict) | utils/akg_eval.py |
| DSL adapter / Profiler / Verify / Worker | akg_agents.op.verifier.* / akg_agents.core.worker.* (akg main body) |
| CodeChecker (async wrapped as sync) | utils/code_checker.py → akg op.utils.code_checker |
| Git operations (GitRepo wrapper) | utils/git_utils.py → akg op.autoresearch.framework.git_repo.GitRepo |
| Invariants (plan as authoritative state / monotonic pid / DIAGNOSE contract, etc.) | AGENTS.md |
| Subagent prompt (used by DIAGNOSE) | .claude/agents/ar-diagnosis.md |
| Batch runs | BATCH.md |