akg-op End-to-End User Guide

Overview

akg-op is the operator optimization Agent in the AKG Agents + OpenCode integration. It generates high-performance operator code and supports two modes:

Mode	Trigger	Typical Scenario
Single Operator	Specify a concrete operator or provide source code	"Generate a relu kernel", "Optimize the layernorm in this file"
Fusion	Request model-level fusion analysis	"Analyze Qwen2 for operator fusion opportunities"

End-to-End Flow

┌─────────────────────────────────────────────────────────────┐
│  Switch to the akg-op Agent in opencode CLI                 │
│  (Tab key or /agents command)                               │
└────────────────────┬────────────────────────────────────────┘
                     ▼
          ┌─────────────────────┐
          │  Phase 0  Env Setup  │  Check akg_agents availability
          │  & Param Confirm     │  🛑 Confirm framework/backend/arch/dsl
          └─────────┬───────────┘
                    ▼
        ┌───────────────────────┐
        │  Fusion analysis       │
        │  needed?               │
        └───┬───────────────┬───┘
            │ Yes           │ No
            ▼               │
   ┌────────────────┐       │
   │  Phase 1       │       │
   │  Fusion        │       │
   │  Analysis      │       │
   │  🛑 User picks │       │
   │  opportunities │       │
   └───────┬────────┘       │
           │                │
           ▼                ▼
  ┌─────────────────────────────┐
  │  Phase 2  Build Task Desc    │  Extract operator logic from code
  │  Auto-validate + 🛑 Confirm  │  → {op_name}.py (KernelBench format)
  └─────────────┬───────────────┘
                ▼
  ┌─────────────────────────────┐
  │  Phase 3  Generate Operator  │  Pick a workflow to run
  │  🛑 User picks workflow      │  (kernelgen / adaptive_search / evolve)
  └─────────────┬───────────────┘
                │
            ┌───┴───┐
            │ Fail?  │──── Yes ──→ Output failure report, task ends
            └───┬───┘
                │ No
                ▼
  ┌─────────────────────────────┐
  │  Phase 4  Confirm Result     │  🛑 Show generated_code.py
  │                              │  Accept / Regenerate
  │  Regenerate → Back to Ph. 3  │
  └─────────────┬───────────────┘
                │ Accept
                ▼
  ┌─────────────────────────────┐
  │  Phase 5  Code Integration   │  Copy generated code to work dir
  │  (if source code provided)   │  Backup + import replacement
  └─────────────┬───────────────┘
                ▼
  ┌─────────────────────────────┐
  │  Phase 6  Output Report      │  report.md
  │  Config, results, changes    │
  └─────────────────────────────┘

Usage Examples

Example 1: Single Operator — Generate Only

User> Generate a relu kernel, input shape (128, 4096), fp32, on A100 with Triton

Phase 0: Check env (cache hit, skip install) → Confirm params: framework=torch, backend=cuda, arch=a100, dsl=triton_cuda
Phase 2: Generate relu.py (KernelBench format) → Validate → User confirms
Phase 3: User picks kernelgen → Run → Success
Phase 4: Show code → User accepts
Phase 5: Copy relu_generated.py to work dir
Phase 6: Output report

Example 2: Single Operator — Optimize Existing Code

User> Optimize the layernorm in /path/to/model.py using Triton

Phase 0: Check env (cache hit) → Confirm params
Phase 2: Extract layernorm from model.py → Generate layernorm.py → Validate → User confirms
Phase 3: User picks adaptive_search → Run (silent mode, ~15 min) → Success
Phase 4: Show code → User accepts
Phase 5: Backup model.py → Add "from layernorm_generated import ModelNew" → Save integrated file
Phase 6: Output report with file change records

Example 3: Fusion Mode (First Use, No Env Cache)

User> Analyze Qwen2 model for operator fusion opportunities and realize

Phase 0: Check env (no cache → detect akg_cli → collect hardware/framework/DSL → write cache)
         → Confirm params: framework=torch, backend=ascend, arch=ascend910b4, dsl=triton_ascend
Phase 1: Analyze Qwen2 forward → Found 3 fusion opportunities → User selects 2
  For each selected opportunity:
    Phase 2: Build fused operator task description → Validate → User confirms
    Phase 3: Generate fused operator
    Phase 4: Show code → Accept
    Phase 5: Backup → Integrate into model code
Phase 6: Output summary report

Working Directory

All artifacts are stored under ~/akg_agents_logs/op_{op_name}_{timestamp}_{random_id}/:

File	Description
`{op_name}.py`	KernelBench format task description (reference implementation + input definitions)
`{op_name}_generated.py`	The final generated operator code accepted by user
`{model}_generated.py`	Integrated source file with `from {op_name}_generated import ModelNew`
`output/{workflow}_{n}/`	Full output of each workflow run (code, summary, logs)
`backup/`	Original copies of replaced files (for rollback)
`report.md`	Final report

Available Workflows

Workflow	Strategy	Typical Duration	Best For
`kernelgen`	Iterative generate → verify → fix (subagent)	1-5 min	Clear requirements, quick results (default)
`adaptive_search`	UCB adaptive search (silent mode)	10-30 min	Higher quality, willing to wait
`evolve`	Island-model evolutionary algorithm (silent mode)	15-60 min	Diversity exploration, multi-device parallel

If unsatisfied with the result, choose a different workflow to regenerate. Previous results are preserved in separate subdirectories.

Architecture

Component	Type	Description
`akg-op`	Agent	Main orchestrator: mode detection, phase sequencing, user interaction
`akg-env-setup`	Skill	Environment check, hardware/framework/DSL detection, dependency install
`op-task-extractor`	Skill	Build KernelBench format task file from code or natural language
`vllm-ascend-operator-fusion`	Skill	Model fusion analysis
`kernelgen`	SubAgent	Iterative code generation + verification workflow
`kernel-generator`	Skill	DSL-aware kernel code generation (used by kernelgen)
`kernel-verifier`	Skill	Multi-framework, multi-backend correctness verification (used by kernelgen)
`search-workflow`	Skill	Background execution of adaptive_search / evolve workflows

Notes

Every phase includes a human confirmation step; nothing runs end-to-end without approval
On generation failure, the error is reported immediately with no automatic retry
Original files are always backed up before replacement; restore from backup/ if needed
adaptive_search and evolve run in silent mode (background) with ~1 min polling interval