yiyanzhi_akane1update docs for br_agents

AKG Kernel Agent

1. Overview

The AKG Kernel Agent is the first production scenario of AKG Agents, focused on multi-backend, multi-DSL high-performance kernel code generation and optimization.

CLI entry: akg_cli op
Domain: Kernel code generation for AI accelerators

2. Supported Backends and DSLs

Platform	Backend	DSL	Example
Huawei Atlas A2 Training Series	Ascend	Triton Ascend	`--backend ascend --dsl triton_ascend`
NVIDIA GPU	CUDA	Triton CUDA	`--backend cuda --dsl triton_cuda`
NVIDIA GPU	CUDA	CUDA C	`--backend cuda --dsl cuda`
NVIDIA GPU	CUDA	TileLang CUDA	`--backend cuda --dsl tilelang_cuda`
CPU	CPU	C++	`--backend cpu --dsl cpp`

3. Built-in Workflows

The Kernel Agent supports multiple workflow strategies:

Workflow	Description
Default	Full pipeline: Designer → Coder ↔ Verifier
CoderOnly	Code generation only (skip design phase)
Evolve	Evolutionary algorithm-based kernel optimization
AdaptiveSearch	UCB-based asynchronous pipeline search
KernelGenOnly	Kernel generation without verification
VerifierOnly	Verification only (for pre-existing code)

4. Core Agents

KernelDesigner

Algorithm sketch design agent. Analyzes the kernel requirement and produces a high-level algorithm design with optimization hints.

Skill-based: dynamically injects relevant domain knowledge
Supports DSL-specific design patterns

KernelGen

Kernel code generation agent. Takes the algorithm design and generates executable kernel code in the target DSL.

Skill-based: uses DSL-specific coding skills
Callable as a tool by other agents

TaskConstructor

Standardized task builder agent. Extracts and standardizes kernel definitions from user input (e.g., PyTorch code) into a structured task format.

5. Workflow: Default Pipeline

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Designer   │────▶│    Coder     │────▶│   Verifier   │
│              │     │  (KernelGen) │     │              │
│ Algorithm    │     │ Code Gen     │     │ Correctness  │
│ Design       │     │              │◀────│ Check        │
└──────────────┘     └──────────────┘     └──────────────┘
                           │                     │
                           │                     ▼
                           │              ┌──────────────┐
                           │              │   Profiler   │
                           │              │ Performance  │
                           │              │ Analysis     │
                           └──────────────┴──────────────┘

Designer analyzes the kernel requirement and produces an algorithm sketch
Coder generates kernel code based on the design
Verifier checks correctness by comparing against the framework implementation
If verification fails, the Coder receives error feedback and retries
Profiler measures performance (execution time, speedup ratio)

6. DSL Configuration

Each DSL backend has a YAML configuration file controlling workflow behavior:

# Key configuration fields
agent_model_config:
  kernel_designer: "complex"
  kernel_gen: "standard"

log_dir: "logs/"
default_workflow: "default"

profile_settings:
  run_times: 50
  warmup_times: 5

verify_timeout: 300

7. Quick Start

# Ascend 910B2
akg_cli op --framework torch --backend ascend --arch ascend910b2 \
  --dsl triton_ascend --devices 0,1,2,3,4,5,6,7

# CUDA A100
akg_cli op --framework torch --backend cuda --arch a100 \
  --dsl triton_cuda --devices 0,1,2,3,4,5,6,7

After launch, you can:

Describe what you need: "Generate a relu kernel"
Paste KernelBench-style PyTorch code for conversion

For more CLI details, see AKG CLI.