AKG Kernel Agent
1. Overview
The AKG Kernel Agent is the first production scenario of AKG Agents, focused on multi-backend, multi-DSL high-performance kernel code generation and optimization.
- CLI entry: akg_cli op
- Domain: Kernel code generation for AI accelerators
2. Supported Backends and DSLs
| Platform | Backend | DSL | Example |
|---|---|---|---|
| Huawei Atlas A2 Training Series | Ascend | Triton Ascend | --backend ascend --dsl triton_ascend |
| NVIDIA GPU | CUDA | Triton CUDA | --backend cuda --dsl triton_cuda |
| NVIDIA GPU | CUDA | CUDA C | --backend cuda --dsl cuda |
| NVIDIA GPU | CUDA | TileLang CUDA | --backend cuda --dsl tilelang_cuda |
| CPU | CPU | C++ | --backend cpu --dsl cpp |
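The table can be read as a set of valid (backend, DSL) pairs. A minimal sketch of validating a pair before composing the flags from the Example column follows; `SUPPORTED` and `build_flags` are hypothetical names used for illustration and are not part of `akg_cli`.

```python
from __future__ import annotations

# Valid (backend, dsl) pairs, copied from the table above.
SUPPORTED = {
    "ascend": {"triton_ascend"},
    "cuda": {"triton_cuda", "cuda", "tilelang_cuda"},
    "cpu": {"cpp"},
}


def build_flags(backend: str, dsl: str) -> list[str]:
    """Return CLI flags for a supported pair (hypothetical helper, not an akg_cli API)."""
    if dsl not in SUPPORTED.get(backend, set()):
        raise ValueError(f"unsupported combination: backend={backend!r}, dsl={dsl!r}")
    return ["--backend", backend, "--dsl", dsl]


print(" ".join(build_flags("cuda", "triton_cuda")))
```

Rejecting an unsupported pair up front, such as `cpu` with `triton_cuda`, gives a clearer error than letting a mismatched configuration fail later in the pipeline.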
3. Built-in Workflows
The Kernel Agent supports multiple workflow strategies:
| Workflow | Description |
|---|---|
| Default | Full pipeline: Designer → Coder ↔ Verifier |
| CoderOnly | Code generation only (skip design phase) |
| Evolve | Evolutionary algorithm-based kernel optimization |
| AdaptiveSearch | UCB-based asynchronous pipeline search |
| KernelGenOnly | Kernel generation without verification |
| VerifierOnly | Verification only (for pre-existing code) |
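The AdaptiveSearch row names UCB without giving a formula. For orientation, the sketch below shows the classic UCB1 selection rule, which balances a candidate's average reward against how rarely it has been tried. Whether AdaptiveSearch uses this exact formula, exploration constant, or reward definition is an assumption; the function name and the `stats` layout are illustrative only.

```python
import math


def ucb1_select(stats: dict, c: float = math.sqrt(2)) -> str:
    """Pick the candidate with the highest UCB1 score.

    stats maps a candidate name to (pulls, total_reward).
    This is textbook UCB1, not necessarily the rule AdaptiveSearch uses.
    """
    if not stats:
        raise ValueError("no candidates to choose from")
    total_pulls = sum(pulls for pulls, _ in stats.values())
    best_name, best_score = None, -math.inf
    for name, (pulls, total_reward) in stats.items():
        if pulls == 0:
            return name  # an untried candidate is always explored first
        mean_reward = total_reward / pulls
        bonus = c * math.sqrt(math.log(total_pulls) / pulls)
        score = mean_reward + bonus
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

With equal pull counts the candidate with the higher mean reward wins; with unequal counts, a rarely tried candidate receives a larger exploration bonus and may be chosen over one with a similar mean.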
4. Core Agents
KernelDesigner
Algorithm sketch design agent. Analyzes the kernel requirement and produces a high-level algorithm design with optimization hints.
- Skill-based: dynamically injects relevant domain knowledge
- Supports DSL-specific design patterns
KernelGen
Kernel code generation agent. Takes the algorithm design and generates executable kernel code in the target DSL.
- Skill-based: uses DSL-specific coding skills
- Callable as a tool by other agents
TaskConstructor
Standardized task builder agent. Extracts and standardizes kernel definitions from user input (e.g., PyTorch code) into a structured task format.
5. Workflow: Default Pipeline
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Designer   │────▶│    Coder     │────▶│   Verifier   │
│              │     │  (KernelGen) │     │              │
│  Algorithm   │     │   Code Gen   │     │ Correctness  │
│    Design    │     │              │◀────│    Check     │
└──────────────┘     └──────────────┘     └──────────────┘
                           │                     │
                           │                     ▼
                           │              ┌──────────────┐
                           │              │   Profiler   │
                           │              │ Performance  │
                           │              │   Analysis   │
                           └──────────────┴──────────────┘
- Designer analyzes the kernel requirement and produces an algorithm sketch
- Coder generates kernel code based on the design
- Verifier checks correctness by comparing the kernel's output against the framework's reference implementation (for example, PyTorch)
- If verification fails, the Coder receives error feedback and retries
- Profiler measures performance (execution time, speedup ratio)
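The control flow described above can be sketched as a bounded retry loop. This is a minimal illustration with stub callables, not the actual AKG Agents implementation: `run_default_pipeline`, its parameters, and the `max_retries` limit are all assumptions, since the source does not state how many retries are allowed.

```python
from typing import Callable, Optional, Tuple


def run_default_pipeline(
    requirement: str,
    design: Callable[[str], str],
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    profile: Callable[[str], float],
    max_retries: int = 3,  # assumed limit; not specified in the source
) -> Tuple[str, float]:
    """Designer -> Coder <-> Verifier -> Profiler, with a bounded retry loop."""
    sketch = design(requirement)
    feedback = None
    for _ in range(max_retries):
        code = generate(sketch, feedback)  # Coder sees the last error, if any
        ok, feedback = verify(code)
        if ok:
            return code, profile(code)     # profile only verified code
    raise RuntimeError(f"verification failed after {max_retries} attempts: {feedback}")


# Stub agents for demonstration: the coder succeeds once it receives feedback.
code, elapsed = run_default_pipeline(
    "relu",
    design=lambda req: f"elementwise sketch for {req}",
    generate=lambda sketch, fb: "fixed kernel" if fb else "buggy kernel",
    verify=lambda c: (c == "fixed kernel", "" if c == "fixed kernel" else "output mismatch"),
    profile=lambda c: 1.0,
)
print(code)
```

Passing the verifier's error message back into `generate` is what turns the Coder and Verifier into a loop rather than a one-way hand-off.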
6. DSL Configuration
Each DSL backend has a YAML configuration file controlling workflow behavior:
# Key configuration fields
agent_model_config:
  kernel_designer: "complex"
  kernel_gen: "standard"
log_dir: "logs/"
default_workflow: "default"
profile_settings:
  run_times: 50
  warmup_times: 5
verify_timeout: 300
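To make the `profile_settings` fields concrete, here is a rough sketch of how warm-up and timed runs are commonly combined. It assumes `warmup_times` counts untimed calls and `run_times` counts timed calls, and that the speedup ratio is baseline time divided by kernel time; the source does not spell out these definitions, and a real accelerator profiler would also need device synchronization, which this CPU wall-clock version omits. `measure` and `speedup` are hypothetical names.

```python
import time
from statistics import mean


def measure(fn, run_times: int = 50, warmup_times: int = 5) -> float:
    """Return the mean wall-clock time of fn() in seconds over run_times calls."""
    for _ in range(warmup_times):
        fn()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(run_times):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def speedup(baseline_seconds: float, kernel_seconds: float) -> float:
    """Assumed definition: values above 1.0 mean the generated kernel is faster."""
    return baseline_seconds / kernel_seconds
```

Warm-up runs keep one-time costs such as compilation or cache population out of the timed samples.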
7. Quick Start
# Ascend 910B2
akg_cli op --framework torch --backend ascend --arch ascend910b2 \
  --dsl triton_ascend --devices 0,1,2,3,4,5,6,7
# CUDA A100
akg_cli op --framework torch --backend cuda --arch a100 \
  --dsl triton_cuda --devices 0,1,2,3,4,5,6,7
After launch, you can:
- Describe what you need: "Generate a relu kernel"
- Paste KernelBench-style PyTorch code for conversion
For more CLI details, see AKG CLI.