KernelGen Agent Design Document
Overview
KernelGen is a Skill System-based kernel code generation Agent in AKG Agents. It inherits from AgentBase (core_v2) and is responsible for generating high-performance kernel code by dynamically selecting relevant knowledge and strategies through the Skill System based on user input and historical context.
Core Features
- Layered Stage-Based Skill Selection: Three-layer skill selection (L0 always-inject, L1 LLM-selected guides+examples, L2 LLM-selected cases) adapted per generation stage (initial/debug/optimize)
- Backend Coarse Filter: Pre-filters all skills by backend metadata before stage-based selection
- AB Test Support:
exclude_skill_names/force_skill_namesfor precise A/B testing of evolved skills - Multi-DSL Support: Supports various target DSLs (Triton CUDA, Triton Ascend, AscendC, etc.)
- Multi-Framework Adaptation: Supports PyTorch, MindSpore, NumPy and other frontend frameworks
- Agent Registration: Automatically registered to the Agent registry via
@register_agentdecorator - Tool Configuration: Provides
TOOL_NAME,DESCRIPTION,PARAMETERS_SCHEMAfor invocation by KernelAgent as a tool
Architecture
core_v2/agents/base.py # AgentBase base class
↑
op/agents/kernel_gen.py # KernelGen Agent (inherits AgentBase)
↑
op/agents/kernel_agent.py # KernelAgent (ReAct Agent, calls KernelGen as tool)
Tool Configuration
When KernelGen is called as a tool by KernelAgent:
| Attribute | Value |
|---|---|
| TOOL_NAME | call_kernel_gen |
| Use Case | User explicitly says "no verification needed", "just give me the code", "quick draft" |
| Output | Generated kernel code (including class ModelNew and kernel function) |
⚠️ This tool only generates code without verification. For a complete generation + verification flow, use use_kernelgen_only_workflow.
Parameters
| Parameter | Type/Required | Description |
|---|---|---|
| op_name | str (Required) | Operator name |
| task_desc | str (Required) | Task description or algorithm specification |
| dsl | str (Required) | Target DSL: triton_ascend, triton_cuda, etc. |
| framework | str (Required) | Target framework: torch, mindspore, numpy, etc. |
| backend | str (Required) | Target hardware backend: cuda, ascend, etc. |
| arch | str (Optional) | Target hardware architecture: a100, ascend910b4, etc. |
| user_requirements | str (Optional) | Additional user requirements |
| task_id | str (Optional) | Task ID |
| history_compress | list (Optional) | Compressed history record list |
| verifier_error | str (Optional) | Verifier error message |
| conductor_suggestion | str (Optional) | Conductor fix suggestion |
| model_level | str (Optional) | Model level: standard, fast, complex, default standard |
| extra_skills | list (Optional) | Extra skill objects injected after selection, bypassing all filters |
| exclude_skill_names | list[str] (Optional) | Skill names to exclude (AB test A mode) |
| force_skill_names | list[str] (Optional) | Skill names to force-include (AB test B mode) |
Skill System Integration
Layered Stage-Based Skill Selection
KernelGen uses a layered stage-based skill selection mechanism adapted to the current generation stage (initial, debug, optimize):
Pre-filter: Backend coarse filter via OperatorSkillSelector.coarse_filter() removes skills incompatible with the target backend.
Layer 0 (Always Inject): fundamental and reference category skills are always included regardless of stage.
Layer 1 (LLM-Selected Guides + Examples): guide category skills are selected by LLM based on task description and operator characteristics. example skills are matched by the operator_type of the selected guides.
Layer 2 (LLM-Selected Cases): case category skills are only included in debug (fix cases) and optimize (improvement cases) stages. Cases are selected in the same LLM call as guides.
Stage Mapping:
| Stage | Trigger | Included Categories |
|---|---|---|
initial |
First generation | fundamental, reference, guide, example |
debug |
verifier_error present |
fundamental, reference, guide, example, case (fix) |
optimize |
inspirations present |
fundamental, reference, guide, example, case (improvement) |
AB Test Control:
exclude_skill_names: Skills matching these names are removed before selection (A mode — baseline without evolved skills)force_skill_names: Skills matching these names are force-appended after LLM selection (B mode — ensure evolved skills are included)
These can be set as instance attributes or passed as run() parameters (which temporarily override instance attributes).
kernel_gen = KernelGen()
# AB test A mode: exclude evolved skills
kernel_gen.exclude_skill_names = ["triton-ascend-error-fix", "triton-ascend-case-reduce-opt"]
# AB test B mode: force-include evolved skills
kernel_gen.force_skill_names = ["triton-ascend-error-fix"]
# Extra skills: bypass all filters, appended after selection
kernel_gen.extra_skills = [my_custom_skill]
Skills Directory
Skills are stored in the op/resources/skills/ directory. Evolved skills can be symlinked from ~/.akg/evolved_skills/{dsl}/ into the standard skills directory for automatic discovery. See Skill System Documentation for details.
Execution Flow
-
Initialization Stage
- Initialize parent class
AgentBase(configure LLM, load templates, etc.) - Create code parser via
parser_loader - Load Skill System (
SkillLoaderloads from skills directory) - Load Jinja2 Prompt templates (
system_prompt.j2,user_prompt.j2)
- Initialize parent class
-
Skill Selection Stage
- Apply backend coarse filter, then layered stage-based selection (L0 always-inject, L1 LLM guide+example, L2 LLM case)
- Apply
exclude_skill_namesbefore selection andforce_skill_namesafter selection (for AB test) - Append
extra_skills(if any) after all selection, ensuring specified Skills are always included - Return most relevant Skills list
-
Prompt Construction Stage
- Render system prompt using System Prompt template (with DSL, framework, backend info)
- Render user prompt using User Prompt template (with history, Skills content, task description, etc.)
-
Code Generation Stage
- Call LLM via
run_llmto generate code - Return
(generated code, full prompt, reasoning process)
- Call LLM via
Usage Examples
Direct Invocation
from akg_agents.op.agents.kernel_gen import KernelGen
# Initialize
kernel_gen = KernelGen()
# Execute code generation
code, prompt, reasoning = await kernel_gen.run(
op_name="relu",
task_desc="Implement ReLU activation function",
dsl="triton_cuda",
framework="torch",
backend="cuda",
arch="a100"
)
Via KernelAgent (as Tool)
KernelGen is registered as the call_kernel_gen tool and is automatically invoked by KernelAgent during the ReAct loop based on user needs. See Workflow Documentation for details.