Skill Evolution
1. Overview
The Skill Evolution system is a general-purpose experience extraction framework: it automatically distills reusable optimization knowledge from Agent runtime logs and interaction records, generating structured SKILL.md documents. This gives Agents the ability to "practice → summarize → reuse" in a closed learning loop.
The current implementation focuses on the operator layer. It runs as a SubAgent registered as the call_skill_evolution tool, callable by KernelAgent during operator generation/optimization workflows.
Four modes:
- search_log: Extract evolution chain diffs from search logs — automated optimization patterns
- expert_tuning: Extract human tuning experience from conversation history — "user advice → code change → performance delta" causal chains
- error_fix: Extract debugging experience from error fix records — "error type → fix strategy"
- organize: Consolidate evolved skills under the same DSL by optimization theme, merging overlapping skills into fewer, more generalized documents
Goal: Turn automated search logs, human expertise, and debugging experience into structured knowledge for future kernel generation.
Architecture: SkillEvolutionBase (core_v2/agents/) provides shared capabilities (workspace management, logging utilities). SkillEvolutionAgent (op/agents/) inherits from the base class and implements the four operator-specific modes.
2. search_log Mode
2.1 Data Sources
The system reads exactly 3 files from the logs/ directory:
| File | Content | Key Fields |
|---|---|---|
| `verification_results.jsonl` | Verification records | task_id, passed, verify_dir, dsl, backend, arch |
| `{op}/profiling/speed_up_record.txt` | Performance records | task_id, generation_time, speedup |
| `{op}_lineage_graph.md` | Evolution tree table | task_id, parent_id, generation |
For each passed task, the actual implementation code is read from verify_dir/*_impl.py.
2.2 Pipeline
1. collect — Parse 3 files + read impl code → List[TaskRecord]
2. compress — Build evolution tree → monotonic stack per path → strip comments → diff
3. LLM — Best code + evolution diffs → generate SKILL.md body
4. Writer — YAML frontmatter + body → write SKILL.md
2.3 Collector
collect(log_dir, op_name) -> (records, metadata)
- Parses `verification_results.jsonl` for passed tasks and their `verify_dir`
- Parses `speed_up_record.txt` for `gen_time` and `speedup`
- Parses the `lineage_graph.md` table for `parent_id` and `generation`
- Reads `*_impl.py` from each task's `verify_dir` as the code
- Returns a flat list of `TaskRecord` and environment metadata (`dsl`, `backend`, `arch`)
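The sketch below shows the first collector step: it filters `verification_results.jsonl` down to passed tasks and loads each task's `*_impl.py`. It assumes one JSON object per line carrying the `task_id`, `passed`, and `verify_dir` fields listed in the table above. The helper names `passed_verify_dirs` and `read_impl_code` are hypothetical and are not the project's functions.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def passed_verify_dirs(jsonl_text: str) -> Dict[str, str]:
    """Map task_id -> verify_dir for every record with passed == true (hypothetical helper)."""
    dirs: Dict[str, str] = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("passed"):
            dirs[record["task_id"]] = record["verify_dir"]
    return dirs


def read_impl_code(verify_dir: str) -> Optional[str]:
    """Return the first *_impl.py found in verify_dir, or None if there is none."""
    matches = sorted(Path(verify_dir).glob("*_impl.py"))
    return matches[0].read_text() if matches else None
```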
2.4 Compressor
compress(records, metadata) -> CompressedData
Best record: The record with the smallest gen_time (fastest execution). Its full code is included in the prompt.
Monotonic stack evolution chains:
- Reconstruct the evolution tree from `parent_id` relationships
- DFS to collect all root-to-leaf paths
- For each path, maintain a monotonic stack: only keep nodes where `gen_time` strictly decreases (performance strictly improves)
- For adjacent nodes in the filtered stack, strip comments, then generate a unified diff
- Skip pairs whose `gen_time` improvement is below `MIN_GEN_TIME_IMPROVE_PCT` (0.01), since such changes are too small to be meaningful
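The first two compressor steps, rebuilding the tree and enumerating root-to-leaf paths, can be sketched as an iterative DFS over a `task_id -> parent_id` map. Two details are assumptions: a missing or unknown parent marks a root, and the name `root_to_leaf_paths` is illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def root_to_leaf_paths(parents: Dict[str, Optional[str]]) -> List[List[str]]:
    """Rebuild the evolution tree from task_id -> parent_id and list every root-to-leaf path."""
    children: Dict[str, List[str]] = defaultdict(list)
    roots: List[str] = []
    for task_id, parent_id in parents.items():
        # Assumption: a missing or unknown parent marks a root of the tree.
        if parent_id is None or parent_id not in parents:
            roots.append(task_id)
        else:
            children[parent_id].append(task_id)

    paths: List[List[str]] = []
    # Iterative DFS over partial paths; reversed() keeps children in insertion order.
    stack = [[root] for root in reversed(roots)]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            paths.append(path)  # reached a leaf
        for child in reversed(kids):
            stack.append(path + [child])
    return paths
```

Paths that share a prefix (for example A→B→C and A→B→D) both contain the A→B step. That shared step is why a later deduplication set is needed.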
The comment stripping (strip_comments) removes docstrings, pure comment lines, and inline comments before diffing, eliminating noise from comment rewording.
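A short way to get a similar effect is to round-trip code through Python's `ast` module. `ast.unparse` (Python 3.9+) never emits comments, and docstrings can be dropped from the tree before unparsing. This is a swapped-in technique, not necessarily how the project's `strip_comments` works. It also normalizes formatting and only applies to code that parses as Python. `comment_free_diff` is a hypothetical helper.

```python
import ast
import difflib


def strip_comments(source: str) -> str:
    """Drop comments and docstrings by round-tripping through the AST (sketch)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                node.body = body[1:] or [ast.Pass()]  # keep the block syntactically valid
    # The AST never stores comments, so unparsing emits code without them.
    return ast.unparse(tree)


def comment_free_diff(parent_code: str, child_code: str) -> str:
    """Unified diff between two code versions after comment stripping."""
    return "\n".join(difflib.unified_diff(
        strip_comments(parent_code).splitlines(),
        strip_comments(child_code).splitlines(),
        "parent", "child", lineterm=""))
```

Two versions that differ only in comments or docstrings produce an empty diff.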
2.5 LLM Analysis
The Jinja2 template (analyze_search_log.j2) injects:
- Operator info (name, DSL, backend, architecture)
- Best implementation (full code with gen_time and speedup)
- Evolution chain diffs (comment-stripped, monotonic-filtered)
- Performance summary
The LLM generates the SKILL.md body directly in Markdown, with skill_name and description as the first two lines.
Generation goal: Extract transferable, generalized optimization methodology rather than describing operator-specific characteristics. The document structure is: Task characteristics (define the problem class) → Optimization methods (each as an independent section with conditions, approach, and rationale) → Applicability boundaries.
2.6 Writer
Assembles YAML frontmatter (name, description, category, backend, dsl, source) + LLM body → writes to ~/.akg/evolved_skills/{dsl}/evolved-improvement/{skill_name}/SKILL.md.
Naming convention: skill_name follows the {dsl}-case-{op-category}-{optimization-detail} format, e.g. triton-ascend-case-reduction-amin-large, triton-ascend-case-elemwise-broadcast-3d. category is example, source is search_log.
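Here is a sketch of the writer's two jobs: assembling the frontmatter and computing the output path. The nested `metadata` layout is borrowed from the merged-skill example in 5.5 and is an assumption for search_log skills. The helper names are hypothetical.

```python
import json
from pathlib import Path


def render_skill_md(name: str, description: str, dsl: str, backend: str,
                    source: str, body: str, category: str = "example") -> str:
    """Assemble YAML frontmatter plus the LLM-generated Markdown body (sketch)."""
    # A JSON string is also a valid YAML double-quoted scalar, so json.dumps handles quoting.
    frontmatter = "\n".join([
        "---",
        f"name: {name}",
        f"description: {json.dumps(description, ensure_ascii=False)}",
        f"category: {category}",
        "metadata:",
        f"  source: {source}",
        f"  backend: {backend}",
        f"  dsl: {dsl}",
        "---",
    ])
    return frontmatter + "\n\n" + body.strip() + "\n"


def skill_path(dsl: str, skill_name: str,
               root: Path = Path.home() / ".akg" / "evolved_skills") -> Path:
    """Build ~/.akg/evolved_skills/{dsl}/evolved-improvement/{skill_name}/SKILL.md."""
    return root / dsl / "evolved-improvement" / skill_name / "SKILL.md"
```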
2.7 Core Algorithm: Monotonic Stack
Original path: A(17us) → B(8us) → C(9us) → D(8us)
Monotonic stack: A(17us) → B(8us)
Diff pairs: (A→B) — only pair where performance strictly improved
- Comparison uses `gen_time` (lower is better)
- A `seen_pairs` set prevents duplicate diffs when paths share common prefixes
- `MIN_GEN_TIME_IMPROVE_PCT = 0.01` filters out negligible improvements
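The worked example above can be reproduced with a short sketch, under these assumptions:

- `MIN_GEN_TIME_IMPROVE_PCT` is a fraction of the parent's `gen_time` (so 0.01 means 1%).
- `gen_time` is positive.
- `Node`, `monotonic_stack`, and `diff_pairs` are illustrative names, not the module's API.

Pairs are recorded as task-id tuples, so a shared prefix such as A→B is emitted only once.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

MIN_GEN_TIME_IMPROVE_PCT = 0.01  # assumed: fraction of the parent's gen_time (1%)


@dataclass
class Node:
    task_id: str
    gen_time: float  # lower is better


def monotonic_stack(path: List[Node]) -> List[Node]:
    """Keep only nodes whose gen_time is strictly below every node kept before them."""
    stack: List[Node] = []
    for node in path:
        if not stack or node.gen_time < stack[-1].gen_time:
            stack.append(node)
    return stack


def diff_pairs(paths: Iterable[List[Node]],
               seen_pairs: Optional[Set[Tuple[str, str]]] = None) -> List[Tuple[str, str]]:
    """Adjacent (parent, child) pairs from each filtered path, deduplicated across paths."""
    seen = set() if seen_pairs is None else seen_pairs
    pairs: List[Tuple[str, str]] = []
    for path in paths:
        stack = monotonic_stack(path)
        for prev, cur in zip(stack, stack[1:]):
            key = (prev.task_id, cur.task_id)
            improve = (prev.gen_time - cur.gen_time) / prev.gen_time
            if key in seen or improve < MIN_GEN_TIME_IMPROVE_PCT:
                continue  # duplicate from a shared prefix, or a negligible gain
            seen.add(key)
            pairs.append(key)
    return pairs
```

For the path A(17us) → B(8us) → C(9us) → D(8us), `monotonic_stack` keeps only A and B, which matches the diagram.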
3. expert_tuning Mode
Extracts human tuning experience from conversation history to generate SKILL.md.
Use case: The user manually guides optimization during a conversation (e.g., "increase BLOCK_SIZE", "add more warps") and wants to distill these insights into a reusable skill.
3.1 Data Source
The collector reads {conversation_dir}/trace.json to obtain the conversation tree structure, then uses DFS to find all root-to-leaf paths. Each path becomes an independent branch. For each branch, actions are read from the corresponding actions/action_history_fact.json files in path order.
Falls back to node-number sorting (node_001, node_002, ...) when trace.json is not available.
Multi-branch handling: The conversation tree may contain multiple branches (e.g., root→node_001...019 and root→node_020...029). Each branch is output as an independent timeline segment, separated by branch headers. Branches include full shared prefix nodes to ensure complete causal chains. For single-branch scenarios (linear chains), no extra labels are added and behavior is identical to the previous implementation.
root → node_001 → ... → node_019 (Branch 1)
→ node_020 → ... → node_029 (Branch 2)
Output:
## Branch 1 (19 nodes)
### node_001 Turn 1 — ask_user ...
...
## Branch 2 (10 nodes)
### node_020 Turn 1 — ask_user ...
...
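The branch-splitting rule can be sketched as follows, under these assumptions:

- The tree has already been loaded from trace.json into a `children` map keyed by node id, under a synthetic `"root"`. The real trace.json schema is not described here.
- `render(node_id)` stands in for the collector's per-node formatting.
- `branch_sections` is a hypothetical name.

Headers are emitted only when there is more than one branch, and each branch repeats its shared-prefix nodes.

```python
from typing import Callable, Dict, List


def branch_sections(children: Dict[str, List[str]], render: Callable[[str], str],
                    root: str = "root") -> List[str]:
    """Turn a conversation tree into timeline sections, one segment per root-to-leaf branch."""
    branches: List[List[str]] = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            branches.append(path[1:])  # drop the synthetic root
        for child in reversed(kids):
            stack.append(path + [child])

    if len(branches) == 1:  # linear chain: no branch headers, same output as before
        return [render(node) for node in branches[0]]
    sections: List[str] = []
    for i, branch in enumerate(branches, start=1):
        sections.append(f"## Branch {i} ({len(branch)} nodes)")
        # Shared-prefix nodes are repeated so each branch keeps a complete causal chain.
        sections.extend(render(node) for node in branch)
    return sections
```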
3.2 Incremental LLM Compression
The collector formats each action into a section, then builds the timeline incrementally:
accumulated = ""
for section in sections:
    # Before appending, check whether the total would exceed max_chars (default 60,000)
    if len(accumulated) + len(section) > max_chars:
        accumulated = llm_compress(accumulated)  # compress accumulated history
    accumulated += section  # append new section
Before adding each new section, the total length is checked. When it exceeds the threshold, the accumulated portion is compressed by LLM first, then the new section is appended. This handles conversations of any length.
LLM compression retention rules (enforced via prompt):
| Category | Handling |
|---|---|
| User responses (optimization advice) | Preserve in full |
| Generated code from code gen tools | Preserve in full |
| Performance data (gen_time, speedup, etc.) | Preserve in full |
| Tool execution status | Preserve in full |
| Agent messages (explanations, confirmations) | May compress or remove |
| Redundant tool parameters | May compress or remove |
| Repeated error messages | May compress or remove |
3.3 Pipeline
1. collect — DFS trace.json for branch paths → read actions in path order → format as section list
2. build_timeline — Incrementally append sections, LLM-compress accumulated portion when threshold exceeded
3. LLM Analysis — Timeline → self-analyze causal chains → generate SKILL.md body
4. Writer — YAML frontmatter + body → write SKILL.md
3.4 Collector and Timeline Builder
collect(conversation_dir, op_name) -> (sections, metadata)
Responsibility: Read and format the conversation tree structure. It reads the tree from trace.json and runs DFS to find all root-to-leaf paths, falling back to node-number sorting when trace.json is unavailable. It returns a list of sections (including branch headers for multi-branch scenarios) and performs no analysis or compression.
build_timeline(sections, llm_fn, max_chars=60000, work_dir="") -> str
Responsibility: Incrementally append sections and compress as needed. Returns the final timeline text. Optional work_dir outputs intermediate files for debugging.
3.5 LLM Prompt
Template analyze_expert_tuning.j2 injects the timeline and instructs the LLM to:
- Identify which turns contain substantive optimization advice (ignoring "confirm" messages)
- Match code versions to performance data (profile_kernel corresponds to the most recent code generation before it)
- Extract "user advice → code change → performance delta" causal chains
- Generate SKILL.md body
Naming convention: skill_name follows the {dsl}-exp-{op-category}-{tuning-detail} format. source is expert_tuning.
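The prompt asks the LLM to match code versions to performance data itself. The rule it is given, that each `profile_kernel` belongs to the most recent code generation before it, can be stated deterministically. This is for illustration only. The action dict shape and the code-generation tool name in `CODE_GEN_TOOLS` are hypothetical; only `profile_kernel` appears in this document.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tool name; substitute the real code-generation tool(s).
CODE_GEN_TOOLS = {"generate_code"}


def pair_profiles_with_code(actions: List[Dict]) -> List[Tuple[Optional[int], int]]:
    """Return (code_gen_index, profile_index) pairs; each profile maps to the latest prior code gen."""
    pairs: List[Tuple[Optional[int], int]] = []
    last_code_gen: Optional[int] = None
    for i, action in enumerate(actions):
        tool = action.get("tool")
        if tool in CODE_GEN_TOOLS:
            last_code_gen = i
        elif tool == "profile_kernel":
            pairs.append((last_code_gen, i))  # None if nothing was generated yet
    return pairs
```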
4. error_fix Mode
Extracts "fail → success" error fix records from search logs and accumulates them into a debugging SKILL.md.
Use case: During code generation, the code may fail verification several times before it is successfully fixed. These debugging insights are distilled into reusable skills that help future generation stages avoid similar errors.
4.1 Data Source
Shares the same logs/ directory as search_log mode, but focuses on different information:
| File | Content | Key Fields |
|---|---|---|
| `verification_results.jsonl` | Verification records (both failures and successes) | task_id, passed, verify_dir, step |
| `verify_dir/*_impl.py` | Failed/successful code | Code content |
Data extraction logic: For each Task, sort verification records by step and find the first passed=true entry. Take the last passed=false entry before it as the "failed version". Extract failed code, successful code, and error log.
Task verification sequence: step2(fail) → step5(fail) → step8(fail) → step11(pass)
Extraction: failed_code=step8, success_code=step11, error_log=step8's error
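The extraction rule can be sketched as a per-task scan, assuming each record carries `task_id`, `step`, and `passed`. `find_fix_pairs` is a hypothetical name. Tasks that pass on the first attempt, or never pass, yield no pair.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def find_fix_pairs(records: Iterable[Dict]) -> Dict[str, Tuple[int, int]]:
    """Per task_id, return (failed_step, success_step): the last failure before the first pass."""
    by_task: Dict[str, List[Dict]] = {}
    for rec in records:
        by_task.setdefault(rec["task_id"], []).append(rec)  # task_id is only a grouping key

    pairs: Dict[str, Tuple[int, int]] = {}
    for task_id, recs in by_task.items():
        last_fail: Optional[Dict] = None
        for rec in sorted(recs, key=lambda r: r["step"]):
            if rec["passed"]:
                if last_fail is not None:  # first pass preceded by at least one failure
                    pairs[task_id] = (last_fail["step"], rec["step"])
                break  # only the first success matters
            last_fail = rec
    return pairs
```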
Full diff: Unlike search_log mode's 200-line truncation limit, error_fix generates untruncated diffs to ensure the LLM sees the complete code change.
Multi-workflow compatibility: error_fix mode only depends on verification_results.jsonl and verify_dir. task_id is used purely as a grouping key, so it works with adaptive_search (_Gen1_Task3), evolve (1_Island1_Task0), and kernelgen (0) formats alike.
4.2 Pipeline
1. collect — Parse verification_results.jsonl → find fail→success pairs per Task
→ read failed/successful code + error log → full diff (untruncated)
2. LLM Analysis — Fix cases (error_log + diff) → generate error fix experience
3. LLM Dedup — If SKILL.md exists, inject existing + new content; LLM outputs only non-redundant items
4. Writer — First run creates `{dsl}-error-fix/SKILL.md`; later runs append deduplicated incremental content
4.3 Collector
collect(log_dir, op_name) -> (records, metadata)
- Parses `verification_results.jsonl` and groups records by `task_id`
- For each Task, sorts by step and finds the last failure before the first success
- Reads the failed and successful `*_impl.py` from the corresponding `verify_dir`
- Reads the failure step's `error_log` (truncated to the last 1000 chars)
- Generates an untruncated unified diff from the failed to the successful code
- Returns a list of `SuccessfulFixRecord` and environment metadata
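The two length policies in this collector differ: error logs are cut to their tail, while diffs are kept whole. A sketch, with hypothetical helper names, follows. Keeping the tail is a reasonable default because tracebacks usually end with the actual exception. `difflib.unified_diff` here stands in for whatever diff routine the project uses.

```python
import difflib

ERROR_LOG_TAIL_CHARS = 1000  # keep only the end of the log


def tail_error_log(error_log: str, limit: int = ERROR_LOG_TAIL_CHARS) -> str:
    """Keep the last `limit` characters of the error log."""
    return error_log[-limit:]


def full_diff(failed_code: str, success_code: str) -> str:
    """Unified diff from failed to successful code, with no line cap (unlike search_log)."""
    return "\n".join(difflib.unified_diff(
        failed_code.splitlines(), success_code.splitlines(),
        "failed_impl.py", "success_impl.py", lineterm=""))
```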
Data structure:
from dataclasses import dataclass

@dataclass
class SuccessfulFixRecord:
    task_id: str
    op_name: str
    error_log: str      # Truncated error log (last 1000 chars)
    error_step: int     # Failed step number
    failed_code: str    # Failed version code
    success_code: str   # Successful version code
    diff: str           # Unified diff (untruncated)
    dsl: str
    backend: str
    arch: str
4.4 LLM Prompt
Template analyze_error_fix.j2 injects all fix cases (each containing error log and full code diff).
LLM tasks:
- Classify common errors (with short titles)
- For each error, provide only error signature and fix method, with brief code comparisons
- Merge similar errors and focus on transferable, generalized fix strategies
4.5 Dedup and Writer
Dedup (dedup_error_fix.j2): When {dsl}-error-fix/SKILL.md already exists, both the existing body and the newly generated content are injected into the LLM. The LLM determines which items are new and outputs only the non-redundant incremental content. If everything is a duplicate, it outputs the sentinel "无新增内容" ("no new content") and writing is skipped.
Writer (SkillWriter.write_error_fix):
- The skill directory name includes the DSL prefix: `{dsl}-error-fix` (e.g. `triton-cuda-error-fix`)
- Default output path: `~/.akg/evolved_skills/{dsl}/evolved-fix/{dsl}-error-fix/SKILL.md`
- If `--output-dir DIR` is provided: `DIR/{dsl}-error-fix/SKILL.md`
- If the file does not exist, create it with frontmatter (`name: {dsl}-error-fix`, `description: {dsl}常见错误及修复方法...` ("common errors and fixes for {dsl}..."), `category: implementation`, `metadata.source: error_fix`)
- If the file already exists, append the deduplicated incremental content (preserving the existing frontmatter and body)
Run 1: LLM generates → create new SKILL.md
Run 2: LLM generates → compare with existing → append only new items
Run N: Same — continuously accumulate non-redundant debugging experience
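The create-or-append decision can be written as a pure function that returns the new file text, or `None` when writing should be skipped. This is a hypothetical helper, not `SkillWriter.write_error_fix`. It assumes the dedup LLM emits the sentinel on its own, with nothing else. The description string is abbreviated exactly as in this document.

```python
from typing import Optional

NO_NEW_CONTENT = "无新增内容"  # "no new content": sentinel emitted by the dedup prompt


def next_error_fix_content(existing: Optional[str], increment: str, dsl: str) -> Optional[str]:
    """Return the new SKILL.md text, or None when writing should be skipped."""
    increment = increment.strip()
    if not increment or increment == NO_NEW_CONTENT:
        return None
    if existing is None:  # first run: create the file with frontmatter
        frontmatter = "\n".join([
            "---",
            f"name: {dsl}-error-fix",
            # Abbreviated as in this document ("common errors and fixes for {dsl}...").
            f"description: {dsl}常见错误及修复方法...",
            "category: implementation",
            "metadata:",
            "  source: error_fix",
            "---",
        ])
        return f"{frontmatter}\n\n{increment}\n"
    # Later runs: keep the existing frontmatter and body, append only the increment.
    return existing.rstrip("\n") + "\n\n" + increment + "\n"
```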
5. organize Mode
Consolidates multiple evolved skills under the same DSL by optimization theme, reducing redundant documents.
Use case: After multiple search_log and expert_tuning runs, ~/.akg/evolved_skills/{dsl}/evolved-improvement/ accumulates many skills. Skills from different operators often contain overlapping optimization techniques. Merging produces fewer, more generalized documents.
5.1 Design Constraint
Injecting all skill contents into a single LLM call would overflow the context window. The solution is a two-phase strategy:
- Summary clustering: the LLM only sees `name` + `description` (~100 chars each), not the full content
- Per-cluster merging: the LLM is called once per cluster, with only the full content of that cluster's skills
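Summary clustering only needs `name` and `description` from each SKILL.md frontmatter. The minimal parser below avoids a YAML dependency. It is an illustrative sketch that reads only top-level, single-line keys. Indented keys under `metadata:` never match, and multi-line YAML values are not handled.

```python
from typing import Dict


def skill_summary(skill_md: str) -> Dict[str, str]:
    """Pull top-level name/description out of a SKILL.md frontmatter (minimal parser)."""
    lines = skill_md.splitlines()
    summary: Dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return summary  # no frontmatter
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        # Nested keys keep their indentation, so "  name" never equals "name".
        if sep and key in ("name", "description"):
            summary[key] = value.strip().strip('"')
    return summary
```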
5.2 Pipeline
1. scan — Scan `evolved-improvement/` under the DSL directory for all SKILL.md (excluding `.archive/`)
2. classify — Extract name + description → LLM clusters by theme (summaries only, no full content)
3. merge per-cluster — For each cluster with >=2 skills, inject full content for LLM to merge and deduplicate
Large clusters (>5) are auto-split into sub-batches with rolling merge
4. archive + write — Archive original skills to .archive/{timestamp}/, write merged SKILL.md
5.3 Classification Phase
The classify_skills.j2 template injects only skill name and description, asking the LLM to group by optimization theme and provide a reason. Output format:
{
"clusters": [
{"reason": "These skills all address memory access pattern optimization and bandwidth utilization", "skills": ["skill-a", "skill-c"]},
{"reason": "These skills focus on compute block size tuning and register pressure control", "skills": ["skill-b"]}
]
}
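When parsing the classifier output, it helps to tolerate surrounding prose and to drop skill names the LLM invented. The sketch below slices from the first `{` to the last `}` before calling `json.loads`. It then keeps only clusters with at least two known skills for merging. The function names are hypothetical.

```python
import json
from typing import Dict, List, Set


def parse_clusters(llm_output: str, known_skills: Set[str]) -> List[Dict]:
    """Parse the classifier's JSON object and drop skill names that were never scanned."""
    start, end = llm_output.find("{"), llm_output.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in classifier output")
    data = json.loads(llm_output[start:end + 1])
    clusters: List[Dict] = []
    for cluster in data.get("clusters", []):
        skills = [s for s in cluster.get("skills", []) if s in known_skills]
        if skills:
            clusters.append({"reason": cluster.get("reason", ""), "skills": skills})
    return clusters


def clusters_to_merge(clusters: List[Dict]) -> List[Dict]:
    """Only clusters with two or more skills go to the merge phase."""
    return [c for c in clusters if len(c["skills"]) >= 2]
```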
5.4 Merge Phase
For each cluster containing >= 2 skills:
- `merge_cluster.j2` injects the full content of all skills in the cluster along with the DSL prefix, instructing the LLM to deduplicate, generalize (remove operator-specific names), unify the structure, and produce a `skill_name` (format: `{dsl}-merged-{tuning-feature}`) and a `description`
- Large cluster protection: clusters exceeding 5 skills are automatically split into sub-batches. The first 5 are merged, then the result serves as the "existing merged document" for the next batch
- Clusters with only 1 skill are left unchanged
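The rolling merge for large clusters reduces to a fold over fixed-size batches. In this sketch, `merge_fn(base, batch)` stands in for one `merge_cluster.j2` LLM call; it is a hypothetical signature, and `base` is `None` for the first batch. The assumption that every later batch also holds up to 5 new skills is not stated in the text above.

```python
from typing import Callable, List, Optional

MAX_BATCH = 5


def rolling_merge(contents: List[str],
                  merge_fn: Callable[[Optional[str], List[str]], str],
                  max_batch: int = MAX_BATCH) -> str:
    """Merge a cluster in sub-batches; each batch is merged onto the previous result."""
    merged: Optional[str] = None
    for start in range(0, len(contents), max_batch):
        merged = merge_fn(merged, contents[start:start + max_batch])
    return merged if merged is not None else ""
```

A 12-skill cluster therefore costs three LLM calls, with batches of 5, 5, and 2. No single prompt holds more than five full skills plus the running merged document.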
5.5 Merged SKILL.md Format
The merged skill name and description are produced by the merge-phase LLM. Example frontmatter:
---
name: triton-cuda-merged-memory-access-optimization
description: "Merged optimization methodology..."
category: example
metadata:
  source: merged
  backend: cuda
  dsl: triton-cuda
---
- `name` and `description` are generated by the merge-phase LLM
- `source` is `merged`
5.6 Incremental Merging
Newly generated skills continue to be written under ~/.akg/evolved_skills/{dsl}/: error_fix outputs to evolved-fix/, while search_log and expert_tuning output to evolved-improvement/. The next time organize runs:
- Both existing merged skills and newly added individual skills participate in clustering
- If a new skill is clustered with an existing merged skill → incremental merge using the merged skill as base
- If a new skill forms its own cluster → kept as an independent skill
6. Deployment and AB Testing
6.1 Symlink Deployment
Evolved skills are generated under ~/.akg/evolved_skills/{dsl}/ (in evolved-fix/ and evolved-improvement/ subdirectories). To make them discoverable by KernelGen's standard skill loading mechanism, create symlinks into the standard skills directory:
ln -s ~/.akg/evolved_skills/triton-ascend/evolved-fix \
python/akg_agents/op/resources/skills/triton-ascend/evolved-fix
ln -s ~/.akg/evolved_skills/triton-ascend/evolved-improvement \
python/akg_agents/op/resources/skills/triton-ascend/evolved-improvement
Once symlinked, KernelGen's _load_skills_by_dsl() automatically discovers evolved skills alongside built-in skills, with no code changes needed.
6.2 AB Test Mechanism
The AB test compares kernel generation performance with and without evolved skills:
- A mode (baseline): `exclude_skill_names` is set to the target evolved skill names, causing KernelGen to exclude them during selection
- B mode (with evolved): `force_skill_names` is set to the target evolved skill names, ensuring KernelGen force-includes them after LLM selection
Skill name sources (highest to lowest priority):
1. Explicit names (`--skill-names`): directly specify the skill names to exclude/force-include. Use this to test the effectiveness of individual skills.
2. Directory scan (`--evolved-skill-dir`): scan the `evolved-fix/` and `evolved-improvement/` subdirectories of the specified directory.
3. Default scan: scan the standard skills directory for evolved subdirectories.
ab_test_utils.py provides _scan_evolved_skill_names() to automatically scan evolved-fix and evolved-improvement directories and collect skill names. These are injected into the agent config as exclude_skill_names (A mode) or force_skill_names (B mode), which are then set on the KernelGen instance during LangGraphTask._init_agents().
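The name-source priority and the A/B config injection fit in a few lines. The functions below are hypothetical and are not `_scan_evolved_skill_names` or the real `ab_test_utils.py` API. The scanner assumes each skill directory is named after its skill, which holds for the paths the writers produce.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

EVOLVED_SUBDIRS = ("evolved-fix", "evolved-improvement")


def scan_evolved_dirs(skills_root: str) -> List[str]:
    """Collect skill names from <root>/evolved-*/<skill_name>/SKILL.md."""
    names: List[str] = []
    for sub in EVOLVED_SUBDIRS:
        base = Path(skills_root) / sub
        if base.is_dir():
            names.extend(sorted(p.parent.name for p in base.glob("*/SKILL.md")))
    return names


def resolve_skill_names(skill_names: Optional[List[str]],
                        evolved_skill_dir: Optional[str],
                        default_dir: str,
                        scan: Callable[[str], List[str]] = scan_evolved_dirs) -> List[str]:
    """Explicit names win, then the evolved skill dir, then the default skills directory."""
    if skill_names:
        return list(skill_names)
    return scan(evolved_skill_dir or default_dir)


def ab_overrides(mode: str, names: List[str]) -> Dict[str, List[str]]:
    """A = baseline (exclude evolved skills); B = force-include them after LLM selection."""
    if mode == "A":
        return {"exclude_skill_names": names}
    if mode == "B":
        return {"force_skill_names": names}
    raise ValueError(f"unknown AB mode: {mode!r}")
```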
7. File Structure
core_v2/agents/
└── skill_evolution_base.py — SkillEvolutionBase (workspace management, logging utilities)
op/tools/skill_evolution/
├── common.py — Shared types, utilities, LLM output parsing, SKILL.md writer
├── search_log_utils.py — search_log mode: collect + compress + to_prompt_vars
├── expert_tuning_utils.py — expert_tuning mode: collect + build_timeline + to_prompt_vars
├── error_fix_utils.py — error_fix mode: collect + to_prompt_vars
├── merge_utils.py — organize mode: scan, classify parsing, archive, merge writing
└── __init__.py
op/agents/skill_evolution_agent.py — SkillEvolutionAgent (inherits base, four-mode dispatch)
op/resources/prompts/skill_evolution/
├── analyze_search_log.j2 — search_log: structured evolution diffs → LLM
├── analyze_expert_tuning.j2 — expert_tuning: action timeline → LLM
├── analyze_error_fix.j2 — error_fix: fix cases → LLM
├── dedup_error_fix.j2 — error_fix: existing + new content → LLM dedup, output increments only
├── classify_skills.j2 — organize: name + description → LLM clustering
└── merge_cluster.j2 — organize: cluster skill contents → LLM merge and dedup
examples/kernel_related/skill_evolution/
├── run_skill_evolution.py — Standalone CLI script (no Agent framework dependency)
├── run_ab_test.py — A/B test batch runner
├── ab_test_utils.py — A/B test utility functions
└── tracking.md — Experiment tracking document
8. Standalone CLI Script
examples/kernel_related/skill_evolution/run_skill_evolution.py provides an Agent-framework-free entry point.
# search_log mode
python examples/kernel_related/skill_evolution/run_skill_evolution.py search_log /path/to/logs relu
# expert_tuning mode
python examples/kernel_related/skill_evolution/run_skill_evolution.py expert_tuning ~/.akg/conversations/cli_xxx relu
# error_fix mode
python examples/kernel_related/skill_evolution/run_skill_evolution.py error_fix /path/to/logs matmul
# organize mode (CLI subcommand: organize)
python examples/kernel_related/skill_evolution/run_skill_evolution.py organize triton_cuda
python examples/kernel_related/skill_evolution/run_skill_evolution.py organize triton_cuda --skills-dir /path/to/evolved -o ./merged
# With output directory and model level
python examples/kernel_related/skill_evolution/run_skill_evolution.py error_fix /path/to/logs matmul -o ./output -m complex
| Argument | Description |
|---|---|
| `mode` | `search_log`, `expert_tuning`, `error_fix`, or `organize` |
| `log_dir` / `conversation_dir` | Log directory (search_log / error_fix) or conversation directory (expert_tuning) |
| `op_name` | Operator name (e.g. relu, l1norm, matmul) |
| `dsl` (organize only) | Target DSL, passed instead of the directory and operator arguments (e.g. `triton_cuda`) |
| `--skills-dir` (organize only) | Directory containing the evolved skills to organize |
| `-o` / `--output-dir` | SKILL.md output directory |
| `-m` / `--model-level` | LLM model level (default: standard) |
9. Workspace
In Agent mode, intermediate files are saved to {cur_path}/logs/skill_evolution/. In CLI mode, the default location is ~/.akg/skill_evolution/{mode}_{op_name}/ (overridable with -o):
search_log mode:
| File | Content |
|---|---|
| `collected_data.json` | Task records summary (task_id, parent_id, gen_time, speedup, has_code) |
| `compressed_data.json` | Best record + evolution chains |
| `llm_prompt.txt` | Rendered LLM prompt |
| `llm_response.txt` | Raw LLM output |
| `session.log` | Execution log |
| `result.json` | Final result summary |
expert_tuning mode:
| File | Content |
|---|---|
| `action_timeline.md` | Formatted action timeline (may contain compression markers) |
| `llm_prompt.txt` | Rendered LLM prompt (with timeline) |
| `llm_response.txt` | Raw LLM output |
| `session.log` | Execution log |
| `result.json` | Final result summary |
error_fix mode:
| File | Content |
|---|---|
| `collected_fix_records.json` | Fix records summary (task_id, error_step, has_conductor, diff_lines) |
| `llm_prompt.txt` | Rendered LLM prompt (with fix cases) |
| `llm_response.txt` | Raw LLM output |
| `session.log` | Execution log |
| `result.json` | Final result summary |
organize mode:
| File | Content |
|---|---|
| `skill_summaries.json` | name + description summaries of all skills |
| `classify_prompt.txt` | Classification LLM prompt |
| `classify_response.txt` | Classification LLM output |
| `clusters.json` | Parsed clustering result |
| `merge_{theme}_prompt.txt` | Per-cluster merge LLM prompt |
| `merge_{theme}_response.txt` | Per-cluster merge LLM output |
| `result.json` | Final result summary |
10. Workflow Compatibility
Different workflows produce logs with different naming conventions:
| Workflow | File Naming Example | Characteristics |
|---|---|---|
| adaptive_search | `Iteration_Gen1_Task3_Step02_{op}_coder_result.txt` | Gen + Task hierarchy |
| evolve | `Iteration1_Island0_Task0_Step05_{op}_coder_prompt.txt` | Island + Task hierarchy |
| kernelgen | `Iteration0_Step01_{op}_kernel_gen_prompt.txt` | No Task/Island hierarchy |
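If logs from the three workflows ever need to be correlated by file name, one tolerant pattern can cover all three shapes in the table above. The regex below is a hypothetical sketch built only from those three examples, not a parser the project ships. Missing hierarchy levels come back as `None`, and adaptive_search's bare `Iteration` yields an empty iteration string.

```python
import re
from typing import Dict, Optional

LOG_NAME_RE = re.compile(
    r"^Iteration(?P<iteration>\d*)"
    r"(?:_Gen(?P<gen>\d+))?"
    r"(?:_Island(?P<island>\d+))?"
    r"(?:_Task(?P<task>\d+))?"
    r"_Step(?P<step>\d+)_(?P<rest>.+)$"
)


def parse_log_name(filename: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a workflow log file name into its hierarchy fields, or return None."""
    m = LOG_NAME_RE.match(filename)
    return m.groupdict() if m else None
```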
Mode compatibility:
| Mode | adaptive_search | evolve | kernelgen | Notes |
|---|---|---|---|---|
| error_fix | Y | Y | Y | Only depends on verification_results.jsonl + verify_dir, independent of file naming |
| search_log | Y | - | - | Depends on lineage_graph.md + speed_up_record.txt, currently only produced by adaptive_search |
| organize | - | - | - | Does not depend on any logs; processes existing evolved skill files |
| expert_tuning | - | - | - | Depends on conversation directory trace.json / action_history_fact.json |
Note: To extend `search_log` mode to evolve/kernelgen workflows, additional parsing logic for their lineage and performance files would be needed. `error_fix` mode is already natively compatible with any workflow that produces `verification_results.jsonl`.