SuperKernel Migration Design
Overview & Purpose
This document analyzes the SuperKernel and AscendC decoupling solution, migrating SuperKernel implementation from the current AscendC repository and package to Graph-autofusion.
Through decoupling, SuperKernel can evolve independently. SuperKernel new features and bugfixes do not depend on upgrading AscendC.
Unless otherwise specified, this document mentions AscendC includes both AscendC project and AscendC API. AscendC project refers to AscendC compilation project infrastructure with compile_op as entry point; AscendC API refers to AscendC API class library used by AscendC developers.
Proposed Implementation
To extract SuperKernel, the following content needs consideration:
- SuperKernel project compilation entry extraction:
ascendc_super_kernel.ascendc_super_kernel_plusas entry point, compilation project infrastructure specifically for SuperKernel, extracted toGraph-autofusionrepository - gentask logic decoupling: CANN component graph mode adaptation
- Single operator AscendC project SuperKernel adaptation decoupling:
compile_op.compile_opas entry point, branches judging for SuperKernel, feature-specific processing, or other reduction measures. Target iscompile_op.compile_opdoes not sense SuperKernel scenarios - AscendC API SuperKernel decoupling: AscendC API and SuperKernel coupling mainly manifests in API implementation judging SuperKernel compilation macros and special processing. After decoupling, target is all AscendC APIs should not sense SuperKernel scenarios
From preliminary analysis:
- Content 1 (SuperKernel project compilation entry extraction), 2 and content 3, 4 are relatively independent, can proceed in phases, reducing risk
- Content 3, 4 consists of 17 coupling points, can parallel analyze, design, implement
1 SuperKernel Project Compilation Extraction
Extracted Files and Functions Overview
With ascendc_super_kernel.ascendc_super_kernel_plus as entry point, the following files are SuperKernel feature-specific, therefore extracted to Graph-autofusion:
| File Name | Function | New File |
|---|---|---|
| ascendc_super_kernel.py | SuperKernel compilation entry, includes parameter parsing, codegen wrapper code, compilation | super_kernel.py (entry changed from ascendc_super_kernel_plus to compile) |
| ascendc_super_operator_infos.py | SuperKernel top-level data structure | super_kernel_op_infos.py |
| ascendc_sub_super_kernel.py | Sub-kernel information storage | super_kernel_sub_op_infos.py |
The following functions are SuperKernel-specific, regular operator compilation flow does not use, therefore extracted to Graph-autofusion:
| Function Name | Function | New File |
|---|---|---|
| CommonUtility.is_support_super_kernel | Determine if current chip type supports SuperKernel | super_kernel_utils.py |
| ascendc_constants.SubOperatorType | Sub-operator dynamic/static types, where STATIC_OP_WITH_SEND\STATIC_OP_WITH_RECEIVE can be deleted | super_kernel_types.py |
| ascendc_constants.STR_TO_SUPER_TASK_TYPE | string type mapping ascendc_constants.SubOperatorType | super_kernel_types.py |
| ascendc_constants.SuperKernelPreLoadMode | Sub-kernel instruction prefetch mode selection | super_kernel_types.py |
| ascendc_constants.SuperKernelLinkMode | Sub-kernel link mode selection | super_kernel_types.py |
| ascendc_constants.SuperKernelEarlyStartMode | Sub-kernel early-start mode selection | super_kernel_types.py |
| ascendc_constants.SuperKernelStreamFusionMode | Whether to enable dual stream | super_kernel_types.py |
| ascendc_constants.SuperKernelDebugDcciAllMode | Whether to add entry cache dcci at sub-operator end | super_kernel_types.py |
| ascendc_constants.SuperKernelDebugSyncAllMode | Whether to add full-core synchronization at sub-operator end | super_kernel_types.py |
| ascendc_constants.SuperKernelFeedSyncAllMode | Whether to support full-core hard synchronization when operator not full-core | super_kernel_types.py |
| ascendc_constants.SuperKernelDataCacheMode | Sub-kernel data prefetch mode | super_kernel_types.py |
| ascendc_constants.SuperKernelProfilingMode | Support SuperKernel profiling data | super_kernel_types.py |
| ascendc_compile_gen_code.gen_super_dump_code | Generate SuperKernel dump initialization code | super_kernel_types.py |
| ascendc_compile_base.gen_symbol_rename_file | After splitting sub-kernel, rename | super_kernel_compile_base.py |
| ascendc_common_utility.gen_func_align_attribute | Generate alignment attr | super_kernel_compile_base.py |
| ascendc_compile_base.split_dynamic_o_in_super_kernel | Split sub-kernel into multiple binaries in SuperKernel | super_kernel_op_infos.py |
| ascendc_common_utility.parse_super_kernel_options | SuperKernel parameter parsing and validation | super_kernel_options.py |
| OptionParser | Same as above | super_kernel_options.py |
| ParserFactory | Same as above | super_kernel_options.py |
| CodeTextAlignParser | Same as above | super_kernel_options.py |
| EunmParser | Same as above | super_kernel_options.py |
| BinaryParser | Same as above | super_kernel_options.py |
| NumberParser | Same as above | super_kernel_options.py |
| NonEmptyParser | Same as above | super_kernel_options.py |
| setup_super_kernel_option_parsers | Same as above | super_kernel_options.py |
The following functions are shared by SuperKernel and regular operators, or tightly coupled, dependencies (possibly) unstable, first extracted, temporarily not migrated, consider analyzing in subsequent phase feature decoupling (no longer used, API standardization, or other measures)
| Function Name | Function | New File |
|---|---|---|
| ascendc_compile_base.super_kernel_gen_entry | Top-level SuperKernel entry generates quartered code | super_kernel_gen_entry_code.py |
| ascendc_compile_base.gen_spk_kernel_call | Same as above | super_kernel_gen_entry_code.py |
| ascendc_compile_base.split_spk_kernel_objs | Same as above | super_kernel_gen_entry_code.py |
| ascendc_compile_base.super_kernel_gen_entry | Same as above | super_kernel_gen_entry_code.py |
| ascendc_compile_base.gen_file_header | Same as above ps: generates dual page table global variables (implementation only supports v220, need consider macro isolation for different chip versions, can reference ascendc_compile_gen_code.gen_global_isolation_macro) and only supports split_mode=1, dual page table takes effect | super_kernel_gen_entry_code.py |
| ascendc_compile_base.gen_system_run_cfg | Same as above | super_kernel_gen_entry_code.py |
| ascendc_compile_base.localize_symbol_of_sk | Link all SuperKernel binaries | super_kernel_compile.py |
| ascendc_compile_base.gen_super_kernel_compile_info | Generate SuperKernel compile info | super_kernel_compile.py |
| ascendc_compile_base.gen_sub_super_kernel_early_start_compile_options | Process sub-kernel early-start feature compilation options | super_kernel_compile.py |
| ascendc_compile_base.gen_sub_super_kernel_compile_options | Process sub-kernel compilation options | super_kernel_compile.py |
| ascendc_compile_base.split_sub_kernel_obbjs | Sub-kernel splits into four object files | super_kernel_compile.py |
| ascendc_compile_base.split_kernel_arch_str | Same as above | super_kernel_compile.py |
| ascendc_compile_base.split_kernel | Same as above | super_kernel_compile.py |
| ascendc_compile_base.gen_super_kernel_link_obj_sequence | Get all sub-kernel binaries | super_kernel_compile.py |
| ascendc_compile_base.localization_sub_op_func_sym | After splitting sub-kernel binary, modify sub-kernel global symbols to local | super_kernel_compile.py |
| ascendc_compile_base.localize_symbol_of_sk | Modify SuperKernel global symbols to local | super_kernel_compile.py |
| compile_op.compile_super_kernel | SuperKernel compilation entry | super_kernel_compile.py |
The following interfaces are used in ascendc_sub_super_kernel.py, ascendc_super_kernel.py, ascendc_super_operator_infos.py files, need to add interface encapsulation in super_kernel_utils.py, avoid AscendC open source rectification introducing issues affecting SuperKernel functionality:
| Function Name | Function | New Interface |
|---|---|---|
| tbe.common.platform.platform_info.get_soc_spec | Get current chip type | get_soc_version |
| tbe.tvm.error_mgr.raise_tbe_python_err | Exception throwing | super_kernel_raise_err |
| tbe.tvm.error_mgr.TBE_DEFAULT_PYTHON_ERROR_CODE | Exception code | "EB0500" |
| tbe.common.buildcfg.get_current_build_config | Get compilation context information | get_compile_config_info |
| tbe.common.buildcfg.buildcfg.op_debug_config | Get log level | Integrate into get_compile_config_info interface |
| log_utils and related interfaces | Log templates | Use directly |
| ascendc_common_utility and related interfaces | Utility templates | Directly use in current phase |
| global_storage.global_var_storage | Global variable setting | Analyze if can delete |
If any functions are not covered, divide according to above principles.
Graph-autofusion Directory Planning
Create independent directory for SuperKernel under src directory:
├── doc
├── README.md
└── src
└── superkernel
├── sub_operator_infos.py --> functions from original ascendc_sub_super_kernel.py migrate to this file
├── super_kernel.py --> functions from original ascendc_super_kernel.py migrate to this file
├── super_operator_infos.py --> functions from original ascendc_super_operator_infos.py migrate to this file
├── super_kernel_compile_base.py --> functions from original ascendc_compile_base.py migrate to this file
└── super_kernel_constants.py --> functions from original ascendc_constants.py migrate to this file
Principles:
- File migration: whole file migration to
src/superkerneldirectory - Function/class migration: file prefix changes from
ascendc_tosuper_kernel_, new file created insrc/superkerneldirectory
Installation Directory
Install in site-packages.tbe.superkernel
2 GE Adaptation
todo GE/FE refactoring adaptation according to SuperKernel interface boundary in Graph-autofusion
3 Feature Decoupling
todo