CATLASS Project Documentation
1. Practices
Code practices that guide developers step by step through using and developing each layer of CATLASS, gradually building the skills needed for complete operator development, testing, tuning, and model integration.
- 01_quick_start: Introduces environment setup for the template library and how to build and run the provided operator samples.
- 02_host_example_assembly: Explains host-side Matmul assembly.
- 03_kernel_development: Provides a kernel code breakdown, demonstrating mechanisms such as template assembly, arguments, params, and key functions.
- 04_block_mmad_development: Provides a block_mmad code breakdown, including template assembly mechanisms and major interface descriptions.
- 05_block_scheduler_development: Provides a block_scheduler code breakdown, including template assembly mechanisms and major interface descriptions.
- 06_tile_development: Provides a tile_copy/tile_mmad code breakdown, including template assembly mechanisms and major interface descriptions.
- 07_epilogue_adaptation: Covers host/kernel layer epilogue adaptation for GEMM operators, as well as block/tile development for epilogues.
- 08_evaluation: Covers how to use debugging and profiling tools to locate precision issues and analyze performance bottlenecks.
- 09_example_contribution_guide: Details the complete process for sample design, development, testing, and integration.
- 10_innovative_example_development_guide: Provides guidance on the development process of innovative samples.
- 11_matmul_optimization: Introduces basic tuning methods in the template library, including how to achieve performance gains by adjusting tiling parameters and applying different dispatch policies.
- 12_example_integration: Introduces how to adapt samples and integrate them into full networks (to be contributed).
- evaluation (folder): Debugging-related documents
  - ascendc_dump
  - msdebug
  - performance_tools
  - precision_analysis_basics: Precision analysis basics
  - precision_debug: Locating precision issues in samples
  - bottleneck_analysis_and_optimization: Performance bottleneck analysis and optimization
- others (folder): Practice documents from internal and external contributors that do not fit the categories above
  - tla_rebuild: TLA sample refactoring (to be contributed)
  - migration_from_atlasA2_to_Ascend950_guideline: Recommended approach for making existing Atlas A2 operators compatible with Ascend 950
  - conv_kernel_development: Conv operator development guide
  - conv_kernel_optimization: Conv operator performance optimization
  - FA_kernel_optimization: FA operator performance optimization
  - fused_kernel_optimization: Performance optimization cases for CV fused operators
  - kernel_execution: Directly calling new operators with the <<<>>> kernel launch syntax
2. Design
- 00_project_overview: Project introduction, layered modular design, and code repository structure design
- 01_kernel_design: Algorithm design
  - 00_basics (folder): CATLASS development basics
    - atlasA2_hardware_info: Atlas A2 hardware information
    - atlasA2_gemm_instruction_set: Hardware instruction set related to Atlas A2 GEMM samples
  - 01_example_design: Overview of the sample design documents in the repository (each sample's design document is placed in its own sample folder; this document only provides a summary and index)
  - 02_swizzle: Basic introduction to the Swizzle policies in the template library, which affect the processing order of basic blocks on AI Cores
  - 03_dispatch_policies: Introduction to DispatchPolicy, an important template parameter of BlockMmad at the Block layer
  - 04_matmul_summary: Introduction to the existing matmul template designs in the examples directory of the template library, including the sample template list, theoretical template list, engineering optimization list, and a brief introduction to template application. This document can serve as a reference for matmul performance tuning.
  - 05_aswt: Description of the adaptive sliding window tiling policy
  - 06_quant_summary: Low-precision topics (to be contributed)
- 02_tla:
  - 01_layout: Layout structure and related interfaces for TLA
  - 02_layout_tag: Layout tags such as RowMajor, ColumnMajor, zN, and nZ (the legacy layout structure) and their related interfaces
  - 03_tensor: Tensor structure
- 03_evg:
  - 01_evg_design: EVG positioning, layering, execution model, and graph organization
  - 02_evg_extension: EVG extension conventions, describing when to add a ComputeFn, when to add a node, and the constraints to follow during implementation
  - 03_evg_quick_start: Description of the EVG integration process, using Matmul + Add as an example
3. APIs
- README: API list
- gemm api: Gemm APIs
- evg_api: Description of the EVG integration mode, parameter order, and common nodes
- Ascend C API: Ascend C API list
Appendix
External articles and videos
- Q&A
- Technical articles
- Fundamentals
- Concept understanding
- Troubleshooting
- Performance optimization
- Best practices
- Training videos
- Ascend community online courses: Get to know Ascend through structured course videos. Recommended courses:
- Ascend Training Camp on CATLASS
- Basic Concepts of CATLASS in One Stop: Ascend CANN [Code Power] Lecture 1 of the CATLASS Learning Series. This lecture gives a comprehensive introduction to CATLASS, covering its overall design, a quick start guide for operators, a development overview, and community contribution.
- Hands-on CATLASS Operator Development: Ascend CANN [Code Power] Lecture 2 of the CATLASS Learning Series. Using basic Matmul operators as an example, this video gives a comprehensive introduction to the theoretical modeling of matrix multiplication on NPUs and its code implementation at the host, kernel, and block layers.