CATLASS
⚠ Important Changes
At the first community meeting in March 2026, we officially confirmed that the CATLASS community mainline will add support for the next-generation Ascend hardware Ascend 950PR/Ascend 950DT. To distinguish the underlying interface implementations on different platforms, this new support will introduce a new compilation macro. Users need to adapt the corresponding build commands accordingly.
- New macro: `CATLASS_ARCH`, used to specify the target architecture. You can query its value in SIMD BuiltIn Keywords (the `__NPU_ARCH__` column).
  - Atlas A2 Training Series Products / Atlas A2 Inference Series Products: `2201`
  - Atlas A3 Training Series Products / Atlas A3 Inference Series Products: `2201`
  - Ascend 950PR/Ascend 950DT: `3510`
- Related scenario descriptions:
  - `bisheng` command-line scenario: `bisheng ... -DCATLASS_ARCH=2201 ...`
  - `cmake` scenario: `add_compile_definitions(CATLASS_ARCH=2201)`
  - `msopgen`/`aclnn` project scenario:
    - Old usage: `add_ops_compile_options(ALL OPTIONS -DCATLASS_ARCH=2201 ...)`
    - New usage: `npu_op_kernel_options(ascendc_kernels ALL OPTIONS -DCATLASS_ARCH=2201)` (in an `msopgen` project, the first parameter defaults to `ascendc_kernels` and can be adjusted as needed)
  - CATLASS source repository: `bash scripts/build.sh -DCATLASS_ARCH=2201 ...`
  - Code reference in the library: examples/CMakeLists.txt
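As a rough illustration of how a compilation macro can separate platform-specific implementations, the sketch below branches on `CATLASS_ARCH` at compile time. It is not taken from the CATLASS sources: `TargetProductName`, `kIsAscend950`, and `CheckArchSelection` are hypothetical names, and the fallback to `2201` when the macro is undefined is an assumption made only so the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstring>

// Assumption for this sketch only: fall back to 2201 when the build does not
// pass -DCATLASS_ARCH=<value>. The real library may require the macro instead.
#ifndef CATLASS_ARCH
#define CATLASS_ARCH 2201
#endif

// Hypothetical helper, not a CATLASS API: names the product family that the
// macro value corresponds to in the list above.
constexpr const char* TargetProductName() {
#if CATLASS_ARCH == 2201
    return "Atlas A2/A3";
#elif CATLASS_ARCH == 3510
    return "Ascend 950PR/Ascend 950DT";
#else
#error "Unsupported CATLASS_ARCH value"
#endif
}

// Hypothetical compile-time switch for code paths specific to the 950 series.
constexpr bool kIsAscend950 = (CATLASS_ARCH == 3510);

// Host-side sanity check that the two views of the macro agree.
inline void CheckArchSelection() {
    const bool nameIs950 =
        std::strcmp(TargetProductName(), "Ascend 950PR/Ascend 950DT") == 0;
    assert(kIsAscend950 == nameIs950);
}
```

Compiling the same file with `-DCATLASS_ARCH=3510` would select the second branch, which is why every build command needs to carry the macro.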
Latest News
- [2026/04] Community edition v1.5.0 released: added Ascend 950 series examples, such as Basic Matmul, Flash Attention Inference, and Per-Group & Per-Block Quant Matmul TLA; enhanced TLA capabilities, including `origin_shape`, `TileView`, and more; added 103 Dynamic W8A8 Per-Token Quantization to the Matmul Generalization Project.
- [2026/03] The community mainline officially started to add support for the next-generation Ascend hardware Ascend 950PR/Ascend 950DT.
- [2026/02] Community edition v1.4.0 released, adding examples such as StreamK Matmul, W4A4 Matmul, and Sparse Matmul.
- [2025/12] Community edition v1.3.0 released, supporting `FixPipe` Inline Quantization, adding multiple templates to the Matmul Generalization Project, and adding examples such as INT4 Dequantization and 2D Convolution.
- [2025/10] Community edition v1.2.0 released, adding examples such as Matmul Operator Generalization.
- [2025/09] The CATLASS template library was officially open sourced.
See CHANGELOG for detailed updates in current and historical versions.
📌 Introduction
CATLASS (CANN Templates for Linear Algebra Subroutines), known in Chinese as the Ascend Operator Template Library, is a code repository focused on providing base templates for high-performance matrix multiplication operators.
CATLASS turns matrix operator code into templates through layered abstraction. This enables white-box assembly of operator compute logic and makes operator code reusable, replaceable, and partially modifiable. It is designed around Ascend hardware characteristics and supports complex pipeline layouts for operators such as Flash Attention. Upper-layer code logic is shared, while the lower layers can be specialized for differences in the underlying hardware.
The template library enables fast development for custom scenarios. It provides performance optimization modules for different scenarios, so developers can assemble and customize them. Under custom shapes, its performance can reach 0.98 to 1.2 times the benchmark performance of the corresponding operator.
This repository is the co-created repository for CATLASS. It combines the strengths of the Ascend ecosystem to jointly design and develop operator templates, and provides high-performance implementation code examples for typical operators. For an overview, see here.
⚡️ Quick Start
To quickly try CATLASS operator development and usage, see the following content.
- Quick Start: Quickly get started with the template library, and compile and run existing operator examples.
- Basic Development Guide: Uses the basic Matmul operator as an example to introduce CATLASS-based operator development practices.
- Developer Practices: Provides practice examples from beginner to advanced levels, covering code for each operator layer, compilation and testing, Tiling tuning, and operator optimization.
📚 Advanced References
The following materials can help you further develop and tune CATLASS operators and implement GEMM-class operators with better performance.
- CATLASS API: Introduces the layered features of CATLASS and the general matrix multiplication GEMM API.
- CATLASS Design Summary: Summarizes documents such as example algorithm design, swizzle strategies, and TLA design in the CATLASS project.
📁 Directory Structure Description
The key directories are as follows. For the detailed directory structure, see Project Directory.
catlass
├── cmake # cmake project files
├── docs # Documentation directory
├── examples # Root directory for kernel operator examples
| ├── 00_basic_matmul # Single-operator example
| | ├── basic_matmul.cpp # Host-side operator invocation
| | ├── CMakeLists.txt
| | └── README.md # Operator description example
| ├── ...
| └── python_extension # Project component for calling CATLASS operators from Python
├── include # Template header file set
| ├── catlass # Operator implementation logic at different layers
| └── tla # Basic data structures related to computation
├── scripts # Build scripts
| └── build.sh # Operator example build script
├── tests # Test cases
└── tools # Related tools
    └── tuner # Tiling auto-tuning tool
💻 Software and Hardware Requirements
CATLASS depends on the following software and hardware environments:
- Ascend products:
- CPU architecture: `aarch64`/`x86_64`
- System: Linux supported by CANN (perform a compatibility query)
- Software dependencies:
  - `gcc` >= 7.5, < 13.0
  - `cmake` >= 3.16
  - `python` >= 3.8, < 3.12
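The `gcc` range above has one inclusive and one exclusive bound, which is easy to get wrong in a hand-written check. The sketch below encodes it as a small predicate; `IsSupportedGcc` and `RequireSupportedGcc` are hypothetical helpers for illustration and are not part of CATLASS or its build scripts.

```cpp
#include <cassert>

// Hypothetical helper, not part of CATLASS: true when gcc major.minor
// satisfies the documented range ">= 7.5, < 13.0".
constexpr bool IsSupportedGcc(int major, int minor) {
    return (major > 7 || (major == 7 && minor >= 5)) && major < 13;
}

// Boundary checks evaluated at compile time.
static_assert(IsSupportedGcc(7, 5), "7.5 is the lowest supported version");
static_assert(!IsSupportedGcc(7, 4), "7.4 is below the supported range");
static_assert(IsSupportedGcc(12, 3), "12.x is inside the supported range");
static_assert(!IsSupportedGcc(13, 0), "13.0 is excluded");

// Runtime variant for a version parsed elsewhere, e.g. from `gcc -dumpversion`.
inline void RequireSupportedGcc(int major, int minor) {
    assert(IsSupportedGcc(major, minor) && "gcc version outside the documented range");
}
```

A build could feed `__GNUC__` and `__GNUC_MINOR__` into the same predicate, with the caveat that clang-based compilers also define those macros, so the result is only meaningful when the host compiler really is `gcc`.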
The hardware platforms supported by different CATLASS releases and the required minimum CANN versions are shown in the following table:
| CATLASS Community Version | Minimum Supported CANN Package Version | Supported Ascend Products |
|---|---|---|
| Current | 8.5.0; 9.0.0.beta2 (Ascend 950PR/Ascend 950DT) | Atlas A2 Training Series Products / Atlas A2 Inference Series Products; Atlas A3 Training Series Products / Atlas A3 Inference Series Products; Ascend 950PR/Ascend 950DT |
| v1.5.0 | 8.2.RC1; 9.0.0.beta2 (Ascend 950PR/Ascend 950DT) | Atlas A2 Training Series Products / Atlas A2 Inference Series Products; Atlas A3 Training Series Products / Atlas A3 Inference Series Products; Ascend 950PR/Ascend 950DT |
| v1.4.0 to v1.2.2 | 8.2.RC1 | Atlas A2 Training Series Products / Atlas A2 Inference Series Products; Atlas A3 Training Series Products / Atlas A3 Inference Series Products |
| v1.2.1 to v1.0.0 | 8.2.RC1.alpha002 | Atlas A2 Training Series Products / Atlas A2 Inference Series Products; Atlas A3 Training Series Products / Atlas A3 Inference Series Products |
The following environments have been tested and can build the current CATLASS version:
| System | CANN | `gcc` | `cmake` | `python` |
|---|---|---|---|---|
| Ubuntu 20.04.5 | 8.5.0 | 9.3 | 3.16 | 3.10 |
| Ubuntu 22.04.5 | 8.5.0 | 11.3 | 3.22 | 3.10 |
| openEuler 22.03 SP4 | 8.5.0 | 10.3 | 3.22 | 3.10 |
| Ubuntu 22.04.5 (Compiling 950 Examples) | 9.0.0.beta2 | 11.3 | 3.22 | 3.10 |