Ascend C Multi-Level Programming Interface Selection Guide
Welcome to use Ascend C for Ascend AI processor operator development. Ascend C not only provides complete programming capabilities for extreme performance but also enables you to flexibly choose the most suitable API based on project requirements, team skills, and performance goals through multi-level programming API design. You can achieve the best balance between development efficiency and runtime performance.
Design Goals
The design goals of Ascend C can be summarized as "high performance, completeness, ease of programming, debuggability, and compatibility". Through minimal extensions to C/C++ language standards, it supports both pointer-based C language development habits and Tensor-based C++ programming paradigms. Ascend C achieves seamless integration with existing ecosystems while supporting efficient Ascend operator development, ensuring consistent development experience.
We adhere to the following core principles:
- No silver bullet: Different scenarios have varying requirements for performance and development efficiency. A single interface cannot achieve optimal adaptation in all scenarios.
- Progressive learning: Beginners can start with easy-to-use interfaces for rapid algorithm verification. Experts can drill down for fine-tuning and leverage complex interfaces to fully unleash hardware potential.
API Levels
Ascend C provides three types of interfaces, all supporting complete low-level programming capabilities:
| API Level | Language | Features | Target Users | Main Purpose |
|---|---|---|---|---|
| Tpipe/Tque Framework Programming API | C++ | Based on Tensor programming Manages memory and synchronization through Tpipe/Tque framework |
Operator library developers | Leverage framework for automatic synchronization and memory management, improve programming ease of use |
| Basic API | C++ | Based on Tensor programming, provides C++ complete programming capabilities Allocates Tensor through MakeTensor / LocalMemoryAllocator, manages synchronization independently |
Operator library developers | Independently manage synchronization and memory, match C++ Tensor development habits, support extreme performance |
| Language Extension Layer SIMD & SIMT API |
C | Based on pointer programming, provides C complete programming capabilities Allocates memory through array [], manages synchronization independently |
Operator library developers | Independently manage synchronization and memory, match C language development habits, support extreme performance |
In addition, Ascend C provides high-level APIs and operator template libraries to further improve operator development efficiency.
| API Level | Target Users | Main Purpose |
|---|---|---|
| Operator Template Library (CATLASS / ATVOSS, etc.) | Algorithm developers | Customize and extend based on typical operator implementations for high-performance requirements in specific scenarios |
| High-level API | Algorithm developers | Reuse common single-core algorithms for rapid algorithm verification |
How to Quickly Choose the Right API Level?
The following decision flowchart helps you quickly locate the most suitable API level:
We recommend developing all operators based on <<<>>> invocation and Host/Device hybrid compilation
graph TD
A[**Ascend C Operator Development**] --> B[**1. Ease of use priority, performance insensitive**]
A --> C[**2. Extreme performance priority**]
A --> D[**3. Balance performance and ease of use**]
B -->|Other operator types| E[**SIMD C API**<br>(with sync suffix computation interface)]
B -->|Familiar with SIMT, discrete vector operators| F[**SIMT API**]
C -->|Discrete vector operators| F
C -->|Prefer **pointer programming**| P[**SIMD C API**]
C -->|Prefer **C++ Tensor programming & independent synchronization/memory management**| I[**Basic API**]
C -->|Prefer **C++ Tensor programming & automatic synchronization/memory management**| J[**Tpipe/Tque Framework Programming API**]
D -->|Reuse common algorithms, generalization priority| K[**High-level API**]
D -->|Typical operators, high performance in specific scenarios| L[**Operator Template Library**]
You can also refer to the following key dimensions for quick decision-making:
| Key Factor | Recommended Level | Reason |
|---|---|---|
| Discrete vector operators | SIMT API | Fully leverage SIMT advantages in discrete scenarios while matching industry programming habits |
| Pointer-based complete programming capabilities | SIMD C API | Match C language development habits, support extreme performance |
| C++ Tensor-based complete programming capabilities | Basic API | Match C++ Tensor development habits, support extreme performance |
| Rapid algorithm verification | High-level API or Operator Template Library | Encapsulates well-generalized implementations of common algorithms, high development efficiency |
Detailed Level Introduction
Language Extension Layer C API (SIMD & SIMT)
Features
- Matches traditional C language operator development habits in the industry. Supports array memory allocation, pointer computation interfaces, and uses
asc_xxxprefix snake_case naming style. - SIMT API programming model follows industry-common development practices, reducing learning curve.
- SIMD API provides easy-to-use continuous computation interfaces. It supports most operator development needs, for example,
asc_add(__ubuf__ half* dst, __ubuf__ half* src0, __ubuf__ half* src1, uint32_t count). - To improve ease of use for beginners, SIMD API simplifies synchronization management. It additionally provides synchronization operation interfaces with
_syncsuffix, such asasc_add_sync(...). - For extreme performance scenarios, SIMD API provides advanced computation interfaces with
repeat/strideparameters. These support flexible control of data layout and computation mode.
Applicable Scenarios
- Operator developers familiar with traditional C language development habits.
- Developers with SIMT programming experience who want to quickly migrate to NPU environments.
- Production environment operator development that requires extracting extreme hardware performance.
- Developers who want to quickly verify algorithms using computation interfaces with
syncsuffix.
Examples
- SIMD Add Operator Example (with Synchronous Computation Interface)
- SIMT Gather Operator Example (Matching Industry Habits)
- For more examples, refer to examples directory
Basic API: Tensor-based Single Instruction Abstraction
Features
- Abstracts NPU instructions based on Tensor and data types, provides Tensor programming model.
- Provides memory allocation and synchronization interfaces independent of
Tque/Tpipe. Developers can manage resources based on Tensor independently. - Extends Tensor to support
Layoutconcept. Simplifies computation interfaces through unified data layout representation, maintaining consistency with industry Tensor programming experience. - Framework Programming API: Introduces
Tque/Tpipeframework, borrowing C++Queuedesign concept to simplify NPU synchronization and memory management.
Applicable Scenarios
- Operator developers familiar with industry C++ Tensor development habits.
- Scenarios that require developing extreme performance operators in production environments while maintaining code maintainability and extensibility.
Examples
- SIMD Add Operator Example with Tque / Tpipe Automatic Memory and Synchronization Management
- SIMD Add Operator Example with LocalMemoryAllocator Independent Memory and Synchronization Management
- Tensor API Example Based on Layout (to be added)
High-level API: Single-core Common Algorithm Implementation
Features
- Encapsulates common single-core algorithm implementations, provides good generalization performance.
- Can also achieve near-extreme performance in typical network scenarios.
Applicable Scenarios
- Quickly verify algorithm feasibility without extreme performance requirements for specific scenarios.
- Scenarios that want to reuse mature algorithm implementations and shorten development cycles.
Examples
Operator Template Library: Operator Implementation Samples
Features
- Provides end-to-end complete implementations of typical operators in specific scenarios as best practice references.
- Typically optimized for specific scenarios. Generalization performance is not the primary goal.
Applicable Scenarios
- Need to customize and extend typical operators to quickly adapt to specific business scenarios.
Examples
Summary
The core concept of Ascend C multi-level interface design is: always use the most appropriate programming paradigm rather than passively adapting to a single abstraction. Whether you are a low-level expert pursuing extreme performance or a prototype developer hoping to quickly verify algorithms, you can find handy tools in the Ascend C hierarchical API ecosystem.
Start your operator programming journey now! If you have questions, refer to Ascend C detailed documentation or community examples. We continue to make NPU powerful computing capabilities accessible and efficient for you.