| gemm/kernel | Entry point for device-side execution, representing the collective orchestration and execution logic of all blocks on the NPU. |
| gemm/block | The primary interface governing the main loop of block-level matrix multiplication and accumulation (MMAD). |
| gemm/tile | Leverages base APIs to construct the NPU microkernels required for GEMM primitives. |
| epilogue/block | The block-level epilogue component for GEMM, which can also be applied to non-GEMM computations. |
| epilogue/fusion | The graph orchestrator and foundational node components for EVG. |
| epilogue/tile | Leverages base APIs to construct the NPU microkernels required for epilogue operations. |
| TLA | Tensor Layout Architecture. Abstracts underlying data storage details and provides generalized algorithms for multidimensional array access. |
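
The layering in the table above can be pictured with a small host-side analogy. The sketch below is not the library's API: `Layout2D`, `BlockCoord`, `TileMmad`, `BlockMmad`, and `GemmKernel` are hypothetical names chosen for illustration, and the loops run sequentially on the CPU, whereas on the NPU the blocks would be distributed across cores. It only shows how a kernel-level loop over blocks, a block-level main loop over K, a tile-level multiply-accumulate, and a layout abstraction fit together. The epilogue stage is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a layout abstraction: maps a logical (row, col)
// coordinate to a linear offset, hiding row-major vs. column-major storage.
struct Layout2D {
    std::size_t rows, cols;
    std::size_t rowStride, colStride;

    static Layout2D RowMajor(std::size_t r, std::size_t c) { return {r, c, c, 1}; }
    static Layout2D ColMajor(std::size_t r, std::size_t c) { return {r, c, 1, r}; }

    std::size_t operator()(std::size_t i, std::size_t j) const {
        return i * rowStride + j * colStride;
    }
};

// One rectangular region of the output matrix (origin and extent).
struct BlockCoord { std::size_t m0, n0, m, n; };

// Tile level: multiply-accumulate one K-slice [k0, k0 + tk) into the block of C.
void TileMmad(const float* a, const Layout2D& la, const float* b, const Layout2D& lb,
              float* c, const Layout2D& lc, const BlockCoord& blk,
              std::size_t k0, std::size_t tk) {
    for (std::size_t i = blk.m0; i < blk.m0 + blk.m; ++i) {
        for (std::size_t j = blk.n0; j < blk.n0 + blk.n; ++j) {
            float acc = 0.0f;
            for (std::size_t k = k0; k < k0 + tk; ++k) {
                acc += a[la(i, k)] * b[lb(k, j)];
            }
            c[lc(i, j)] += acc;
        }
    }
}

// Block level: the main loop over K for a single output block.
void BlockMmad(const float* a, const Layout2D& la, const float* b, const Layout2D& lb,
               float* c, const Layout2D& lc, const BlockCoord& blk, std::size_t tileK) {
    const std::size_t K = la.cols;
    // Clear the accumulator region before the first K-slice.
    for (std::size_t i = blk.m0; i < blk.m0 + blk.m; ++i) {
        for (std::size_t j = blk.n0; j < blk.n0 + blk.n; ++j) {
            c[lc(i, j)] = 0.0f;
        }
    }
    for (std::size_t k0 = 0; k0 < K; k0 += tileK) {
        TileMmad(a, la, b, lb, c, lc, blk, k0, std::min(tileK, K - k0));
    }
}

// Kernel level: walks over every output block. Sequential here; on a device
// each block would be assigned to a core.
void GemmKernel(const float* a, const Layout2D& la, const float* b, const Layout2D& lb,
                float* c, const Layout2D& lc,
                std::size_t blockM, std::size_t blockN, std::size_t tileK) {
    assert(la.cols == lb.rows && lc.rows == la.rows && lc.cols == lb.cols);
    assert(blockM > 0 && blockN > 0 && tileK > 0);
    for (std::size_t m0 = 0; m0 < lc.rows; m0 += blockM) {
        for (std::size_t n0 = 0; n0 < lc.cols; n0 += blockN) {
            BlockCoord blk{m0, n0, std::min(blockM, lc.rows - m0),
                           std::min(blockN, lc.cols - n0)};
            BlockMmad(a, la, b, lb, c, lc, blk, tileK);
        }
    }
}
```

Because every access goes through `Layout2D`, the same `GemmKernel` call works whether an operand is stored row-major or column-major; only the layout object passed alongside the pointer changes.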