
GE Architecture Documentation

This documentation set describes the GE (Graph Engine) architecture from several perspectives. It is intended for developers who want to contribute code to GE, and helps them quickly understand the overall project structure, the core design decisions, and the implementation details of each module.

Architecture Overview

| Document | Description |
| --- | --- |
| GE Architecture Introduction | System architecture overview, AscendIR introduction, compilation optimization, plugin extension mechanism |

Module Architecture Documents

| Document | Description |
| --- | --- |
| AscendIR | Detailed design of AscendIR graph intermediate representation |
| Compiler | GE Compiler compilation flow, optimization passes, engine partitioning, build stages |
| Runtime | GE Executor model loading, Sink mode, Hybrid execution, v2 architecture |

Feature Design Documents

The following documents describe cross-module feature designs:

| Document | Description |
| --- | --- |
| Dump Module | Dump module overall design: architecture layering, RT1.0/RT2.0 adaptation, HCCL processing, dynamic switch |
| External Weight | FileConstant feature: weight separation from OM, compile-time Const→FileConstant conversion, RT V1/V2 loading flow, memory management, global weight manager |
| Constant Folding | Constant folding optimization: compile-time constant expression evaluation, dimension calculation, empty tensor replacement, delayed effect mechanism, multi-stage compilation pipeline |
| Fusion Pattern Pass | Fusion Pattern Pass mechanism: PatternFusionPass / DecomposePass matching, filtering, replacement, execution stages and Python/C++ integration |
| Dynamic Gear | Dynamic gear feature: dynamic Batch / dynamic resolution / ND arbitrary dimension modes, gear enumeration, static subgraph generation and runtime dispatch |
| Memory Conflict Handling | Memory conflict protection system: semantic read-write conflict, memory layout conflict, subgraph address isolation, Inplace reuse conflict, multi-stream concurrency management |
| Model Cache | Compilation result persistence mechanism: graph compilation cache, JIT compilation cache, operator model cache three-level system, cache hit and invalidation strategies |
| Profiling | Performance collection and observability: layered collection architecture (API/Host/Device), on-demand enablement, msprof unified reporting |
| SO in OM | Operator self-contained packaging: packaging dependent operator .so files into OM on demand, eliminating runtime dependency on OPP operator packages |
| TensorMove Elimination | TensorMove redundant node elimination optimization: identify and delete redundant memory copy nodes, O3 optimization level |
| Variable Management | Variable lifecycle management: registration, memory allocation, format conversion, logical address mapping, serialization/deserialization full flow |
| Zero Copy | Zero copy feature: input zero copy (eliminate H2D), output zero copy (eliminate D2H/D2D), compile-time planning and runtime execution |
| Concat No Task | Concat continuous memory optimization: compile-time identification of continuous input Concat operators, mark as virtual operators to skip Task generation and memory movement |
| GE Local Operator | GE Local engine: dedicated engine for non-computation nodes (Data, Constant, control flow, shape transformation, etc.), zero runtime computation overhead |
| Engine | Engine system: plugin-based engine architecture, priority-driven automatic selection, compile-time engine registration and partitioning, runtime dispatch |
| Tiling Sink | Tiling sink feature: move Tiling computation from Host to Device AICPU execution, eliminate Host-Device synchronization overhead |
| Graph Splitter | Graph split feature: static/dynamic Shape split, engine-level split, pipeline stage split, JIT incremental split |
| Static Executor | Static subgraph executor: Task Sink pre-dispatch, DavinciModel loading/execution, hybrid execution mode address refresh |
| Dynamic Executor | RT2.0 dynamic Shape executor: Lowering mechanism, ExecuteGraph, ModelV2Executor, three-subgraph lifecycle, Kernel registration system |
| Stream Allocator | Stream allocation feature: logical stream allocation, synchronization event management, physical stream split, stream activation mechanism |
| InferShape | Shape inference: OriginShape/StorageShape dual system, compile-time InferShapePass, runtime inference node, symbolic inference |
| Format Inference | Format inference: OriginFormat anchor propagation, StorageFormat automatic selection, TransData insertion optimization |
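
For readers new to graph compilers, the idea behind the Constant Folding entry above (compile-time evaluation of constant expressions) can be sketched on a toy graph. This is only an illustration under stated assumptions: the `Node` class, the `fold_constants` function, and the op names are hypothetical and are not GE or AscendIR APIs; the actual design is described in the Constant Folding document.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


# Hypothetical toy IR node; NOT the AscendIR data structure.
@dataclass(frozen=True)
class Node:
    op: str                          # "Data", "Const", "Add", or "Mul"
    inputs: Tuple[str, ...] = ()     # names of producer nodes
    value: Optional[float] = None    # set only for Const nodes


def fold_constants(graph: Dict[str, Node]) -> Dict[str, Node]:
    """Replace binary Add/Mul nodes whose inputs are all Const with a Const node."""
    folded = dict(graph)
    changed = True
    while changed:  # repeat until no further node can be folded
        changed = False
        for name, node in list(folded.items()):
            if node.op not in ("Add", "Mul"):
                continue
            operands = [folded[i] for i in node.inputs]
            if all(o.op == "Const" for o in operands):
                a, b = (o.value for o in operands)
                result = a + b if node.op == "Add" else a * b
                folded[name] = Node("Const", value=result)
                changed = True
    return folded


graph = {
    "x": Node("Data"),                       # runtime input, cannot be folded
    "c1": Node("Const", value=2.0),
    "c2": Node("Const", value=3.0),
    "sum": Node("Add", ("c1", "c2")),        # foldable: 2 + 3
    "scaled": Node("Mul", ("sum", "c1")),    # foldable once "sum" is folded
    "out": Node("Add", ("x", "scaled")),     # depends on runtime data
}
folded = fold_constants(graph)
```

In this sketch `sum` and `scaled` become `Const` nodes, while `out` stays an `Add` because one of its inputs is only known at runtime.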

Module Key Design Principles and Software Constraints

The following documents record key design constraints and development standards for features:

| Document | Description |
| --- | --- |
| Memory Module Software Constraints | Static/dynamic memory reuse, Allocator threading model, memory release timing, process exit cleanup |
| RT2 Runtime Design Principles | RT2 dynamic Shape module design principles: loading/execution rules, performance, compatibility, concurrency, debuggability |
| Graph Split Module Design Principles | Graph split module design principles: responsibility boundaries, split basis, multi-threading concurrency, debugging logs, compatibility, review checklist |
| Stream Allocator Design Principles | Static/dynamic Shape stream allocation design: Pass architecture, stream reuse, Event synchronization, stream activation mechanism |
| Static Shape Runtime Design Principles | Static Shape module design principles: performance optimization, ArgsFormat, address refresh strategy, memory management |
| Graph Foundation Structure Design Principles | Graph compilation common foundation structure design principles: independence, compatibility, observability, concurrency model, cross-platform consistency |