GE Memory Constraints Document
Static Memory Reuse
Code Location: compiler/graph/build/memory/
Constraint 1: The graph compilation module must not modify the graph during the memory reuse phase
Precise boundary:
- Scope of the graph modification prohibition: the implementation of `BlockMemAssigner::AssignMemoryWithReuse` and all functions it calls.
- Multi-threaded entry: `HybridMemAssigner::Assign` starts multiple threads that call `AssignMemoryWithReuse` concurrently.
- Explicitly prohibited: adding, modifying, or deleting attributes on the `Node`s of a `ComputeGraph`.
- Safe operations: reading attributes and OpDesc is safe (the system reads OpDesc extensively during traversal to decide the memory allocation strategy).
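A minimal sketch of the safe pattern, assuming simplified stand-in types: `Node`, `ComputeGraph`, `PlanForThread` and `PlanConcurrently` are hypothetical and are not the GE classes. Several threads traverse the same graph through a `const` reference, so they can only read node attributes, and each thread writes its result into its own slot instead of back onto a node.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins for ge::Node / ge::ComputeGraph (not the real GE types).
struct Node {
  std::string name;
  std::map<std::string, long> attrs;  // read-only during memory reuse
};

struct ComputeGraph {
  std::vector<Node> nodes;
};

// Reads attributes through a const reference only; never writes to a Node.
long PlanForThread(const ComputeGraph &graph, long alignment) {
  long total = 0;
  for (const Node &node : graph.nodes) {
    auto it = node.attrs.find("output_size");
    if (it == node.attrs.end()) {
      continue;  // no size attribute: nothing to plan for this node
    }
    // Round each output size up to the alignment used by this strategy.
    total += (it->second + alignment - 1) / alignment * alignment;
  }
  return total;
}

// Runs one planning thread per alignment over the same shared graph.
std::vector<long> PlanConcurrently(const ComputeGraph &graph,
                                   const std::vector<long> &alignments) {
  std::vector<long> results(alignments.size(), 0);
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < alignments.size(); ++i) {
    // Each thread writes only results[i]; the shared graph is never mutated.
    workers.emplace_back([&graph, &alignments, &results, i] {
      results[i] = PlanForThread(graph, alignments[i]);
    });
  }
  for (std::thread &t : workers) {
    t.join();
  }
  return results;
}
```

If a worker instead wrote an attribute onto a shared `Node`, the threads would race on the same object, which is exactly the situation this constraint rules out.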
Constraint 2: Impact analysis for dynamic multi-batch scenarios
- Code entity: `DynamicBatchMemAssigner` (dynamic_batch_mem_assigner.h)
- Meaning: the system identifies different batches through `batch_label_` (set by the user or by the GE upper framework) and supports different memory reuse strategies for different batches.
- Conflict with static memory reuse:
  - Continuous input memory in different batches is merged into one large block for alignment.
  - There is no reuse within or between batches, but an alignment strategy between batches exists, which may lower memory usage efficiency.
- Maximum split size limit: `kMaxSplitSizeForDynamicBatch` = 400 MB (dynamic_batch_mem_assigner.h)
Constraint 3: Scenarios to consider when adding new static graph memory features
| Scenario | Code marker / basis | Impact on reuse |
|---|---|---|
| Continuous memory | `continuous_block_` (block_mem_assigner.h), `ContinuousMemMng` (continuous_mem.cc) | Supports memory merging for nodes with continuous inputs; continuous memory in different batches can be merged and reused |
| Atomic centralized zero-clear | `atomic_addr_clean_id_` (block_mem_assigner.h) | Memory blocks that need atomic zero-clearing cannot be reused by other nodes; if a node has no related attribute, zero-clearing is skipped |
| Zero copy | `is_zero_copy_` (block_mem_assigner.h), `IsNodeAndPeerNodeTaskSupportZeroCopy` (block_mem_assigner.cc) | Zero-copy blocks can be reused across nodes (`IsRealSizeReuseBlock`); zero-copy memory cannot be merged (multiple user input addresses may be discontinuous) |
| Immutable address output | `is_fixed_addr_prior_` (block_mem_assigner.h) | Output addresses of constant/const/variable/fileconstant/constplaceholder operators are fixed at compile time; fixed-address-priority memory blocks can be reused, but their addresses are immutable |
| Operators not supporting address refresh | HCOM, `rtsStreamSwitchByIndex`, etc. | Input/output addresses of these operators must be stable, so they cannot use zero copy |
| P2P memory type | `RT_MEMORY_P2P_DDR` (block_mem_assigner.cc) | P2P memory cannot share a merged zero-clear with other memory types (graph_mem_assigner.cc) |
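The reuse and merge rules in the table can be condensed into a few predicates. This is a hypothetical sketch: `MemType`, `BlockTraits` and the three functions are illustrative names whose fields loosely mirror `atomic_addr_clean_id_`, `is_zero_copy_` and `RT_MEMORY_P2P_DDR`; the real decisions in block_mem_assigner.cc and graph_mem_assigner.cc check more conditions than these.

```cpp
#include <cassert>

// Hypothetical memory types: kP2pDdr models RT_MEMORY_P2P_DDR,
// kHbm stands in for ordinary device memory.
enum class MemType { kHbm, kP2pDdr };

// Hypothetical summary of the per-block flags listed in the table.
struct BlockTraits {
  bool needs_atomic_clean = false;  // cf. atomic_addr_clean_id_
  bool is_zero_copy = false;        // cf. is_zero_copy_
  MemType mem_type = MemType::kHbm;
};

// Can another node reuse this block once its current user releases it?
bool CanReuseByOtherNode(const BlockTraits &b) {
  // Blocks that need atomic zero-clearing are not handed to other nodes;
  // zero-copy blocks stay reusable.
  return !b.needs_atomic_clean;
}

// Can two blocks be merged into one larger block?
bool CanMergeBlocks(const BlockTraits &a, const BlockTraits &b) {
  // Zero-copy memory is never merged: user input addresses may be discontinuous.
  return !a.is_zero_copy && !b.is_zero_copy;
}

// Can the zero-clear operations of two blocks be merged into one?
bool CanMergeZeroClear(const BlockTraits &a, const BlockTraits &b) {
  // P2P memory must not share a merged zero-clear with any other memory type.
  bool a_p2p = a.mem_type == MemType::kP2pDdr;
  bool b_p2p = b.mem_type == MemType::kP2pDdr;
  return a_p2p == b_p2p;
}
```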
Constraint 4: Particularity of HCOM operators
- Meaning of "continuous": logically continuous, not physically continuous. The outputs of multiple HCOM operators form a logically continuous memory region, which is allocated and reused through the `ContinuousMemMng` manager (continuous_mem.cc).
- featureBaseRefreshable configuration:
  - Get method: `ge::GetContext().GetOption(ge::OPTION_FEATURE_BASE_REFRESHABLE, refreshable)` (block_mem_assigner.cc)
  - Member variable: `is_feature_map_refreshable_` (block_mem_assigner.h)
  - Default value: `false`; set to `true` when the configuration value is "1"
  - Effect: controls whether the feature map is refreshable, which affects the `IsNoNeedAssignMemory` judgment
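The "default false, only \"1\" means true" rule can be sketched as follows. `OptionSource` is a hypothetical stand-in for the GE context (the real code calls `ge::GetContext().GetOption(ge::OPTION_FEATURE_BASE_REFRESHABLE, refreshable)`), and the option key is passed in as a parameter because its exact string is not given here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the GE option context (not the GE API).
class OptionSource {
 public:
  explicit OptionSource(std::map<std::string, std::string> options)
      : options_(std::move(options)) {}

  // Returns true and fills `value` when the key is configured.
  bool GetOption(const std::string &key, std::string &value) const {
    auto it = options_.find(key);
    if (it == options_.end()) {
      return false;
    }
    value = it->second;
    return true;
  }

 private:
  std::map<std::string, std::string> options_;
};

// Default is false; only the exact string "1" turns the feature on.
bool ReadFeatureMapRefreshable(const OptionSource &ctx, const std::string &key) {
  std::string refreshable;
  if (!ctx.GetOption(key, refreshable)) {
    return false;  // option not configured: keep the default
  }
  return refreshable == "1";
}
```

Note that under this rule a value such as "true" leaves the feature disabled.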
Constraint 5: Other constraints
- PreAssign/SetOpMemOffset are not thread-safe: they can only be called from a single thread, so any other concurrent operation needs care (block_mem_assigner.h).
- Alignment strategy difference: zero-copy memory uses 32-byte alignment; all other memory uses 512-byte alignment (graph_mem_assigner.h).
- Special handling of subgraph NETOUTPUT: NETOUTPUT nodes in subgraphs cannot use zero copy (block_mem_assigner.cc).
- Multi-batch shape data node constraint: multi-batch shape data nodes do not support zero copy (block_mem_assigner.cc).
- Suspended memory block management: suspended memory blocks are released during the next node's allocation; their lifecycle is managed through `life_time_begin_` and `life_time_end_`, which cannot be modified once set (block_mem_assigner.h).
- Configurable reuse strategy: the strategy can be configured dynamically through parameters such as `use_range_`, `ascending_sort_`, `reuse_first_release_`, and `memory_priority_mode_` (block_mem_assigner.h).
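The alignment difference can be made concrete with a round-up helper. The constant and function names are hypothetical; only the 32-byte and 512-byte values come from this constraint, and the real assigner may add further padding on top.

```cpp
#include <cassert>
#include <cstddef>

// Alignment values stated in the constraint (graph_mem_assigner.h);
// the constant names themselves are hypothetical.
constexpr std::size_t kZeroCopyAlign = 32;
constexpr std::size_t kDefaultAlign = 512;

// Round size up to the next multiple of align (align must be non-zero).
constexpr std::size_t AlignUp(std::size_t size, std::size_t align) {
  return (size + align - 1) / align * align;
}

// Hypothetical helper: choose the alignment by whether the block is zero copy.
constexpr std::size_t AlignedBlockSize(std::size_t size, bool is_zero_copy) {
  return AlignUp(size, is_zero_copy ? kZeroCopyAlign : kDefaultAlign);
}
```

With these values, a 100-byte output occupies 128 bytes when it is zero copy and 512 bytes otherwise.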
Dynamic Memory Reuse
Code Location:
- v2 layer: runtime/v2/kernel/memory/allocator/ (ScalableAllocator, MemoryPool)
- v1 layer: runtime/v1/graph/manager/active_memory_allocator.h (ActiveMemoryAllocator, ExpandableActiveMemoryAllocator, PhysicalMemoryAllocator)
- Bridge layer: runtime/v2/kernel/memory/device/device_allocator.h (DeviceAllocator)
Constraint 1: ScalableAllocator does not support multi-threaded concurrency
- Code location: runtime/v2/kernel/memory/allocator/scalable_allocator.h
- Basis of the lock-free design: the class has no internal `std::mutex` or `std::recursive_mutex`; it only has `static std::atomic_size_t global_allocator_id_` for generating unique IDs (scalable_allocator.h).
- How safety is guaranteed: the `aclmdlExecute` calling constraint requires single-threaded calls (see docs/graph_engine_api/aclmdlExecute.md for details), and the underlying allocator guarantees thread safety through `std::recursive_mutex`.
- Lock protection in the underlying layer: the v1 layer's `PhysicalMemoryAllocator` uses `std::recursive_mutex` (active_memory_allocator.h), and `ExpandableActiveMemoryAllocatorImp` also uses `std::recursive_mutex` (active_memory_allocator.h).
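The locking split can be illustrated with a hypothetical class (`IdOnlyAllocator` is an illustrative name, not the real `ScalableAllocator`): a static atomic counter is the only state that is safe to touch from several threads, while the per-instance bookkeeping is deliberately unlocked and relies on the single-threaded calling constraint.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical allocator: the only thread-safe state is an atomic counter that
// hands out unique IDs, mirroring `static std::atomic_size_t global_allocator_id_`.
class IdOnlyAllocator {
 public:
  IdOnlyAllocator() : id_(global_allocator_id_.fetch_add(1)) {}
  std::size_t Id() const { return id_; }

  // Per-instance state has no lock: the same instance must not be used
  // from several threads at once.
  void RecordAlloc(std::size_t bytes) { in_use_ += bytes; }
  void RecordFree(std::size_t bytes) { in_use_ -= bytes; }
  std::size_t InUse() const { return in_use_; }

 private:
  static std::atomic_size_t global_allocator_id_;
  std::size_t id_;
  std::size_t in_use_ = 0;
};

std::atomic_size_t IdOnlyAllocator::global_allocator_id_{0};
```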
Constraint 2: ActiveMemoryAllocator/ExpandableActiveMemoryAllocator/PhysicalMemoryAllocator support multi-threading
- Thread safety mechanism: `std::recursive_mutex` protects shared resources.
- Requirements for new code: acquire the lock whenever accessing shared resources, following the existing lock usage patterns.
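A hypothetical pool (`LockedPool` is an illustrative name) showing the lock pattern: every public method that touches shared state takes the same `std::recursive_mutex`, and because the mutex is recursive, one locked method can call another on the same thread without deadlocking.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical pool: all shared state is guarded by one recursive mutex.
class LockedPool {
 public:
  void Allocate(std::size_t bytes) {
    std::lock_guard<std::recursive_mutex> lock(mutex_);
    used_ += bytes;
  }

  // Calls Allocate() while already holding the lock; a recursive_mutex lets
  // the same thread re-acquire it instead of deadlocking.
  void AllocateBatch(std::size_t bytes, int count) {
    std::lock_guard<std::recursive_mutex> lock(mutex_);
    for (int i = 0; i < count; ++i) {
      Allocate(bytes);
    }
  }

  std::size_t Used() const {
    std::lock_guard<std::recursive_mutex> lock(mutex_);
    return used_;
  }

 private:
  mutable std::recursive_mutex mutex_;
  std::size_t used_ = 0;
};
```

A plain `std::mutex` would deadlock in `AllocateBatch`, which is one reason the existing classes use the recursive variant.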
Memory Management
Code Location: runtime/v2/kernel/memory/ (excluding the allocator/ subdirectory)
Constraint 1: Device id correctness
- Code location: memory_kernel.cc uses `aclrtGetDevice` to get device_id.
- Requirement: when calling rts interfaces, explicitly pass the correct device id instead of relying on the default parameter (which is 0), and verify the change with multi-device test cases.
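A hypothetical illustration of the default-parameter pitfall; `FakeRtsAllocOnDevice` is not a real rts interface, it only reports which device a call would target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical rts-style call whose device id defaults to 0 (not a real API).
int32_t FakeRtsAllocOnDevice(uint64_t size, int32_t device_id = 0) {
  (void)size;        // a real call would allocate `size` bytes on `device_id`
  return device_id;  // report which device would have been used
}

// Risky: silently targets device 0 even when the current device is another one.
int32_t AllocWithDefault(uint64_t size) { return FakeRtsAllocOnDevice(size); }

// Preferred: pass the id of the current device explicitly
// (memory_kernel.cc obtains it with aclrtGetDevice).
int32_t AllocOnCurrentDevice(uint64_t size, int32_t current_device_id) {
  return FakeRtsAllocOnDevice(size, current_device_id);
}
```

On a single-device machine both paths behave the same, which is why only a multi-device test case exposes the bug.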
Constraint 2: Memory release timing
- Order: First stream synchronization → then release memory → finally destroy device
- Code association: in caching_mem_allocator.cc, the `AllocateWithTryRecycle` method ensures synchronization before release.
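A hypothetical model of the synchronize-then-release rule. `CachingPoolModel` and `AllocateOrRecycle` are illustrative names loosely inspired by the idea behind `AllocateWithTryRecycle`, not its actual logic: freed blocks stay cached until the stream is synchronized, and only then are they recycled before a retry. Device destruction is outside the model; it would come last, after all memory has been released.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical caching allocator with a fixed capacity. Freed blocks stay
// cached because kernels queued on the stream may still be reading them.
class CachingPoolModel {
 public:
  explicit CachingPoolModel(std::size_t capacity) : capacity_(capacity) {}

  // On failure: synchronize first, then recycle cached blocks, then retry once.
  bool AllocateOrRecycle(std::size_t bytes) {
    if (TryAllocate(bytes)) {
      return true;
    }
    SynchronizeStream();
    RecycleCached();
    return TryAllocate(bytes);
  }

  void Free(std::size_t bytes) {
    in_use_ -= bytes;
    cached_ += bytes;  // not returned to the device yet
  }

  bool synchronized() const { return synchronized_; }

 private:
  bool TryAllocate(std::size_t bytes) {
    if (in_use_ + cached_ + bytes > capacity_) {
      return false;
    }
    in_use_ += bytes;
    synchronized_ = false;  // new work may now be queued on the stream
    return true;
  }

  // Stand-in for a real stream synchronization call.
  void SynchronizeStream() { synchronized_ = true; }

  void RecycleCached() {
    assert(synchronized_ && "release must follow stream synchronization");
    cached_ = 0;
  }

  std::size_t capacity_;
  std::size_t in_use_ = 0;
  std::size_t cached_ = 0;
  bool synchronized_ = true;
};
```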
Constraint 3: Virtual memory compatibility design
- Purpose of rtReserveMemAddress: reserves virtual addresses, used to pre-allocate address space for dynamic shapes.
- Actual call location: runtime/v1/graph/manager/active_memory_allocator.cc
- Fallback path: when `rtReserveMemAddress` fails, virtual address reservation is marked as unsupported and allocation falls back to physical address mode (runtime/v1/graph/manager/active_memory_allocator.cc, log message "Maybe not support rtReserveMemAddress.").
- Requirement: the business flow must proceed normally on this path, with no ERROR logs.
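A hypothetical sketch of the fallback path: `AddressSpaceModel` and `LogRecord` are illustrative, and the `reserve` callback stands in for `rtReserveMemAddress`. The INFO level chosen for the fallback message is an assumption; the constraint only requires that this path emit no ERROR logs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical log record so the sketch can show which level is used.
struct LogRecord {
  std::string level;
  std::string message;
};

// Hypothetical allocator front end; `reserve` models a virtual-address
// reservation call and returns true on success.
class AddressSpaceModel {
 public:
  explicit AddressSpaceModel(std::function<bool(std::size_t)> reserve)
      : reserve_(std::move(reserve)) {}

  // Returns true when virtual reservation is used, false when falling back
  // to physical allocation. Neither outcome is an error for the business flow.
  bool Init(std::size_t bytes) {
    if (reserve_(bytes)) {
      support_virtual_ = true;
      return true;
    }
    support_virtual_ = false;  // remember the result; do not retry virtual mode
    logs_.push_back({"INFO", "Maybe not support rtReserveMemAddress."});
    return false;
  }

  bool support_virtual() const { return support_virtual_; }
  const std::vector<LogRecord> &logs() const { return logs_; }

 private:
  std::function<bool(std::size_t)> reserve_;
  bool support_virtual_ = false;
  std::vector<LogRecord> logs_;
};
```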