GE Memory Constraints Document

Static Memory Reuse

Code Location: compiler/graph/build/memory/

Precise boundary:

Scope of graph modification prohibition: BlockMemAssigner::AssignMemoryWithReuse implementation and all functions it triggers
Multi-threaded entry: HybridMemAssigner::Assign starts multiple threads, concurrently calling AssignMemoryWithReuse
Explicitly prohibited: Adding, modifying, deleting attributes on ComputeGraph's Node
Safe operations: Reading attributes/OpDesc is safe (system reads OpDesc extensively during traversal to judge memory allocation strategy)

Code entity: DynamicBatchMemAssigner (dynamic_batch_mem_assigner.h)
Meaning: System identifies different batches through batch_label_ (set by user or GE upper framework), supports different memory reuse strategies between different batches
Conflict with static memory reuse:
- Continuous input memory in different batches will be merged into one large block for alignment
- No reuse within/between batches, but alignment strategy between batches exists, leading to possibly lower memory usage efficiency
- Maximum split size limit: kMaxSplitSizeForDynamicBatch = 400MB (dynamic_batch_mem_assigner.h)

Scenario	Code marker/basis	Impact on reuse
Continuous memory	`continuous_block_` (`block_mem_assigner.h`), `ContinuousMemMng` (`continuous_mem.cc`)	Supports memory merging for continuous input nodes; continuous memory in different batches can merge and reuse
Atomic centralized zero-clear	`atomic_addr_clean_id_` (`block_mem_assigner.h`)	Memory blocks needing atomic zero-clear cannot be reused by other nodes; if node has no related attribute, skip zero-clear
Zero copy	`is_zero_copy_` (`block_mem_assigner.h`), `IsNodeAndPeerNodeTaskSupportZeroCopy` (`block_mem_assigner.cc`)	Zero copy blocks can be reused across nodes (`IsRealSizeReuseBlock`); zero copy memory cannot merge (multiple user input addresses may be discontinuous)
Immutable address output	`is_fixed_addr_prior_` (`block_mem_assigner.h`)	Output addresses of constant/const/variable/fileconstant/constplaceholder type operators are fixed at compile time, fixed address priority memory blocks can be reused but addresses are immutable
Operators not supporting address refresh	HCOM/rtsStreamSwitchByIndex etc	Input/output addresses of these operators must be stable, cannot use zero copy
P2P memory type	`RT_MEMORY_P2P_DDR` (`block_mem_assigner.cc`)	P2P memory cannot merge zero-clear with other memory types (`graph_mem_assigner.cc`)

Meaning of "continuous": Logically continuous, not physically continuous. Outputs of multiple HCOM operators form continuous memory region logically, managed through ContinuousMemMng manager for allocation and reuse. — continuous_mem.cc
featureBaseRefreshable configuration:
- Get method: ge::GetContext().GetOption(ge::OPTION_FEATURE_BASE_REFRESHABLE, refreshable) — block_mem_assigner.cc
- Member variable: is_feature_map_refreshable_ (block_mem_assigner.h)
- Default value: false, set to true when configuration value is "1"
- Effect: Controls whether feature map is refreshable, affects IsNoNeedAssignMemory judgment

PreAssign/SetOpMemOffset not thread-safe: Can only be called by single thread, other concurrent operations need attention. — block_mem_assigner.h
Alignment strategy difference: Zero copy memory uses 32-byte alignment, others use 512-byte alignment. — graph_mem_assigner.h
Subgraph NETOUTPUT special handling: NETOUTPUT nodes in subgraphs cannot perform zero copy. — block_mem_assigner.cc
Multi-batch shape data node constraint: Multi-batch shape data nodes do not support zero copy. — block_mem_assigner.cc
Suspended memory block management: Suspended memory blocks are released during next node allocation, lifecycle managed through life_time_begin_ and life_time_end_, cannot be modified once set. — block_mem_assigner.h
Reuse strategy configurability: Supports dynamic configuration through parameters like use_range_, ascending_sort_, reuse_first_release_, memory_priority_mode_. — block_mem_assigner.h

Code Location:

v2 layer: runtime/v2/kernel/memory/allocator/ (ScalableAllocator, MemoryPool)
v1 layer: runtime/v1/graph/manager/active_memory_allocator.h (ActiveMemoryAllocator, ExpandableActiveMemoryAllocator, PhysicalMemoryAllocator)
Bridge layer: runtime/v2/kernel/memory/device/device_allocator.h (DeviceAllocator)

Code location: runtime/v2/kernel/memory/allocator/scalable_allocator.h
Lock-free design basis: Class internally has no std::mutex or std::recursive_mutex, only static std::atomic_size_t global_allocator_id_ for generating unique ID (scalable_allocator.h)
Safety guarantee method: Guaranteed by aclmdlExecute call constraint for single-threaded calling (see docs/graph_engine_api/aclmdlExecute.md for details), underlying allocator guarantees thread safety through recursive_mutex
Underlying has lock protection: v1 layer's PhysicalMemoryAllocator uses std::recursive_mutex (active_memory_allocator.h), ExpandableActiveMemoryAllocatorImp also uses std::recursive_mutex (active_memory_allocator.h)

Thread safety mechanism: Uses std::recursive_mutex to protect shared resources
New code requirements: Must lock when accessing shared resources, follow existing lock usage patterns

Code Location: runtime/v2/kernel/memory/ (excluding allocator subdirectory)

Code location: memory_kernel.cc uses aclrtGetDevice to get device_id
Requirement: When calling rts interfaces, device id must be explicitly passed correct value, avoid using default parameters (default is 0), need to verify multi-device scenario test cases

Order: First stream synchronization → then release memory → finally destroy device
Code association: caching_mem_allocator.cc, AllocateWithTryRecycle method ensures synchronization before release

rtReserveMemAddress purpose: Virtual address reservation, used for dynamic shape pre-allocated address space
Actual call location: runtime/v1/graph/manager/active_memory_allocator.cc
Fallback path: When rtReserveMemAddress fails, mark as not supporting virtual address reservation, fallback to physical address allocation mode. — runtime/v1/graph/manager/active_memory_allocator.cc ("Maybe not support rtReserveMemAddress.")
Requirement: Need to ensure business flow is normal, no ERROR logs