GE Pass Python Implementation V1 Design Document
1. Background
GE currently has two basic capabilities directly related to this requirement:
- A custom pass loading chain already exists: GE discovers and dlopens pass libraries through opp/vendors/*/custom_fusion_passes/*.so.
- The Python side already has the ge.es graph construction capability and the ge.graph basic graph interface.
The goal of this design is to introduce formal Python pass development capability without overturning the existing GE pass execution framework, so that users can both quickly develop and debug locally and distribute passes as standard Python packages to teams.
2. Goals and Scope
2.1 Goals
- Support users in developing GE passes using Python.
- Reuse the existing GE pass execution chain; do not add a second pass scheduling framework.
- Reuse the existing ge.es Python graph construction capability; do not redesign Python ES.
- First support environment-variable-driven development mode integration, then add release mode auto-discovery.
- Reserve extension points so that a subsequent Python ATC entry can reuse the same pass registration and discovery protocol.
2.2 V1 Scope
- Support three types of passes:
  - FusionBasePass
  - PatternFusionPass
  - DecomposePass
- The bring-up and independent bridge separation phase still uses FusionBasePass regression as the minimum verification chain; after the formal wrapper is implemented, the first complete acceptance target shifts to PatternFusionPass, with DecomposePass added afterwards, still within the V1 scope of this design document.
- The discovery mechanism for the current phase is first converged to:
  - Environment variable ASCEND_GE_PY_PASS_PATH
- A subsequent phase will add:
  - entry_points(group="ge.passes.plugins")
- The current code has converged on the environment variable as the main path, with entry_points as a capability to be added in a subsequent phase.
- Complete the minimum graph interface capabilities required for writing Python passes.
- The current repository directly outputs the ge_py main wheel and multi-version native sub-wheels.
- The existing 9 samples in examples/fusion_pass are planned to receive Python comparison versions.
- REGISTER_CUSTOM_PASS needs to be supported, but not as the first main chain target; it is placed in a subsequent extension phase and implemented by reusing the same bridge / registry / session mechanism.
- Current priority rectification items:
  - The discovery mechanism is first converged to the environment variable ASCEND_GE_PY_PASS_PATH
  - entry_points is to be added later
  - python_pass_bootstrap_test.py is migrated to tests/ge/ut/ge/graph/pyge_tests/ and connected to the current frontend script
  - New files uniformly use the year 2026; existing old files are not batch-updated
  - The first priority is the goal "formal sample of FusionBasePass passing end-to-end via environment variable"; all capabilities involved in this goal must be formally completed, and capabilities not involved can be deferred
  - After phase 2 closure, the formal Python forms of PassContext / MatchResult / Pattern / PatternMatcherConfig uniformly depend directly on _ge_pass_native.so, and the bring-up phase compatibility shim is no longer retained
2.3 Non-Goals
- V1 does not force users to package passes into whl.
- The first batch of V1 does not take the legacy REGISTER_CUSTOM_PASS system as the main Python implementation object, prioritizing coverage of the three pass types of the PassRegistry system; however, the architecture and documentation need to reserve the capability for subsequent integration.
- V1 does not create a second Python-only pass executor.
3. User Experience Design
3.1 Development Mode
After users install the ge_py provided by CANN, they only need to write ordinary .py files or ordinary Python packages and tell GE/ATC via environment variables:
ASCEND_GE_PY_PASS_PATH=/abs/path1:/abs/path2
The current phase focuses on stabilizing this path first, not requiring users to write their own wheel packaging logic, nor requiring users to understand entry_points.
3.2 Release Mode (Subsequent Phase)
When users need team sharing, version freezing, and auto-discovery, they can make passes into independent Python packages and declare:
entry_points = {"ge.passes.plugins": [...]}
GE automatically discovers installed pass plugin packages at runtime.
3.3 Notes
In the current phase, the environment variable is the main path, not a fallback:
ASCEND_GE_PY_PASS_PATH=/abs/path1:/abs/path2
entry_points auto-discovery is to be added later.
4. Overall Architecture
4.1 Architecture Principles
- GE collects pass plugin loading through a unified upper-level loader during the initialization phase.
- Legacy custom passes continue to use the existing .so + dlopen mechanism.
- From a long-term productization and extensibility perspective, the Python pass bridge should be designed as an "independent internal bridge .so" from the beginning, rather than compiled directly into ge_compiler.so.
- This still does not adopt the "bridge .so discovered and loaded by custom_fusion_passes as a pass plugin" approach; the recommendation is a private bridge .so explicitly loaded by the GE internal loader.
- The design goal is to keep ge_compiler.so as Python-ABI-neutral as possible, retaining only the stable pass runtime, registry, and adapter protocols; all logic that depends directly on Python.h / pybind11 / libpython should be converged into a replaceable independent bridge .so.
- This way, whether going through pre-compilation or fallback codegen, the replacement target is the independent bridge .so, not the ge_compiler.so in the run package.
- The Python side uniformly manages plugin discovery, module import, and the registry through ge.passes.bootstrap.
- The C++ side only cares about "getting an executable pass descriptor and registering it to PassRegistry", not about where the user's Python files are located.
4.2 Core Components
- PassPluginLoader
  - Unified pass plugin loading entry at the compiler layer
  - Internally calls the legacy CustomPassHelper::Load() and the Python pass registration logic in a uniform way
  - Maintains "one closed call entry", while not putting Python logic back into graph_metadef/register
- ge.passes.bootstrap
  - Python-side unified discovery entry
  - Currently prioritizes environment variable discovery, with entry_points to be added later
- ge.passes.registry
  - Python-side registry
  - Responsible for storing pass metadata, class objects, stages, types, and additional parameters
- ge.passes._bridge
  - Protocol layer between Python and the C++ bridge
  - Responsible for normalizing Python registry objects into data structures consumable by C++
- _ge_pass_native
  - Python helper module exported by PYBIND11_MODULE
  - Only carries native-backed wrappers and helpers such as Graph / PassContext / MatchResult
  - Does not carry the user-inheritable pass base classes FusionBasePass / PatternFusionPass / DecomposePass
- C++ pass adapter
  - Provides corresponding C++ adapter classes for the three types of Python passes
  - Calls back into Python object methods from the adapter classes
- Independent bridge .so loaded as a pass plugin via dlopen
  - The current main approach has explicitly abandoned this
  - Any description in the document based on "adding a new bridge .so discovered by custom_fusion_passes" should be understood as "unified loader + private internal bridge .so", not a new pass plugin discovery chain
- Private internal bridge .so
  - This is the recommended formal direction in this design, not an optional optimization
  - It is not a pass plugin and does not participate in custom_fusion_passes discovery; it carries the complete Python-version-sensitive bridge logic, including interpreter initialization, GIL, object conversion, exception translation, and Python callbacks
  - During formal delivery, it forms one bridge artifact set together with _ge_pass_native.so; both need to be included in pre-compilation and fallback management
4.3 Native Binding Strategy
V1 adopts a "two-layer binding" strategy, rather than migrating all Python interfaces to the same binding method at once:
- ge.graph / ge.es
  - Continue to reuse the existing C wrapper + ctypes approach in the repository
  - This minimizes changes and prioritizes reuse of the existing Python graph interfaces and eager-style graph construction capabilities
- Python pass bridge / adapter layer
  - Uses pybind11 as the core implementation strategy
  - The reason is that this part needs to handle MatchResult wrapping, Python object lifecycle, exception translation, and GIL management more naturally
- Version release strategy
  - The main wheel is responsible for Python code, discovery logic, and runtime selection logic
  - cp39-cp312 provide pre-compiled pybind11 native sub-wheels
  - When no pre-compiled version matches, runtime fallback codegen serves as the final fallback
That is, V1 is not a "full pybind migration", but "graph interfaces continue with the existing ctypes/C wrapper, and the pass bridge adopts pybind11".
4.4 pybind11 Usage Method Selection
pybind11 has two typical usage modes in this solution:
- embed mode
  - The Python interpreter is initialized by the C++ process, which imports Python modules and calls back into Python objects
- extension mode
  - C++ capabilities are exported through PYBIND11_MODULE as native modules that Python can import directly
The existing compiler/graph/fusion/pass/python_fusion_base_pass_pybind_bridge.cc uses embed mode, not extension mode. The reasons are:
- The current main direction is "the GE compiler calls Python passes", not "Python actively imports a C++ pass runtime and then drives the compiler in reverse"
- The first stage of FusionBasePass only needs to let C++ safely create Python objects and call their run(), without first exposing C++ base classes to Python for inheritance
- embed mode can first reuse the existing pure Python organization of ge.passes.bootstrap / registry / _bridge, reducing the cost of the first implementation
Therefore, the current bridge file does not contain the PYBIND11_MODULE macro; this is not a missing feature but an intentional difference in mode.
From a long-term design perspective, however, the embed bridge should not continue to be compiled directly into ge_compiler.so. The current workspace has already split it into an independent libge_python_pass_bridge.so, which is also the formal boundary that should be maintained going forward. A more reasonable form is:
- ge_compiler.so
  - Only holds the stable loader, descriptor / adapter protocols, and a minimal C/C++ interaction surface
- Independent internal bridge .so
  - Adopts embed or extension mode, whichever suits the specific implementation better
  - Undertakes all Python-version-sensitive native logic
- _ge_pass_native.so
  - Serves as a helper extension that Python can import directly
  - Provides native-backed wrappers and helpers for the Python layer, but does not define user-inheritable pass base classes
It should be further emphasized that the current solution has two clear boundaries that must not be mixed:
- User pass definition layer
  - Remains pure Python; users inherit from ge.passes.base.FusionBasePass / PatternFusionPass / DecomposePass
- Native helper / wrapper layer
  - _ge_pass_native.so provides wrappers such as Graph / PassContext / MatchResult
  - libge_python_pass_bridge.so provides the runtime bridging for the embed path
That is, this solution neither requires nor recommends exposing the C++ FusionBasePass / PatternFusionPass base classes to Python for inheritance through PYBIND11_MODULE.
It should be noted in particular that Python version sensitivity is not limited to PYBIND11_MODULE. As long as native code directly depends on any of:
- Python.h
- pybind11
- libpython
it naturally has binary coupling with the Python minor version, whether in embed or extension mode. PYBIND11_MODULE is just the export entry for extension mode, not the root cause of version sensitivity.
9. bootstrap discovers and imports user pass modules
10. User pass modules register passes to ge.passes.registry through decorators
11. The bridge reads the registry and dynamically registers descriptors back to PassRegistry through the registrar callback
The corresponding call relationship can be simplified to the following timeline:
PassPluginLoader / ge_compiler.so
-> python_fusion_base_pass_bridge_loader.cc
-> dlopen(libge_python_pass_bridge.so)
-> GeGetPythonFusionBasePassBridgeApi()
-> register_fusion_base_passes(registrar)
-> ge.passes._bridge.load_and_get_pass_descriptors()
-> ge.passes.bootstrap.load_pass_plugins()
-> registrar.register_pass(pass_desc, callbacks)
-> RegisterPythonFusionBasePass(...)
-> PassRegistry / runtime registry
Where:
- registrar is constructed by the loader and represents "how to register a descriptor back to the compiler"
- bridge is responsible for discovering Python passes and, after obtaining the descriptors, calling back into registrar
- RegisterPythonFusionBasePass(...) is the actual landing point that attaches the descriptor, callbacks, and creator back to the compiler-side registration center
In the early stage of the Phase 1 split, libge_python_pass_bridge.so can depend only on the pure Python bootstrap / _bridge protocol to complete minimum integration and debugging. The formal approach after Phase 2 closure requires _ge_pass_native.so to be in place at the same time, with the bridge and the Python API running on the same set of native helpers by default.
5.2 Execution Phase
- FusionPassExecutor gets passes from PassRegistry according to the existing flow
- If a pass is a Python pass, what is actually created is the corresponding C++ adapter instance
- The adapter calls back into the Python pass object in Run or the related phase
- The Python pass reads and writes the graph and builds the replacement graph through the ge.graph and ge.es interfaces
- The return value is mapped to a GE Status
6. Python Public Interface Design
6.1 Package Structure
Add a new ge.passes package that provides the following public interfaces:
- FusionBasePass
- PatternFusionPass
- DecomposePass
- register_fusion_pass
- register_decompose_pass
- PassStage
- PassContext
- Pattern
- NodeIo
- MatchResult
- PatternMatcherConfig
- PatternMatcherConfigBuilder
- capture_tensor
- create_pattern
- create_replacement
- load_pass_plugins
- get_registered_passes
6.2 Method Style
Python-style naming is used by default.
6.3 Registration Interface
The suggested form is as follows:
from ge.passes import FusionBasePass, PassStage, register_fusion_pass

# A FusionBasePass only implements run(graph, context).
@register_fusion_pass(name="ConvFormatPass", stage=PassStage.BEFORE_INFER_SHAPE)
class ConvFormatPass(FusionBasePass):
    def run(self, graph, context):
        return 0

from ge.passes import DecomposePass, PassStage, register_decompose_pass

# A DecomposePass implements the two hooks and declares its op_types.
@register_decompose_pass(
    name="DecomposeGroupedConv",
    stage=PassStage.AFTER_INFER_SHAPE,
    op_types=["Conv2D"],
)
class DecomposeGroupedConv(DecomposePass):
    def meet_requirements(self, node):
        return True

    def replacement(self, node):
        ...
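The decorator form above could be backed by a registry along the following lines. This is a minimal sketch rather than the ge.passes implementation: the _REGISTRY dict, the PassEntry record, and the duplicate-name check are assumptions made for illustration, and PassStage here is a stand-in enum with made-up values.

```python
import enum
from dataclasses import dataclass, field


class PassStage(enum.Enum):
    # Stand-in for ge.passes.PassStage; the values are illustrative only.
    BEFORE_INFER_SHAPE = "before_infer_shape"
    AFTER_INFER_SHAPE = "after_infer_shape"


@dataclass(frozen=True)
class PassEntry:
    # Hypothetical record kept for each registered pass.
    name: str
    stage: PassStage
    kind: str
    cls: type
    op_types: tuple = field(default=())


_REGISTRY = {}  # hypothetical module-level registry keyed by pass name


def _register(kind, name, stage, op_types=()):
    def decorator(cls):
        if name in _REGISTRY:
            raise ValueError(f"pass {name!r} is already registered")
        _REGISTRY[name] = PassEntry(name, stage, kind, cls, tuple(op_types))
        return cls  # the decorated class is returned unchanged
    return decorator


def register_fusion_pass(name, stage):
    return _register("fusion", name, stage)


def register_decompose_pass(name, stage, op_types):
    return _register("decompose", name, stage, op_types)


def get_registered_passes():
    # Entries come back in registration order (dicts preserve insertion order).
    return list(_REGISTRY.values())
```

Because registration happens as a side effect of class definition, importing a user module is enough to populate the registry, which is what lets bootstrap treat "import" and "register" as one step.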
6.4 Python-side Return Value Convention
- run returns StatusLike
- meet_requirements returns bool
- patterns returns list[Pattern | Graph]
- replacement returns Graph
- To skip the current match, meet_requirements must return False; returning None from replacement to express "abandon replacement" is not supported
Where StatusLike is uniformly converted to a GE Status in the Python layer.
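The StatusLike conversion can be pictured with a small normalization helper. The sketch below is only an illustration under stated assumptions: the numeric values of SUCCESS and FAILED are placeholders, and the choices "None means success" and "False means failure" are assumptions, since this document does not define the mapping.

```python
SUCCESS = 0  # assumed numeric value of the GE success status
FAILED = 1   # placeholder failure code; the real GE value may differ


def to_status(value):
    """Normalize a StatusLike return value (None / bool / int) to an int status."""
    if value is None:
        return SUCCESS  # assumption: no explicit return value means success
    if isinstance(value, bool):
        # bool is tested before int because bool is a subclass of int.
        return SUCCESS if value else FAILED
    if isinstance(value, int):
        return value  # an int is passed through as the status code
    raise TypeError(f"unsupported status value: {type(value).__name__}")
```

Rejecting every other type with TypeError keeps a forgotten "return graph" from being silently interpreted as a status.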
7. Discovery Mechanism Design
7.1 Unified Entry
The Python side uniformly provides:
- ge.passes.bootstrap.load_pass_plugins()
- ge.passes.bootstrap.get_registered_passes()
Before each round of loading, the bridge registration chain has C++ first refresh ASCEND_GE_PY_PASS_PATH in Python's os.environ according to the current process environment, so that the environment cached in the resident Python interpreter does not affect the next round of pass discovery.
7.2 Discovery Priority
Current phase priority converges to:
- Environment variable ASCEND_GE_PY_PASS_PATH
Subsequent phase will add:
entry_points(group="ge.passes.plugins")
7.3 Environment Variable Mode
- ASCEND_GE_PY_PASS_PATH supports multiple directories, separated by :
- A directory may contain single-file modules or ordinary Python packages
- bootstrap is responsible for temporarily adding these directories to sys.path
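The three rules above can be sketched as follows. This is a hedged illustration, not the ge.passes.bootstrap code: the helper names, the de-duplication of repeated directories, and the rule of skipping files whose names start with an underscore are all assumptions.

```python
import contextlib
import os
import sys

ENV_NAME = "ASCEND_GE_PY_PASS_PATH"


def parse_pass_paths(raw):
    """Split a ':'-separated value into unique, non-empty directory entries."""
    seen, result = set(), []
    for item in raw.split(":"):
        item = item.strip()
        if item and item not in seen:
            seen.add(item)
            result.append(item)
    return result


@contextlib.contextmanager
def temporary_sys_path(directories):
    """Prepend directories to sys.path and restore the original list on exit."""
    saved = list(sys.path)
    sys.path[:0] = directories
    try:
        yield
    finally:
        sys.path[:] = saved


def discover_module_names(directory):
    """List importable top-level names: single-file modules and packages."""
    names = []
    for entry in sorted(os.listdir(directory)):
        full = os.path.join(directory, entry)
        if entry.endswith(".py") and not entry.startswith("_"):
            names.append(entry[:-3])
        elif os.path.isfile(os.path.join(full, "__init__.py")):
            names.append(entry)
    return names
```

A loader would typically call parse_pass_paths(os.environ.get(ENV_NAME, "")), then import each discovered name with importlib.import_module inside the temporary_sys_path block.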
7.4 entry_points Mode (Subsequent Phase)
- The group is fixed to ge.passes.plugins
- The value can point to a module path, or to a callable that returns a module
- After import, the module completes registration through decorators
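For the subsequent phase, discovery through entry_points could look roughly like this. It is a sketch built on the standard importlib.metadata API; the function names and the convention "if the loaded object is callable, call it to obtain the module" are assumptions based on the second bullet above.

```python
import sys
from importlib import metadata

GROUP = "ge.passes.plugins"


def iter_plugin_entry_points(group=GROUP):
    """Return the installed entry points declared under the given group."""
    if sys.version_info >= (3, 10):
        return list(metadata.entry_points(group=group))
    # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
    return list(metadata.entry_points().get(group, []))


def load_entry_point_plugins(group=GROUP):
    """Load each entry point; importing the module triggers decorator registration."""
    loaded = []
    for ep in iter_plugin_entry_points(group):
        target = ep.load()  # imports the module or resolves the referenced object
        if callable(target):
            target = target()  # assumed convention: the callable returns the module
        loaded.append((ep.name, target))
    return loaded
```

The version check is there because the group keyword of entry_points() only exists from Python 3.10 on, while the pre-compiled sub-wheels in this design start at cp39.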
8. Three Types Pass Bridge Design
8.1 FusionBasePass
This is the most direct type. The C++ adapter calls the Python object:
- run(graph, context)
- The return value is constrained to three kinds of status values: None / bool / int
- In the formal pass contract, context is always a PassContext
- Only the direct bridge / pytest auxiliary entry in _bridge.py allows passing None
This type is brought up first, as the minimum closed loop of the whole chain.
8.2 PatternFusionPass
This type continues to reuse the existing C++ PatternMatcher mechanism. The Python side is only responsible for:
- Providing the pattern graph
- Judging, based on MatchResult, whether the conditions are met
- Constructing the replacement graph
This means the C++ side needs a Python adapter that inherits from PatternFusionPass and calls back into Python at the following points:
- Patterns()
- MeetRequirements()
- Replacement()
There is an explicit design constraint here:
- The Python user class is not required to directly inherit from a C++ PatternFusionPass exposed through PYBIND11_MODULE
- Users continue to inherit from the pure Python ge.passes.base.PatternFusionPass
- The responsibility of reusing the public Run() flow of the C++ base class is placed on PythonPatternFusionPassAdapter, not on the Python user class
- Python subclasses are forbidden from overriding run(); if run() is overridden by mistake, the base class raises TypeError directly at class definition time, avoiding the ambiguity of "implemented but never called"
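The last constraint, rejecting a run() override at class definition time, maps naturally onto __init_subclass__. The following is a minimal sketch assuming a pure Python base class; it is not the actual ge.passes.base.PatternFusionPass, and the default hook bodies are placeholders.

```python
class PatternFusionPass:
    """Pure Python base class sketch; hook names follow this document."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs when the subclass body has been evaluated, before any instance exists.
        if "run" in cls.__dict__:
            raise TypeError(
                f"{cls.__name__} must not override run(); implement "
                "patterns(), meet_requirements() and replacement() instead"
            )

    def patterns(self):
        raise NotImplementedError

    def meet_requirements(self, match_result):
        return True

    def replacement(self, match_result):
        raise NotImplementedError
```

Checking cls.__dict__ rather than hasattr means only a run defined directly in that subclass body triggers the error, so the check fires once, at the class that introduces the mistake.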
The recommended form is:
- PythonPatternFusionPassAdapter : public PatternFusionPass
- The adapter overrides Patterns() / MeetRequirements() / Replacement()
- The overriding functions call back into the Python pass instance internally
- Run() directly reuses the existing C++ PatternFusionPass::Run()
The reasons for choosing this approach are:
- It maximizes reuse of the existing C++ PatternMatcher, rewrite, statistics, and error handling logic
- It does not force the Python user environment to first successfully import a native C++ base class module, keeping the pure Python API of ge.passes usable
- It lets the three pass types FusionBasePass, PatternFusionPass, and DecomposePass keep a unified style on the user side
- It converges Python-version-sensitive issues into the adapter / wrapper / native bridge layer as far as possible, rather than spreading them to the definition layer of user pass base classes
8.3 DecomposePass
This type continues to reuse the existing DecomposePass semantics. The Python side is only responsible for:
- MeetRequirements(const GNode &)
- Replacement(const GNode &)
The op_types information needs to be retained at construction time.
As with PatternFusionPass, the Python user class does not take over the Run() main flow directly, but only implements the hooks:
- meet_requirements(node) -> bool
- replacement(node) -> Graph
The following additional contract constraints need to be hard-coded in the Python base class:
- Python subclasses are forbidden from overriding run(); if run() is overridden by mistake, the base class raises TypeError directly at class definition time
- replacement() must return the replacement Graph
- To skip the current node, meet_requirements() must return False; returning None from replacement() is not supported
As with PatternFusionPass, the Python user class is not required to directly inherit from a pybind-exposed C++ DecomposePass. The recommended form is:
- PythonDecomposePassAdapter : public DecomposePass
- The adapter reuses DecomposePass::Run() on the C++ side
- The adapter overrides MeetRequirements() / Replacement() and forwards the calls to the Python pass instance
This retains the main flow semantics of the existing C++ DecomposePass, while avoiding direct exposure of construction parameters, op_types, and Python-version-sensitive logic to the Python base class inheritance system.
8.4 creator and Context Acquisition Design
The current CreateFusionPassFn is a raw function pointer:
using CreateFusionPassFn = FusionBasePass *(*)();
V1 does not recommend changing it directly to std::function<FusionBasePass *()>. There are two reasons:
- Python passes come from the dynamic registration of the bridge .so; if the creator holds a capturing lambda, the destruction chain easily becomes coupled with the dlclose order
- Once PassRegistry or another global object is destructed after the bridge .so has been unloaded, destruction of the std::function internal object may access unloaded code, with a risk of coredump
The "dlclose risk of the independent bridge .so" is currently no longer treated as the main reason. The more accurate reasons are:
- The existing creator ABI is still a parameterless raw function pointer; changing it directly to a capturing object would enlarge the impact surface
- std::function would couple runtime routing information, object destruction, and the call path into the creator itself, which works against the layering "the creator only does minimal routing; runtime resources live in the bridge/runtime registry"
- Retaining the raw function pointer + TLS routing context is still the approach with the smallest impact surface and the lowest risk
Therefore the more conservative approach is recommended here:
- Retain CreateFusionPassFn as a raw function pointer
- Add a "creation-phase TLS context"
- Place the runtime objects and metadata of Python passes in a process-level registry held by the bridge
The "identification information" mentioned here is not the Python runtime context itself, but the stable key and metadata used for registry lookup, for example:
- pass_name
- pass_kind, that is, fusion / pattern / decompose
- stage
- Python module name
- Python class full name
- op_types in the decompose scenario
This information is static registration-phase information, not execution-phase context. The actual Python interpreter state, module objects, and pass instances are not carried in the creator, but are managed uniformly in the global registry of the bridge.
The recommended implementation is as follows:
- Set the TLS creation context at the position where create_fn() is consumed
  - The recommendation is to eventually pass in descriptor_key
  - Where some existing call points can more easily obtain pass_name in the short term, a transitional mapping can be used first, but fixing pass_name as the final creator routing key is not recommended
- Provide a small number of generic creator functions for the three types of Python passes
  - CreatePythonFusionPass()
  - CreatePythonPatternPass()
  - CreatePythonDecomposePass()
- The generic creator reads the current descriptor_key from TLS
- It then finds the corresponding descriptor in the bridge registry
- It then constructs the corresponding adapter
- At execution time, the adapter obtains the Python pass instance or its holder from the bridge registry
The advantages of this design are:
- There is no need to stuff Python object context into create_fn
- It does not introduce the destruction order risk of std::function
- It is more compatible with the way GE currently calls creators
8.5 TLS Creation Context Refinement
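The current implementation needs to retain a lightweight creation-phase TLS context. It is necessary not "in order to pass more information", but because the current FusionPassRegistrationData::CreatePassFn is still a parameterless creator, while on the Python pass side multiple descriptors share the same adapter factory. Without this TLS, the upper-layer PassRegistry::CreatePass() knows "which pass is being created", but the shared factory does not know which Python descriptor it should bind to.
The current code form can be expressed as:
struct PythonPassCreateContext {
  std::string descriptor_key;
  std::string pass_name;
  PythonPassKind kind;
};
The following auxiliary facilities are provided:
- SetCurrentPythonPassCreateContext(descriptor_key)
- GetCurrentPythonPassCreateContext()
- ClearCurrentPythonPassCreateContext()
- An RAII scope guard, for example PythonPassCreateScope
Usage is as follows:
- Before calling create_fn(), the caller sets the current descriptor_key
- The generic creator reads descriptor_key from TLS
- The generic creator then looks up the corresponding descriptor in the bridge registry
- It constructs the corresponding adapter
- TLS is cleaned up automatically after create_fn() returns
Where:
- descriptor_key is the actual routing primary key
- pass_name / kind currently serve mainly as consistency checks, to prevent the shared factory from binding to the wrong descriptor
If the call chain is further unified later, it is still recommended to converge PythonPassCreateContext to "retain only descriptor_key as the minimum necessary field", to avoid duplicated state and stay consistent with the primary key model of the runtime descriptor / runtime entry.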
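V1 recommends connecting this scope at the following call point:
- compiler/graph/fusion/pass/fusion_pass_executor.cc
  - FusionPassExecutor::InitPassesIfNeed
Where:
- Both the first batch and the subsequent main chain only cover FusionPassExecutor
- graph_fusion.cc is not within the subsequent support scope of this design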
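This avoids spreading the scope of Python enablement into the legacy compatibility chain, while ensuring that the creator / TLS / descriptor approach evolves as a closed loop around the main chain.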
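8.6 Bridge Process-level Registry Refinement
It is recommended that the bridge internally maintain registration information at two levels, rather than letting a single holder_key continue to carry the two semantics of "static identity" and "runtime instance". Logically it can be split into two parts:
- PythonPassDescriptor
  - Registration-phase static information
- PythonPassInstanceHolder
  - Execution-phase instance information
  - Python pass instance
  - Runtime state
  - Exception state
  - session / instance association information
PythonPassDescriptor should contain at least:
- pass_name
- pass_kind
- stage
- module_name
- class_qualname
- op_types
- descriptor_key
Where:
- descriptor_key
  - The static key expressing "which pass class this is"
  - The suggested format is module_name + class_qualname + pass_name
  - Used for registration deduplication, descriptor lookup, and log location
- instance_id
  - The dynamic key expressing "which runtime instance this is"
  - Generated by the adapter / session at creation time
  - Used for holder lookup, instance lifecycle management, and execution-phase isolation
In the minimum implementation phase, holder_key was used for both descriptor lookup and holder lookup. FusionBasePass has already completed the split, and the remaining pass types should follow the same model:
- descriptor_key
  - Static identity
- instance_id
  - Dynamic instance identity
V1 recommends that the registry be held and destructed only by the bridge itself and not be exposed to objects held by other global singletons, to avoid creating cross-.so lifecycle coupling again.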
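The split between static identity and dynamic instance identity can be sketched with a small data model. The field names follow the list above; the ':' separator in descriptor_key, the '#' suffix format of instance_id, and the process-local counter are assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class PythonPassDescriptor:
    pass_name: str
    pass_kind: str  # "fusion" / "pattern" / "decompose"
    stage: str
    module_name: str
    class_qualname: str
    op_types: tuple = ()

    @property
    def descriptor_key(self):
        # Static identity built from module_name + class_qualname + pass_name;
        # the ':' separator is an assumption.
        return f"{self.module_name}:{self.class_qualname}:{self.pass_name}"


_instance_counter = itertools.count(1)


def new_instance_id(descriptor):
    """Dynamic identity: unique for each created instance within this process."""
    return f"{descriptor.descriptor_key}#{next(_instance_counter)}"
```

Two instances created from the same descriptor share one descriptor_key but receive different instance_id values, which is the isolation property this section asks for.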
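The Python pass instance is deliberately not made a process-level singleton. The reasons are:
- Users can more naturally treat self as "a temporary state container for this pass execution"
- It avoids contamination from state left over across graphs and across executions
- It reduces the locking and reentrancy requirements that shared instances would impose under multi-threaded concurrency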
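8.7 Python Pass Adapter Refinement
V1 recommends providing a separate adapter for each of the three pass types:
- PythonFusionBasePassAdapter
- PythonPatternFusionPassAdapter
- PythonDecomposePassAdapter
The three share the following characteristics:
- At construction, only descriptor_key or pass_name is received
- At construction, no raw pointer to a temporary Python object is held directly
- Descriptor binding is completed at construction, and an independent instance_id is created for the current adapter
- Within its lifecycle, the adapter exclusively owns its own Python pass instance and does not reuse a long-lived shared holder
- At execution, the holder for the current adapter is looked up in the bridge instance store through instance_id
- At destruction, the holder is released through instance_id
- At execution, GIL acquisition, exception translation, and status mapping are performed uniformly
This way, even if the adapter itself is held by GE for a long time, it only depends on holders stably managed by the bridge, not on a creator closure object.
The division of work among the three adapters needs further clarification:
- PythonFusionBasePassAdapter
  - Directly overrides Run() and internally calls the Python run(graph, context)
- PythonPatternFusionPassAdapter
  - Inherits from the C++ PatternFusionPass
  - Does not override the public Run() main flow of the base class
  - Only overrides the three hooks Patterns() / MeetRequirements() / Replacement() and forwards to Python inside the hooks
- PythonDecomposePassAdapter
  - Inherits from the C++ DecomposePass
  - Does not override the public Run() main flow of the base class
  - Only overrides MeetRequirements() / Replacement() and forwards to Python inside the hooks
This is also the core reason why the current design is "in no hurry to expose the C++ pass base classes to Python for inheritance through PYBIND11_MODULE": what actually needs to reuse the non-pure-virtual C++ main flow is the adapter, not the Python pass classes written by users.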
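8.8 Execution-phase Session Design
To avoid imposing unnecessary restrictions on how Python passes are written, V1 recommends introducing a "one session per execution" model:
- One Run call of one adapter corresponds to one PythonPassExecutionSession
- A new Python pass instance is created within the session and assigned a unique instance_id
- Within the same session, multiple Python callbacks share the same instance
- When the session ends, the instance is destroyed and the instance_id is released
- The session provides basic wrappers such as GIL acquisition, exception capture, and logging forwarding
This way users can freely use self to store temporary state inside a Python pass, without worrying about contamination across executions.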
8.9 Bridge Runtime Registry and Holder Lookup
The current recommendation is for the bridge to implement a two-phase lookup:
- Descriptor Lookup (creation phase)
  - Performed at adapter construction based on the TLS descriptor_key
  - Returns PythonPassDescriptor *
- Holder Lookup (execution phase)
  - Performed inside the adapter Run based on its own instance_id
  - Returns PythonPassInstanceHolder *
  - The holder contains the Python pass instance
Lookup flow:
Adapter construction:
descriptor_key → GetDescriptor(descriptor_key) → PythonPassDescriptor
Adapter Run:
instance_id → GetInstanceHolder(instance_id) → PythonPassInstanceHolder → python_pass_instance
It is recommended that the bridge registry internally use concurrency-safe containers (such as std::unordered_map + lock, or tbb::concurrent_hash_map) to support multi-threaded creation and execution.
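The two-phase lookup with a "map + lock" container can be sketched in Python as follows. It mirrors the std::unordered_map + lock option in spirit only; the class name BridgeRegistry and its method names are hypothetical, chosen to echo GetDescriptor and GetInstanceHolder.

```python
import threading


class BridgeRegistry:
    """Descriptor lookup by descriptor_key, holder lookup by instance_id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._descriptors = {}
        self._holders = {}

    def register_descriptor(self, descriptor_key, descriptor):
        with self._lock:
            if descriptor_key in self._descriptors:
                raise KeyError(f"duplicate descriptor: {descriptor_key}")
            self._descriptors[descriptor_key] = descriptor

    def get_descriptor(self, descriptor_key):
        # Creation phase: called while the adapter is being constructed.
        with self._lock:
            return self._descriptors[descriptor_key]

    def add_holder(self, instance_id, holder):
        with self._lock:
            self._holders[instance_id] = holder

    def get_instance_holder(self, instance_id):
        # Execution phase: called from inside the adapter's Run.
        with self._lock:
            return self._holders[instance_id]

    def release_holder(self, instance_id):
        with self._lock:
            self._holders.pop(instance_id, None)
```

Keeping descriptors and holders in separate maps under one lock keeps the sketch simple; a real implementation might prefer finer-grained locking if holder churn is high.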
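8.10 Exception Handling and Status Mapping
A Python pass may raise various exceptions during execution. V1 recommends translating them uniformly into GE status codes:
| Python Exception Type | GE Status Code | Description |
|---|---|---|
| PassSkipException | GRAPH_SUCCESS (but changes are not applied) | User actively skips the current match |
| PassFatalError | GRAPH_FAILED | User actively marks failure |
| PyError (general exception) | GRAPH_FAILED + exception message | Python runtime error |
| GeError (from GE API calls) | Original GE error code | GE internal error |
The adapter catches exceptions uniformly at the following positions:
- Inside Run()
- Inside Patterns()
- Inside MeetRequirements()
- Inside Replacement()
After being caught, exceptions are recorded uniformly through context->SetErrorMessage(), and the corresponding status code is returned.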
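The mapping table can be expressed as a single translation wrapper. The sketch below is written under assumptions: the exception classes are local stand-ins for the types named in the table, the numeric value of GRAPH_FAILED is a placeholder, and the returned (status, message) pair stands in for the context->SetErrorMessage() call made by the C++ adapter.

```python
GRAPH_SUCCESS = 0
GRAPH_FAILED = 0xFFFFFFFF  # placeholder; the real GE constant may differ


class PassSkipException(Exception):
    """Stand-in: the user skips the current match."""


class PassFatalError(Exception):
    """Stand-in: the user marks the pass as failed."""


class GeError(Exception):
    """Stand-in for an error raised by a GE API call, carrying its own code."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


def call_with_status(hook, *args):
    """Run a Python hook and translate any exception into (status, message)."""
    try:
        hook(*args)  # the hook's own return value is ignored in this sketch
        return GRAPH_SUCCESS, ""
    except PassSkipException as exc:
        return GRAPH_SUCCESS, f"skipped: {exc}"
    except PassFatalError as exc:
        return GRAPH_FAILED, str(exc)
    except GeError as exc:
        return exc.code, str(exc)  # keep the original GE error code
    except Exception as exc:
        return GRAPH_FAILED, f"{type(exc).__name__}: {exc}"
```

The specific handlers come before the generic Exception handler so that user-declared skips and GE errors are not swallowed by the catch-all branch.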
8.11 Logging and Tracing
V1 recommends that the adapter add tracing at key points:
- Pass creation: descriptor_key, instance_id
- Pass execution: graph_id, input shapes, output shapes
- Pattern matching: match_count, capture tensors
- Replacement: nodes_added, nodes_removed
Tracing can be output through the existing GE profiling infrastructure, or through a logging facade inside the bridge.
8.12 Backward Compatibility and Versioning
The design of the Python pass registry and adapters needs to consider version compatibility:
- Descriptor compatibility: pass definitions of different versions may differ in optional inputs/attributes
- Adapter compatibility: the adapter needs to handle legacy Python pass classes
- Bridge compatibility: the bridge registry needs to support coexistence of artifacts of multiple versions
V1 recommends:
- PythonPassDescriptor contains a version field
- The bridge registry is indexed by descriptor_key + version
- The adapter selects the corresponding execution strategy according to the descriptor version
9 Native Helper and Code Generation
9.1 Native Helper Architecture
The native helper is an auxiliary module inside the bridge, responsible for:
- Conversion between Python objects and C++ objects
- Graph structure marshalling
- Tensor data marshalling
- Attribute value marshalling
The native helper lives inside the bridge .so and is not exposed to users.
9.2 Code Generation Pipeline
The Python pass code generation pipeline contains the following stages:
Python Pass Registration
↓
Bridge Descriptor Registration
↓
Native Helper Binding
↓
Adapter Creation
↓
GE Pass Registry Entry
Each stage in detail:
Stage 1: Python Pass Registration
- The Python user registers a pass through a decorator
- The registration code executes when the Python module is loaded
- The registration code calls the bridge C API to register the descriptor
Stage 2: Bridge Descriptor Registration
- The bridge receives the descriptor information
- The bridge creates a PythonPassDescriptor entry
- The bridge assigns a descriptor_key
- The bridge saves the descriptor to the process-level registry
Stage 3: Native Helper Binding
- The bridge selects a native helper according to the descriptor type
- The native helper generates the corresponding C++ binding code
- The binding code is responsible for marshalling between Python and C++
Stage 4: Adapter Creation
- GE calls CreatePassFn
- The TLS context sets descriptor_key
- The bridge creates an adapter according to the descriptor
- The adapter assigns an instance_id
Stage 5: GE Pass Registry Entry
- The adapter is registered to the GE PassRegistry
- The GE PassRegistry saves the adapter pointer
- The GE compilation flow calls the adapter Run
9.3 Generated Code Structure
The generated code contains the following components:
Header Files:
- python_pass_bridge.h: Bridge public API
- python_pass_adapter.h: Adapter definitions
- python_pass_types.h: Type definitions
Source Files:
- python_pass_bridge.cc: Bridge implementation
- python_pass_adapter.cc: Adapter implementation
- python_pass_registry.cc: Registry implementation
Generated Files:
- python_pass_generated_bindings.cc: Marshalling bindings
- python_pass_generated_creators.cc: Creator functions
9.4 Build and Deployment
The build and deployment flow for Python passes:
- Native Artifact Generation
  - C++ code generates the native helper and adapter
  - Compiled into _ge_pass_native.so
  - Compiled into libge_python_pass_bridge.so
- Python Package Generation
  - Python code is packaged as a wheel package
  - The wheel package contains the Python pass modules
  - The wheel package depends on the native artifacts
- Deployment
  - Native artifacts are deployed to the GE run package
  - The Python wheel is deployed to the Python environment
  - The bridge .so is loaded into the GE process
10 Runtime Fallback and Multi-version Support
10.1 Runtime Fallback Overview
Runtime fallback means automatically generating fallback code when native artifacts are unavailable:
- Scenario 1: the Python pass requires a native helper, but the native helper .so is missing
- Scenario 2: the Python pass requires an artifact of a specific version, but the artifact version does not match
- Scenario 3: the Python pass requires a platform-specific artifact, but the platform is not supported
Fallback code generation flow:
Detect Missing Native Artifact
↓
Generate Fallback Code
↓
Compile Fallback Code
↓
Load Fallback Artifact
↓
Continue Pass Execution
10.2 Fallback Code Generation
Fallback code generation strategies:
Strategy 1: Pure Python Fallback
- Pure Python implementation of the pass logic
- No dependency on a native helper
- Lowest performance, but highest availability
Strategy 2: Minimal Native Fallback
- Minimal native helper implementation
- Supports basic marshalling only
- Medium performance, medium availability
Strategy 3: JIT Compilation Fallback
- Native code is JIT-compiled at runtime
- Supports full marshalling
- Performance close to native, but requires JIT infrastructure
V1 recommends Strategy 2, because:
- Pure Python fallback performance is too low
- The JIT compilation fallback infrastructure is complex
- Minimal native fallback is currently the best trade-off
10.3 Multi-version Artifact Support
Multi-version artifact support means providing artifacts for multiple Python versions:
- cp39: Python 3.9
- cp310: Python 3.10
- cp311: Python 3.11
- cp312: Python 3.12
- cp313: Python 3.13
- cp314: Python 3.14
Each version artifact contains:
- Version-specific native helper
- Version-specific adapter
- Version-specific bridge .so
The artifact selection flow:
Detect Current Python Version
↓
Search Version-specific Artifact
↓
Load Matching Artifact
↓
If Not Found, Generate Fallback
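The selection flow above can be sketched in a few lines of Python. The function names and the shape of the `available` mapping are hypothetical; only the cpXY tag convention comes from the list of supported versions.

```python
import sys


def current_python_tag(version_info=None):
    """Return a CPython tag such as 'cp313' for the given or running interpreter."""
    info = version_info if version_info is not None else sys.version_info
    return f"cp{info[0]}{info[1]}"


def select_artifact(available, python_tag):
    """Pick the prebuilt artifact for python_tag, or signal that fallback is needed.

    `available` maps a python tag to an artifact path (hypothetical layout).
    Returns a (path, needs_fallback) pair.
    """
    path = available.get(python_tag)
    if path is not None:
        return path, False
    return None, True


# Hypothetical set of prebuilt artifacts shipped with a run package.
prebuilt = {
    "cp39": "native/cp39/_ge_pass_native.so",
    "cp310": "native/cp310/_ge_pass_native.so",
}
```

Calling `select_artifact(prebuilt, current_python_tag())` either returns a matching path or reports that the fallback generator has to run.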
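10.4 Artifact Cache and Pre-built Artifacts
Pre-built artifacts are native artifacts compiled ahead of time:
- Build stage: compile the artifacts for all versions
- Package stage: package all artifacts into the run package
- Install stage: select and install the artifact that matches the version
The artifact cache stores fallback artifacts that have already been generated:
- Cache location: ge/passes/python_pass_artifacts/
- Cache key: python_tag + platform + bridge_abi
- Cache management: expired artifacts are cleaned up automatically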
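A minimal sketch of how the three cache key components could drive cache lookups. The "-" separator, the "abi" prefix, the fallback.so file name, and the helper names are all assumptions for illustration; only the cache root and the three key components come from the text.

```python
from pathlib import PurePosixPath

CACHE_ROOT = PurePosixPath("ge/passes/python_pass_artifacts")


def make_cache_key(python_tag, platform_tag, bridge_abi):
    """Join the three key components; separator and 'abi' prefix are assumed."""
    return f"{python_tag}-{platform_tag}-abi{bridge_abi}"


def lookup_or_build(cache, key, build_fn):
    """Return the cached artifact path for key, building it once on a miss."""
    if key not in cache:
        cache[key] = build_fn(key)
    return cache[key]


builds = []   # records every time the (simulated) fallback build runs
cache = {}


def fake_build(key):
    # Stand-in for fallback code generation plus compilation.
    builds.append(key)
    return str(CACHE_ROOT / key / "fallback.so")


key = make_cache_key("cp313", "linux_x86_64", 1)
first = lookup_or_build(cache, key, fake_build)
second = lookup_or_build(cache, key, fake_build)
```

Including bridge_abi in the key means that a bridge ABI change misses the cache and triggers a rebuild instead of loading a stale artifact.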
11 Implementation Roadmap
11.1 V1 Implementation Scope
The V1 implementation scope contains:
Core Components:
- PythonPassDescriptor registry
- PythonPassInstanceHolder management
- PythonFusionBasePassAdapter
- PythonPatternFusionPassAdapter
- PythonDecomposePassAdapter
- TLS creation context
- Bridge registry
Infrastructure:
- Native helper binding
- Exception handling
- Logging and tracing
- Status mapping
Deployment:
- Pre-built artifacts generation
- Multi-version support
- Runtime fallback
11.2 V1 Implementation Milestones
Milestone 1: Basic Framework (Week 1-2)
- Bridge registry implementation
- Descriptor registration API
- TLS creation context
- Basic adapter skeleton
Milestone 2: Adapter Implementation (Week 3-4)
- PythonFusionBasePassAdapter implementation
- PythonPatternFusionPassAdapter implementation
- PythonDecomposePassAdapter implementation
- Exception handling
Milestone 3: Native Helper (Week 5-6)
- Native helper generation
- Marshalling bindings
- Python-C++ conversion
- Logging and tracing
Milestone 4: Multi-version and Fallback (Week 7-8)
- Pre-built artifacts generation
- Version-specific artifacts
- Runtime fallback code generation
- Artifact cache management
11.3 Future Enhancements
Future enhancements under consideration:
Performance optimization:
- Cache Python pass instances
- Optimize marshalling overhead
- Reduce GIL hold time
Feature extensions:
- Support more pass types
- Support more Python versions
- Support more
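- When the session ends, the instance and its temporary wrapper objects are released together
This means:
- FusionBasePass
  - One Run corresponds to one Python instance
- PatternFusionPass
  - Within one Run, Patterns, MeetRequirements, and Replacement share the same Python instance
- DecomposePass
  - Within one Run, the processing of multiple matched nodes shares the same Python instance
With this design, Python users can naturally use:
- self.xxx as a temporary cache during a single execution
- Ordinary Python objects as auxiliary state
- Ordinary exceptions as failure signals
without having to understand bridge-internal details such as "is this instance reused across graphs".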
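A small sketch of the "one instance per Run" contract. CountingPass is a stand-in for a user pass (a real one would derive from FusionBasePass), and run_session is a hypothetical driver that plays the role of the bridge; neither name comes from the actual API.

```python
class CountingPass:
    """Stand-in user pass that keeps per-execution scratch state on self."""

    def __init__(self):
        self.seen = {}  # temporary cache, valid for one execution only

    def run(self, graph):
        for name in graph:
            self.seen[name] = self.seen.get(name, 0) + 1
        if not self.seen:
            # An ordinary exception is the failure signal.
            raise ValueError("empty graph")
        return len(self.seen)


def run_session(pass_cls, graph):
    """Hypothetical bridge-side driver: create a fresh instance for every Run."""
    instance = pass_cls()
    try:
        return instance.run(graph)
    finally:
        # The instance is dropped together with the session.
        del instance


first = run_session(CountingPass, ["conv", "relu", "conv"])
second = run_session(CountingPass, ["conv", "relu", "conv"])
```

Because each Run gets its own instance, state left on self during the first call cannot leak into the second.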
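8.9 Memory Management Details
The V1 memory management goals are:
- Python users are not required to release any bridge objects manually
- Python users are not required to explicitly use interfaces such as with, close(), or release()
- Storing an object in a local variable or a member variable must never cause a double free or a dangling pointer
It is recommended to handle objects in three layers.
8.9.1 Registration-time Objects
Registration-time objects include:
- descriptor
- Python module object
- Python class object
- descriptor registry
These objects are held by the bridge registry, and the bridge layer is responsible for reference counting and cleanup. This is transparent to Python users.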
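8.9.2 Execution-time Objects
Execution-time objects include:
- instance holder
- Python pass instance
- Python wrapper objects for Graph / Node / MatchResult / NodeIo created in callbacks
- Temporary TensorDesc / Shape / Tensor wrapper objects, where needed
- instance_id
These objects are all bound to the execution session rather than to the global descriptor.
When the session ends:
- The Python pass instance is released
- The execution-time wrapper cache is released
- The execution-time validity token is invalidated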
8.9.3 Borrowed Graph Objects
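Objects such as Graph, Node, and Tensor are mostly borrowed views of the graph that GE is currently executing. To keep the Python experience from degrading, it is recommended that:
- Each Python wrapper object holds an execution-time owner token internally
- While the owner token is valid, all accesses work normally
- If a user keeps an object across sessions and accesses it again, the access must not crash but must raise a clear Python exception, for example:
RuntimeError: graph handle has expired
The effect is:
- The documentation does not need to impose a hard "do not cache these objects" restriction on users
- Even if a user does so, they get an understandable Python error rather than a core dump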
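The owner-token idea can be sketched as follows. OwnerToken and BorrowedGraph are hypothetical names, and the dict stands in for a native graph handle; only the error message is taken from the text.

```python
class OwnerToken:
    """Validity token owned by one execution session."""

    def __init__(self):
        self.valid = True

    def invalidate(self):
        self.valid = False


class BorrowedGraph:
    """Python wrapper that borrows a handle and checks the token on every access."""

    def __init__(self, handle, owner):
        self._handle = handle
        self._owner = owner

    def _check(self):
        if not self._owner.valid:
            raise RuntimeError("graph handle has expired")

    @property
    def name(self):
        self._check()
        return self._handle["name"]


token = OwnerToken()
graph = BorrowedGraph({"name": "main_graph"}, token)
name_during_session = graph.name   # works while the session is alive
token.invalidate()                 # session ends
```

After `token.invalidate()`, reading `graph.name` raises RuntimeError instead of touching the released handle.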
8.9.4 TensorDesc / Shape Value Semantics
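Objects such as TensorDesc and Shape should be exposed to Python with value semantics:
- What Python receives is an independent object
- It can be stored safely in a local variable or on self
- It does not depend on the original borrowed graph handle staying alive
This better matches what Python users expect and reduces dangling-reference problems.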
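Value semantics can be illustrated with an immutable copy. The Shape dataclass and snapshot_shape helper are assumptions for illustration, and the list stands in for a buffer owned by the native side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Independent, immutable value that is safe to keep on self."""
    dims: tuple


def snapshot_shape(native_dims):
    """Copy the dims out of a (simulated) borrowed buffer into a value object."""
    return Shape(dims=tuple(native_dims))


native = [1, 3, 224, 224]   # simulated storage owned by the native side
shape = snapshot_shape(native)
native.clear()              # the native storage goes away
```

The Shape value still holds its dims after the native list is cleared, which is the property that lets users store it without lifetime concerns.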
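8.9.5 Bridge Unload and Destruction Order
GE provides two levels of unload semantics, corresponding to two lifecycles: "end of one round of business" and "process exit".
Unload: business-level unload
After one round of graph compilation completes, GE calls UnloadPassPlugins() to clear the pass registration state of that round, but it neither shuts down the Python interpreter nor unloads the bridge so. The next round can therefore reuse the already initialized Python runtime and avoid the cost of repeated initialization and finalization.
Current implementation chain:
UnloadPassPlugins()
→ PassPluginLoader::Unload() [pass_plugin_loader.cc]
├─ UnloadPythonFusionBasePasses() // runs only when python_pass_loaded_ is true
│ → BridgeLoader::Unload() [bridge_loader.cc]
│ ├─ api_->reset_bridge_state() // tells the bridge to clear Python-side state and release the bridge module reference
│ ├─ ClearPythonFusionBasePassRuntimeRegistry() // clears the C++-side runtime registry
│ └─ PassRegistry::ClearPythonPasses() // clears the C++-side pass registry
│ python_pass_loaded_ = false
└─ CustomPassHelper::Unload() // clears C++ custom passes
Unload does not touch the Python interpreter lifecycle or the bridge so handle, so the next Load() can reuse them directly.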
ShutdownForProcess: process-level shutdown
At process exit, GE calls ShutdownPassPluginsForProcess() to release all resources. There are currently three entry points that can trigger it:
- GEFinalizeV2(): when the process ends in online mode
- aclgrphBuildFinalize(): when offline compilation ends
- GeGenerator::Finalize(): when generator mode ends
Current implementation chain:
ShutdownPassPluginsForProcess()
→ PassPluginLoader::ShutdownForProcess() [pass_plugin_loader.cc]
├─ One-time guard: if (shutdown_done_) return // ensures process-level shutdown runs only once
├─ shutdown_done_ = true
│
├─ if (python_pass_loaded_):
│ UnloadPythonFusionBasePasses() // clear registration state first (same as Unload)
│ → BridgeLoader::Unload()
│ ├─ api_->reset_bridge_state()
│ ├─ ClearPythonFusionBasePassRuntimeRegistry()
│ └─ PassRegistry::ClearPythonPasses()
│ python_pass_loaded_ = false
│
├─ ShutdownPythonFusionBasePassesForProcess() // runs unconditionally
│ → BridgeLoader::ShutdownForProcess() [bridge_loader.cc]
│ ├─ if (api_ != nullptr):
│ │ api_->shutdown_bridge() // calls the shutdown exported by the bridge so
│ │ → PybindBridge::Shutdown() [pybind_bridge.cc]
│ │ ├─ ResetBridgeStateUnlocked() // clears Python-side state, releases the bridge module reference, and runs gc.collect()
│ │ └─ if (owns_interpreter_): // only when the interpreter was started by the bridge itself
│ │ py::finalize_interpreter() // finalizes the Python interpreter
│ │ owns_interpreter_ = false
│ ├─ api_ = nullptr // reset to prevent later calls
│ ├─ if (handle_ != nullptr):
│ │ dlclose(handle_) // unloads the bridge so
│ │ handle_ = nullptr // reset to prevent a repeated dlclose
│ └─ loaded_path_.clear()
│
└─ CustomPassHelper::Unload() // clears C++ custom passes
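Idempotency Guarantees
Because ShutdownPassPluginsForProcess() may be called repeatedly from multiple entry points, the whole chain is made idempotent by the following guards:
- PassPluginLoader layer: the shutdown_done_ flag is set to true on the first execution, and later calls return SUCCESS directly
- BridgeLoader layer: api_ / handle_ null-pointer guards; both are set to nullptr on the first execution, and later calls skip shutdown and dlclose
- PybindBridge layer: the Py_IsInitialized() guard prevents entry into Python cleanup logic once the interpreter has been finalized; the owns_interpreter_ guard ensures that the bridge only finalizes an interpreter it initialized itself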
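The layered guards can be modeled in Python to show why repeated shutdown calls are harmless. The classes below mirror the C++ names only loosely, and the log list is a stand-in for the real side effects (shutdown_bridge and dlclose); this is a sketch of the guard pattern, not the actual implementation.

```python
class BridgeLoader:
    """Models the api_ / handle_ null guards of the bridge loader layer."""

    def __init__(self, api, handle):
        self.api = api
        self.handle = handle
        self.log = []

    def shutdown_for_process(self):
        if self.api is not None:
            self.log.append("shutdown_bridge")
            self.api = None       # prevents a second shutdown_bridge call
        if self.handle is not None:
            self.log.append("dlclose")
            self.handle = None    # prevents a repeated dlclose


class PassPluginLoader:
    """Models the one-time shutdown_done_ guard of the plugin loader layer."""

    def __init__(self, bridge_loader):
        self.bridge_loader = bridge_loader
        self.shutdown_done = False

    def shutdown_for_process(self):
        if self.shutdown_done:
            return "SUCCESS"
        self.shutdown_done = True
        self.bridge_loader.shutdown_for_process()
        return "SUCCESS"


bridge = BridgeLoader(api=object(), handle=object())
loader = PassPluginLoader(bridge)
results = [loader.shutdown_for_process() for _ in range(3)]
```

Three calls produce three SUCCESS results, but the bridge shutdown and the dlclose stand-in each run exactly once.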
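Core Constraints on Unload Order
The current implementation follows these ordering principles:
- Clear the C++ registries first, then dlclose the bridge so: UnloadPythonFusionBasePasses() first clears PassRegistry and PythonFusionBasePassRuntimeRegistry, and only then does ShutdownForProcess() perform dlclose. This guarantees that no C++ object still holds a bridge-side callback function pointer at dlclose time.
- Clear Python objects first, then finalize the interpreter: PybindBridge::Shutdown() first calls ResetBridgeStateUnlocked() to clear the Python-side registry, holders, and dynamically loaded pass modules; within the reset it releases the bridge_module_ reference and calls gc.collect() to break reference cycles; only at the end does it call py::finalize_interpreter().
- Finalize the interpreter first, then dlclose the so: in BridgeLoader::ShutdownForProcess(), shutdown_bridge() runs before dlclose(handle_), which guarantees that the Python interpreter is no longer running at dlclose time.
- If the interpreter has already been finalized externally: Py_IsInitialized() returns 0, and the bridge skips all Python cleanup logic and only clears C++-side state; it does not DECREF Python objects that have already been released.
The priority here is "no crash" rather than reclaiming every last byte of memory at teardown. CPython's internal arena allocator may still leave residual memory unreclaimed after Py_Finalize(); this is known CPython behavior and does not affect normal process exit.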
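8.10 Lock and GIL Strategy Details
The lock and GIL design goals are:
- Do not expose lock concepts to Python users
- Do not require Python pass authors to understand or manage the GIL themselves
- Keep lock granularity inside the bridge as small as possible, to avoid serializing the whole GE pass execution path
Three categories of locks are recommended.
8.10.1 Bridge Management Lock
Used to protect:
- Registry initialization
- Plugin discovery
- Lazy holder loading
- Unload / finalize state transitions
This kind of lock only surrounds the bridge's own state management and does not surround the execution of user pass logic.
8.10.2 Execution Session Lock
Each execution session may have its own lightweight state protection, but sharing a coarse-grained mutex across multiple sessions is not recommended.
The goal is to allow:
- Different executions of different passes to run without blocking each other
- Pure C++ matching / graph rewriting logic that does not involve Python to keep running along its original path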
8.10.3 Python GIL
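The unified rules are:
- Acquire the GIL before entering Python
- Release the GIL immediately after leaving Python
- Pure C++ graph matching, graph traversal, and data preparation logic does not hold the GIL
Specific strategy for the three pass types:
- FusionBasePass
  - The GIL is held while the run callback executes
  - The GIL is released immediately after Python returns
- PatternFusionPass
  - The GIL is not held during C++ pattern matching
  - The GIL is held briefly when calling Patterns, MeetRequirements, and Replacement
- DecomposePass
  - The GIL is not held while C++ searches for matching nodes
  - The GIL is held briefly when calling meet_requirements and replacement
This cleanly separates Python execution from C++ execution and reduces unnecessary global serialization.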
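8.10.4 Callback Reentrancy Strategy
V1 recommends supporting "concurrent across sessions, serial within a session" by default:
- No concurrent Python callbacks are made within a single execution session
- If different sessions are triggered concurrently by upper GE layers, they enter Python serially through the GIL
For Python users this means:
- No extra locks need to be written in a pass for the sake of the bridge
- Users who rely on module-level global mutable state must still ensure logical correctness themselves
The bridge does not additionally restrict such code, but it also does not provide automatic transaction semantics for the user's own globally shared state.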
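8.11 Pythonic Experience Constraints
The V1 design principle is to "confine lifecycle and concurrency complexity inside the bridge" and to avoid imposing non-Pythonic rules on Python users wherever possible. The specific requirements are:
- Users are not required to manage memory manually
- Users are not required to manage locks or the GIL manually
- Users are not required to use a specific context manager in order to write a pass
- Users are not required to break up ordinary Python code artificially to avoid reuse problems
Within what is feasible, Python users should be able to write a pass as an ordinary class:
- Use the constructor to initialize fixed configuration
- Use self to keep temporary state within a single execution
- Use ordinary Python exceptions to signal errors
- Use ordinary return values to express results
Only two kinds of boundaries need to be stated honestly:
- Registration protocol boundary
  - Users still need to declare a pass through a decorator or an equivalent registration interface; this is part of the framework integration protocol, not a non-Pythonic restriction
- Expired object boundary
  - If a user keeps a borrowed graph view object across executions and accesses it again later, they get a Python exception; the object is not silently supported indefinitely
These two boundaries are necessary for framework integration, but they should not force users to "write code in a non-Pythonic style".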
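8.12 Design for Subsequent REGISTER_CUSTOM_PASS Support
REGISTER_CUSTOM_PASS needs to be supported, but it is recommended to implement it in an extension phase after the three PassRegistry pass types are stable. The reasons are:
- Its execution path differs from the FusionPassExecutor system; it currently goes mostly through the CustomPassHelper / legacy custom pass chain
- Getting the three pass types working first stabilizes the shared foundation sooner: descriptor, session, bridge, holder, GIL, and exception isolation
- Connecting REGISTER_CUSTOM_PASS after the foundation is stable significantly increases code reuse and avoids building a second Python bridge
The recommended reuse approach is:
- Continue to reuse the same Python discovery mechanism
  - ge.passes.bootstrap
  - In the current phase the environment variable is the main path; entry_points will be added later
- Continue to reuse the same Python registry and descriptor mechanism
  - Add legacy_custom to PythonPassKind
  - Add the metadata required by legacy custom passes to the descriptor
- Continue to reuse the same pybind bridge
  - Do not start a second Python runtime initialization
  - Do not start a second holder / session management layer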
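8.13 PatternFusionPass Bridge Protocol
The PatternFusionPass bridge protocol defines how the adapter interacts with the Python pass:
8.13.1 Patterns Hook
The Patterns() hook returns the list of patterns:
- The Python implementation should return List[Pattern] or an equivalent structure
- The adapter calls Patterns() from the C++ side and converts the result into C++ pattern objects
- The GIL is held only for the duration of the Patterns() call
8.13.2 MeetRequirements Hook
The MeetRequirements(match_result) hook decides whether the fusion conditions are met:
- Input: MatchResult Python wrapper
- Output: bool (True/False)
- The adapter calls MeetRequirements() from the C++ side, passing the converted MatchResult
- The GIL is held only for the duration of the hook call
8.13.3 Replacement Hook
The Replacement(match_result) hook generates the replacement graph:
- Input: MatchResult Python wrapper
- Output: Graph Python object
- The adapter calls Replacement() from the C++ side, passing the converted MatchResult
- The adapter converts the returned Graph into a C++ Graph and performs the replacement
- The GIL is held only for the duration of the hook call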
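The three-hook protocol can be sketched with a pure Python driver standing in for the C++ adapter. The hook names follow the text; the ReluFusion class, the dict-based match results, the string patterns, and run_pattern_pass are illustrative assumptions rather than the real wrapper types.

```python
class ReluFusion:
    """Stand-in user pass implementing the three hooks with plain Python data."""

    def Patterns(self):
        # A real pass would return Pattern objects; strings are placeholders.
        return ["Conv->Relu"]

    def MeetRequirements(self, match_result):
        return match_result["dtype"] == "float16"

    def Replacement(self, match_result):
        return {"fused_op": "ConvRelu", "inputs": match_result["inputs"]}


def run_pattern_pass(pass_obj, candidate_matches):
    """Hypothetical adapter loop: Patterns once, then filter and replace per match."""
    patterns = pass_obj.Patterns()
    if not isinstance(patterns, list):
        raise TypeError("Patterns() must return a list")
    replacements = []
    for match in candidate_matches:
        # Only matches accepted by MeetRequirements reach Replacement.
        if pass_obj.MeetRequirements(match):
            replacements.append(pass_obj.Replacement(match))
    return replacements


matches = [
    {"dtype": "float16", "inputs": ["x"]},
    {"dtype": "float32", "inputs": ["y"]},
]
result = run_pattern_pass(ReluFusion(), matches)
```

All three hooks are called on the same pass object, which matches the rule that they share one Python instance within a Run.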
9. Python Graph Interface Completion Design
9.1 Required Capabilities
The Python graph wrapper needs the following capabilities to support Python passes:
- Graph::FindNodeByName(node_name)
- Node::GetAllOutAnchors()
- Node::GetAllInAnchors()
- Graph::GetParentGraph()
- Graph::GetParentNode()
- Subgraph-related interfaces
9.2 Borrowed Handle
All graph wrapper objects are borrowed handles:
- They do not own the lifetime of the underlying C++ Graph
- They depend on the validity of the execution session
- Access after the session becomes invalid raises a Python exception
10. Packaging and Release Design
10.1 Artifacts
The final deliverables include:
Native Artifacts:
- _ge_pass_native.so: Native helper and adapter implementations
- libge_python_pass_bridge.so: Bridge shared library
Python Packages:
- ge_py_pass_bridge: Main Python wheel package
- ge_py_pass_bridge_cp39: Native sub-wheel for Python 3.9
- ge_py_pass_bridge_cp310: Native sub-wheel for Python 3.10
- ... (cp311 to cp314)
Installation Artifacts:
- Header files: python_pass_bridge.h, python_pass_adapter.h
- Runtime libraries: .so files
- Python wheels: .whl files
10.2 Version Strategy
The version strategy follows Python ABI compatibility:
- Each Python version requires dedicated native artifact
- Native artifacts are backward compatible within same Python minor version
- Python wheel version independent from native artifact version
10.3 Installation Strategy
Installation strategy:
Run Package Installation:
- All native artifacts are packaged in the run package
- The installation script selects the artifact that matches the version
- Installation copies header files and .so files to the target directory
Python Wheel Installation:
- Main wheel installed to Python environment
- Native sub-wheel installed based on current Python version
pip install --no-index --find-links <path> ge_py_pass_bridge
10.4 Fallback
The fallback mechanism applies when a native artifact is missing:
Fallback Trigger Conditions:
- Native artifact file not found
- Native artifact version mismatch
- Native artifact ABI incompatible
Fallback Code Generation:
- Generate minimal native helper code
- Generate adapter skeleton code
- Compile the generated code into a temporary .so
- Load the temporary artifact
- Cache artifact for reuse
10.5 Local Validation Constraints
Local validation requires:
Environment Requirements:
- Python interpreter available
- Required Python packages installed
- CANN toolkit installed
Build Requirements:
- C++ compiler available
- Required headers and libraries available
Validation Checklist:
- Pass registration successful
- Pass execution successful
- Graph modification correct
- Exception handling correct
10.6 pybind Module Boundaries
The pybind module boundaries are divided as follows:
Bridge Module (pybind_bridge):
- Python interpreter management
- Bridge state management
- Pass module loading
- Exception translation
Native Helper Module:
- Graph/Node/Tensor marshalling
- Attribute value marshalling
- Shape/Format conversion
Reasons for the separation:
- The bridge module depends on Python.h
- The native helper can be compiled and replaced independently
- Avoid circular dependencies
10.7 pybind Sub-wheel Organization
Sub-wheel organization:
Directory Structure:
ge_py_pass_bridge/
├── ge/
│ └── passes/
│ ├── __init__.py
│ ├── base.py
│ ├── registry.py
│ └── _bridge.so # Platform-specific native
└── pyproject.toml
ge_py_pass_bridge_cp39/
├── ge/
│ └── passes/
│ └── native/
│ └── _ge_pass_native.so
└── pyproject.toml
Wheel Metadata:
- Wheel tag: cp39-cp39-manylinux2014_x86_64
- Platform: Linux x86_64
- Python version: 3.9
10.8 Fallback Codegen Boundaries
The fallback codegen boundaries are:
What Gets Generated:
- Minimal marshalling functions
- Adapter skeleton
- Bridge registration entry
What NOT Generated:
- Full native helper (performance critical)
- Complex optimization code
- Platform-specific optimizations
Generated Code Location:
- ge/passes/python_pass_artifacts/<python_tag>/<platform>/
- Temporary, may be cleaned up
- Cacheable for reuse
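10.9 Compatibility Strategy between the Current Project and Later Codegen
The compatibility strategy keeps the evolution from the current project to later codegen smooth:
Phase 1: Pre-built Artifacts:
- Hand-written native helper
- Hand-written adapter
- Manual build and package
Phase 2: Codegen Integration:
- Part of the marshalling code is generated automatically
- The adapter skeleton is generated automatically
- Semi-automated build
Phase 3: Full Codegen:
- All native code is generated automatically
- Automated build and package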
- Minimal manual intervention
Compatibility Guarantee:
- Phase 1 artifacts compatible with Phase 2
- Phase 2 artifacts compatible with Phase 3
- User Python code unchanged across phases
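10.10 Python Interpreter Origin and Fallback Selection Constraints
The origin of the Python interpreter affects fallback selection:
Interpreter initialized by the bridge:
- The bridge owns the interpreter lifecycle
- Fallback codegen can use the current interpreter
- At shutdown, the interpreter is finalized by the bridge
Interpreter initialized externally:
- The bridge does not own the interpreter lifecycle
- Fallback codegen is limited by the external interpreter state
- At shutdown, the interpreter is managed externally
Fallback selection constraints: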
- Prefer pre-built artifact over fallback
- Use fallback only when necessary
- Fallback codegen limited by interpreter ABI stability
11. ATC Extension Design
The ATC extension supports Python pass parameters:
New ATC Parameters:
- --py_pass_path: Python pass plugin path
- --py_pass_module: Specific Python module to load
Parameter Processing:
- The environment variable ASCEND_GE_PY_PASS_PATH is primary
- ATC parameters supplement it with specific paths
- Bootstrap discovers all Python passes
Integration Point:
- PassPluginLoader::Load() is called before ATC compilation
- Pass registration completes before compilation starts
- Compilation uses registered Python passes
12. File-level Development Plan
12.1 Python Package and Discovery Layer
Files to Implement:
- ge/passes/__init__.py: Module initialization
- ge/passes/base.py: Pass base classes
- ge/passes/pattern.py: Pattern matching helpers
- ge/passes/registry.py: Pass registry
- ge/passes/bootstrap.py: Plugin discovery
- ge/passes/runtime.py: Runtime artifact loading
12.2 Python Graph Wrapper Completion
Files to Update:
- ge/graph/graph.py: Add new interfaces
- ge/graph/node.py: Add anchor interfaces
- ge/graph/tensor.py: Add tensor interfaces
12.3 C Wrapper and Native Bridge
Files to Implement:
- pybind_bridge.cc: Bridge implementation
- python_pass_adapter.h: Adapter definitions
- python_pass_adapter.cc: Adapter implementations
- python_pass_registry.cc: Registry implementation
12.4 Core Changes to Pass Registration
Files to Modify:
- fusion_pass_executor.cc: Add Python pass loading
- pass_registry.h: Add Python pass support
- pass_registry.cc: Add Python pass creation
12.5 A/B Division of Work and Integration Boundaries
Team A Responsibilities:
- Python package implementation
- Graph wrapper completion
- Bootstrap discovery mechanism
Team B Responsibilities:
- Bridge implementation
- Adapter implementation
- Registry implementation
Integration Boundary:
- Python registration API ↔ Bridge C API
- Bridge ↔ GE Pass Registry
12.6 ATC Parameter Integration
Files to Modify:
- atc/main_impl.cc: Add parameter parsing
- pass_plugin_loader.cc: Add ATC integration
12.7 Documentation, Samples, Testing
Documentation:
- User guide
- API reference
- Design document
Samples:
- Simple fusion pass example
- Pattern fusion pass example
- Decompose pass example
Testing:
- Unit tests for Python pass
- Integration tests for bridge
- End-to-end compilation tests
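13. Collaboration and Advancement Approach
13.1 Overall Collaboration Principles
- Develop the Python side and the C++ side in parallel
- Define clear interfaces first
- Run integration tests promptly
- Keep documentation updated in sync
13.2 A/B Workflow Boundaries
Team A: Python-side development
- Week 1-2: Set up the package structure
- Week 3-4: Base classes and registry
- Week 5-6: Bootstrap and discovery
Team B: C++-side development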
- Week 1
12.3 C Wrapper and Native Bridge
Modify and add the following files:
- api/python/ge/ge_api_c_wrapper/c_graph.cc
- api/python/ge/ge_api_c_wrapper/c_gnode.cc
- api/python/ge/ge_api_c_wrapper/c_tensor.cc
- api/python/ge/ge_api_c_wrapper/c_match_result.cc
- api/python/ge/ge_api_c_wrapper/ge_api_c_wrapper_utils.h
- compiler/graph/fusion/pass/pass_plugin_loader.cc
- compiler/graph/fusion/pass/pass_plugin_loader.h
- compiler/graph/fusion/pass/python_fusion_base_pass_bridge_c_api.h
- compiler/graph/fusion/pass/python_fusion_base_pass_bridge_loader.cc
- compiler/graph/fusion/pass/python_fusion_base_pass_pybind_bridge.cc
- compiler/graph/fusion/pass/python_fusion_base_pass_pybind_bridge.h
- New: _ge_pass_native.so source code, exported headers, and build script
- api/python/ge/ge_api_c_wrapper/CMakeLists.txt
- api/python/ge/CMakeLists.txt
- compiler/CMakeLists.txt
Responsibilities:
- Provide a C interface for the Python graph wrapper
- Provide the pybind11-based Python pass bridge / helper so
- Integrate with wheel packaging and installation
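It is recommended to split responsibilities further as follows:
- c_graph.cc / c_gnode.cc / c_tensor.cc / c_match_result.cc
  - Continue to serve the ge.graph / ge.es ctypes route
- Independent bridge .so
  - Responsible for Python runtime initialization, descriptor synchronization, holder management, native adapter logic, and Python/C++ object conversion
  - Responsible for hosting precompiled / fallback artifacts
- _ge_pass_native.so
  - Responsible for native-backed wrappers and helpers such as Graph / PassContext / MatchResult
  - Responsible for connecting to the object sources in base.py
  - Does not carry the user pass base classes, and does not require users to import and inherit from C++ pass base classes directly
- pass_plugin_loader.cc / .h
  - Responsible for locating and dlopen-ing the bridge .so
  - Responsible for interfacing with the stable ABI of the bridge .so
- python_fusion_base_pass_bridge_c_api.h
  - Defines the stable C ABI between the bridge loader and libge_python_pass_bridge.so
  - The current entry point is GeGetPythonFusionBasePassBridgeApi()
- python_fusion_base_pass_bridge_loader.cc
  - Located on the ge_compiler.so side
  - Responsible for dladdr-based location, dlopen / dlsym, caching the bridge API, and passing the registrar callback into the bridge
  - Currently loads the bridge explicitly with RTLD_GLOBAL, so that embedded CPython can resolve libpython symbols when it later imports standard library / native extension modules
- python_fusion_base_pass_pybind_bridge.cc / .h
  - Located on the libge_python_pass_bridge.so side
  - Responsible for Python runtime initialization, descriptor synchronization, holder lifecycle, and the create / run / destroy callback implementations
  - Exposes a stable ABI to pass_plugin_loader and reuses the bootstrap / _bridge protocol on the Python side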
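The formal architectural boundary should be "the bridge artifact set is replaceable, ge_compiler.so is stable". Specifically:
- libge_python_pass_bridge.so is the main entry point from the perspective of the GE internal loader
- _ge_pass_native.so is the helper extension from the Python perspective
- The two must be managed as a matched artifact pair with the same version and the same build key
- The first batch has already moved python_fusion_base_pass_pybind_bridge.cc out of the ge_compiler target and added a ge_python_pass_bridge target that produces libge_python_pass_bridge.so
- ge_compiler.so currently keeps only stable semantics such as the loader, adapter, and registry/runtime entry; only the bridge so depends directly on Python3::Python and pybind_options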
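12.4 Core Changes to Pass Registration
Modify the following files:
- compiler/graph/fusion/pass/pass_registry.cc
- compiler/graph/fusion/pass/fusion_pass_executor.cc
- New creation-time context management files, for example:
  - compiler/graph/fusion/pass/pass_create_context.h
  - compiler/graph/fusion/pass/pass_create_context.cc
Responsibilities:
- Inject the TLS creation context at the create_fn() call site
- Let the generic creator find the corresponding Python descriptor by pass_name
- Keep the behavior of existing C++ passes unchanged
The responsibilities can be refined further:
- pass_create_context.h / .cc
  - Define the TLS context and the RAII scope
- fusion_pass_executor.cc
  - Add the scope around create_fn() in InitPassesIfNeed
- Bridge registration function
  - Register each Python pass descriptor as "fixed creator function + pass_name metadata"
Notes:
- graph_fusion.cc belongs to the legacy compatibility chain and is outside the scope of subsequent support in this scheme
- Subsequent REGISTER_CUSTOM_PASS support goes through a separate extension phase, but still reuses the same descriptor / bridge / session mechanism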
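12.5 A/B Division of Work and Integration Boundaries
It is currently recommended to proceed in parallel along the following boundaries:
- A is responsible for libge_python_pass_bridge.so, pass_plugin_loader, the ge_compiler.so stable ABI, adapter routing, fallback loading, and borrowed-view integration of the existing ge.graph.Graph
- B is responsible for _ge_pass_native.so, base.py, the PassContext / MatchResult native-backed wrappers, and Python samples / Python API completion
B must explicitly deliver:
- The _ge_pass_native.so build script and module exports
- The PassContext borrowed view wrapper
- A minimum viable MatchResult wrapper
- The necessary helper / factory interfaces for libge_python_pass_bridge.so to construct Python objects
- Direct native-backed exports of PassContext / MatchResult / Pattern / PatternMatcherConfig in base.py / pattern.py
- The minimum capability list needed for Python samples and Python API completion
A must explicitly deliver:
- Borrowed / non-owning semantics for Graph._create_from(handle, owns_handle, owner) in graph.py
- Formal integration of BuildPythonGraph() in python_fusion_base_pass_pybind_bridge.cc with the existing ge.graph.Graph
- The bridge integration point between libge_python_pass_bridge.so and _ge_pass_native.so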
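For the Graph boundary, the following principles must be fixed:
- Graph reuses the existing ge.graph.Graph first
- _ge_pass_native.so no longer introduces a second user-visible Graph type
- A is responsible for attaching the runtime GraphPtr to the existing ge.graph.Graph as a borrowed view
- B does not own the Graph type itself, but provides supporting capabilities around PassContext / MatchResult / helpers
After B completes, base.py should converge to:
- FusionBasePass / PatternFusionPass / DecomposePass remain pure Python base classes
- PassContext / MatchResult / PatternMatcherConfig directly re-export the types provided by _ge_pass_native
- Pattern directly exports the type provided by _ge_pass_native through ge.passes.pattern
- No compatibility shim is kept for the case where _ge_pass_native is missing
While A splits the bridge .so and the loader locally, phase 1 may do minimal validation without depending on _ge_pass_native, but this is only a temporary bring-up strategy and should not remain in the formal code after phase 2 wraps up:
- First continue to use the existing pure Python contract class of FusionBasePass
- _bridge.py and the bridge .so first validate descriptor, holder, create/run/destroy, dlopen, and fallback artifact selection
- Formal end-to-end acceptance involving PatternFusionPass is completed after B's _ge_pass_native lands and the two work streams merge
Additional constraints:
- A does not own base.py and only consumes the stable Python interfaces exposed by B
- B does not directly own libge_python_pass_bridge.so and only provides the stable Python / native helper capabilities that the bridge needs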
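12.6 ATC Parameter Integration
Modify the following files:
- api/atc/main_impl.cc
Responsibilities:
- No new user pass routing parameters are added at present
- If a CLI / option is added later, it should ultimately be normalized to ASCEND_GE_PY_PASS_PATH or the unified discovery protocol of ge.passes.bootstrap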
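A hedged sketch of environment-variable-driven discovery. The document names ASCEND_GE_PY_PASS_PATH but does not specify its value format; the sketch assumes an os.pathsep-separated list of paths, and discover_pass_paths is a hypothetical helper, not the ge.passes.bootstrap API.

```python
import os

ENV_VAR = "ASCEND_GE_PY_PASS_PATH"


def discover_pass_paths(environ=None):
    """Return the unique, non-empty entries of the variable in first-seen order.

    Assumes the value is a list separated by os.pathsep (an assumption,
    since the real format is not defined in this document).
    """
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "")
    seen = set()
    result = []
    for entry in raw.split(os.pathsep):
        entry = entry.strip()
        if entry and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result


# Build a sample value with a duplicate and an empty entry.
sample = {ENV_VAR: os.pathsep.join(["/opt/passes", "/home/dev/passes", "/opt/passes", ""])}
paths = discover_pass_paths(sample)
```

Any later CLI option could be normalized by writing its value into the same variable before discovery runs, so that only one discovery path has to be maintained.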
12.7 Documentation, Samples, Testing
Add and modify the following files:
- examples/fusion_pass/README.md
- examples/fusion_pass/python_pass开发指南.md
- examples/fusion_pass/*/python/*
- tests/ge/ut/ge/graph/pyge_tests/*_test.py
- tests/ge/python_pass/*
Responsibilities:
- Provide Python samples
- Provide user documentation
- Complete unit tests and integration validation
13. Collaboration and Advancement Approach
Notes:
- This design document is responsible for maintaining long-term stable architectural boundaries, A/B collaboration constraints, interface freeze points, and acceptance principles
- PLAN.md is the sole source for phase progress, checklists, and completion status
- Phase objectives, A/B sub-objectives, and completed/pending status will only be updated in PLAN.md; this design document will no longer maintain similar progress information
13.1 Overall Collaboration Principles
It is recommended to continuously advance along two parallel workflows, with the basic principle of "freezing interfaces first, separating write sets as much as possible, and unifying phase status back to PLAN.md".
Parallel collaboration follows these constraints:
- Freeze public interfaces first, then develop concurrently:
  - Descriptor/callback protocol between ge.passes._bridge and native bridge
  - Minimum interfaces visible to Python in ge.graph / ge.es / _ge_pass_native.so
- Progress tracking is maintained uniformly in PLAN.md
- The design document only retains long-term valid boundaries, not process records of "who completed what"
- Phase acceptance can be executed gradually by phase, but whether a "phase is complete" is determined by the checklist in PLAN.md
13.2 A/B Workflow Boundaries
The long-term boundaries of the two workflows are as follows:
- A focuses on compiler / native bridge / loader / adapter / fallback / existing ge.graph.Graph integration
- B focuses on _ge_pass_native.so, base.py, PassContext / MatchResult, Python samples, and Python API completion
The fixed boundaries directly related to current implementation are as follows:
- A is responsible for libge_python_pass_bridge.so, pass_plugin_loader, ge_compiler.so stable ABI, and existing ge.graph.Graph borrowed view integration
- B is responsible for _ge_pass_native.so, base.py, native-backed form of PassContext / MatchResult, and subsequent completion of samples / Python API
- Graph prioritizes reusing the existing ge.graph.Graph
- _ge_pass_native.so no longer introduces a second user-visible Graph type
- A does not own base.py
- B does not own libge_python_pass_bridge.so
13.3 Phase Advancement and Delivery Approach
Subsequent phases should collaborate in the following order:
- First freeze phase boundaries and completion definitions, and write them into PLAN.md
- Then freeze minimum interfaces between A/B
- Develop in parallel according to write sets
- Prioritize completing test samples and targeted validation during integration
- After passing, update checklist status back to PLAN.md
For phase definitions themselves, it is recommended to maintain the following long-term order:
- FusionBasePass formally connected
- PatternFusionPass connected
- DecomposePass connected
- Fallback and prebuilt version connected
- Python equivalent implementation of samples
- Supporting documentation, validation, and delivery materials completed
Detailed sub-items, status, and blockers for these phases are uniformly referenced in PLAN.md and will not be expanded again in the design document.
13.4 Phase Acceptance and Documentation Synchronization
At the conclusion of each phase, it is recommended to complete the following actions synchronously:
- Update the completion status of the corresponding checklist in PLAN.md
- If interface boundaries change, update this design document
- If only status changes, do not modify phase descriptions in this design document
- No longer add independent phase progress/acceptance markdown files; process progress is uniformly written back to PLAN.md
- Preserve minimum validation commands, result summaries, and known blockers for phase acceptance
This ensures:
- PLAN.md reflects real-time progress
- The design document remains stable and is not polluted by process status
- A/B always has a single source of progress during integration
14. Validation and Acceptance Requirements
14.1 Test Layering
V1 testing is recommended to be organized in four layers:
- Python unit tests:
  - registry, decorator, bootstrap
  - Environment variable primary path, subsequent entry_points auto-discovery
  - Descriptor normalization
  - Stale handle checking for borrowed handles
- C++ / native unit tests:
  - Bridge initialization and repeated initialization
  - TLS creation context
  - Dynamic registration
  - Session lifecycle
  - Exception isolation
- Integration tests:
  - FusionBasePass
  - PatternFusionPass
  - DecomposePass
  - MatchResult
  - GeUtils.InferShape
  - GeUtils.CheckNodeSupportOnAicore
- Packaging and installation tests:
  - Main wheel installation
  - Native sub-wheel selection
  - Fallback compilation when no matching version
  - Development path / direct module passing
14.2 Phase Acceptance Principles
The completion definition, A/B sub-objectives, status, and blockers for each phase are uniformly based on PLAN.md, and this design document will no longer maintain hard-coded acceptance nodes for "phase one/phase two/phase three".
Phase acceptance is recommended to uniformly adopt the following structure:
- Completion definition:
  - Directly reference the "phase completion definition" for the corresponding phase in PLAN.md
- Required validations:
  - At least cover one positive main chain
  - At least cover exception paths or failure isolation for newly added interfaces in that phase
  - At least cover lifecycle, repeated loading, or resource release parts directly related to that phase
- Conclusion requirements:
  - PLAN.md corresponding checklist status update completed
  - Affected interface boundaries, loading relationships, or responsibility divisions in the design document updated
  - Validation commands, result summaries, and known blockers preserved in phase acceptance records
Additionally, one constraint needs to be followed:
- A capability belongs to whichever phase it is in, and is accepted in that phase, not merged into other phases in advance
- For example, productization capabilities like entry_points, prebuilt versions, multi-version native artifacts, and fallback should be based on the corresponding phase in PLAN.md, rather than being written into the completion criteria of earlier phases
14.3 Milestone Organization Recommendations
Considering the project advances on a "two-week" rhythm, formal milestones are recommended to be organized dynamically according to PLAN.md current priorities, rather than hard-coding "certain two phases must be accepted together" in this design document.
It is recommended that each milestone follows these principles:
- One milestone should try to conclude only 1~2 themes that can form a closed loop
- Prioritize organizing around the current main objective, for example:
  - FusionBasePass closed loop
  - PatternFusionPass closed loop
  - DecomposePass closed loop
  - Fallback / prebuilt version closed loop
  - Sample and delivery materials closed loop
- Each milestone should output:
  - Completed checklist delta for this round
  - Targeted validation commands and result summaries
  - Remaining blockers
  - Handoff prerequisites for the next milestone
Extension items are recommended to be organized as separate milestones, not strongly bound to V1 main chain acceptance. For example, REGISTER_CUSTOM_PASS is more suitable for separate acceptance after the main chain stabilizes.
14.4 Recommended Acceptance Deliverables
Each formal acceptance is recommended to preserve at least the following deliverables:
- Test result summary table:
- Test case name
- Covered capability points
- Execution result
- Failure reason and conclusion
- Sample execution records:
- Input model or script
- Triggered pass
- Key logs or result summaries
- Known issues list:
- Whether it blocks the next phase
- Workaround approach
- Planned fix phase
14.5 Overall Acceptance Dimensions
V1 final acceptance is recommended to be based on the following dimensions:
- Discovery and loading:
  - Discovery mechanisms currently within scope are consistent with PLAN.md and consistent with user documentation
  - If entry_points, prebuilt versions, or fallback are included in this round, corresponding chains have independent acceptance records
- Three types of pass main chains:
  - FusionBasePass, PatternFusionPass, DecomposePass are discoverable, registrable, creatable, and executable within their respective scopes
- Python / native wrapper:
  - Graph, PassContext, MatchResult, and helpers required by the phase have corresponding capabilities
- Stability:
  - Python pass import failures can be isolated
  - Python pass execution exceptions can be isolated
  - Lifecycle, repeated loading, and resource release semantics have validation records
- Delivery and materials:
  - PLAN.md, design documents, samples, validation records, and limitation descriptions remain consistent
14.6 Phase Progress Reference
Current phase completion status, A/B sub-objectives, incomplete items, and blockers are not repeatedly maintained in this design document; PLAN.md is the sole reference.
This design document only retains the following long-term acceptance requirements:
- Each phase conclusion must synchronously update PLAN.md
- If interfaces, responsibility boundaries, or loading relationships change, this design document must be synchronously updated
- If only task status changes, only update PLAN.md
- Phase acceptance requires preserving validation commands, result summaries, and known blockers
15. Risks and Points of Attention
- If `CreateFusionPassFn` is directly changed to `std::function`, it may introduce `dlclose` and global destructor ordering coupling risks; prioritize adopting the "function pointer + TLS creation context + process-level registry" approach.
- The `Graph` borrowed wrapper has been implemented, but subsequently added runtime wrappers must still comply with the constraint of "not taking runtime handle ownership by default" to avoid reintroducing double-free risks.
- The local development environment is Python 3.13, while formal wheel planning is `cp39-cp314`; local testing can only cover Python tags available in the current environment, and the full matrix requires release pipelines to validate separately by Python minor version.
- V1 needs to clearly distinguish two types of native strategies in documentation and implementation: `ge.graph` continues to use the ctypes/C wrapper, while the Python pass bridge uses `pybind11`.
- If more Python version-sensitive logic continues to be compiled directly into `ge_compiler.so`, the evolution space for subsequent bridge artifact sets and fallback will be compressed; therefore, subsequent implementations must converge version differences into the replaceable `libge_python_pass_bridge.so` / `_ge_pass_native.so`.
- Within `atc.bin`, Python may be initialized first by TBE, or Python may be initialized first by the Python pass bridge; fallback selection must be based on the current in-process interpreter or a unified selector, and bridge state cleanup should be designed separately from interpreter finalize, ensuring the interpreter is only finalized by the owner after all Python users have cleaned up.
- The `PatternFusionPass` Python implementation is not a simple function callback; it must ensure that the pattern/match/replacement three-stage semantics are consistent with the existing C++ framework.
- Before B's `_ge_pass_native` is implemented, local validation can temporarily not depend on it, but this can only be used for independent bridge `.so` splitting and `FusionBasePass` regression; it cannot replace the final acceptance of `PatternFusionPass`.
- Providing Python equivalents for all 9 samples will expand wrapper coverage; priority should be given to ensuring main chain availability before gradually completing edge interfaces.
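
The version-matrix risk above can be made concrete with a small sketch. It assumes that `cp39-cp314` means CPython 3.9 through 3.14 inclusive, and that a missing prebuilt tag is what triggers the fallback path; the helper names `native_wheel_tag` and `uncovered_tags` are hypothetical and are not part of the existing code.

```python
import sys

# Assumption: "cp39-cp314" denotes CPython 3.9 through 3.14 inclusive.
MIN_MINOR, MAX_MINOR = 9, 14


def native_wheel_tag(version_info=None):
    """Return the CPython tag of the planned native sub-wheel, or None when
    no prebuilt artifact is planned and the fallback path would be needed."""
    major, minor = (version_info or sys.version_info)[:2]
    if major == 3 and MIN_MINOR <= minor <= MAX_MINOR:
        return f"cp{major}{minor}"
    return None


def uncovered_tags(locally_tested):
    """List planned tags that local testing did not cover."""
    tested = set(locally_tested)
    planned = [f"cp3{minor}" for minor in range(MIN_MINOR, MAX_MINOR + 1)]
    return [tag for tag in planned if tag not in tested]
```

With only Python 3.13 available locally, `uncovered_tags(["cp313"])` lists the five remaining tags that the release pipeline would have to validate.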
16. Reuse Boundaries with Subsequent Custom Operator Python Implementation
This scheme does not serve Python passes alone; it lays a generic foundation for "GE/CANN main framework safely calling Python extension capabilities".
Subsequently, if "custom operator Python implementation" is advanced, a large part of the infrastructure in this scheme can be directly reused, but operator definition, delivery, and sinking-related capabilities still need new dedicated designs.
16.1 Directly Reusable Capabilities
The following capabilities can be directly reused in subsequent custom operator Python implementation:
- Python plugin discovery protocol: Currently based on the environment variable primary path as baseline, with `entry_points` auto-discovery to be added later; this protocol can be abstracted into a generic Python extension discovery framework.
- Main wheel + native sub-wheel + fallback codegen: This "pure Python main package + multi-Python version native companion package + local fallback compilation when version not matched" release mode is essentially independent of pass/custom op.
- Bridge lifecycle management: Python interpreter initialization, repeated initialization idempotency, exception isolation, unload order control, logging, and diagnostics can all be reused.
- GIL and lock strategies: The strategy of "management lock + session lightweight state + GIL precise surround for Python callbacks" defined in the current document applies to the vast majority of subsequent Python extension integration points.
- Execution session / holder model: Owner tokens that tie runtime objects to their Python wrappers, stale handle checks, and borrowed objects that raise Python exceptions rather than crash once they become invalid; this mechanism also applies to objects that are valid only for the duration of a custom operator callback.
- Python registry and descriptor model: The registry, descriptor, bootstrap, and bridge exit in the current pass design can evolve into a generic "Python extension registry".
- Pythonic constraints: User experience goals of not requiring manual object release, not requiring manual GIL management, and not requiring explicit close/release from users should continue to be maintained.
Looking only at the foundation layer of "Python capabilities formally integrated into GE/CANN", the reuse rate at this layer is estimated to reach 60%~70%.
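
The execution session / holder model listed above can be sketched in pure Python. The class names `ExecutionSession`, `BorrowedGraph`, and `StaleHandleError` are hypothetical, and a plain list stands in for the native graph handle; the actual wrappers may be structured differently.

```python
class StaleHandleError(RuntimeError):
    """Raised when a borrowed wrapper is used after its session ended."""


class ExecutionSession:
    """Hypothetical owner token for the runtime objects of one pass invocation."""

    def __init__(self):
        self._alive = True

    @property
    def alive(self):
        return self._alive

    def close(self):
        # Called by the bridge, not by user code, when the callback returns.
        self._alive = False


class BorrowedGraph:
    """Hypothetical borrowed wrapper: it references the handle but never frees it."""

    def __init__(self, session, handle):
        self._session = session
        self._handle = handle

    def _checked_handle(self):
        if not self._session.alive:
            raise StaleHandleError("graph used outside its pass execution session")
        return self._handle

    def node_count(self):
        # A real wrapper would pass the handle to native code here.
        return len(self._checked_handle())


def run_pass(pass_fn, nodes):
    """Run a Python callback with a borrowed graph that is valid only for the call."""
    session = ExecutionSession()
    try:
        return pass_fn(BorrowedGraph(session, nodes))
    finally:
        session.close()
```

A callback that stores the graph and touches it after `run_pass` returns gets a `StaleHandleError` instead of a crash, and user code never calls `close` itself.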
16.2 Partially Reusable Capabilities
The following capabilities can reuse a portion but need to be tailored according to custom operator semantics:
- `ge.graph` / `ge.es` wrapper: If subsequent custom operator Python implementation needs to express schema, infer logic, graph-level replacement, or eager graph entry, these graph wrappers still have value; but they cannot replace operator schema's own modeling.
- Plugin path and installation path management: The current repository already has custom opp path and plugin manager capabilities, such as the `custom_op_lib_path_` related logic in `graph_metadef/base/common/plugin/plugin_manager.cc`, which indicates there is an existing foundation for "plugin discovery" and "custom deliverable installation paths"; but Python custom op still needs to define clearer mapping rules between Python packages and OPP packages.
- Error propagation and degradation strategy: The "single plugin failure does not bring down the main chain" approach in current passes still applies, but custom ops are often closer to the compilation main flow than passes; which errors allow degradation and which errors must hard fail needs to be re-graded.
- ATC parameter entry: The parameter integration approach reserved for `--py_pass_path` and `--py_pass_module` in this scheme can be extended to Python custom op later; but parameter naming, priority, and the relationship with existing custom opp directory configuration still need to be determined.
This layer is closer to "mechanism reuse" than to "direct copying of the implementation". The reuse ratio is roughly 20%~30%.
16.3 Capabilities Needing New Addition for Custom Operators
The following parts cannot be directly obtained from the Python pass scheme and need separate design:
- Custom operator schema/OpDef registration model: Including inputs, outputs, attributes, constraints, default values, version, and namespace.
- Shape/type inference Python registration and execution model.
- Kernel delivery chain: For example, AscendC/Triton/TBE/host-side implementation, compilation artifacts, binary layout, version validation.
- OPP package layout and installation protocol: Directory organization relationships between Python packages, op proto, kernel binaries, tiling files, and configuration files.
- Compile/runtime recognition logic: How ATC and online compilation phases discover, validate, and sink Python custom ops.
- Custom operator and framework adaptor connection: For example, the PyTorch/TensorFlow graph entry methods shown in the current `examples/custom_op`; this part is clearly beyond the scope of pass responsibilities.
If the subsequent goal is only "Python writing schema + infer + registration", the reuse from the current pass scheme will be higher; if the goal also includes "Python side fully driving kernel development, packaging, and publishing", the new work will increase significantly.
16.4 Reuse Conclusion
It is recommended to view "Python pass implementation" and "Python custom op implementation" as two phases:
- The first phase completes Python pass first, stabilizing the generic foundation of Python plugin integration, lifecycle, memory management, lock/GIL, packaging, and fallback.
- The second phase adds custom operator-specific models on this foundation, including schema, infer, kernel delivery, and OPP layout.
Rough estimate from project dimension:
- Infrastructure layer reuse: `60%~70%`
- Overall project reuse: `40%~50%`
This ratio is sufficient to demonstrate that the current Python pass design is not a one-time solution, but is proactively building a public foundation that subsequent Python custom ops can continuously reuse.