master: msprobe/python/msprobe
Latest commit: ascend-robot, 【bugfix】Add npu device type check for int8 tensor fractal_nz format (912c6e9a), created 22 hours ago
File
Last commit
Last updated
core
【bugfix】Add npu device type check for int8 tensor fractal_nz format
Co-authored-by: mnhdxnh<947098055@qq.com>
# message auto-generated for no-merge-commit merge:
!744 merge master into master
【bugfix】Add npu device type check for int8 tensor fractal_nz format
Created-by: mnhdxnh
Commit-by: mnhdxnh
Merged-by: ascend-robot
Description: fix: Add npu device type check for int8 tensor fractal_nz format
See merge request: Ascend/msprobe!744
22 hours ago
infer
【Security】delete rest owner check in offline dump and compare
Co-authored-by: ylw1234<lwying007@126.com>
# message auto-generated for no-merge-commit merge:
!741 merge master into master
【Security】delete rest owner check in offline dump and compare
Created-by: ylw1234
Commit-by: ylw1234
Merged-by: ascend-robot
Description: delete rest owner check in offline dump and compare
See merge request: Ascend/msprobe!741
22 hours ago
lib
[feat] Build project for device MD5
Co-authored-by: Tjh-UKN<2559659915@qq.com>
# message auto-generated for no-merge-commit merge:
!711 merge md5 into master
[feat] Build project for device MD5
Created-by: Tjh-UKN
Commit-by: Tjh-UKN
Merged-by: ascend-robot
Description: Background: MD5 computation requires host synchronization. This change provides an operator that computes MD5 on the device, which speeds up MD5 computation.
Self-verification: https://www.yuque.com/taejohnson/tzgbbf/cbai414pbcpgr9mc
See merge request: Ascend/msprobe!711
1 day ago
mindspore
【Security】delete chmod in python dir
Co-authored-by: ylw1234<lwying007@126.com>
# message auto-generated for no-merge-commit merge:
!707 merge master into master
【Security】delete chmod in python dir
Created-by: ylw1234
Commit-by: ylw1234
Merged-by: ascend-robot
Description: delete chmod in python dir
See merge request: Ascend/msprobe!707
4 days ago
msaccucmp
【Security】delete rest owner check in offline dump and compare
Co-authored-by: ylw1234<lwying007@126.com>
# message auto-generated for no-merge-commit merge:
!741 merge master into master
【Security】delete rest owner check in offline dump and compare
Created-by: ylw1234
Commit-by: ylw1234
Merged-by: ascend-robot
Description: delete rest owner check in offline dump and compare
See merge request: Ascend/msprobe!741
22 hours ago
overflow_check
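【Optimization】overflow_check performance optimization
Co-authored-by: wugengjun<wugengjun1@huawei.com>
# message auto-generated for no-merge-commit merge:
!456 merge master0317 into master
【Optimization】overflow_check performance optimization
Created-by: wugengjun
Commit-by: wugengjun
Merged-by: ascend-robot
Description: For the `_connect_comm_nodes()` method in the `overflow_check` module, introducing a hash index and a caching mechanism reduces the time complexity from O(R² × N²) to O(R × N). In a test on a 256-card scenario, the run time dropped from more than 6 hours to about 1 hour 45 minutes.
See merge request: Ascend/msprobe!456
2 months ago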
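The listing does not show the body of `_connect_comm_nodes()`, so the following is only a minimal sketch of the general technique the description names: replacing pairwise scans with a hash index. The data model (a dict mapping a rank to the match keys of its communication nodes) and both function names are assumptions made for illustration, not msprobe's actual code.

```python
from collections import defaultdict
from itertools import combinations


def connect_naive(nodes_by_rank):
    """Pairwise scan: every rank pair times every node pair, O(R^2 * N^2)."""
    groups = defaultdict(set)
    for rank_a, rank_b in combinations(sorted(nodes_by_rank), 2):
        for key_a in nodes_by_rank[rank_a]:
            for key_b in nodes_by_rank[rank_b]:
                if key_a == key_b:
                    groups[key_a].update((rank_a, rank_b))
    return {key: sorted(ranks) for key, ranks in groups.items()}


def connect_indexed(nodes_by_rank):
    """Hash index: one pass over all nodes, O(R * N) dict operations."""
    index = defaultdict(list)
    for rank in sorted(nodes_by_rank):
        # set() guards against a key repeated within a single rank.
        for key in set(nodes_by_rank[rank]):
            index[key].append(rank)
    # Only keys seen on at least two ranks form a connection.
    return {key: ranks for key, ranks in index.items() if len(ranks) > 1}


# Hypothetical input: rank -> match keys of its communication nodes.
sample = {
    0: ["allreduce.0", "send.0"],
    1: ["allreduce.0", "recv.0"],
    2: ["allreduce.0", "send.0"],
}
result = connect_indexed(sample)
```

Both functions return the same grouping for this input; the indexed version touches each node once instead of comparing it with every node on every other rank.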
pytorch
【bugfix】Remove the format configuration item from monitor
Co-authored-by: l30036321<lvkaimeng@gmail.com>
# message auto-generated for no-merge-commit merge:
!745 merge master into master
【bugfix】Remove the format configuration item from monitor
Created-by: lv-kaimeng
Commit-by: l30036321
Merged-by: ascend-robot
Description: 【bugfix】Remove the format configuration item from monitor
See merge request: Ascend/msprobe!745
1 day ago
response_anomaly
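Inference service sub-health check module: add feature materials, script conversion code, and some code optimizations
Co-authored-by: g30066442<gaoyelu@h-partners.com>
# message auto-generated for no-merge-commit merge:
!685 merge master into master
Inference service sub-health check module: add feature materials, script conversion code, and some code optimizations
Created-by: gaoyelu1996
Commit-by: g30066442
Merged-by: ascend-robot
Description: Add feature materials, conversion code for the model-parameter and vocabulary mapping scripts, and some code optimizations
See merge request: Ascend/msprobe!685
7 days ago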
visualization
【Security】delete chmod in python dir
Co-authored-by: ylw1234<lwying007@126.com>
# message auto-generated for no-merge-commit merge:
!707 merge master into master
【Security】delete chmod in python dir
Created-by: ylw1234
Commit-by: ylw1234
Merged-by: ascend-robot
Description: delete chmod in python dir
See merge request: Ascend/msprobe!707
4 days ago
__init__.py
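【bugfix】Fix an error raised when importing the tool modules in an environment that has TensorFlow installed
Co-authored-by: h00613304<hekunkun@huawei.com>
# message auto-generated for no-merge-commit merge:
!651 merge master into master
【bugfix】Fix an error raised when importing the tool modules in an environment that has TensorFlow installed
Created-by: kun_8
Commit-by: h00613304
Merged-by: ascend-robot
Description: Running the following code causes a segmentation fault:

```python
from msprobe.pytorch import PrecisionDebugger  # ← triggers the crash
```

**Error output:**

```
Segmentation fault (core dumped)
```

---

## Crash stack analysis

### Call chain

```
Python import chain:
test_core_dump_error.py
└── msprobe.pytorch
    └── msprobe/pytorch/monitor/data_writers.py:24
        └── torch.utils.tensorboard.SummaryWriter
            └── tensorboard.compat.tf (lazy loading)
                └── tensorflow
                    └── tensorflow.python.platform.self_check.preload_check()
                        └── _pywrap_cpu_feature_guard.InfoAboutUnusedCPUFeatures()  ← crash point
```

### Code at the crash point

```python
# tensorflow/python/platform/self_check.py:59-64
else:
    # Load a library that performs CPU feature guard checking.
    from tensorflow.python.platform import _pywrap_cpu_feature_guard
    _pywrap_cpu_feature_guard.InfoAboutUnusedCPUFeatures()  # ← Segfault here
```

## Root cause analysis

TensorFlow's `_pywrap_cpu_feature_guard` is a C++ extension module that detects the instruction-set features supported by the CPU (such as AVX, SSE, and NEON).

On the **ARM aarch64** architecture, the prebuilt TensorFlow 2.21.0 wheel may have the following problems:

1. **The CPU feature detection code path is not correctly adapted to the ARM architecture**
2. **Memory conflicts with the NPU driver/runtime environment**
3. **Missing detection of required CPU instruction-set support**

### Trigger path

msprobe uses PyTorch's `SummaryWriter` for logging, and PyTorch's tensorboard integration tries to import TensorBoard. TensorBoard's `compat` module lazily loads TensorFlow to provide compatibility support, which ultimately causes TensorFlow's initialization code to run.

## Solution

### Create an automatic patch file

```python
"""msprobe import patch - load this module before importing msprobe"""
import sys

try:
    # Register TensorBoard's bundled stub under the TensorFlow module names so
    # that a later `import tensorflow` does not run TensorFlow's own startup code.
    import tensorboard.compat.tensorflow_stub as tensorflow_stub
    sys.modules['tensorflow'] = tensorflow_stub
    sys.modules['tensorflow.python'] = tensorflow_stub
except ImportError:
    # TensorBoard is not installed, so there is nothing to patch.
    pass
```

See merge request: Ascend/msprobe!651
19 days ago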
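The patch in the description above relies on a general property of Python's import system: a name already present in `sys.modules` is returned as-is, and the real package's initialization never runs. The sketch below illustrates that mechanism with a throwaway module. The name `heavy_dep_demo`, the `IS_STUB` attribute, and the `install_stub` helper are hypothetical and exist only for this example.

```python
import importlib
import sys
import types


def install_stub(name: str) -> types.ModuleType:
    """Register an empty placeholder module under `name` (hypothetical helper)."""
    stub = types.ModuleType(name)
    stub.IS_STUB = True  # marker attribute used only by this example
    sys.modules[name] = stub
    return stub


# Pre-register the stub before anything tries to import the real package.
stub = install_stub("heavy_dep_demo")

# The import system consults sys.modules first, so this returns the stub
# without searching the filesystem or executing any package code.
loaded = importlib.import_module("heavy_dep_demo")
```

Ordering matters for this technique: the stub has to be registered before the first import of the real package, which is why the description says to load the patch module before importing msprobe.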
config.json
add extra info config to enable stack&construct
Co-authored-by: TAJh<taojiaheng1@huawei.com>
2 months ago
msprobe.py
【Security】add Input Path Security Statement (softlink)
Co-authored-by: ylw1234<lwying007@126.com>
# message auto-generated for no-merge-commit merge:
!740 merge master into master
【Security】add Input Path Security Statement (softlink)
Created-by: ylw1234
Commit-by: ylw1234
Merged-by: ascend-robot
Description: add Input Path Security Statement (softlink)
See merge request: Ascend/msprobe!740
1 day ago