<!DOCTYPE html><html lang='en'><head><meta charset='utf-8'><meta name='viewport' content='width=device-width, initial-scale=1'><title>msmonitor_review_run - msmonitor Documentation Experience Review Report</title><style>
body{margin:0;background:#f5f2ea;color:#171612;font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',sans-serif;line-height:1.62}
*{box-sizing:border-box}
a,td,th,div,p,li,span{overflow-wrap:anywhere;word-break:break-word}
.page{max-width:1160px;margin:0 auto;padding:28px 18px 64px}
.topbar{display:flex;flex-wrap:wrap;justify-content:space-between;gap:12px;align-items:center;margin-bottom:18px}
.topbar a{color:#8a4b2d;text-decoration:none;font-weight:600}
.hero{background:#1d1f1b;color:#fff;border-radius:18px;padding:22px 22px 18px;box-shadow:0 12px 28px rgba(0,0,0,.16)}
.hero h1{margin:0;font-size:1.68rem;line-height:1.22;font-weight:800;letter-spacing:-.01em}
.hero-meta{display:flex;flex-wrap:wrap;gap:10px 16px;margin-top:14px;color:rgba(255,255,255,.84);font-size:.9rem}
.hero-meta a{color:#ffd7ca}
.card-grid{display:grid;grid-template-columns:repeat(auto-fit,minmax(190px,1fr));gap:14px;margin:18px 0 24px}
.mini-card{background:#fff;border:1px solid #e6ded0;border-radius:14px;padding:14px 15px;box-shadow:0 8px 18px rgba(20,20,18,.05)}
.mini-card .label{font-size:.73rem;text-transform:uppercase;letter-spacing:.05em;color:#817869;margin-bottom:6px}
.mini-card .value{font-size:1rem;font-weight:650;color:#171612;line-height:1.45}
.section{background:#fff;border:1px solid #e6ded0;border-radius:18px;padding:22px 20px;margin-top:18px;box-shadow:0 10px 24px rgba(20,20,18,.05)}
.section h2{margin:0 0 14px;font-size:1.28rem;line-height:1.28;font-weight:800}
.section h3{margin:0 0 11px;font-size:1.06rem;line-height:1.36;font-weight:760}
.section p{font-size:.97rem;line-height:1.7;color:#1d1b17}
.summary-list,.note-list{margin:10px 0 0;padding-left:18px}
.summary-list li,.note-list li{margin-top:7px;font-size:.96rem}
.scenario-grid{display:grid;grid-template-columns:repeat(auto-fit,minmax(300px,1fr));gap:16px}
.scenario-card{background:#fcfbf7;border:1px solid #e8dfd0;border-radius:16px;padding:18px 16px;text-decoration:none;color:inherit;display:block;box-shadow:0 8px 18px rgba(20,20,18,.04)}
.scenario-card:hover{border-color:#d0b79f;transform:translateY(-1px)}
.scenario-card h3{margin:0 0 8px;font-size:1.08rem;line-height:1.35}
.scenario-card p{margin:0 0 10px;color:#5f584d;font-size:.94rem;line-height:1.62}
.meta-row{display:flex;flex-wrap:wrap;gap:8px 10px;margin-top:12px}
.pill{display:inline-flex;align-items:center;border-radius:999px;padding:4px 10px;font-size:.74rem;font-weight:750;line-height:1.2}
.table-pill{justify-content:center;min-width:72px;max-width:132px;white-space:normal;text-align:center;padding:6px 12px;font-size:.78rem;line-height:1.3}
.sev-blocker{background:#f8d6d6;color:#8f2020}.sev-high{background:#fde4cf;color:#9a531e}.sev-medium{background:#f6edca;color:#7b6615}.sev-low{background:#dff0db;color:#2f6b34}.sev-neutral{background:#ece7dc;color:#645f55}
.status-ok{background:#dff0db;color:#2f6b34}.status-blocked{background:#f8d6d6;color:#8f2020}.status-deviation{background:#fde4cf;color:#9a531e}.status-skipped{background:#ece7dc;color:#645f55}.status-neutral{background:#ece7dc;color:#645f55}
.report-table{width:100%;border-collapse:collapse;border:1px solid #e8dfd0;border-radius:14px;overflow:hidden;background:#fff}
.report-table th,.report-table td{border:1px solid #e8dfd0;padding:11px 12px;vertical-align:top;text-align:left;font-size:.92rem;line-height:1.58}
.report-table th{background:#f4efe4;color:#645c4f;font-size:.83rem;text-transform:uppercase;letter-spacing:.04em;font-weight:800;text-align:center;vertical-align:middle}
.report-table tbody tr:nth-child(even) td{background:#fcfaf5}
.report-table td.cell-center{text-align:center;vertical-align:middle}
.timeline{display:grid;grid-template-columns:1fr;gap:16px;margin:12px 0 18px}
.timeline-card{border:1px solid #e6ddd0;border-left:4px solid #d3ba9b;border-radius:18px;background:linear-gradient(180deg,#fcfbf7 0%,#fff 100%);padding:16px 16px 15px;box-shadow:0 10px 20px rgba(20,20,18,.05)}
.timeline-card h4{margin:0 0 6px;font-size:1.1rem;line-height:1.4;font-weight:820;letter-spacing:-.01em}
.timeline-step-tag{display:inline-flex;align-items:center;padding:4px 9px;border-radius:999px;background:#efe5d6;color:#7c5936;font-size:.73rem;font-weight:800;letter-spacing:.04em;text-transform:uppercase;margin-bottom:9px}
.timeline-meta{display:flex;flex-wrap:wrap;gap:8px;margin-bottom:10px}
.timeline-block{margin-top:10px;padding:11px 12px;border-radius:12px;background:#fff;border:1px solid #ece3d6}
.timeline-label{display:inline-flex;align-items:center;padding:4px 8px;border-radius:999px;background:#f2ebe0;color:#6b5a43;font-size:.76rem;letter-spacing:.03em;font-weight:800;margin-bottom:8px}
.timeline-content{font-size:.97rem;line-height:1.66;color:#1d1b17}
.timeline-block.intro{background:#f8f3eb;border-color:#e7dccb}
.issue-grid{display:grid;grid-template-columns:1fr;gap:18px}
.issue-card{border:1px solid #e8dfd0;border-radius:16px;background:#fff;padding:16px 15px;box-shadow:0 8px 18px rgba(20,20,18,.04)}
.issue-card h4{margin:0;font-size:1.08rem;line-height:1.42;font-weight:800}
.issue-card-header{display:flex;justify-content:space-between;gap:10px;align-items:flex-start;margin-bottom:10px}
.meta-line{margin:8px 0;font-size:.96rem;line-height:1.62}
.quote-box{background:#f8f4ea;border:1px solid #e7decd;border-radius:12px;padding:12px 13px;margin:13px 0}
.issue-label{display:inline-flex;align-items:center;padding:4px 8px;border-radius:999px;background:#f2ebe0;font-size:.73rem;text-transform:uppercase;letter-spacing:.04em;font-weight:800;color:#6f614d;margin-bottom:7px}
.issue-block{margin-top:13px}
.issue-block p,.issue-block li,.quote-box p,.quote-box li{font-size:.96rem;line-height:1.68}
.doc-links{display:flex;flex-wrap:wrap;gap:8px}.doc-link{display:inline-flex;align-items:center;padding:4px 9px;border-radius:999px;background:#f0ebe1;color:#8c4b2e;text-decoration:none;font-size:.81rem;font-family:ui-monospace,SFMono-Regular,Menlo,monospace}.doc-link:hover{background:#e6ddcb}
.details-wrap{margin-top:18px}.details-wrap details{border:1px solid #e7decd;border-radius:14px;background:#faf8f2;padding:12px 14px}.details-wrap summary{cursor:pointer;font-weight:700;color:#5b5447}
.source-card{border:1px solid #e6ddcd;border-radius:12px;background:#fff;padding:12px 12px;margin-top:12px}.source-label{display:flex;justify-content:space-between;gap:10px;align-items:center;font-family:ui-monospace,SFMono-Regular,Menlo,monospace;font-size:.83rem;color:#5e564a;margin-bottom:8px}.source-label span{color:#8b8477;font-size:.77rem}
pre{background:#17191c;color:#f6f3ec;border-radius:12px;padding:13px;overflow:auto;font-size:.83rem;line-height:1.55}code{font-family:ui-monospace,SFMono-Regular,Menlo,monospace}
table code,p code,li code,div code{background:#f0ebe1;border:1px solid #e3dac7;border-radius:4px;padding:1px 5px;color:#4d463a}
.backlink{display:inline-flex;align-items:center;gap:6px;color:#8a4b2d;text-decoration:none;font-weight:700}
@media (max-width:720px){.page{padding:18px 12px 48px}.hero{padding:18px 16px}.section{padding:16px 14px}.hero h1{font-size:1.42rem}.section h2{font-size:1.16rem}.timeline-card h4{font-size:1rem}.mini-card .value{font-size:.96rem}}
</style></head><body><div class='page'><div class='topbar'><a class='backlink' href='index.html'>← Back to overview</a></div><section class='hero'><h1>msmonitor_review_run</h1><div class='hero-meta'><div>Represents one real documentation walkthrough scenario.</div></div></section><section class='section'><h2>Overall Conclusion</h2><div class='card-grid'><div class='mini-card'><div class='label'>Overall score</div><div class='value'>58/100</div></div><div class='mini-card'><div class='label'>Walkthrough status</div><div class='value'>Partially completed</div></div><div class='mini-card'><div class='label'>Repository</div><div class='value'><a href='https://gitcode.com/Ascend/msmonitor'>https://gitcode.com/Ascend/msmonitor</a></div></div><div class='mini-card'><div class='label'>Reviewed branch</div><div class='value'>master</div></div><div class='mini-card'><div class='label'>Reviewed commit</div><div class='value'>0dbb833d716e19ee41caf1ae81e08a949b0f7c8c</div></div><div class='mini-card'><div class='label'>Review time</div><div class='value'>2026-03-24 14:33:44 UTC</div></div><div class='mini-card'><div class='label'>Test environment</div><div class='value'>Linux msprof 5.15.0-119-generic #129-Ubuntu SMP Fri Aug 2 20:37:01 UTC 2024 aarch64 GNU/Linux</div></div><div class='mini-card'><div class='label'>Preinstalled environment</div><div class='value'><code>/usr/local/Ascend/ascend-toolkit/set_env.sh</code> is available; the <code>conda</code> environment <code>tmc_verl_vllm13_py311</code> is available</div></div><div class='mini-card'><div class='label'>Isolation strategy</div><div class='value'>Repository cloned separately into <code>/tmp/msmonitor_review</code>; Python build and install run under <code>conda activate tmc_verl_vllm13_py311</code>; no system-level installation, and no changes to the host's global service configuration</div></div></div><p>The documentation is well structured and covers the features fairly thoroughly, but the main path of “a newcomer follows the README and installation guide to get started” is not reliable. Building and installing the <code>mindstudio_monitor</code> wheel works in the given environment; however, the source build of <code>dynolog/dyno</code> is blocked by its dependency on GitHub submodules that require external network access, and the binaries the quick start presupposes are not provided in the repository, so the core commands could not be verified further in this real environment.</p><h3>Key Risks</h3><ul class='summary-list'><li>The quick start assumes the reader already has the <code>dynolog/dyno</code> executables, but the repository ships no prebuilt binaries and the source build depends on multiple GitHub submodules, so an ordinary newcomer is easily stopped by network or permission problems.</li><li>The docs do not close the loop on certificate directories, training scripts, success criteria, or failure troubleshooting; newcomers have to guess how to prepare
<code>/home/server_certs</code>, <code>/home/client_certs</code>, and <code>train.sh</code> / <code>run_ai_task.sh</code>.</li><li>The README and the sub-documents write the <code>LD_PRELOAD</code> path differently, which easily confuses users about where CANN is installed.</li></ul></section><section class='section'><h2>Walkthrough</h2><div class='timeline'><article class='timeline-card'><div class='timeline-step-tag'>Choose path</div><h4>Step 1: Identify installation paths and dependencies</h4><div class='timeline-meta'><span class='pill status-ok'>OK</span><span class='pill sev-low'>Low</span></div><div class='timeline-block intro'><div class='timeline-label'>About this step</div><div class='timeline-content'>Goal: identify installation paths and dependencies. This step completed as documented and is a reliable basis for the following steps.</div></div><div class='timeline-block'><div class='timeline-label'>Doc reference</div><div class='timeline-content'><div class="doc-links"><a class="doc-link" href="#ref-readme-md-63">README.md:63</a> <a class="doc-link" href="#ref-docs-zh-install-guide-md-65">docs/zh/install_guide.md:65</a></div></div></div><div class='timeline-block'><div class='timeline-label'>Result</div><div class='timeline-content'>Identified two installation paths: package installation and source build</div></div><div class='timeline-block'><div class='timeline-label'>Basis</div><div class='timeline-content'>The docs are clearly structured with an obvious entry point</div></div></article><article class='timeline-card'><div class='timeline-step-tag'>Verify</div><h4>Step 2: Verify the CANN and conda environments</h4><div class='timeline-meta'><span class='pill status-ok'>OK</span><span class='pill sev-low'>Low</span></div><div class='timeline-block intro'><div class='timeline-label'>About this step</div><div class='timeline-content'>Goal: verify the CANN and conda environments. This step completed as documented and is a reliable basis for the following steps.</div></div><div class='timeline-block'><div class='timeline-label'>Doc reference</div><div class='timeline-content'><div>Environment information provided by the user</div></div></div><div class='timeline-block'><div class='timeline-label'>Result</div><div class='timeline-content'><code>/usr/local/Ascend/ascend-toolkit/set_env.sh</code> exists; <code>tmc_verl_vllm13_py311</code> can be activated</div></div><div class='timeline-block'><div class='timeline-label'>Basis</div><div 
class='timeline-content'>Verified on the real machine</div></div></article><article class='timeline-card'><div class='timeline-step-tag'>Source build</div><h4>Step 3: Run <code>./plugin/build.sh</code> to build and install the wheel</h4><div class='timeline-meta'><span class='pill status-ok'>OK</span><span class='pill sev-medium'>Medium</span></div><div class='timeline-block intro'><div class='timeline-label'>About this step</div><div class='timeline-content'>Goal: run <code>./plugin/build.sh</code> to build and install the wheel. This step completed as documented and is a reliable basis for the following steps.</div></div><div class='timeline-block'><div class='timeline-label'>Doc reference</div><div class='timeline-content'><div class="doc-links"><a class="doc-link" href="#ref-docs-zh-install-guide-md-191">docs/zh/install_guide.md:191</a></div></div></div><div class='timeline-block'><div class='timeline-label'>Result</div><div class='timeline-content'>Built <code>mindstudio_monitor-26.0.0-cp311-cp311-linux_aarch64.whl</code> for real and installed it with <code>pip install</code></div></div><div class='timeline-block'><div class='timeline-label'>Basis</div><div class='timeline-content'>The build log shows the wheel was generated successfully in about <code>0:07.98</code></div></div></article><article class='timeline-card'><div class='timeline-step-tag'>Source build</div><h4>Step 4: Run <code>bash scripts/build.sh</code> to build <code>dynolog/dyno</code></h4><div class='timeline-meta'><span class='pill status-blocked'>Blocked</span><span class='pill sev-blocker'>Blocker</span></div><div class='timeline-block intro'><div class='timeline-label'>About this step</div><div class='timeline-content'>Goal: run <code>bash scripts/build.sh</code> to build <code>dynolog/dyno</code>. This step is where the main flow breaks: unless the missing prerequisites are supplied or the docs are corrected, later steps cannot proceed. In terms of review priority, it is also a key item to fix first.</div></div><div class='timeline-block'><div class='timeline-label'>Doc reference</div><div class='timeline-content'><div class="doc-links"><a class="doc-link" href="#ref-docs-zh-install-guide-md-158">docs/zh/install_guide.md:158</a></div></div></div><div class='timeline-block'><div class='timeline-label'>Result</div><div class='timeline-content'>The build stalled while fetching <code>third_party/dynolog/third_party/tensorboard_logger</code> and other GitHub
submodules</div></div><div class='timeline-block'><div class='timeline-label'>Basis</div><div class='timeline-content'>Actual error: <code>Failed to connect to github.com port 443</code>; the build was still incomplete when the 220 s timeout expired</div></div></article><article class='timeline-card'><div class='timeline-step-tag'>Start service</div><h4>Step 5: Start the <code>dynolog</code> daemon</h4><div class='timeline-meta'><span class='pill status-blocked'>Blocked</span><span class='pill sev-blocker'>Blocker</span></div><div class='timeline-block intro'><div class='timeline-label'>About this step</div><div class='timeline-content'>Goal: start the <code>dynolog</code> daemon. This step is where the main flow breaks: unless the missing prerequisites are supplied or the docs are corrected, later steps cannot proceed. In terms of review priority, it is also a key item to fix first.</div></div><div class='timeline-block'><div class='timeline-label'>Doc reference</div><div class='timeline-content'><div class="doc-links"><a class="doc-link" href="#ref-readme-md-132">README.md:132</a> <a class="doc-link" href="#ref-docs-zh-quick-start-md-14">docs/zh/quick_start.md:14</a></div></div></div><div class='timeline-block'><div class='timeline-label'>Result</div><div class='timeline-content'>The <code>dynolog</code> command does not exist</div></div><div class='timeline-block'><div class='timeline-label'>Basis</div><div class='timeline-content'>The previous step produced no binary, and the repository contains no prebuilt <code>dynolog</code> / <code>dyno</code> files</div></div></article><article class='timeline-card'><div class='timeline-step-tag'>Run step</div><h4>Step 6: Configure <code>LD_PRELOAD</code></h4><div class='timeline-meta'><span class='pill status-deviation'>Continued with deviation</span><span class='pill sev-high'>High</span></div><div class='timeline-block intro'><div class='timeline-label'>About this step</div><div class='timeline-content'>Goal: configure <code>LD_PRELOAD</code>. This step could only continue with a deviation from the docs, which shows that the docs make implicit assumptions about the environment or paths. It may not stall a reader immediately, but it significantly raises the trial-and-error cost for newcomers.</div></div><div class='timeline-block'><div class='timeline-label'>Doc reference</div><div class='timeline-content'><div class="doc-links"><a class="doc-link" href="#ref-readme-md-150">README.md:150</a> <a class="doc-link" href="#ref-docs-zh-npumonitor-instruct-md-72">docs/zh/npumonitor_instruct.md:72</a></div></div></div><div 
class='timeline-block'><div class='timeline-label'>Result</div><div class='timeline-content'>The two documents use different path styles</div></div><div class='timeline-block'><div class='timeline-label'>Basis</div><div class='timeline-content'><code>/usr/local/Ascend/ascend-toolkit/latest/lib64/libmspti.so</code> exists on the real machine, but the sub-document example uses <code>/usr/local/Ascend/cann/lib64/libmspti.so</code></div></div></article><article class='timeline-card'><div class='timeline-step-tag'>Run sample</div><h4>Step 7: Start the training job</h4><div class='timeline-meta'><span class='pill status-blocked'>Blocked</span><span class='pill sev-high'>High</span></div><div class='timeline-block intro'><div class='timeline-label'>About this step</div><div class='timeline-content'>Goal: start the training job. This step is where the main flow breaks: unless the missing prerequisites are supplied or the docs are corrected, later steps cannot proceed. It may not stall every reader immediately, but it significantly raises the trial-and-error cost for newcomers.</div></div><div class='timeline-block'><div class='timeline-label'>Doc reference</div><div class='timeline-content'><div class="doc-links"><a class="doc-link" href="#ref-readme-md-157">README.md:157</a> <a class="doc-link" href="#ref-docs-zh-quick-start-md-39">docs/zh/quick_start.md:39</a> <a class="doc-link" href="#ref-docs-zh-nputrace-instruct-md-77">docs/zh/nputrace_instruct.md:77</a></div></div></div><div class='timeline-block'><div class='timeline-label'>Result</div><div class='timeline-content'>The docs reference <code>run_ai_task.sh</code> / <code>train.sh</code>, but the repository provides neither script</div></div><div class='timeline-block'><div class='timeline-label'>Basis</div><div class='timeline-content'>Newcomers cannot run a minimal verification directly</div></div></article><article class='timeline-card'><div class='timeline-step-tag'>Feature check</div><h4>Step 8: Run <code>dyno ... npu-monitor</code> / <code>nputrace</code></h4><div class='timeline-meta'><span class='pill status-skipped'>Not run</span><span class='pill sev-blocker'>Blocker</span></div><div class='timeline-block intro'><div class='timeline-label'>About this step</div><div class='timeline-content'>Goal: run <code>dyno ... 
npu-monitor</code> / <code>nputrace</code>. This step was not executed, typically because an earlier step was already blocked. In terms of review priority, it is also a key item to fix first.</div></div><div class='timeline-block'><div class='timeline-label'>Doc reference</div><div class='timeline-content'><div class="doc-links"><a class="doc-link" href="#ref-readme-md-163">README.md:163</a> <a class="doc-link" href="#ref-readme-md-173">README.md:173</a></div></div></div><div class='timeline-block'><div class='timeline-label'>Result</div><div class='timeline-content'>Cannot continue because the <code>dyno</code> binary is not available</div></div><div class='timeline-block'><div class='timeline-label'>Basis</div><div class='timeline-content'>Prerequisite not met</div></div></article></div></section><section class='section'><h2>Issue Overview</h2><table class="report-table issues"><thead><tr><th>ID</th><th>Severity</th><th>Category</th><th>Doc location</th><th>Summary</th></tr></thead><tbody><tr><td>ISSUE-01</td><td class="cell-center"><span class='pill table-pill sev-blocker'>Blocker</span></td><td>Completeness / Correctness</td><td><div class="doc-links"><a class="doc-link" href="#ref-docs-zh-install-guide-md-158">docs/zh/install_guide.md:158</a></div></td><td class="sev-neutral cell-center">Building <code>dynolog</code> from source depends on multiple GitHub submodules; the docs do not state the network prerequisite or offer alternatives up front</td></tr><tr><td>ISSUE-02</td><td class="cell-center"><span class='pill table-pill sev-high'>High</span></td><td>Completeness</td><td><div class="doc-links"><a class="doc-link" href="#ref-readme-md-132">README.md:132</a> <a class="doc-link" href="#ref-docs-zh-quick-start-md-14">docs/zh/quick_start.md:14</a></div></td><td class="sev-neutral cell-center">The quick start asks the reader to run <code>dynolog</code> / <code>dyno</code> right away, but the repository ships no prebuilt binaries and does not explain how to obtain them first</td></tr><tr><td>ISSUE-03</td><td class="cell-center"><span class='pill table-pill sev-high'>High</span></td><td>Correctness / Readability</td><td><div class="doc-links"><a class="doc-link" href="#ref-readme-md-152">README.md:152</a> <a class="doc-link" href="#ref-docs-zh-npumonitor-instruct-md-74">docs/zh/npumonitor_instruct.md:74</a></div></td><td class="sev-neutral cell-center"><code>LD_PRELOAD</code>
paths are written inconsistently, which can mislead users</td></tr><tr><td>ISSUE-04</td><td class="cell-center"><span class='pill table-pill sev-high'>High</span></td><td>Completeness / Usability</td><td><div class="doc-links"><a class="doc-link" href="#ref-readme-md-157">README.md:157</a> <a class="doc-link" href="#ref-docs-zh-quick-start-md-39">docs/zh/quick_start.md:39</a> <a class="doc-link" href="#ref-docs-zh-nputrace-instruct-md-77">docs/zh/nputrace_instruct.md:77</a></div></td><td class="sev-neutral cell-center">Examples depend on <code>run_ai_task.sh</code> / <code>train.sh</code>, but the repository provides no minimal runnable sample</td></tr><tr><td>ISSUE-05</td><td class="cell-center"><span class='pill table-pill sev-medium'>Medium</span></td><td>Usability / Completeness</td><td><div class="doc-links"><a class="doc-link" href="#ref-readme-md-137">README.md:137</a> <a class="doc-link" href="#ref-docs-zh-dynolog-instruct-md-31">docs/zh/dynolog_instruct.md:31</a> <a class="doc-link" href="#ref-docs-zh-dyno-instruct-md-14">docs/zh/dyno_instruct.md:14</a></div></td><td class="sev-neutral cell-center">Certificate directories use <code>/home/server_certs</code> and <code>/home/client_certs</code>; there is no end-to-end <code>NO_CERTS</code> direct-connection example</td></tr><tr><td>ISSUE-06</td><td class="cell-center"><span class='pill table-pill sev-medium'>Medium</span></td><td>Environment friendliness</td><td><div class="doc-links"><a class="doc-link" href="#ref-docs-zh-install-guide-md-195">docs/zh/install_guide.md:195</a></div></td><td class="sev-neutral cell-center">The docs run <code>pip install</code> into the current environment by default without recommending an isolated environment; the actual run produced pip's root-user warning</td></tr><tr><td>ISSUE-07</td><td class="cell-center"><span class='pill table-pill sev-medium'>Medium</span></td><td>Correctness / Completeness</td><td><div class="doc-links"><a class="doc-link" href="#ref-docs-zh-install-guide-md-202">docs/zh/install_guide.md:202</a></div></td><td class="sev-neutral cell-center">The docs claim a fixed success string, but a real run may print “already installed” instead of the sample text</td></tr></tbody></table></section><section class='section'><h2>Issue Details</h2><div class="issue-grid"><section class="issue-card sev-blocker"><div class="issue-card-header"><h4>ISSUE-01 dynolog source build blocked by network access to GitHub submodules</h4><span 
class="pill sev-blocker">Blocker</span></div><div class="meta-line"><strong>Category:</strong> Completeness / Correctness</div><div class="meta-line"><strong>Doc location:</strong><div class="doc-links"><a class="doc-link" href="#ref-docs-zh-install-guide-md-158">docs/zh/install_guide.md:158</a></div></div><div class="quote-box"><strong>Source excerpt</strong><p><code>bash scripts/build.sh</code>; the docs list only dependencies such as gcc / rust / protobuf and do not mention that stable access to multiple GitHub submodules is also required</p></div><div class="issue-block"><div class="issue-label">Reproduction context</div><p>In <code>/tmp/msmonitor_review</code>, following the docs, ran <code>timeout 220 bash scripts/build.sh</code></p></div><div class="issue-block"><div class="issue-label">Observed behavior</div><p>The real output first entered <code>git submodule update --init</code>, then stalled at <code>Cloning into &#x27;/tmp/msmonitor_review/third_party/dynolog/third_party/tensorboard_logger&#x27;...</code> and other GitHub submodules; the error summary was <code>Failed to connect to github.com port 443 after 130188 ms: Connection timed out</code></p></div><div class="issue-block"><div class="issue-label">Impact</div><p>Even with the compiler, Rust, and CANN requirements met, a newcomer gets stuck at the very start of the build because the docs never state the dependency on external network access, and so cannot obtain <code>dynolog/dyno</code></p></div><div class="issue-block"><div class="issue-label">Suggested fix</div><p>Add a “Network prerequisites” subsection before <code>编译并安装dynolog</code> (“Build and install dynolog”) stating that access to <code>github.com</code> is required; also offer alternatives such as a mirror reachable from mainland China, an offline package, or a build with the TensorBoard dependency disabled, or simply recommend package installation first</p></div></section><section class="issue-card sev-high"><div class="issue-card-header"><h4>ISSUE-02 Quick start does not close the loop on executable prerequisites</h4><span class="pill sev-high">High</span></div><div class="meta-line"><strong>Category:</strong> Completeness</div><div class="meta-line"><strong>Doc location:</strong><div class="doc-links"><a class="doc-link" href="#ref-readme-md-132">README.md:132</a> <a class="doc-link" href="#ref-docs-zh-quick-start-md-14">docs/zh/quick_start.md:14</a></div></div><div class="quote-box"><strong>Source excerpt</strong><p><code>dynolog --enable-ipc-monitor --certs-dir /home/server_certs</code> and <code>dyno --certs-dir /home/client_certs ...</code></p></div><div class="issue-block"><div class="issue-label">Reproduction context</div><p>Checked on the real machine with <code>command -v 
dynolog</code> and <code>command -v dyno</code></p></div><div class="issue-block"><div class="issue-label">Observed behavior</div><p>The results were <code>DYNLOG_NOT_INSTALLED</code> and <code>DYNO_NOT_INSTALLED</code></p></div><div class="issue-block"><div class="issue-label">Impact</div><p>The README and quick start read as if everything runs out of the box, but a newcomer starting from the source repository does not get these two commands directly; without first downloading the zip package or completing a build, the entire quick start cannot be executed</p></div><div class="issue-block"><div class="issue-label">Suggested fix</div><p>State at the top of the quick start that “the following steps require <code>dynolog/dyno</code> obtained through package installation or a source build”; add a <code>which dynolog &amp;&amp; which dyno</code> self-check step with the expected output</p></div></section><section class="issue-card sev-high"><div class="issue-card-header"><h4>ISSUE-03 <code>LD_PRELOAD</code> path instructions are inconsistent</h4><span class="pill sev-high">High</span></div><div class="meta-line"><strong>Category:</strong> Correctness / Readability</div><div class="meta-line"><strong>Doc location:</strong><div class="doc-links"><a class="doc-link" href="#ref-readme-md-152">README.md:152</a> <a class="doc-link" href="#ref-docs-zh-npumonitor-instruct-md-74">docs/zh/npumonitor_instruct.md:74</a></div></div><div class="quote-box"><strong>Source excerpt</strong><p>The README has <code>export LD_PRELOAD=&lt;CANN toolkit安装路径&gt;/ascend-toolkit/latest/lib64/libmspti.so</code>, while <code>npumonitor_instruct.md</code> has <code>export LD_PRELOAD=&lt;CANN Toolkit安装路径&gt;/cann/lib64/libmspti.so</code></p></div><div class="issue-block"><div class="issue-label">Reproduction context</div><p>On the real machine, <code>/usr/local/Ascend/ascend-toolkit/latest/lib64/libmspti.so</code> exists and is a symlink to <code>/usr/local/Ascend/cann/tools/mspti/lib64/libmspti.so</code></p></div><div class="issue-block"><div class="issue-label">Observed behavior</div><p>Both forms may work in some environments, but they are not expressed consistently, and the second form does not exactly match the actual symlink target on this machine</p></div><div class="issue-block"><div class="issue-label">Impact</div><p>Newcomers can easily assume both paths should exist as written, and when they run into symlinks or directory differences they do not know which document to trust</p></div><div class="issue-block"><div class="issue-label">Suggested fix</div><p>Settle on one recommended form and add a note telling users to self-check with <code>ls -l /usr/local/Ascend/ascend-toolkit/latest/lib64/libmspti.so</code>; also explain that the symlink may point to 
<code>cann/tools/mspti/lib64/libmspti.so</code></p></div></section><section class="issue-card sev-high"><div class="issue-card-header"><h4>ISSUE-04 Docs depend on external training scripts but provide no minimal runnable sample</h4><span class="pill sev-high">High</span></div><div class="meta-line"><strong>Category:</strong> Completeness / Usability</div><div class="meta-line"><strong>Doc location:</strong><div class="doc-links"><a class="doc-link" href="#ref-readme-md-157">README.md:157</a> <a class="doc-link" href="#ref-docs-zh-quick-start-md-39">docs/zh/quick_start.md:39</a> <a class="doc-link" href="#ref-docs-zh-nputrace-instruct-md-77">docs/zh/nputrace_instruct.md:77</a></div></div><div class="quote-box"><strong>Source excerpt</strong><p><code>bash run_ai_task.sh</code> and <code>bash train.sh</code></p></div><div class="issue-block"><div class="issue-label">Reproduction context</div><p>A search of the repository found no <code>run_ai_task.sh</code> or <code>train.sh</code> sample</p></div><div class="issue-block"><div class="issue-label">Observed behavior</div><p>The docs ask the reader to start a training or inference job but give no minimal demo, no way to inject the environment variables, no success indicator, and no reference repository</p></div><div class="issue-block"><div class="issue-label">Impact</div><p>Beginners do not know in which project to launch the job or which conditions it must meet (PyTorch/torch_npu, optimizer constraints, where the notion of a step comes from, and so on)</p></div><div class="issue-block"><div class="issue-label">Suggested fix</div><p>Add a minimal PyTorch example repository or script snippet that at least states “export <code>MSMONITOR_USE_DAEMON</code> / <code>LD_PRELOAD</code> in the shell that launches the target training script before starting the job”, and provide a reproducible demo of 10 to 20 lines</p></div></section><section class="issue-card sev-medium"><div class="issue-card-header"><h4>ISSUE-05 Certificate path examples are incomplete; no <code>NO_CERTS</code> example on the main path</h4><span class="pill sev-medium">Medium</span></div><div class="meta-line"><strong>Category:</strong> Usability / Completeness</div><div class="meta-line"><strong>Doc location:</strong><div class="doc-links"><a class="doc-link" href="#ref-readme-md-137">README.md:137</a> <a class="doc-link" href="#ref-docs-zh-dynolog-instruct-md-31">docs/zh/dynolog_instruct.md:31</a> <a class="doc-link" href="#ref-docs-zh-dyno-instruct-md-14">docs/zh/dyno_instruct.md:14</a></div></div><div class="quote-box"><strong>Source excerpt</strong><p>Many examples use
<code>/home/server_certs</code> and <code>/home/client_certs</code>, while <code>dyno_instruct.md</code> mentions <code>NO_CERTS</code> only in the parameter table</p></div><div class="issue-block"><div class="issue-label">Reproduction context</div><p>No TLS certificates were generated for this run; on a newcomer's minimal path, the certificate-free mode is the natural first thing to try</p></div><div class="issue-block"><div class="issue-label">Observed behavior</div><p>The docs do not string “quick verification without certificates” into a complete command sequence</p></div><div class="issue-block"><div class="issue-label">Impact</div><p>Newcomers get stuck first on <code>/home/server_certs</code> and <code>/home/client_certs</code>, not knowing how to create these directories, where their contents come from, or whether they can be skipped</p></div><div class="issue-block"><div class="issue-label">Suggested fix</div><p>Add a minimal set of verification commands using <code>--certs-dir NO_CERTS</code> directly to the README and quick start; move TLS certificate mode to an advanced section</p></div></section><section class="issue-card sev-medium"><div class="issue-card-header"><h4>ISSUE-06 Docs do not recommend an isolated environment; the actual run triggered pip's root-user warning</h4><span class="pill sev-medium">Medium</span></div><div class="meta-line"><strong>Category:</strong> Environment friendliness</div><div class="meta-line"><strong>Doc location:</strong><div class="doc-links"><a class="doc-link" href="#ref-docs-zh-install-guide-md-195">docs/zh/install_guide.md:195</a></div></div><div class="quote-box"><strong>Source excerpt</strong><p><code>./plugin/build.sh</code> -&gt; <code>pip install ${files}</code></p></div><div class="issue-block"><div class="issue-label">Reproduction context</div><p>Ran <code>plugin/build.sh</code> in the conda environment specified by the user</p></div><div class="issue-block"><div class="issue-label">Observed behavior</div><p><code>pip</code> printed <code>WARNING: Running pip as the &#x27;root&#x27; user can result in broken permissions...</code></p></div><div class="issue-block"><div class="issue-label">Impact</div><p>On shared machines or when running as root, newcomers may pollute the environment or mistake host-environment problems for documentation problems</p></div><div class="issue-block"><div class="issue-label">Suggested fix</div><p>Add a recommendation before the installation section: prefer <code>conda activate &lt;env&gt;</code> or <code>python -m venv</code>; both the script and the docs should also use <code>python -m pip install</code> and explicitly warn against running in a non-isolated root base environment</p></div></section><section class="issue-card sev-medium"><div class="issue-card-header"><h4>ISSUE-07 Success-criteria example is too idealized and does not fully match real output</h4><span class="pill 
sev-medium">Medium</span></div><div class="meta-line"><strong>Category:</strong> Correctness / Completeness</div><div class="meta-line"><strong>Doc location:</strong><div class="doc-links"><a class="doc-link" href="#ref-docs-zh-install-guide-md-202">docs/zh/install_guide.md:202</a></div></div><div class="quote-box"><strong>Source excerpt</strong><p><code>Successfully installed mindstudio_monitor-&lt;version&gt; pybind11-&lt;version&gt;</code></p></div><div class="issue-block"><div class="issue-label">Reproduction context</div><p>Ran <code>plugin/build.sh</code> a second time</p></div><div class="issue-block"><div class="issue-label">Observed behavior</div><p>The real output was <code>mindstudio-monitor is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.</code>, after which only <code>xlsxwriter</code> was installed</p></div><div class="issue-block"><div class="issue-label">Impact</div><p>Users may think the installation failed because the output differs from the documented example</p></div><div class="issue-block"><div class="issue-label">Suggested fix</div><p>Add to the success criteria: “if the same version is already installed, an <code>already installed</code> message may appear and is expected; to overwrite the installation, add <code>--force-reinstall</code>”</p></div></section></div></section><section class='section'><h2>Newcomer's Perspective</h2><ul class='note-list'><li>From a newcomer's point of view, the chapter layout and feature concepts are clear, but hands-on use requires a lot of guessing: where the binaries come from, whether certificates are mandatory, where the training script lives, which path <code>LD_PRELOAD</code> should use, and what success looks like.</li><li>The docs are fairly complete on concepts but clearly weak on a minimal end-to-end verification. In particular, when <code>dynolog/dyno</code> cannot be run, the README offers no fallback.</li><li>To keep assessing executability in this run, I read <code>scripts/build.sh</code>, <code>plugin/build.sh</code>, and <code>plugin/setup.py</code>. The installation guide should state this information explicitly, for example that the source build pulls external submodules and that the plugin build automatically downloads third-party dependencies and runs <code>pip install</code> directly.</li><li>Without reading the scripts and following only the docs, a newcomer can hardly know in advance that building dynolog requires network access to GitHub, or understand which downloads and installs the plugin build performs.</li></ul></section><section class='section'><h2>Prioritized Fixes</h2><ol class='note-list'><li>First fix what blocks the main flow: in <code>docs/zh/install_guide.md</code>, state the network prerequisite for building dynolog and provide a mirror reachable from mainland China, offline dependencies, or a build mode with the TensorBoard dependency disabled.</li><li>Next, fill in the prerequisites and verification steps: in the README / quick start, add a readiness check for <code>dynolog</code> and <code>dyno</code>, the certificate-free <code>NO_CERTS</code> mode, and an <code>LD_PRELOAD</code> self-check command.</li><li>Then provide a minimal sample: add a directly runnable PyTorch demo
or at least a training-script template.</li><li>Unify paths and terminology: use one <code>LD_PRELOAD</code> example and one name for <code>mindstudio_monitor</code> / <code>msmonitor_plugin</code> to reduce ambiguity.</li><li>Finally, improve environment friendliness: explicitly recommend installing in conda/venv, and explain what state a repeated install and the “already installed” message represent.</li></ol></section><section class='section'><h2>Doc References</h2><p class='section-note'>Clicking a doc location in the report jumps to the matching excerpt below. This part is collapsed by default to keep the report short.</p><div class="details-wrap"><details><summary>View doc reference excerpts</summary><section class="source-section"><h3>Doc Reference Excerpts</h3><p class="section-note">Doc locations in the report link directly here, showing the corresponding line from the local source.</p><article class="source-card" id="ref-readme-md-132"><div class="source-label">README.md:132<span>README.md</span></div><pre><code>132: 1. 启动dynolog daemon进程。</code></pre></article><article class="source-card" id="ref-readme-md-137"><div class="source-label">README.md:137<span>README.md</span></div><pre><code>137:    # 命令行方式开启dynolog daemon</code></pre></article><article class="source-card" id="ref-readme-md-150"><div class="source-label">README.md:150<span>README.md</span></div><pre><code>150: 3. 设置LD_PRELOAD启动MSPTI(启动npu-monitor功能设置)。</code></pre></article><article class="source-card" id="ref-readme-md-152"><div class="source-label">README.md:152<span>README.md</span></div><pre><code>152:    ```bash</code></pre></article><article class="source-card" id="ref-readme-md-157"><div class="source-label">README.md:157<span>README.md</span></div><pre><code>157: 4. 启动训练或推理任务。</code></pre></article><article class="source-card" id="ref-readme-md-163"><div class="source-label">README.md:163<span>README.md</span></div><pre><code>163: 5. 使用dyno命令行触发npu-monitor监控关键算子耗时。</code></pre></article><article class="source-card" id="ref-readme-md-173"><div class="source-label">README.md:173<span>README.md</span></div><pre><code>173: 6. 
使用dyno命令行触发nputrace采集详细trace数据(需要关闭npu-monitor功能才能触发nputrace功能)。</code></pre></article><article class="source-card" id="ref-readme-md-63"><div class="source-label">README.md:63<span>README.md</span></div><pre><code>63: ## 环境部署</code></pre></article><article class="source-card" id="ref-docs-zh-dyno-instruct-md-14"><div class="source-label">docs/zh/dyno_instruct.md:14<span>docs/zh/dyno_instruct.md</span></div><pre><code>14: | --port      | i32      | dynolog daemon进程监听的端口号,默认值1778。                 |    N     |</code></pre></article><article class="source-card" id="ref-docs-zh-dynolog-instruct-md-31"><div class="source-label">docs/zh/dynolog_instruct.md:31<span>docs/zh/dynolog_instruct.md</span></div><pre><code>31: # 方法2:命令行执行</code></pre></article><article class="source-card" id="ref-docs-zh-install-guide-md-158"><div class="source-label">docs/zh/install_guide.md:158<span>docs/zh/install_guide.md</span></div><pre><code>158: ### 编译并安装dynolog</code></pre></article><article class="source-card" id="ref-docs-zh-install-guide-md-164"><div class="source-label">docs/zh/install_guide.md:164<span>docs/zh/install_guide.md</span></div><pre><code>164:    ```bash</code></pre></article><article class="source-card" id="ref-docs-zh-install-guide-md-191"><div class="source-label">docs/zh/install_guide.md:191<span>docs/zh/install_guide.md</span></div><pre><code>191: ### 编译并安装mindstudio_monitor</code></pre></article><article class="source-card" id="ref-docs-zh-install-guide-md-195"><div class="source-label">docs/zh/install_guide.md:195<span>docs/zh/install_guide.md</span></div><pre><code>195: #### shell脚本一键安装</code></pre></article><article class="source-card" id="ref-docs-zh-install-guide-md-202"><div class="source-label">docs/zh/install_guide.md:202<span>docs/zh/install_guide.md</span></div><pre><code>202: 安装成功打印如下信息:</code></pre></article><article class="source-card" id="ref-docs-zh-install-guide-md-65"><div 
class="source-label">docs/zh/install_guide.md:65<span>docs/zh/install_guide.md</span></div><pre><code>65: ## 编译安装</code></pre></article><article class="source-card" id="ref-docs-zh-install-guide-md-67"><div class="source-label">docs/zh/install_guide.md:67<span>docs/zh/install_guide.md</span></div><pre><code>67: ### 安装依赖</code></pre></article><article class="source-card" id="ref-docs-zh-npumonitor-instruct-md-72"><div class="source-label">docs/zh/npumonitor_instruct.md:72<span>docs/zh/npumonitor_instruct.md</span></div><pre><code>72: 4. 设置LD_PRELOAD使能MSPTI。</code></pre></article><article class="source-card" id="ref-docs-zh-npumonitor-instruct-md-74"><div class="source-label">docs/zh/npumonitor_instruct.md:74<span>docs/zh/npumonitor_instruct.md</span></div><pre><code>74:    ```bash</code></pre></article><article class="source-card" id="ref-docs-zh-nputrace-instruct-md-77"><div class="source-label">docs/zh/nputrace_instruct.md:77<span>docs/zh/nputrace_instruct.md</span></div><pre><code>77: </code></pre></article><article class="source-card" id="ref-docs-zh-quick-start-md-13"><div class="source-label">docs/zh/quick_start.md:13<span>docs/zh/quick_start.md</span></div><pre><code>13: </code></pre></article><article class="source-card" id="ref-docs-zh-quick-start-md-14"><div class="source-label">docs/zh/quick_start.md:14<span>docs/zh/quick_start.md</span></div><pre><code>14: 1. 启动dynolog daemon进程。</code></pre></article><article class="source-card" id="ref-docs-zh-quick-start-md-39"><div class="source-label">docs/zh/quick_start.md:39<span>docs/zh/quick_start.md</span></div><pre><code>39: 4. 启动训练或推理任务。</code></pre></article></section></details></div></section></div></body></html>