{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MSTX Summary\n",
"\n",
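"Analysis of MSTX marker data in cluster scenarios.\n",
"\n",
"This notebook covers two sets of statistics:\n",
"1. Cluster-wide MSTX marker statistics, grouped by Step\n",
"2. Per-Rank MSTX marker statistics, grouped by Name"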
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Preparation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display, HTML\n",
"display(HTML(\"<style>.container { width:95% !important; }</style>\"))\n",
"\n",
"import plotly.offline as pyo\n",
"\n",
"def is_lab_notebook():\n",
" import re\n",
" import psutil\n",
" return any(re.search('jupyter--lab-script', x) for x in psutil.Process().parent().cmdline())\n",
"\n",
"if is_lab_notebook():\n",
" pyo.init_notebook_mode()\n",
"\n",
"import pandas as pd\n",
"pd.options.plotting.backend = \"plotly\"\n",
"pd.set_option(\"display.max_rows\", 100)\n",
"pd.set_option(\"display.width\", 1000)\n",
"\n",
"import cluster_display\n",
"\n",
"# Load the per-step statistics (grouped by StepId) and the per-marker table (indexed by Name).\n",
"all_fwk_stats_gdf = pd.read_csv(\"all_fwk_stats.csv\", index_col=\"Name\").groupby(\"StepId\")\n",
"all_cann_stats_gdf = pd.read_csv(\"all_cann_stats.csv\", index_col=\"Name\").groupby(\"StepId\")\n",
"all_device_stats_gdf = pd.read_csv(\"all_device_stats.csv\", index_col=\"Name\").groupby(\"StepId\")\n",
"mark_stats_df = pd.read_csv(\"mark_stats.csv\", index_col=\"Name\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
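"### Cluster MSTX Data Analysis\n",
"\n",
"Aggregates the MSTX data from all Ranks in the cluster, grouped by Step, and summarizes the durations. Times are in microseconds (us).\n",
"\n",
"The marker data falls into three categories:\n",
"1. Framework-side duration: Framework Time\n",
"2. CANN-side duration: Cann Time\n",
"3. Device-side duration: Device Time\n",
"\n",
"All three categories include the following statistics:\n",
"- Count: number of occurrences\n",
"- Mean: mean duration\n",
"- Std: standard deviation\n",
"- Min: minimum\n",
"- Q1: first quartile (25th percentile)\n",
"- Median: median\n",
"- Q3: third quartile (75th percentile)\n",
"- Max: maximum\n",
"- Sum: total duration"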
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Callback for the Step selector: shows the stats table and a box plot for each data category.\n",
"def display_stats_mstx_step_combobox(selected, args):\n",
" step = selected\n",
" fwk_stats_gdf, cann_stats_gdf, device_stats_gdf = args\n",
" fwk_df = fwk_stats_gdf.get_group(step)\n",
" cann_df = cann_stats_gdf.get_group(step)\n",
" device_df = device_stats_gdf.get_group(step)\n",
" figs = []\n",
" display(HTML(\"<p><b>Framework Time Stats</b></p>\"))\n",
" display(fwk_df)\n",
" cluster_display.display_duration_boxplots(figs, fwk_df, title=\"Framework Time\", x_title=\"Name\", y_title=\"Time\")\n",
" display(HTML(\"<p><b>Cann Time Stats</b></p>\"))\n",
" display(cann_df)\n",
" cluster_display.display_duration_boxplots(figs, cann_df, title=\"Cann Time\", x_title=\"Name\", y_title=\"Time\")\n",
" display(HTML(\"<p><b>Device Time Stats</b></p>\"))\n",
" display(device_df)\n",
" cluster_display.display_duration_boxplots(figs, device_df, title=\"Device Time\", x_title=\"Name\", y_title=\"Time\")\n",
"\n",
"steps = list(all_fwk_stats_gdf.groups.keys())\n",
"if steps:\n",
" cluster_display.display_stats_optional_combobox(steps, display_stats_mstx_step_combobox, \n",
" [all_fwk_stats_gdf, all_cann_stats_gdf, all_device_stats_gdf], \"Step:\")\n",
"else:\n",
" print(\"There is no step in stats, so no need to display\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
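"### Cluster Rank MSTX Data Analysis\n",
"\n",
"Aggregates the MSTX data of each Rank in the cluster, grouped by marker Name, and summarizes the durations. Times are in microseconds (us).\n",
"\n",
"The following fields are included:\n",
"- Name: marker name\n",
"- FrameworkDuration(Us): framework-side duration\n",
"- CannDuration(Us): CANN-side duration\n",
"- DeviceDuration(Us): device-side duration\n",
"- Rank: Rank index\n",
"- StepId: Step index"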
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Callback for the Name selector: plots each duration type per Rank, with one series per Step.\n",
"def display_mstx_duration_by_rank(selected, args):\n",
" mark_stats_gdf = args\n",
" df = mark_stats_gdf.get_group(selected).sort_values(\"Rank\")\n",
" display(df)\n",
" fwk_duration = []\n",
" cann_duration = []\n",
" device_duration = []\n",
" step_ids = []\n",
" for step_id, step_df in df.groupby(\"StepId\"):\n",
" fwk_duration.append((step_id, step_df[\"FrameworkDuration(Us)\"].values))\n",
" cann_duration.append((step_id, step_df[\"CannDuration(Us)\"].values))\n",
" device_duration.append((step_id, step_df[\"DeviceDuration(Us)\"].values))\n",
" step_ids.append(step_id)\n",
" fwk_df = pd.concat([pd.Series(dur, name=step_id) for step_id, dur in fwk_duration], axis=1)\n",
" cann_df = pd.concat([pd.Series(dur, name=step_id) for step_id, dur in cann_duration], axis=1)\n",
" device_df = pd.concat([pd.Series(dur, name=step_id) for step_id, dur in device_duration], axis=1)\n",
" figs = []\n",
" ranks = df[\"Rank\"].drop_duplicates()\n",
" cluster_display.display_graph(figs, ranks, fwk_df[step_ids],\n",
" title=\"Framework Time\", x_title=\"Rank\", y_title=\"Time\", legend_title=\"Step\")\n",
" cluster_display.display_graph(figs, ranks, cann_df[step_ids],\n",
" title=\"Cann Time\", x_title=\"Rank\", y_title=\"Time\", legend_title=\"Step\")\n",
" cluster_display.display_graph(figs, ranks, device_df[step_ids],\n",
" title=\"Device Time\", x_title=\"Rank\", y_title=\"Time\", legend_title=\"Step\")\n",
"\n",
"mark_stats_gdf = mark_stats_df.groupby(mark_stats_df.index)\n",
"names = list(mark_stats_gdf.groups.keys())\n",
"if names:\n",
" cluster_display.display_stats_optional_combobox(names, display_mstx_duration_by_rank, mark_stats_gdf, \"Name:\")\n",
"else:\n",
" print(\"There is no mark name in stats, so no need to display\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.12.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}