tracing能力部署

特性介绍

pyMotor tracing能力基于三方件opentelemetry能力,opentelemetry文档资料可参考文档|OpenTelemetry

通过修改env.json配置文件和user_config.json配置文件后即可通过deploy.py脚本完成服务部署。

部署流程

pyMotor开启tracing能力需修改env.json配置文件和user_config.json配置文件后,通过deploy.py脚本即可完成服务部署,具体流程如下。

配置env.json

pyMotor快速开始中实例env.json为参考基线,适配打开tracing能力后的配置文件示例如下:

{
  "version": "2.0.0",
  "motor_common_env": {
  },
  "motor_controller_env": {
  },
  "motor_coordinator_env": {
    "OTEL_SERVICE_NAME": "mindie-motor",
    "OTEL_EXPORTER_OTLP_TRACES_INSECURE": "true",
    "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL": "http/protobuf"
  },
  "motor_engine_prefill_env": {
    "OTEL_SERVICE_NAME": "vllm-server-p",
    "OTEL_EXPORTER_OTLP_TRACES_INSECURE": "true",
    "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL": "http/protobuf"
  },
  "motor_engine_decode_env": {
    "OTEL_SERVICE_NAME": "vllm-server-d",
    "OTEL_EXPORTER_OTLP_TRACES_INSECURE": "true",
    "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL": "http/protobuf"
  },
  "motor_kv_cache_pool_env": {
  }
}

需要在motor_coordinator_envmotor_engine_prefill_envmotor_engine_decode_env三个配置项下新增OTEL_SERVICE_NAMEOTEL_EXPORTER_OTLP_TRACES_INSECUREOTEL_EXPORTER_OTLP_TRACES_PROTOCOL三个环境变量。 环境变量含义:

  • OTEL_SERVICE_NAME上报数据的服务名称,根据模块名称定义,建议参考样例。
  • OTEL_EXPORTER_OTLP_TRACES_INSECURE是否开启非安全协议,生产环境建议设置为false
  • OTEL_EXPORTER_OTLP_TRACES_PROTOCOL上报数据协议,可选grpchttp/protobuf。根据实际开发习惯设置。

配置user_config.json

PyMotor快速开始中实例user_config.json为参考基线,适配打开tracing能力后的配置文件示例如下:

{
  "version": "v2.0",
  "motor_deploy_config": {
    "p_instances_num": 1,
    "d_instances_num": 1,
    "single_p_instance_pod_num": 1,
    "single_d_instance_pod_num": 1,
    "p_pod_npu_num": 4,
    "d_pod_npu_num": 4,
    "image_name": "",
    "job_id": "mindie-motor",
    "hardware_type": "800I_A2",
    "weight_mount_path": "/mnt/weight/"
  },
  "motor_controller_config": {
  },
  "motor_coordinator_config": {
    "tracer_config": {
      "endpoint": "http://xx.xx.xx.xx:4318/v1/traces",
      "root_sampling_rate": 1,
      "remote_parent_sampled": 1,
      "remote_parent_not_sampled": 1,
      "local_parent_sampled": 1,
      "local_parent_not_sampled": 1
    }
  },
  "motor_nodemanger_config": {
  },
  "motor_engine_prefill_config": {
    "engine_type": "vllm",
    "engine_config": {
      "served_model_name": "qwen3-8B",
      "model": "/mnt/weight/qwen3_8B",
      "gpu_memory_utilization": 0.9,
      "data_parallel_size": 2,
      "tensor_parallel_size": 2,
      "pipeline_parallel_size": 1,
      "enable_expert_parallel": false,
      "data_parallel_rpc_port": 9000,
      "otlp-traces-endpoint": "http://xx.xx.xx.xx:4318/v1/traces",
      "kv_transfer_config": {
       "kv_connector": "MooncakeLayerwiseConnector",
       "kv_buffer_device": "npu",
       "kv_role": "kv_producer",
       "kv_connector_extra_config": {
         "use_ascend_direct": true
        }
      }
    }
  },
  "motor_engine_decode_config": {
    "engine_type": "vllm",
    "engine_config": {
      "served_model_name": "qwen3-8B",
      "model": "/mnt/weight/qwen3_8B",
      "gpu_memory_utilization": 0.9,
      "data_parallel_size": 2,
      "tensor_parallel_size": 2,
      "pipeline_parallel_size": 1,
      "enable_expert_parallel": false,
      "data_parallel_rpc_port": 9000,
      "otlp-traces-endpoint": "http://xx.xx.xx.xx:4318/v1/traces",
      "kv_transfer_config": {
       "kv_connector": "MooncakeLayerwiseConnector",
       "kv_buffer_device": "npu",
       "kv_role": "kv_consumer",
       "kv_connector_extra_config": {
         "use_ascend_direct": true
        }
      }
    }
  }
}
  • 需要在motor_coordinator_config下新增tracer_configtracer_config下的endpoint配置为开启tracing能力必填,填写内容根据env.json中的 OTEL_EXPORTER_OTLP_TRACES_PROTOCOL配置,可选: http://xx.xx.xx.xx:4318/v1/tracesgrpc://xx.xx.xx.xx:4317
  • motor_engine_prefill_configmotor_engine_decode_configengine_config下新增otlp-traces-endpoint配置,填写方法同endpoint

部署服务

examples/deployer 目录下通过 deploy.py 脚本部署服务。支持指定配置目录或单独指定配置文件:

cd examples/deployer
# 方式一:指定配置目录(推荐)
python deploy.py --config_dir ../infer_engines/vllm

# 方式二:单独指定配置文件
python deploy.py --user_config_path ../infer_engines/vllm/user_config.json --env_config_path ../infer_engines/vllm/env.json

部署jaeger

参考jaeger文档 下载好可执行文件后,在服务器上执行以下命令即可。也可采用docker容器方式,具体参考jaeger官网

./jaeger --set receivers.otlp.protocols.http.endpoint=0.0.0.0:4318 --set receivers.otlp.protocols.grpc.endpoint=0.0.0.0:4317 &

运行后通过浏览器打开对应IP的16686端口网页,效果如下:

Snipaste_2026-03-31_20-59-16.jpg


Snipaste_2026-03-31_20-59-24.jpg