Cluster Scheduling Component NodeD

NodeD

NodeD is a MindCluster cluster scheduling component deployed on compute nodes. It detects abnormal node states, retrieves CPU, memory, and disk fault information from IPMI, and reports that information to ClusterD.

Use Cases

Certain CPU, memory, or disk faults on a node cause training tasks to fail. MindCluster provides the NodeD component to detect these node faults, so that running training tasks can exit quickly and new tasks are not scheduled onto faulty nodes.

Features

  • Retrieves node abnormalities from IPMI and reports them to the upper-level scheduling service.
  • Periodically sends node fault information to the upper-level scheduling service.

Upstream and Downstream Dependencies

  1. Retrieves CPU, memory, and disk fault information from IPMI on compute nodes.
  2. Reports CPU, memory, and disk fault information of compute nodes to ClusterD.

Tag Convention

Tags follow this format:

<version>-<os>

Field    Example      Description
version  v26.1.0      NodeD component version
os       ubuntu22.04  NodeD image operating system
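
As a small illustration of the tag format, the sketch below splits a tag into its version and os fields. The function name parse_noded_tag is hypothetical and is not part of NodeD. The sketch assumes the version field itself contains no hyphen.

```shell
# Hypothetical helper (not part of NodeD): split "<version>-<os>" into fields.
# Assumes the version field contains no hyphen.
parse_noded_tag() {
  tag="$1"
  # Require at least one character on each side of a hyphen.
  case "$tag" in
    ?*-?*) ;;
    *) echo "invalid tag: $tag" >&2; return 1 ;;
  esac
  version="${tag%%-*}"   # text before the first hyphen
  os="${tag#*-}"         # text after the first hyphen
  printf '%s %s\n' "$version" "$os"
}

parse_noded_tag "v26.1.0-ubuntu22.04"   # prints: v26.1.0 ubuntu22.04
```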

NodeD 26.1.0

Tag                     Dockerfile            Image Content
v26.1.0-ubuntu22.04     Dockerfile.ubuntu     NodeD v26.1.0 image for Ubuntu 22.04
v26.1.0-openeuler24.03  Dockerfile.openeuler  NodeD v26.1.0 image for openEuler 24.03

Quick Start

Prerequisites

Software Dependencies

Software    Supported Versions                              Installation Location  Description
Kubernetes  1.17.x to 1.34.x (1.19.x or later recommended)  All nodes              See Kubernetes Documentation
ClusterD    Same version as NodeD                           Management nodes       ClusterD aggregates the fault information reported by NodeD

Hardware Requirements

Resource  Requirement
CPU       0.5 cores
Memory    0.3 GB

How to Build Locally

docker build --no-cache -t noded:{tag} ./ -f Dockerfile.{os}

Note:

  • {tag} is the tag to assign to the built image, and {os} is the Dockerfile suffix (ubuntu or openeuler for the Dockerfiles listed in the tag table above).
  • TARGETPLATFORM is a build argument that Docker BuildKit predefines in the global scope. It holds the target platform of the current build, such as linux/amd64 or linux/arm64.
  • This variable is injected automatically only when BuildKit is enabled. It is unavailable in older Docker versions and in environments where BuildKit is disabled by default. To enable BuildKit for the current shell session, run export DOCKER_BUILDKIT=1 before executing build commands.
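
The build step can be sketched as a dry run that assembles the command for a concrete tag. The helpers dockerfile_for_os and noded_build_command are hypothetical names introduced for illustration. The mapping from the os field to a Dockerfile name is an assumption based on the Dockerfile names listed in the tag table.

```shell
# Hypothetical helper: map the <os> field of a tag to a Dockerfile name.
# Assumes the Dockerfile suffix is the OS name without its version number.
dockerfile_for_os() {
  case "$1" in
    ubuntu*)    echo "Dockerfile.ubuntu" ;;
    openeuler*) echo "Dockerfile.openeuler" ;;
    *)          echo "unsupported os: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print (not run) the build command for a given tag.
noded_build_command() {
  tag="$1"
  dockerfile="$(dockerfile_for_os "${tag#*-}")" || return 1
  echo "docker build --no-cache -t noded:${tag} ./ -f ${dockerfile}"
}

# Enable BuildKit for this shell session so TARGETPLATFORM is injected.
export DOCKER_BUILDKIT=1

# Dry run: only prints the command so it can be reviewed before running.
noded_build_command "v26.1.0-ubuntu22.04"
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without Docker. Run the printed command from the directory that contains the Dockerfiles.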

Deploy NodeD

  1. Pull the image
     docker pull swr.cn-south-1.myhuaweicloud.com/ascendhub/noded:{tag}
  2. Retag the image
     docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/noded:{tag} noded:{version}
  3. Start NodeD
     Replace {tag} in the noded-{version}.yaml file with the actual image tag.
     kubectl apply -f noded-{version}.yaml
  4. Verify deployment
     kubectl get pods -A | grep noded
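
The placeholder substitution in the deployment steps above can be scripted instead of edited by hand. The following is a minimal sketch. It assumes the manifest contains the literal text {tag}, as the instructions describe. The helper name render_noded_manifest and the MANIFEST and TAG variables are hypothetical.

```shell
# Hypothetical helper: print a manifest with every literal "{tag}" replaced.
# Assumes the tag contains no "|" or "&", which sed would treat specially.
render_noded_manifest() {
  manifest="$1"
  tag="$2"
  sed "s|{tag}|${tag}|g" "$manifest"
}

# Usage sketch (MANIFEST and TAG are placeholders to set yourself):
#   MANIFEST=<path to the NodeD yaml file>
#   TAG=v26.1.0-ubuntu22.04
#   render_noded_manifest "$MANIFEST" "$TAG" | kubectl apply -f -
#   kubectl get pods -A | grep noded
```

Writing the rendered manifest to standard output leaves the original file untouched, so the same template can be reused for another tag.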

Supported Hardware

For the supported Ascend hardware models, see the official documentation: Supported Product Formats and OS List


License

View the license information for the Mind series software contained in these images.

As with all container images, pre-installed software packages (Python, system libraries, etc.) may be subject to their respective license agreements.