MultimodalSDK


Quick Reference

MultimodalSDK Overview

Inference pipelines for multimodal large models must process large volumes of complex data. MultimodalSDK accelerates the preprocessing stage of large-model inference with a set of high-performance interfaces designed for the Ascend platform. The preprocessing APIs currently run on the CPU (DeviceMode.CPU) and are typically deployed alongside CANN/NPU inference frameworks.

  • It covers common preprocessing operations such as image and video loading and decoding, as well as resize, crop, and other typical processing steps.
  • It converts in both directions between common open-source data structures and the acceleration library's data structures, which makes it easier to adopt and to migrate existing code.

Tag Naming Convention

Tags follow this pattern:

<multimodalsdk_version>-<chip_series>-<os>-<python_version>

| Field | Example Values | Description |
| --- | --- | --- |
| multimodalsdk_version | 26.0.0 | MultimodalSDK version |
| chip_series | 910b | Target Atlas chip family |
| os | ubuntu22.04, openeuler24.03 | Base operating system |
| python_version | py3.11 | Python version |
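
As a minimal sketch of the pattern above, the following Python splits a tag into its four fields and joins them back again. The names `ImageTag`, `parse_tag`, and `format_tag` are illustrative and not part of MultimodalSDK. The sketch assumes that no field contains a hyphen, so a tag with extra suffix fields is rejected rather than guessed at.

```python
from typing import NamedTuple


class ImageTag(NamedTuple):
    """The four fields of an image tag, in the order the pattern lists them."""
    multimodalsdk_version: str
    chip_series: str
    os: str
    python_version: str


def parse_tag(tag: str) -> ImageTag:
    """Split a tag on '-' and raise ValueError unless exactly four fields result."""
    parts = tag.split("-")
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 hyphen-separated fields, got {len(parts)}: {tag!r}"
        )
    return ImageTag(*parts)


def format_tag(fields: ImageTag) -> str:
    """Join the fields back into a tag string."""
    return "-".join(fields)


# Example with a tag from the "Tags and Dockerfile" table.
tag = parse_tag("26.0.0-910b-ubuntu22.04-py3.11")
print(tag.chip_series, tag.os)  # prints: 910b ubuntu22.04
```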

Tags and Dockerfile

| Tag | Dockerfile |
| --- | --- |
| 26.0.0-910b-openeuler24.03-py3.11 | Dockerfile.910b.openEuler |
| 26.0.0-910b-ubuntu22.04-py3.11 | Dockerfile.910b.ubuntu |

Quick Start

Prerequisites (optional)

Install Driver

An NPU driver compatible with the container's CANN version must be installed on the host. See the CANN Compatibility Matrix for driver ↔ CANN version mapping.
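
To look up the installed driver in the compatibility matrix, you first need its version. The sketch below reads `/usr/local/Ascend/driver/version.info`, the same file that the `docker run` command in this document mounts into the container, and parses it as `key=value` lines. That layout is an assumption about the file, and `parse_key_values` and `read_driver_info` are illustrative helper names. Running `npu-smi info` on the host is another way to check the driver.

```python
from pathlib import Path
from typing import Optional

# Driver version file on the host (the path is taken from the docker run command).
DRIVER_INFO = Path("/usr/local/Ascend/driver/version.info")


def parse_key_values(text: str) -> dict[str, str]:
    """Parse `key=value` lines; blank lines and lines without '=' are skipped."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # only keep lines that actually contain '='
            info[key.strip()] = value.strip()
    return info


def read_driver_info(path: Path = DRIVER_INFO) -> Optional[dict[str, str]]:
    """Return the parsed file, or None when it is missing or unreadable."""
    try:
        return parse_key_values(path.read_text())
    except OSError:
        return None


info = read_driver_info()
if info is None:
    print(f"{DRIVER_INFO} not found; is the NPU driver installed on this host?")
else:
    for key, value in info.items():
        print(f"{key}: {value}")
```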


How to Build

docker build -t {your_repo}/multimodal:latest -f Dockerfile.<chip_series>.<os> .
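
The build command can also be assembled programmatically, as in the hedged sketch below. The Dockerfile names are copied from the Tags and Dockerfile table. The helper name `build_command` and its `repo` argument are hypothetical, and the function only returns the argument list, leaving execution (for example with `subprocess.run`) to the caller.

```python
import shlex

# Dockerfile for each published tag, copied from the "Tags and Dockerfile" table.
DOCKERFILES = {
    "26.0.0-910b-openeuler24.03-py3.11": "Dockerfile.910b.openEuler",
    "26.0.0-910b-ubuntu22.04-py3.11": "Dockerfile.910b.ubuntu",
}


def build_command(tag: str, repo: str, image_tag: str = "latest") -> list[str]:
    """Return the `docker build` argument list for a tag listed in DOCKERFILES."""
    try:
        dockerfile = DOCKERFILES[tag]
    except KeyError:
        raise ValueError(f"no Dockerfile listed for tag {tag!r}") from None
    return [
        "docker", "build",
        "-t", f"{repo}/multimodal:{image_tag}",
        "-f", dockerfile,
        ".",
    ]


# "my_repo" is a placeholder repository name.
cmd = build_command("26.0.0-910b-ubuntu22.04-py3.11", repo="my_repo")
print(shlex.join(cmd))
# prints: docker build -t my_repo/multimodal:latest -f Dockerfile.910b.ubuntu .
```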

Running MultimodalSDK Container

docker run \
    --name multimodal_container \
    --device /dev/davinci1 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -it ascend/multimodal:tag bash
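
Before starting the container, it can help to confirm that every host path the command references actually exists. A missing device node makes `docker run` fail, and a missing `-v` source is created by Docker as an empty directory, which hides the problem until later. The sketch below lists the paths from the command above and reports the missing ones. `host_paths` and `missing_paths` are hypothetical helper names, and the `npu_index` parameter assumes device nodes are named `/dev/davinci<N>`, as `/dev/davinci1` in the command suggests.

```python
import os
from typing import Callable, Iterable


def host_paths(npu_index: int = 1) -> list[str]:
    """Host paths used by the docker run command: device nodes, then bind mounts."""
    return [
        f"/dev/davinci{npu_index}",
        "/dev/davinci_manager",
        "/dev/devmm_svm",
        "/dev/hisi_hdc",
        "/usr/local/dcmi",
        "/usr/local/bin/npu-smi",
        "/usr/local/Ascend/driver/lib64/",
        "/usr/local/Ascend/driver/version.info",
        "/etc/ascend_install.info",
    ]


def missing_paths(
    paths: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> list[str]:
    """Return the paths for which `exists` is false, preserving input order."""
    return [p for p in paths if not exists(p)]


missing = missing_paths(host_paths(npu_index=1))
if missing:
    print("Missing on this host:")
    for path in missing:
        print("  " + path)
else:
    print("All device nodes and mount sources are present.")
```

The `exists` parameter is injectable so the check can be exercised on a machine without Ascend hardware.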

Enter the Container

docker exec -it multimodal_container bash

MultimodalSDK Usage

MultimodalSDK provides sample code to help developers get started quickly. You can access the examples through the following link:

Development

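# Extend the published image with any additional software you need
FROM swr.cn-south-1.myhuaweicloud.com/ascendhub/multimodalsdk:26.0.0-910b-ubuntu22.04-py3.11-aarch64

# -y keeps apt non-interactive so the image build does not stall on a prompt
RUN apt update -y && \
    apt install -y gcc ...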

...

Supported Hardware

| Product Examples | Architecture |
| --- | --- |
| Atlas 800I A2 | ARM64 |

License

View the license information for CANN and MindSeries software included in these images.

As with all container images, the pre-installed packages (Python, system libraries, etc.) may be subject to their own licenses.