Introduction to Kunpeng TensorFlow Serving

Latest Updates

  • [2025.09.30]: Added the TensorFlow ANNC graph compilation optimization feature, which provides computational graph optimization as well as generation and integration of high-performance fused operators.
  • [2025.06.30]: Released the TensorFlow Serving thread scheduling optimization feature for the first time.

Overview

Kunpeng TensorFlow Serving is a high-performance inference service component optimized for Kunpeng TensorFlow. It serves as the inference server within the end-to-end inference benchmark system, and its main responsibilities are as follows:

  • Role: Loads, manages, and executes TensorFlow models on behalf of the end-to-end inference benchmark system.
  • Interface: Provides gRPC/REST interfaces to receive inference requests from clients (such as Triton server-client).
  • System integration: Internally, TensorFlow Serving executes models based on Kunpeng TensorFlow, leveraging its underlying optimization mechanisms across the Executor, Kernel, and XLA.
  • Performance testing: It is a key target of end-to-end evaluation. By monitoring the runtime performance of TensorFlow Serving, you can analyze model latency, throughput, thread scheduling efficiency, and resource utilization.
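
As an illustration of the REST interface mentioned above, the following minimal sketch builds and sends a predict request in the upstream TensorFlow Serving REST format (`POST /v1/models/<name>:predict` with an `instances` payload). The host, the port 8501, and the model name `ranking_model` are assumptions for illustration only; use the values from your own deployment.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Return (url, body) for a TensorFlow Serving REST predict call.

    The port and model name are placeholders, not values defined by this repository.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return url, body


def predict(host, model_name, instances, port=8501, timeout=5.0):
    """Send the request and return the 'predictions' field of the JSON reply."""
    url, body = build_predict_request(host, model_name, instances, port)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["predictions"]
```

Calling `predict("localhost", "ranking_model", [[1.0, 2.0]])` would require a running server that exposes a model under that hypothetical name; `build_predict_request` alone can be used to inspect the URL and payload without any network access.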

The ranking inference library of Kunpeng TensorFlow Serving is an acceleration framework optimized for high-performance inference. It provides the following features:

  • Thread scheduling optimization

    In high-concurrency environments, multiple sessions that share a single inter-operator thread pool often contend for its threads. The TensorFlow Serving thread scheduling optimization feature addresses this by restructuring thread allocation, which improves overall graph execution efficiency. Together, the optimized operator scheduling and thread management increase model inference throughput under high concurrency.

  • ANNC graph compilation optimization

    ANNC is a compiler dedicated to accelerating neural network computing. It focuses on technologies including computational graph optimization, generation and integration of high-performance fused operators, and efficient code generation. These capabilities significantly improve inference performance in recommendation scenarios.
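
To make the contention problem described under thread scheduling optimization more concrete, here is a conceptual sketch in plain Python. It is not the Kunpeng implementation, whose internals are not described here. It only illustrates one possible way to avoid sharing a single pool: splitting a fixed thread budget into one pool per session. The names `partition_threads` and `SessionPools` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_threads(total_threads, num_sessions):
    """Split a thread budget as evenly as possible, with at least one thread per session."""
    if num_sessions <= 0:
        raise ValueError("num_sessions must be positive")
    base, extra = divmod(total_threads, num_sessions)
    return [max(1, base + (1 if i < extra else 0)) for i in range(num_sessions)]


class SessionPools:
    """One executor per session, so sessions never queue behind each other's work."""

    def __init__(self, total_threads, num_sessions):
        sizes = partition_threads(total_threads, num_sessions)
        self.pools = [
            ThreadPoolExecutor(max_workers=n, thread_name_prefix=f"session{i}")
            for i, n in enumerate(sizes)
        ]

    def run(self, session_id, ops):
        """Run the callables in `ops` on the pool owned by `session_id`, preserving order."""
        futures = [self.pools[session_id].submit(op) for op in ops]
        return [future.result() for future in futures]

    def close(self):
        for pool in self.pools:
            pool.shutdown()
```

In this sketch a slow session can only exhaust its own pool, at the cost of leaving threads idle when sessions are unevenly loaded; a real scheduler has to balance that trade-off.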

For details about the Kunpeng TensorFlow Serving features, see Kunpeng TensorFlow Serving Feature Introduction.

Directory Structure

The full directory structure of the TensorFlow Serving open-source repository is as follows:

TensorFlow Serving
├── 0001-boostsra-tensorflow-serving.patch   // TensorFlow Serving patch file
├── LICENSE                                  // License file
├── README_en.md                             // Open-source repository introduction
└── docs                                     // Documentation

Version Description

For details about the version updates of Kunpeng TensorFlow Serving, see Release Notes.

Documents

Resource Type   Resource Name          Resource Description
Document        Release Notes          Provides basic information and feature updates of each Kunpeng TensorFlow Serving release.
Document        Feature Introduction   Describes the Kunpeng TensorFlow Serving features.
Document        Quick Start            Provides guidance for getting started with Kunpeng TensorFlow Serving.
Document        Installation Guide     Describes how to compile and install Kunpeng TensorFlow Serving.
Document        Best Practices         Provides best practices of using Kunpeng TensorFlow Serving.

Disclaimer

This code repository contributes to the TensorFlow Serving community. It strictly adheres to the coding style and methods, as well as the security design, of the native open-source software. Any vulnerabilities and security issues in the software shall be resolved by the corresponding upstream communities according to their response mechanisms. Please pay attention to the notifications and version updates released by the upstream communities. The Kunpeng computing community does not assume any responsibility for software vulnerabilities and security issues.

License

This project is licensed under Apache License 2.0. For details, see the LICENSE file.

The documentation of this project is licensed under CC-BY 4.0. For details, see the LICENSE file.

Contribution Statement

We welcome your contributions to the community. If you have questions or suggestions, or want to request a feature or report a bug, you can submit an issue. For details, see the contribution guideline. You are also welcome to share insights in Discussions. Thank you for your support.

Acknowledgments

Kunpeng TensorFlow Serving is developed by the following Huawei department:

  • Kunpeng Computing BoostKit Development Dept

Thank you to everyone in the community for your PRs. We warmly welcome contributions to TensorFlow Serving!