Installation Guide
Installation Overview
Network Planning
A storage-compute coupled networking architecture is recommended for the subfeatures. Storage nodes and compute nodes are shared to maximize the computing acceleration effect in big data scenarios.
The coupled storage and compute network of OmniOperator consists of four servers, which are one management node and three compute nodes. In the following, we will be using HDFS on the storage nodes for illustration:
- The management node is server1, which is used to manage tasks.
- The compute nodes are agent1, agent2, and agent3, which are used to run OmniOperator.
A server can function as a management node and a compute node at the same time. In single-node mode, operations performed on the management node or compute node mentioned in the following sections are performed on the same node. Figure 1 shows the networking diagram.
A storage-compute coupled networking architecture is recommended for the subfeatures. Storage nodes and compute nodes are shared to maximize the computing acceleration effect in big data scenarios.
Environment Requirements
Before installing OmniOperator, prepare the hardware and software environments to facilitate subsequent installation operations.
Table 1 Hardware requirements lists the hardware requirements for each node in the cluster.
| Item | Management/Compute/Storage Node |
|---|---|
| Processor | Kunpeng 920 seriesKunpeng 950OmniOperator can be enabled for Gluten only on servers that support the SVE instruction set. You can run **cat /proc/cpuinfo |
| Memory size | 384 GB (12 x 32 GB) |
| Memory frequency | 2666 MHz |
| Network | Service network: 10GEManagement network: 1GE |
| Drive | System drive: 1 x RAID 0 (1 x 1.2 TB SAS HDD)Data drive: 12 x RAID 0 (12 x 8 TB SATA HDD) |
| RAID controller card | LSI SAS3508 |
Table 2 OS and software requirements lists the OS and software requirements for each node in the cluster.
Table 2 OS and software requirements
| Item | Version | Description | Management Node (Server) | Compute/Storage Node |
|---|---|---|---|---|
| OS | CentOS 7.9 openEuler 20.03 LTS SP1 openEuler 22.03 LTS SP1 |
Later patch versions such as openEuler 20.03 LTS SP3 and openEuler 22.03 LTS SP3 are also supported. | √ | √ |
| JDK | BiSheng JDK 1.8 (BiSheng JDK 1.8.0_342) | openEuler 22.03 LTS SP1 is incompatible with BiSheng JDK 1.8.0_262, which needs to be replaced with BiSheng JDK 1.8.0_342.For details about how to install the BiSheng JDK, see BiSheng JDK 8 Installation Guide. | √ | √ |
| Hadoop | 3.2.0 | See Hadoop Deployment Guide (CentOS 7.6 & openEuler 20.03). | √ | √ |
| Spark | 3.1.13.3.13.4.33.5.2 | See Spark Deployment Guide (CentOS 7.6 & openEuler 20.03). | √ | - |
| Hive | 3.1.0 | See Hive Deployment Guide (CentOS 7.6 & openEuler 20.03). | √ | - |
| Python | 3.10.2 or later | None | √ | √ |
NOTE:
- √: indicates that the item is required on the node.
- -: indicates that the item is not required on the node.
- If the preceding third-party software has vulnerabilities, fix the vulnerabilities based on official instructions.
- The preceding component versions may be different from those in the Deployment Guide. The Deployment Guide is for reference only.
Table 3 OmniOperator software packages lists the software packages required for installing the OmniOperator feature and explains how to obtain them. In subsequent operations, install the required installation packages based on the operation guide.
NOTE: Use on Spark:
- SparkExtension requires installing the software packages numbered 1, 2 (select the SparkExtension version according to the Spark version), and 5.
- Gluten requires installing the software package numbered 4. Use on Hive:
- HiveExtension requires installing the software packages numbered 1, 3, and 5.
Table 3 OmniOperator software packages
| No. | Software Name | Package Name | Release Type | Description | How to Obtain |
|---|---|---|---|---|---|
| 1 | OmniRuntime package | BoostKit-omniruntime_1.9.0.zip | Closed source | OmniRuntime package (BoostKit-omniruntime_1.9.0.zip). Extract the package to obtain the OmniOperator software package BoostKit-omniop_2.0.0.zip. | Kunpeng community Before using the software package, read and agree to Kunpeng BoostKit User License Agreement 2.0. |
| 2 | SparkExtension | boostkit-omniop-spark-3.1.1-2.0.0-aarch64.zip | Open source | Spark extension package for the OmniOperator computing base. | Link |
| boostkit-omniop-spark-3.3.1-2.0.0-aarch64.zip | Open source | Spark extension package for the OmniOperator computing base. | Link | ||
| boostkit-omniop-spark-3.4.3-2.0.0-aarch64.zip | Open source | Spark extension package for the OmniOperator computing base. | Link | ||
| boostkit-omniop-spark-3.5.2-2.0.0-aarch64.zip | Open source | Spark extension package for the OmniOperator computing base. | Link | ||
| 3 | HiveExtension | boostkit-omniop-hive-3.1.0-2.0.0-aarch64.zip | Open source | Hive extension package for the OmniOperator computing base. | Link |
| 4 | Gluten | Boostkit-omniruntime-gluten-2.0.0.zip | Open source | OmniOperator software installation package (adapted to Gluten) | Link |
| Dependency_library_Gluten.zip | Open source | Library file on which Gluten depends. | Link | ||
| 5 | Dependency_library | Dependency_library_centos.zip Dependency_library_openeuler20.03.zip Dependency_library_openeuler22.03.zip |
Open source | Library file on which OmniOperator depends. Select the dependency package that matches your OS type. | CentOS dependencies openEuler 20.03 dependencies openEuler 22.03 dependencies |
Verifying the Software Package Integrity
After downloading a software package from the Kunpeng community, verify the software package to ensure that it is consistent with the original one on the website.
Verify a software package as follows:
- Obtain the digital certificate and software.
- Obtain the verification tool and guide.
- Verify the package integrity by following the procedure described in the OpenPGP Signature Verification Guide obtained from the URL.
Before installing OmniOperator, prepare the hardware and software environments to facilitate subsequent installation operations.
Installing the Feature
Installation Node Requirements
This section describes the requirements for installing dependency packages and configuring environment variables on each node before installing OmniOperator.
-
If you choose to perform the installation by compiling the source code, before compiling the source code, install GCC/G++, Autoconf, and CMake on each node. Table 1 Software required for source code compilation lists the version requirements.
NOTE:- LLVM and Jemalloc can run properly only after being compiled on the OS. If you want to run them on CentOS, compile them on CentOS. If you want to run them on openEuler 20.03 LTS SP1 or openEuler 22.03 LTS SP1, compile them on openEuler 20.03 LTS SP1 or openEuler 22.03 LTS SP1.
- Gluten depends on the ABSL library and can run properly only after it is compiled and installed on the current OS (openEuler 22.03 SP1).
| Software Name | Version Requirement | Download URL |
|---|---|---|
| GCC/G++ | openEuler 20.03: 7.3.0 | Link |
| GCC/G++ | openEuler 22.03: 10.3.0 | Link |
| Autoconf | 2.69 | Link |
| CMake | 3.20.5 | Link |
-
Install GCC/G++. The following uses version 7.3.0 as an example.
-
Check whether the GCC/G++ version is the target version.
``` gcc --version g++ --version ``` -
Compile and install GCC/G++.
``` # Extract the installation package and go to the gcc-7.3.0 directory. tar -zxvf gcc-7.3.0.tar.gz cd gcc-7.3.0 # Perform compilation and installation. mkdir build && cd build ../configure --prefix=/usr/local/gcc-7.3.0 --enable-languages=c,c++ --disable-multilib make -j$(nproc) make install # Set environment variables. echo 'export PATH=/usr/local/gcc-7.3.0/bin:$PATH' >> /etc/profile echo 'export LD_LIBRARY_PATH=/usr/local/gcc-7.3.0/lib64:$LD_LIBRARY_PATH' >> /etc/profile source /etc/profile # Verify the Installation. gcc --version g++ --version ```
-
-
Compile and install Autoconf.
-
Check whether the Autoconf version is the target version.
``` autoconf --version ``` -
Compile and install Autoconf.
``` # Extract the installation package and go to the autoconf-2.69 directory. tar -zxvf autoconf-2.69.tar.gz cd autoconf-2.69 # Perform compilation and installation. ./configure --prefix=/usr/local/autoconf-2.69 make -j$(nproc) make install # Set environment variables. echo 'export PATH=/usr/local/autoconf-2.69/bin:$PATH' >> /etc/profile source /etc/profile # Verify the Installation. autoconf --version ```
-
-
Install CMake.
-
Check whether the CMake version is the target version.
``` cmake --version ``` -
Install CMake.
``` # Extract the installation package to any directory (/opt in this example). tar -zxvf cmake-3.20.5-linux-aarch64.tar.gz -C /opt # Set environment variables. echo 'export PATH=/opt/cmake-3.20.5-linux-aarch64/bin:$PATH' >> /etc/profile source /etc/profile # Verify the Installation. cmake --version ```
-
- Before installing OmniOperator, deploy necessary components in the cluster environment by following instructions in Table 2 OS and software requirements.
- Before configuring environment variables, check whether the environment variable LD_LIBRARY_PATH exists in your environment. If the environment variable does not exist, you do not need to add LD\_LIBRARY\_PATH** during configuration, so as to prevent irrelevant content from being introduced to the current path. Otherwise, security problems may occur. All environment variable export operations in this document comply with this principle. Take **LD\_LIBRARY\_PATH** as an example. If **LD\_LIBRARY\_PATH** already exists in your environment, use **export LD\_LIBRARY\_PATH=_LD_LIBRARY_PATH:/xxx_. Otherwise, use export LD_LIBRARY_PATH=/xxx.
This section describes the requirements for installing dependency packages and configuring environment variables on each node before installing OmniOperator.
Installing Dependencies
When installing OmniOperator locally, install the LLVM and jemalloc dependency packages on the management node to combine OmniOperator and Spark.
In the Spark on Yarn scenario, use the --archives parameter of Spark to simplify the deployment. You can install the dependencies either by downloading precompiled SO files or by compiling source code. The precompiled SO file can be downloaded and installed quickly, and is suitable for most scenarios. In contrast, compiling and installing from source code is slower, but may be required for certain compliance scenarios.
Installing Dependencies (By Downloading Precompiled SO Files for SparkExtension)
Installing LLVM and jemalloc
NOTE:
- Select the dependency package based on your OS type. The following uses openEuler 22.03 as an example and the dependency package is Dependency_library_openeuler22.03.zip.
- The /opt/omni-operator and /opt/omni-operator/lib directories can be customized.
-
Create an /opt/omni-operator/ directory on the management node as the root directory for installing OmniOperator. Then go to the directory.
mkdir /opt/omni-operator cd /opt/omni-operator - -
Upload the Dependency_library_openeuler22.03.zip package obtained from Table 3 OmniOperator software packages to the /opt/omni-operator/ directory and extract it.
unzip Dependency_library_openeuler22.03.zip -
Create an /opt/omni-operator/lib directory, and copy the libLLVM-15.so and libjemalloc.so.2 subpackages in Dependency_library_openeuler to the /opt/omni-operator/lib directory.
NOTICE:
If LLVM and jemalloc have been installed in the environment, delete the existing libLLVM-15.so and libjemalloc.so.2 files before running the copy command.rm -rf /opt/omni-operator/lib/libjemalloc.so.2 rm -rf /opt/omni-operator/lib/libLLVM-15.socd /opt/omni-operator mkdir lib cp /opt/omni-operator/Dependency_library_openeuler22.03/libjemalloc.so.2 /opt/omni-operator/lib cp /opt/omni-operator/Dependency_library_openeuler22.03/libLLVM-15.so /opt/omni-operator/lib
Installing Dependencies (By Downloading Precompiled SO Files for Gluten)
Installing LLVM and jemalloc
NOTE:
- The precompiled SO files of the ABSL library are not provided for Gluten. Therefore, you need to manually compile and install the SO files.
- The /opt/omni-operator and /opt/omni-operator/lib directories can be customized.
-
Create an /opt/omni-operator/ directory on the management node as the root directory for installing OmniOperator. Then go to the directory.
mkdir /opt/omni-operator cd /opt/omni-operator rm Dependency_library_Gluten.zip -rf -
Upload the Dependency_library_Gluten.zip package obtained from Table 3 OmniOperator software packages to the /opt/omni-operator/ directory and extract it.
unzip Dependency_library_Gluten.zip -
Create an /opt/omni-operator/lib directory, and copy the libLLVM-15.so and libjemalloc.so.2 subpackages in Dependency_library_Gluten to the /opt/omni-operator/lib directory.
NOTICE:
If LLVM and jemalloc have been installed in the environment, delete the existing libLLVM-15.so and libjemalloc.so.2 files before running the copy command.rm -rf /opt/omni-operator/lib/libjemalloc.so.2 rm -rf /opt/omni-operator/lib/libLLVM-15.socd /opt/omni-operator mkdir lib cp /opt/omni-operator/Dependency_library_Gluten/libjemalloc.so.2 /opt/omni-operator/lib cp /opt/omni-operator/Dependency_library_Gluten/libLLVM-15.so /opt/omni-operator/lib -
Compile and install ABSL. For details, see 1 to 3 in the procedure for installing ABSL.
Installing Dependencies (By Compiling and Installing Source Code for SparkExtension and Gluten)
Installing LLVM
NOTE: The /opt/omni-operator, /opt/omni-operator/llvm, and /opt/omni-operator/lib directories can be customized.
-
Download llvm-project-llvmorg-15.0.4.tar.gz, create an /opt/omni-operator directory on the management node as the root directory for installing OmniOperator, go to the directory, and upload the package to /opt/omni-operator.
mkdir /opt/omni-operator cd /opt/omni-operator tar zxvf llvm-project-llvmorg-15.0.4.tar.gz mv llvm-project-llvmorg-15.0.4 llvm cd llvm mkdir build -
Go to the build directory, and compile and install LLVM.
cd ./build cmake -DCMAKE_INSTALL_PREFIX=/opt/omni-operator/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_BUILD_LLVM_DYLIB=true -DLLVM_ENABLE_PROJECTS="clang" -G "Unix Makefiles" ../llvm make -j4 make install -
Create a lib directory in /opt/omni-operator and copy /opt/omni-operator/llvm/lib/libLLVM-15.so to /opt/omni-operator/lib.
mkdir /opt/omni-operator/lib cp /opt/omni-operator/llvm/lib/libLLVM-15.so /opt/omni-operator/lib/
Installing jemalloc
-
Download jemalloc-5.3.0.tar.gz and upload it to the management node.
cd /opt/omni-operator/ tar zxvf jemalloc-5.3.0.tar.gz mv jemalloc-5.3.0 jemalloc
NOTE:
The /opt/omni-operator/jemalloc directory can be customized. -
Go to the jemalloc directory, run the script, and install the generated file.
cd jemalloc ./autogen.sh --disable-initial-exec-tls make -j2 -
Copy /opt/omni-operator/jemalloc/lib/libjemalloc.so.2 to the /opt/omni-operator/lib directory.
cp /opt/omni-operator/jemalloc/lib/libjemalloc.so.2 /opt/omni-operator/lib/
Installing ABSL (required only when ABSL is enabled for Gluten)
-
Download the ABSL source code to the management node.
git clone https://szv-open.codehub.huawei.com/OpenSourceCenter/abseil/abseil-cpp.git cd abseil-cpp/ git checkout tags/20250127.0 -
Compile the ABSL source code.
mkdir build && cd build cmake .. -DCMAKE_CXX_STANDARD=17 -DCMAKE_CXX_STANDARD_REQUIRED=ON -DABSL_PROPAGATE_CXX_STD=ON -DBUILD_SHARED_LIBS=ON make -j32 make install -
Copy the compiled ABSL library to /opt/omni-operator/lib.
cp /usr/local/lib64/libabsl_* /opt/omni-operator/lib
When installing OmniOperator locally, install the LLVM and jemalloc dependency packages on the management node to combine OmniOperator and Spark.
Installing OmniOperator
Install OmniOperator on the management and compute nodes and set environment variables. If OmniOperator is enabled for Gluten, skip this section.
NOTE:
- BoostKit-omniop_2.0.0.zip can be obtained by extracting BoostKit-omniruntime_2.0.0.zip. The BoostKit-omniop_2.0.0.zip package contains the boostkit-omniop-operator-2.0.0-aarch64-openeuler.tar.gz and boostkit-omniop-operator-2.0.0-aarch64-centos.tar.gz packages, which are used for openEuler and CentOS, respectively. The following uses openEuler as an example.
- To install OmniOperator on CentOS, replace boostkit-omniop-operator-2.0.0-aarch64-openeuler.tar.gz in the following commands with boostkit-omniop-operator-2.0.0-aarch64-centos.tar.gz.
-
Upload OmniOperator packages obtained in Table 3 OmniOperator software packages to the /opt/omni-operator/ directory on the management and compute nodes.
-
Go to the /opt/omni-operator/ directory and extract the packages.
cd /opt/omni-operator/ unzip BoostKit-omniruntime_1.9.0.zip unzip BoostKit-omniop_2.0.0.zip tar -zxvf boostkit-omniop-operator-2.0.0-aarch64-openeuler.tar.gz -
Copy OmniOperator files to the /opt/omni-operator/lib directory and set the permission on the software packages in the directory to 550.
cd /opt/omni-operator/boostkit-omniop-operator-2.0.0-aarch64 cp -r include libboostkit* boostkit-omniop* libsecurec.so /opt/omni-operator/lib/ chmod -R 550 /opt/omni-operator/lib/* -
Create a conf folder in the /opt/omni-operator directory and set the folder permission to 750.
cd /opt/omni-operator mkdir conf chmod 750 /opt/omni-operator/conf -
Create an omni.conf file in the conf folder and change the file permission to 640, which is required to set OmniOperator configuration items.
cd conf touch omni.conf chmod 640 omni.conf -
Delete redundant files from /opt/omni-operator.
mkdir -p /opt/omni-operator-bak mv /opt/omni-operator/lib /opt/omni-operator-bak mv /opt/omni-operator/conf /opt/omni-operator-bak ls /opt/omni-operator rm -rf /opt/omni-operator cd /opt mv omni-operator-bak omni-operator
Install OmniOperator on the management and compute nodes and set environment variables. If OmniOperator is enabled for Gluten, skip this section.
(Optional) Installing the UDF Plugin
You need to perform the operations described in this section only when your service scenario involves user-defined functions (UDFs). Perform the following plugin operation operations only on the management node. UDFs cannot be accelerated when OmniOperator is enabled for Gluten.
This plugin supports only the HiveSimpleUDF type (simple UDFs written based on the Hive UDF framework). HiveSimpleUDF is built on the Hive UDF framework and is used to extend the set of functions available in Hive queries. Spark supports Hive UDF interfaces. Therefore, HiveSimpleUDF can be directly used in a Spark environment.
OmniOperator accelerates UDFs through row-by-row or batch processing. You can switch between these two processing modes by modifying the configuration file.
-
Ensure that your UDFs are simple UDFs implemented based on the Hive UDF framework.
-
To use OmniOperator to accelerate UDFs, you need to provide related JAR packages and configuration files, including the following files:
-
udf.zip: contains the class files of all UDFs.
-
conf.zip: contains the configuration files on which the UDFs depend.
-
udf.properties: used to configure OmniOperator to accelerate UDFs.
Using udfName1 and udfName2 as examples, the udf.properties file has the following format:
udfName1 com.huawei.udf.UdfName1 udfName2 com.huawei.udf.UdfName2
Installing the UDF Plugin Row-by-Row Processing
-
Create an /opt/omni-operator/hive-udf directory on the management node.
mkdir /opt/omni-operator/hive-udf -
Upload the udf.zip and conf.zip packages to the /opt/omni-operator/hive-udf directory on the management node.
NOTE:
You can customize the example names of the udf.zip and conf.zip packages based on your service requirements. -
Extract the files.
cd /opt/omni-operator/hive-udf unzip udf.zip rm -f udf.zip unzip conf.zip rm -f conf.zip -
Modify the /opt/omni-operator/conf/omni.conf file.
-
Open the configuration file.
vi /opt/omni-operator/conf/omni.conf -
Press i to enter the insert mode and add the following UDF configuration.
# <----UDF properties----> # false indicates expression row-by-row processing and true indicates expression bath processing. enableBatchExprEvaluate=false # UDF trustlist file path hiveUdfPropertyFilePath=./hive-udf/udf.properties # Directory of the Hive UDF JAR package hiveUdfDir=./hive-udf/udf
NOTE:
The directory must start with a period (.). When OmniOperator is running, it reads the value of the OMNI_HOME environment variable and uses this value to replace the period (.). -
Press Esc, type :wq!, and press Enter to save the file and exit.
-
-
Update the environment variable.
-
Open the ~/.bashrc file.
vi ~/.bashrc -
Press i to enter the insert mode and add LD_LIBRARY_PATH to update environment variables.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${JAVA_HOME}/jre/lib/aarch64/server -
Press Esc, type :wq!, and press Enter to save the file and exit.
-
Make the updated environment variable take effect.
source ~/.bashrc
-
Installing the UDF Plugin Batch Processing
After installing row-by-row processing on the management node, modify the /opt/omni-operator/conf/omni.conf file to support batch processing.
-
Open the file.
vi /opt/omni-operator/conf/omni.conf -
Press i to enter the insert mode, find the following statements, and modify them.
enableBatchExprEvaluate=true -
Press Esc, type :wq!, and press Enter to save the file and exit.
You need to perform the operations described in this section only when your service scenario involves user-defined functions (UDFs). Perform the following plugin operation operations only on the management node. UDFs cannot be accelerated when OmniOperator is enabled for Gluten.
