Application to create a new SIG


Note: The charter of this SIG follows the conventions described in the openEuler charter [README](/en/governance/README.md) and in [SIG-governance](/en/technical-committee/governance/SIG-governance.md).

SIG Mission

Data has become the fifth factor of production, after land, labor, capital, and technology. The openEuler community therefore needs to consider, from a technical perspective, how to make full use of data in production and daily life. This SIG is responsible for building big data processing capabilities on openEuler.

Service scope

  • Basic big data capabilities on openEuler, including data collection, data transmission, data storage, data analysis, and data visualization.
  • Big data platforms that integrate commonly used tools and software behind a unified user interface, making big data easier to use on openEuler.
  • Performance optimization of big data components and platforms on openEuler.
  • Integration of big data-related capabilities on openEuler, and support for new chips and software as they become available.

Repositories managed by this SIG and their descriptions

  • jupyter: An environment for interactive computing in multiple languages.
  • hadoop: A software platform for processing vast amounts of data.
  • libhdfs: The Apache Hadoop Filesystem Library.
  • qrupdate: A Fortran library for fast updates of QR and Cholesky decompositions.
  • zookeeper: A high-performance service for building distributed applications.
  • ibis: A toolbox that bridges the gap between local Python environments, remote storage, execution systems such as Hadoop components (HDFS, Impala, Hive, Spark), and SQL databases. Its goal is to simplify analytical workflows and improve productivity.
  • presto: A distributed SQL query engine for big data.
  • rain: An open-source distributed computational framework for processing large-scale task-based pipelines.
  • alluxio: A virtual distributed storage system (formerly known as Tachyon).
  • ambari: A tool for provisioning, managing, and monitoring Apache Hadoop clusters.

Basic Information

Maintainers

Committers

Mailing list

Roadmap