2b8c3d4e · created on June 17, 2023

Hudi

English documentation (current) | Chinese documentation

Project background

Hudi is a data lake storage format that makes it possible to update and delete data, and to consume changed data, on the Hadoop file system. It supports multiple compute engines, provides IUD (insert, update, delete) interfaces, and provides streaming primitives for upserts and incremental pulls on HDFS datasets.


Reference: https://support.huaweicloud.com/productdesc-mrs/mrs_08_0083.html

Usage

1. Preparation before use

(1) On the Huawei Cloud OBS console, create a bucket to store the data written by Hudi: https://console.huaweicloud.com/console/#/obs/manager/buckets

(2) On the Huawei Cloud DIS console, create a channel to be used when configuring OBS event notifications: https://console.huaweicloud.com/dis/

Note: When creating the channel, select JSON as the source data type.

2. Parameter configuration

In core-site.xml, configure the information required to connect to OBS:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>obs://bucketname/</value>
    </property>
    <property>
        <name>fs.obs.access.key</name>
        <value>Huawei Cloud access key</value>
    </property>
    <property>
        <name>fs.obs.secret.key</name>
        <value>Huawei Cloud secret key</value>
    </property>
    <property>
        <name>fs.obs.endpoint</name>
        <value>Huawei Cloud OBS endpoint to connect to</value>
    </property>
    <property>
        <name>fs.obs.impl</name>
        <value>org.apache.hadoop.fs.obs.OBSFileSystem</value>
    </property>
</configuration>
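
Before starting a job, it can help to confirm that every property listed above is actually set. The following is a minimal sketch using only the Java standard library. `CoreSiteCheck` and its methods are hypothetical names introduced for illustration; they are not part of this project or of Hadoop, and the list of required keys simply mirrors the properties shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (an assumption of this sketch, not a project class).
class CoreSiteCheck {
    // Property names taken from the core-site.xml shown above.
    static final List<String> REQUIRED = Arrays.asList(
            "fs.defaultFS", "fs.obs.access.key", "fs.obs.secret.key",
            "fs.obs.endpoint", "fs.obs.impl");

    // Parses <property><name/><value/></property> entries into a name -> value map.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    props.put(name, childText(property, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration XML", e);
        }
    }

    // Returns the trimmed text of the first child element with the given tag, or null.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    // Returns the required property names that are absent or empty, in REQUIRED order.
    static List<String> missing(Map<String, String> props) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.get(key);
            if (value == null || value.isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    // Usage: java CoreSiteCheck path/to/core-site.xml
    public static void main(String[] args) throws Exception {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("Missing properties: " + missing(parse(xml)));
    }
}
```

The check only verifies that values are present. It does not contact OBS, so a wrong key or endpoint would still only surface at run time.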

In dis.properties, configure the information required to connect to DIS event notifications:

endpoint=Huawei Cloud DIS endpoint to connect to
region=Huawei Cloud DIS region
ak=Huawei Cloud access key
sk=Huawei Cloud secret key
projectId=The projectId corresponding to the Huawei Cloud DIS region

Note: The endpoint must begin with https://, for example https://dis.cn-north-4.myhuaweicloud.com
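
The https:// requirement is easy to miss, so a small pre-flight check may be useful. The sketch below loads the file contents with `java.util.Properties` and reports missing keys and an endpoint without the https:// prefix. `DisPropertiesCheck` is a hypothetical helper written for this illustration; the project itself may read dis.properties differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (an assumption of this sketch, not a project class).
class DisPropertiesCheck {
    // Keys taken from the dis.properties example above.
    static final String[] REQUIRED = {"endpoint", "region", "ak", "sk", "projectId"};

    // Parses key=value text in the standard Java properties format.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns a list of human-readable problems; an empty list means the checks passed.
    static List<String> problems(Properties props) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                result.add("missing: " + key);
            }
        }
        String endpoint = props.getProperty("endpoint");
        if (endpoint != null && !endpoint.trim().isEmpty()
                && !endpoint.trim().startsWith("https://")) {
            result.add("endpoint must start with https://");
        }
        return result;
    }

    // Usage: java DisPropertiesCheck path/to/dis.properties
    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("Problems: " + problems(load(text)));
    }
}
```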

3. Running

For a usage example, refer to com.xnx3.obs.sources.TestDISEventSource. Use HoodieWriterConfig and DISReaderConfig to configure the parameters required for the run:

// Where and how to write the Hudi table.
HoodieWriterConfig config = new HoodieWriterConfig()
        .basePath("obs://hudi-test-target/hudi_dis_cow")
        .tableName("hudi_dis_cow")
        .saveMode(SaveMode.Append)
        .keyGenerator(NonpartitionedKeyGenerator.class.getName())
        .recordkeyFieldOptKey("partitionKey")
        .precombineFieldOptKey("timestamp");

// Which DIS channel and partition to read, and where to start reading.
DISReaderConfig disConfig = new DISReaderConfig()
        .streamName("hudi-dis-test")
        .partitionId("0")
        .startingSequenceNumber("0")
        .cursorType(PartitionCursorTypeEnum.AT_SEQUENCE_NUMBER.name());

// Start the run with both configurations.
new DISEventSource().fetchEvents(config, disConfig);
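
The record key ("partitionKey") and precombine field ("timestamp") set above determine how duplicate records are resolved. With Hudi's default behaviour, when several incoming records share a record key, the one with the larger precombine value is kept. The plain-Java sketch below only illustrates that idea and does not call Hudi. `PrecombineSketch` and `Event` are hypothetical names, and the `data` field and tie handling are arbitrary choices of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: mimics "keep the latest record per key" without using Hudi.
class PrecombineSketch {
    // Minimal stand-in for an incoming event; field names mirror the config above.
    static final class Event {
        final String partitionKey; // plays the role of the record key
        final long timestamp;      // plays the role of the precombine field
        final String data;         // hypothetical payload field

        Event(String partitionKey, long timestamp, String data) {
            this.partitionKey = partitionKey;
            this.timestamp = timestamp;
            this.data = data;
        }
    }

    // Keeps, for each key, the event with the strictly largest timestamp.
    // On equal timestamps the first event seen is kept (a choice of this sketch).
    static Map<String, Event> latestPerKey(List<Event> events) {
        Map<String, Event> latest = new LinkedHashMap<>();
        for (Event event : events) {
            Event current = latest.get(event.partitionKey);
            if (current == null || event.timestamp > current.timestamp) {
                latest.put(event.partitionKey, event);
            }
        }
        return latest;
    }
}
```

Given two events with key "k1" and timestamps 10 and 20, only the event with timestamp 20 would remain for "k1".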