Hudi
Project background
Hudi is a data lake storage format that supports updating and deleting data, as well as consuming changed data, on the Hadoop file system. It supports multiple computing engines, provides insert, update, and delete (IUD) interfaces, and offers streaming primitives for inserts, updates, and incremental pulls on HDFS datasets.

Reference: https://support.huaweicloud.com/productdesc-mrs/mrs_08_0083.html
Usage
1. Preparation before use
(1) On the Huawei Cloud OBS console, create a bucket to store the data written by Hudi: https://console.huaweicloud.com/console/#/obs/manager/buckets
(2) On the Huawei Cloud DIS console, create a channel that will receive the OBS event notifications: https://console.huaweicloud.com/dis/
Note: When creating the channel, select JSON as the data source type.

2. Parameter configuration
In core-site.xml, configure the information required to connect to OBS:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>obs://bucketname/</value>
    </property>
    <property>
        <name>fs.obs.access.key</name>
        <value>Huawei Cloud access key</value>
    </property>
    <property>
        <name>fs.obs.secret.key</name>
        <value>Huawei Cloud secret key</value>
    </property>
    <property>
        <name>fs.obs.endpoint</name>
        <value>Huawei Cloud OBS endpoint to connect to</value>
    </property>
    <property>
        <name>fs.obs.impl</name>
        <value>org.apache.hadoop.fs.obs.OBSFileSystem</value>
    </property>
</configuration>
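A missing or empty property in core-site.xml usually only shows up at run time as a connection error, so a quick pre-flight check can help. The following is a minimal sketch using only the JDK's XML parser; the class CoreSiteCheck and its methods are hypothetical helpers and are not part of this project.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of this project): checks core-site.xml for the OBS keys listed above. */
final class CoreSiteCheck {
    // The five property names used in the core-site.xml example above.
    static final String[] REQUIRED = {
        "fs.defaultFS", "fs.obs.access.key", "fs.obs.secret.key", "fs.obs.endpoint", "fs.obs.impl"
    };

    /** Reads every <property> element into a name -> value map. */
    static Map<String, String> readProperties(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            NodeList props = doc.getElementsByTagName("property");
            Map<String, String> result = new HashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element property = (Element) props.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    result.put(name, childText(property, "value"));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse core-site.xml", e);
        }
    }

    /** Returns the trimmed text of the first child element with the given tag, or null if absent. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Lists the required keys that are absent or have an empty value. */
    static List<String> missingKeys(Map<String, String> props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.get(key);
            if (value == null || value.isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Inline sample that sets only fs.defaultFS; a real check would open core-site.xml instead.
        String sample = "<configuration><property><name>fs.defaultFS</name>"
                + "<value>obs://bucketname/</value></property></configuration>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println("Missing keys: " + missingKeys(readProperties(in)));
    }
}
```

To check a real file, pass a FileInputStream for core-site.xml to readProperties in place of the inline sample.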
In dis.properties, configure the information required to connect to DIS, which delivers the event notifications:
endpoint=Huawei Cloud DIS endpoint to connect to
region=Huawei Cloud DIS region
ak=Huawei Cloud access key
sk=Huawei Cloud secret key
projectId=The projectId corresponding to the Huawei Cloud DIS region
Note: The endpoint must begin with https://, for example https://dis.cn-north-4.myhuaweicloud.com
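The https:// requirement is easy to overlook, so it can be worth validating dis.properties before starting a run. The sketch below uses only java.util.Properties; the class DisPropertiesCheck is a hypothetical helper that is not part of this project, and the values in the sample are placeholders.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper (not part of this project): validates the dis.properties keys listed above. */
final class DisPropertiesCheck {
    // The five keys shown in the dis.properties example above.
    static final String[] REQUIRED = {"endpoint", "region", "ak", "sk", "projectId"};

    /** Loads key=value pairs from the given reader. */
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns one message per problem found; an empty list means the file looks complete. */
    static List<String> problems(Properties props) {
        List<String> problems = new ArrayList<>();
        for (String key : REQUIRED) {
            if (props.getProperty(key, "").trim().isEmpty()) {
                problems.add("missing: " + key);
            }
        }
        String endpoint = props.getProperty("endpoint", "").trim();
        if (!endpoint.isEmpty() && !endpoint.startsWith("https://")) {
            problems.add("endpoint must start with https://");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Placeholder values; the endpoint deliberately lacks the https:// prefix.
        String sample = "endpoint=dis.cn-north-4.myhuaweicloud.com\n"
                + "region=cn-north-4\nak=my-access-key\nsk=my-secret-key\nprojectId=my-project-id\n";
        for (String problem : problems(load(new StringReader(sample)))) {
            System.out.println(problem);
        }
    }
}
```

For a real file, pass a FileReader for dis.properties to load instead of the StringReader.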
3. Running
See com.xnx3.obs.sources.TestDISEventSource for a working example. Set the run parameters with HoodieWriterConfig and DISReaderConfig:
// Where and how Hudi writes the table
HoodieWriterConfig config = new HoodieWriterConfig()
        .basePath("obs://hudi-test-target/hudi_dis_cow")
        .tableName("hudi_dis_cow")
        .saveMode(SaveMode.Append)
        .keyGenerator(NonpartitionedKeyGenerator.class.getName())
        .recordkeyFieldOptKey("partitionKey")
        .precombineFieldOptKey("timestamp");
// Which DIS stream and partition to read, and the sequence number to start from
DISReaderConfig disConfig = new DISReaderConfig()
        .streamName("hudi-dis-test")
        .partitionId("0")
        .startingSequenceNumber("0")
        .cursorType(PartitionCursorTypeEnum.AT_SEQUENCE_NUMBER.name());
// Fetch events from the DIS stream using the two configurations
new DISEventSource().fetchEvents(config, disConfig);
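The writer settings above resemble the standard Hudi datasource write options. As a rough sketch, and assuming (the project does not confirm this) that HoodieWriterConfig forwards its values to those options, the same settings could be expressed as the option map below. The class HudiWriteOptionsSketch is hypothetical; the option keys and the key generator class name are the ones Apache Hudi documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch (not part of this project): the writer settings as plain Hudi option keys. */
final class HudiWriteOptionsSketch {
    /** Builds Hudi datasource write options for a non-partitioned table. */
    static Map<String, String> writeOptions(String tableName, String recordKeyField, String precombineField) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("hoodie.table.name", tableName);
        // Field that uniquely identifies a record.
        options.put("hoodie.datasource.write.recordkey.field", recordKeyField);
        // Field used to pick the latest version when two records share a key.
        options.put("hoodie.datasource.write.precombine.field", precombineField);
        // Non-partitioned tables use this key generator.
        options.put("hoodie.datasource.write.keygenerator.class",
                "org.apache.hudi.keygen.NonpartitionedKeyGenerator");
        return options;
    }

    public static void main(String[] args) {
        // Same table name and fields as the example above.
        writeOptions("hudi_dis_cow", "partitionKey", "timestamp")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In this reading, "partitionKey" acts as the record key and "timestamp" decides which of two records with the same key is kept.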