StarRocks 2.2.0 errors when querying a Hudi 0.11.0 COW external table

I created a Hudi COW table on Alibaba Cloud OSS and registered it in a Hive 3.1.2 Metastore; querying it from Hive works fine. After installing StarRocks 2.2.0, I followed the official docs to create a Hudi external table, but the creation failed with the error below. Could this be caused by the Hadoop version? I am running open-source Hadoop 2.10.1.

mysql> CREATE EXTERNAL TABLE demo_trips_cow (
-> begin_lat double NULL,
-> begin_lon double NULL,
-> driver varchar(200) NULL,
-> end_lat double NULL,
-> end_lon double NULL,
-> fare double NULL,
-> partitionpath varchar(200) NULL,
-> rider varchar(200) NULL,
-> ts bigint NULL,
-> uuid varchar(200) NULL,
-> continent varchar(200) NULL,
-> country varchar(200) NULL,
-> city varchar(200) NULL
-> ) ENGINE=HUDI
-> PROPERTIES (
-> "resource" = "hudi0",
-> "database" = "default",
-> "table" = "demo_trips_cow"
-> );
ERROR 1064 (HY000): Unexpected exception: Failed to get instance of org.apache.hadoop.fs.FileSystem
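For context, the hudi0 resource referenced above was created following the documented StarRocks syntax, roughly as below; the Metastore address is a placeholder, not the real one.

-- Minimal sketch of the Hudi resource DDL, assuming the documented syntax;
-- thrift://<metastore_host>:9083 stands in for the actual Hive Metastore URI.
CREATE EXTERNAL RESOURCE "hudi0"
PROPERTIES (
    "type" = "hudi",
    "hive.metastore.uris" = "thrift://<metastore_host>:9083"
);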

I then tried querying this Hudi table through a Hive external table instead, which also failed: ERROR 1064 (HY000): com.starrocks.common.DdlException: get partition detail failed: com.starrocks.common.DdlException: get hive partition meta data failed: unsupported file format [org.apache.hudi.hadoop.HoodieParquetInputFormat]

I also verified that querying a regular Hive table (with data files on HDFS) from StarRocks works fine.

The Hudi table's create statement as shown in Hive is as follows:
CREATE EXTERNAL TABLE demo_trips_cow(
_hoodie_commit_time string,
_hoodie_commit_seqno string,
_hoodie_record_key string,
_hoodie_partition_path string,
_hoodie_file_name string,
begin_lat double,
begin_lon double,
driver string,
end_lat double,
end_lon double,
fare double,
partitionpath string,
rider string,
ts bigint,
uuid string)
PARTITIONED BY (
continent string,
country string,
city string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'hoodie.query.as.ro.table'='false',
'path'='oss://datalake-huifu/hudi/demo_trips_cow')
STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'oss://datalake-huifu/hudi/demo_trips_cow'
TBLPROPERTIES (
'bucketing_version'='2',
'last_commit_time_sync'='20220519161739696',
'spark.sql.create.version'='3.2.1',
'spark.sql.sources.provider'='hudi',
'spark.sql.sources.schema.numPartCols'='3',
'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"begin_lat","type":"double","nullable":true,"metadata":{}},{"name":"begin_lon","type":"double","nullable":true,"metadata":{}},{"name":"driver","type":"string","nullable":true,"metadata":{}},{"name":"end_lat","type":"double","nullable":true,"metadata":{}},{"name":"end_lon","type":"double","nullable":true,"metadata":{}},{"name":"fare","type":"double","nullable":true,"metadata":{}},{"name":"partitionpath","type":"string","nullable":true,"metadata":{}},{"name":"rider","type":"string","nullable":true,"metadata":{}},{"name":"ts","type":"long","nullable":true,"metadata":{}},{"name":"uuid","type":"string","nullable":true,"metadata":{}},{"name":"continent","type":"string","nullable":false,"metadata":{}},{"name":"country","type":"string","nullable":false,"metadata":{}},{"name":"city","type":"string","nullable":false,"metadata":{}}]}',
'spark.sql.sources.schema.partCol.0'='continent',
'spark.sql.sources.schema.partCol.1'='country',
'spark.sql.sources.schema.partCol.2'='city',
'transient_lastDdlTime'='1652948282');

ERROR 1064 (HY000): Unexpected exception: Failed to get instance of org.apache.hadoop.fs.FileSystem

Please post the fe.log.

2022-05-24 16:06:44,300 WARN (starrocks-mysql-nio-pool-9|955) [StmtExecutor.handleDdlStmt():933] DDL statement(CREATE EXTERNAL TABLE demo_trips_cow (
begin_lat double NULL,
begin_lon double NULL,
driver varchar(200) NULL,
end_lat double NULL,
end_lon double NULL,
fare double NULL,
partitionpath varchar(200) NULL,
rider varchar(200) NULL,
ts bigint NULL,
uuid varchar(200) NULL,
continent varchar(200) NULL,
country varchar(200) NULL,
city varchar(200) NULL
) ENGINE=HUDI
PROPERTIES (
"resource" = "hudi0",
"database" = "default",
"table" = "demo_trips_cow"
)) process failed.
org.apache.hudi.exception.HoodieIOException: Failed to get instance of org.apache.hadoop.fs.FileSystem
at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:104) ~[hudi-common-0.10.0.jar:0.10.0]
at org.apache.hudi.common.table.HoodieTableMetaClient.getFs(HoodieTableMetaClient.java:256) ~[hudi-common-0.10.0.jar:0.10.0]
at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:112) ~[hudi-common-0.10.0.jar:0.10.0]
at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:73) ~[hudi-common-0.10.0.jar:0.10.0]
at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:614) ~[hudi-common-0.10.0.jar:0.10.0]
at com.starrocks.catalog.HudiTable.validate(HudiTable.java:225) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.HudiTable.<init>(HudiTable.java:97) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.createHudiTable(Catalog.java:4431) ~[starrocks-fe.jar:?]
at com.starrocks.catalog.Catalog.createTable(Catalog.java:3156) ~[starrocks-fe.jar:?]
at com.starrocks.qe.DdlExecutor.execute(DdlExecutor.java:112) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.handleDdlStmt(StmtExecutor.java:920) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:439) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:283) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:430) ~[starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:666) ~[starrocks-fe.jar:?]
at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:55) ~[starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_302]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_302]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_302]
Caused by: java.io.IOException: ERROR: without login secrets configured.
at com.aliyun.emr.fs.auth.AliyunCredentialProviderList.getJindoCredentialsProvider(AliyunCredentialProviderList.java:116) ~[jindofs-sdk-3.7.2.jar:?]
at com.aliyun.emr.fs.internal.ossnative.OssCredentialUtils.createOssCredentialContext(OssCredentialUtils.java:119) ~[jindofs-sdk-3.7.2.jar:?]
at com.aliyun.emr.fs.internal.ossnative.OssNativeStore.<init>(OssNativeStore.java:167) ~[jindofs-sdk-3.7.2.jar:?]
at com.aliyun.emr.fs.oss.JindoOssFileSystem.initialize(JindoOssFileSystem.java:134) ~[jindofs-sdk-3.7.2.jar:?]
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414) ~[hadoop-common-3.3.0.jar:?]
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158) ~[hadoop-common-3.3.0.jar:?]
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474) ~[hadoop-common-3.3.0.jar:?]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442) ~[hadoop-common-3.3.0.jar:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524) ~[hadoop-common-3.3.0.jar:?]
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) ~[hadoop-common-3.3.0.jar:?]
at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:102) ~[hudi-common-0.10.0.jar:0.10.0]
… 18 more
2022-05-24 16:06:44,946 INFO (Routine load scheduler|46) [RoutineLoadScheduler.process():76] there are 0 job need schedule
2022-05-24 16:06:45,128 INFO (Thread-47|102) [ReportHandler.taskReport():333] begin to handle task report from backend 10004
2022-05-24 16:06:45,128 INFO (thrift-server-pool-3|193) [ReportHandler.handleReport():187] receive report from be 10004. type: task, current queue size: 1
2022-05-24 16:06:45,128 INFO (Thread-47|102) [ReportHandler.taskReport():373] finished to handle task report from backend 10004, diff task num: 0. cost: 0 ms
2022-05-24 16:06:45,444 INFO (thrift-server-pool-2|190) [ReportHandler.handleReport():187] receive report from be 10003. type: , current queue size: 1
2022-05-24 16:06:45,484 INFO (thrift-server-pool-8|198) [ReportHandler.handleReport():187] receive report from be 10009. type: , current queue size: 1
2022-05-24 16:06:45,484 INFO (thrift-server-pool-4|194) [ReportHandler.handleReport():187] receive report from be 10004. type: , current queue size: 2
2022-05-24 16:06:46,613 INFO (tablet scheduler|40) [ClusterLoadStatistic.classifyBackendByLoad():155] classify backend by load. medium: HDD, avg load score: 0.5, low/mid/high: 0/3/0
2022-05-24 16:06:46,613 INFO (tablet scheduler|40) [TabletScheduler.updateClusterLoadStatistic():353] update cluster default_cluster load statistic:
be id: 10003, is available: true, mediums: [{medium: HDD, replica: 10, used: 0, total: 51285684224, score: 0.5},{medium: SSD, replica: 0, used: 0, total: 0, score: NaN},], paths: [{path: /var/lib/container/software/StarRocks/be/storage, path hash: -7379868134095079688, be: 10003, medium: HDD, used: 0, total: 51285684224},]
be id: 10004, is available: true, mediums: [{medium: HDD, replica: 10, used: 0, total: 41237340160, score: 0.5},{medium: SSD, replica: 0, used: 0, total: 0, score: NaN},], paths: [{path: /var/lib/container/software/StarRocks/be/storage, path hash: -1995626152633031766, be: 10004, medium: HDD, used: 0, total: 41237340160},]
be id: 10009, is available: true, mediums: [{medium: HDD, replica: 10, used: 0, total: 87181357056, score: 0.5},{medium: SSD, replica: 0, used: 0, total: 0, score: NaN},], paths: [{path: /var/lib/container/software/StarRocks/be/storage, path hash: 6048419355841308433, be: 10009, medium: HDD, used: 0, total: 87181357056},]

2022-05-24 16:06:46,613 INFO (tablet scheduler|40) [TabletScheduler.adjustPriorities():382] adjust priority for all tablets. changed: 0, total: 0
2022-05-24 16:06:49,110 INFO (Thread-47|102) [ReportHandler.taskReport():333] begin to handle task report from backend 10003
2022-05-24 16:06:49,110 INFO (thrift-server-pool-1|189) [ReportHandler.handleReport():187] receive report from be 10003. type: task, current queue size: 1
2022-05-24 16:06:49,110 INFO (Thread-47|102) [ReportHandler.taskReport():373] finished to handle task report from backend 10003, diff task num: 0. cost: 0 ms
2022-05-24 16:06:49,276 INFO (Thread-47|102) [ReportHandler.taskReport():333] begin to handle task report from backend 10009
2022-05-24 16:06:49,276 INFO (Thread-47|102) [ReportHandler.taskReport():373] finished to handle task report from backend 10009, diff task num: 0. cost: 0 ms
2022-05-24 16:06:49,276 INFO (thrift-server-pool-7|197) [ReportHandler.handleReport():187] receive report from be 10009. type: task, current queue size: 1

Reading data from OSS requires configuring the AccessKey in the FE's core-site.xml. See: https://docs.starrocks.com/zh-cn/main/using_starrocks/External_table#aliyun-oss-支持
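A minimal sketch of what that core-site.xml entry might look like, assuming the fs.oss.* properties shown in the linked doc; the values below are placeholders, and the exact key names should be verified against the doc for the OSS connector actually on the classpath (the stack trace shows the JindoFS SDK). Restart the FE (and BE) after changing it so the config is picked up.

<!-- Sketch for fe/conf/core-site.xml (and be/conf/core-site.xml), placeholder values -->
<configuration>
  <property>
    <name>fs.oss.accessKeyId</name>
    <value>your-access-key-id</value>
  </property>
  <property>
    <name>fs.oss.accessKeySecret</name>
    <value>your-access-key-secret</value>
  </property>
  <property>
    <name>fs.oss.endpoint</name>
    <value>oss-cn-hangzhou.aliyuncs.com</value>
  </property>
</configuration>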