Error when importing data: Failed to get next from be -> ip

【Details】Detailed problem description
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Failed to get next from be -> ip:[192...] CANCELLED msg:[canceled state]
【Import/export method】
flink cdc
【Background】What operations were performed?
Importing from one StarRocks table into another StarRocks table
【Business impact】
【StarRocks version】e.g.: 2.2
【Cluster size】e.g.: 1 FE (1 follower + 2 observers) + 3 BEs (FE and BE co-located)
【Machine info】CPU vCores / memory / NIC, e.g.: 48C/64G/10 GbE
【Attachments】
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Failed to get next from be -> ip:[192...] CANCELLED msg:[canceled state]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:593)
at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
at com.starrocks.connector.flink.table.source.StarRocksDynamicSourceFunction.run(StarRocksDynamicSourceFunction.java:155)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:104)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:60)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:269)

The BE did not crash, and this is the second time this error has shown up with exactly the same operation.

Asking for help: what could the cause be? The sync ran partway and then suddenly failed. The Flink version is 1.13.6, and the two jars required by the official docs are flink-connector-starrocks-1.2.1_flink-1.13_2.12.jar and flink-sql-connector-mysql-cdc-1.3.0.jar.

What exactly do you mean by importing from one StarRocks table into another StarRocks table?

Flink watches table A for changes; whenever the data in table A changes, the data in table B is updated accordingly. Both A and B live in StarRocks.
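
For reference, the shape of a StarRocks-to-StarRocks job with the Flink connector is roughly the sketch below. It is a simplified, one-shot copy written with Flink SQL rather than my actual change-driven job; host names, ports, credentials and the two-column schema are placeholders, and the connector option names are the ones documented for flink-connector-starrocks, so verify them against your version.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class StarRocksAToB {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Source table A: rows are read through the BE scan service
        // (the same get_next path that fails in the stack trace above).
        tEnv.executeSql(
                "CREATE TABLE table_a (id BIGINT, v STRING) WITH ("
                        + " 'connector' = 'starrocks',"
                        + " 'scan-url' = 'fe_host:8030',"
                        + " 'jdbc-url' = 'jdbc:mysql://fe_host:9030',"
                        + " 'database-name' = 'demo',"
                        + " 'table-name' = 'A',"
                        + " 'username' = 'root',"
                        + " 'password' = '')");

        // Sink table B: rows are written back via Stream Load.
        tEnv.executeSql(
                "CREATE TABLE table_b (id BIGINT, v STRING) WITH ("
                        + " 'connector' = 'starrocks',"
                        + " 'load-url' = 'fe_host:8030',"
                        + " 'jdbc-url' = 'jdbc:mysql://fe_host:9030',"
                        + " 'database-name' = 'demo',"
                        + " 'table-name' = 'B',"
                        + " 'username' = 'root',"
                        + " 'password' = '')");

        // Copy A into B (the source is a bounded scan of A's current data).
        tEnv.executeSql("INSERT INTO table_b SELECT id, v FROM table_a").await();
    }
}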

I'm running into this too:
E1116 13:13:19.787750 4964 olap_scan_node.cpp:266] [TUniqueId(hi=4498849501048228643, lo=-8171290676406236532)] Cancelled: canceled state
W1116 13:13:20.780179 5115 fragment_mgr.cpp:180] Fail to open fragment f64573a9-78c5-0790-c469-14aaba9513be: Cancelled: canceled state

The error is thrown when reading via TStarrocksExternalService.Client.get_next(params), and it happens every time the Flink job runs.
The table itself can be queried normally in StarRocks.
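
For context, the read path is roughly the open_scanner / get_next / close_scanner sequence below against the BE's Thrift scan service. This is a sketch reconstructed from the connector's Thrift definition, not my exact code: the imports for the Thrift-generated StarRocks classes are omitted because their package differs by connector version, and the getter/setter names should be treated as assumptions.

import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
// TStarrocksExternalService, TScanOpenParams, TScanOpenResult, TScanNextBatchParams,
// TScanBatchResult and TScanCloseParams are Thrift-generated classes bundled with
// flink-connector-starrocks (imports omitted; package varies by connector version).

public class BeScanSketch {
    public static void main(String[] args) throws Exception {
        // be_host/9060 are placeholders for a BE host and its thrift (be_port) port; the two
        // 3000 ms values are per-RPC socket/connect timeouts, independent of query_timeout.
        TSocket socket = new TSocket("be_host", 9060, 3000, 3000);
        socket.open();
        TStarrocksExternalService.Client client =
                new TStarrocksExternalService.Client(new TBinaryProtocol(socket));

        // Filled with tablet_ids, opaqued_query_plan, db/table/user/passwd, batch_size,
        // mem_limit, keep_alive_min and query_timeout, as in the config posted further down.
        TScanOpenParams openParams = new TScanOpenParams();
        TScanOpenResult openResult = client.open_scanner(openParams);

        // The connector then calls get_next in a loop, advancing offset by the rows already
        // read, until the batch reports eos. The CANCELLED status above surfaces from this call.
        TScanNextBatchParams nextParams = new TScanNextBatchParams();
        nextParams.setContext_id(openResult.getContext_id());
        nextParams.setOffset(0);
        TScanBatchResult batch = client.get_next(nextParams);
        System.out.println("eos=" + batch.isEos());

        TScanCloseParams closeParams = new TScanCloseParams();
        closeParams.setContext_id(openResult.getContext_id());
        client.close_scanner(closeParams);
        socket.close();
    }
}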

The INFO log keeps printing:
I1116 13:29:30.782474 5071 fragment_mgr.cpp:513] FragmentMgr cancel worker going to cancel timeout fragment f64573a9-78c5-0790-c469-14aaba9513be

Which version are you on? Please share the Flink TaskManager log for that time window, plus the FE and BE logs covering the period from job start to the exception, and we'll take a look.

The version is 2.3.3.

be.WARNING:
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:200 _plan->open(_runtime_state)
E1116 13:45:14.524895 29083 olap_scan_node.cpp:266] [TUniqueId(hi=-6506145385482269730, lo=-7252756451176626317)] Cancelled: canceled state
W1116 13:45:14.529227 29264 fragment_mgr.cpp:180] Fail to open fragment 97467865-3658-5456-9995-1ac510227db8: Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
W1116 13:45:15.571513 26195 backend_base.cpp:188] fragment_instance_id [97467865-3658-5456-9995-1ac510227db8] fetch result status [Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
/root/starrocks/be/src/runtime/result_queue_mgr.cpp:56 queue->status()]
W1116 13:45:15.572602 26195 backend_base.cpp:188] fragment_instance_id [97467865-3658-5456-9995-1ac510227db8] fetch result status [Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
/root/starrocks/be/src/runtime/result_queue_mgr.cpp:56 queue->status()]
W1116 13:45:15.572896 26195 backend_base.cpp:188] fragment_instance_id [97467865-3658-5456-9995-1ac510227db8] fetch result status [Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
/root/starrocks/be/src/runtime/result_queue_mgr.cpp:56 queue->status()]
W1116 13:45:15.573272 26195 backend_base.cpp:188] fragment_instance_id [97467865-3658-5456-9995-1ac510227db8] fetch result status [Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
/root/starrocks/be/src/runtime/result_queue_mgr.cpp:56 queue->status()]
E1116 13:58:27.747947 29114 olap_scan_node.cpp:266] [TUniqueId(hi=-2537900044075710516, lo=-5402410605033422913)] Cancelled: canceled state
W1116 13:58:27.749850 29264 fragment_mgr.cpp:180] Fail to open fragment 71440aae-5ca0-6ca3-098f-d6ce33911cb1: Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
W1116 13:58:28.790318 26393 backend_base.cpp:188] fragment_instance_id [71440aae-5ca0-6ca3-098f-d6ce33911cb1] fetch result status [Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
/root/starrocks/be/src/runtime/result_queue_mgr.cpp:56 queue->status()]
W1116 13:58:28.792894 26393 backend_base.cpp:188] fragment_instance_id [71440aae-5ca0-6ca3-098f-d6ce33911cb1] fetch result status [Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
/root/starrocks/be/src/runtime/result_queue_mgr.cpp:56 queue->status()]
W1116 13:58:28.793229 26393 backend_base.cpp:188] fragment_instance_id [71440aae-5ca0-6ca3-098f-d6ce33911cb1] fetch result status [Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
/root/starrocks/be/src/runtime/result_queue_mgr.cpp:56 queue->status()]
W1116 13:58:28.793428 26393 backend_base.cpp:188] fragment_instance_id [71440aae-5ca0-6ca3-098f-d6ce33911cb1] fetch result status [Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
/root/starrocks/be/src/runtime/result_queue_mgr.cpp:56 queue->status()]
W1116 13:58:28.793581 26393 backend_base.cpp:188] fragment_instance_id [71440aae-5ca0-6ca3-098f-d6ce33911cb1] fetch result status [Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
/root/starrocks/be/src/runtime/result_queue_mgr.cpp:56 queue->status()]
W1116 13:58:28.793730 26393 backend_base.cpp:188] fragment_instance_id [71440aae-5ca0-6ca3-098f-d6ce33911cb1] fetch result status [Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
/root/starrocks/be/src/runtime/result_queue_mgr.cpp:56 queue->status()]
W1116 13:58:28.794034 26393 backend_base.cpp:188] fragment_instance_id [71440aae-5ca0-6ca3-098f-d6ce33911cb1] fetch result status [Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
/root/starrocks/be/src/runtime/result_queue_mgr.cpp:56 queue->status()]
W1116 13:58:28.794284 26393 backend_base.cpp:188] fragment_instance_id [71440aae-5ca0-6ca3-098f-d6ce33911cb1] fetch result status [Cancelled: canceled state
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
/root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
/root/starrocks/be/src/runtime/result_queue_mgr.cpp:56 queue->status()]
W1116 13:58:28.794462 26393 backend_base.cpp:188] fragment_instance_id [71440aae-5ca0-6ca3-098f-d6ce33911cb1] fetch result status [Cancelled: canceled state

A correction: the version I'm using is 2.3.0.

There is nothing notable in the TaskManager log; it just keeps retrying and stops once the retry limit is exceeded.
The FE log keeps repeating the following:
2022-11-16 14:57:42,570 INFO (nioEventLoopGroup-3-12|96) [RestBaseAction.handleRequest():55] receive http request. url=/api/bootstrap?cluster_id=853982505&token=a8426c72-4c13-48cc-8fa8-dcb2be1b550d
2022-11-16 14:57:42,579 INFO (replayer|69) [GlobalStateMgr.replayJournal():1711] replayed journal id is 2437441, replay to journal id is 2437442
2022-11-16 14:57:44,007 INFO (replayer|69) [GlobalStateMgr.replayJournal():1711] replayed journal id is 2437442, replay to journal id is 2437443

There is nothing in fe.warn either, and BE memory usage is normal.

Could you also share the Flink TaskManager log? It looks like the scan was cancelled because of a timeout; we still need to find out why it timed out. What exactly were you doing when the error occurred?

One observation: every run fails after exactly 10 minutes. I suspected the keep_alive_min parameter, which was set to 10 minutes, but raising it to 30 minutes made no difference. The job still fails after exactly 10 minutes.

Could you post your exact configuration?

// Socket and connect timeouts are both 3000 ms.
TSocket socket = new TSocket(beHost, bePort, 3000, 3000);

TScanOpenParams:
params.setTablet_ids(tablets);
params.setOpaqued_query_plan(opaqued_query_plan); // obtained from the query plan
params.setCluster("default_cluster");
params.setDatabase(starRocksConf.getDatabase());
params.setTable(starRocksConf.getTable());
params.setUser(starRocksConf.getUsername());
params.setPasswd(starRocksConf.getPassword());
params.setBatch_size(1024);
params.setMem_limit(1073741824);
params.setProperties(starRocksConf.getBeSocketProperties()); // an empty collection is passed here
params.setKeep_alive_min((short) 30);
params.setQuery_timeout(600);

The 10-minute failure matches the query timeout: params.setQuery_timeout(600) sets it to 600 s. Your table is probably large enough that the scan takes longer than that, so try setting the timeout higher.
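
For example (a sketch against the config you posted above; the Flink connector property name is the one documented for the StarRocks source connector and is worth double-checking for your connector version):

// In the hand-written Thrift path above, simply raise the scan timeout:
params.setQuery_timeout(3600);  // seconds; 600 is exactly the 10 minutes you observed

// If the read goes through flink-connector-starrocks source options instead,
// the equivalent property would be something like:
//   'scan.params.query-timeout-s' = '3600'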

A question: what happens if this parameter is set to -1? Will the BE treat the query as having no timeout?

The BE takes max(1, timeout).
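
So -1 does not disable the timeout. Illustratively (this is not the actual BE code, which is C++; it only restates the max(1, timeout) rule above):

int requested = -1;                      // params.setQuery_timeout(-1)
int effective = Math.max(1, requested);  // BE uses max(1, timeout) -> effective == 1 second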

I'm hitting the same problem. Has it been resolved, and if so, how?