BE node stopped

Cluster setup: 3 FE nodes and 3 BE nodes.

Data load: 200,000/s

Tasks: we have defined 30 Kafka-to-StarRocks tasks.

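Problem: after the cluster had been running for about 5 days, the BE nodes suddenly went down. The log from just before BE stopped is as follows: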
starrocks_be: rdkafka_broker.c:5412: rd_kafka_broker_destroy_final: Assertion `thrd_is_current(rkb->rkb_thread)' failed.
*** Aborted at 1641179249 (unix time) try "date -d @1641179249" if you are using GNU date ***
PC: @ 0x7fe50528e387 __GI_raise
*** SIGABRT (@0x10407) received by PID 66567 (TID 0x7fe3913fd700) from PID 66567; stack trace: ***
@ 0x32bb1d2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fe505f59630 (unknown)
@ 0x7fe50528e387 __GI_raise
@ 0x7fe50528fa78 __GI_abort
@ 0x7fe5052871a6 __assert_fail_base
@ 0x7fe505287252 __GI___assert_fail
@ 0x3061cf7 rd_kafka_broker_destroy_final
@ 0x30e35df rd_kafka_metadata_refresh_topics
@ 0x30e39ba rd_kafka_metadata_refresh_known_topics
@ 0x305c9a2 rd_kafka_broker_fail
@ 0x306d0bc rd_kafka_broker_op_serve
@ 0x306e7f6 rd_kafka_broker_ops_io_serve
@ 0x306ed48 rd_kafka_broker_consumer_serve
@ 0x3070337 rd_kafka_broker_serve
@ 0x3070945 rd_kafka_broker_thread_main
@ 0x30da9d7 _thrd_wrapper_function
@ 0x7fe505f51ea5 start_thread
@ 0x7fe5053569fd __clone
@ 0x0 (unknown)
start time: Tue Jan 4 09:13:11 CST 2022
starrocks_be: rdkafka_broker.c:5412: rd_kafka_broker_destroy_final: Assertion `thrd_is_current(rkb->rkb_thread)' failed.
*** Aborted at 1641276281 (unix time) try "date -d @1641276281" if you are using GNU date ***
PC: @ 0x7fc5e24c9387 __GI_raise
*** SIGABRT (@0x11963) received by PID 72035 (TID 0x7fc4ee440700) from PID 72035; stack trace: ***
@ 0x32bb1d2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fc5e3194630 (unknown)
@ 0x7fc5e24c9387 __GI_raise
@ 0x7fc5e24caa78 __GI_abort
@ 0x7fc5e24c21a6 __assert_fail_base
@ 0x7fc5e24c2252 __GI___assert_fail
@ 0x3061cf7 rd_kafka_broker_destroy_final
@ 0x30e35df rd_kafka_metadata_refresh_topics
@ 0x30e39ba rd_kafka_metadata_refresh_known_topics
@ 0x305c9a2 rd_kafka_broker_fail
@ 0x306d0bc rd_kafka_broker_op_serve
@ 0x306e7f6 rd_kafka_broker_ops_io_serve
@ 0x306ed48 rd_kafka_broker_consumer_serve
@ 0x3070337 rd_kafka_broker_serve
@ 0x3070945 rd_kafka_broker_thread_main
@ 0x30da9d7 _thrd_wrapper_function
@ 0x7fc5e318cea5 start_thread
@ 0x7fc5e25919fd __clone
@ 0x0 (unknown)
start time: Tue Jan 4 14:58:42 CST 2022
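
The abort lines above carry raw Unix timestamps, which the log itself suggests decoding with `date -d`. A minimal Python sketch of the same conversion follows. It assumes that "CST" in the `start time:` lines means China Standard Time (UTC+8), and the helper name `abort_time_cst` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in the BE log is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8), "CST")


def abort_time_cst(epoch_seconds: int) -> str:
    """Format the epoch from an '*** Aborted at N (unix time)' line in CST."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=CST)
    return moment.strftime("%Y-%m-%d %H:%M:%S %Z")


if __name__ == "__main__":
    # The two abort timestamps copied from the traces above.
    for epoch in (1641179249, 1641276281):
        print(epoch, "->", abort_time_cst(epoch))
```

Under that timezone assumption, the two aborts map to 2022-01-03 11:07:29 and 2022-01-04 14:04:41. Each is earlier than the `start time:` line printed after its trace, which would be consistent with those lines being written when BE was started again.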

Which version are you using?

The version is 1.19.5.

This is a problem in the third-party dependency librdkafka: [Help Wanted]Crash on rd_kafka_broker_destroy_final · Issue #3608 · edenhill/librdkafka (github.com). You can upgrade to version 2.0.3, which works around this problem.

Hello, our StarRocks version is now 2.5.0, and we hit the same problem when CPU load reached 60%. This node is now down.
starrocks_be: /var/local/thirdparty/src/librdkafka-1.9.2/src/rdkafka_broker.c:5464: rd_kafka_broker_destroy_final: Assertion `thrd_is_current(rkb->rkb_thread)' failed.
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
*** Aborted at 1702527254 (unix time) try “date -d @1702527254” if you are using GNU date ***
PC: @ 0x7f1723f09207 __GI_raise
*** SIGABRT (@0x3ea00003af5) received by PID 15093 (TID 0x7f134b230700) from PID 15093; stack trace: ***
@ 0x59fe902 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f17249bd5d0 (unknown)
@ 0x7f1723f09207 __GI_raise
@ 0x7f1723f0a8f8 __GI_abort
@ 0x7f1723f02026 __assert_fail_base
@ 0x7f1723f020d2 __GI___assert_fail
@ 0x698f687 rd_kafka_broker_destroy_final
@ 0x69ce1a8 rd_kafka_metadata_refresh_topics
@ 0x69ce58b rd_kafka_metadata_refresh_known_topics
@ 0x69889ca rd_kafka_broker_fail
@ 0x699c782 rd_kafka_broker_op_serve
@ 0x699da6f rd_kafka_broker_ops_io_serve
@ 0x699e560 rd_kafka_broker_consumer_serve
@ 0x69a0280 rd_kafka_broker_serve
@ 0x69a08ad rd_kafka_broker_thread_main
@ 0x6a97af8 _thrd_wrapper_function
@ 0x7f17249b5dd5 start_thread
@ 0x7f1723fd0ead __clone
@ 0x0 (unknown)