BE nodes crash intermittently

The BEs keep crashing: all 6 BE nodes go down at irregular intervals.
Version: 3.0.3
Cluster: 3 FEs, 6 BEs

50be.INFO (51.0 MB) 42fe.log (77.3 MB)

Please send the be.out file.

start time: Fri Mar 8 16:52:18 CST 2024
3.0.3 RELEASE (build fe5e3a1)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 36410306000
tracker:query_pool consumption: 0
tracker:load consumption: 394733320
tracker:metadata consumption: 3224711929
tracker:tablet_metadata consumption: 308049099
tracker:rowset_metadata consumption: 195764726
tracker:segment_metadata consumption: 252566254
tracker:column_metadata consumption: 2468331850
tracker:tablet_schema consumption: 20403675
tracker:segment_zonemap consumption: 134944548
tracker:short_key_index consumption: 88875333
tracker:column_zonemap_index consumption: 1094844970
tracker:ordinal_index consumption: 970990768
tracker:bitmap_index consumption: 25632336
tracker:bloom_filter_index consumption: 5480576
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:column_pool consumption: 1281463986
tracker:page_cache consumption: 27400345472
tracker:update consumption: 2901028316
tracker:chunk_allocator consumption: 2148323408
tracker:clone consumption: 0
tracker:consistency consumption: 0
*** Aborted at 1710326728 (unix time) try "date -d @1710326728" if you are using GNU date ***
PC: @ 0x46f6714 starrocks::PrimaryIndex::memory_usage()
*** SIGSEGV (@0x88) received by PID 22619 (TID 0x7fb8b63f8700) from PID 136; stack trace: ***
@ 0x62e4042 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fb9481cb630 (unknown)
@ 0x46f6714 starrocks::PrimaryIndex::memory_usage()
@ 0x4810df6 starrocks::UpdateManager::on_rowset_finished()
@ 0x4e60687 starrocks::DeltaWriter::commit()
@ 0x49da1ce _ZNSt17_Function_handlerIFvvEZN9starrocks17SegmentFlushToken6submitEPN4brpc10ControllerEPKNS1_30PTabletWriterAddSegmentRequestEPNS1_29PTabletWriterAddSegmentResultEPN6google8protobuf7ClosureEEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x50e93c2 starrocks::ThreadPool::dispatch_thread()
@ 0x50e3eba starrocks::thread::supervise_thread()
@ 0x7fb9481c3ea5 start_thread
@ 0x7fb9477de96d __clone
@ 0x0 (unknown)
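
When reading a dump like the one above, it can help to rank the memory trackers by size and to convert the abort timestamp to local time. The following is a minimal sketch, not a StarRocks tool: the function names `parse_trackers`, `top_trackers`, and `abort_time` are hypothetical, the regular expressions assume the line formats shown in this be.out excerpt, and CST is assumed to mean UTC+8.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed line shapes, taken from the be.out excerpt in this thread.
TRACKER_RE = re.compile(r"tracker:(\w+) consumption: (\d+)")
ABORT_RE = re.compile(r"\*\*\* Aborted at (\d+) \(unix time\)")

CST = timezone(timedelta(hours=8))  # assumption: CST here is China Standard Time


def parse_trackers(text):
    """Return {tracker_name: bytes} for every tracker line found in text."""
    return {name: int(value) for name, value in TRACKER_RE.findall(text)}


def top_trackers(trackers, n=5):
    """Return the n largest trackers as (name, GiB) pairs, largest first."""
    ranked = sorted(trackers.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, size / 2**30) for name, size in ranked[:n]]


def abort_time(text):
    """Return the abort time as an aware datetime in UTC+8, or None."""
    match = ABORT_RE.search(text)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=CST)


if __name__ == "__main__":
    sample = """\
tracker:process consumption: 36410306000
tracker:page_cache consumption: 27400345472
tracker:update consumption: 2901028316
*** Aborted at 1710326728 (unix time) try "date -d @1710326728" if you are using GNU date ***
"""
    for name, gib in top_trackers(parse_trackers(sample)):
        print(f"{name:12s} {gib:6.2f} GiB")
    print("aborted at", abort_time(sample).isoformat())
```

On the values in this thread, page_cache accounts for roughly 25.5 GiB of the roughly 33.9 GiB process total, and the abort timestamp corresponds to 2024-03-13 18:45:28 in UTC+8, about five days after the recorded start time.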

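So far we have identified two issues. 1. On a Primary Key table with a sort key, the merge fails; this looks like a bug in version 3.0.3.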

The cause of this crash has been identified, and it will be fixed as soon as possible.

This is causing frequent BE crashes in our production environment. Is there a workaround available right now?

The only option for now is a temporary patch build, which you would use to upgrade the BEs.

Which version is the fix expected to land in?

The fix is planned for 3.0.10.

Then would upgrading to 3.1.9 cover it?

3.1.9 does not include the fix either.
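
The replies so far imply that a higher version number does not by itself contain the fix; what matters is whether the fix has landed on that release branch. Below is a minimal sketch of such a branch-aware check, using only what this thread states (a fix planned for 3.0.10, and no fix release mentioned for 3.1). `FIXED_IN`, `parse_version`, and `has_fix` are hypothetical names, not part of StarRocks.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.0.10' into (3, 0, 10)."""
    return tuple(int(part) for part in text.split("."))


# First fixed release per branch, as stated in this thread only.
# 3.0.10 is a planned version; no 3.1 fix release is mentioned.
FIXED_IN = {(3, 0): (3, 0, 10)}


def has_fix(version_text, fixed_in=FIXED_IN):
    """True only if the version's branch has a known fix release at or below it."""
    version = parse_version(version_text)
    first_fixed = fixed_in.get(version[:2])
    return first_fixed is not None and version >= first_fixed


if __name__ == "__main__":
    for candidate in ("3.0.3", "3.0.10", "3.1.9"):
        print(candidate, has_fix(candidate))
```

With this table, 3.1.9 is reported as not fixed even though it sorts after 3.0.10, because no fix release is recorded for the 3.1 branch.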

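But there is also another error related to the sort index on the Primary Key table, and for that one the official recommendation was to upgrade to 3.1.9.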

What error? Please post it.

be.INFO reports: Internal error: wait_for_version version:2 failed: apply stopped tablet:16802567. This error still appears after I modified the CREATE TABLE statement.
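
To see how many tablets are affected, the failing tablet IDs can be collected from be.INFO. This is a minimal sketch that assumes every occurrence has the same shape as the single message quoted above; the function name and the regular expression are illustrative, not part of StarRocks.

```python
import re
from collections import Counter

# Pattern assumed from the one message quoted in this thread.
APPLY_STOPPED_RE = re.compile(
    r"wait_for_version version:(\d+) failed: apply stopped tablet:(\d+)"
)


def count_apply_stopped(lines):
    """Count 'apply stopped' failures per tablet ID in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = APPLY_STOPPED_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Internal error: wait_for_version version:2 failed: apply stopped tablet:16802567",
        "some unrelated line",
    ]
    for tablet_id, hits in count_apply_stopped(sample).most_common():
        print(tablet_id, hits)
```

For a real log, an open file object can be passed in place of the list, since the function only iterates over lines.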

And this is the earlier sort index error on cur_data.

That one has already been fixed in 2.5, 3.0, and 3.1.

We are running 3.0.3.

It is fixed in the latest 3.0 patch release as well.

So if I want to resolve both of these issues, which version do I need to upgrade to?