Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 10000ms.

  • Environment: 3 nodes, virtual machines

  • Problem: the master node goes down frequently.

  • Error log:

2022-05-07 22:07:24,134 ERROR (heartbeat mgr|37) [BDBJEJournal.write():163] catch an exception when writing to database. sleep and retry. journal id 691630
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -1040263  VLSN: 1,732,561, initiated at: 22:06:25.  Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 10000ms. FeederState=10.74.134.172_9010_1643011953296(2)[MASTER]
Current feeds:
 10.74.134.171_9010_1643011922297: feederVLSN=1,732,562 replicaTxnEndVLSN=1,732,555

        at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[je-7.3.7.jar:7.3.7]
        at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189) ~[je-7.3.7.jar:7.3.7]
        at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426) ~[je-7.3.7.jar:7.3.7]
        at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385) ~[je-7.3.7.jar:7.3.7]
        at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:226) ~[je-7.3.7.jar:7.3.7]
        at com.sleepycat.je.txn.Txn.commit(Txn.java:772) ~[je-7.3.7.jar:7.3.7]
        at com.sleepycat.je.txn.Txn.commit(Txn.java:625) ~[je-7.3.7.jar:7.3.7]
        at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:1803) ~[je-7.3.7.jar:7.3.7]
        at com.sleepycat.je.Database.put(Database.java:1506) ~[je-7.3.7.jar:7.3.7]
        at com.sleepycat.je.Database.put(Database.java:1556) ~[je-7.3.7.jar:7.3.7]
        at com.starrocks.journal.bdbje.CloseSafeDatabase.put(CloseSafeDatabase.java:28) ~[starrocks-fe.jar:?]
        at com.starrocks.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:156) [starrocks-fe.jar:?]
        at com.starrocks.persist.EditLog.logEdit(EditLog.java:855) [starrocks-fe.jar:?]
        at com.starrocks.persist.EditLog.logHeartbeat(EditLog.java:1251) [starrocks-fe.jar:?]
        at com.starrocks.system.HeartbeatMgr.runAfterCatalogReady(HeartbeatMgr.java:163) [starrocks-fe.jar:?]
        at com.starrocks.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:61) [starrocks-fe.jar:?]
        at com.starrocks.common.util.Daemon.run(Daemon.java:119) [starrocks-fe.jar:?]
2022-05-07 22:07:29,135 ERROR (heartbeat mgr|37) [BDBJEJournal.write():186] write bdb failed. will exit. journalId: 691630, bdb database Name: 655418
2022-05-07 22:07:31,136 WARN (MASTER 10.74.134.172_9010_1643011953296(2)|63) [Catalog.notifyNewFETypeTransfer():2331] notify new FE type transfer: UNKNOWN
2022-05-07 22:07:31,137 ERROR (stateListener|76) [Catalog$4.runOneCycle():2431] transfer FE type from MASTER to UNKNOWN. exit
2022-05-07 22:07:31,138 WARN (UNKNOWN 10.74.134.172_9010_1643011953296(2)|63) [BDBStateChangeListener.stateChange():61] this node is DETACHED
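For reference, the "Need replica acks: 1" part of the error follows from simple-majority arithmetic. With 3 FE nodes a majority is 2, and the master counts toward it, so the master has to wait for 1 acknowledgement from a follower within the 10000 ms timeout. The sketch below is a simplified model of that arithmetic, not BDB JE's actual DurabilityQuorum code; the class and method names are hypothetical. In the log above, the only entry under "Current feeds" is 10.74.134.171, and its replicaTxnEndVLSN (1,732,555) appears to be behind the VLSN of the waiting transaction (1,732,561), which would explain why the ack did not arrive in time.

```java
// Hypothetical sketch (not JE's implementation): how many replica acks a
// SIMPLE_MAJORITY durability policy needs for a group of a given size.
public class SimpleMajorityAcks {

    // A simple majority is more than half of the electable nodes.
    static int majority(int groupSize) {
        return groupSize / 2 + 1;
    }

    // The master's own commit counts toward the majority,
    // so it only waits for (majority - 1) acks from replicas.
    static int requiredReplicaAcks(int groupSize) {
        return majority(groupSize) - 1;
    }

    public static void main(String[] args) {
        int[] sizes = {3, 5};
        for (int n : sizes) {
            System.out.printf("group size %d: majority %d, replica acks needed %d%n",
                    n, majority(n), requiredReplicaAcks(n));
        }
    }
}
```

For a 3-node group this gives 1 required replica ack, matching the log; losing or stalling one of the two followers already leaves the master with no margin.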

Which version is this? Are 3 FE nodes deployed?

Version: StarRocks-2.0.1
FE: 3 nodes

What causes this error?
How should I go about troubleshooting it?

Was this eventually resolved? What was the solution?

I'm running into the same problem.

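This is usually caused by the 3 FE nodes being under heavy load at the time (memory pressure, disk I/O, and so on), so no follower manages to acknowledge the master's BDB JE write within the 10000 ms timeout, and the master exits.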
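One rough way to check whether a follower was falling behind is to compare the replicaTxnEndVLSN in the "Current feeds" section of the error with the VLSN of the transaction that was waiting for acks. A minimal sketch, assuming the feed line format shown in the log above; the class and method names are hypothetical and not part of StarRocks or BDB JE.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads one "Current feeds" line from the FE log and
// reports how many VLSNs the replica's last applied transaction trails
// the transaction the master is waiting on.
public class FeedLag {

    // Matches e.g. " 10.74.134.171_9010_...: feederVLSN=1,732,562 replicaTxnEndVLSN=1,732,555"
    private static final Pattern FEED = Pattern.compile(
            "^\\s*(\\S+): feederVLSN=([\\d,]+) replicaTxnEndVLSN=([\\d,]+)");

    // VLSNs are printed with thousands separators in the log.
    static long parseVlsn(String s) {
        return Long.parseLong(s.replace(",", ""));
    }

    // Returns txnVlsn - replicaTxnEndVLSN, or empty if the line does not match.
    static OptionalLong lag(String feedLine, long txnVlsn) {
        Matcher m = FEED.matcher(feedLine);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(txnVlsn - parseVlsn(m.group(3)));
    }

    public static void main(String[] args) {
        String line = " 10.74.134.171_9010_1643011922297: feederVLSN=1,732,562 replicaTxnEndVLSN=1,732,555";
        long txnVlsn = parseVlsn("1,732,561");
        lag(line, txnVlsn).ifPresent(l -> System.out.println("replica lag in VLSNs: " + l));
    }
}
```

A lag that keeps growing across repeated errors would be consistent with the load explanation, and a follower that is missing from "Current feeds" entirely (here only one of the two followers is listed) is worth checking too. On its own, this does not tell you whether memory, disk I/O, or the network is the bottleneck; that still needs the hosts' resource metrics from around the time of the error.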