These are chat archives for atomix/atomix
RegisterEntry to poke around at.
index=60357 I see it stored on three servers in partition 1 but only see it applied on two followers and not the leader. Curious.
RegisterRequest is failed before the entry is applied
commitIndex is certainly increasing and entries are being committed
status=ERROR that are being sent back to the client are always sent from followers that proxied the request to the leader. They’re just failed
RegisterRequests because of a timeout during the hop from the follower to the leader. That’s just a symptom of the cause. The real issue is that the leader is never applying the
RegisterEntry even though it’s successfully committed. So, that’s what needs to be tracked down.
@jhall11 that's as far as I could get without reproducing it myself, which I tried to do to no avail. The logs are pretty clear that the problem is that the
RegisterEntry is committed but never applied. But why it's not applied is not obvious. The
commitIndex is being properly set by the leader, meaning replication is working well and followers are committing the entry. But for some reason the leader's
whenComplete callback is never called, so it never applies the entry and never responds to the client.
It doesn't seem like
false which would cause the leader to skip the entry. The leader is still operating normally. There's no obvious reason the
appendEntries(long) future would not be completed. There's no obvious reason
context.getStateMachine().apply(entry) would not apply the entry. I'm totally stumped right now. But maybe a nap will help.
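To make the symptom concrete, here is a minimal Java sketch of the commit-then-apply pattern described above. It is not the Atomix/Copycat code: the Leader class, handleRegister, and commit(index, notify) are hypothetical stand-ins, and only the idea of a whenComplete callback on an appendEntries(long) future comes from the discussion. It shows how commitIndex can advance while the entry is never applied if the commit future is never completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CommitApplySketch {
    /** Hypothetical stand-in for a leader; NOT the real Atomix/Copycat implementation. */
    static final class Leader {
        long commitIndex = 0;
        final List<Long> applied = new ArrayList<>();
        CompletableFuture<Long> pending; // commit future for the most recent append

        /** Returns a future that is expected to complete once the entry is committed. */
        CompletableFuture<Long> appendEntries(long index) {
            pending = new CompletableFuture<>();
            return pending;
        }

        /** Appends the entry and applies it only inside the whenComplete callback. */
        void handleRegister(long index) {
            appendEntries(index).whenComplete((committed, error) -> {
                if (error == null) {
                    applied.add(committed); // stands in for applying to the state machine
                }
            });
        }

        /** Advances commitIndex; completes the pending future only when notify is true. */
        void commit(long index, boolean notify) {
            commitIndex = Math.max(commitIndex, index);
            if (notify && pending != null) {
                pending.complete(index);
            }
        }
    }

    public static void main(String[] args) {
        Leader healthy = new Leader();
        healthy.handleRegister(60357L);
        healthy.commit(60357L, true);  // future completes, callback applies the entry
        System.out.println("healthy: commitIndex=" + healthy.commitIndex + " applied=" + healthy.applied);

        Leader stuck = new Leader();
        stuck.handleRegister(60357L);
        stuck.commit(60357L, false);   // committed, but the callback never fires
        System.out.println("stuck: commitIndex=" + stuck.commitIndex + " applied=" + stuck.applied);
    }
}
```

The "stuck" case mirrors what the logs suggest: replication and commitIndex look fine, yet nothing is applied on the leader because the completion path is never reached. Where that path actually breaks in the real code is still the open question.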
➜ atomix-jepsen git:(master) ✗ docker rm jepsen
Error response from daemon: Bad response from Docker engine
➜ atomix-jepsen git:(master) ✗ docker ps
Error response from daemon: Bad response from Docker engine
➜ atomix-jepsen git:(master) ✗ docker ps -a
Error response from daemon: Bad response from Docker engine