These are chat archives for atomix/atomix

21st
Aug 2018
Jordan Halterman
@kuujo
Aug 21 2018 00:39
of*
The trace logs are verbose because these bugs require information about every single event in the cluster: when the leader sent a request, which index and entries were sent, how many requests were in flight, which entries were written to the logs, and which entries were applied.
It’s better to have all the information you need plus some than not enough information to find and fix the bug. The former guarantees the bug is shown in the logs. The latter does not.
Junbo Ruan
@aruanruan
Aug 21 2018 00:43
07:49:17.682 [raft-server-consensus-partition-1] TRACE i.a.protocols.raft.roles.LeaderRole - RaftServer{consensus-partition-1}{role=LEADER} - Received CommandRequest{session=8053, sequence=1886, operation=PrimitiveOperation{id=DefaultOperationId{id=addVertexes, type=COMMAND}, value=byte[]{length=3403, hash=1013358970}}}
07:49:17.683 [raft-server-consensus-partition-1] TRACE i.a.protocols.raft.roles.LeaderRole - RaftServer{consensus-partition-1}{role=LEADER} - Appended Indexed{index=9950, entry=CommandEntry{term=1, timestamp=2018-08-21 07:49:17,683, session=8053, sequence=1886, operation=PrimitiveOperation{id=DefaultOperationId{id=addVertexes, type=COMMAND}, value=byte[]{length=3403, hash=1013358970}}}}
07:49:17.683 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Sending AppendRequest{term=1, leader=node3, prevLogIndex=9949, prevLogTerm=1, entries=1, commitIndex=9949} to node2
07:49:17.683 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Sending AppendRequest{term=1, leader=node3, prevLogIndex=9949, prevLogTerm=1, entries=1, commitIndex=9949} to node1
07:49:17.684 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Received AppendResponse{status=OK, term=1, succeeded=true, lastLogIndex=9950} from node2
07:49:17.684 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Committed entries up to 9950
07:49:17.684 [raft-server-consensus-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{consensus-partition-1} - apply index: 9950
07:49:17.684 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Received AppendResponse{status=OK, term=1, succeeded=true, lastLogIndex=9950} from node1
07:49:17.684 [raft-server-consensus-partition-1-state] TRACE i.a.p.raft.impl.RaftServiceManager - RaftServer{consensus-partition-1} - Applying Indexed{index=9950, entry=CommandEntry{term=1, timestamp=2018-08-21 07:49:17,683, session=8053, sequence=1886, operation=PrimitiveOperation{id=DefaultOperationId{id=addVertexes, type=COMMAND}, value=byte[]{length=3403, hash=1013358970}}}}
07:49:17.685 [raft-server-consensus-partition-1-state] TRACE i.a.p.s.impl.DefaultServiceExecutor - PrimitiveService{2}{type=AtomicUdoGraphType{name=udo-graph}, name=bigdata} - Executing DefaultCommit{index=9950, session=RaftSession{RaftServiceContext{server=consensus-partition-1, type=AtomicUdoGraphType{name=udo-graph}, name=bigdata, id=2}, session=8053, timestamp=2018-08-21 07:49:17,683}, time=2018-08-21 07:49:17,683, operation=DefaultOperationId{id=addVertexes, type=COMMAND}, value=byte[]{length=3403, hash=1013358970}}
07:49:17.691 [raft-server-consensus-partition-1-state] TRACE i.a.protocols.raft.roles.LeaderRole - RaftServer{consensus-partition-1}{role=LEADER} - Sending CommandResponse{status=OK, index=9950, eventIndex=8053, result=byte[]{length=1397, hash=2010050162}}

07:49:17.696 [raft-server-consensus-partition-1] TRACE i.a.protocols.raft.roles.LeaderRole - RaftServer{consensus-partition-1}{role=LEADER} - Received CommandRequest{session=8053, sequence=1887, operation=PrimitiveOperation{id=DefaultOperationId{id=addVertexes, type=COMMAND}, value=byte[]{length=3403, hash=495306944}}}
07:49:17.696 [raft-server-consensus-partition-1] TRACE i.a.protocols.raft.roles.LeaderRole - RaftServer{consensus-partition-1}{role=LEADER} - Appended Indexed{index=9951, entry=CommandEntry{term=1, timestamp=2018-08-21 07:49:17,696, session=8053, sequence=1887, operation=PrimitiveOperation{id=DefaultOperationId{id=addVertexes, type=COMMAND}, value=byte[]{length=3403, hash=495306944}}}}
07:49:17.696 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Sending AppendRequest{term=1, leader=node3, prevLogIndex=9950, prevLogTerm=1, entries=1, commitIndex=9950} to node2
07:49:17.697 [raft-server-consensus-partition-1]
log index '9952' was not applied
07:49:18.942 [raft-server-consensus-partition-1] DEBUG i.a.storage.journal.SegmentedJournal - Created disk segment: JournalSegment{id=2, version=1, index=9953, size=64}
07:49:18.943 [raft-server-consensus-partition-1] TRACE i.a.protocols.raft.roles.LeaderRole - RaftServer{consensus-partition-1}{role=LEADER} - Appended Indexed{index=9953, entry=KeepAliveEntry{term=1, timestamp=2018-08-21 07:49:18,751, sessionIds=long[]{length=1, hash=8084}, commandSequences=long[]{length=1, hash=1918}, eventIndexes=long[]{length=1, hash=8084}}}
07:49:18.943 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Sending AppendRequest{term=1, leader=node3, prevLogIndex=9952, prevLogTerm=1, entries=1, commitIndex=9952} to node2
07:49:18.944 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Sending AppendRequest{term=1, leader=node3, prevLogIndex=9952, prevLogTerm=1, entries=1, commitIndex=9952} to node1
create new disk segment
07:49:17.824 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Sending AppendRequest{term=1, leader=node3, prevLogIndex=9951, prevLogTerm=1, entries=1, commitIndex=9951} to node2
07:49:17.825 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Received AppendResponse{status=OK, term=1, succeeded=true, lastLogIndex=9952} from node2
07:49:17.825 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Committed entries up to 9952
07:49:17.825 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Sending AppendRequest{term=1, leader=node3, prevLogIndex=9952, prevLogTerm=1, entries=0, commitIndex=9952} to node2
07:49:17.825 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Sending AppendRequest{term=1, leader=node3, prevLogIndex=9951, prevLogTerm=1, entries=1, commitIndex=9952} to node1
07:49:17.825 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Received AppendResponse{status=OK, term=1, succeeded=true, lastLogIndex=9952} from node2
07:49:17.826 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Received AppendResponse{status=OK, term=1, succeeded=true, lastLogIndex=9952} from node1
Jordan Halterman
@kuujo
Aug 21 2018 00:47
Can you just put the logs for like the last 15-30 seconds prior to that somewhere? all of them?
Junbo Ruan
@aruanruan
Aug 21 2018 00:47
emm
Jordan Halterman
@kuujo
Aug 21 2018 00:47
It’s too difficult to make sense of the times because I don’t know what’s missing
You can just filter that one partition if you want
Junbo Ruan
@aruanruan
Aug 21 2018 00:49
ok
Jordan Halterman
@kuujo
Aug 21 2018 00:49
consensus-partition-1
Jordan Halterman
@kuujo
Aug 21 2018 00:54
interesting it was the last index of that segment
Junbo Ruan
@aruanruan
Aug 21 2018 00:57
yesterday, I found it often happened at a specific point in the test's progress percentage
Jordan Halterman
@kuujo
Aug 21 2018 00:58
I don’t understand what you mean by that
Junbo Ruan
@aruanruan
Aug 21 2018 00:59
I ran a test case inserting 2000 * 500 rows of data; when the progress reached 97.xx%, the cluster went wrong
97.xx% looked like a magic number to me
Jordan Halterman
@kuujo
Aug 21 2018 01:00
It’s probably not a magic number; the test is just relatively deterministic, and something happens based on the configuration at that percentage
probably something relating to the log segments
Jordan Halterman
@kuujo
Aug 21 2018 01:10
still do need the trace logs though
you can email them to me if you want
Junbo Ruan
@aruanruan
Aug 21 2018 01:11
ok, I am copying the files...
Jordan Halterman
@kuujo
Aug 21 2018 01:50
sweet
> atomix-test cluster setup -c consensus -n 3 -v 3.0.0-SNAPSHOT
2018-08-20 18:48:58,365 Setting up cluster
2018-08-20 18:48:58,393 Creating network
2018-08-20 18:48:58,565 Running container test-1
2018-08-20 18:48:59,171 Running container test-2
2018-08-20 18:48:59,740 Running container test-3
2018-08-20 18:49:00,396 Waiting for cluster bootstrap
> atomix-test cluster upgrade-node 1 -v 3.0.0
2018-08-20 18:49:31,884 Upgrading node 1 to version '3.0.0'
2018-08-20 18:49:31,887 Stopping container test-1
2018-08-20 18:49:42,601 Removing container test-1
2018-08-20 18:49:42,612 Running container test-1
> atomix-test cluster nodes
 ID     NAME       VERSION            STATUS      PATH                                                              IP             LOCAL PORT     PROFILER PORT
 1      test-1     3.0.0              running     /Users/jordanhalterman/Projects/atomix-test/.data/test/test-1     172.18.0.2     61694
 2      test-2     3.0.0-SNAPSHOT     running     /Users/jordanhalterman/Projects/atomix-test/.data/test/test-2     172.18.0.3     61661
 3      test-3     3.0.0-SNAPSHOT     running     /Users/jordanhalterman/Projects/atomix-test/.data/test/test-3     172.18.0.4     61662
> atomix-test cluster upgrade-node 2 -v 3.0.0
2018-08-20 18:50:34,546 Upgrading node 2 to version '3.0.0'
2018-08-20 18:50:34,550 Stopping container test-2
2018-08-20 18:50:45,119 Removing container test-2
2018-08-20 18:50:45,134 Running container test-2
> atomix-test cluster upgrade-node 3 -v 3.0.0
2018-08-20 18:51:30,695 Upgrading node 3 to version '3.0.0'
2018-08-20 18:51:30,698 Stopping container test-3
2018-08-20 18:51:41,388 Removing container test-3
2018-08-20 18:51:41,401 Running container test-3
> atomix-test cluster nodes
 ID     NAME       VERSION     STATUS      PATH                                                              IP             LOCAL PORT     PROFILER PORT
 1      test-1     3.0.0       running     /Users/jordanhalterman/Projects/atomix-test/.data/test/test-1     172.18.0.2     61694
 2      test-2     3.0.0       running     /Users/jordanhalterman/Projects/atomix-test/.data/test/test-2     172.18.0.3     61724
 3      test-3     3.0.0       running     /Users/jordanhalterman/Projects/atomix-test/.data/test/test-3     172.18.0.4     61752
💥
Jordan Halterman
@kuujo
Aug 21 2018 02:07
🕓
Junbo Ruan
@aruanruan
Aug 21 2018 02:10
@kuujo how do I filter the logs with the keyword 'consensus-partition-1' into another file in vim?
Jordan Halterman
@kuujo
Aug 21 2018 02:10
hmm I’m not sure how to do that in vim… I use the console app in OSX to filter logs
or you can just grep consensus-partition-1 atomix.log > filtered.log
something like that
Junbo Ruan
@aruanruan
Aug 21 2018 02:12
great
Jordan Halterman
@kuujo
Aug 21 2018 02:13
hmm I may have just reproduced it
it can’t have been so easy
Jordan Halterman
@kuujo
Aug 21 2018 02:23
yeah I’m pretty sure I reproduced your bug
Jordan Halterman
@kuujo
Aug 21 2018 02:37
YAY
This makes it much much easier to fix
Junbo Ruan
@aruanruan
Aug 21 2018 02:38
really?
I have emailed you the log fragments
You are so great!
Jordan Halterman
@kuujo
Aug 21 2018 02:39
19:19:21.113 [raft-server-1] DEBUG i.a.storage.journal.SegmentedJournal - Created disk segment: JournalSegment{id=2, version=1, index=9458, size=64}
19:19:21.136 [raft-server-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{1} - Cannot apply index 9459
Cannot apply index immediately following log segment roll over
Junbo Ruan
@aruanruan
Aug 21 2018 02:40
yes, it is the same for me
Jordan Halterman
@kuujo
Aug 21 2018 02:40
also on the leader
I wonder how that factors in
Junbo Ruan
@aruanruan
Aug 21 2018 02:42
the function apply(long index) in RaftServiceManager is not called
  @SuppressWarnings("unchecked")
  public <T> CompletableFuture<T> apply(long index) {
    logger.info("apply index: {}", index);
    CompletableFuture<T> future = futures.computeIfAbsent(index, i -> new CompletableFuture<T>());
    enqueueBatch(index);
    return future;
  }
I added the log statement for that
Jordan Halterman
@kuujo
Aug 21 2018 02:43
oh yeah we’re seeing this on the leader because the leader is the one that creates futures
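(To make that concrete, a standalone sketch of the pattern: the leader registers a future per index it expects to apply, as in the apply() snippet above, so an index that is never applied leaves a future that never completes and the error surfaces on the leader. The class and method names below are illustrative, not the actual RaftServiceManager code.)

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Illustration only: a per-index futures map like the one used in the apply() snippet above.
class PerIndexFutures {
  private final Map<Long, CompletableFuture<Object>> futures = new ConcurrentHashMap<>();

  // Leader side: register interest in the result of applying `index`.
  CompletableFuture<Object> apply(long index) {
    return futures.computeIfAbsent(index, i -> new CompletableFuture<>());
  }

  // State machine side: complete the future once the entry at `index` is applied.
  // If an index is never applied, its future is never completed.
  void completed(long index, Object result) {
    CompletableFuture<Object> future = futures.remove(index);
    if (future != null) {
      future.complete(result);
    }
  }
}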
Jordan Halterman
@kuujo
Aug 21 2018 02:51
seems there are actually a couple of related bugs
probably going to take me a while to track them down
I will report back
oooh this one’s a good one
maybe I’ll think out loud actually
I find I tend to do that...

so the weird thing is the leader sends an entry to a follower:

19:49:51.188 [raft-server-3] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{3} - Sending AppendRequest{term=1, leader=3, prevLogIndex=9454, prevLogTerm=1, entries=1, commitIndex=9454} to 1

and then it seems to skip an index for some reason:

19:49:51.192 [raft-server-3] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{3} - Sending AppendRequest{term=1, leader=3, prevLogIndex=9456, prevLogTerm=1, entries=1, commitIndex=9455} to 1
the second request should be prevLogIndex=9455

the leader gets a response from the follower telling it to go back:

19:49:51.212 [raft-server-3] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{3} - Received AppendResponse{status=OK, term=1, succeeded=false, lastLogIndex=9455} from 1
19:49:51.214 [raft-server-3] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{3} - Reset next index for RaftMemberContext{member=1, term=1, configIndex=0, snapshotIndex=0, nextSnapshotIndex=0, nextSnapshotOffset=0, matchIndex=9455, nextIndex=9456, heartbeatTime=1534819791187, appending=0, appendSucceeded=false, appendTime=1534819791192, configuring=false, installing=false, failures=0} to 9456

but the leader doesn’t seem to be able to

19:49:51.215 [raft-server-3] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{3} - Sending AppendRequest{term=1, leader=3, prevLogIndex=9456, prevLogTerm=1, entries=4, commitIndex=9455} to 1
I have a theory about why this might be
Jordan Halterman
@kuujo
Aug 21 2018 03:00
strange it seems to always be three indexes before the end of a segment
Junbo Ruan
@aruanruan
Aug 21 2018 03:06
magic number? hahaha
suimi
@suimi
Aug 21 2018 03:07
maybe there is some exception that is not caught; yesterday I still couldn't start the cluster, it failed at opening a session
Junbo Ruan
@aruanruan
Aug 21 2018 03:10
my logs do not contain the line: Reset next index for RaftMemberContext ...
Jordan Halterman
@kuujo
Aug 21 2018 03:11
@aruanruan different bug
seems there are actually a couple of related bugs
suimi
@suimi
Aug 21 2018 03:12
when the open-session succeeds it will append entries; at JournalSegmentReader.readNext the exception throw new IndexOutOfBoundsException("length cannot be greater than remaining bytes in the buffer"); is thrown, this exception is not caught, and there is no error log
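(A minimal standalone sketch of the kind of bounds check behind that message, for anyone following along; it is illustrative only, not the actual Atomix buffer code.)

import java.nio.ByteBuffer;

// Illustration only: a skip() that refuses to move past the readable bytes in a
// buffer, throwing the same message quoted above. The real check lives in
// io.atomix.storage.buffer.AbstractBuffer; this just shows what the message means:
// the reader asked to skip/read more bytes than remain in the segment buffer.
class BufferSkipSketch {
  static void skip(ByteBuffer buffer, int length) {
    if (length > buffer.remaining()) {
      throw new IndexOutOfBoundsException("length cannot be greater than remaining bytes in the buffer");
    }
    buffer.position(buffer.position() + length);
  }
}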
Jordan Halterman
@kuujo
Aug 21 2018 03:13
@suimi stack trace?
suimi
@suimi
Aug 21 2018 03:15
@kuujo trace log?
Jordan Halterman
@kuujo
Aug 21 2018 03:17
no
stack trace
so I can see where the exception is coming from
and what called it
suimi
@suimi
Aug 21 2018 03:29
[attached image: image.png]
Jordan Halterman
@kuujo
Aug 21 2018 03:31
thank you
Jordan Halterman
@kuujo
Aug 21 2018 03:43
yeah I see that exception
That’s the root cause of one of the bugs
suimi
@suimi
Aug 21 2018 03:44
yes, it seems it is not caught
Jordan Halterman
@kuujo
Aug 21 2018 04:42
that’s not quite it
sending a PR
@aruanruan got time to try a fix?
Junbo Ruan
@aruanruan
Aug 21 2018 04:42
ok
got it
Jordan Halterman
@kuujo
Aug 21 2018 04:43
this fixes the log reset issue above, but I'm not quite sure it fixes both, so I need to verify
one sec
Junbo Ruan
@aruanruan
Aug 21 2018 04:46
ok
Jordan Halterman
@kuujo
Aug 21 2018 04:46
I haven’t seen the Cannot apply index error again yet, but I may not have tried hard enough
Junbo Ruan
@aruanruan
Aug 21 2018 04:47
I will do the test
Jordan Halterman
@kuujo
Aug 21 2018 04:47
:+1:
@suimi the PR above fixes the exception you’re seeing for sure
yeah I don’t think I can reproduce the Cannot apply index error any more
suimi
@suimi
Aug 21 2018 04:50
@kuujo thanks
Junbo Ruan
@aruanruan
Aug 21 2018 04:59
12:56:57.403 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Received AppendResponse{status=OK, term=1, succeeded=true, lastLogIndex=9967} from node1
12:56:57.403 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Committed entries up to 9967
12:56:57.404 [raft-server-consensus-partition-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{consensus-partition-1} - Cannot apply index 9967
12:56:57.404 [raft-server-consensus-partition-1] TRACE i.a.protocols.raft.roles.LeaderRole - RaftServer{consensus-partition-1}{role=LEADER} - Sending KeepAliveResponse{status=ERROR, error=RaftError{type=PROTOCOL_ERROR, message=Cannot apply index 9967}}
Jordan Halterman
@kuujo
Aug 21 2018 05:02
more logs
Junbo Ruan
@aruanruan
Aug 21 2018 05:03
12:56:56.956 [raft-server-consensus-partition-1] TRACE i.a.protocols.raft.roles.LeaderRole - RaftServer{consensus-partition-1}{role=LEADER} - Received CommandRequest{session=8186, sequence=1769, operation=PrimitiveOperation{id=DefaultOperationId{id=addVertexes, type=COMMAND}, value=byte[]{length=3403, hash=1145448082}}}
12:56:56.957 [raft-server-consensus-partition-1] TRACE i.a.protocols.raft.roles.LeaderRole - RaftServer{consensus-partition-1}{role=LEADER} - Appended Indexed{index=9966, entry=CommandEntry{term=1, timestamp=2018-08-21 12:56:56,957, session=8186, sequence=1769, operation=PrimitiveOperation{id=DefaultOperationId{id=addVertexes, type=COMMAND}, value=byte[]{length=3403, hash=1145448082}}}}
12:56:56.980 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Sending AppendRequest{term=1, leader=node3, prevLogIndex=9965, prevLogTerm=1, entries=1, commitIndex=9965} to node2
12:56:56.980 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Received AppendResponse{status=OK, term=1, succeeded=true, lastLogIndex=9966} from node2
12:56:56.981 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Committed entries up to 9966
12:56:56.981 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Sending AppendRequest{term=1, leader=node3, prevLogIndex=9966, prevLogTerm=1, entries=0, commitIndex=9966} to node2
12:56:56.981 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Sending AppendRequest{term=1, leader=node3, prevLogIndex=9965, prevLogTerm=1, entries=1, commitIndex=9966} to node1
12:56:56.981 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Received AppendResponse{status=OK, term=1, succeeded=true, lastLogIndex=9966} from node2
12:56:56.981 [raft-server-consensus-partition-1] TRACE i.a.p.raft.roles.LeaderAppender - RaftServer{consensus-partition-1} - Received AppendResponse{status=OK, term=1, succeeded=true, lastLogIndex=9966} from node1
Jordan Halterman
@kuujo
Aug 21 2018 05:03
is it still after a new segment?
Junbo Ruan
@aruanruan
Aug 21 2018 05:03
no
the prevLogIndex values of the AppendRequests are different
Jordan Halterman
@kuujo
Aug 21 2018 05:04
I don’t know what you mean
Junbo Ruan
@aruanruan
Aug 21 2018 05:05
prevLogIndex=9966
prevLogIndex=9965
Jordan Halterman
@kuujo
Aug 21 2018 05:05
doesn’t matter
they’re different nodes
the context for each follower is independent
well now that I can’t reproduce it this is going to become much harder again
Junbo Ruan
@aruanruan
Aug 21 2018 05:09
:(
Jordan Halterman
@kuujo
Aug 21 2018 05:12
all you have to do is set a breakpoint on the error message and check the log reader state. Either getNextIndex() != index or hasNext() is false. If getNextIndex() != index then we have to figure out why it’s not based on the state of the reader, and if hasNext() is false then call hasNext() and follow it in JournalSegmentReader to see why it’s not reading the entry.
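(A standalone sketch of the two failure cases being described, to make the decision tree concrete; the method names mirror those mentioned in the discussion, but this is an illustration, not the actual RaftServiceManager code.)

// Illustration only: the two ways the apply guard can fail for a given index.
class ApplyGuardSketch {
  interface LogReader {
    boolean hasNext();
    long getNextIndex();
    Object next();
  }

  static String checkApply(LogReader reader, long index) {
    if (!reader.hasNext()) {
      // Case 1: the reader cannot read the next entry; follow hasNext() into
      // JournalSegmentReader to see why.
      return "Cannot apply index " + index + ": reader has no next entry";
    }
    if (reader.getNextIndex() != index) {
      // Case 2: the reader's position does not match the index being applied.
      return "Cannot apply index " + index + ": reader's next index is " + reader.getNextIndex();
    }
    reader.next();
    return "applied " + index;
  }
}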
Junbo Ruan
@aruanruan
Aug 21 2018 05:13
ok, I will try
Jordan Halterman
@kuujo
Aug 21 2018 05:20
I probably gotta catch up on some sleep soon
Junbo Ruan
@aruanruan
Aug 21 2018 05:21
enn, sleep well
Jordan Halterman
@kuujo
Aug 21 2018 06:16
let me know if you have any luck debugging it
Junbo Ruan
@aruanruan
Aug 21 2018 06:17
just debugging...
reader index: 8770->9773(1024), but writer index: 9774-
the writer began a new segment
Junbo Ruan
@aruanruan
Aug 21 2018 06:22
reader next index is 9773
Jordan Halterman
@kuujo
Aug 21 2018 06:27
The reader and writer are supposed to be at different indexes; that’s why they’re separate
Gotta figure out why that condition fails first
The entry is missing or the index is incorrect?
Junbo Ruan
@aruanruan
Aug 21 2018 06:29
hasNext = true, getNextIndex = 9793, index = 9794
Jordan Halterman
@kuujo
Aug 21 2018 06:30
Okay
Hmm
And was 9793 ever applied?
It’s like the reader didn’t advance after applying an entry
Junbo Ruan
@aruanruan
Aug 21 2018 06:34
9792 and 9793 are both in the reader cache
Jordan Halterman
@kuujo
Aug 21 2018 06:36
But was 9793 read and applied?
Junbo Ruan
@aruanruan
Aug 21 2018 06:36
yes, it looks like 9792 came again
14:39:09.226 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9790
14:39:09.226 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9790
14:39:09.226 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9791
14:39:09.232 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9791
14:39:09.232 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9792
14:39:09.233 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9791
14:39:09.237 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9792
14:39:09.237 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9792
14:39:09.237 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9793
14:39:09.315 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9793
14:39:09.316 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9794
14:39:09.317 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9793
14:39:09.317 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9794
Jordan Halterman
@kuujo
Aug 21 2018 06:40
My next step after understanding what’s happening would be to add logging in the log to find out where it’s happening.
Strange
oh
where is that message being logged?
Junbo Ruan
@aruanruan
Aug 21 2018 06:45
  @SuppressWarnings("unchecked")
  private void applyIndex(long index) {
    // Apply entries prior to this entry.
    boolean hasNext = reader.hasNext();
    long nextIndex = -1;
    if (hasNext) {
      nextIndex = reader.getNextIndex();
    }
    //if (reader.hasNext() && reader.getNextIndex() == index) {
    if (hasNext && nextIndex == index) {
      logger.info("apply index {}", index);
      // Read the entry from the log. If the entry is non-null then apply it, otherwise
      // simply update the last applied index and return a null result.
      Indexed<RaftLogEntry> entry = reader.next();
      try {
Jordan Halterman
@kuujo
Aug 21 2018 06:45
K
that doesn’t really make sense
9791 two times in a row
applyIndex(long) should be called with a sequentially increasing number
Which is index
so it doesn’t seem like it should be possible to get the above logs
Junbo Ruan
@aruanruan
Aug 21 2018 06:47
I will try again in normal running mode
Junbo Ruan
@aruanruan
Aug 21 2018 06:53
14:52:13.524 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9789
14:52:13.529 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9789
14:52:13.529 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9789
14:52:13.529 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9790
14:52:13.533 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9790
14:52:13.533 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9790
14:52:13.534 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9791
14:52:13.539 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9791
14:52:13.539 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9791
14:52:13.539 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9792
14:52:15.085 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9792
14:52:15.085 [raft-server-raft-partition-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - Cannot apply index 9794
14:52:15.086 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9793
14:52:15.086 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9794
14:52:15.087 [raft-server-raft-partition-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - Cannot apply index 9795
14:52:15.088 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9795
14:52:15.089 [raft-server-raft-partition-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - Cannot apply index 9796
14:52:15.125 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9792
14:52:15.126 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9793
14:52:15.126 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9794
14:52:15.126 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9796
14:52:15.126 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9795
14:52:15.126 [raft-server-raft-partition-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - Cannot apply index 9797
14:52:15.126 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9796
14:52:15.127 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9797
14:52:15.128 [raft-server-raft-partition-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - Cannot apply index 9798
14:52:15.128 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9797
14:52:15.129 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9798
14:52:15.129 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9798
14:52:15.129 [raft-server-raft-partition-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - Cannot apply index 9799
14:52:15.130 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9799
14:52:15.130 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9799
14:52:15.130 [raft-server-raft-partition-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - Cannot apply index 9800
14:52:15.131 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9800
14:52:15.131 [raft-server-raft-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - apply index 9800
14:52:15.131 [raft-server-raft-partition-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{raft-partition-1} - Cannot apply index 9801
Jordan Halterman
@kuujo
Aug 21 2018 06:56
I just don’t see how that’s possible. This loop executes each index a single time:
https://github.com/atomix/atomix/blob/master/protocols/raft/src/main/java/io/atomix/protocols/raft/impl/RaftServiceManager.java#L334-L336
How are any log messages showing duplicates of the same index?
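(For context, a standalone sketch of the shape of the loop linked above: each index between the last applied index and the target is applied exactly once, in order, on the state machine thread, which is why duplicate "apply index" lines should not be possible. The names here are illustrative, not the exact RaftServiceManager code.)

// Illustration only: apply every index between the last applied index and the
// target exactly once, in order.
class ApplyLoopSketch {
  private long lastApplied;

  void applyIndexes(long index) {
    for (long i = lastApplied + 1; i <= index; i++) {
      applyIndex(i);
    }
  }

  private void applyIndex(long index) {
    // apply the entry at `index` to the state machine, then record it as applied
    lastApplied = index;
  }
}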
check it out
going to sleep for a bit
Junbo Ruan
@aruanruan
Aug 21 2018 06:57
ok
Junbo Ruan
@aruanruan
Aug 21 2018 16:08

java.lang.IndexOutOfBoundsException: length cannot be greater than remaining bytes in the buffer
        at io.atomix.storage.buffer.AbstractBuffer.skip(AbstractBuffer.java:305) ~[atomix-storage-3.0.1-SNAPSHOT.jar:na]
        at io.atomix.storage.journal.JournalSegmentReader.readNext(JournalSegmentReader.java:139) ~[atomix-storage-3.0.1-SNAPSHOT.jar:na]
        at io.atomix.storage.journal.JournalSegmentReader.hasNext(JournalSegmentReader.java:104) ~[atomix-storage-3.0.1-SNAPSHOT.jar:na]
        at io.atomix.storage.journal.SegmentedJournalReader.hasNext(SegmentedJournalReader.java:136) ~[atomix-storage-3.0.1-SNAPSHOT.jar:na]
        at io.atomix.storage.journal.DelegatingJournalReader.hasNext(DelegatingJournalReader.java:47) ~[atomix-storage-3.0.1-SNAPSHOT.jar:na]
        at io.atomix.protocols.raft.storage.log.RaftLogReader.hasNext(RaftLogReader.java:71) ~[atomix-raft-3.0.1-SNAPSHOT.jar:na]
        at io.atomix.protocols.raft.impl.RaftServiceManager.applyIndex(RaftServiceManager.java:391) [atomix-raft-3.0.1-SNAPSHOT.jar:na]
        at io.atomix.protocols.raft.impl.RaftServiceManager.applyIndexes(RaftServiceManager.java:364) [atomix-raft-3.0.1-SNAPSHOT.jar:na]
        at io.atomix.protocols.raft.impl.RaftServiceManager.lambda$enqueueIndexes$9(RaftServiceManager.java:356) [atomix-raft-3.0.1-SNAPSHOT.jar:na]
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.8.0_172]
        at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.8.0_172]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) ~[na:1.8.0_172]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[na:1.8.0_172]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:1.8.0_172]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:1.8.0_172]
        at java.lang.Thread.run(Unknown Source) ~[na:1.8.0_172]
this makes some indexes unable to be applied
Jordan Halterman
@kuujo
Aug 21 2018 16:11
@aruanruan are you using the PR I sent? That PR fixes that bug... I hope you were testing with it
Junbo Ruan
@aruanruan
Aug 21 2018 16:12
  // when the role changes from follower to leader, some indexes are missing
  @Override
  public synchronized CompletableFuture<RaftRole> start() {
    // Reset state for the leader.
    takeLeadership();

    // Append initial entries to the log, including an initial no-op entry and the server's configuration.
    appendInitialEntries().join();

    // Commit the initial leader entries.
    commitInitialEntries();

    // Register the cluster event listener.
    raft.getMembershipService().addListener(clusterListener);

    return super.start()
        .thenRun(this::startAppendTimer)
        .thenApply(v -> this);
  }
I am still testing
Jordan Halterman
@kuujo
Aug 21 2018 16:14
I mean the IndexOutOfBoundsException above... did you see it with #817?
And why did you paste that code?
Junbo Ruan
@aruanruan
Aug 21 2018 16:16
I added too many logs to the source code
I thought #817 was about the log writer; I found the reader may be wrong
sorry, I will pull it
Jordan Halterman
@kuujo
Aug 21 2018 16:19
No it’s a fix for the reader
Junbo Ruan
@aruanruan
Aug 21 2018 16:23
my mistake
Junbo Ruan
@aruanruan
Aug 21 2018 16:48
#817 is ok
Junbo Ruan
@aruanruan
Aug 21 2018 16:53
I have inserted 60000 * 100 rows of data; it is stable
Junbo Ruan
@aruanruan
Aug 21 2018 17:13
···
partition-2} - Failed to configure node2
01:12:52.704 [raft-server-system-partition-2] WARN  i.a.p.raft.roles.LeaderAppender - RaftServer{system-partition-2} - Failed to configure node2
01:12:52.881 [raft-server-system-partition-2] WARN  i.a.p.raft.roles.LeaderAppender - RaftServer{system-partition-2} - Failed to configure node2
01:12:53.132 [raft-server-system-partition-2] WARN  i.a.p.raft.roles.LeaderAppender - RaftServer{system-partition-2} - Failed to configure node2
some warnings
Johno Crawford
@johnou
Aug 21 2018 17:34
Two logging appenders?
Jordan Halterman
@kuujo
Aug 21 2018 17:36
@aruanruan that warning is normal
actually probably shouldn’t be a warning any more
should change that one to DEBUG
Jordan Halterman
@kuujo
Aug 21 2018 17:43
Yay it’s fixed!
@johnou I will clean up and test the version thingy and then do another release later today
Would be nice to figure out how to get that shared test cluster working
Also, I added upgrades to the atomix-test framework. It would be nice to add an upgrade test to CI, but I wouldn’t know how to do that in Travis
Johno Crawford
@johnou
Aug 21 2018 17:54
I'll try and check that out soon, I am in the middle of moving house and my PC is at the new place but we still live at the old place
House is at the end of the same road though
Does the journal writer have any tests?
Jordan Halterman
@kuujo
Aug 21 2018 17:57
yeah
Johno Crawford
@johnou
Aug 21 2018 17:58
Do you think that was the heisenbug?
Jordan Halterman
@kuujo
Aug 21 2018 17:59
I actually don’t understand why that exception is not thrown in tests
Jordan Halterman
@kuujo
Aug 21 2018 18:19
ah hah now I reproduced it
Jordan Halterman
@kuujo
Aug 21 2018 21:40
k I’m pretty satisfied with the implementation of the build/version numbers now
21:39:38.954 [main] INFO  io.atomix.core.Atomix - 3.0.1-SNAPSHOT (revision 54207a built on 2018-08-21 14:37:30)
Junbo Ruan
@aruanruan
Aug 21 2018 23:54
hi