These are chat archives for atomix/atomix

20th
Aug 2018
suimi
@suimi
Aug 20 2018 06:54
there have a java.util.ConcurrentModificationException when restart the node
```
2018-08-20 14:51:54,885 [atomix-partition-group-membership-service-0] WARN  i.a.s.j.SegmentedJournal - Found misaligned segment JournalSegment{id=2, version=1, index=21, size=940} 
java.util.concurrent.CompletionException: java.util.ConcurrentModificationException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:991)
    at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
    at io.atomix.primitive.partition.impl.DefaultPartitionService.lambda$start$2(DefaultPartitionService.java:150)
    at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
    at io.atomix.primitive.partition.impl.DefaultPartitionGroupMembershipService.lambda$bootstrap$7(DefaultPartitionGroupMembershipService.java:199)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
    at io.atomix.utils.concurrent.AtomixFuture.lambda$wrap$0(AtomixFuture.java:46)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
    at io.atomix.primitive.partition.impl.DefaultPartitionGroupMembershipService.lambda$bootstrap$9(DefaultPartitionGroupMembershipService.java:232)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
    at java.util.concurrent.FutureTask.run(FutureTask.java)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.ConcurrentModificationException: null
    at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1211)
    at java.util.TreeMap$EntryIterator.next(TreeMap.java:1247)
    at java.util.TreeMap$EntryIterator.next(TreeMap.java:1242)
    at io.atomix.storage.journal.SegmentedJournal.loadSegments(SegmentedJournal.java:555)
    at io.atomix.storage.journal.SegmentedJournal.open(SegmentedJournal.java:184)
    at io.atomix.storage.journal.SegmentedJournal.<init>(SegmentedJournal.java:98)
    at io.atomix.storage.journal.SegmentedJournal$Builder.build(SegmentedJournal.java:851)
    at io.atomix.protocols.raft.storage.log.RaftLog$Builder.build(RaftLog.java:312)
    at io.atomix.protocols.raft.storage.RaftStorage.openLog(RaftStorage.java:333)
    at io.atomix.protocols.raft.impl.RaftContext.<init>(RaftContext.java:161)
    at io.atomix.protocols.raft.impl.DefaultRaftServer$Builder.build(DefaultRaftServer.java:246) 
    at io.atomix.protocols.raft.impl.DefaultRaftServer$Builder.build(DefaultRaftServer.java:222)
    at io.atomix.protocols.raft.partition.impl.RaftPartitionServer.buildServer(RaftPartitionServer.java:166)
Junbo Ruan
@aruanruan
Aug 20 2018 10:41
‘Cannot apply index ' often happens yet
Junbo Ruan
@aruanruan
Aug 20 2018 11:59
  /**
   * Applies the entry at the given index to the state machine.
   * <p>
   * Calls to this method are assumed to expect a result. This means linearizable session events triggered by the
   * application of the command at the given index will be awaited before completing the returned future.
   *
   * @param index The index to apply.
   * @return A completable future to be completed once the commit has been applied.
   */
  @SuppressWarnings("unchecked")
  public <T> CompletableFuture<T> apply(long index) {
    CompletableFuture<T> future = futures.computeIfAbsent(index, i -> new CompletableFuture<T>());
    enqueueBatch(index);
    return future;
  }

  /**
   * Applies all entries up to the given index.
   *
   * @param index the index up to which to apply entries
   */
  private void enqueueBatch(long index) {
    while (lastEnqueued < index) {
      enqueueIndex(++lastEnqueued);
    }
  }
may lost entries?
Junbo Ruan
@aruanruan
Aug 20 2018 12:33
20:31:51.400 [raft-server-consensus-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{consensus-partition-1} - apply index: 9942
20:31:51.412 [raft-server-consensus-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{consensus-partition-1} - apply index: 9943
20:31:51.424 [raft-server-consensus-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{consensus-partition-1} - apply index: 9944
20:31:52.274 [raft-server-consensus-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{consensus-partition-1} - apply index: 9946
20:31:52.275 [raft-server-consensus-partition-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{consensus-partition-1} - Cannot apply index 9946
20:31:52.304 [raft-server-consensus-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{consensus-partition-1} - apply index: 9947
20:31:52.304 [raft-server-consensus-partition-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{consensus-partition-1} - Cannot apply index 9947
20:31:52.307 [raft-server-consensus-partition-1] INFO  i.a.p.raft.impl.RaftServiceManager - RaftServer{consensus-partition-1} - apply index: 9948
20:31:52.307 [raft-server-consensus-partition-1] ERROR i.a.p.raft.impl.RaftServiceManager - RaftServer{consensus-partition-1} - Cannot apply index 9948
index 9945 lost
the node is LEADER
Jordan Halterman
@kuujo
Aug 20 2018 18:47
@aruanruan still need the trace logs
Jordan Halterman
@kuujo
Aug 20 2018 19:03
also @aruanruan which commit are you on?
Johno Crawford
@johnou
Aug 20 2018 19:03
@kuujo can we add a filtered resource
that puts the maven release in a properties file, and prints that at bootstrap
just finished moving my workstation to my new house
let me try and whip something up quick
Johno Crawford
@johnou
Aug 20 2018 19:10
@kuujo also wrt junit extension
this is how I did it in our project
<plugin>
                <artifactId>maven-failsafe-plugin</artifactId>
                <configuration>
                    <properties>
                        <property>
                            <name>listener</name>
                            <value>com.sulake.common.util.ServicesRunListener</value>
                        </property>
                    </properties>
                </configuration>
            </plugin>
which extends RunListener
and overrides testRunStarted and testRunFinished
looking at the docs though
what you have should work
Jordan Halterman
@kuujo
Aug 20 2018 19:17
Yes actually we need to attach the version number to the nodes anyways
Johno Crawford
@johnou
Aug 20 2018 19:18
something like this
Jordan Halterman
@kuujo
Aug 20 2018 19:20
Sure
Jordan Halterman
@kuujo
Aug 20 2018 20:07
@aruanruan I don’t even know where the apply index message is so it’s hard to tell why that index would be missing. It’s just a simple loop and counter, so there’s really no reason any index should not be applied.
Johno Crawford
@johnou
Aug 20 2018 20:11
i mean if it's happening rarely
could it be that sometimes a node is falling behind
Jordan Halterman
@kuujo
Aug 20 2018 20:11

What probably needs to be debugged is the log reader. It’s seemingly being told there’s an index in the log which it’s unable to read. We need the trace logs to see what happened when that entry was applied, then need to determine the state of the reader when it attempts to read the entry.

I asked about the commit because there was a threading bug that I saw product the Cannot apply index error and that bug is fixed in rc12

It may be possible for an entry to be missing after a snapshot is taken
Which is why we need the trace logs!
Johno Crawford
@johnou
Aug 20 2018 20:13
yeah that's what I meant
Jordan Halterman
@kuujo
Aug 20 2018 20:13
It’s pointless to try to make sense of the errors with a minimal amount of information
Johno Crawford
@johnou
Aug 20 2018 20:13
node falling behind, then snapshot is missing an index for some reason
at least that's my understanding of it
Jordan Halterman
@kuujo
Aug 20 2018 20:14
Actually, snapshots are always taken after entries are applied
Johno Crawford
@johnou
Aug 20 2018 20:14
if a node falls behind, instead of sending all the indexes it's missing it sends a snapshot of say, 100-1000, right?
Jordan Halterman
@kuujo
Aug 20 2018 20:15
@aruanruan I’m going to have to give up until I get the TRACE logs from all nodes. You should be able to filter just the Atomix Raft logs, and there’s no reason they should have any sensitive information in them since only byte lengths are recorded.
Also make sure you have the latest codes with the threading bug fix
512d90856eabb27ca2c55190d66a6f48af7fdd60
Jordan Halterman
@kuujo
Aug 20 2018 20:25
@johnou I need access to just the version number too
${project.version}
Johno Crawford
@johnou
Aug 20 2018 20:25
ah yes
just put
projectVersion=${project.version}
pushed
Jordan Halterman
@kuujo
Aug 20 2018 20:32
@johnou Atomix attempts to load atomix.properties as the configuration file, so this crashes the agent when I run it
Johno Crawford
@johnou
Aug 20 2018 20:33
hahaha
rename it version.properties or something
or do you have a better name?
Jordan Halterman
@kuujo
Aug 20 2018 20:33
no that’s fine
hmm
actually this may not work so well in OSGi
I’ve never had much luck with including resources on the classpath
what we do in ONOS is hard code the version number in a Java class
Johno Crawford
@johnou
Aug 20 2018 20:35
why not?
Jordan Halterman
@kuujo
Aug 20 2018 20:35
maybe will do that instead
Johno Crawford
@johnou
Aug 20 2018 20:35
it's in the same module / jar
Jordan Halterman
@kuujo
Aug 20 2018 20:39
yep
actually it’s not
the cluster module is reading a resource from the core module
will have to try to test it when I get a chance
Johno Crawford
@johnou
Aug 20 2018 20:47
oh ffs
i thought it was the same jar
yeah that's probably going to fail with OSGi
i was on Atomix, then followed the super
forgot that class was in cluster
Jordan Halterman
@kuujo
Aug 20 2018 20:47
I think it’s fine to only log it in Atomix
Jordan Halterman
@kuujo
Aug 20 2018 21:19
I have to do the 3.0.0 release after #816 so our QA can test the ONOS release candidate. All the other stuff can be merged after that
Following that release we’ll be testing upgrades with atomix/atomix-test
Junbo Ruan
@aruanruan
Aug 20 2018 23:42
@kuujo ok, i will add trace log, by the way, the trace logs are too large...
What i did was runing my test case more times
Jordan Halterman
@kuujo
Aug 20 2018 23:52
There’s no such thing as too much information when it comes to the logs. I was not exaggerating when I said I typically find bugs in hundreds of megabytes or trace logs