These are chat archives for atomix/atomix

22nd
Aug 2018
Jordan Halterman
@kuujo
Aug 22 2018 00:05
yo
Junbo Ruan
@aruanruan
Aug 22 2018 00:06
@kuujo the code is running in the thread pool, so the exception was caught by the worker thread
java.util.concurrent.ThreadPoolExecutor.runWorker
     * 4. Assuming beforeExecute completes normally, we run the task,
     * gathering any of its thrown exceptions to send to afterExecute.
     * We separately handle RuntimeException, Error (both of which the
     * specs guarantee that we trap) and arbitrary Throwables.
     * Because we cannot rethrow Throwables within Runnable.run, we
     * wrap them within Errors on the way out (to the thread's
     * UncaughtExceptionHandler).  Any thrown exception also
     * conservatively causes thread to die.
     *
we found that when the bug happened, the leader changed
Junbo Ruan
@aruanruan
Aug 22 2018 00:12
because the thread dies, right?
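The JDK behavior quoted above can be reproduced in isolation. This is a minimal sketch using only `java.util.concurrent`, not Atomix code: `WorkerDeathDemo`, the `worker-N` thread names, and the `"boom"` message are made up for illustration. It shows that a task passed to `execute()` that throws takes its worker thread down, that the exception reaches the thread's `UncaughtExceptionHandler`, and that the pool starts a replacement worker for the next task. It says nothing about whether the Atomix thread in question actually died.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

final class WorkerDeathDemo {

  /** Returns {thread of the failing task, thread of the next task, message seen by the handler}. */
  static String[] run() {
    AtomicInteger created = new AtomicInteger();
    AtomicReference<String> uncaught = new AtomicReference<>();
    CountDownLatch handled = new CountDownLatch(1);

    // Name each worker so a replacement thread is easy to spot.
    ThreadFactory factory = r -> {
      Thread t = new Thread(r, "worker-" + created.incrementAndGet());
      t.setUncaughtExceptionHandler((thread, e) -> {
        uncaught.set(e.getMessage());
        handled.countDown();
      });
      return t;
    };

    ExecutorService pool = Executors.newSingleThreadExecutor(factory);
    AtomicReference<String> first = new AtomicReference<>();
    AtomicReference<String> second = new AtomicReference<>();
    CountDownLatch done = new CountDownLatch(1);
    try {
      // execute() (unlike submit()) lets the exception escape the worker.
      pool.execute(() -> {
        first.set(Thread.currentThread().getName());
        throw new IllegalStateException("boom");
      });
      handled.await(5, TimeUnit.SECONDS);

      // The next task runs on whichever worker the pool has now.
      pool.execute(() -> {
        second.set(Thread.currentThread().getName());
        done.countDown();
      });
      done.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
    return new String[] {first.get(), second.get(), uncaught.get()};
  }
}
```

Calling `WorkerDeathDemo.run()` should report two different worker names. With `submit()` instead of `execute()`, the exception would be captured in the returned `Future` and the worker would survive.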
Jordan Halterman
@kuujo
Aug 22 2018 00:19
yeah seems that way
I was actually looking at why the exception was not being caught
I realized that was never implemented for the single thread context implementation
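For readers wondering what "catching the exception" in a single-threaded context might look like, here is a rough sketch. It is an assumption about the general technique, not the Atomix implementation: `CatchingExecutor` and its `onError` callback are hypothetical names, and the real context class may be structured quite differently.

```java
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical wrapper: runs tasks on a delegate and reports failures instead of letting them escape. */
final class CatchingExecutor implements Executor {
  private final Executor delegate;
  private final Consumer<Throwable> onError;

  CatchingExecutor(Executor delegate, Consumer<Throwable> onError) {
    this.delegate = delegate;
    this.onError = onError;
  }

  @Override
  public void execute(Runnable command) {
    delegate.execute(() -> {
      try {
        command.run();
      } catch (Throwable t) {
        // Catching Throwable keeps the worker alive; a real implementation
        // might prefer to log and rethrow Errors rather than swallow them.
        onError.accept(t);
      }
    });
  }
}
```

Wrapping something like `Executors.newSingleThreadExecutor()` this way, with `onError` writing to a logger, would make failures show up consistently in the logs.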
Jordan Halterman
@kuujo
Aug 22 2018 01:15
Although I did just see the exception logged in ONOS. Not sure why the unpredictable behavior
Actually, I don’t think the leader change is caused by the thread dying. The thread lives on AFAICT. It’s from the leader being unable to send valid AppendRequests to followers because it’s unable to read its log.
Raft is a consensus algorithm, and the Atomix Raft implementation has been in use for years. At this point, there’s basically never a simple explanation for bugs, which is why I’ve spent the last year looking through massive collections of logs.
Jordan Halterman
@kuujo
Aug 22 2018 01:21
Of course, that’s exactly what you’d want Raft to do in that scenario: elect a new leader and attempt to continue to make progress.
Johno Crawford
@johnou
Aug 22 2018 07:08
[ERROR] PrimaryBackupAtomicValueTest>AtomicValueTest.testValue:38 » NoClassDefFound io...
[ERROR] RaftAtomicValueTest.testDelete:46 » NoClassDefFound io/atomix/core/value/impl/...
[ERROR] RaftAtomicValueTest>AtomicValueTest.testEvents:53 » NoClassDefFound io/atomix/...
[ERROR] RaftAtomicValueTest>AtomicValueTest.testValue:38 » NoClassDefFound io/atomix/c...
seeing this with the testsuite
oh but wait
that means that class init failed
so maybe there's an error further up
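The hunch that a `NoClassDefFoundError` can mean class initialization failed earlier can be checked with a small standalone sketch. `ClassInitDemo` and `Broken` are made-up names. The first use of a class whose static initializer throws fails with `ExceptionInInitializerError`, and every later use fails with `NoClassDefFoundError`, which is why the root cause tends to sit further up in the output.

```java
final class ClassInitDemo {

  /** A class whose static initializer always fails. */
  static final class Broken {
    // Not a compile-time constant, so reading it triggers class initialization.
    static final int VALUE = compute();

    static int compute() {
      throw new IllegalStateException("init failed");
    }
  }

  /** Touches Broken and returns the simple name of whatever was thrown. */
  static String touch() {
    try {
      return "loaded, VALUE=" + Broken.VALUE;
    } catch (Throwable t) {
      return t.getClass().getSimpleName();
    }
  }
}
```

Calling `ClassInitDemo.touch()` twice in a fresh JVM should show the two different error types in that order. A `NoClassDefFoundError` can also have other causes, such as the class file not being readable at all.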
Junbo Ruan
@aruanruan
Aug 22 2018 07:20
need to rebuild?
Johno Crawford
@johnou
Aug 22 2018 07:22
@kuujo
lol
i found the bug
oops
or not
@return the previous value
*/
public final int getAndIncrement() {
Johno Crawford
@johnou
Aug 22 2018 07:46
[ERROR] testAnoint(io.atomix.core.election.PrimaryBackupLeaderElectionTest) Time elapsed: 4.459 s <<< ERROR!
java.lang.IllegalStateException: failed to create a child event loop
Caused by: io.netty.channel.ChannelException: failed to open a new selector
Caused by: java.io.IOException: Too many open files
09:44:23.109 [netty-messaging-event-epoll-server-0] WARN i.n.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
at io.netty.channel.unix.Errors.newIOException(Errors.java:122) ~[netty-transport-native-unix-common-4.1.27.Final.jar:4.1.27.Final]
at io.netty.channel.unix.Socket.accept(Socket.java:316) ~[netty-transport-native-unix-common-4.1.27.Final.jar:4.1.27.Final]
Johno Crawford
@johnou
Aug 22 2018 07:56
java.lang.NoClassDefFoundError: io/atomix/core/election/AsyncLeaderElector
Caused by: java.lang.ClassNotFoundException: io.atomix.core.election.AsyncLeaderElector
Caused by: java.io.FileNotFoundException: /home/johno/Workspaces/atomix/core/target/classes/io/atomix/core/election/AsyncLeaderElector.class (Too many open files)
think my problem is different
Johno Crawford
@johnou
Aug 22 2018 09:09
problem with the latest fedora, i'll tweak nofile and see if that helps
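Before and after tweaking `nofile`, it can help to see how close the test JVM is to the limit. The sketch below is an assumption about how one might check this, not part of the Atomix test suite: `FdUsage` is a hypothetical helper, and it relies on the HotSpot-specific `com.sun.management.UnixOperatingSystemMXBean`, which is only available on Unix-like platforms.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class FdUsage {

  /** Returns {open descriptors, max descriptors}, or null if the JVM does not expose them. */
  static long[] snapshot() {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof UnixOperatingSystemMXBean) {
      UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
      return new long[] {
        unix.getOpenFileDescriptorCount(),
        unix.getMaxFileDescriptorCount()
      };
    }
    return null; // e.g. Windows, or a JVM without this bean
  }

  public static void main(String[] args) {
    long[] s = snapshot();
    System.out.println(s == null
        ? "descriptor counts not available on this JVM"
        : "open=" + s[0] + " max=" + s[1]);
  }
}
```

Logging this from a test's setup and teardown would show whether descriptors (sockets, selectors, files) keep climbing across tests or whether the limit is simply too low.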
Jordan Halterman
@kuujo
Aug 22 2018 16:56
Hmm
Not sure what would be creating files or threads when it’s reusing the cluster
Johno Crawford
@johnou
Aug 22 2018 20:57
@kuujo io.atomix.core.PrimitiveResource#startup
shouldn't we take into account an offset
or the ports will conflict with the second instance?
there's only base port
then id
which is hardcoded as 1, 2, 3
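To make the port concern concrete, here is a tiny sketch of the arithmetic. It is purely illustrative: `TestPorts`, `portFor`, and the `instanceOffset` parameter are hypothetical and not taken from `PrimitiveResource`. It only matters if two clusters are ever started from the same base port.

```java
final class TestPorts {

  /**
   * Hypothetical scheme: port = basePort + instanceOffset + nodeId.
   * With instanceOffset fixed at 0, two clusters using ids 1, 2, 3
   * would land on the same ports.
   */
  static int portFor(int basePort, int instanceOffset, int nodeId) {
    return basePort + instanceOffset + nodeId;
  }

  public static void main(String[] args) {
    int base = 5000; // assumed base port, for illustration only
    for (int id = 1; id <= 3; id++) {
      System.out.println("first cluster node " + id + " -> " + portFor(base, 0, id)
          + ", second cluster node " + id + " -> " + portFor(base, 10, id));
    }
  }
}
```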
Johno Crawford
@johnou
Aug 22 2018 21:58
ah we only create a single atomix cluster with three nodes, it's the clients that are created if none are available