These are chat archives for atomix/atomix

7th
Jun 2018
Johno Crawford
@johnou
Jun 07 2018 11:14
I can get to the reviews later this afternoon, firefighting and feature dev right now
Jordan Halterman
@kuujo
Jun 07 2018 19:05
last couple are on the way
Jordan Halterman
@kuujo
Jun 07 2018 19:29
okay… I think that’s all of them
We’ll see what I forgot once they’re all merged :-P
Johno Crawford
@johnou
Jun 07 2018 22:00
did the ONOS test suite pass?
Jordan Halterman
@kuujo
Jun 07 2018 22:01
haven’t tried it yet… going to do some cleanup and the version change and then I’ll try it out today
been traveling
Johno Crawford
@johnou
Jun 07 2018 22:03
i might see a bug with leader elector
either that or it's dead code from the refactor
two bugs
io.atomix.core.election.impl.DefaultLeaderElectorService#onSessionEnd
Jordan Halterman
@kuujo
Jun 07 2018 22:04
yeah that’s a bug
Johno Crawford
@johnou
Jun 07 2018 22:04
listeners.remove(session.sessionId().id()); should be listeners.remove(session.sessionId());
and List<LeadershipEvent<byte[]>> changes = Lists.newArrayList();
unused list
here's another io.atomix.core.queue.impl.DefaultWorkQueueService#evictWorker
removing long from session id set
haha
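(Editor's sketch of why both slips compile and silently do nothing. `Map.remove` and `Set.remove` take `Object`, so a primitive `long` is boxed to `Long` and never equals a key of a different type. The `SessionId` class below is a hypothetical stand-in, not the Atomix class, and the map only assumes the listeners are keyed by session ID as the suggested fix implies.)

```java
import java.util.HashMap;
import java.util.Map;

public class SessionKeyRemoval {
    // Hypothetical stand-in for a session identifier that wraps a long.
    static final class SessionId {
        private final long id;
        SessionId(long id) { this.id = id; }
        long id() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof SessionId && ((SessionId) o).id == id;
        }
        @Override public int hashCode() { return Long.hashCode(id); }
    }

    // Removes by the raw long: compiles, but the boxed Long never matches a SessionId key.
    static int sizeAfterRemovingRawId() {
        Map<SessionId, String> listeners = new HashMap<>();
        SessionId sessionId = new SessionId(1L);
        listeners.put(sessionId, "listener");
        listeners.remove(sessionId.id());
        return listeners.size();
    }

    // Removes by the key type itself: the entry is actually removed.
    static int sizeAfterRemovingKey() {
        Map<SessionId, String> listeners = new HashMap<>();
        SessionId sessionId = new SessionId(1L);
        listeners.put(sessionId, "listener");
        listeners.remove(sessionId);
        return listeners.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterRemovingRawId()); // entry is still there
        System.out.println(sizeAfterRemovingKey());   // entry is gone
    }
}
```

Static analysis tools can flag this kind of mismatched-type collection call, which fits the findbugs run mentioned a few messages later.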
Johno Crawford
@johnou
Jun 07 2018 22:10
yeah i'm making one
just need to compare it with the old impl
see if something is wrong
Jordan Halterman
@kuujo
Jun 07 2018 22:10
should probably look at all the inspections
Johno Crawford
@johnou
Jun 07 2018 22:10
yeah i ran findbugs
atomix/atomix#596
Jordan Halterman
@kuujo
Jun 07 2018 22:16
just have some naming problems to clean up
Johno Crawford
@johnou
Jun 07 2018 22:27
running tests locally, something fucky with the backup replicating
Jordan Halterman
@kuujo
Jun 07 2018 22:29
might be expected… depends on which test
Johno Crawford
@johnou
Jun 07 2018 22:29
so if SynchronousReplicator gets the backup member IDs from PrimaryBackupServiceContext that means it never received a new term
well I mean it's still going
Jordan Halterman
@kuujo
Jun 07 2018 22:29
man I hate that Java 8 can’t differentiate between a Consumer and Function in overloaded methods
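(Editor's sketch of the overload problem, using hypothetical `handle` methods rather than anything from Atomix. An implicitly typed lambda whose body is a method call fits both a `Consumer` and a `Function`, so the compiler rejects the call as ambiguous. A cast to the intended functional interface is one way to disambiguate; giving the methods distinct names is another.)

```java
import java.util.function.Consumer;
import java.util.function.Function;

public class OverloadAmbiguity {
    // Hypothetical overloads that differ only in the functional interface they accept.
    static String handle(Consumer<String> consumer) {
        consumer.accept("abc");
        return "consumer";
    }

    static String handle(Function<String, Integer> function) {
        function.apply("abc");
        return "function";
    }

    static String callAsConsumer() {
        // The cast fixes the target type, so only the Consumer overload applies.
        return handle((Consumer<String>) s -> s.length());
    }

    static String callAsFunction() {
        // Same lambda body, but cast to Function to select the other overload.
        return handle((Function<String, Integer>) s -> s.length());
    }

    public static void main(String[] args) {
        // handle(s -> s.length()); // does not compile: reference to handle is ambiguous
        System.out.println(callAsConsumer());
        System.out.println(callAsFunction());
    }
}
```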
Johno Crawford
@johnou
Jun 07 2018 22:29
maybe for 10 minutes now
oh and of course the fucking thread dump shows maven surefire plugin forking..
Stack Trace
java.util.concurrent.LinkedBlockingQueue.take line: 442
io.atomix.core.AtomixTest$TestClusterMembershipEventListener.event line: 313
io.atomix.core.AtomixTest.testScaleDownPersistent line: 200
Jordan Halterman
@kuujo
Jun 07 2018 22:34
hmm missing cluster event
Johno Crawford
@johnou
Jun 07 2018 22:35
the fact it's continually failing to replicate is surely a problem too?
logging the same stuff over and over
Jordan Halterman
@kuujo
Jun 07 2018 22:35
not really… if there was no cluster event then it probably doesn’t realize the node that was shut down was removed
so it’s trying to replicate to it
could use some exponential backoff but shouldn’t give up
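(Editor's sketch of "back off exponentially but never give up": the delay doubles per failed attempt up to a cap, and there is no attempt limit, so retries continue at the capped interval. The class name, method name, and the base and cap values are illustrative assumptions, not Atomix code or configuration.)

```java
public class ReplicationBackoff {
    // Returns the delay before retry number `attempt` (0-based), doubling from
    // baseMillis and never exceeding maxMillis. There is deliberately no
    // "give up" case: callers keep retrying at the capped delay.
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = Math.min(baseMillis, maxMillis);
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            // Jump straight to the cap when doubling would pass it (this also avoids overflow).
            delay = delay > maxMillis / 2 ? maxMillis : delay * 2;
        }
        return delay;
    }

    public static void main(String[] args) {
        // Assumed values for illustration: 100 ms base, 5 s cap.
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + " -> " + delayMillis(attempt, 100, 5000) + " ms");
        }
    }
}
```

A real replicator would schedule the next attempt on its own thread context with this delay and reset the attempt counter after a successful replication.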
Johno Crawford
@johnou
Jun 07 2018 22:36
shouldn't phi kick in
Jordan Halterman
@kuujo
Jun 07 2018 22:36
Yeah...
some race condition somewhere
Johno Crawford
@johnou
Jun 07 2018 22:37
actually maybe it was phi :P
still haven't merged your relaxed impl
Jordan Halterman
@kuujo
Jun 07 2018 22:37
it has a maximum timeout
wait that only kicks in if no heartbeats have been successful
Johno Crawford
@johnou
Jun 07 2018 22:38
so maybe it's stuck shutting down?
and it's still responding to heartbeats
Jordan Halterman
@kuujo
Jun 07 2018 22:38
yeah that actually does seem to be something I see happen every now and then
never been able to catch it in a position to debug it, but I’m pretty certain there’s a bug somewhere in shutting down nodes
Johno Crawford
@johnou
Jun 07 2018 22:47
i wasn't seeing any heartbeat messages though
so maybe the membership service is shutting down first before other services
looking at the shutdown code one thing that kind of screams at me is that we don't force communicationService stop to be run from the thread context
Johno Crawford
@johnou
Jun 07 2018 22:53
but all that does is change an atomic boolean