These are chat archives for atomix/atomix

5th Mar 2017
Jon Hall
@jhall11
Mar 05 2017 01:51
Cool, I'll start some tests when I get home. Have a good vacation :smile:
Jordan Halterman
@kuujo
Mar 05 2017 05:11
YAY!
df -h
Filesystem      Size   Used  Avail Capacity  iused    ifree %iused  Mounted on
/dev/disk1     112Gi   60Gi   51Gi    54% 15861002 13462260   54%   /
hahaha
Jordan Halterman
@kuujo
Mar 05 2017 05:19
last night I was basically only able to start a cluster and then… run out of disk space :-P
Jon Hall
@jhall11
Mar 05 2017 05:19
haha
I haven’t looked at the logs, but the last patch set looks similar from an ONOS perspective
Jordan Halterman
@kuujo
Mar 05 2017 06:21
odd
can I see the logs?
nodes are not even joining each other?
onos.py is still working well for me
when I started, prior to making that change, the cluster only started up successfully like 1/5 times
then after the change it started working great
Jordan Halterman
@kuujo
Mar 05 2017 06:33
nvm they seem to somehow be able to start and elect a leader, but a lot of AppendRequests are failing
AppendRequest to /192.168.123.1:9876 failed. Reason: io.atomix.catalyst.transport.TransportException: java.util.concurrent.CompletionException: org.onosproject.store.cluster.messaging.MessagingException$NoRemoteHandler: No remote message handler registered for this message
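For context, a minimal, self-contained sketch of why this kind of failure appears; all names here are hypothetical and this is not the ONOS MessagingService API. A sender dispatches messages by type, and if the receiving side has not yet registered a handler for that type, the request fails — exactly the window that exists while nodes are still starting up.

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a typed messaging endpoint (not the ONOS API).
public class MiniDispatcher {
    private final Map<String, Function<byte[], byte[]>> handlers = new ConcurrentHashMap<>();

    // Receiving side: handlers are registered per message type, usually during startup.
    public void registerHandler(String type, Function<byte[], byte[]> handler) {
        handlers.put(type, handler);
    }

    // Sending side: a message whose type has no registered handler yet fails the
    // returned future, which is what surfaces as "No remote message handler
    // registered for this message" in the log line above.
    public CompletableFuture<byte[]> dispatch(String type, byte[] payload) {
        Function<byte[], byte[]> handler = handlers.get(type);
        if (handler == null) {
            CompletableFuture<byte[]> failed = new CompletableFuture<>();
            failed.completeExceptionally(
                new IllegalStateException("No remote message handler registered for " + type));
            return failed;
        }
        return CompletableFuture.completedFuture(handler.apply(payload));
    }
}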
Jordan Halterman
@kuujo
Mar 05 2017 10:37
Actually, those AppendRequest failures were a part of normal operation during startup
Jordan Halterman
@kuujo
Mar 05 2017 10:46

So, @jhall11 I don't really know what else I can do to debug this right now. It's working fine when I run it with onos.py.

But for good measure, I also just rewrote the entire Copycat transport myself:
https://github.com/kuujo/onos/tree/copycat-connection-management-2?files=1

I know that branch has a correct implementation of Copycat's transport. And again, it seems to be running well on a three-node cluster with onos.py. I logged into the ONOS shell and enabled DEBUG logging for Copycat and watched the command/query request/response logs fly by. I also compared the logs to master for good measure and the behavior was essentially the same. That said, it doesn't seem like the bug I fixed last night should have even been causing the dismal communication issues that are showing up in your logs. Obviously, I'm still new at this and figuring everything out. So what am I missing?

I’ll be on the road/doing fun things tomorrow! But I’ll be available off and on all week.
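As an aside on the DEBUG logging mentioned a couple of lines up: the ONOS shell is Karaf-based, so log levels can be raised per logger with log:set. The io.atomix.copycat logger name below is an assumption about the relevant package, not something stated in the chat.

onos> log:set DEBUG io.atomix.copycat
onos> log:tail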
Jordan Halterman
@kuujo
Mar 05 2017 10:54
I'll implement a very minimal version tomorrow night - passing only a CloseConnection message via the existing interface - if all else fails. If that doesn't work then there's something else going on.
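A rough sketch of that minimal CloseConnection idea, with hypothetical names (this is not code from the branch linked above): on close, the local side sends one small typed message over the existing messaging interface so the remote end can release its per-connection state, and the local close completes best-effort.

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

// Hypothetical sketch of closing a logical connection by sending a typed
// "close" message over an existing request/reply messaging interface.
public class SketchConnection {
    private static final String CLOSE_TYPE = "copycat-close";   // hypothetical message type
    private final String connectionId;
    private final BiFunction<String, byte[], CompletableFuture<byte[]>> sender;

    public SketchConnection(String connectionId,
                            BiFunction<String, byte[], CompletableFuture<byte[]>> sender) {
        this.connectionId = connectionId;
        this.sender = sender;
    }

    // Tell the remote side to release its state for this connection, then
    // complete the close locally whether or not the remote ack arrives.
    public CompletableFuture<Void> close() {
        byte[] payload = connectionId.getBytes(StandardCharsets.UTF_8);
        return sender.apply(CLOSE_TYPE, payload)
            .handle((response, error) -> null);   // best-effort: ignore failures on close
    }
}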
Richard Pijnenburg
@electrical
Mar 05 2017 11:25
@kuujo Yeah, I saw @jhalterman joining ES. I used to work there ;-) My current job is okay but I'll be moving soon to another company called Avanti here in London as a DevOps team lead :-)
Jordan Halterman
@kuujo
Mar 05 2017 19:28
Nice!
Richard Pijnenburg
@electrical
Mar 05 2017 21:53
Btw @kuujo did you find a good solution for the message bus issue we had ages ago? Speed-wise it wasn't great.