These are chat archives for atomix/atomix

6th
Mar 2017
Jordan Halterman
@kuujo
Mar 06 2017 02:42
Nope. The only messaging that exists is persistent messaging through DistributedGroup. That will never be fast without partitioning, and even with partitioning still makes little sense. What little messaging is still in Atomix will remain, but will never be intended for general communication. It will always be strongly consistent and therefore relatively slow and intended only for periodic coordination. There are no plans to change that any time in the near future. The focus will instead be only on improving strong consistency (partitioning), performance, and APIs (HTTP).
Jon Hall
@jhall11
Mar 06 2017 07:32
@kuujo, I’ll check in the morning, but it looks like master is failing some tests, so the issues I am seeing might be unrelated
but enjoy your vacation, we can look at it more when you get back
Jordan Halterman
@kuujo
Mar 06 2017 07:49

@jhall11 thanks! Let me know what you find!

BTW I actually implemented that really minimal version too: https://github.com/kuujo/onos/tree/copycat-transport-connection-close?files=1

All of them are really working well for me, though I can only test up to 3 nodes before I start to run out of memory :-P The one in Gerrit is still the best/cleanest though.

I'm done hacking code for the week. But I'll still be around a little bit working on some other things.

Jon Hall
@jhall11
Mar 06 2017 07:52
cool, I’ll check that out
Jon Hall
@jhall11
Mar 06 2017 20:11
so I also cannot reproduce my issues when I run using onos.py. My guess is it has to do with the delayed startup of nodes and a cluster size of 7. There is one partition that will have nodes 1, 6, and 7 as members, so on node 1, that partition won’t start until node 6 comes up. I might try to look at the logs this week, but I think I’m going to be looking at some other stuff this week
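To make the 7-node case above concrete, here is a minimal Python sketch. It is only a model and uses no ONOS or Atomix code. The wrap-around assignment of three consecutive nodes per partition, the one-at-a-time startup order, and the helper names `partition_members`, `has_majority`, and `first_start_step` are all assumptions chosen for illustration; they happen to produce a partition with members 1, 6, and 7.

```python
def partition_members(num_nodes, partition_size):
    """Assumed layout: each partition gets consecutive node ids, wrapping around."""
    return [
        sorted(((start + offset) % num_nodes) + 1 for offset in range(partition_size))
        for start in range(num_nodes)
    ]


def has_majority(members, up_nodes):
    """True when more than half of the partition's members are up."""
    live = sum(1 for node in members if node in up_nodes)
    return live > len(members) // 2


def first_start_step(members, startup_order):
    """Return the 1-based startup step at which the partition first has a majority."""
    up = set()
    for step, node in enumerate(startup_order, start=1):
        up.add(node)
        if has_majority(members, up):
            return step
    return None


partitions = partition_members(7, 3)
order = [1, 2, 3, 4, 5, 6, 7]  # assumed: nodes come up one at a time, in id order

for members in partitions:
    print(members, "can start at step", first_start_step(members, order))
```

Under these assumptions the partition with members 1, 6, and 7 cannot reach a majority until node 6 is up, while a partition such as 1, 2, 3 can start as soon as node 2 is up.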
Jordan Halterman
@kuujo
Mar 06 2017 20:12
Makes sense
That's actually why I was trying to start a larger cluster but my wife's laptop can't handle it :-P
Jon Hall
@jhall11
Mar 06 2017 20:13
It might just be that now we are slightly less tolerant of these delays, so what worked before doesn't
Jordan Halterman
@kuujo
Mar 06 2017 20:13
Hmm
Jon Hall
@jhall11
Mar 06 2017 20:14
yeah, I haven’t been able to get more than 4 nodes running in mininet
Jordan Halterman
@kuujo
Mar 06 2017 20:15
That might make sense. More proactively cleaning up resources probably just makes it more difficult to coordinate that many nodes at startup. The current implementation sort of lazily creates connections and doesn't really close them, so that would certainly allow for a lot more flexibility.
Jordan Halterman
@kuujo
Mar 06 2017 20:22
There may just be some changes that can be made to how the nodes are started to handle that. I haven't looked at that part of the code, but it depends on whether some code is waiting for partition n to start before starting partition n + 1. If they're all started concurrently there shouldn't be a problem, but a single Copycat server can't start until it can talk to a majority of the cluster. That's necessary to prevent split brain at startup in case the node that's being started thinks the cluster is configured differently than it actually is. A node can be started and will just continually try to join the cluster until it's successful, at which time startup is complete. But if we have to start nodes sequentially, it's also possible to just bootstrap one node for each partition and add the remaining nodes to the bootstrapped node.
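The two startup strategies in the message above can be compared with a small Python model. This is a sketch, not Copycat's implementation: the function names are hypothetical, and it assumes that a join is committed by a majority of the current configuration and that every node added so far stays up.

```python
def majority(size):
    """Smallest number of members that forms a majority of `size`."""
    return size // 2 + 1


def sequential_static_start(members):
    """Fixed configuration, nodes started strictly one after another.

    Returns the node whose startup blocks, or None if every node starts.
    """
    live = set()
    for node in members:
        live.add(node)
        if len(live) < majority(len(members)):
            return node  # cannot reach a majority, and nothing else starts meanwhile
    return None


def sequential_bootstrap_and_join(members):
    """Bootstrap the first node alone, then add the others one at a time.

    Returns the final configuration, or None if a join could not commit.
    """
    first, rest = members[0], members[1:]
    config = [first]  # single-node cluster: a quorum of one
    live = {first}
    for node in rest:
        live_configured = sum(1 for m in config if m in live)
        if live_configured < majority(len(config)):
            return None  # not enough live members to commit the join
        config.append(node)
        live.add(node)
    return config


print(sequential_static_start([1, 6, 7]))
print(sequential_bootstrap_and_join([1, 6, 7]))
```

In this model the static configuration blocks on the first node when startup is strictly sequential, whereas bootstrapping one node and joining the rest always has a live majority of the current configuration at each step.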
Richard Pijnenburg
@electrical
Mar 06 2017 23:46
@kuujo been looking into what to use for the message transport. perhaps this could work? https://qpid.apache.org/components/java-broker/index.html
Richard Pijnenburg
@electrical
Mar 06 2017 23:53
Instead of making a custom memory + persistence storage and transport, might as well use something existing.
Another thing I could do is use RabbitMQ itself, make it part of the whole system, and have Atomix on it as an agent to configure it