These are chat archives for atomix/atomix

23rd
Jun 2016
Madan Jampani
@madjam
Jun 23 2016 07:06
@kuujo please take a look at atomix/copycat#235 when you get a chance.
Ran into this issue during rc verification.
Jordan Halterman
@kuujo
Jun 23 2016 07:07
makes sense
Madan Jampani
@madjam
Jun 23 2016 07:10
cool. took a while to track this down. but that test works so much better with this change.
Jordan Halterman
@kuujo
Jun 23 2016 07:12
Don’t see any issue with it… then again it’s late and I’ve been working on a totally different database too much :-P
Madan Jampani
@madjam
Jun 23 2016 07:13
which one is that?
Jordan Halterman
@kuujo
Jun 23 2016 07:14
just an internal one for work
that project has consumed my nights lately but it is using Atomix and should finally slow down over the next couple weeks so I can get back to focusing on that more
Jordan Halterman
@kuujo
Jun 23 2016 07:20
always manage to find myself way too many things to do
Madan Jampani
@madjam
Jun 23 2016 07:21
don’t be good at them. it’s quite simple :)
Jordan Halterman
@kuujo
Jun 23 2016 07:21
lol
true true
Madan Jampani
@madjam
Jun 23 2016 07:24
working frantically to get a release of ONOS out this week. This copycat PR was the last big issue that eluded me for the last couple of days.
Jordan Halterman
@kuujo
Jun 23 2016 07:25
Sorry bout that :-P Seriously though, the effort is awesome. Fixed a bunch of things I’ve seen but don’t currently have the time to fight with. Thanks!
Madan Jampani
@madjam
Jun 23 2016 07:28
No worries! No gain without pain :) And as far as I’m concerned it’s a lot less pain than writing and maintaining something similar. The more people use it, the better it gets, and the faster.
Jordan Halterman
@kuujo
Jun 23 2016 07:29
Totally… it will only keep getting more stable now especially because I don’t really anticipate any even moderate changes in the foreseeable future. Just let it mature
Madan Jampani
@madjam
Jun 23 2016 07:33
Yep. It is pretty solid for a v1. The thing with any software, and more so with distributed systems software as you know, is that you can never be bug free. The only way to root them out is by using it and reporting/fixing bugs.
Jordan Halterman
@kuujo
Jun 23 2016 07:33
indeed
I think it’s important now to focus on those things to help promote adoption. There will always be bugs sure, but stability can promote adoption which in turn promotes stability. So, that’s why I don’t intend to make any significant changes any time soon. Atomix needs some more work but Copycat only needs bug fixes and lots of performance improvements now.
Madan Jampani
@madjam
Jun 23 2016 07:41
True. A majority of my time has been spent looking into Copycat and I see that as the core of the framework. Definitely helps to keep that stable. But adoption will come via Atomix as that is where the abstractions are.
Jordan Halterman
@kuujo
Jun 23 2016 07:46
Totally… and that will continue to receive a lot of love. I’m less apprehensive about making changes in Atomix and less concerned that they’ll impact the stability of the system as significantly since most of its stability is dependent on Copycat really. As long as changes in Atomix are well tested it feels like there’s less of a risk of them impacting guarantees or causing the types of issues like the one you just fixed since the complexity of communication and the algorithms related to it are all behind abstractions in Copycat or in the core Atomix classes.
Gotta go get some sleep! Thanks again!
Jordan Halterman
@kuujo
Jun 23 2016 07:52
btw I’m fine with more releases… I don’t particularly care if it’s just a minor change if there’s reason to release it
Madan Jampani
@madjam
Jun 23 2016 07:53
Cool. I was going to ask about releasing with this copycat fix. That way I can get a new release of ONOS out. We can do that tomorrow morning.
Jordan Halterman
@kuujo
Jun 23 2016 07:54
k
Madan Jampani
@madjam
Jun 23 2016 07:54
All right we can talk tomorrow then. Have a good night.
James Watson
@JPWatson
Jun 23 2016 13:17
in copycat, where should you put code that causes side effects? like adding rows to a database table
Jordan Halterman
@kuujo
Jun 23 2016 17:24
So, this is sort of the idea of a persistent state machine. There’s nothing wrong with the state machine writing something to disk or a database, but it has to be done carefully. Essentially, writes just have to be idempotent because state machine commands can be replayed any time the server restarts. So, typically how you do that is by storing the index of the command associated with a write and checking that the current stored index is less than the write index any time you’re writing to the database.
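A minimal sketch of that index check, in plain Java with no Copycat calls. `IndexedStore` and its in-memory list are hypothetical stand-ins for the database table; in a real database the row and the stored index would have to be written in the same transaction for this to hold up across a crash.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a database table plus a persisted "last applied index". */
class IndexedStore {
    private final List<String> rows = new ArrayList<>();
    // Assumption: a real store persists this atomically together with the rows.
    private long lastAppliedIndex = 0;

    /**
     * Applies a write tagged with the log index of the command that produced it.
     * Returns true if the row was written, false if the command was a replay.
     */
    synchronized boolean apply(long index, String row) {
        if (index <= lastAppliedIndex) {
            return false; // already applied before a restart, so skip the side effect
        }
        rows.add(row);
        lastAppliedIndex = index;
        return true;
    }

    synchronized List<String> rows() {
        return new ArrayList<>(rows);
    }

    synchronized long lastAppliedIndex() {
        return lastAppliedIndex;
    }
}
```

Replaying the log from the start after a restart then calls `apply` with indexes that are already stored, and those calls leave the table unchanged.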
But there are actually some complications with doing this when it comes to cluster membership changes. When a new node is added, the new node needs the full state of the state machine, which in the case of a persistent state machine means the underlying database. Copycat needs some new mechanisms to handle this more gracefully, but it can be done by storing the state in the snapshot as well, in which case Copycat will handle replicating the snapshot to a new member.
The easier way to do this is just to use Atomix to coordinate writes on a client rather than trying to implement a persistent state machine because of the added complexity of managing a persistent state machine.
@madjam pushing it now
Madan Jampani
@madjam
Jun 23 2016 18:38
thanks @kuujo! Do you have plans to push atomix as well? That is the top level dependency I’m using.
Jordan Halterman
@kuujo
Jun 23 2016 18:39
Yep
Have to give a talk real quick then I'll do Atomix
Multitasking :-)
Madan Jampani
@madjam
Jun 23 2016 18:40
sounds good! :)
Jordan Halterman
@kuujo
Jun 23 2016 21:53
@madjam will be done in just a sec
running the final tests
Madan Jampani
@madjam
Jun 23 2016 21:56
awesome. thanks @kuujo
Jordan Halterman
@kuujo
Jun 23 2016 22:04
k all done
James Watson
@JPWatson
Jun 23 2016 22:42
thanks @kuujo, do you see any way to stop all nodes from writing the same messages to the database during normal operation? for instance making it so that only the leader writes to the db. i plan on managing the persistent state machine by sending the relevant state from the underlying db using a command when the cluster starts. is there a way to send commands straight from a CopycatServer object after bootstrapping? and is there a way to establish whether a CopycatServer has started from scratch or initialized from a log file?
Jordan Halterman
@kuujo
Jun 23 2016 23:08
So, I think it would be impossible to ensure only the leader writes to a database from within the state machine. Even though leader information is stored in the log, it’s not even possible to determine which node is the current leader from that since new leaders can commit entries from old terms. A leader could commit an entry and crash before any other node commits it, and the new leader could then be elected and commit the same entry. In that case, two leaders committed the same entry.
I can’t recommend non-determinism in the state machine. The only way it remains consistent is by all nodes doing everything. This is really what Atomix is for otherwise. You can elect a leader with DistributedGroup and write from that leader. But even in that case, it’s impossible to ensure two nodes don’t believe themselves to be the leader simultaneously, so you still have to use the term provided by DistributedGroup to ensure the leader with the highest term is the one that can write. The only thing that can be guaranteed is that only one leader will be elected for any term, that term numbers are unique and monotonically increasing, only one entry for any index will be committed, all commands will be eventually applied on all servers, etc and you have to work within the context of those guarantees to create exactly-once semantics through idempotency. Exactly-once in reality is impossible.
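A rough sketch of the "highest term wins" check on the database side, in plain Java. `FencedStore` is hypothetical, and the `term` argument is assumed to be whatever term the election handed to the writer; no Atomix API is called here. A real database would need to do the compare and the write atomically.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical store that rejects writes from a leader with a stale term. */
class FencedStore {
    private final List<String> rows = new ArrayList<>();
    // Highest term seen so far; assumed to be persisted with the data in a real store.
    private long highestTerm = 0;

    /**
     * Accepts the write only if the writer's term is at least the highest term seen.
     * Returns false for a writer that still believes it is leader from an older term.
     */
    synchronized boolean write(long term, String row) {
        if (term < highestTerm) {
            return false; // stale leader, fenced off
        }
        highestTerm = term;
        rows.add(row);
        return true;
    }

    synchronized List<String> rows() {
        return new ArrayList<>(rows);
    }
}
```

This only fences out old leaders. Retries from the current leader still need an idempotency check of their own, since the term alone does not distinguish a duplicate write from a new one.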
Roman Pearah
@neverfox
Jun 23 2016 23:10
That's right. Exactly-once side effects are provably impossible.