These are chat archives for atomix/atomix

11th
Mar 2016
Jordan Halterman
@kuujo
Mar 11 2016 00:39

Hey @jfim sorry I have been in meetings. I think you’re on the right track. Seems like that would make sense, but it actually wouldn’t be accurate and would lose accuracy the further behind the leader you got. Copycat can exclude entries from replication depending on feedback from the state machine. For instance, in Atomix’s DistributedMap state machine, if clients submit two commands put(foo, 123) and put(foo, 345) one after the other, what will happen is the leader will likely replicate put(foo, 123) and put(foo, 345) to a majority of the cluster in the same batch, and once both are committed the map state machine on the leader will release put(foo, 123) since it no longer contributes to its state. Thereafter, when replicating those entries to any additional nodes that are further behind the leader, the leader will actually just exclude put(foo, 123) since it’s irrelevant to the committed state of the cluster, and that entry will ultimately be compacted from the log.
http://atomix.io/copycat/docs/internals/#replication-performance
If snapshotting is being used, the behavior changes a bit, but not much. For snapshotted state machines, when a snapshot is taken and stored, Copycat will release all SNAPSHOTTABLE entries up to that point in the log, and those entries can then be excluded from replication in the same manner. But even in that case, Copycat still releases internal entries (keep-alive, leader changes, etc) in the manner described above. For instance, when the client with session 1 submits a keep-alive, all prior keep-alives are no longer needed and so they’re excluded from replication and eventually compacted from the log.
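
To make the exclusion concrete, here is a minimal Python sketch of the idea. It is not Copycat's actual (Java) implementation: the `Entry` class, the `released` flag, and `entries_to_send` are hypothetical names made up for illustration. The only thing taken from the explanation above is the behavior, namely that once `put(foo, 345)` is committed, `put(foo, 123)` is released and skipped when replicating to a follower that is further behind.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    index: int              # position in the leader's log
    command: str            # e.g. "put(foo, 123)"
    released: bool = False  # hypothetical flag: the state machine no longer needs this entry


def entries_to_send(log, match_index):
    """Return the entries after match_index that still contribute to state."""
    return [e for e in log if e.index > match_index and not e.released]


log = [
    Entry(1, "put(foo, 123)"),
    Entry(2, "put(foo, 345)"),
    Entry(3, "put(bar, 1)"),
]

# Once both puts are committed, the first no longer contributes to the map's state.
log[0].released = True

# A follower that has nothing yet (match_index 0) only receives the live entries.
sent = entries_to_send(log, match_index=0)
print([e.command for e in sent])
```

The sketch ignores tombstones and snapshots; it only shows why a lagging follower may never see some of the intermediate indexes.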

So, that’s all to say that because Copycat attempts to exclude entries from replication when they no longer contribute to the system’s state, it’s not very straightforward to determine how many entries behind the leader a follower is. In order to determine precisely how far a follower is behind a leader, you would actually need to determine the number of live entries (which will be replicated to that follower) in the leader’s log from the follower’s matchIndex up to the leader’s last log index. That calculation would be complicated a bit by the differences between tombstones and non-tombstones. It would simply be too expensive to track it with any precision, but perhaps somewhat meaningful numbers could be derived from indexes.
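
As a rough illustration of the difference between the two measurements, the Python sketch below compares a plain index difference with a count of live entries. Everything in it is an assumption for the sake of the example: the function names, the idea of keeping released indexes in a set, and the sample numbers are hypothetical, and tombstones are not modeled at all.

```python
def index_lag(last_index, match_index):
    """Cheap estimate: raw index distance, ignoring released entries."""
    return last_index - match_index


def live_lag(released, last_index, match_index):
    """Precise count of entries in (match_index, last_index] that are still live.

    `released` is a set of indexes the leader would skip. Scanning the range is
    linear in its length, which hints at why tracking this continuously is costly.
    """
    return sum(
        1
        for i in range(match_index + 1, last_index + 1)
        if i not in released
    )


released = {3, 4, 5, 7}            # hypothetical: entries superseded by later writes
last_index, match_index = 10, 2

print(index_lag(last_index, match_index))           # 8
print(live_lag(released, last_index, match_index))  # 4
```

The index difference overestimates the work left whenever entries have been released, but it is available for free from values the leader already tracks.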

Thanks for the compliments :-) Relevant to the above discussion: the entire section on log compaction is pretty insightful I hope - http://atomix.io/copycat/docs/internals/#log-compaction-algorithm
Richard Pijnenburg
@electrical
Mar 11 2016 00:44
@kuujo how you doing bud?
Jordan Halterman
@kuujo
Mar 11 2016 00:44
great
you?
Richard Pijnenburg
@electrical
Mar 11 2016 00:46
Tired but good. Second week at the new job almost done
Jordan Halterman
@kuujo
Mar 11 2016 00:47
haha awesome
how’s that going?
other than tiring
Richard Pijnenburg
@electrical
Mar 11 2016 00:48
Pretty good. Mostly drinking in the evenings lol. Getting along pretty well with the rest of the team.
One of the sales guys wants to hook me up with a friend of his. Lol
Jordan Halterman
@kuujo
Mar 11 2016 00:49
haha
I went to an LA Kings game last night
does anyone watch hockey over there?
Richard Pijnenburg
@electrical
Mar 11 2016 00:51
Sometimes yeah, when I get the chance.
Interesting game ?
Jordan Halterman
@kuujo
Mar 11 2016 00:52
yeah it was really good
I also got tickets to Kobe Bryant’s last game :-P
Richard Pijnenburg
@electrical
Mar 11 2016 00:55
Nice !
I'm gonna get some sleep. 1 am almost
Catch you tomorrow
Jordan Halterman
@kuujo
Mar 11 2016 01:03
adios
Jean-François Im
@jfim
Mar 11 2016 01:20
@kuujo Ah I see, so what you're saying is that the compacted log is what gets replicated, not the raw log. So for example if a new replica was really far behind, it wouldn't see all the intermediate event ids. That's actually fine for what I would use it for, I just want to figure out if a replica is really far behind or close to the tip of the log.
The accuracy doesn't matter too much in my case, a replica is either too far away from the end of the log or close enough to it.
Jordan Halterman
@kuujo
Mar 11 2016 01:21
yeah that should work then
Jean-François Im
@jfim
Mar 11 2016 01:21
Thanks
Is copycat considered usable for production purposes (eg. are there other people using it in production systems) or only for tinkering for now?
Jordan Halterman
@kuujo
Mar 11 2016 01:32
I would say Copycat is as close to production ready as can be without having been deployed in many production environments. It’s sort of in a chicken and egg spot wherein wide adoption is necessary to find certain environment/use case specific bugs, but finding those bugs is necessary for wide adoption. Atomix is slightly behind it in terms of my confidence in its stability. But the state of Copycat is that we have done all we can to build our confidence that it is ready for production, putting a ton of time and effort into testing and evaluating the algorithms. I haven’t kept myself up to date in terms of where it’s being used now, but one of the contributors that has helped significantly in getting Copycat production ready is Madan Jampani, who is now working on ONOS (http://onosproject.org/) where they’ve been using it for quite a while from what I understand. It has been tested in networks and in Jepsen, and Madan has contributed a number of essential bug reports and bug fixes for many of the most critical aspects of the algorithms, so it’s at the point where it’s probably ready for production use with the caveat that it’s still a young, complex project that is bound to have its issues.
Basically, my feeling is that it’s on the cusp of production ready
and I intend for 1.0 to be production ready, that will probably be in a couple releases
Jonathan Halterman
@jhalterman
Mar 11 2016 01:34
...young in terms of deployments in the wild, certainly not in terms of development time.
Jordan Halterman
@kuujo
Mar 11 2016 01:34
indeed that
Jonathan Halterman
@jhalterman
Mar 11 2016 01:34
It's been a few years in the works now.
Jordan Halterman
@kuujo
Mar 11 2016 01:36
atomix has much of the actual commit history for Copycat from which it was separated a while back, so there are actually thousands of commits and a few years of experience behind it. Until late last year, it was a lot of research and experimentation, but the last many months have been focused solely on stability with daily commits
Jean-François Im
@jfim
Mar 11 2016 02:11
Thanks, sounds like quite a bit of work went into stabilization lately, which is always a good sign
Jordan Halterman
@kuujo
Mar 11 2016 02:12
You could probably email Madan and ask him about his experience if you’d like. He’s been around for a long time and has been working on projects with it for a long time.
Jean-François Im
@jfim
Mar 11 2016 02:13
Well I'll try building something with it and see how many pieces I end up with in the end, hoping it's only one :)
Jordan Halterman
@kuujo
Mar 11 2016 02:13
haha
Jean-François Im
@jfim
Mar 11 2016 02:16
From what I've seen, the documentation and design are clean and well written, so it's time to kick some tires
Thanks for the info, I'll go try it out
Jordan Halterman
@kuujo
Mar 11 2016 02:18
:-D good luck! We're always (usually) around. Personally, I think documentation is very critical even if it’s sometimes hard to keep up. Most of the Copycat documentation is great and really accurate/relevant though.
Jean-François Im
@jfim
Mar 11 2016 02:19
It's great documentation actually, it looks polished and doesn't seem like an afterthought
Jordan Halterman
@kuujo
Mar 11 2016 10:08
@jfim FYI I just pushed new Catalyst and Copycat releases. Both are just various minor bug fixes, in serialization and in snapshotting in Copycat. Will be finishing up the website docs for Copycat this weekend, but in the meantime the Javadoc is pretty extensive and up-to-date. Also on that note, since you're working in Copycat you might be interested to know that all Atomix resources are also Copycat state machines. So, all of the Atomix resources (variables, collections, locks, leader election, etc) are just implementations of StateMachine with a nicer API, and they therefore also serve as real-world examples of state machines. The resource state machines can be found under the *.state.* packages throughout Atomix. Note, though, that most of the Atomix state machines use incremental log compaction by tracking the liveness of commits and releasing (closing) commits that no longer contribute to the state machine's state. But as I mentioned, Copycat optionally abstracts this away for Snapshottable state machines. Snapshotting is simpler to implement in a state machine, but tracking liveness allows for the replication optimizations I mentioned earlier. For the examples in Atomix, the LongState (DistributedLong's state machine) implements snapshotting, and the rest use incremental compaction. The process of incremental compaction is not yet described completely on the website, but perhaps the plethora of examples makes up for the lack of documentation. Even if parts of the docs are good now, the rest will continue to be developed until the full 1.0 release.
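
For readers who want a feel for what "tracking the liveness of commits" means before digging into the Atomix sources, here is a small Python sketch of a map state machine that closes a commit as soon as a later one replaces it. It is only an approximation under stated assumptions: `Commit`, `MapStateMachine`, and their methods are hypothetical stand-ins written for this example and do not reproduce Copycat's `StateMachine` API or the real Atomix map state.

```python
class Commit:
    """Hypothetical stand-in for a committed log entry handed to a state machine."""

    def __init__(self, index, key, value):
        self.index = index
        self.key = key
        self.value = value
        self.open = True

    def close(self):
        # Signals that this entry no longer contributes to state and may be compacted.
        self.open = False


class MapStateMachine:
    """Keeps the latest commit per key and closes whatever it replaces."""

    def __init__(self):
        self.live = {}  # key -> Commit currently backing that key

    def put(self, commit):
        previous = self.live.get(commit.key)
        self.live[commit.key] = commit
        if previous is not None:
            previous.close()  # the older put is now irrelevant to the map's state
            return previous.value
        return None

    def get(self, key):
        commit = self.live.get(key)
        return commit.value if commit is not None else None


first = Commit(1, "foo", 123)
second = Commit(2, "foo", 345)

machine = MapStateMachine()
machine.put(first)
machine.put(second)

print(machine.get("foo"), first.open, second.open)
```

A snapshot-based state machine would skip this per-commit bookkeeping and instead write out its whole state periodically, which is simpler to implement but gives up the entry-level liveness information that the replication optimization relies on.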