These are chat archives for atomix/atomix

12th
Aug 2016
Jordan Halterman
@kuujo
Aug 12 2016 08:20
Code would indeed be nice. The very fact that a join is successful indicates that replication is working since joins are done through the same mechanism that controls state machine commands (the replicated log).
Jordan Halterman
@kuujo
Aug 12 2016 08:51
If state machine commands are not properly implemented this could certainly happen. For example, if the state machine does not support snapshotting, Copycat assumes the state machine will indicate when a command no longer contributes to the state of the system. In that case, if all the commands are released by the state machine then there's nothing to replicate.
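To illustrate the failure mode described above, here is a minimal sketch of a toy log, not the Copycat API. The names `ReleasedLogDemo`, `Entry`, `compact`, and `replay` are hypothetical. It assumes compaction simply drops every released entry and that a joining node rebuilds its state by replaying what remains.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model for illustration only; this is not the Copycat API.
class ReleasedLogDemo {
    // A log entry holding a "put" command. Released entries may be compacted away.
    static final class Entry {
        final String key;
        final String value;
        boolean released;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Incremental compaction (simplified): drop every entry the state machine released.
    static List<Entry> compact(List<Entry> log) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : log) {
            if (!e.released) {
                kept.add(e);
            }
        }
        return kept;
    }

    // A joining node rebuilds its state by replaying whatever is left in the log.
    static Map<String, String> replay(List<Entry> log) {
        Map<String, String> state = new HashMap<>();
        for (Entry e : log) {
            state.put(e.key, e.value);
        }
        return state;
    }

    // Builds a two-entry log, optionally releases every entry, then compacts and replays.
    static Map<String, String> stateSeenByNewNode(boolean releaseEverything) {
        List<Entry> log = new ArrayList<>();
        log.add(new Entry("a", "1"));
        log.add(new Entry("b", "2"));
        if (releaseEverything) {
            for (Entry e : log) {
                e.released = true;
            }
        }
        return replay(compact(log));
    }

    public static void main(String[] args) {
        System.out.println(stateSeenByNewNode(true));  // every command released, no snapshot
        System.out.println(stateSeenByNewNode(false)); // commands retained in the log
    }
}
```

With everything released and no snapshot to fall back on, the new node has nothing left to replay, which matches the symptom of a successful join with no replicated state.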
James Watson
@JPWatson
Aug 12 2016 15:13
@kuujo what do you do if you have a state machine where a command always contributes to the state of a system? as a trivial example, an Increment command.
Jordan Halterman
@kuujo
Aug 12 2016 17:04
@JPWatson that's actually a great example, and that's exactly why snapshotting is implemented. In Atomix, the DistributedLong state machine uses snapshots, but most of the other state machines use the incremental compaction algorithm. The two just have to be used wisely. It would be insane to store e.g. a million commands to arrive at a 64-bit number. Might as well just store the 64-bit number. Storing and replicating snapshots just adds some overhead that isn't present in the incremental compaction algorithm. There's some chance that if a large snapshot is being stored or replicated it could cause a pause in the system. There are some ways Copycat could get around this, e.g. copying the state machine memory before taking a snapshot so commands can keep being applied, but I've been reluctant to do any of that.
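The counter trade-off can be sketched in plain Java. This is a hedged illustration with hypothetical names (`CounterSnapshotDemo`, `recoverFromLog`, `recoverFromSnapshot`), not how DistributedLong is actually written: recovering by replaying every increment and recovering from a single stored 64-bit value plus a short tail give the same result.

```java
// Hypothetical toy state machine for illustration only; this is not Atomix's DistributedLong.
class CounterSnapshotDemo {
    // The entire state of the machine is one 64-bit number.
    private long value;

    void applyIncrement() {
        value++;
    }

    // Taking a snapshot just captures the current number.
    long snapshot() {
        return value;
    }

    // Installing a snapshot replaces the local state with the stored number.
    void install(long snapshot) {
        value = snapshot;
    }

    long value() {
        return value;
    }

    // Recovery without snapshots: replay every logged increment.
    static long recoverFromLog(int loggedIncrements) {
        CounterSnapshotDemo sm = new CounterSnapshotDemo();
        for (int i = 0; i < loggedIncrements; i++) {
            sm.applyIncrement();
        }
        return sm.value();
    }

    // Recovery with a snapshot: install one long, then replay only the entries after it.
    static long recoverFromSnapshot(long snapshot, int tailIncrements) {
        CounterSnapshotDemo sm = new CounterSnapshotDemo();
        sm.install(snapshot);
        for (int i = 0; i < tailIncrements; i++) {
            sm.applyIncrement();
        }
        return sm.value();
    }

    public static void main(String[] args) {
        long replayed = recoverFromLog(1_000_000);
        long restored = recoverFromSnapshot(999_990L, 10);
        System.out.println(replayed == restored);
    }
}
```

In this sketch the snapshot path touches 10 log entries instead of a million, at the cost of having to store and ship the snapshot itself.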
Mahidhar Rajala
@rmahidhar_twitter
Aug 12 2016 20:24
@kuujo I was releasing commands without snapshotting support. Removing release fixed the issue. Thanks for the quick help.
James Watson
@JPWatson
Aug 12 2016 21:05
@kuujo in https://github.com/atomix/copycat/blob/master/server/src/main/java/io/atomix/copycat/server/Snapshottable.java#L50, the increment command is being closed, is that right? is a 'closed' commit still replicated to a node that joins cluster?
Jordan Halterman
@kuujo
Aug 12 2016 21:34
@JPWatson so, if a state machine is Snapshottable and the command's compaction mode is DEFAULT (isn't changed to something else) Copycat will ensure it gets replicated as necessary. When a state machine is Snapshottable, Copycat assumes all commands will be stored in the snapshot when one is taken. That means it will replicate the commands to a majority of the cluster, and when a snapshot is taken they will be removed from disk. If a snapshot is taken after a command is applied and a follower is dead or is lagging far behind and never received the command, the leader will automatically send its snapshot to that follower. So, state from Snapshottable commands will eventually make it to every node whether that be through replication of the command or replication of a snapshot that was taken after that command. If all state machines need to see a command there is a way to do that as well.
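The catch-up behavior described above can be sketched as a toy model. All names here (`SnapshotCatchUpDemo`, `Leader`, `Node`, `catchUp`) are hypothetical and the logic is a simplification, not Copycat's implementation: a follower whose last applied index is behind the leader's snapshot gets the snapshot first, then any log entries after it.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model for illustration only; this is not the Copycat API.
class SnapshotCatchUpDemo {
    static final class Node {
        long value;     // counter state
        long lastIndex; // index of the last applied entry (0 = nothing applied)
    }

    static final class Leader {
        final TreeMap<Long, Long> log = new TreeMap<>(); // index -> amount to add
        final Node state = new Node();
        long snapshotIndex; // last index covered by the snapshot (0 = no snapshot)
        long snapshotValue; // counter state at snapshotIndex

        // Append a command to the log and apply it locally.
        void append(long amount) {
            long index = state.lastIndex + 1;
            log.put(index, amount);
            state.value += amount;
            state.lastIndex = index;
        }

        // Capture the current state and remove the log entries it covers.
        void takeSnapshot() {
            snapshotIndex = state.lastIndex;
            snapshotValue = state.value;
            log.headMap(snapshotIndex, true).clear();
        }

        // Bring a follower up to date: snapshot first if its entries were compacted away.
        void catchUp(Node follower) {
            if (follower.lastIndex < snapshotIndex) {
                follower.value = snapshotValue;
                follower.lastIndex = snapshotIndex;
            }
            for (Map.Entry<Long, Long> e : log.tailMap(follower.lastIndex, false).entrySet()) {
                follower.value += e.getValue();
                follower.lastIndex = e.getKey();
            }
        }
    }

    // Returns { follower value, follower last index, entries left in the leader's log }.
    static long[] demo() {
        Leader leader = new Leader();
        Node lagging = new Node(); // never received any command
        leader.append(5);
        leader.append(7);
        leader.takeSnapshot();     // entries 1 and 2 are removed from the log
        leader.append(1);
        leader.catchUp(lagging);
        return new long[] { lagging.value, lagging.lastIndex, leader.log.size() };
    }

    public static void main(String[] args) {
        long[] result = demo();
        System.out.println(result[0] + " " + result[1] + " " + result[2]);
    }
}
```

The lagging node ends up with the same state as the leader even though the first two commands no longer exist in the log, because their effect arrived through the snapshot.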