These are chat archives for atomix/atomix
removalListener, etc., but as I mentioned, expireAfterAccess would be extremely costly: it would require a write on every read just to replicate the last access time. Maybe the best way to do this would be via the anti-entropy protocol, in which case the cache is always local and eventually consistent, and therefore extremely efficient.
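For context, the options being discussed map onto Guava's local cache API. Here's a minimal local-only sketch (plain Guava, not an Atomix primitive) showing expireAfterAccess and removalListener; note the access timestamp only ever lives in local memory:

```java
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;

public class LocalCacheExample {
  public static void main(String[] args) {
    // A purely local Guava cache: expireAfterAccess just tracks the last
    // access time in local memory. In a replicated primitive, that timestamp
    // would have to be replicated, turning every read into a write.
    Cache<String, String> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .removalListener((RemovalListener<String, String>) notification ->
            System.out.println("evicted: " + notification.getKey()))
        .build();

    cache.put("key", "value");
    cache.getIfPresent("key"); // resets the access timer, but only locally
  }
}
```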
So, what you’re really referring to in Raft terms (which also applies to the primary-backup protocol) is a persistent state machine. This is actually a really challenging problem to solve correctly, which is why Atomix currently does nothing to support it.
The problem is that the state machine is populated from a history of operations written and replicated in the Raft log, and those operations are applied on several nodes. So, if you’re calling a REST API, each node will make the same call to the same REST API - n times for an n-node partition. Additionally, when a node crashes and restarts, the history in the Raft log is replayed, which means old calls to the file system or REST API are repeated.
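To illustrate the failure mode, here is a minimal sketch using a hypothetical state-machine interface (not an actual Atomix API, and the endpoint is made up): any external side effect in apply() runs once per replica, and again on every log replay.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical replicated state machine interface - not an Atomix API. */
interface StateMachine {
  /** Applies the operation committed at the given log index. */
  void apply(long index, String operation);
}

/** A naive persistent state machine with an external side effect. */
class NaiveRestStateMachine implements StateMachine {
  private final HttpClient client = HttpClient.newHttpClient();

  @Override
  public void apply(long index, String operation) {
    // This POST runs on every replica that applies the log entry - n times
    // for an n-node partition - and runs again whenever a restarted node
    // replays its log.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://example.com/api/things")) // hypothetical endpoint
        .POST(HttpRequest.BodyPublishers.ofString(operation))
        .build();
    client.sendAsync(request, HttpResponse.BodyHandlers.ofString());
  }
}
```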
This is compounded further when you introduce the problem of reconfiguring Raft partitions. Typically, when a Raft partition is reconfigured to add a new node, we send the latest snapshot and the remainder of the log to the new node. But in a persistent state machine, the snapshot is the persistent state itself - e.g. the files you’re writing on the file system or the state behind the REST API - so that is what would need to be sent to the new node instead of a state machine snapshot, and Atomix doesn’t currently provide an API for state machines to supply their own snapshots.
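To make the missing piece concrete, the hooks would need to look something like this hypothetical sketch - the interface and method names are invented for illustration; nothing like it exists in Atomix today:

```java
import java.io.InputStream;
import java.io.OutputStream;

/**
 * Hypothetical hooks a persistent state machine would need so a reconfigured
 * partition could transfer its externally held state to a new node. This
 * interface is NOT part of Atomix; it only illustrates the missing API.
 */
interface PersistentSnapshotSupport {
  /** Streams the external state (files, REST-backed state) as the snapshot. */
  void backup(OutputStream snapshot) throws Exception;

  /** Rebuilds the external state on a new node from a received snapshot. */
  void restore(InputStream snapshot) throws Exception;
}
```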
Some of this is avoidable. Raft and the primary-backup protocol provide monotonically increasing operation indexes. Those indexes can be used to deduplicate calls to an external REST API. But the reconfiguration of partitions is the biggest problem for services that access the file system.
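A rough sketch of that deduplication idea, again with invented names: pass the monotonic operation index to the external service as an idempotency key, so repeated applications of the same log entry collapse into one effective call.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical state machine interface - not an Atomix API. */
interface StateMachine {
  void apply(long index, String operation);
}

/** Uses the monotonic operation index to make the external call idempotent. */
class DedupingRestStateMachine implements StateMachine {
  private final HttpClient client = HttpClient.newHttpClient();

  @Override
  public void apply(long index, String operation) {
    // Send the operation index as an idempotency key. The external service
    // can then drop any request whose index it has already seen, whether it
    // arrived from another replica or from a node replaying its log.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://example.com/api/things")) // hypothetical endpoint
        .header("Idempotency-Key", Long.toString(index))
        .POST(HttpRequest.BodyPublishers.ofString(operation))
        .build();
    client.sendAsync(request, HttpResponse.BodyHandlers.ofString());
  }
}
```

Nothing comparable exists for the file system, since files have no equivalent dedup point on the receiving end.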
You can probably make it work for a REST API, but not for the file system... at least not correctly.
But Atomix is not designed to be used in this way, which is why support for persistent state machines has not been built into it. Instead, users should use primitives to build higher-level replication protocols. If you need to coordinate cluster-wide access to a shared resource, use a lock or leader election and use the ClusterCommunicationService to send operations to the leader. In ONOS we use a LeaderElector to elect multiple leaders - one per switch - to control switches and proxy operations through them.
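A rough sketch of that pattern, assuming Atomix 3-style APIs (cluster bootstrap is simplified and the exact ClusterCommunicationService overloads vary by version, so treat the details as illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;

import io.atomix.cluster.MemberId;
import io.atomix.core.Atomix;
import io.atomix.core.election.LeaderElection;
import io.atomix.core.election.Leadership;

public class LeaderProxyExample {
  public static void main(String[] args) {
    // Cluster bootstrap details omitted/simplified for brevity.
    Atomix atomix = Atomix.builder()
        .withMemberId("member-1")
        .withAddress("10.0.0.1:5679")
        .build();
    atomix.start().join();

    MemberId localId = atomix.getMembershipService().getLocalMember().id();

    // Elect a single coordinator for the shared resource.
    LeaderElection<MemberId> election =
        atomix.<MemberId>leaderElectionBuilder("resource-election").build();
    Leadership<MemberId> leadership = election.run(localId);
    MemberId leader = leadership.leader().id();

    // Every node registers a handler, but operations are only ever sent to
    // the elected leader, so only it touches the external resource.
    atomix.getCommunicationService().<String, String>subscribe(
        "resource-ops",
        bytes -> new String(bytes),
        op -> {
          // ...perform the operation against the file system / REST API...
          return "ok";
        },
        String::getBytes,
        Executors.newSingleThreadExecutor());

    // Any node proxies its operations to the current leader.
    CompletableFuture<String> reply = atomix.getCommunicationService().send(
        "resource-ops",
        "write:foo",
        String::getBytes,
        bytes -> new String(bytes),
        leader);
    reply.thenAccept(System.out::println);
  }
}
```

The partitions then only replicate the small election state, while the external resource is touched by exactly one node at a time.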
LeaderElector, and something like that is not easy in Raft because it’s designed to elect the leader with the most up-to-date information, not the leader in the best position. I’ve never been a fan of using Raft to replicate e.g. persistent databases. In fact, I found a horrible implementation of this type of architecture quite recently...
LeaderElection for that. Don’t rely on the Raft leader for anything. It’s only exposed for informational purposes.