These are chat archives for atomix/atomix
If you were having issues with that code, it likely came from BalancingClusterManager, which was/is really a prototype of something we want to do with Atomix clusters but AFAIK doesn't currently work.
I'm really interested in those exceptions though. They're both illegal states that are likely caused by a broken thread model.
So, good questions on distributed locks. I can share a bit of what we've done for concurrency control, and all of this code is open source, so I can point you to it...
So, first on the question of the scalability of locks. Atomix, of course, has a DistributedLock resource backed by a Copycat state machine that does distributed locking. The state machine is pretty simple, but I actually don't even use it.
Scaling a pessimistic distributed lock is difficult. But since you're talking about millions of discrete objects, you can scale the locks by partitioning them, assuming locks on different objects can act independently of one another. The number of locks you can hold is limited only by the memory in your cluster, and partitioning lets you scale that memory, so each partition is limited only by its own memory. Partitioning also scales the throughput of pessimistic locks, but throughput is further limited by contention: when a client requests a lock that's already held by another process, it has to wait for a sequential unlock event when the lock is released, and that coordination between clients can be hugely expensive.
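To make the partitioning idea concrete, here's a minimal sketch of routing each object's lock to one of N independent partitions by hashing its key. The class and method names are illustrative, not Atomix APIs; in a real cluster each partition's lock would be a distributed lock backed by its own Raft partition, but a plain `ReentrantLock` stands in here just to show the routing.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: hash each object's key to one of N lock partitions so that
// locks on unrelated objects never contend with one another.
public class PartitionedLocks {
    private final ReentrantLock[] partitions;

    public PartitionedLocks(int numPartitions) {
        partitions = new ReentrantLock[numPartitions];
        for (int i = 0; i < numPartitions; i++) {
            partitions[i] = new ReentrantLock();
        }
    }

    // Route a key to its partition; keys in different partitions are independent.
    private ReentrantLock lockFor(String key) {
        return partitions[Math.floorMod(key.hashCode(), partitions.length)];
    }

    // Run a task while holding the lock for the given object key.
    public void withLock(String key, Runnable task) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        try {
            task.run();
        } finally {
            lock.unlock();
        }
    }
}
```

Since lock state is per-partition, adding partitions scales both the memory available for locks and the throughput of uncontended acquisitions.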
Anyways, in general you should try to avoid pessimistic locking in distributed systems if at all possible, and that's what we do. Raft provides all the primitives necessary for extremely efficient distributed locking when used correctly (logical clocks and atomic compare-and-set operations).
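The optimistic pattern that replaces a pessimistic lock looks roughly like this: every value carries a logical version, readers record the version they saw, and a write succeeds only via compare-and-set against that version. This is a hedged sketch with made-up names, using an in-process `AtomicReference` where a Raft state machine would apply the CAS server-side; nobody ever holds a lock, conflicting writers just retry.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of optimistic concurrency control: a value paired with a
// monotonically increasing logical version, updated only by CAS.
public class VersionedValue<V> {
    public static final class Versioned<T> {
        public final long version;
        public final T value;
        Versioned(long version, T value) {
            this.version = version;
            this.value = value;
        }
    }

    private final AtomicReference<Versioned<V>> ref =
        new AtomicReference<>(new Versioned<>(0, null));

    // Read the current value along with the version it was written at.
    public Versioned<V> read() {
        return ref.get();
    }

    // Write succeeds only if the value is still at the version the caller
    // read; a concurrent update bumps the version and forces a retry.
    public boolean compareAndSet(long expectedVersion, V newValue) {
        Versioned<V> current = ref.get();
        if (current.version != expectedVersion) {
            return false;
        }
        return ref.compareAndSet(current, new Versioned<>(current.version + 1, newValue));
    }
}
```

The win over a pessimistic lock is that uncontended writers pay one round trip and no one blocks; contention costs a retry rather than a queue of waiters.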
I actually just finished rearchitecting transactions for the project I work on (ONOS). It's a perfect case study in efficient concurrency control: it uses optimistic and pessimistic locking, caching, and two-phase commit to provide read committed, repeatable read, snapshot, and serializable isolation across multiple transactional objects in a partitioned Atomix cluster. I'd be happy to describe how that was implemented (which I intend to document anyways) and share the code (it's open source).
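The core of combining optimistic concurrency with two-phase commit can be sketched as follows. This is not the ONOS code, just an illustrative toy with hypothetical names: each transaction records the versions it read, and at commit time every participating partition validates those versions (prepare) before any partition applies writes (commit); if any partition sees a stale read, the whole transaction aborts.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of optimistic two-phase commit across partitions.
public class TwoPhaseCommit {
    static final class Partition {
        final Map<String, Long> versions = new HashMap<>();
        final Map<String, String> values = new HashMap<>();

        // Prepare: succeed only if every key this transaction read is
        // still at the version it observed.
        boolean prepare(Map<String, Long> readVersions) {
            for (Map.Entry<String, Long> e : readVersions.entrySet()) {
                if (versions.getOrDefault(e.getKey(), 0L) != e.getValue().longValue()) {
                    return false; // concurrent conflicting update: abort
                }
            }
            return true;
        }

        // Commit: apply the writes and bump each written key's version.
        void commit(Map<String, String> writes) {
            for (Map.Entry<String, String> e : writes.entrySet()) {
                values.put(e.getKey(), e.getValue());
                versions.merge(e.getKey(), 1L, Long::sum);
            }
        }
    }

    // Coordinator: every partition must prepare before any partition commits.
    static boolean commitAll(Map<Partition, Map<String, Long>> reads,
                             Map<Partition, Map<String, String>> writes) {
        for (Map.Entry<Partition, Map<String, Long>> e : reads.entrySet()) {
            if (!e.getKey().prepare(e.getValue())) {
                return false; // abort the whole transaction
            }
        }
        writes.forEach(Partition::commit);
        return true;
    }
}
```

A real implementation also has to make prepare durable and handle coordinator failure, but the version-check-then-commit shape is the essence of the optimistic isolation levels mentioned above.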