These are chat archives for atomix/atomix

10th Mar 2016
Jonathan Halterman
@jhalterman
Mar 10 2016 02:35
@joachimdb distributed lock was just moved :) http://atomix.io/atomix/docs/concurrency/
pardon the disruption
that was due to the need to separate DistributedGroup out into a separate package, so we moved DistributedLock as well. That's about the end of moving things around for the 1.0 release.
Joachim De Beule
@joachimdb
Mar 10 2016 09:59
Question: what's the best (cheapest) way to test whether a DistributedLock is locked? I was thinking of tryLock but wasn't sure. Is there something like onLock(lock -> {...})?
also, tryLock potentially actually acquires the lock, whereas I just want to know whether it's held
Joachim De Beule
@joachimdb
Mar 10 2016 10:09
nm, got it (onStateChange)
Joachim De Beule
@joachimdb
Mar 10 2016 10:23
grmbl, doesn't seem to work after all: I don't get to see a state change on lock events. Should I?
Joachim De Beule
@joachimdb
Mar 10 2016 12:45
It would be helpful to have an overview of supported state changes per distributed resource class. A couple more things: (1) calling close on a replica or client never completes, even after closing all distributed resources. Any idea why? (2) putting a Clojure keyword in a DistributedMap never completes, not even exceptionally. It does, however, render the map unusable (subsequent gets and puts, even of core Java types, never complete)...
Joachim De Beule
@joachimdb
Mar 10 2016 13:26
Regarding DistributedTopic: there's a difference between the website doc at http://atomix.io/atomix/docs/messaging/ and the class doc at http://atomix.io/atomix/api/latest/io/atomix/messaging/DistributedTopic.html. The former doesn't mention subscribing and suggests using onMessage to receive messages, which doesn't work. The latter says to handle messages via a subscribe callback, and that does work.
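For reference, the subscribe-based pattern the class Javadoc describes looks roughly like this; a minimal sketch in which the getTopic accessor, its future-returning signature, and the topic name are assumptions based on the 1.0-era API rather than verified calls:

```java
import io.atomix.messaging.DistributedTopic;

// Assuming `atomix` is an open AtomixClient or AtomixReplica.
DistributedTopic<String> topic = atomix.<String>getTopic("events").join();

// The pattern that works: register a subscriber callback.
topic.subscribe(message -> System.out.println("received: " + message));

// Messages published after the subscription arrive at the callback.
topic.publish("hello");
```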
Jordan Halterman
@kuujo
Mar 10 2016 17:28
Awesome...
On the topic of Resource.State, see the Javadoc: http://atomix.io/atomix/api/latest/io/atomix/resource/Resource.State.html The resource state represents the resource's (and the underlying client's) ability to talk to the cluster. It reflects the connection state, not the state of the state machine. This is the same thing Curator does for ZooKeeper. When the client is able to communicate with the cluster, its state will be
CONNECTED. If the client can't reach the cluster, it will be SUSPENDED, indicating some linearizability guarantees may be lost. If it's disconnected for too long, its session can be CLOSED and it will reopen a new session.
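Watching that connection state with the onStateChange listener mentioned above might look like the following sketch (the reactions in the comments are illustrative, not prescribed by the API):

```java
// Assuming `lock` is an open DistributedLock (or any other Resource).
lock.onStateChange(state -> {
  switch (state) {
    case CONNECTED:
      // Able to talk to the cluster; normal operation.
      break;
    case SUSPENDED:
      // Communication lost; linearizability guarantees (including any
      // held lock) may be lost. Pause lock-protected work here.
      break;
    case CLOSED:
      // The session expired; a new session will be opened, but state
      // tied to the old session (like lock ownership) is gone.
      break;
  }
});
```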
Jordan Halterman
@kuujo
Mar 10 2016 17:35
There is no way to determine if a lock is locked. We could add an isLocked method, but the problem is a lock can be locked by multiple threads or even by the same thread multiple times, and there's no context to represent that. If thread 1 on client A calls lock, and then thread 2 on client A calls lock, the lock will be acquired once, and once it's released it will be acquired again by the same process. This allows multiple threads to access the same lock. So, isLocked would not really be accurate: thread 2 could call isLocked and it could return true even though thread 1's call is actually the one that holds the lock. We'd have to add something like a LockFuture to provide context to specific lock calls, which is fine.
Jordan Halterman
@kuujo
Mar 10 2016 17:42
The way lock checks are supposed to be done is through the lock token. This is the only safe way to check a lock in a distributed system. The reason an isLocked method would not be entirely safe is that, in theory, a process could request and acquire a lock, but a process pause could occur between acquiring the lock and acting on it. Once the lock future is completed, that indicates the session was granted the lock, and you can use that context to say the lock is acquired. But if e.g. a long GC pause occurs immediately thereafter, the session could expire and another process could be granted the lock. You could call isLocked to check the lock again, but you'd still have the same potential problem: two nodes could believe themselves to hold the lock by the time the isLocked response is received. Instead, the only completely safe way to check whether a lock is acquired is to use the token with which the CompletableFuture is completed when the lock is acquired. When the lock holder interacts with whatever distributed resource it's locking, it uses the token to ensure no more recent process has acquired the lock and interacted with that resource. Essentially, you're doing version control: the lock with the highest token wins, and a lock holder that fails because another holder accessed the same resource with a higher token can safely assume its lock has been lost.
So, the reason there is no isLocked method is that it doesn't give you any information you don't already have. You know that when a lock call completes, the process has acquired the lock. isLocked will not give you any stronger guarantee than that, but monitoring the DistributedLock for a state change to SUSPENDED can suggest the lock may have been lost.
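A sketch of that token-based check, assuming lock() completes its future with the token as described above (the store and its reject-lower-tokens behavior are hypothetical, not Atomix API):

```java
// Acquire the lock; the future completes with the lock token.
lock.lock().thenAccept(token -> {
  // Pass the token with every write to the protected resource. The
  // store (hypothetical) must reject any write carrying a token lower
  // than the highest token it has already seen, so a stale holder
  // whose session expired cannot clobber a newer holder's writes.
  store.write("key", "value", token);
});
```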
On the topic of closing a client/replica, I'd like a reproducer for that.
Jordan Halterman
@kuujo
Mar 10 2016 17:48
Replicas and clients are closed repeatedly throughout all the tests (opened before and closed after every test), but I'll try to see what you're seeing
Joachim De Beule
@joachimdb
Mar 10 2016 17:48
would atomix/atomix#154 do?
Jordan Halterman
@kuujo
Mar 10 2016 17:49
Probably. I'll check it out, sorry :-) responding to these first
Joachim De Beule
@joachimdb
Mar 10 2016 17:50
np, great responses, and thank you for them! I was planning to add an example where closing fails with more than 2 replicas btw, but maybe I'll await your response to that issue then
Jordan Halterman
@kuujo
Mar 10 2016 17:50
On the topic of Clojure keywords, this seems like it could just be a serialization issue. By default, the serializer rejects serializing classes that aren't explicitly registered, for security reasons. We've seen cases in the past where things are serialized in e.g. Netty threads and exceptions get caught and suppressed. This probably indicates we need to add some try-catch blocks to make sure futures are failed properly.
Yeah... That could be good. You should be able to close a client/replica without closing any resources. The way state machines work internally is that when a client's session is closed, all its resources are automatically closed. If that's not working, that's for sure a bug
This is a huge help! Thanks a lot
Joachim De Beule
@joachimdb
Mar 10 2016 17:54
well, closing or not closing resources, it didn't make a difference for me, so it's probably something else... and it sounds like it might be nice to add support for custom serialization?
np, I just want to be able to use a decent distributed-platform Java lib fast :D
Jordan Halterman
@kuujo
Mar 10 2016 17:56
There is support. It's just not well documented yet. You can register custom serializers and custom serialization frameworks. Catalyst packages Jackson and Kryo serializers and serializes Serializable, Externalizable, et al. natively. The security thing is just a little annoying, but it also serves to improve performance: forcing users to register types lets them associate a numeric type ID so we don't have to serialize class names.
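Registration might look roughly like this; a sketch in which the serializer() accessor and the exact register overloads are recalled from the Catalyst/Atomix 1.0-era API and should be checked against the Javadoc (MyEvent, MyOtherType, and MyOtherTypeSerializer are hypothetical):

```java
import io.atomix.catalyst.serializer.Serializer;

// Assuming `atomix` exposes its Catalyst Serializer.
Serializer serializer = atomix.serializer();

// Register a type with a numeric ID; the same ID must be registered on
// every node, and it avoids writing class names on the wire.
serializer.register(MyEvent.class, 100);

// Or register a custom TypeSerializer for types Catalyst can't handle
// natively.
serializer.register(MyOtherType.class, 101, MyOtherTypeSerializer.class);
```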
Joachim De Beule
@joachimdb
Mar 10 2016 18:02
great, I might look into that. It would be really cool if Clojure (edn) data types were supported by default... btw, when can we expect the next release on Maven Central?
Jordan Halterman
@kuujo
Mar 10 2016 18:13
I think we should still add something like a LockContext to check the state of a lock. It’s still useful even if it’s not 100% safe. Often it’s enough to assume some arbitrary pause is not going to occur.
this weekend for sure
I’m going to release Catalyst and Copycat today, then Atomix once the last couple PRs are merged, and we can look into these issues
Jean-François Im
@jfim
Mar 10 2016 21:29
I was reading the Copycat docs today, great documentation :)
I had a question though: is there a way to know the depth of log entries still to be processed by the state machine when a replica is catching up to the leader?
Jean-François Im
@jfim
Mar 10 2016 21:37
From reading the docs, it doesn't look like it's possible, though it could be I'm just looking at this wrong
Jean-François Im
@jfim
Mar 10 2016 21:45
Ah, I guess I could send a no-op query to the cluster and check the difference between the index applied by the local state machine and the leader's commit index
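As a rough sketch of that idea, not existing Copycat API: NoOpQuery and its state-machine handler are hypothetical (the handler would simply return the index of the commit at which the query is applied), and localAppliedIndex stands in for however the replica exposes its last applied index:

```java
import io.atomix.copycat.Query;

// Hypothetical marker query; the state machine would define a handler
// for it that returns the index at which the query is applied.
public class NoOpQuery implements Query<Long> {
}

// Client side: the returned index approximates the leader's commit
// index; comparing it with what this replica has applied gives the
// catch-up depth.
client.submit(new NoOpQuery()).thenAccept(leaderIndex ->
    System.out.println("entries behind: " + (leaderIndex - localAppliedIndex)));
```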