These are chat archives for atomix/atomix

14th
Apr 2016
Roman Pearah
@neverfox
Apr 14 2016 02:03
@kuujo Does that mean you see 1.0.0 coming soon?
Roman Pearah
@neverfox
Apr 14 2016 02:30
@kuujo This is probably an obvious question, but if the holder of a lock gets partitioned from the system, what happens to the lock? And what's the strategy for dealing with a lock-holder that believes it has the lock but is really partitioned away (and the lock has been subsequently obtained by someone else), assuming that's even possible?
@kuujo I guess that's really a general question about what happens when things go wrong at the most inconvenient time, i.e. the callback, whether it's leader election, locks, etc.
Roman Pearah
@neverfox
Apr 14 2016 03:23
@kuujo Sorry, just found the "Detecting failures" section of the javadocs, so my question is answered.
Roman Pearah
@neverfox
Apr 14 2016 04:53
Ok, I started playing with rc4 and, unfortunately, when I get to the step of joining my second replica to the first bootstrapped replica, the future never completes. Any idea why that might happen? I'm using memory storage and NettyTransport, fwiw.
Roman Pearah
@neverfox
Apr 14 2016 05:21
I'm going to guess it's because I'm not doing it in separate processes.
Jordan Halterman
@kuujo
Apr 14 2016 06:58
Hey sorry I was at the Lakers game :-)
@neverfox one thing you might check is to ensure the replicas are using different log directories. If two replicas are configured with the same log directory they'll go haywire and behave very unpredictably since the second replica will read the first replica's configuration and log instead of creating its own configuration and log. This is a pretty common issue people have with running multiple replicas on the same machine.
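The per-replica directory advice above can be sketched as a small helper. This is a minimal illustration that assumes several replicas run in one JVM and that each one is given its own directory for disk storage. `ReplicaDirectories`, `forReplica`, and `checkUnique` are hypothetical names, not part of Copycat or Atomix, and the actual Copycat storage configuration call is deliberately not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a Copycat/Atomix API): one log directory per replica.
final class ReplicaDirectories {
    private ReplicaDirectories() {}

    // Derives e.g. <base>/localhost-5000 for the replica bound to localhost:5000
    // and creates the directory if it does not exist yet.
    static Path forReplica(Path base, String host, int port) {
        Path dir = base.resolve(host + "-" + port);
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dir;
    }

    // Fails fast if two replicas in the same process would share a directory.
    static void checkUnique(List<Path> dirs) {
        Set<Path> seen = new HashSet<>();
        for (Path dir : dirs) {
            if (!seen.add(dir.toAbsolutePath().normalize())) {
                throw new IllegalStateException("Log directory used by more than one replica: " + dir);
            }
        }
    }
}
```

Each path returned by `forReplica` would then be passed to that replica's own disk-storage configuration, so no replica can pick up another replica's configuration and log on startup.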
I did run the tests with the NettyTransport before the release, which effectively means running a cluster in a single process, so that should work fine.
On the topic of the release, if this RC goes well it will indeed be released. Realistically, I expect Copycat 1.0 will be released, while Atomix may need one more RC to get there since it's typically a bit behind Copycat in terms of stability.
Jordan Halterman
@kuujo
Apr 14 2016 07:05
On the topic of locks, when a lock holder is partitioned from the rest of the cluster, what will happen is its session will expire since it can't send keep-alives to the cluster. When the leader expires the client or replica's session, the lock state machine will get a notification via the expire(ServerSession) method and will release the lock and grant it to another session if there's one waiting for it.
The amount of time required to expire a client's session can be configured in Copycat, but I just realized it may not be exposed in Atomix, which is not good. In Copycat you can set the session timeout in the CopycatServer.Builder. When a client/replica registers a session, the leader through which the session is registered assigns a timeout to the client/replica's session. The client/replica (or rather the underlying CopycatClient) sends keep-alives at 1/2 the session timeout. If a leader doesn't hear from the client for a session timeout, it expires the session and all state machines can react to the expired session.
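As a rough model of the timing described above, the sketch below records the last keep-alive per session on the leader, expires sessions that have been silent for longer than the timeout, and derives the client keep-alive interval as half the timeout. `SessionTracker` and its methods are hypothetical. It takes caller-supplied millisecond timestamps so the behavior is deterministic, and it ignores how a replicated implementation gets every server to agree on the current time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, single-node model of leader-side session expiration.
final class SessionTracker {
    private final long sessionTimeoutMillis;
    private final Map<Long, Long> lastKeepAlive = new HashMap<>(); // session id -> last keep-alive time

    SessionTracker(long sessionTimeoutMillis) {
        this.sessionTimeoutMillis = sessionTimeoutMillis;
    }

    // Clients send keep-alives at half the session timeout.
    long keepAliveIntervalMillis() {
        return sessionTimeoutMillis / 2;
    }

    void register(long sessionId, long nowMillis) {
        lastKeepAlive.put(sessionId, nowMillis);
    }

    // Keep-alives for sessions that were already expired are ignored.
    void keepAlive(long sessionId, long nowMillis) {
        lastKeepAlive.computeIfPresent(sessionId, (id, last) -> nowMillis);
    }

    // Expires every session not heard from for longer than the timeout
    // and returns their ids so state machines can react (e.g. release locks).
    List<Long> expireStale(long nowMillis) {
        List<Long> expired = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = lastKeepAlive.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (nowMillis - entry.getValue() > sessionTimeoutMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```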
Sessions that are expired and sessions that are explicitly closed are treated the same way by the lock state machine. If a lock holder closes the resource or the Atomix client/replica, the lock will be released, and if the session expires the same will happen.
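The lock behavior described in the last few messages can be approximated by a toy state machine in which an explicit close and a session expiration go through the same release path, and the lock then passes to the next waiting session. `LockStateMachine` and its methods are hypothetical and only loosely mirror the `expire(ServerSession)` callback mentioned above; session IDs are plain `long`s here rather than Copycat `ServerSession` objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy lock state machine (not Atomix's implementation).
final class LockStateMachine {
    private Long holder;                                    // session holding the lock, or null
    private final Deque<Long> waiters = new ArrayDeque<>(); // waiting sessions, FIFO order

    // Grants the lock immediately if it is free, otherwise queues the session.
    boolean lock(long sessionId) {
        if (holder == null) {
            holder = sessionId;
            return true;
        }
        waiters.addLast(sessionId);
        return false;
    }

    // Releases the lock if this session holds it and hands it to the next waiter.
    void unlock(long sessionId) {
        if (holder != null && holder == sessionId) {
            holder = waiters.pollFirst(); // null when nobody is waiting
        }
    }

    // A session expiring and a session being closed are treated the same way.
    void expire(long sessionId) { release(sessionId); }
    void close(long sessionId) { release(sessionId); }

    private void release(long sessionId) {
        waiters.removeIf(id -> id == sessionId); // drop any pending request
        unlock(sessionId);
    }

    Long holder() { return holder; }
}
```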
Jordan Halterman
@kuujo
Apr 14 2016 07:12
If a client's session is expired but the client wasn't closed by the user, Atomix will register a new session for the client so operations can continue to be submitted. In that case, the lock will still be lost. Lock users can detect the loss of a lock by listening for state changes in the client, but I think a more obvious mechanism may be necessary. I've thought about adding something like a LockContext on which you can register a listener to detect when the lock is lost due to connection issues.
The new Copycat client documentation mentions a bit about how the client handles state changes, and Copycat's architecture documentation goes into more depth on how sessions are expired.
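The `LockContext` idea is only floated here, so the following is purely a hypothetical sketch of how such a listener might look. It assumes the client can report when it has registered a replacement session, and it fires the lock-loss listeners once when the session that acquired the lock is gone. None of these names exist in Atomix.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch of the proposed "LockContext"; not an existing Atomix API.
final class LockContext {
    private final long sessionId; // session that acquired the lock
    private final List<Consumer<String>> lossListeners = new CopyOnWriteArrayList<>();
    private volatile boolean held = true;

    LockContext(long sessionId) { this.sessionId = sessionId; }

    void onLockLost(Consumer<String> listener) { lossListeners.add(listener); }

    boolean isHeld() { return held; }

    // Assumed hook: called when the client registers a new session after the
    // old one expired. The lock tied to the old session is gone at that point.
    void sessionChanged(long newSessionId) {
        if (held && newSessionId != sessionId) {
            held = false;
            for (Consumer<String> listener : lossListeners) {
                listener.accept("session " + sessionId + " expired; lock was released");
            }
        }
    }
}
```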
Roman Pearah
@neverfox
Apr 14 2016 12:31
Hey sorry I was at the Lakers game
Excuses, excuses lol
Thanks. I guess with MEMORY storage, there was no real way to stop those collisions from happening? Could that have been my problem?
Richard Pijnenburg
@electrical
Apr 14 2016 12:41
@neverfox is that with instances launched via the same process or separate ?
Roman Pearah
@neverfox
Apr 14 2016 12:45
Same
In a Clojure REPL, in fact
Richard Pijnenburg
@electrical
Apr 14 2016 12:45
hmm. then memory storage might mess it up. but not sure.
you could try disk storage and make sure each instance has a unique dir
Roman Pearah
@neverfox
Apr 14 2016 12:46
I'll try files now that I'm more comfortable with what that means, basically a log.
Richard Pijnenburg
@electrical
Apr 14 2016 12:46
ah okay :-)
Jordan Halterman
@kuujo
Apr 14 2016 16:59
Hmm, that shouldn't happen with MEMORY storage at all.
grahamashby
@grahamashby
Apr 14 2016 19:33
We're seriously looking at using Atomix instead of ZooKeeper in our product. I realize that GA is coming soon, so that's OK. But my bosses are going to ask if anyone is using it in production, since they don't want to be on the bleeding edge. Anything I could tell them?
Jordan Halterman
@kuujo
Apr 14 2016 19:36
Hey...
Jordan Halterman
@kuujo
Apr 14 2016 19:50
I can never argue against ZooKeeper and its offspring like Consul. Atomix was heavily influenced by its success, and in particular the session events algorithm was modeled on it. Though Copycat and Atomix have had years of development effort put into them, including Jepsen testing and a lot of help from some really smart people, I don't expect them to be as stable as ZooKeeper any time soon. But I'm totally not answering your question... I know of several projects that are using Copycat and/or Atomix and have been for a long time, but I'm not up to date on the environments in which they are being used. @madjam works on one of those projects and has contributed a lot to Copycat's more intricate algorithms (log compaction and session events), and might have some thoughts. But IMO Copycat/Atomix are in a chicken-and-egg spot. We've been really hesitant about releasing it and calling it stable because it hasn't been widely adopted, but wide adoption depends on stability. A ton of progress has been made and will continue to be made this year, but my honest opinion on questions like these, for the time being, is this: I would certainly deploy it in my own production environment, and I am doing so, because I have all the domain knowledge in the world; but in terms of recommending it for production environments in general, I'll likely remain hesitant until I see more deployments :-)
Jonathan Halterman
@jhalterman
Apr 14 2016 19:51
@grahamashby ^ :)
Jordan Halterman
@kuujo
Apr 14 2016 19:51
Ahh yes that
But the API is considered stable and after a few more weeks of testing it will be released and called stable.
Richard Pijnenburg
@electrical
Apr 14 2016 19:54
I've been playing around with it a bit but not fully using it. But since I first started trying it out, it has made major improvements :) I'm happy to say I contributed small things to it, but the core team really knows what they are doing :)
Just my £0.02
Jonathan Halterman
@jhalterman
Apr 14 2016 19:57
@grahamashby To elaborate on @kuujo's response, we're working on an Atomix integration into some of the HPE Helion Eucalyptus infrastructure to replace some legacy JGroups-based coordination stuff, and while GA is nice, I recognize that integrations such as ours are part of the process of feeling confident about declaring a build GA. Most of the prerequisites are in place for that, except broad usage in the wild, and with a distributed system there is some value in that.
...getting some broad usage
Richard Pijnenburg
@electrical
Apr 14 2016 19:59
Since it's very specific software, I think it's harder to get traction and wide usage? That said, this is the best and most complete Raft library I've seen so far.
And I was looking for quite a while in different languages.
Richard Pijnenburg
@electrical
Apr 14 2016 20:06
So :thumbsup: to @kuujo @jhalterman and @madjam :smile:
grahamashby
@grahamashby
Apr 14 2016 20:11
Thanks all. See, the thing is we're using ZooKeeper for a small amount of configuration data in a distributed system. But ZK 3.4 cluster configuration is a beast. ZK 3.5 is similar to Atomix, although Atomix seems even more flexible. But 3.5 probably won't be GA for another year or two, if I read the tea leaves correctly.
Richard Pijnenburg
@electrical
Apr 14 2016 20:12
I’m horrible at reading tea leaves, mostly because I get my tea out of teabags :p
Sorry, trying to be funny :-)
Jordan Halterman
@kuujo
Apr 14 2016 20:41
Haha
Yeah one thing I can say about Copycat/Atomix is that it's the most complete Raft implementation I know of. It implements all aspects of the Raft algorithm and contributes algorithms back to Raft for log compaction and session events. Some of these algorithms are actually improvements on what ZooKeeper does. ZK's session events are not fault tolerant. If a client is switching servers it can miss events AFAICT. But Copycat's session events are fault tolerant and events missed by a client when switching servers will be resent when it reconnects to a new server. You can read about the algorithms in the architecture section in the Copycat docs. It also seems to perform about as well as Diego's implementation AFAICT, and Diego has also read the architecture literature and hasn't yet pointed out any flaws :-)
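The resend-on-reconnect behavior can be illustrated with a simplified per-session event queue: events are kept with sequence numbers until the client acknowledges them, and after a reconnect everything past the client's last received sequence number is sent again. This is an assumption-laden sketch, not Copycat's actual algorithm (the architecture docs describe that). In a Raft-based system the queue would live in replicated state so that whichever server the client reconnects to can do the resending. `SessionEventQueue` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified model of resending unacknowledged session events.
final class SessionEventQueue {
    private long nextSequence = 1;
    private final TreeMap<Long, String> pending = new TreeMap<>(); // sequence -> event

    // Publishing assigns a sequence number and keeps the event until it is acked.
    long publish(String event) {
        long sequence = nextSequence++;
        pending.put(sequence, event);
        return sequence;
    }

    // The client reports the highest sequence number it has received.
    void acknowledge(long upToSequence) {
        pending.headMap(upToSequence, true).clear();
    }

    // On reconnect (possibly to a different server), everything after the
    // client's last received event is resent, in order.
    List<String> resendAfter(long lastReceived) {
        acknowledge(lastReceived);
        return new ArrayList<>(pending.values());
    }
}
```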
Richard Pijnenburg
@electrical
Apr 14 2016 20:59
@kuujo been working a bit more on my project... In the current Logstash code they build up the actual Ruby code that runs the pipeline, including the conditionals. I’m guessing I’ll have to do something similar? Because with the current loops I have no idea how to implement the conditional logic. I’m also still looking for something to parse the LS config itself.
Jordan Halterman
@kuujo
Apr 14 2016 20:59
hmm not sure about the LS config
isn’t that a custom syntax?
Richard Pijnenburg
@electrical
Apr 14 2016 20:59
yeah
Jordan Halterman
@kuujo
Apr 14 2016 21:00
bah
Richard Pijnenburg
@electrical
Apr 14 2016 21:00
they use treetop to parse it into something more sensible
but no idea how to do that in java
Jordan Halterman
@kuujo
Apr 14 2016 21:00
it’s pretty simple though
Richard Pijnenburg
@electrical
Apr 14 2016 21:00
couldn’t find anything that looks like it
Jordan Halterman
@kuujo
Apr 14 2016 21:00
IIRC
Richard Pijnenburg
@electrical
Apr 14 2016 21:01
Also haven’t found anything that looks like their config methods for plugins, which define the settings for a plugin. Guess it’s an abstraction I’ll have to build, but no idea how.
Jordan Halterman
@kuujo
Apr 14 2016 21:01
Jackson is the de facto parsing library. I know you can write custom parsers but no idea how easy or hard it is
Richard Pijnenburg
@electrical
Apr 14 2016 21:01
hmm yeah
to be fair it’s sort of JSON-ish :p
Jordan Halterman
@kuujo
Apr 14 2016 21:02
yep
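Rather than looking for a Treetop equivalent, one option is a hand-written recursive-descent parser. The sketch below handles only a small, assumed subset of the Logstash config syntax: sections, plugins, `key => value` settings whose values are quoted strings, barewords/numbers, or arrays, plus `#` comments. It does not handle hashes or `if` conditionals, and `MiniConfigParser` is a hypothetical name rather than an existing library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for a small subset of Logstash-style config (not the real grammar):
//   section { plugin { key => value ... } ... } ...   (no hashes, no conditionals)
final class MiniConfigParser {
    static final class Plugin {
        final String name;
        final Map<String, Object> settings = new LinkedHashMap<>();
        Plugin(String name) { this.name = name; }
    }

    private final String src;
    private int pos;
    private MiniConfigParser(String src) { this.src = src; }

    // Returns section name -> plugins, both in declaration order.
    static Map<String, List<Plugin>> parse(String src) {
        MiniConfigParser p = new MiniConfigParser(src);
        Map<String, List<Plugin>> sections = new LinkedHashMap<>();
        p.skipWs();
        while (p.pos < src.length()) {
            List<Plugin> plugins = sections.computeIfAbsent(p.word(), k -> new ArrayList<>());
            p.expect('{');
            while (!p.tryConsume('}')) plugins.add(p.plugin());
            p.skipWs();
        }
        return sections;
    }

    private Plugin plugin() {
        Plugin plugin = new Plugin(word());
        expect('{');
        while (!tryConsume('}')) {
            String key = word();
            expect('=');
            expect('>');
            plugin.settings.put(key, value());
        }
        return plugin;
    }

    // A value is an array, a quoted string, or a bareword/number.
    private Object value() {
        if (tryConsume('[')) {
            List<Object> items = new ArrayList<>();
            while (!tryConsume(']')) {
                items.add(value());
                tryConsume(','); // optional separator
            }
            return items;
        }
        if (tryConsume('"')) {
            int start = pos;
            while (pos < src.length() && src.charAt(pos) != '"') pos++;
            if (pos >= src.length()) throw new IllegalArgumentException("Unterminated string");
            return src.substring(start, pos++); // pos++ skips the closing quote
        }
        return word();
    }

    // Barewords and numbers: letters, digits, '_', '-', '.'
    private String word() {
        skipWs();
        int start = pos;
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (!Character.isLetterOrDigit(c) && c != '_' && c != '-' && c != '.') break;
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("Expected a word at offset " + pos);
        return src.substring(start, pos);
    }

    private void expect(char c) {
        if (!tryConsume(c)) throw new IllegalArgumentException("Expected '" + c + "' at offset " + pos);
    }

    // Skips whitespace/comments, then consumes c if it is the next character.
    private boolean tryConsume(char c) {
        skipWs();
        if (pos >= src.length() || src.charAt(pos) != c) return false;
        pos++;
        return true;
    }

    // Skips whitespace and '#' comments that run to the end of the line.
    private void skipWs() {
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '#') { while (pos < src.length() && src.charAt(pos) != '\n') pos++; }
            else if (Character.isWhitespace(c)) pos++;
            else return;
        }
    }
}
```

Supporting conditionals would mean adding a rule for `if ... { }` blocks inside sections and returning a tree of nodes instead of flat plugin lists, which is where most of the difficulty discussed here would be.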
Richard Pijnenburg
@electrical
Apr 14 2016 21:02
the current ruby code is so confusing
really shouldn’t use that as a guide
I think the config parsing, and creating some kind of logic from it including the conditional stuff, is going to be extremely hard :-(
got a very basic pipeline working. with multiple inputs, multiple filters and a single output
Wish i could find someone with more java knowledge who would love to work with me on this
Jordan Halterman
@kuujo
Apr 14 2016 22:15
you’ll pick it up fast :-)
Richard Pijnenburg
@electrical
Apr 14 2016 22:16
I wish :) Too old to learn this stuff as fast as I used to.