These are chat archives for atomix/atomix

5th
Jan 2017
Jordan Halterman
@kuujo
Jan 05 2017 10:09 UTC
Essentially, we use Jepsen to test Atomix under various types of network partitions. The Jepsen tests spin up a few Docker containers, and Jepsen runs a few clients. The clients submit various reads and writes to a DistributedValue while Jepsen creates partitions in the network. Jepsen then checks that the responses received by the clients represent a linearizable history. This bug made it possible for two different entries to be applied at the same logical time on different servers. Jepsen should have caught that inconsistency at some point in the tests after forcing a leader change and submitting reads to different servers, but I think we just haven't run them enough. They need to become part of the normal development workflow and be run very frequently to catch rare bugs like this.
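
For illustration, the client side of that workload looks roughly like this - a minimal sketch assuming the 1.0-era AtomixClient API, with the address, port, and resource name made up:

```java
import io.atomix.AtomixClient;
import io.atomix.catalyst.transport.Address;
import io.atomix.catalyst.transport.netty.NettyTransport;
import io.atomix.variables.DistributedValue;

public class JepsenStyleWorkload {
  public static void main(String[] args) {
    // Connect a client to one of the Dockerized servers (address made up).
    AtomixClient client = AtomixClient.builder()
        .withTransport(new NettyTransport())
        .build();
    client.connect(new Address("10.0.0.1", 8700)).join();

    // The workload: reads and writes against a single DistributedValue.
    DistributedValue<Long> value = client.<Long>getValue("register").join();
    value.set(1L).join();
    Long read = value.get().join();

    // Jepsen records each operation's invocation and completion across all
    // clients, then checks that the combined history is linearizable.
    System.out.println("read: " + read);
  }
}
```
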
Kevin Daly
@kedaly
Jan 05 2017 13:00 UTC
Just a quick question about the status of Atomix: the Maven repo has been at RC9 since June... are there any plans for bug fixes or a release?
Tyler Foster
@tfoster
Jan 05 2017 18:41 UTC
Hey everyone! I'm just getting started with atomix, great work so far guys.
I'm seeing pretty heavy CPU utilization when polling a DistributedQueue. I've seen a bunch of discussion on DistributedWorkQueue, which will be awesome.
I also saw some stuff about using BlockingQueue semantics, which would be a great fit for how I'm trying to use the DistributedQueue. But what's the current best practice for continuously consuming a DistributedQueue? Continually polling seems wicked expensive from what I can see.
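
For reference, the busy-poll pattern in question looks roughly like this - a sketch in which the queue name is made up and process() stands in for the real handler:

```java
import io.atomix.Atomix;
import io.atomix.collections.DistributedQueue;

// Each poll() is a command that goes through the Raft leader, is written
// to disk, and is replicated - even when the queue is empty, so this loop
// generates a continuous stream of writes.
void consumeByPolling(Atomix atomix) {
  DistributedQueue<String> queue = atomix.<String>getQueue("tasks").join();
  while (true) {
    String item = queue.poll().join(); // one replicated write per iteration
    if (item != null) {
      process(item);                   // stand-in for the real handler
    }
  }
}
```
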
Jordan Halterman
@kuujo
Jan 05 2017 18:47 UTC
@kedaly there will be a new release within the next couple of days. The master branch of Atomix has some important bug fixes in it already - mostly for the leader election/group membership resource - and we’re testing several bug fixes in Copycat today. Once I get some feedback on that testing, all three projects will be released
Kevin Daly
@kedaly
Jan 05 2017 18:48 UTC
@kuujo Thanks a lot... I'm hoping to have some resources to commit to Atomix in the near future...
Jordan Halterman
@kuujo
Jan 05 2017 18:49 UTC
Awesome! Always need more help :-)
Jordan Halterman
@kuujo
Jan 05 2017 19:47 UTC

As for queues, yeah, continually polling will be expensive, which certainly limits their usefulness. The problem is that polling requires a write - since removing an item from the queue modifies the state machine's state - which means a write to disk and replication of an entry. That's crazy. We've had implementations of queues that push to clients, but they've never been finalized because there are several ways to go about it, and a decision just hasn't been made on the right one.

Currently, the closest thing Atomix has to an efficient queue is using the messaging feature in DistributedGroup. That messaging does not require any polling - messages are pushed to clients via the events system - so one write to add a message and one write to remove a message is all that's required. But I don't believe that particular API satisfies all potential use cases. I think we do want to implement a standalone work queue resource or something like that.
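
For reference, that pattern looks roughly like this - a sketch assuming the 1.0-era group messaging API (the group and queue names are made up, and method shapes may differ from the released version):

```java
import io.atomix.Atomix;
import io.atomix.group.DistributedGroup;
import io.atomix.group.LocalMember;
import io.atomix.group.messaging.MessageConsumer;
import io.atomix.group.messaging.MessageProducer;

void workQueueViaGroup(Atomix atomix) {
  DistributedGroup group = atomix.getGroup("workers").join();

  // Consumer side: join the group and have messages pushed via session
  // events; ack() is the single write that removes the message.
  LocalMember member = group.join().join();
  MessageConsumer<String> consumer = member.messaging().consumer("tasks");
  consumer.onMessage(message -> {
    process(message.message());   // stand-in for the real handler
    message.ack();
  });

  // Producer side: a single write enqueues the message, which is then
  // pushed to a consumer rather than polled for.
  MessageProducer<String> producer = group.messaging().producer("tasks");
  producer.send("task-1");
}
```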

The open questions largely have to do with ordering guarantees, how to handle multiple consumers, etc. I think if ordering is not a concern then it should be easy to implement.

If we can decide on the particular use case and API then implementing an efficient queue that doesn't require any polling should be trivial.

Jordan Halterman
@kuujo
Jan 05 2017 19:54 UTC
I'm actually starting to work on an HTTP API for Atomix and Copycat which will be the second incarnation of those projects, but I'd like to get better resources into Atomix so they can mature before that HTTP API is done. I may take a stab at a basic queue that's better for distributing work. I may just be able to add new event methods to DistributedQueue for now. Using session events it would be much, much more efficient. The question is, do consumers need to process queue items in sequential order? If so, that may mean pushing an item to a client and waiting for an ack before distributing the next item.
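
For a sense of what that might look like on the client: a hypothetical sketch - onAdd() does not exist in the released DistributedQueue API, and process() is a stand-in:

```java
import io.atomix.Atomix;
import io.atomix.collections.DistributedQueue;

void consumeViaEvents(Atomix atomix) {
  DistributedQueue<String> queue = atomix.<String>getQueue("tasks").join();
  // Hypothetical session-event listener: one poll per notification
  // instead of a constant polling loop.
  queue.onAdd(event -> {
    queue.poll().thenAccept(item -> {
      if (item != null) {
        process(item);   // stand-in for the real handler
      }
    });
  });
}
```
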
Tyler Foster
@tfoster
Jan 05 2017 19:59 UTC
That's a good question. In my case an order guarantee wouldn't be a requirement, but there are definitely cases where order would need to be guaranteed.
Acks would be pretty chatty on large clusters
Jordan Halterman
@kuujo
Jan 05 2017 20:00 UTC
If order is not a requirement then clients can ack batches of queue items
Tyler Foster
@tfoster
Jan 05 2017 20:03 UTC
Yeah, that's true. Would it be too unfair (from a workload distribution perspective) to designate a leader for the queue that orchestrates distribution of items?
Tyler Foster
@tfoster
Jan 05 2017 20:09 UTC
You could probably hit 80% of use cases by just having a blocking-poll / continuous-poll method that uses a backing thread to call a Consumer when an item is available (roughly as sketched below), then worry about ordering and such in a later implementation.
Or using messages as you said
I can take a crack at the implementation if you want
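
One possible shape for that backing-thread approach - a sketch using only the JDK plus DistributedQueue.poll(); the 100 ms delay is arbitrary:

```java
import io.atomix.collections.DistributedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class QueuePoller {
  private final ScheduledExecutorService poller =
      Executors.newSingleThreadScheduledExecutor();

  // Drain whatever is available, hand each item to the consumer, then
  // back off instead of spinning on an empty queue.
  public void consume(DistributedQueue<String> queue, Consumer<String> consumer) {
    poller.scheduleWithFixedDelay(() -> {
      String item = queue.poll().join();
      while (item != null) {
        consumer.accept(item);
        item = queue.poll().join();
      }
    }, 0, 100, TimeUnit.MILLISECONDS);
  }
}
```
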
Jordan Halterman
@kuujo
Jan 05 2017 20:11 UTC
I think you're right about that
Jordan Halterman
@kuujo
Jan 05 2017 20:17 UTC

Have at it. Basically all we need is:

  • A command to register a consumer
  • A command to unregister a consumer
  • Methods on QueueState to add/remove those listeners
  • Add session.publish("item") when an item is added
  • Register an item event listener in the DistributedQueue

The easiest implementation would just be doing that, so the client gets notified when an item is added to the queue. Then clients can poll to get the item. There are all sorts of problems with this, though. When a client polls a queue, the item is removed before the response is received. If the client gets disconnected after an item is removed from the queue, then the item is lost. So this is basically at-most-once processing, whereas using acks would get at-least-once processing.
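
A sketch of the state-machine side of that list, assuming Copycat 1.x's StateMachine/Commit/ServerSession APIs; RegisterConsumer and UnregisterConsumer are hypothetical command classes, not actual Atomix types:

```java
import io.atomix.copycat.Command;
import io.atomix.copycat.server.Commit;
import io.atomix.copycat.server.StateMachine;
import io.atomix.copycat.server.StateMachineExecutor;
import io.atomix.copycat.server.session.ServerSession;
import java.util.HashSet;
import java.util.Set;

public class QueueEventsState extends StateMachine {
  public static class RegisterConsumer implements Command<Void> {}
  public static class UnregisterConsumer implements Command<Void> {}

  private final Set<ServerSession> consumers = new HashSet<>();

  @Override
  protected void configure(StateMachineExecutor executor) {
    executor.register(RegisterConsumer.class, this::registerConsumer);
    executor.register(UnregisterConsumer.class, this::unregisterConsumer);
  }

  private void registerConsumer(Commit<RegisterConsumer> commit) {
    consumers.add(commit.session());
    commit.close();
  }

  private void unregisterConsumer(Commit<UnregisterConsumer> commit) {
    consumers.remove(commit.session());
    commit.close();
  }

  // Called wherever the add-item command is applied: publish a session
  // event so registered clients know to poll for the new item.
  private void notifyItemAdded() {
    for (ServerSession session : consumers) {
      session.publish("item");
    }
  }
}
```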

Hmm
I'd rather do that than any polling
But...
Spent too much time on events to not use them :-)
It may also suffice just to add change events to DistributedQueue first. I actually already have a lot of that code written, and it will be in the next release
That is, an event that says an item is available, and the user can poll for it
Tyler Foster
@tfoster
Jan 05 2017 20:36 UTC
I get what you're saying about the lack of reliable delivery, which is completely applicable when someone wants to implement a resilient queue, but that's not really the responsibility of a queue primitive.
If you have a blocking queue and a thread takes an item and then the process crashes, you have the same result.
It would likely make sense to have a distributed resilient queue / distributed work queue at some point, but I don't think it's needed for a lot of use cases
Adding an onChange event would probably be good enough
Tyler Foster
@tfoster
Jan 05 2017 20:41 UTC
For the distributed resilient queue, you could probably use something similar to the semantics of an ephemeral node in ZK
So when an item is added to the queue, it's distributed to all nodes in the cluster. When a consumer "takes" the item, you just put an ephemeral lock on the item in the queue, which gets released if communication with the taker is lost. When the work completes, the consumer acks and the item is deleted from all nodes.
If communication with the taker is lost, or processing the item errors out (and the taker notifies the other nodes that it errored out), the lock would be released and the item given to the next taker.
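
A sketch of those ephemeral-lock semantics as a state machine, again assuming Copycat 1.x APIs; Take and Ack are hypothetical commands, and a real implementation would also need the failed-work queue discussed below:

```java
import io.atomix.copycat.Command;
import io.atomix.copycat.server.Commit;
import io.atomix.copycat.server.StateMachine;
import io.atomix.copycat.server.StateMachineExecutor;
import io.atomix.copycat.server.session.ServerSession;
import io.atomix.copycat.server.session.SessionListener;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class WorkQueueState extends StateMachine implements SessionListener {
  public static class Take implements Command<String> {}
  public static class Ack implements Command<Void> {}

  private final Queue<String> pending = new ArrayDeque<>();
  private final Map<Long, String> inFlight = new HashMap<>(); // session id -> locked item

  @Override
  protected void configure(StateMachineExecutor executor) {
    executor.register(Take.class, this::take);
    executor.register(Ack.class, this::ack);
  }

  // "Take" locks the next item to the calling session instead of deleting it.
  private String take(Commit<Take> commit) {
    String item = pending.poll();
    if (item != null) {
      inFlight.put(commit.session().id(), item);
    }
    commit.close();
    return item;
  }

  // "Ack" releases the lock and deletes the item for good.
  private void ack(Commit<Ack> commit) {
    inFlight.remove(commit.session().id());
    commit.close();
  }

  // Losing the taker's session releases the lock and requeues the item -
  // the ephemeral-node behavior, handled entirely inside the state machine.
  @Override
  public void expire(ServerSession session) {
    String item = inFlight.remove(session.id());
    if (item != null) {
      pending.add(item);
    }
  }

  @Override public void register(ServerSession session) {}
  @Override public void unregister(ServerSession session) {}
  @Override public void close(ServerSession session) { expire(session); }
}
```
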
Tyler Foster
@tfoster
Jan 05 2017 20:47 UTC
You have the issue of partial failures, where an item is taken and partially processed before a failure (which it may have caused), so you'd likely want a failed-work queue
So that clean up can be performed before the item is reattempted
Jordan Halterman
@kuujo
Jan 05 2017 21:03 UTC

Yep... that all sounds good and is simple to do in Atomix. This should be similar to how the DistributedGroup queues work internally now. When a message is stored, it's published to one or all of the clients via session.publish. If a client's session expires before an ack is received, the message is re-sent to another session, and so on.

Session expirations work the same as ephemeral nodes in ZooKeeper, but the benefit is that the state machine (the servers) is the coordinator, rather than a client needing to coordinate using low-level primitives like ephemeral nodes and watches. That means less coordination between clients and servers. When a client's session expires, the state machine can decide what to do. The state machine always has a fully consistent view of the state of clients as well, meaning it can select a consumer to send a message to based on how many messages are in transit to, or being processed by, each consumer.

It seems to me that doing this in ZooKeeper requires a lot more coordination between client and server: elect a leader to control all consumers, set ephemeral nodes and watches, and when an ephemeral node is lost, send an event to the coordinator, which then has to communicate with the servers to decide what to do with messages that weren't fully processed. If two coordinators manage to exist at the same time - which is impossible to prevent - they could conflict with each other. In Atomix that can all be done in the state machine with no additional communication and a guaranteed single system view. This is a totally different way of thinking about solving these types of problems. All the consensus services I see seem to prefer providing primitives like ZooKeeper's and letting clients use them to coordinate state changes. Atomix is a different paradigm in that almost all of the coordination for each of these patterns is done entirely inside the consensus algorithm.

Good point on failed work queues. That's really helpful. I think we can integrate these ideas into a sort of work queue resource.

Atomix's lock state machine is probably the best illustration of how this is handled differently from how a lock would be built in ZooKeeper:
https://github.com/atomix/atomix/blob/master/concurrent/src/main/java/io/atomix/concurrent/internal/LockState.java
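
For reference, a condensed paraphrase of the idea in that file - not the actual source; command names are simplified and Copycat 1.x APIs are assumed:

```java
import io.atomix.copycat.Command;
import io.atomix.copycat.server.Commit;
import io.atomix.copycat.server.StateMachine;
import io.atomix.copycat.server.StateMachineExecutor;
import io.atomix.copycat.server.session.ServerSession;
import io.atomix.copycat.server.session.SessionListener;
import java.util.ArrayDeque;
import java.util.Queue;

public class SimpleLockState extends StateMachine implements SessionListener {
  public static class Lock implements Command<Void> {}
  public static class Unlock implements Command<Void> {}

  private Commit<Lock> holder;
  private final Queue<Commit<Lock>> waiters = new ArrayDeque<>();

  @Override
  protected void configure(StateMachineExecutor executor) {
    executor.register(Lock.class, this::lock);
    executor.register(Unlock.class, this::unlock);
  }

  // Grant the lock immediately if it's free; otherwise queue the commit.
  private void lock(Commit<Lock> commit) {
    if (holder == null) {
      holder = commit;
      commit.session().publish("lock");
    } else {
      waiters.add(commit);
    }
  }

  private void unlock(Commit<Unlock> commit) {
    release();
    commit.close();
  }

  // An expired session releases the lock automatically - the equivalent of
  // a ZooKeeper ephemeral node disappearing, handled in the state machine.
  @Override
  public void expire(ServerSession session) {
    if (holder != null && holder.session().equals(session)) {
      release();
    }
  }

  private void release() {
    if (holder != null) {
      holder.close();
    }
    holder = waiters.poll();
    if (holder != null) {
      holder.session().publish("lock");
    }
  }

  @Override public void register(ServerSession session) {}
  @Override public void unregister(ServerSession session) {}
  @Override public void close(ServerSession session) { expire(session); }
}
```
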
Tyler Foster
@tfoster
Jan 05 2017 21:27 UTC
Nice, that LockState implementation is pretty straightforward. WRT providing primitives that clients use to orchestrate more complex tasks, the argument is generally about scale: if work increases with each client, then the more work you can do on the client, the more predictable your performance is at scale. I think Atomix, and Raft in general, are really good in cases where, for the most part, the nodes in the cluster are the clients.
I'll get the DistributedQueue changes in ASAP and get out a PR. Do you have a contribution guide that I should follow?
Tyler Foster
@tfoster
Jan 05 2017 21:33 UTC
Also, what's your release schedule / release process?
Jordan Halterman
@kuujo
Jan 05 2017 21:41 UTC

Yeah... I do need to write a contributor guide. We just got a critical bug fix into Copycat yesterday and several other bug fixes into Copycat and Atomix, so I've been planning on releasing them ASAP. But I do want to make sure the added resource events are in the release, since people have been asking for them for a long time, and this is part of that. I have some changes around here somewhere that should make the addition of these events really straightforward. I think I need to get those merged, then it should be simple to add queue events - a few lines of code and tests, really.

My release process is: release when it feels right, which in general means when someone from ONOS requests a release, a major bug fix is merged, a major feature is done, or enough minor bug fixes have accumulated. All of those conditions have been met this time.
