These are chat archives for atomix/atomix

Nov 2016
Mark Elliot
Nov 21 2016 17:45

@neverfox thanks. I think I'm probably explaining what we're doing a bit poorly, in the path we're in we don't necessarily try to make a commit to the log, so I'm not sure we get the leadership check we're looking for. Our goal is just to hand out monotonically increasing numbers.

Two approaches we're thinking of:

  1. Use DistributedLong, always just incrementAndGet() (worried about throughput of this)
  2. Use a DistributedLong to track a ceiling, hand out numbers up to the ceiling, and only from a designated leader

In (2), we're worried that if the leader loses leadership, it might still hand out new numbers while another leader election takes places and a new leader/ceiling is set. We'd like to get some guarantees around the leader detecting it is no longer the leader. Our non-Atomix-implementation checks for leadership at the beginning of every request (this is relatively cheap compared to setting a new ceiling), but it's not clear (a) how to replicate this behavior with Atomix or (b) if it's necessary.

So I think that begs a couple of questions:
(a) can two nodes believe they're the leader, or do the election cycles guarantee this doesn't occur?
(b) if (a) is true for some overlap, is there a way to ask "is this node still the leader?"

re: sacrificing availability: we'd be comfortable with reasonable short availability loss in the face of a node failure.

Jordan Halterman
Nov 21 2016 19:18

Two nodes can certainly believe themselves to be the leader, and because of GC that's basically impossible to prevent. So, the way that case is handled is by using the epoch that's provided in Atomix' leader elections: the term. While Atomix can't guarantee two nodes won't believe themselves to be the leader at the same time, it can guarantee that no two leaders will ever have the same term and that terms are monotonously increasing. So, when you pair the term with your monotonically increasing number you can ensure consistency. For example, if one client's gets the number 10 from a leader with term 2 and another also gets the number 10 but from a leader with the term 4, you consider the second client to have the consistent value since 4 > 2. Furthermore, since 10 is also a monotonically increasing number that was consistent in term 4, any number greater than 10 with a term less than 4 can be considered inconsistent and that client should request a new number. Because Atomix (or more specifically Raft) guarantees that no two leaders will have the same term and that terms are monotonically increasing, performing this "fencing" ensures that only numbers from the most up-to-date leaders are being used. Often, this type of thing is done with databases that support CAS operations.

It's certainly also possible to check whether a leader is still the leader. That hasn't been added to the API but is simple to add, and indeed checking whether a node is a leader is much more efficient than writing an increment to disk. But it doesn't totally solve the problem, which is why it hasn't been implemented in the Atomix API. It's still theoretically possible for a client to check whether it's still the leader and have e.g. a long GC pause between the time it gets the response and the time the client code reacts to the fact that it's still the leader. So, while not likely, it's still possible for a node to believe itself to be the leader after checking but for it to have lost its leadership. For that reason, either using DistributedLong or using the term as a fencing token is the only truly safe solution.

Mark Elliot
Nov 21 2016 20:46
Would that mean clients of the thing providing numbers should read from multiple nodes to do the term comparison?
I guess as a practical matter, we can have nodes that don't believe they're the leader throw, and then any node that responds also provides its term, the client would take the highest number provided by the highest term
...and in that case, since we use a DistributedLong as a ceiling, the new leader (one with a higher term) would absolutely provide higher numbers, so we could actually just take the highest number given.
But might be missing something?