These are chat archives for atomix/atomix

11th May 2016
Philip Lombardi
@plombardi89
May 11 2016 01:25 UTC
Heyo, checking in after a couple months of not paying attention. Has any work moved forward on supporting an additional eventual consistency mode rather than strong consistency? Would love to use Atomix on a new pet project, but I don't want the base of the system to require strong consistency. I asked about this around Christmas time I think and got an answer that it was a desired future feature. Just curious if it's in progress or not even on the roadmap.
Madan Jampani
@madjam
May 11 2016 01:38 UTC
@plombardi89 Any particular reason you want a weakly (or eventually) consistent system as the base line?
Jordan Halterman
@kuujo
May 11 2016 01:41 UTC
Not even on the road map for sure. It's likely not something that will even be considered this year. The weakest consistency you'll get is sequential consistency. There's just too much complexity in adding new consistency models outside the scope of Raft and too much to do within the scope of Raft to justify it. I think Atomix will be a system that can facilitate those types of algorithms (e.g. through membership groups) but doesn't provide them itself. I think sharding/partitioning is the more logical next step compared to eventual consistency.
And even that's unlikely to be attempted this year
Primarily because of the consistency issues when operating on multiple partitions (e.g. creating and deleting resources, or global operations like DistributedMap.size())
Jordan Halterman
@kuujo
May 11 2016 01:46 UTC
Atomix resources are designed to facilitate different back ends, but they're currently too coupled to Copycat's state machines to make it an easy task. I think weaker consistency models can be represented as state machines, but Atomix would need its own abstractions to replace Copycat state machines. For example we could do consistent hashing with partitions a la Dynamo. That's feasible, but I don't think it will ever get to any consistency models that require conflict resolution.
But there's a lot of work implied in replacing Copycat's state machine abstraction with Atomix' own abstraction. Implementing something like consistent hashing properly is a whole other project in and of itself. Some new mechanism is required to handle replication for partitions and handle repartitioning when a node leaves the cluster.
I think any implementation of that type of algorithm now would be half assed so I don't want to consider it until some lower level abstraction can be built, but even then I don't anticipate it happening. Atomix may just provide abstractions that are independent enough for that to be possible but may not actually provide any implementation of alternative replication algorithms.
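For reference, a minimal consistent-hashing ring of the kind mentioned above might look like the sketch below. This is purely illustrative, not Atomix or Copycat code; all class and method names are made up, and it ignores replication and repartitioning entirely.

import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch only (not Atomix/Copycat code): each node is hashed onto a
// ring of integers, and a key is owned by the first node clockwise from the key's
// hash, so adding or removing a node only remaps the keys adjacent to it.
public class HashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes; // virtual replicas per node to smooth the distribution

    public HashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put((node + "#" + i).hashCode(), node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove((node + "#" + i).hashCode());
        }
    }

    public String nodeFor(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        // First node at or clockwise from the key's hash; wrap around if none.
        SortedMap<Integer, String> tail = ring.tailMap(key.hashCode());
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }
}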
Jordan Halterman
@kuujo
May 11 2016 01:55 UTC
The reason I mention state machines and conflict resolution and consistent hashing is because for sure Atomix will always rely on ordered inputs that can represent resources as state machines. But the problem with abstracting Copycat is that Atomix resources are dependent on some of the more complex features of Copycat like session events. Session events and even sessions in general become much more difficult to handle with partitioning, which is why I don't anticipate the current algorithms changing any time soon. Partitioning alone causes major problems for consistency in events where there's otherwise a really well-defined consistency model.
I think the closest Atomix may come to weaker consistency models any time soon is providing some mechanism for users to handle partitioning externally so Atomix doesn't even have to handle those consistency issues.
Also, if you want weaker consistency why not use Hazelcast?
Philip Lombardi
@plombardi89
May 11 2016 02:06 UTC
Already using Hazelcast actually, but I'm not a huge fan of how that project is managed.
Also as a Hazelcast user I tend to feel like it's really overly complicated in design and hard to often understand what's going on when things are working as expected. Not to mention the docs are a mess...
@madjam custom service discovery system ... a prototype is here: https://github.com/datawire/discovery .
Basically the system should be able to accept writes no matter what
Madan Jampani
@madjam
May 11 2016 02:11 UTC
You are basically talking about a Dynamo style architecture. What is your conflict resolution strategy?
If your complaint with hazelcast is that it is "hard to often understand what's going on when things are working as expected", then me thinks what you need is a system that offers stronger consistency guarantees :)
Philip Lombardi
@plombardi89
May 11 2016 02:15 UTC
Latest update wins in a conflict. A client should really only ever be talking to one node at a time so if for some reason we end up with conflicting records the most recent one is the one to honor.
Jordan Halterman
@kuujo
May 11 2016 02:18 UTC
There are a lot of ways to define "latest". This is the mess you get into with weaker consistency haha
Madan Jampani
@madjam
May 11 2016 02:18 UTC
Do you only ever have a single writer per key? What is the update ordering function? How do the server(s) decide which update is latest? By looking at receive timestamp? If so, who assigns the timestamp?
Jordan Halterman
@kuujo
May 11 2016 02:18 UTC
Jinx
Madan Jampani
@madjam
May 11 2016 02:19 UTC
I’m one of the Dynamo paper’s authors. Talk to me about lessons learned :)
Philip Lombardi
@plombardi89
May 11 2016 02:20 UTC
Single writer per key (tenant + service name + address). Right now latest is by update time into the Hazelcast ReplicatedMap which also means we've left the merge logic to Hz and assume they've gotten it "right".
Jordan Halterman
@kuujo
May 11 2016 02:20 UTC
Haha yes that would be interesting. I want some lessons :-)
Madan Jampani
@madjam
May 11 2016 02:22 UTC
Not to say that there are no cases where eventual consistency is useful. Just that they are few and far between. The choice has to be made carefully, considering the disadvantages. Single writer is presumably one scenario where it could work. But even there, there is scope for things to go wrong.
@kuujo The primary lesson was that conflict resolution is messy and not every application has a sensible strategy. The shopping cart use case could come up with something reasonable. But a lot of applications could not.
Madan Jampani
@madjam
May 11 2016 02:31 UTC
@plombardi89 AFAICT Hazelcast uses last-updated time to determine which update to keep/discard. As you can imagine, this is susceptible to server-side clock skew. One way you can potentially make this work with Hz is to include a client-created timestamp field in your map value and supply a custom MergePolicy that looks at this client-supplied timestamp to figure out recency.
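A rough illustration of that suggestion is below. It is a sketch only: it assumes the Hazelcast 3.x MapMergePolicy interface (used for IMap split-brain merging; the hooks available for ReplicatedMap may differ), and the ServiceRecord value type and its fields are hypothetical.

import com.hazelcast.core.EntryView;
import com.hazelcast.map.merge.MapMergePolicy;
import com.hazelcast.nio.ObjectDataInput;
import com.hazelcast.nio.ObjectDataOutput;

// Hypothetical value type: the writer stamps each record with its own clock.
class ServiceRecord implements java.io.Serializable {
    final String address;
    final long clientTimestamp; // assigned by the client, not the server

    ServiceRecord(String address, long clientTimestamp) {
        this.address = address;
        this.clientTimestamp = clientTimestamp;
    }
}

// Sketch of a merge policy that keeps the entry with the newer client-assigned
// timestamp, so recency does not depend on server-side clocks.
public class ClientTimestampMergePolicy implements MapMergePolicy {
    @Override
    public Object merge(String mapName, EntryView mergingEntry, EntryView existingEntry) {
        Object existing = existingEntry == null ? null : existingEntry.getValue();
        Object merging = mergingEntry.getValue();
        if (!(existing instanceof ServiceRecord) || !(merging instanceof ServiceRecord)) {
            return merging;
        }
        ServiceRecord incoming = (ServiceRecord) merging;
        ServiceRecord current = (ServiceRecord) existing;
        return incoming.clientTimestamp >= current.clientTimestamp ? incoming : current;
    }

    @Override
    public void writeData(ObjectDataOutput out) {
        // stateless policy; nothing to serialize
    }

    @Override
    public void readData(ObjectDataInput in) {
        // stateless policy; nothing to deserialize
    }
}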
Philip Lombardi
@plombardi89
May 11 2016 02:32 UTC
@madjam Shouldn't clock skew be addressable via keeping clocks synced via NTP?
Jordan Halterman
@kuujo
May 11 2016 02:32 UTC
NTP is only accurate to within a few milliseconds, isn't it?
Madan Jampani
@madjam
May 11 2016 02:33 UTC
That is not a perfect solution. At the end of the day it all depends on what you are building on top of it and what it can tolerate.
Philip Lombardi
@plombardi89
May 11 2016 02:34 UTC
It's service discovery for SOA/microsvc archs so ever so slight clock skew seems like it would be acceptable as long as it's managed and kept in check
Roman Pearah
@neverfox
May 11 2016 02:34 UTC
NTP is a problem in a lot of ways. The server could be down, time can jump backwards, etc.
at least for stronger consistency guarantees
Madan Jampani
@madjam
May 11 2016 02:38 UTC
The challenge with a "keep badness in check" strategy is that you need to define what badness looks like. Absent a clear definition of that, the system can forever be in an undesirable state.
Philip Lombardi
@plombardi89
May 11 2016 02:39 UTC
Hrm... well, you've certainly given me some nice food for thought here
Madan Jampani
@madjam
May 11 2016 02:40 UTC
One way to define that in your case is: the client always has the most authoritative info. Periodically the client will probe the server to see what value it has. If it detects a conflict, it pushes the correct value.
This could work at the expense of making your client logic slightly more complicated.
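A very rough sketch of that client-side reconciliation loop follows. The Registry interface and its method names are entirely hypothetical (not Hazelcast or the discovery prototype); only the JDK is used.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical registry client; method names are illustrative only.
interface Registry {
    String read(String key);
    void write(String key, String value);
}

// Sketch: the client treats itself as authoritative for its own key and
// periodically re-pushes its value whenever the server's copy has diverged.
class ClientReconciler {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    void start(Registry registry, String key, String authoritativeValue) {
        scheduler.scheduleAtFixedRate(() -> {
            String serverCopy = registry.read(key);
            if (!authoritativeValue.equals(serverCopy)) {
                registry.write(key, authoritativeValue); // correct the conflict
            }
        }, 0, 30, TimeUnit.SECONDS); // probe interval is an arbitrary choice here
    }
}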
Philip Lombardi
@plombardi89
May 11 2016 02:41 UTC
Clients are more or less constantly pushing values to the server anyways because of TTLs on the records
Right now we require they push at least once every 30 seconds
Madan Jampani
@madjam
May 11 2016 02:43 UTC
So that is your window of staleness. If you can tolerate that then it should be OK. Keep in mind that if there is a network partition, readers on the side that does not have the writer (client) can have a longer staleness window.
Philip Lombardi
@plombardi89
May 11 2016 02:50 UTC
Actually, that's a problem now that you bring it up. The arch is basically pub/sub with ReplicatedMaps. If the TTL isn't updated on srv2 by a Publisher talking to srv1 because of a netpart between srv1 and srv2, then a Subscriber on srv2 will eventually be told by the partitioned server that the original record has expired even if it actually has not.
Jordan Halterman
@kuujo
May 11 2016 03:11 UTC
Anyways... I think Atomix will never support any algorithms that require conflict resolution. It could support partitioning at some point, but that's not on the agenda because of the complexities of atomic operations across partitions and handling sessions. We could implement 2PC, or those issues could just be pushed to users by exposing a partition API. Neither is likely to happen any time soon. All effort in Copycat/Atomix is going to be focused on stability and completing resource APIs (e.g. streams for collections), probably for a couple minor releases.
Jason Savlov
@jsavlov
May 11 2016 05:30 UTC

Hello all... Maven question. If I have the proper dependencies pertaining to atomix in my pom.xml file for my project, why might I get an error like this upon running the project? The answer might be obvious, but I'm new to Maven

Exception in thread "main" java.lang.NoClassDefFoundError: io/atomix/catalyst/transport/Address
at Main.main(Main.java:42)
Caused by: java.lang.ClassNotFoundException: io.atomix.catalyst.transport.Address
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 1 more

Jordan Halterman
@kuujo
May 11 2016 05:32 UTC
You shouldn’t be getting it… that class is a transitive dependency of Atomix that is used throughout Atomix itself:
atomix -> copycat -> catalyst
You might want to look at the examples
those projects build and run fine for me
Jason Savlov
@jsavlov
May 11 2016 05:41 UTC
I can't find anything out of the ordinary... here's my pom.xml file: http://pastebin.com/BzQM3MWw
I'm just trying to confirm I'm not going crazy, I know this should work
Richard Pijnenburg
@electrical
May 11 2016 07:15 UTC
@jsavlov pom looks good
Kevin Daly
@kedaly
May 11 2016 14:37 UTC
Hi, I am considering using Atomix for my project, typically how much memory in heap should I allocate for a node in an Atomix cluster?
Richard Pijnenburg
@electrical
May 11 2016 14:47 UTC
@kedaly it very much depends on how large your data set is.
Kevin Daly
@kedaly
May 11 2016 15:51 UTC
@electrical It's just for config data and Vert.x clustering so I would assume 1GB would be sufficient?
Richard Pijnenburg
@electrical
May 11 2016 15:52 UTC
I’m not that familiar with vert.x to be very honest, but I would expect 1 GB might be enough, yeah. I would just keep an eye on heap usage and GC patterns
Madan Jampani
@madjam
May 11 2016 16:10 UTC
Assuming usage of a FILE-based log, the only thing that can potentially grow with usage is the state machine state. For example, if you have a DistributedMap with a large number of unique keys then you might start seeing some memory pressure. Otherwise base Atomix itself is fairly lightweight. I routinely run it with a 128MB max heap size
Jordan Halterman
@kuujo
May 11 2016 19:20 UTC
Right. For running a normal Vert.x cluster you hardly need any memory unless you're using a MEMORY log. Vert.x just uses it to store node IDs and event bus addresses, and you don't need much more memory than would normally be required for those things.
There is some memory overhead in the log aside from the StorageLevel. The log index is held in memory along with a bit array, but the log index is pretty compact and the bit array is tiny. Atomix/Copycat frequently compacts the log, which also frees memory.
You can reduce the memory footprint by reducing the size of log segments in the Storage configuration, but typically the memory footprint even with defaults is really small, like 10 or 20 MB in my experience even with log indexes.
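Something like the following is the kind of Storage tweak being described. It is a sketch against the Copycat/Atomix APIs of that era; the exact builder option names may differ between versions, and the address, directory, and sizes are placeholders.

import io.atomix.AtomixReplica;
import io.atomix.catalyst.transport.Address;
import io.atomix.copycat.server.storage.Storage;
import io.atomix.copycat.server.storage.StorageLevel;

// Rough sketch: smaller segments get compacted sooner, so their memory and disk
// are released more quickly. Transport configuration is omitted for brevity.
public class SmallSegmentExample {
    public static void main(String[] args) {
        Storage storage = Storage.builder()
            .withDirectory("logs")                 // where segment files live
            .withStorageLevel(StorageLevel.DISK)   // DISK, MAPPED, or MEMORY
            .withMaxSegmentSize(1024 * 1024)       // 1 MB segments instead of the default
            .withMaxEntriesPerSegment(8 * 1024)    // also caps the per-segment index size
            .build();

        AtomixReplica replica = AtomixReplica.builder(new Address("localhost", 8700))
            .withStorage(storage)
            .build();
    }
}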
Jordan Halterman
@kuujo
May 11 2016 19:25 UTC
Reducing the size of log segments causes them to be compacted more frequently, thus resulting in memory and disk being freed more quickly.
Under normal operation, when a segment is compacted in e.g. the Vert.x cluster, the vast majority of the disk and memory will be freed
Each entry in the log currently requires 129 bits of memory: 8 bytes for the offset, 8 bytes for the position, and 1 bit for log compaction processes. A three-node cluster will usually write something like 3 entries per second for sessions, but that can also be reduced by increasing the session timeout, thus decreasing the frequency with which clients send keep-alives to the cluster (but also increasing the amount of time it takes to detect a failure)
Ugh phone typing
Jordan Halterman
@kuujo
May 11 2016 19:33 UTC
The moral of the story is, you should only have to mess with memory if the state machine is holding a lot of it. The memory footprint of the log is almost unnoticeable. For the most part, the state in the state machine is what dictates memory usage. Objects held in the state machine can consume memory and also cause the log to consume more memory since, depending on the state machine, it may keep more entries in the log for longer.
I think in the future the log may be improved to drop indexes that aren't needed. We can determine when an index is/will be needed by commit/match/next indexes. e.g. when a follower's commitIndex is greater than a segment's last index, or when a leader's lowest nextIndex is greater than a segment's last index. That would actually allow the MEMORY log to be more efficient as well.
Jordan Halterman
@kuujo
May 11 2016 19:38 UTC
This is already done to some extent with MAPPED segments. The log only actually memory maps MAPPED segments that haven't been compacted. Once a segment is compacted it's rewritten to a RandomAccessFile that's not mapped.
There are actually ways to make the index much more efficient too. The typical use of the index is sequential, so offsets are unnecessary. Offsets only become necessary once a segment is missing an entry (after compaction) at which time the segment already has a lower memory footprint anyways.
That could be a nice improvement to segment indexes - only index positions unless an entry in the segment is skipped.
I sort of went off on a rant there :-)
Richard Pijnenburg
@electrical
May 11 2016 19:43 UTC
Hehe. It's always good to give a lot of info and share what is in your head :)
Jordan Halterman
@kuujo
May 11 2016 19:44 UTC
I'm well known for my "epic rants" as my co-workers call them
Typically at 2am
Richard Pijnenburg
@electrical
May 11 2016 19:45 UTC
Hahaha. They are best late and when you had a few drinks :)
Jordan Halterman
@kuujo
May 11 2016 20:02 UTC
Lol indeed
Jason Savlov
@jsavlov
May 11 2016 20:30 UTC
Is there a reason why using replica.bootstrap(list_of_hosts) infinitely loops through the list of hosts to connect to when there are hosts that are running?
Jordan Halterman
@kuujo
May 11 2016 20:30 UTC
interesting
what line of code are you seeing that at?
Jason Savlov
@jsavlov
May 11 2016 20:32 UTC
Not sure which line, but this is what keeps repeating --> http://pastebin.com/09tmD3iD
that's console output
Jordan Halterman
@kuujo
May 11 2016 20:33 UTC
gotcha...
Jason Savlov
@jsavlov
May 11 2016 20:33 UTC
Two of those hosts are running, but they too are experiencing the same looping issue
It blocks at this line of code in my program.. atomix.bootstrap(hostList).join();
Jordan Halterman
@kuujo
May 11 2016 20:35 UTC
Some explanations of the logs...
So, when you see Polling members, that indicates that the follower is not hearing from any leader and so it's trying to start a new election. If it can't talk to a majority of the cluster then it will repeatedly do that until it can talk to a majority of the cluster or finds a leader
During that time in Atomix, you'll also see the client periodically trying to connect to different servers. It's trying to find a leader
So, it seems there's just no leaswr
Ugh spelling... Leader
When you bootstrap(...) with arguments, you should bootstrap(...) with arguments on all the nodes in that list
Alternatively, you can do bootstrap() on one node and then join(thatNode) on all the others
If you did something like bootstrap(threeNodes) on only one node then this is the type of log you'll get since it can't be elected leader without the other two nodes
Err the cluster can't elect a leader
Technically without at least one more of those nodes
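Put in code, the two cluster-formation patterns described above look roughly like this. It is a sketch against the Atomix 1.0-era replica API used earlier in the chat; the addresses are placeholders and transport/storage setup is omitted.

import java.util.Arrays;
import java.util.List;

import io.atomix.AtomixReplica;
import io.atomix.catalyst.transport.Address;

// Sketch of forming a cluster either by bootstrapping every node with the same
// list, or by bootstrapping one seed node and joining the rest to it.
public class ClusterFormation {
    public static void main(String[] args) {
        List<Address> cluster = Arrays.asList(
            new Address("10.0.0.1", 8700),
            new Address("10.0.0.2", 8700),
            new Address("10.0.0.3", 8700));

        // Pattern 1: run this on every node (each with its own local address);
        // a leader can be elected once a majority of the list is up.
        AtomixReplica replica = AtomixReplica.builder(cluster.get(0)).build();
        replica.bootstrap(cluster).join();

        // Pattern 2: one node bootstraps a single-node cluster...
        //   AtomixReplica seed = AtomixReplica.builder(seedAddress).build();
        //   seed.bootstrap().join();
        // ...and every other node joins it:
        //   AtomixReplica member = AtomixReplica.builder(localAddress).build();
        //   member.join(seedAddress).join();
    }
}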
You can also turn on debug logging and you'll see a lot more of this actually happening. You'll see the client try to connect to a server and try to register its session and fail, then connect to another one and try the same thing, etc.
Jordan Halterman
@kuujo
May 11 2016 20:40 UTC
You'll see the follower timing out from not receiving heartbeats from the leader
The debug logs are very verbose
Jason Savlov
@jsavlov
May 11 2016 20:40 UTC
Yup, I saw how changing the number of connected hosts changed things. Thank you!
Jordan Halterman
@kuujo
May 11 2016 20:41 UTC
this gives me an idea
could actually be interesting to write up something explaining the logs like that… what means what
could be helpful
Jason Savlov
@jsavlov
May 11 2016 20:44 UTC
That would be VERY helpful.. you just helped me understand a bunch of what has been going wrong all day with my project
Jordan Halterman
@kuujo
May 11 2016 20:45 UTC
I’ll get on it :-)
Jason Savlov
@jsavlov
May 11 2016 20:45 UTC
Now my new problem is, I can't get members to join the group. It prints that they are connected locally, but they never receive remote connects
Jonathan Halterman
@jhalterman
May 11 2016 20:45 UTC
@kuujo that is a pretty good idea actually
Jason Savlov
@jsavlov
May 11 2016 20:45 UTC
CS students everywhere will thank you!
Jason Savlov
@jsavlov
May 11 2016 20:58 UTC
I think I got it working