These are chat archives for atomix/atomix

28th Feb 2016
Richard Pijnenburg
@electrical
Feb 28 2016 01:52
@kuujo I just added you as a collaborator to the private repo.
It's just the very beginning and I've got loads of stuff to add.
This was mainly to try out some basic things and have a sort of bootstrap in there
so I could start trying things out.
Removing the thread sleep helped btw :-)
Jordan Halterman
@kuujo
Feb 28 2016 05:01
sweet
Jordan Halterman
@kuujo
Feb 28 2016 09:07
added something pretty interesting to the DistributedGroup resource
Jordan Halterman
@kuujo
Feb 28 2016 10:23
It was that >>
Consistent hashing/partitioning in DistributedGroup :-D
Richard Pijnenburg
@electrical
Feb 28 2016 11:00
What does that do? Sorry just woke up lol
Jordan Halterman
@kuujo
Feb 28 2016 11:04
Basically, that's how a lot of scalable databases store data. Use consistent hashing to map a key to a set of nodes. Consistent hashing is a really simple algorithm that ensures when nodes are added or crash, only a minimal amount of data has to be repartitioned. So, consistent hashing in DistributedGroup means you can use it to e.g. create a linearly scalable map.
Basically, it's just an API that maps an object to a set of nodes, and that can be used to store data on that set of nodes.
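Roughly like this, conceptually - this is not the real DistributedGroup interface, just a hypothetical Java sketch of the API shape (all names made up) for mapping a key to the set of members responsible for it:

```java
import java.util.List;

// Hypothetical API shape (illustrative names only, not the actual DistributedGroup API):
// a key is hashed and mapped to the subset of group members responsible for it;
// the caller then reads/writes the key's data on those members.
public interface PartitionedGroup {
  List<String> membersFor(Object key);
}

// Usage idea:
// List<String> owners = group.membersFor("user:42");
// write the value for "user:42" to every member in `owners`
```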
Richard Pijnenburg
@electrical
Feb 28 2016 11:09
Ahh okay, I see. Same as what Elasticsearch does. It uses consistent hashing for the primary shards.
Richard Pijnenburg
@electrical
Feb 28 2016 11:22
The downside, at least on the Elasticsearch side, is that you can't repartition the data. Can you do that? For example if you go from 1 to 3 master nodes?
Jordan Halterman
@kuujo
Feb 28 2016 11:53
Well, I think in ES that’s largely a result of integrating with Lucene. In general, there’s no reason you can’t move partitions around. Indeed, consistent hashing is designed to allow nodes to be added and removed and to move the minimal amount of data necessary to balance the cluster. For instance, the way DistributedGroup is set up, you could have e.g. a 3 node cluster with 3 partitions and a replication factor of 2. That would mean each partition has two nodes. Partition 1 is assigned to nodes A and B, partition 2 to nodes B and C, and partition 3 to nodes C and A. It won’t necessarily work out exactly that way, but if you’re using a good hashing algorithm and virtual nodes it will get close.
Anyways, consistent hashing uses what’s called a consistent hash ring. The nodes A, B, and C are placed in a logical ring (a data structure that wraps around - TreeMap in Atomix). Then, you hash a partition to a point on that ring. So, if partition 1 falls right before node A on the ring, and the replication factor is 2, then it will be assigned to nodes A and B since those are the next two nodes in the ring. So, when you add a node to the ring, all that happens is the partitions closest to the node being added need to be moved around, and the others stay in the same place. So, if you add a node D after node C, then partition 3 will be assigned to nodes C and D instead of C and A. Thus, you’ll move a partition from node A to D when it joins. In the opposite direction, if node C is instead removed, then partition 3 will map to nodes A and B. Because partition 3 was mapped to nodes C and A before C left, we can copy the partition from the remaining replica node A to node B. Does that make sense?
It’s easier to understand with graphics.
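In lieu of graphics, here's a rough Java sketch of the ring idea - not the Atomix internals, just an illustrative class (names like HashRing and nodesFor are made up) built on a TreeMap, with virtual nodes and a replication factor as described above:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Illustrative consistent hash ring; hypothetical names, not the Atomix implementation.
public class HashRing {
  private final TreeMap<Long, String> ring = new TreeMap<>(); // hash point -> node
  private final int virtualNodes;

  public HashRing(int virtualNodes) {
    this.virtualNodes = virtualNodes;
  }

  // Place a node on the ring at several points (virtual nodes) to smooth the distribution.
  public void addNode(String node) {
    for (int i = 0; i < virtualNodes; i++) {
      ring.put(hash(node + "#" + i), node);
    }
  }

  public void removeNode(String node) {
    for (int i = 0; i < virtualNodes; i++) {
      ring.remove(hash(node + "#" + i));
    }
  }

  // Map a partition (or any key) to `replicationFactor` distinct nodes by walking
  // clockwise around the ring from the key's hash, wrapping past the end.
  public List<String> nodesFor(String key, int replicationFactor) {
    Set<String> nodes = new LinkedHashSet<>();
    if (ring.isEmpty()) {
      return new ArrayList<>(nodes);
    }
    int distinct = new LinkedHashSet<>(ring.values()).size();
    Long point = ring.ceilingKey(hash(key));
    if (point == null) {
      point = ring.firstKey(); // wrap around
    }
    while (nodes.size() < replicationFactor && nodes.size() < distinct) {
      nodes.add(ring.get(point));
      point = ring.higherKey(point);
      if (point == null) {
        point = ring.firstKey();
      }
    }
    return new ArrayList<>(nodes);
  }

  // Hash a string to a point on the ring (first 8 bytes of an MD5 digest).
  private long hash(String value) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5")
          .digest(value.getBytes(StandardCharsets.UTF_8));
      long h = 0;
      for (int i = 0; i < 8; i++) {
        h = (h << 8) | (digest[i] & 0xff);
      }
      return h;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With something like that, adding or removing a node only reassigns the partitions whose hashes fall next to that node's points on the ring; every other partition keeps its owners, which is the minimal-movement property described above.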
Richard Pijnenburg
@electrical
Feb 28 2016 11:55
Makes complete sense yeah. I'm wondering though what happens if you want to go from 3 partitions to more. Or does that make no sense to do?
Jordan Halterman
@kuujo
Feb 28 2016 12:08
Some systems do shard splitting, but I think in order to do that you have to track the shards and do lookups. S3 splits shards pretty intelligently based on request rates. It would actually be possible to support shard splitting in Atomix too, since it can store that state and push changes to the client. But to do shard splitting with the way it's currently set up, I suppose it would have to be application-specific. It sort of depends on whether the data can be split deterministically, and it would still require basically rewriting all the data.
Richard Pijnenburg
@electrical
Feb 28 2016 12:08
Ahh yeah indeed.
In the case of my project I think it won't be used that much. The data that Atomix would store is mainly configs for nodes, tracking of locks, and some state information for plugins.
I don't suppose this sharding would speed up the task queue resource?
Jordan Halterman
@kuujo
Feb 28 2016 12:14
Not right now. ATM it's only implemented as a utility for users, but I'm working toward that. Any of the message queues can be partitioned, and things like maps and sets can be partitioned. But it's a lot of refactoring of internals to make the pieces more abstract. I will probably take my time. I'm more interested in finishing the website and documentation and the release. That will probably be the 1.1 release - slightly weaker consistency models.
Richard Pijnenburg
@electrical
Feb 28 2016 12:15
Makes sense yeah
To make my project work I do need to find a performant way of sending messages to nodes at some point.
The first version will most likely do a single pipeline per node.
Jordan Halterman
@kuujo
Feb 28 2016 12:17
Gotta sleep
Richard Pijnenburg
@electrical
Feb 28 2016 12:17
night bud. catch you later
Richard Pijnenburg
@electrical
Feb 28 2016 22:20
@kuujo I opened some issues on my project. If you have any cool ideas or things I should look at, feel free to comment on anything.
Jordan Halterman
@kuujo
Feb 28 2016 22:20
Indeed checking them out right now actually
Richard Pijnenburg
@electrical
Feb 28 2016 22:21
hehe cool. thanks :-)
just been opening issues with things that pop up in my head about it lol :-)