These are chat archives for atomix/atomix

25th
Jul 2017
Jordan Halterman
@kuujo
Jul 25 2017 02:26
[image: 3 node cluster]
[image: 7 node cluster]
Dual Intel Xeon E5-2670v2 2.5GHz Processors - 10 real cores/20 hyper-threaded cores per processor;
32GB 1600MHz DDR3 DRAM;
1Gbps Network interface card;
Ubuntu 14.04 OS;
Java HotSpot(TM) 64-Bit Server VM; version 1.8.0_31
JAVA_OPTS="${JAVA_OPTS:--Xms8G -Xmx8G}"
This is all write throughput. Still need to update our benchmarks to measure read throughput (which will be much, much, much higher).
Jordan Halterman
@kuujo
Jul 25 2017 02:33
These benchmarks also use blocking clients. We could just as easily reach that write throughput with a much smaller number of asynchronous clients.
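A rough illustration of the blocking vs. asynchronous point: a blocking client has at most one request outstanding, so each client's throughput is bounded by round-trip latency, while an asynchronous client can pipeline many requests at once. This is a sketch only; `submitWrite` is a hypothetical stand-in that completes after a simulated latency, not the Atomix client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PipelinedWrites {
  // Hypothetical stand-in for a cluster write: completes with `value` after a simulated latency.
  static CompletableFuture<Long> submitWrite(ScheduledExecutorService timer, long value, long latencyMs) {
    CompletableFuture<Long> result = new CompletableFuture<>();
    timer.schedule(() -> result.complete(value), latencyMs, TimeUnit.MILLISECONDS);
    return result;
  }

  // Blocking style: wait for each write before issuing the next (one request in flight).
  static long blockingWrites(ScheduledExecutorService timer, int count, long latencyMs) {
    long sum = 0;
    for (int i = 0; i < count; i++) {
      sum += submitWrite(timer, i, latencyMs).join();
    }
    return sum;
  }

  // Asynchronous style: issue every write up front, then wait for all of them together.
  static long asyncWrites(ScheduledExecutorService timer, int count, long latencyMs) {
    List<CompletableFuture<Long>> futures = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      futures.add(submitWrite(timer, i, latencyMs));
    }
    CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    long sum = 0;
    for (CompletableFuture<Long> f : futures) {
      sum += f.join();
    }
    return sum;
  }

  public static void main(String[] args) {
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    long start = System.nanoTime();
    blockingWrites(timer, 50, 10);
    long blockingMs = (System.nanoTime() - start) / 1_000_000;
    start = System.nanoTime();
    asyncWrites(timer, 50, 10);
    long asyncMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println("blocking: " + blockingMs + " ms, async: " + asyncMs + " ms");
    timer.shutdown();
  }
}
```

With 50 writes at a simulated 10 ms each, the blocking loop takes roughly 50 × 10 ms while the pipelined loop takes roughly one latency period. Real clusters add batching and server-side limits that this toy ignores.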
Jordan Halterman
@kuujo
Jul 25 2017 03:17
@jhall11 I still have more benchmarks to run, but these numbers suggest we may want to consider saturating the cluster with more partitions. I did see throughput begin to suffer with 9 partitions on a 3 node cluster, but that is likely due to hotspotting (too many partition leaders landing on the same node) as much as anything. Implementing leadership transfer/leader balancing could well let more partitions scale better. The only downside is that transactions are more likely to span more partitions, so they will likely suffer.
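Both sides of that tradeoff can be put in rough numbers. This is a sketch under simplifying assumptions, not Atomix code, and the method names are hypothetical: `balancedLeaders` spreads partition leaders round-robin across nodes (the outcome leader balancing aims for), and `expectedPartitionsTouched` gives the expected number of distinct partitions a k-key transaction touches, n(1 - (1 - 1/n)^k), assuming each key hashes independently and uniformly to one of n partitions.

```java
import java.util.Arrays;

class PartitionTradeoffs {
  // Assign partition leaders round-robin so no node leads more than ceil(partitions / nodes).
  // Returns the number of leaders per node.
  static int[] balancedLeaders(int partitions, int nodes) {
    int[] leadersPerNode = new int[nodes];
    for (int p = 0; p < partitions; p++) {
      leadersPerNode[p % nodes]++;
    }
    return leadersPerNode;
  }

  // Expected number of distinct partitions touched by a transaction over `keys` keys,
  // assuming each key hashes independently and uniformly across `partitions` partitions.
  static double expectedPartitionsTouched(int partitions, int keys) {
    double untouched = Math.pow(1.0 - 1.0 / partitions, keys); // P(a given partition is untouched)
    return partitions * (1.0 - untouched);
  }

  public static void main(String[] args) {
    System.out.println("leaders per node: " + Arrays.toString(balancedLeaders(9, 3)));
    for (int n : new int[] {3, 9, 27}) {
      System.out.printf("partitions=%d: a 4-key transaction touches %.2f partitions on average%n",
          n, expectedPartitionsTouched(n, 4));
    }
  }
}
```

Under these assumptions, going from 3 to 9 partitions raises the expected number of partitions a 4-key transaction touches from about 2.4 to about 3.4.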
Jon Hall
@jhall11
Jul 25 2017 17:54
very interesting
Andrew Audibert
@aaudiber
Jul 25 2017 21:29
Hey @kuujo , I have a question about maybeInstallSnapshot - it says "If the latest snapshot is non-null, hasn't been installed, and has an index lower than the current index, install it". The state machine should have applied all commits up to the current index, so what is the purpose of refreshing its state by applying a snapshot? If snapshot 3 represents commits 1, 2, and 3, and the state machine has applied 1, 2, and 3, why reinstall 1, 2, and 3 from the snapshot?
Jordan Halterman
@kuujo
Jul 25 2017 23:05

Good question. It's because of how Atomix takes and installs snapshots. What if, instead of 1, 2, and 3, the applied entries were 101, 102, and 103? An Atomix server contains many state machines. When the log rolls over to a new segment, a snapshot is taken of each state machine, and no two snapshots overlap. So a state machine's snapshot may sit a few entries after the start of the log. Once all snapshots are complete, the segments that precede every snapshot are deleted.
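A simplified model of that deletion rule, assuming hypothetical `Segment` objects with inclusive first/last indexes and a map from state machine name to snapshot index; this is not Atomix's segmented log code. A segment is only deletable when it lies entirely before the earliest snapshot, since every state machine still needs the entries after its own snapshot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Compaction {
  // Hypothetical log segment covering entries firstIndex..lastIndex (inclusive).
  static final class Segment {
    final long firstIndex;
    final long lastIndex;

    Segment(long firstIndex, long lastIndex) {
      this.firstIndex = firstIndex;
      this.lastIndex = lastIndex;
    }
  }

  // A segment is deletable only if it lies strictly before the earliest snapshot,
  // because each state machine still needs the entries after its own snapshot.
  static List<Segment> deletableSegments(List<Segment> segments, Map<String, Long> snapshotIndexes) {
    if (snapshotIndexes.isEmpty()) {
      return Collections.emptyList(); // no snapshots yet: nothing is safe to delete
    }
    long earliestSnapshot = Collections.min(snapshotIndexes.values());
    List<Segment> deletable = new ArrayList<>();
    for (Segment segment : segments) {
      if (segment.lastIndex < earliestSnapshot) {
        deletable.add(segment);
      }
    }
    return deletable;
  }

  public static void main(String[] args) {
    // The log rolled over at index 101; two state machines snapshotted at 102 and 105.
    List<Segment> segments = Arrays.asList(new Segment(1, 100), new Segment(101, 200));
    Map<String, Long> snapshots = new HashMap<>();
    snapshots.put("sm1", 102L);
    snapshots.put("sm2", 105L);
    for (Segment s : deletableSegments(segments, snapshots)) {
      System.out.println("delete segment " + s.firstIndex + "-" + s.lastIndex);
    }
  }
}
```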

So the Raft log may start at index 101 while the state machine's snapshot is at index 103. That snapshot represents all of the state from entries 1-103, and entries 101-103 alone can't reproduce it because entries 1-100 are gone, so when the state machine encounters the snapshot, it installs it.

We could instead skip applying entries 101, 102, and 103 altogether, but a snapshot that doesn't yet exist on a follower may be replicated to it after that follower has already applied entries 101-103.
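A minimal single-state-machine sketch of the replay described above. It assumes a toy state machine whose state is the sum of the indexes of the entries applied to it, plus a hypothetical `Snapshot` holding that sum as of its index, and it applies the rule quoted in the question (install the snapshot if it is non-null, hasn't been installed, and its index is lower than the current index). Replaying only entries 101-110 gives the full-history result once the snapshot is installed.

```java
class SnapshotReplay {
  // Hypothetical snapshot: the state machine's complete state as of `index`.
  static final class Snapshot {
    final long index;
    final long total;
    boolean installed;

    Snapshot(long index, long total) {
      this.index = index;
      this.total = total;
    }
  }

  // Toy state machine: its state is the sum of the indexes of every entry applied to it.
  private long total;

  // Install the snapshot if it is non-null, hasn't been installed, and its index is lower
  // than the current index (the rule quoted from maybeInstallSnapshot).
  void maybeInstallSnapshot(Snapshot snapshot, long currentIndex) {
    if (snapshot != null && !snapshot.installed && snapshot.index < currentIndex) {
      total = snapshot.total; // replace the partial state rebuilt from the truncated log
      snapshot.installed = true;
    }
  }

  void apply(long index) {
    total += index;
  }

  // Replay the retained entries firstIndex..lastIndex, checking for the snapshot before each one.
  static long replay(long firstIndex, long lastIndex, Snapshot snapshot) {
    SnapshotReplay sm = new SnapshotReplay();
    for (long i = firstIndex; i <= lastIndex; i++) {
      sm.maybeInstallSnapshot(snapshot, i);
      sm.apply(i);
    }
    sm.maybeInstallSnapshot(snapshot, lastIndex + 1); // covers a snapshot at the last entry
    return sm.total;
  }

  public static void main(String[] args) {
    long fullHistory = 110L * 111 / 2;                     // sum of indexes 1..110
    Snapshot snapshot = new Snapshot(103, 103L * 104 / 2); // state as of entry 103
    System.out.println("without snapshot: " + replay(101, 110, null));
    System.out.println("with snapshot: " + replay(101, 110, snapshot) + ", full history: " + fullHistory);
  }
}
```

Without the snapshot the replay only sees entries 101-110; with it, entries 101-103 are applied and then overwritten by the snapshot, and 104-110 build on the full history.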

Jordan Halterman
@kuujo
Jul 25 2017 23:13

In other words, after compaction the log may look more like:

101
102 (snapshot1)
103
104
105 (snapshot2)
106 (snapshot3)
107
108 (snapshot4)
109
110

where each snapshot belongs to a specific state machine. Entries prior to a snapshot may still be applied to that state machine, but only the snapshot represents the entire history of its state up to that point, since the entries prior to index 101 have been removed from disk.

@aaudiber
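The same check generalizes to several state machines sharing the compacted log in the layout above. A rough sketch, assuming a hypothetical `schedule` helper and a `TreeMap` from snapshot index to the state machine it belongs to (no two snapshots share an index): it lists the order in which entries are applied and snapshots installed during replay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MultiMachineReplay {
  // Builds the replay order for a compacted log shared by several state machines.
  // snapshotsByIndex maps each snapshot's index to the (hypothetical) state machine it belongs to;
  // it assumes no two snapshots share an index.
  static List<String> schedule(long firstIndex, long lastIndex, TreeMap<Long, String> snapshotsByIndex) {
    List<String> events = new ArrayList<>();
    TreeMap<Long, String> pending = new TreeMap<>(snapshotsByIndex);
    for (long i = firstIndex; i <= lastIndex + 1; i++) {
      // Install every pending snapshot whose index is lower than the current index.
      while (!pending.isEmpty() && pending.firstKey() < i) {
        Map.Entry<Long, String> snapshot = pending.pollFirstEntry();
        events.add("install " + snapshot.getValue() + "@" + snapshot.getKey());
      }
      if (i <= lastIndex) {
        events.add("apply " + i);
      }
    }
    return events;
  }

  public static void main(String[] args) {
    TreeMap<Long, String> snapshots = new TreeMap<>();
    snapshots.put(102L, "snapshot1");
    snapshots.put(105L, "snapshot2");
    snapshots.put(106L, "snapshot3");
    snapshots.put(108L, "snapshot4");
    schedule(101, 110, snapshots).forEach(System.out::println);
  }
}
```

Each snapshot is installed right after the entry at its index, so that state machine's earlier entries in the retained log are applied and then superseded, while the entries after it build on the full history.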
Jordan Halterman
@kuujo
Jul 25 2017 23:32
[image: Atomix read/write throughput]
Jordan Halterman
@kuujo
Jul 25 2017 23:55
read throughput is a little lower than I think it should be TBH