These are chat archives for atomix/atomix

29th Mar 2016
Jordan Halterman
@kuujo
Mar 29 2016 11:11
@madjam atomix/copycat#205 has the pipelining implementation. I mocked latency to show that it does significantly aid performance in many cases. FYI I also experimented with a bunch of other performance improvements. In particular, I implemented thread-per-follower for sending AppendRequests from the leader and separate log threads, but none of those optimizations had any positive impact on performance because of the concurrency control they necessitated in the log. Because so much of the algorithm relies on strict ordering, much of the benefit of concurrency is lost. Pipelining and flushing to disk on commit proved far more beneficial to performance than anything else.
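Roughly the shape of the pipelining, as a sketch with made-up names (`PipelinedAppender`, `MAX_IN_FLIGHT`, etc.), not the actual copycat code:

```java
import java.util.concurrent.CompletableFuture;

// Sketch of AppendRequest pipelining: keep a bounded window of requests in
// flight per follower instead of waiting for each response. All names here
// are illustrative, not the real Copycat internals. Assumes everything runs
// on the single server thread (responses are dispatched back onto it).
final class PipelinedAppender {
  private static final int MAX_IN_FLIGHT = 4;

  interface Log { long lastIndex(); byte[] readEntry(long index); }
  interface Follower { CompletableFuture<Boolean> send(long index, byte[] entry); }

  private final Log log;
  private final Follower follower;
  private long nextIndex = 1; // next log index to send to this follower
  private int inFlight;       // requests sent but not yet acknowledged

  PipelinedAppender(Log log, Follower follower) {
    this.log = log;
    this.follower = follower;
  }

  // Fill the window: send entries without waiting for prior responses.
  void append() {
    while (inFlight < MAX_IN_FLIGHT && nextIndex <= log.lastIndex()) {
      final long index = nextIndex++;
      inFlight++;
      follower.send(index, log.readEntry(index)).whenComplete((ok, error) -> {
        inFlight--;
        if (error != null || !Boolean.TRUE.equals(ok)) {
          nextIndex = Math.min(nextIndex, index); // rewind and retry from the failure
        }
        append(); // refill the pipeline
      });
    }
  }
}
```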
But there are some other areas of the log that can be significantly optimized. Currently, the log index is optimized for sequential reads: when an index is read, the index and position are cached, and the next read checks the next index before falling back to binary search. But when a leader is sending requests to followers, it still jumps around the log a lot. For each individual follower it reads the log sequentially, but with multiple followers a search of the index is required on each request. A per-follower log Iterator would prevent almost all binary index searches on the leader (except when an append fails).
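The sequential fast path in the index looks something like this (a simplified, hypothetical layout, not the real index format):

```java
import java.util.Arrays;

// Sketch of a log index optimized for sequential reads: the last lookup is
// cached and the next offset is checked before binary search.
final class SequentialIndex {
  private long[] offsets = new long[1024];   // entry offsets, in ascending order
  private long[] positions = new long[1024]; // file position of each entry
  private int size;
  private int lastSlot = -1; // slot of the most recent lookup

  void index(long offset, long position) {
    if (size == offsets.length) {
      offsets = Arrays.copyOf(offsets, size * 2);
      positions = Arrays.copyOf(positions, size * 2);
    }
    offsets[size] = offset;
    positions[size] = position;
    size++;
  }

  long position(long offset) {
    // Fast path: a sequential read hits the slot after the cached one.
    if (lastSlot >= 0 && lastSlot + 1 < size && offsets[lastSlot + 1] == offset) {
      return positions[++lastSlot];
    }
    // Slow path: binary search for the offset.
    int lo = 0, hi = size - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (offsets[mid] < offset) lo = mid + 1;
      else if (offsets[mid] > offset) hi = mid - 1;
      else { lastSlot = mid; return positions[mid]; }
    }
    return -1; // offset not present (e.g. compacted away)
  }
}
```

With a single cached slot, two followers reading at different points in the log keep evicting each other's fast path, which is why each request ends up in the binary search branch.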
Another obvious improvement would be not deserializing entries when sending them to followers. Entries can be sent in binary form.
Richard Pijnenburg
@electrical
Mar 29 2016 11:21
Deserializing is only required when the client needs the data, right?
Jordan Halterman
@kuujo
Mar 29 2016 11:52
The state machine too. Clients submit state machine operations, which are logged, replicated, and applied to state machines. Those operations only need to be deserialized when they're applied to the state machine. Right now they're deserialized and then reserialized when the leader replicates to followers. Clients deserialize state machine output.
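The passthrough would look something like this (a sketch with hypothetical types, not the actual Copycat APIs):

```java
import java.nio.ByteBuffer;

// Sketch of binary passthrough on the replication path: the leader forwards
// the serialized bytes it reads from the log rather than deserializing the
// entry and reserializing it per follower.
final class BinaryReplicator {
  interface Log { ByteBuffer readBytes(long index); }             // raw serialized entry
  interface Follower { void send(long index, ByteBuffer bytes); } // wire transfer

  private final Log log;

  BinaryReplicator(Log log) { this.log = log; }

  void replicate(Follower follower, long index) {
    // No serializer round trip here; only the state machine deserializes
    // the operation when the entry is applied, and only the client
    // deserializes the state machine's output.
    follower.send(index, log.readBytes(index).asReadOnlyBuffer());
  }
}
```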
IMO the log Iterator per follower will have a bigger impact on performance though, so I think that's first. We can potentially even move immutable (committed) indexes to disk, like I did with memory mapped files. Memory mapped logs are only mapped until they're committed and compacted, and then they're moved to disk only. Binary search is extremely expensive on disk, but if we remove most binary searches with the Iterator-per-follower, immutable indexes can be moved to disk.
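Something like this per follower (hypothetical API, just to show the shape):

```java
// Sketch of a per-follower iterator over the log: it carries its own
// (index, position) cursor so sequential reads never touch the index;
// the index is only consulted on a reset after a failed append.
final class FollowerIterator {
  interface Log {
    long positionOf(long index);      // binary search in the index (slow)
    byte[] readAt(long position);     // read the entry at a file position
    long nextPosition(long position); // position of the following entry
  }

  private final Log log;
  private long index;
  private long position;

  FollowerIterator(Log log, long startIndex) {
    this.log = log;
    this.index = startIndex;
    this.position = log.positionOf(startIndex); // the only search in the happy path
  }

  byte[] next() {
    byte[] entry = log.readAt(position);
    position = log.nextPosition(position);
    index++;
    return entry;
  }

  // Called when an AppendRequest fails and nextIndex is rewound.
  void reset(long newIndex) {
    index = newIndex;
    position = log.positionOf(newIndex); // binary search only on failure
  }
}
```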
Jordan Halterman
@kuujo
Mar 29 2016 12:05
Copycat tries to take advantage of the way Raft reads logs to optimize the parts that are accessed most and save resources on the parts that are accessed least. That's why memory mapped logs are only mapped at the tail of the log. Logs are always read sequentially on each follower, and sequentially per follower on the leader. The index is optimized for reading sequentially, but not on a per-follower basis. Similarly, random access is only needed at the tail of the log for things like consistency checks in (generally uncommitted) segments. Concurrency control in the log makes assumptions about how it's accessed: writing to the log is not thread safe, but the algorithm is designed such that only one thread will ever write to any given segment, while still allowing multiple log compaction and server threads to write to the log concurrently.
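The single-writer rule, sketched with illustrative types (my reading of it, not the actual log implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the server thread appends only to the unsealed tail segment, and
// compaction threads never mutate a segment in place; they rewrite sealed
// segments into fresh ones. So no segment ever has two writers, and appends
// need no locking.
final class SegmentedLog {
  static final class Segment {
    final List<byte[]> entries = new ArrayList<>();
    volatile boolean sealed; // once sealed, the server thread never writes again

    // Unsynchronized on purpose: exactly one thread writes to a segment.
    void append(byte[] entry) {
      if (sealed) throw new IllegalStateException("segment is sealed");
      entries.add(entry);
    }
  }

  private final List<Segment> segments = new ArrayList<>();
  private Segment tail = new Segment();

  // Server thread only.
  void append(byte[] entry) { tail.append(entry); }

  // Server thread seals the tail and rolls over to a new segment.
  void roll() {
    tail.sealed = true;
    segments.add(tail);
    tail = new Segment();
  }

  // Compaction thread only: rewrite a sealed segment into a new one,
  // keeping only live entries, instead of writing into the old segment.
  static Segment compact(Segment sealed, java.util.function.Predicate<byte[]> live) {
    Segment compacted = new Segment();
    for (byte[] entry : sealed.entries) {
      if (live.test(entry)) compacted.entries.add(entry);
    }
    compacted.sealed = true;
    return compacted;
  }
}
```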
Guess I just went on a tangent
Richard Pijnenburg
@electrical
Mar 29 2016 12:45
haha. It does give a lot of insight :-)
Jordan Halterman
@kuujo
Mar 29 2016 12:48
I don't think those things have made it into the internals docs yet
Richard Pijnenburg
@electrical
Mar 29 2016 12:48
documentation is always lagging :-)
Jordan Halterman
@kuujo
Mar 29 2016 18:29
@madjam @jhalterman I want to release Copycat 1.0 some time this week.
Objections?
Richard Pijnenburg
@electrical
Mar 29 2016 18:49
I don't know if my vote counts lol
Jonathan Halterman
@jhalterman
Mar 29 2016 20:10
@kuujo HN first then release?
i sense there will be more users and issues coming up after HN
Jordan Halterman
@kuujo
Mar 29 2016 20:10
idc
either way… I just want to get 1.0 out the door. It feels ready. Of course there will be bugs. But we seem to have worked out most of the significant ones now, and new users aren’t going to change anything that would break a release
Jonathan Halterman
@jhalterman
Mar 29 2016 20:14
good with me
the main risk isn't bugs - there will be bugs - it's API stability
and the sense is that it's pretty good
Jordan Halterman
@kuujo
Mar 29 2016 20:14
right
It’s a really small API in Copycat: start, stop, connect, close, submit, onEvent, and the state machine
I feel confident
Jonathan Halterman
@jhalterman
Mar 29 2016 20:18
it shows :shipit:
Jordan Halterman
@kuujo
Mar 29 2016 22:26
So, I'm doing some more profiling of Copycat. One of the assumptions I made about the log was that the synchronized methods would have little to no impact on log management aside from the log compaction threads, because of biased locking. The server thread should have no trouble acquiring locks. But what I didn't realize is that the log is also accessed when releasing entries from the state machine thread, and furthermore, because the state machine can release entries at arbitrary and unspecified points in the log at any given time, binary search becomes necessary for every entry after a segment is compacted. One interesting idea is that an entry's offset in a segment could be stored in the entry itself. That would preclude the need for an index lookup when releasing the entry. But log compaction still poses a challenge, since offsets change after compaction. Offsets could be updated in entries, but that would require tracking referenced entries in the log, and that's probably too great a task for too little gain.
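For reference, the offset-in-entry idea would look roughly like this (sketch with made-up types, not the real entry format):

```java
import java.util.BitSet;

// Sketch of storing an entry's segment offset in the entry itself so that
// releasing it from the state machine thread needs no binary search in the
// index. As noted above, the stored offset goes stale once the segment is
// compacted, which is the unsolved part.
final class OffsetStampedEntry {
  final long index;   // global log index
  final int offset;   // offset within its segment, stamped at append time
  final byte[] bytes;

  OffsetStampedEntry(long index, int offset, byte[] bytes) {
    this.index = index;
    this.offset = offset;
    this.bytes = bytes;
  }
}

final class SegmentReleaseTracker {
  private final BitSet released = new BitSet();

  // O(1) release using the stamped offset: no index lookup, no search.
  void release(OffsetStampedEntry entry) {
    released.set(entry.offset);
  }

  boolean isReleased(int offset) {
    return released.get(offset);
  }
}
```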