These are chat archives for atomix/atomix

28th Dec 2017
Jordan Halterman
@kuujo
Dec 28 2017 01:49
@BITjiangrui Atomix can be run in a separate process. It’s a distributed system, so it doesn’t care where clients are. But in ONOS specifically it’s embedded for performance reasons. If an ONOS process crashes then Atomix crashes with it, but that’s why we have fault tolerance. Even in a single node cluster, a crashed ONOS node should not mean lost data. Atomix persists data in a commit log, so when a node recovers the log will be replayed and the state rebuilt.
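A minimal sketch of that recovery idea, using hypothetical classes (not Atomix’s actual log implementation): writes are appended to a log before being applied to live state, so after a crash the state can be rebuilt by replaying the log in order.

```java
// Hypothetical sketch, not Atomix's actual log classes: writes go to a
// commit log before the live state, so state can be rebuilt on recovery.
class CommitLogDemo {
    static class Entry {
        final String key;
        final String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    static final java.util.List<Entry> log = new java.util.ArrayList<>();
    static final java.util.Map<String, String> state = new java.util.HashMap<>();

    // Append to the log first, then apply to the live state.
    static void put(String key, String value) {
        log.add(new Entry(key, value));
        state.put(key, value);
    }

    // Simulated recovery after a crash: wipe live state, replay the log.
    static void recover() {
        state.clear();
        for (Entry e : log) state.put(e.key, e.value);
    }

    static String get(String key) { return state.get(key); }
}
```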
@publicocean0 it seems like you got this now, but the 2PC protocol is internal to transactions. When a transaction is committed, the 2PC protocol is run internally using all the participants (primitives and partitions) and the result is returned (success/failure). Internally, prepare is run on all participants and then commit depending on whether the prepares all succeeded. There’s no reason that protocol has to be explicitly exposed to the user. The begin/commit pattern is pretty common to database transactions.
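The internal flow described there can be sketched roughly like this; `Participant`, `TwoPhaseCommit`, and `CountingParticipant` are hypothetical names standing in for the primitives/partitions, not Atomix’s real API:

```java
// Hypothetical names, not Atomix's real API: a Participant stands in for
// a primitive/partition taking part in the transaction.
interface Participant {
    boolean prepare();  // phase 1: vote yes/no and acquire locks
    void commit();      // phase 2a: make the changes visible
    void rollback();    // phase 2b: undo and release locks
}

class TwoPhaseCommit {
    // Returns true on commit, false on rollback -- the only outcome the
    // begin/commit transaction API ever surfaces to the user.
    static boolean run(java.util.List<Participant> participants) {
        java.util.List<Participant> prepared = new java.util.ArrayList<>();
        for (Participant p : participants) {
            if (p.prepare()) {
                prepared.add(p);
            } else {
                // One "no" vote aborts: roll back everyone who prepared.
                for (Participant q : prepared) q.rollback();
                return false;
            }
        }
        // All prepares succeeded, so the commit is now unconditional.
        for (Participant p : prepared) p.commit();
        return true;
    }
}

// Trivial in-memory participant used for demonstration.
class CountingParticipant implements Participant {
    final boolean vote;
    int commits, rollbacks;
    CountingParticipant(boolean vote) { this.vote = vote; }
    public boolean prepare() { return vote; }
    public void commit() { commits++; }
    public void rollback() { rollbacks++; }
}
```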
But there is a bug in the 2PC implementation that has yet to be documented. Currently, the failure of a coordinator (the node committing the transaction) can cause locks to be held forever. We still need to add replication of transaction info and failure detection for coordinators, so another node can take over and commit/rollback the transaction.
Jordan Halterman
@kuujo
Dec 28 2017 02:07
Hmm actually I’ll add a bug for that so it’s not forgotten
10 pull requests!? 😳
Jordan Halterman
@kuujo
Dec 28 2017 02:20
👏
Johno Crawford
@johnou
Dec 28 2017 17:09
@GEverding I haven't touched clojure before, sorry
Jordan Halterman
@kuujo
Dec 28 2017 17:36
Yeah, I don’t fully understand it. Sometimes I can parse it if I look hard enough, and we’ve had a few Clojure lurkers around.
@jhalterman
Johno Crawford
@johnou
Dec 28 2017 17:51
@kuujo the phi timeout check in netty seems interesting, is it to guard against response spikes, or?
I was running a stress test and it blew up with Caused by: java.util.concurrent.TimeoutException: Request timed out in 8 milliseconds
at io.atomix.messaging.impl.NettyMessagingService$RemoteClientConnection.timeoutCallbacks(NettyMessagingService.java:879)
8 ms timeout seems a bit harsh :D
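For illustration only, a history-based (phi-accrual-style) timeout can be sketched like this; this is an assumed simplification, not the actual `NettyMessagingService` code, but it shows how a steady stream of fast responses can produce a very tight deadline like 8 ms:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified sketch of a history-based (phi-accrual-style) timeout: the
// deadline adapts to recent round-trip times instead of being fixed. Under
// low, steady latency it can compute very short timeouts.
class AdaptiveTimeout {
    private final Deque<Long> samples = new ArrayDeque<>();
    private final int window;        // how many recent samples to keep
    private final double multiplier; // how many stddevs of slack to allow

    AdaptiveTimeout(int window, double multiplier) {
        this.window = window;
        this.multiplier = multiplier;
    }

    // Record an observed round-trip time in milliseconds.
    void record(long rttMillis) {
        samples.addLast(rttMillis);
        if (samples.size() > window) samples.removeFirst();
    }

    // Timeout = mean + multiplier * stddev of the recent samples,
    // never below the given floor.
    long timeoutMillis(long floorMillis) {
        if (samples.isEmpty()) return floorMillis;
        double mean = samples.stream().mapToLong(Long::longValue).average().orElse(0);
        double var = samples.stream()
            .mapToDouble(s -> (s - mean) * (s - mean)).average().orElse(0);
        long computed = (long) Math.ceil(mean + multiplier * Math.sqrt(var));
        return Math.max(computed, floorMillis);
    }
}
```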
Johno Crawford
@johnou
Dec 28 2017 20:38
@kuujo think i'll stop at 16, getting to the point where I might cause merge conflicts by changing more code :(