These are chat archives for atomix/atomix

20th
Mar 2018
Jordan Halterman
@kuujo
Mar 20 2018 00:59
I’m finishing up work on Atomix 2.1 now, as we’ll begin to integrate it into ONOS in a few weeks. Once we have our feature freeze in ONOS, I’ll be finishing the primary-backup protocol, tests, documentation, etc.
DIGAT Inventions
@digat
Mar 20 2018 08:21
Is Atomix ready for production use?
Colas
@LitlBro
Mar 20 2018 17:15
Hi,
on atomix-all 1.0.0-rc9, the CompletableFuture for a distributed primitive (like DistributedLong) sometimes hangs and never completes.
It seems to happen when multiple instances try to access a distributed variable at the same time (multiple DistributedLongs, whether or not they share the same name).
I have a homogeneous cluster/application, so it happens regularly. Do you know if this is a known issue (and if so, has it been fixed in Atomix 2.0?)
Example:
node().getLong("myLong").get().incrementAndGet().join()
where node is an AtomixReplica. This piece of code never completes (as I said, there are 3 nodes running the same piece of code).
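A minimal sketch, using only the standard java.util.concurrent API and nothing Atomix-specific, of how an unbounded join() like the one above could be replaced with a bounded wait, so that a hang surfaces as a TimeoutException instead of blocking the thread forever. The never-completing future is a stand-in for the primitive call; the class name BoundedWait, the helper awaitOrFail, and the 200 ms timeout are hypothetical choices for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWait {
    // Hypothetical helper: waits for the future, but gives up after timeoutMs
    // instead of blocking forever the way join() does.
    static <T> T awaitOrFail(CompletableFuture<T> future, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a distributed-primitive operation whose future is never completed
        // (an assumption for illustration, not the Atomix API).
        CompletableFuture<Long> stuck = new CompletableFuture<>();
        try {
            awaitOrFail(stuck, 200);
            System.out.println("completed");
        } catch (TimeoutException e) {
            // The hang now becomes an observable, loggable event.
            System.out.println("timed out waiting for the primitive");
        }
    }
}
```

This does not fix whatever leaves the future incomplete; it only makes the hang visible so it can be logged and correlated with cluster logs.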
Jordan Halterman
@kuujo
Mar 20 2018 17:30

@digat we use everything except the primary-backup protocol (which is new) very heavily, and we’re in the process of deploying Atomix in production in some very large networks. It is in production currently, but at a smaller scale than it will be by the end of the year. Currently all my time is invested in productionizing Atomix for large-scale production deployments.

The Atomix Raft implementation in particular is stable and production ready. But the rest of Atomix 2.1 is actually in a state of flux right now. We’ve been refactoring the code base to consolidate our custom primitives, partitioning, protocols, etc. in Atomix 2.1. That’s almost done, but Atomix 2.1 as a whole won’t be considered production ready until we get it deployed in the same environments in which its various pieces are currently deployed.

Basically, Atomix 2.1 combines a lot of code from Atomix and ONOS that’s currently being deployed to production, but after the refactoring we don’t yet feel confident that 2.1 itself is production ready.

TBH so much has been fixed and improved for our production deployments in Atomix 2.x that it’s safer to assume it’s fixed in 2.x than not. We’d need TRACE logs to see what’s happening in that cluster, but I haven’t yet encountered any Atomix 1.x issue that hasn’t been resolved in 2.x.
Colas
@LitlBro
Mar 20 2018 17:40
Thanks for the quick reply!
(I was trying to decide whether to focus on the application layer on top of Atomix or to switch to 2.x first; you just gave me a clear answer ^^)
@kuujo the thing is, there’s no stack trace: I pipe all stack traces into two files (stderr, stdout) and nothing appears. That’s why I find it weird.
Jon Hall
@jhall11
Mar 20 2018 17:55
There were a few incorrectly handled exceptions that got fixed in 2.x; this might have been one of them. It could also have been a deadlock.
Johno Crawford
@johnou
Mar 20 2018 19:05
@jhall11 that sounds like the most probable cause. CompletableFuture is tricky like that: e.g. an exception thrown in thenApply can be swallowed, and the rest of the future chain silently breaks.
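A hypothetical illustration of that failure mode using plain java.util.concurrent (not Atomix’s actual internals): if a callback throws before it completes the future the caller is waiting on, the exception is captured in a stage nobody inspects, nothing is logged, and the caller’s future never completes. That would be consistent with a join() that hangs with no stack trace. The class and method names below are made up for the example; the second method forwards any failure to the caller’s future instead.

```java
import java.util.concurrent.CompletableFuture;

public class BrokenChain {
    // Hypothetical internal callback that throws before it completes the caller's future.
    static CompletableFuture<Long> fragile() {
        CompletableFuture<Long> result = new CompletableFuture<>();
        CompletableFuture.completedFuture(1L).thenAccept(v -> {
            if (v == 1L) {
                throw new IllegalStateException("boom");
            }
            result.complete(v);
        });
        // The exception is stored in the discarded stage returned by thenAccept:
        // nothing is printed, and `result` is never completed, so join() would hang.
        return result;
    }

    // Same flow, but any failure in the callback is forwarded to the caller's future.
    static CompletableFuture<Long> robust() {
        CompletableFuture<Long> result = new CompletableFuture<>();
        CompletableFuture.completedFuture(1L)
            .thenApply(v -> {
                if (v == 1L) {
                    throw new IllegalStateException("boom");
                }
                return v;
            })
            .whenComplete((value, error) -> {
                if (error != null) {
                    result.completeExceptionally(error); // caller's join() now throws instead of hanging
                } else {
                    result.complete(value);
                }
            });
        return result;
    }

    public static void main(String[] args) {
        System.out.println("fragile done? " + fragile().isDone());
        System.out.println("robust failed? " + robust().isCompletedExceptionally());
    }
}
```

Forwarding errors with whenComplete (or exceptionally/handle) keeps the caller’s future tied to every outcome of the chain, which is what turns a silent hang into a visible exception.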
Jordan Halterman
@kuujo
Mar 20 2018 19:06
We’ve been completely off 1.x, focusing full time on improving 2.x, for the better part of a year now.