These are chat archives for atomix/atomix

26th
Apr 2017
Jordan Halterman
@kuujo
Apr 26 2017 01:34

@qorpus on the first example, does the lock() call ever complete?

I'm actually rewriting the threading model right now. All Copycat/Atomix events are currently completed on the same thread, and if a thread is blocked then they're completed on a backup thread. This is what ZooKeeper does as well, and it's necessary to enforce order on clients, e.g. to ensure the user sees a lock event before an unlock event and vice versa. But I think there are some bugs with how it's implemented.

In ONOS, I rewrote the threading model to use a separate logical thread per resource partition because we were seeing timeouts with the old model. That threading model should make its way back into Atomix in a few months.
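A minimal sketch of the thread-per-partition idea, for illustration only: each partition gets its own single-threaded executor, so events for one resource are delivered in submission order (lock before unlock) while a blocked callback on one partition does not stall the others. `PartitionedEventDispatcher`, `dispatch` and the partition key `"lock-1"` are hypothetical names, not the ONOS or Atomix implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: one logical thread per resource partition.
class PartitionedEventDispatcher {
    private final Map<String, ExecutorService> threads = new ConcurrentHashMap<>();

    // Events for the same partition run sequentially, in submission order.
    void dispatch(String partition, Runnable event) {
        threads.computeIfAbsent(partition, p -> Executors.newSingleThreadExecutor())
               .execute(event);
    }

    // Stop accepting events and wait for queued ones to finish.
    void shutdownAndWait() {
        for (ExecutorService executor : threads.values()) {
            executor.shutdown();
            try {
                executor.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Small demo: the listener sees "lock" before "unlock" for the same partition.
    static List<String> demo() {
        PartitionedEventDispatcher dispatcher = new PartitionedEventDispatcher();
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        dispatcher.dispatch("lock-1", () -> seen.add("lock"));
        dispatcher.dispatch("lock-1", () -> seen.add("unlock"));
        dispatcher.shutdownAndWait();
        return seen;
    }
}
```

The trade-off in this sketch is one thread per partition; a real implementation would more likely map many partitions onto a bounded pool while keeping per-partition ordering.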

But that doesn't explain the last example. We were actually just recently working on the DistributedLock and found there are some issues with it in certain environments. I wrote a lot more thorough tests for our lock wrapper, and a few of them failed consistently on our Jenkins server. I haven't been able to reproduce those issues in an environment that allows me to debug the issue, but I suspect there's something wrong with the lock implementation.

@qorpus do you mind enabling TRACE logging for io.atomix.copycat and sending the logs? We abandoned the pessimistic lock in ONOS in favor of more improvements to transactions, but I suspect this is still a problem I'm going to have to solve. I just need the logs to do it.
terrytan
@txm119161336_twitter
Apr 26 2017 03:10
@kuujo Hi Jordan, nice to meet you here. Previously I was talking to you on the Google forum. Thank you for supporting us these days. You said the promotion is not important, but I am a little confused about the process of adding servers: if we add one more server to a 3-node cluster, then by the current logic the quorum size changes immediately, while the joining server still needs a period of time to catch up with the leader?
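To make the arithmetic behind that question concrete, here is a small sketch of majority quorum sizes. It assumes a plain majority quorum and that the joining server counts as a voting member right away, as the message describes; `Quorum`, `majority` and `tolerated` are illustrative names, not Atomix or Copycat APIs.

```java
// Hypothetical helper showing majority-quorum arithmetic.
class Quorum {
    // Smallest number of voting members that forms a majority.
    static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    // Number of voting members that can be unavailable while a majority remains.
    static int tolerated(int votingMembers) {
        return votingMembers - majority(votingMembers);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 5; n++) {
            System.out.println(n + " members: quorum=" + majority(n)
                    + ", tolerated failures=" + tolerated(n));
        }
    }
}
```

Going from 3 to 4 voting members raises the quorum from 2 to 3 without raising the number of tolerated failures, which is why a server that is still catching up matters if it is counted immediately.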
Johno Crawford
@johnou
Apr 26 2017 07:50
@kuujo do you have a link to the threading model rewrite for ONOS? i'd be interested in checking it out
Johno Crawford
@johnou
Apr 26 2017 09:17
I'm looking at integrating Atomix with Orbit (orbit/orbit#230), but I wonder whether I'm taking the correct approach to dealing with completable futures. Every time I use an async op from a library such as Atomix (which I hope to use for the distributed cache and messaging), the thread that completes the CompletableFuture might belong to that library, e.g. Atomix / Netty. If the application then uses the return value / response of the CF for things like blocking IO, it may block the Netty event loop and cause problems.
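One common way to handle this concern is to hand the result over to an application-owned executor with the `*Async` variants of the `CompletableFuture` callbacks, so any blocking work runs off the library's thread. The sketch below only uses the JDK; the `"library-io"` thread stands in for an Atomix / Netty event loop and is an assumption for illustration, as are the names `FutureHandoff` and `appPool`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: continue on an application pool, not on the library's thread.
class FutureHandoff {
    static String demo() {
        // Stand-in for a library-owned event loop thread (assumption for this sketch).
        ExecutorService libraryThread =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "library-io"));
        // Application-owned pool where blocking work is acceptable.
        ExecutorService appPool =
                Executors.newFixedThreadPool(2, r -> new Thread(r, "app-worker"));
        try {
            // The "library" completes this future on its own thread.
            CompletableFuture<String> libraryFuture =
                    CompletableFuture.supplyAsync(() -> "value", libraryThread);

            // thenApplyAsync with an explicit executor runs the callback on appPool,
            // so blocking IO here would not stall the library thread.
            return libraryFuture
                    .thenApplyAsync(v -> Thread.currentThread().getName(), appPool)
                    .join();
        } finally {
            libraryThread.shutdown();
            appPool.shutdown();
        }
    }
}
```

With the non-async `thenApply`, the callback may instead run on whichever thread completes the future, which is the situation described above.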