These are chat archives for atomix/atomix

14th
Dec 2016
Thiago Santos
@thiagoss
Dec 14 2016 12:44
I have a patch but I'm still not happy with it, will try to make it cleaner and test with my ONOS scenario. Right now it already passes all copycat tests.
Jon Hall
@jhall11
Dec 14 2016 19:16
great!
Thiago Santos
@thiagoss
Dec 14 2016 21:13
It seems it is not as easy as I thought. When expanding the solution to also reject Commands and Queries, I hit a race: the client is connected, but the Follower server hasn't yet received the Publish event about the client connecting. The client is already sending commands/queries, and those are rejected because the session is only set when the publish event is received.
Checking for local connections could also work, but I guess this can be racy depending on the order in which connections and publishes happen.
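The race described above can be illustrated with a minimal sketch. It assumes a follower that only learns about sessions through publish events; the class and method names (`FollowerSessions`, `onPublish`, `accept`) are hypothetical and are not Copycat's API.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a follower's view of sessions; not Copycat code. */
class FollowerSessions {
    // Session ids this follower has learned about through publish events.
    private final Set<Long> registered = new HashSet<>();

    /** Called when the publish event for a client connection arrives. */
    void onPublish(long sessionId) {
        registered.add(sessionId);
    }

    /** A command or query is accepted only if its session is already known. */
    boolean accept(long sessionId) {
        return registered.contains(sessionId);
    }
}
```

A request that arrives before `onPublish` is rejected even though the client is legitimately connected, which is the window the message above describes.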
Thiago Santos
@thiagoss
Dec 14 2016 22:08
We could try rejecting them on the leader itself, but it would mean tracking each request's origin in the request itself.
Thiago Santos
@thiagoss
Dec 14 2016 22:13
@jhall11 @kuujo Any ideas?
Jon Hall
@jhall11
Dec 14 2016 23:06
Do you have code I could look at?
Thiago Santos
@thiagoss
Dec 14 2016 23:06
Let me put it up
@jhall11 waltznetworks/copycat@935fe0f
Jon Hall
@jhall11
Dec 14 2016 23:13
I’m curious, if you apply this to only the keep alive, does your cluster recover?
Thiago Santos
@thiagoss
Dec 14 2016 23:14
The atomix tests pass, but I haven't tried it with the cluster yet. Shouldn't it fail if the client keeps bombarding the server with other requests (unlikely)?
I'm going to give it a spin tomorrow, but wanted to find a generic solution
Jon Hall
@jhall11
Dec 14 2016 23:16
I think with just rejecting keep-alives, it will take longer for the session to time out, but it should eventually. I don’t think ONOS will continue to send commands or queries after it gets into a stable state.
Thiago Santos
@thiagoss
Dec 14 2016 23:17
Checking against local connections in the server could also be done, but I still need to think through whether it opens any window for a connection to never expire. I guess the connection entries, when published, should kill wrong connections on the servers.
The issue is my ONOS scenario is very write-intensive. Lots of rules change every second
Jon Hall
@jhall11
Dec 14 2016 23:18
The only thing I’m thinking of now is to have the client wait for some sort of ack from the server before using the session. But that seems heavy-handed to me.
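The ack idea can be sketched as a client that holds requests until the server confirms the session. This is only an illustration under assumed names (`AckGatedSession`, `onServerAck`, `submit`); it does not reflect how the Copycat client is actually structured.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch: requests are deferred until the server acks the session. */
class AckGatedSession {
    // Completed once the (assumed) server acknowledgement arrives.
    private final CompletableFuture<Void> ready = new CompletableFuture<>();

    /** Called when the server acknowledges that the session is usable. */
    void onServerAck() {
        ready.complete(null);
    }

    /** The command is only "sent" (here: turned into a string) after the ack. */
    CompletableFuture<String> submit(String command) {
        return ready.thenApply(ignored -> "sent:" + command);
    }
}
```

Every request submitted before the ack waits on `ready`, which is where the extra latency on connect would come from.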
Thiago Santos
@thiagoss
Dec 14 2016 23:19
yes, could increase latency a bit when connecting
My backup plan is to have PublishEvents for Connections to kill local connections from clients if they don't belong to that server
Client X is connected to Server A, if Server A gets a message from the leader that Client X is connected to Server B then it kills that connection.
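The backup plan above can be sketched roughly as follows. The names (`ConnectionRegistry`, `onLocalConnect`, `onConnectPublished`) are hypothetical, and server and client identities are plain strings for illustration; none of this is taken from the Copycat code base.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a server dropping connections it no longer owns. */
class ConnectionRegistry {
    private final String localServer;
    // Clients that currently hold a connection to this server.
    private final Set<String> localConnections = new HashSet<>();

    ConnectionRegistry(String localServer) {
        this.localServer = localServer;
    }

    void onLocalConnect(String clientId) {
        localConnections.add(clientId);
    }

    boolean hasLocalConnection(String clientId) {
        return localConnections.contains(clientId);
    }

    /**
     * Called when the leader publishes that clientId is connected to ownerServer.
     * Returns true if a stale local connection was closed as a result.
     */
    boolean onConnectPublished(String clientId, String ownerServer) {
        if (localServer.equals(ownerServer)) {
            return false; // the connection belongs here, keep it
        }
        return localConnections.remove(clientId); // drop it if we were holding one
    }
}
```

With Server A holding Client X, a publish saying X is connected to Server B makes `onConnectPublished("X", "B")` return true on A and removes the local connection.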
Jon Hall
@jhall11
Dec 14 2016 23:24
Yes, I think this will probably be the correct approach. I’m trying to think of a way where a newer connection event (in terms of the log) would ever be older than a client connection in real time (which would close the newer connection with this approach).
Thiago Santos
@thiagoss
Dec 14 2016 23:29
That can indeed happen given the right sequence of events
My impression is that, if the client has a backoff time, it will stabilize on the next connection attempt
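For reference, a client-side backoff of the kind mentioned above might look like this capped exponential delay. The class name, base delay, and cap are assumptions chosen for illustration, not values from Copycat or ONOS.

```java
/** Hypothetical sketch of a capped exponential reconnect backoff. */
final class ReconnectBackoff {
    private final long baseMillis;
    private final long maxMillis;

    ReconnectBackoff(long baseMillis, long maxMillis) {
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
    }

    /** Delay before the given reconnect attempt (0-based), doubling up to the cap. */
    long delayMillis(int attempt) {
        // Clamp the shift so the multiplication cannot overflow for small bases.
        int shift = Math.max(0, Math.min(attempt, 20));
        return Math.min(maxMillis, baseMillis << shift);
    }
}
```

If a wrongly closed connection triggers a reconnect, the delay gives the earlier publish events time to be applied before the next attempt.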
Thiago Santos
@thiagoss
Dec 14 2016 23:50
@jhall11 I'll let this solution float around my brain tonight. If I can't think of alternatives or problems with it, I'll try that tomorrow. Let me know if you find any flaws in it.