These are chat archives for atomix/atomix

1st
Feb 2017
Jordan Halterman
@kuujo
Feb 01 2017 00:21
Sure @jhall11 taking a look at it right now
Jordan Halterman
@kuujo
Feb 01 2017 00:35
;-)
just have to put my thinking cap on for a bit
Jordan Halterman
@kuujo
Feb 01 2017 00:46
hmmm…. there may be a more elegant way to fix this issue actually
Jordan Halterman
@kuujo
Feb 01 2017 01:17
First I was thinking we could use a logical timestamp in ConnectRequest to ensure only the most recent connection is stored. But actually, ConnectRequest may not even be a necessary aspect of the protocol any more. IIRC it was a product of linearizability guarantees in session events, and those guarantees no longer exist. So, we may be able to simplify the client communication by removing that altogether. I’m going to give it a shot
less is more :-)
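For readers following along, the logical-timestamp idea mentioned above (the one that was set aside in favor of removing ConnectRequest) could look roughly like the sketch below. This is not Copycat code: `ConnectionRegistry`, `Entry`, `register`, and `connectionFor` are hypothetical names, and connections are represented by plain strings purely for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; none of these names come from Copycat.
final class ConnectionRegistry {

  /** A connection tagged with the logical timestamp of the request that registered it. */
  static final class Entry {
    final long timestamp;
    final String connectionId;

    Entry(long timestamp, String connectionId) {
      this.timestamp = timestamp;
      this.connectionId = connectionId;
    }
  }

  // Session ID -> most recent connection seen for that session.
  private final ConcurrentMap<Long, Entry> connections = new ConcurrentHashMap<>();

  /**
   * Stores the connection only if its timestamp is newer than the one
   * already stored for the session. Returns true if it was stored.
   */
  boolean register(long sessionId, long timestamp, String connectionId) {
    Entry candidate = new Entry(timestamp, connectionId);
    // merge() is atomic per key: a stale request can never overwrite a newer one.
    Entry stored = connections.merge(sessionId, candidate,
        (current, incoming) -> incoming.timestamp > current.timestamp ? incoming : current);
    return stored == candidate;
  }

  /** Returns the connection currently associated with the session, or null if none. */
  String connectionFor(long sessionId) {
    Entry entry = connections.get(sessionId);
    return entry == null ? null : entry.connectionId;
  }
}
```

With this shape, a delayed connect request carrying an older timestamp is simply ignored, so only the most recent connection stays associated with the session.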
Jon Hall
@jhall11
Feb 01 2017 01:19
nice! It's always good to get rid of code
Jordan Halterman
@kuujo
Feb 01 2017 01:34
ahh yes this is nice
I think the commit I just pushed should be pretty close
just need to run some tests
Jordan Halterman
@kuujo
Feb 01 2017 01:50
atomix/copycat#275
Jon Hall
@jhall11
Feb 01 2017 01:51
cool, taking a look now
Jordan Halterman
@kuujo
Feb 01 2017 01:58
I still need to poke around a bit so I can feel comfortable but I think this is right
Jordan Halterman
@kuujo
Feb 01 2017 02:10
there are a couple of Atomix bugs I’ll try to tackle tonight too
Jon Hall
@jhall11
Feb 01 2017 02:45
I’ll be running some more tests tonight/tomorrow, but my basic sanity tests all pass with these changes
Jordan Halterman
@kuujo
Feb 01 2017 03:02
Sounds good
Jordan Halterman
@kuujo
Feb 01 2017 05:06
I cleaned up a few other issues in Atomix I’ve been meaning to get to. I can push new releases any time ONOS needs them. Have to get back to hacking on work work now though :-)
Let me know if you guys need anything else or if that PR is not looking right. I think it should be a pretty nice improvement
Jon Hall
@jhall11
Feb 01 2017 05:23
Awesome! I’ll let you know how the testing looks
Jon Hall
@jhall11
Feb 01 2017 19:25
I’ll need to investigate more, but I’m getting more timeout exceptions with these changes than I was before
Jordan Halterman
@kuujo
Feb 01 2017 20:47
Hmm... that seems odd. The client times out and switches servers more frequently? Thinking about it...
Jordan Halterman
@kuujo
Feb 01 2017 21:02
I might just need to force it to send a keep-alive after connecting to a new server. I can imagine a scenario where failing to associate a connection with a session quickly enough could result in more timeouts after connecting to a new server. If the state machine is publishing events to the client, the client can't progress until events are received. That can cause commands/queries to take longer than the request timeout until the next keep-alive is sent. That should definitely be done anyways, so I'll update the PR and we can retest.
Hmm
Damnit Gitter seems to have lost my message. Ugh. Stupid distributed systems ;-)
So... I think the problem may be the time it takes between connecting to a new server and sending the first keep-alive. The client should be sending it right after connecting to a new server to associate the connection with the session. Otherwise, I can imagine a scenario where requests time out because the client can't progress until it sends a keep-alive, particularly if the state machine is sending events.
I'll push a fix for this, which definitely needs to be done.
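A minimal sketch of the behavior described above, assuming a client that normally sends keep-alives only on a periodic timer. It is not the actual Copycat client: `SessionClient`, `Transport`, `connectTo`, and `onKeepAliveTimer` are hypothetical names, and servers are identified by plain strings.

```java
// Hypothetical sketch; none of these names come from Copycat.
final class SessionClient {

  /** Stand-in for whatever actually sends requests to a server. */
  interface Transport {
    void sendKeepAlive(String server, long sessionId);
  }

  private final long sessionId;
  private final Transport transport;
  private String server;

  SessionClient(long sessionId, Transport transport) {
    this.sessionId = sessionId;
    this.transport = transport;
  }

  /** Switches to a new server and immediately associates the connection with the session. */
  void connectTo(String newServer) {
    server = newServer;
    // Send a keep-alive right away instead of waiting for the next timer tick,
    // so the new server learns which session this connection belongs to.
    transport.sendKeepAlive(server, sessionId);
  }

  /** Called by the periodic keep-alive timer (timer wiring omitted here). */
  void onKeepAliveTimer() {
    if (server != null) {
      transport.sendKeepAlive(server, sessionId);
    }
  }
}
```

The point of the sketch is only the ordering: the keep-alive goes out as part of connecting, so the window in which the server cannot deliver events to the session is as short as possible.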
Jordan Halterman
@kuujo
Feb 01 2017 21:14
k… actually I think I may have a better way to go about this. Will push it in a sec
Jordan Halterman
@kuujo
Feb 01 2017 21:29
@jhall11 I think #276 may be a better implementation. It preserves the protocol clients use to update their connection information with the servers, but still removes the replication/consistency checks servers use to clean up connection information.
I think that should still resolve the issues in #260 but keep the client’s behavior otherwise the same as before
Jon Hall
@jhall11
Feb 01 2017 21:30
ok, I’ll look at that now
Jordan Halterman
@kuujo
Feb 01 2017 22:57
I guess Gitter didn't lose my first message after all :-)
Jordan Halterman
@kuujo
Feb 01 2017 23:06
@jhall11 does this mean I get an ONOS fleece jacket :-P
Jon Hall
@jhall11
Feb 01 2017 23:07
I think it is highly likely
Jordan Halterman
@kuujo
Feb 01 2017 23:08
:clap:
and ONOS-5129 is already fixed?
Jon Hall
@jhall11
Feb 01 2017 23:14
I closed that bug after the last atomix release, but atomix/copycat#260 is really being tracked under ONOS-5902. They are really the same, though. Or rather, ONOS-5129 was caused by a few bugs, #260 being one of them
Jordan Halterman
@kuujo
Feb 01 2017 23:17
gotcha thanks
Jon Hall
@jhall11
Feb 01 2017 23:18
You changed your avatar, it is disorienting :)
Jordan Halterman
@kuujo
Feb 01 2017 23:18
lol
imagine if I did that every five minutes