These are chat archives for atomix/atomix

7th
Jul 2016
Max Lord
@maxl0rd
Jul 07 2016 14:22
@kuujo Thanks for taking a look at it. What I was seeing in that example is that sending with REQUEST_REPLY failed immediately, as did SYNC. Sending with ASYNC sent the message, but the reply was null (expected). I'm going to put this aside for a little while, but def want to come back to it.
Roman Pearah
@neverfox
Jul 07 2016 19:15
We're struggling with io.atomix.copycat.session.ClosedSessionException: session closed. We have long-running parallel processes that read/write data to atomix (lots of clients essentially) and we get deep into a job and hit that issue, which can wipe out hours of work since we don't have recovery code figured out. So a few questions: Should this be happening, i.e. is it the kind of exception you should expect on the regular? What's the recommended way to recover from such a problem? Do you need to reestablish the whole client connection or just retry getting the resource or the write?
Madan Jampani
@madjam
Jul 07 2016 19:18
To the question of whether this is normal (under typical load), the answer from my experience is no. When this happens, there is usually something else amiss that is potentially causing session keepalives not to occur at a regular cadence.
However, the system is designed to recover from such errors by reestablishing the session.
Roman Pearah
@neverfox
Jul 07 2016 19:19
Interesting
Okay, then we could, at a minimum, just be sure we're catching the exception so we don't bring the house down.
Madan Jampani
@madjam
Jul 07 2016 19:20
What I’m curious about is why and how this wipes out hours of work. That is potentially very bad. Did I miss something?
Roman Pearah
@neverfox
Jul 07 2016 19:20
I just mean that the Exception wasn't caught
so it killed our program
It's possible to recover but not easily since this is early stage work
nothing wrong with Atomix
our code
We're at the stage of finding all the ways Exceptions can be thrown and writing the appropriate handlers
Madan Jampani
@madjam
Jul 07 2016 19:25
ClosedSessionException is one of those things that should be rare but is certainly recoverable. In my experience, even when running under high load, I rarely encounter session churn. When I did run into it, it was usually a symptom of something else. (For example, the issue underlying atomix/copycat#235 caused these exceptions.)
Roman Pearah
@neverfox
Jul 07 2016 19:27
Interesting
@madjam So is the protocol to just try a get or set again?
Assuming I understand you when you say that it re-establishes the session
Madan Jampani
@madjam
Jul 07 2016 20:12
A better option is to monitor client state change events. You can do this by calling CopycatClient::onStateChange. You should assume that when the client is not in the CONNECTED state, it will have trouble submitting operations to the cluster. When the recovery logic completes, the client should transition back to CONNECTED.
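For illustration, here is a minimal sketch of what tracking the connected state might look like. It does not use the Copycat classes directly: `ConnectionGate` and its local `State` enum are hypothetical stand-ins, and only the CONNECTED state is taken from the message above (SUSPENDED and CLOSED are assumed names). With a real client, the idea would be to forward each state reported to the onStateChange callback into `onStateChange` here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical helper that remembers whether the client last reported CONNECTED. */
class ConnectionGate {
    // Assumed local mirror of the client's states; only CONNECTED is named in the chat.
    enum State { CONNECTED, SUSPENDED, CLOSED }

    // Completed while connected; replaced by a fresh, incomplete future when the connection is lost.
    private volatile CompletableFuture<Void> connected = new CompletableFuture<>();

    /** Returns a callback suitable for passing to a state change registration. */
    Consumer<State> listener() {
        return this::onStateChange;
    }

    /** Records a state transition reported by the client. */
    synchronized void onStateChange(State state) {
        if (state == State.CONNECTED) {
            connected.complete(null);
        } else if (connected.isDone()) {
            // Left CONNECTED: callers that ask from now on wait for the next reconnect.
            connected = new CompletableFuture<>();
        }
    }

    /** True if the most recent reported state was CONNECTED. */
    boolean isConnected() {
        return connected.isDone();
    }

    /** Completes once the client is (or next becomes) CONNECTED. */
    CompletableFuture<Void> whenConnected() {
        return connected;
    }
}
```

Application code could check `isConnected()` before submitting work, or chain onto `whenConnected()` to defer an operation until recovery has finished.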
Roman Pearah
@neverfox
Jul 07 2016 21:22
@madjam Great. And when is the exception thrown? When doing something like client.getSet("key").join() or set.get(value).join() during a period where the client has lost its session?
So I essentially need to be prepared to catch exception on any operation
But rather than try to recover there, let the recovery happen in onStateChange
and then retry
Madan Jampani
@madjam
Jul 07 2016 21:26
Yes. Operations (queries or commands) can fail with that exception during periods of instability.
But again, if you are seeing these exceptions frequently, something else is wrong. So it is a good idea to keep an eye on how often you see these.
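The catch, wait for recovery, then retry pattern discussed above could be sketched roughly as follows. This is an assumption-laden illustration rather than Atomix code: `SessionRetry` is a hypothetical helper, `SessionClosedException` is a local stand-in for io.atomix.copycat.session.ClosedSessionException, and the `whenConnected` supplier is assumed to return a future that completes once the client is back in the CONNECTED state. It also counts session failures, since frequent ones are a sign that something else is wrong.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical helper: retries an operation that failed because the session was closed. */
class SessionRetry {
    /** Local stand-in for the real ClosedSessionException. */
    static class SessionClosedException extends RuntimeException {
        SessionClosedException(String message) { super(message); }
    }

    private final AtomicLong sessionFailures = new AtomicLong();
    private final Supplier<CompletableFuture<Void>> whenConnected;
    private final int maxAttempts;

    SessionRetry(Supplier<CompletableFuture<Void>> whenConnected, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be at least 1");
        this.whenConnected = whenConnected;
        this.maxAttempts = maxAttempts;
    }

    /** Runs the operation, blocking until it succeeds or the attempts are used up. */
    <T> T call(Supplier<CompletableFuture<T>> operation) {
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get().join();
            } catch (RuntimeException e) {
                // join() wraps failures in CompletionException; unwrap to find the real cause.
                Throwable cause = (e instanceof CompletionException && e.getCause() != null)
                        ? e.getCause() : e;
                if (!(cause instanceof SessionClosedException)) throw e;
                sessionFailures.incrementAndGet();
                if (attempt == maxAttempts) throw e;
                // Recovery happens elsewhere; just wait for it before retrying.
                whenConnected.get().join();
            }
        }
    }

    /** Number of session-closed failures seen so far, worth monitoring. */
    long sessionFailures() {
        return sessionFailures.get();
    }
}
```

A call site would then wrap each operation, for example `retry.call(() -> someAsyncOperation())`, where `someAsyncOperation` is whatever returns the CompletableFuture for the read or write. Retrying a write blindly is only safe if the write is idempotent, so that is a decision for the application.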