These are chat archives for atomix/atomix

18th
Apr 2017
josh gruenberg
@joshng
Apr 18 2017 00:23

Greetings, atomixers! I'm kicking the tires on this fine machine, and I've encountered a behavior that looks incorrect... Thought I'd float it here to see if you think it's my problem, or something in atomix that needs fixin':

I'm using a DistributedMap to coordinate some state across my cluster. I've registered callbacks for onAdd, onUpdate, etc, to respond to state-changes. In a test-case, I'm spinning up a handful of clustered AtomixReplicas, then connecting to the cluster via a separate AtomixClient to inject state-changes in the Map and observe the reaction.

In my test-case teardown, I first close the client, then concurrently tell all of the replicas to shut down. Meanwhile, I've got other threads that are concurrently trying to manipulate the Map. Somewhere in this shutdown routine, I see an error in my logs:

Caused by: io.atomix.copycat.error.ApplicationException: an application error occurred
    at io.atomix.copycat.error.CopycatError$Type$4.createException(CopycatError.java:113)
    at io.atomix.copycat.client.session.ClientSessionSubmitter$CommandAttempt.accept(ClientSessionSubmitter.java:337)
    at io.atomix.copycat.client.session.ClientSessionSubmitter$CommandAttempt.accept(ClientSessionSubmitter.java:302)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
    at io.atomix.copycat.client.util.ClientConnection.handleResponse(ClientConnection.java:155)
    at io.atomix.copycat.client.util.ClientConnection.lambda$sendRequest$1(ClientConnection.java:131)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
    at io.atomix.catalyst.transport.local.LocalConnection.lambda$handleResponseOk$3(LocalConnection.java:104)

...Then, trolling around with a debugger, I discover the cause on the "server"-side:

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at io.atomix.copycat.server.StateMachine.lambda$wrapValueMethod$1(StateMachine.java:374)
    at io.atomix.resource.ResourceStateMachineExecutor.executeCommand(ResourceStateMachineExecutor.java:84)
    at io.atomix.manager.internal.ResourceManagerStateMachineExecutor.execute(ResourceManagerStateMachineExecutor.java:101)
    at io.atomix.manager.internal.ResourceManagerState.operateResource(ResourceManagerState.java:102)
    at io.atomix.copycat.server.state.ServerStateMachineExecutor.executeOperation(ServerStateMachineExecutor.java:144)
    at io.atomix.copycat.server.state.ServerStateMachine.executeCommand(ServerStateMachine.java:859)
    at io.atomix.copycat.server.state.ServerStateMachine.lambda$apply$19(ServerStateMachine.java:801)
    at io.atomix.catalyst.concurrent.Runnables.lambda$logFailure$2(Runnables.java:20)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: session is closed
    at io.atomix.catalyst.util.Assert.state(Assert.java:69)
    at io.atomix.catalyst.util.Assert.stateNot(Assert.java:76)
    at io.atomix.copycat.server.state.ServerSessionContext.publish(ServerSessionContext.java:428)
    at io.atomix.manager.internal.ManagedResourceSession.publish(ManagedResourceSession.java:64)
    at io.atomix.resource.ResourceStateMachine.notify(ResourceStateMachine.java:163)
    at io.atomix.collections.internal.MapState.notify(MapState.java:81)
    at io.atomix.collections.internal.MapState.put(MapState.java:195)
... so, by my reading, it appears that one of the various sessions in my system is getting closed but not removed from the MapState's eventListeners, resulting in an exception the next time an event is fired through notify. As far as I can tell, I can't avoid this from my side; it seems as if the ResourceStateMachine ought to be removing dead sessions in its close(ServerSession) callback, but it's not doing that.
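A minimal sketch of the cleanup suggested above. FakeSession and ListenerRegistry are hypothetical stand-ins, not Atomix or Copycat classes; the only behavior borrowed from the trace is that publishing to a closed session throws IllegalStateException("session is closed").

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a server-side session (not a Copycat type).
final class FakeSession {
    final long id;
    final List<String> received = new ArrayList<>();
    private boolean open = true;

    FakeSession(long id) {
        this.id = id;
    }

    void close() {
        open = false;
    }

    // Mirrors the failure in the trace: publishing to a closed session throws.
    void publish(String event) {
        if (!open) {
            throw new IllegalStateException("session is closed");
        }
        received.add(event);
    }
}

// Hypothetical listener registry that forgets a session once it is closed.
final class ListenerRegistry {
    private final Map<Long, FakeSession> listeners = new LinkedHashMap<>();

    void listen(FakeSession session) {
        listeners.put(session.id, session);
    }

    // The step that appears to be missing: drop the listener when its session closes.
    void onSessionClosed(FakeSession session) {
        listeners.remove(session.id);
    }

    // Publishes the event to every remaining listener and returns how many were notified.
    int notifyListeners(String event) {
        int notified = 0;
        for (FakeSession session : listeners.values()) {
            session.publish(event);
            notified++;
        }
        return notified;
    }
}
```

Without the onSessionClosed call, the next notifyListeners would hit the closed session and throw, which matches the shape of the server-side trace.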
publicocean0
@publicocean0
Apr 18 2017 10:32
@kuujo In my system I also have to replicate files and streams saved locally. Following your manual, I wrote a Command class that passes an InputStream to the StateMachine, and I implemented a custom serializer for InputStream. My question is about the serializer: this class receives its input and output as BufferInput/BufferOutput. In my situation it is not acceptable for the whole buffer to be in memory, because the file could be very big. So my question is: does BufferInput keep the whole buffer in memory as a byte array, or does it work like an InputStream, passing along only the data as it is read?
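Whatever the answer for BufferInput turns out to be, one way to keep memory bounded is to split the stream into fixed-size chunks before anything is serialized. The sketch below is plain JDK code and an assumption about a possible workaround, not something from the Atomix manual; ChunkedReader and forEachChunk are hypothetical names, and how each chunk would be submitted as a command is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical helper: streams a large input in bounded pieces.
final class ChunkedReader {
    // Reads the stream in pieces of at most chunkSize bytes, hands each piece
    // to the sink, and returns the total number of bytes read.
    static long forEachChunk(InputStream in, int chunkSize, Consumer<byte[]> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Copy only the bytes actually read; the buffer is reused.
                sink.accept(Arrays.copyOf(buffer, read));
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

Only one chunk-sized buffer plus the chunk currently handed to the sink is live at a time, so the file size no longer determines the memory footprint.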
Johno Crawford
@johnou
Apr 18 2017 13:13
@jhalterman hope you don't mind off topic discussion here, but couldn't help but notice your recent gist https://gist.github.com/jhalterman/f7b18b30160ae7817bb93894056eb380 ..
oh thanks gitter
are you familiar with any similar algos for a moving sum? in the sense that you would be able to detect a sudden spike, e.g. disconnections, so 100 clients dropping in 1-2 seconds
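For reference, a moving count over a time window can be sketched with a queue of event timestamps. This is not the code from the gist; SlidingWindowCounter is a hypothetical name, and the sketch assumes timestamps are passed in non-decreasing order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: counts events that fall inside a sliding time window.
final class SlidingWindowCounter {
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Records one event at nowMillis and returns the count inside the window.
    int record(long nowMillis) {
        timestamps.addLast(nowMillis);
        return count(nowMillis);
    }

    // Evicts events that are windowMillis old or older, then returns how many remain.
    int count(long nowMillis) {
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size();
    }
}
```

A spike check is then a comparison such as counter.record(now) >= 100 with a 2000 ms window. Memory grows with the number of events in the window, which is the price of exact counts.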
Johno Crawford
@johnou
Apr 18 2017 16:13
ah nvm, worked it out
Jonathan Halterman
@jhalterman
Apr 18 2017 16:59
hey @johnou I'm not 100% confident in that code - it was just something I was playing with a bit :)
did you figure out an improvement?
Johno Crawford
@johnou
Apr 18 2017 17:00
well I had originally rewritten it to use millis instead of nanos, then wrote a unit test at which point I realised it's not going to do what I actually want
that's what happens when you code before eating breakfast
Jonathan Halterman
@jhalterman
Apr 18 2017 17:01
exactly :smile:
Johno Crawford
@johnou
Apr 18 2017 17:02
completely untested early proto but I think this shows what i'm after
Jonathan Halterman
@jhalterman
Apr 18 2017 18:04
Thanks @johnou will try it out next time I'm hacking on that code
Johno Crawford
@johnou
Apr 18 2017 18:14
@jhalterman is that actually applicable to what you were working on?
what is that code, if you don't mind me asking
Jonathan Halterman
@jhalterman
Apr 18 2017 22:32
@johnou I think I was using this to see if I could add time based circuit breakers (if X failures within Y time window, open circuit breaker) to Failsafe (https://github.com/jhalterman/failsafe). Time based is more tricky because it's a moving window...
But I ended up opting against time based circuit breakers for now
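The "X failures within Y time window" idea can be sketched with a ring of fixed-size time buckets, which keeps memory constant at the cost of bucket-sized precision at the window edge. This is a hypothetical illustration and not Failsafe's API; TimeWindowBreaker and its methods are made-up names, timestamps are assumed non-negative and non-decreasing, and closing or half-opening the breaker is left out.

```java
import java.util.Arrays;

// Hypothetical sketch: opens once `threshold` failures land within roughly `windowMillis`.
final class TimeWindowBreaker {
    private final int threshold;
    private final long bucketMillis;
    private final int[] counts;     // failures recorded per bucket slot
    private final long[] bucketIds; // which time slice each slot currently holds
    private boolean open;

    TimeWindowBreaker(int threshold, long windowMillis, int buckets) {
        if (buckets <= 0 || windowMillis < buckets) {
            throw new IllegalArgumentException("need at least 1 ms per bucket");
        }
        this.threshold = threshold;
        this.bucketMillis = windowMillis / buckets;
        this.counts = new int[buckets];
        this.bucketIds = new long[buckets];
        Arrays.fill(bucketIds, -1L);
    }

    // Records a failure and opens the breaker if the window total reaches the threshold.
    void recordFailure(long nowMillis) {
        long id = nowMillis / bucketMillis;
        int slot = (int) (id % counts.length);
        if (bucketIds[slot] != id) {
            // The slot still holds an older time slice: reuse it for the current one.
            bucketIds[slot] = id;
            counts[slot] = 0;
        }
        counts[slot]++;
        if (failuresInWindow(nowMillis) >= threshold) {
            open = true;
        }
    }

    // Sums only the buckets whose time slice is still inside the window.
    int failuresInWindow(long nowMillis) {
        long newest = nowMillis / bucketMillis;
        long oldest = newest - counts.length + 1;
        int sum = 0;
        for (int i = 0; i < counts.length; i++) {
            if (bucketIds[i] >= oldest && bucketIds[i] <= newest) {
                sum += counts[i];
            }
        }
        return sum;
    }

    boolean isOpen() {
        return open;
    }
}
```

The moving window is what makes this trickier than a simple counter: old failures have to age out on their own, here by checking each slot's time slice before counting it.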