These are chat archives for atomix/atomix

26th
Apr 2016
David Moravek
@dmvk
Apr 26 2016 15:25
Hey, I'm getting really desperate. The Atomix replica starts fine, but when I try to get a resource from the Atomix client, I always get a session timeout (the Atomix client seems to have started, though)... The logs are not really helpful; all I get is
17:24:05.351 [copycat-client-io-1] DEBUG io.atomix.copycat.client.util.ClientConnection - Received ConnectResponse[status=OK, leader=docker.loc/172.17.0.2:1234, members=[docker.loc/172.17.0.2:1234]]
17:24:05.361 [copycat-client-io-1] DEBUG io.atomix.copycat.client.session.ClientSession - Received RegisterResponse[status=OK, session=276, leader=docker.loc/172.17.0.2:1234, members=[docker.loc/172.17.0.2:1234]]
17:24:05.364 [copycat-client-io-1] DEBUG io.atomix.copycat.client.DefaultCopycatClient - State changed: CONNECTED
17:24:05.367 [copycat-client-io-1] INFO io.atomix.copycat.client.session.ClientSession - Registered session 276
17:24:05.368 [copycat-client-io-1] DEBUG io.atomix.copycat.client.session.ClientSession - 276 - Sending KeepAliveRequest[session=276, commandSequence=0, eventIndex=276]
17:24:05.408 [copycat-client-io-1] DEBUG io.atomix.copycat.client.util.ClientConnection - Connecting to docker.loc/172.17.0.2:1234
17:24:05.408 [copycat-client-io-1] INFO io.atomix.catalyst.transport.NettyClient - Connecting to docker.loc/172.17.0.2:1234
17:24:05.415 [copycat-client-io-1] DEBUG io.atomix.copycat.client.session.ClientSession - 276 - Sending CommandRequest[session=276, sequence=1, command=io.atomix.manager.internal.GetResource@754d665b]
17:24:05.420 [copycat-client-io-1] DEBUG io.atomix.copycat.client.util.ClientConnection - Connection closed
17:24:05.423 [catalyst-event-loop-2] INFO io.atomix.catalyst.transport.NettyClient - Connected to docker.loc/172.17.0.2:1234
17:24:05.423 [copycat-client-io-1] DEBUG io.atomix.copycat.client.util.ClientConnection - Setting up connection to docker.loc/172.17.0.2:1234
17:24:05.423 [copycat-client-io-1] DEBUG io.atomix.copycat.client.util.ClientConnection - Sending ConnectRequest[client=a53041c6-53b9-4b2b-867c-eacf179169db]
17:24:05.437 [copycat-client-io-1] DEBUG io.atomix.copycat.client.util.ClientConnection - Received ConnectResponse[status=OK, leader=docker.loc/172.17.0.2:1234, members=[docker.loc/172.17.0.2:1234]]
17:24:05.441 [copycat-client-io-1] DEBUG io.atomix.copycat.client.session.ClientSession - 276 - Received CommandResponse[status=OK, index=278, eventIndex=276, result=23]
17:24:05.446 [copycat-client-io-1] DEBUG io.atomix.copycat.client.session.ClientSession - 276 - Received KeepAliveResponse[status=OK, error=null, leader=docker.loc/172.17.0.2:1234, members=[docker.loc/172.17.0.2:1234]]
17:24:07.947 [copycat-client-io-1] DEBUG io.atomix.copycat.client.session.ClientSession - 276 - Sending KeepAliveRequest[session=276, commandSequence=1, eventIndex=276]
17:24:07.950 [copycat-client-io-1] DEBUG io.atomix.copycat.client.session.ClientSession - 276 - Received KeepAliveResponse[status=OK, error=null, leader=docker.loc/172.17.0.2:1234, members=[docker.loc/172.17.0.2:1234]]
17:24:10.451 [copycat-client-io-1] DEBUG io.atomix.copycat.client.session.ClientSession - 276 - Sending KeepAliveRequest[session=276, commandSequence=1, eventIndex=276]
any idea what might be going wrong?
Richard Pijnenburg
@electrical
Apr 26 2016 15:27
@davidmoravek do you have a gist of your code?
David Moravek
@dmvk
Apr 26 2016 15:31
sure, one sec
Jordan Halterman
@kuujo
Apr 26 2016 15:33
Nah my internet is not working too well ATM
a.getMap never completes :(
Jordan Halterman
@kuujo
Apr 26 2016 15:35
Bah
Let me try to send this again...
Roman Pearah
@neverfox
Apr 26 2016 15:35
@davidmoravek If you get this working, I'm going to bug you about your Docker/Kafka/Atomix integration ;)
That's probably something I'll be doing
Richard Pijnenburg
@electrical
Apr 26 2016 15:35
Docker itself is easy. I’ve been playing around with that for ages :-)
Jordan Halterman
@kuujo
Apr 26 2016 15:35
Those logs show all successful requests/responses (status=OK). Where is the response with the session expiration?
Hey I'm a huge fan of all of those things :-)
Roman Pearah
@neverfox
Apr 26 2016 15:36
I'm up on Docker itself; it's mainly how to get an Atomix cluster going. Clustered services can be tricky in containers.
Jordan Halterman
@kuujo
Apr 26 2016 15:36
Indeed
Roman Pearah
@neverfox
Apr 26 2016 15:36
At least Zookeeper was a pain
then...Kubernetes :)
it's actually part of my sprint right now to get Atomix going on Kube
David Moravek
@dmvk
Apr 26 2016 15:38
17:24:05.420 [copycat-client-io-1] DEBUG io.atomix.copycat.client.util.ClientConnection - Connection closed
Jordan Halterman
@kuujo
Apr 26 2016 15:39
You don't get any logs beyond those?
David Moravek
@dmvk
Apr 26 2016 15:39
I also managed to get a response when joining a distributed group: io.atomix.copycat.session.ClosedSessionException: session closed
Not from the client. I can give you logs from the master, but I don't find them useful either
Jordan Halterman
@kuujo
Apr 26 2016 15:39
hmm
If they're debug logs I really need the full client and server logs. I can make sense of them
They should show where/why the session is expired
Jordan Halterman
@kuujo
Apr 26 2016 15:45
Awesome looking
David Moravek
@dmvk
Apr 26 2016 15:48
this is ok, right? (when I want to set up just a single-node replica)
    final AtomixReplica replica = AtomixReplica.builder(address)
        .withTransport(NettyTransport.builder().withThreads(3).build())
        .withStorage(Storage.builder().withStorageLevel(StorageLevel.MEMORY).build())
        .build();

    replica.bootstrap().join();
Jordan Halterman
@kuujo
Apr 26 2016 15:49
I see session
So, in those logs session 22 is registered and then is expired 5 seconds later since no keep-alives are received for that session after it's registered.
Yeah
You see Detected expired session: 22
David Moravek
@dmvk
Apr 26 2016 15:53
found it! the problem is a synchronous operation inside the callback
    client.connect(cluster).whenComplete((a, e) -> {
      a.getGroup("xxx").whenComplete((group, e2) -> {
        System.out.println("aaa");
      });
      try {
        Thread.sleep(5000);
      } catch (InterruptedException e1) {
        e1.printStackTrace();
      }
    });
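A minimal sketch of why blocking inside the callback starves the session, assuming the client drives callbacks and keep-alives from one event thread. It uses only `java.util.concurrent`; the class and method names are hypothetical and no Atomix API is involved. The keep-alive is modelled as a task queued on the same single thread that runs the callback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EventThreadBlockingDemo {

  /**
   * Returns true if a keep-alive task manages to run on the single event thread
   * while a completion callback is still busy with blocking work.
   */
  public static boolean keepAliveRunsWhileCallbackBusy(boolean offload) {
    ExecutorService eventThread = Executors.newSingleThreadExecutor();
    ExecutorService workers = Executors.newCachedThreadPool();
    CountDownLatch callbackStarted = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    CountDownLatch keepAliveRan = new CountDownLatch(1);
    try {
      // Stand-in for blocking user code such as Thread.sleep(5000).
      Runnable blockingWork = () -> {
        callbackStarted.countDown();
        try {
          release.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      };

      CompletableFuture<String> connected = new CompletableFuture<>();
      if (offload) {
        // Callback runs on a worker thread; the event thread stays free.
        connected.whenCompleteAsync((value, error) -> blockingWork.run(), workers);
      } else {
        // Callback runs on the thread that completes the future: the event thread.
        connected.whenComplete((value, error) -> blockingWork.run());
      }

      // Complete the future from the event thread, as an I/O loop would.
      eventThread.execute(() -> connected.complete("connected"));
      callbackStarted.await();

      // Queue a "keep-alive" behind whatever the event thread is doing now.
      eventThread.execute(keepAliveRan::countDown);
      return keepAliveRan.await(500, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      release.countDown();
      eventThread.shutdown();
      workers.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("blocking inline: keep-alive ran = " + keepAliveRunsWhileCallbackBusy(false));
    System.out.println("offloaded:       keep-alive ran = " + keepAliveRunsWhileCallbackBusy(true));
  }
}
```

In this model the inline variant leaves the keep-alive stuck in the queue, while handing the blocking work to another executor with `whenCompleteAsync` lets it through.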
Jordan Halterman
@kuujo
Apr 26 2016 15:53
Ahh
This is interesting...
David Moravek
@dmvk
Apr 26 2016 15:54
so it wouldn't even work if I call a.getGroup("xx").join() :/
Jordan Halterman
@kuujo
Apr 26 2016 15:54
So, there are mechanisms to detect when the event thread is blocked, but those mechanisms are based on futures. Since this is using Thread.sleep it doesn't detect it
You can do join(); it will detect that the thread is blocked
The future that's returned by the client is actually an extension of CompletableFuture that sets a flag when the future is blocked so it doesn't try to use the event thread
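The idea of a future that flags when a thread is blocked on it could look roughly like the sketch below. This is an illustration only, not Copycat's actual implementation: `BlockingAwareFuture`, `isBlocking`, and `completeVia` are hypothetical names, and the single-threaded executor stands in for the event thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class BlockingAwareFuture<T> extends CompletableFuture<T> {
  // The thread currently parked in join(), if any.
  private final AtomicReference<Thread> waiter = new AtomicReference<>();

  @Override
  public T join() {
    waiter.set(Thread.currentThread());
    try {
      return super.join();
    } finally {
      waiter.set(null);
    }
  }

  /** True if the given thread is currently waiting in join() on this future. */
  public boolean isBlocking(Thread thread) {
    return thread != null && waiter.get() == thread;
  }

  /** Completes on the event loop unless the event thread itself is stuck waiting on this future. */
  public void completeVia(ExecutorService eventLoop, Thread eventThread, T value) {
    if (isBlocking(eventThread)) {
      complete(value); // queueing on the event loop would never run: complete from the caller's thread
    } else {
      eventLoop.execute(() -> complete(value));
    }
  }

  /** Blocks the event thread in join(), then completes the future without deadlocking. */
  public static String demo() {
    ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    try {
      Callable<Thread> whoAmI = Thread::currentThread;
      Thread eventThread = eventLoop.submit(whoAmI).get();

      BlockingAwareFuture<String> future = new BlockingAwareFuture<>();
      Callable<String> joinOnEventThread = future::join;
      Future<String> joined = eventLoop.submit(joinOnEventThread);

      // Wait until the event thread has actually entered join().
      long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
      while (!future.isBlocking(eventThread) && System.nanoTime() < deadline) {
        Thread.sleep(1);
      }

      future.completeVia(eventLoop, eventThread, "done");
      return joined.get(2, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      eventLoop.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

A plain `Thread.sleep` never touches the future, which is why a flag like this cannot notice it.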
David Moravek
@dmvk
Apr 26 2016 15:56
    client.connect(cluster).whenComplete((a, e) -> {
      a.getGroup("xxx").join();
      System.out.println("X");
    });
this doesn't detect anything
Jordan Halterman
@kuujo
Apr 26 2016 15:57
That will be a bug if it's not working... I wonder if it could be because futures are wrapped by Atomix
David Moravek
@dmvk
Apr 26 2016 15:58
anyway, thanks for help ;)
Jordan Halterman
@kuujo
Apr 26 2016 15:59
https://github.com/atomix/copycat/blob/master/test/src/test/java/io/atomix/copycat/test/ClusterTest.java testBlockOnEvent tests this feature. The same test probably needs to be in Atomix. I'm guessing the resource's client is actually returning a different future that doesn't detect blocking
Will file a bug
David Moravek
@dmvk
Apr 26 2016 16:01
@neverfox I wouldn't call it an integration :D the only thing running in Docker is Atomix
Jordan Halterman
@kuujo
Apr 26 2016 16:02
I guess there may be better ways to detect the event thread is blocked
Like a time limit
Though that's sort of arbitrary
David Moravek
@dmvk
Apr 26 2016 16:13
is AtomixClient thread safe?
Jordan Halterman
@kuujo
Apr 26 2016 16:13
yeah
David Moravek
@dmvk
Apr 26 2016 16:23
nooo, all of this was a really stupid bug on my side :/ forgot the @Singleton annotation on the Guice provider method, so I always got a new Atomix instance...
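The effect of the missing scope can be shown in plain Java, as an analogy rather than Guice itself: an unscoped provider builds a fresh object on every request, while a singleton-scoped one caches the first. In Guice the fix is adding `@Singleton` next to `@Provides` on the module method. `Client` and `singleton` below are hypothetical names used only for this sketch.

```java
import java.util.function.Supplier;

public class SingletonProviderDemo {

  /** Hypothetical stand-in for an expensive, stateful client object. */
  public static final class Client {}

  /** Wraps a factory so it is invoked at most once; later calls return the cached instance. */
  public static <T> Supplier<T> singleton(Supplier<T> factory) {
    return new Supplier<T>() {
      private T instance;

      @Override
      public synchronized T get() {
        if (instance == null) {
          instance = factory.get();
        }
        return instance;
      }
    };
  }

  public static void main(String[] args) {
    Supplier<Client> unscoped = Client::new;          // like a provider method without a scope
    Supplier<Client> scoped = singleton(Client::new); // like a provider method with a singleton scope

    System.out.println("unscoped reuses instance: " + (unscoped.get() == unscoped.get()));
    System.out.println("scoped reuses instance:   " + (scoped.get() == scoped.get()));
  }
}
```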
Jordan Halterman
@kuujo
Apr 26 2016 16:23
ahh
Roman Pearah
@neverfox
Apr 26 2016 16:27
@kuujo If you put a replica cluster behind a load balancer, and pointed clients to the LB, would you expect that to work in lieu of the actual cluster address list?
Jordan Halterman
@kuujo
Apr 26 2016 19:07
hmmm… I think you’d have to override ServerSelectionStrategy to force it to connect to the load balancer
the problem is, clients connect to specific servers based on the ServerSelectionStrategy
so, if the strategy is to connect to a leader, the client will switch to wherever the leader is
if the strategy is to connect to a follower, it will disconnect from a leader
etc
Roman Pearah
@neverfox
Apr 26 2016 19:08
ah I see
Jordan Halterman
@kuujo
Apr 26 2016 19:08
not sure if setting the ServerSelectionStrategy would work TBH but it might
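To make the selection idea concrete, here is a self-contained sketch with a hypothetical `SelectionStrategy` interface. It is a stand-in for illustration, not Copycat's `ServerSelectionStrategy`, and whether pinning a real client to a load balancer works is exactly the open question above. One strategy follows the leader; the other ignores membership and always returns one fixed address.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SelectionStrategySketch {

  /** Hypothetical stand-in for a server selection hook; NOT Copycat's actual interface. */
  public interface SelectionStrategy {
    List<InetSocketAddress> select(InetSocketAddress leader, List<InetSocketAddress> servers);
  }

  /** Prefers the known leader, so the client follows the leader as it moves. */
  public static SelectionStrategy leaderFirst() {
    return (leader, servers) -> {
      List<InetSocketAddress> order = new ArrayList<>(servers);
      if (leader != null) {
        order.remove(leader);
        order.add(0, leader);
      }
      return order;
    };
  }

  /** Ignores cluster membership and always returns the one fixed address. */
  public static SelectionStrategy fixed(InetSocketAddress loadBalancer) {
    return (leader, servers) -> Collections.singletonList(loadBalancer);
  }

  public static void main(String[] args) {
    // Example addresses are made up for the sketch.
    InetSocketAddress a = InetSocketAddress.createUnresolved("10.0.0.1", 5000);
    InetSocketAddress b = InetSocketAddress.createUnresolved("10.0.0.2", 5000);
    InetSocketAddress lb = InetSocketAddress.createUnresolved("lb.example", 5000);
    List<InetSocketAddress> servers = Arrays.asList(a, b);

    System.out.println("leaderFirst: " + leaderFirst().select(b, servers));
    System.out.println("fixed:       " + fixed(lb).select(b, servers));
  }
}
```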
grahamashby
@grahamashby
Apr 26 2016 20:54
I'm having an issue using BalancingClusterManager. With the following code:
    AtomixReplica atomixReplica = AtomixReplica.builder(new Address("localhost", getLocalAddress())) //
            .withClusterManager(BalancingClusterManager.builder() //
                    .withQuorumHint(1) //
                    .withBackupCount(0) //
                    .build()) //
            .build(); //
    CompletableFuture<AtomixReplica> future = atomixReplica.bootstrap();
    AtomixReplica join = future.join();
I get the following exception:
Exception caught while initializing context: java.util.concurrent.CompletionException: java.lang.NullPointerException
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:284)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:291)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:972)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:937)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:485)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1973)
at io.atomix.copycat.server.CopycatServer.lambda$null$27(CopycatServer.java:660)
at io.atomix.copycat.server.CopycatServer$$Lambda$52.000000000C027400.accept(Unknown Source)
at io.atomix.catalyst.util.Listeners$ListenerHolder.lambda$accept$1(Listeners.java:100)
at io.atomix.catalyst.util.Listeners$ListenerHolder$$Lambda$67.000000000BCA45C0.run(Unknown Source)
at io.atomix.catalyst.util.concurrent.Runnables.lambda$logFailure$17(Runnables.java:20)
at io.atomix.catalyst.util.concurrent.Runnables$$Lambda$11.000000000C41DA60.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:191)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:785)
Caused by: java.lang.NullPointerException
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:963)
... 16 more
But if I comment out the use of the BalancingClusterManager, the future.join() succeeds. Any ideas? Am I out to lunch in my use of the ClusterManager?