These are chat archives for atomix/atomix

16th Feb 2017
Jordan Halterman
@kuujo
Feb 16 2017 21:58
@jhall11 I'm assuming you want the open PR in the release?
Jon Hall
@jhall11
Feb 16 2017 21:59
Yes, that would be ideal
Jordan Halterman
@kuujo
Feb 16 2017 21:59
Cool
Jon Hall
@jhall11
Feb 16 2017 22:00
I noticed when running the unit tests that there is always one that fails when they all run together, but it always succeeds when run by itself
János Pásztor
@janoszen
Feb 16 2017 22:09
Hello everyone! I have two questions, if anyone is available:
  1. how do you shut down an Atomix cluster? It seems to me that if I start 3 bootstrap nodes and then try to shut them down using leave(), the threads get stuck
  2. is there any way to create an authentication relationship between nodes? Maybe a shared secret?
Jordan Halterman
@kuujo
Feb 16 2017 22:20
  1. It depends on what you want. leave() will remove each node from the cluster until there are no more nodes and the state is lost. shutdown() will simply stop the node and preserve state. There may be a bug when removing nodes from the cluster that's causing it to hang. The tests are passing, but that could be a product of differences in test/production environments. I can try to reproduce it this weekend.
  2. I suppose you'd have to implement a Transport to handle that. The NettyTransport that's typically used does support SSL.
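The leave()/shutdown() distinction above can be illustrated with a toy model. This is an editorial sketch, not Atomix code: the Node class, cluster set, and method bodies are hypothetical stand-ins for the behavior described (leave() removes the node and discards state; shutdown() stops the node but preserves it).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model (NOT Atomix code) of the distinction described above:
// leave() removes the node from cluster membership and discards its state;
// shutdown() stops the node but preserves membership and state.
public class LeaveVsShutdown {
    static final Set<String> cluster = new HashSet<>();

    static class Node {
        final String id;
        final Map<String, String> state = new HashMap<>();
        boolean running = true;

        Node(String id) { this.id = id; cluster.add(id); }

        void leave() {            // node is gone, and so is its state
            running = false;
            cluster.remove(id);
            state.clear();
        }

        void shutdown() {         // node stops; membership and state survive
            running = false;
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.state.put("k", "v");
        b.state.put("k", "v");

        a.leave();
        b.shutdown();

        System.out.println(cluster.contains("a")); // false: "a" left
        System.out.println(a.state.isEmpty());     // true:  state discarded
        System.out.println(cluster.contains("b")); // true:  "b" only shut down
        System.out.println(b.state.isEmpty());     // false: state preserved
    }
}
```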
@jhall11 in Atomix?
János Pásztor
@janoszen
Feb 16 2017 22:21
Regarding 1. I can try and give you the code that causes it to hang tomorrow.
Jon Hall
@jhall11
Feb 16 2017 22:21
in copycat, the test is testThreeNodeCloseEvent
János Pásztor
@janoszen
Feb 16 2017 22:21
Regarding 2. I think that would be useful, in case one needs to run the cluster across untrusted networks.
Jordan Halterman
@kuujo
Feb 16 2017 22:26
totally agree
hmm
I’ve seen that in the past, but haven’t seen it in a while
János Pásztor
@janoszen
Feb 16 2017 22:28
OK, I gotta hit the sack, but I'll try to whip up a PoC tomorrow. Right now my code starts 3 threads and then starts one Atomix node in each.
Jordan Halterman
@kuujo
Feb 16 2017 22:29
sounds great
János Pásztor
@janoszen
Feb 16 2017 22:29
Maybe the whole threading thing has something to do with the shutdown issues
Jordan Halterman
@kuujo
Feb 16 2017 22:29
hmmm… may be possible
János Pásztor
@janoszen
Feb 16 2017 22:29
Regarding authentication, I think I'll try to create a modified Netty transport that does some sort of authentication
Jordan Halterman
@kuujo
Feb 16 2017 22:37
@jhall11 I got that too… lemme look into it
I guess the possibilities are that state is not being cleaned up completely between test runs, or…?
hmm
the failure seems to happen at a really odd place in my test logs though
János Pásztor
@janoszen
Feb 16 2017 22:53
OK, so I am now running a single Atomix replica in one program, no more threading on my part, and the hang still occurs. When I stop the program after my main thread exits, I get this stack trace: https://www.dropbox.com/s/9jfhydc9mkn8hqx/Screenshot%20from%202017-02-16%2023-52-27.png?dl=0
Jordan Halterman
@kuujo
Feb 16 2017 22:55
hmm
János Pásztor
@janoszen
Feb 16 2017 22:55
If I had to guess, Atomix forgets to terminate something when I call leave()
This is with two nodes btw.
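That guess is consistent with how the JVM exits. A pure-JDK editorial sketch (no Atomix involved) of the underlying mechanism: the JVM only terminates when all non-daemon threads have stopped, and Netty's event-loop threads are non-daemon, so an unclosed transport keeps the process alive after main() returns.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Pure-JDK sketch: a non-daemon worker thread (like a Netty event-loop
// thread) keeps the JVM alive until it is explicitly shut down.
public class NonDaemonHang {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1); // non-daemon worker
        pool.submit(() -> System.out.println("work done"));
        // Without this shutdown() call, the worker thread would keep the
        // JVM running even after main() returns -- the same symptom as an
        // unclosed transport after leave().
        pool.shutdown();
        System.out.println(pool.awaitTermination(5, TimeUnit.SECONDS));
    }
}
```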
Jordan Halterman
@kuujo
Feb 16 2017 22:55
indeed
János Pásztor
@janoszen
Feb 16 2017 22:56
This is all my code:
        AtomixReplica replica = AtomixReplica.builder(new Address(listenAddress, port))
                                             .withTransport(new NettyTransport())
                                             .withStorage(Storage.builder()
                                                                 .withStorageLevel(StorageLevel.MEMORY)
                                                                 .build())
                                             .build();

        CompletableFuture<AtomixReplica> future;

        if (isBootstrapNode) {
            System.out.println("Bootstrapping cluster...");
            future = replica.bootstrap(bootstrapNodes);
        } else {
            System.out.println("Joining cluster...");
            future = replica.join(bootstrapNodes);
        }

        replica = future.join();
        replica.leave().join();
Either I'm doing something incredibly stupid or this is indeed a bug
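One debugging aid for this kind of hang (an editorial sketch, not from the chat): an unbounded join() on a future that never completes blocks silently, whereas a timed get() turns the same hang into a diagnosable TimeoutException.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Pure-JDK sketch: replacing join() with a timed get() makes a hang
// observable instead of blocking forever.
public class BoundedWait {
    public static void main(String[] args) throws Exception {
        // Stand-in for a future like replica.leave() that never completes.
        CompletableFuture<Void> neverCompletes = new CompletableFuture<>();
        try {
            // neverCompletes.join() would block here indefinitely;
            // a timed get() fails fast instead.
            neverCompletes.get(1, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.out.println("leave() did not complete in time");
        }
    }
}
```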
Jordan Halterman
@kuujo
Feb 16 2017 22:57
it’s the Transport
János Pásztor
@janoszen
Feb 16 2017 22:57
Should I close the transport myself?
Jordan Halterman
@kuujo
Feb 16 2017 22:57
it’s closed in shutdown but not in leave
no
well, I guess that’s debatable, but I think it should be done in Copycat
it’s already done in the client and in shutdown so shouldn’t be changed now anyways
one sec
János Pásztor
@janoszen
Feb 16 2017 22:58
Not sure. I mean I did create it, so I could also be responsible for closing it. Either way, this should be documented.
Jordan Halterman
@kuujo
Feb 16 2017 22:58
right
I would agree with that, but since it’s already done in Copycat, changing it now would cause everyone’s code to hang :-)
if it were created internally through a factory or something then it would make more sense to stop it internally
János Pásztor
@janoszen
Feb 16 2017 23:01
I could use NettyTransport.builder(), but I don't see it being closed that way either
Jordan Halterman
@kuujo
Feb 16 2017 23:02
Copycat 2.0 refactors communication a bit and it can be changed there
János Pásztor
@janoszen
Feb 16 2017 23:02
Actually, I'm fine with how it works as long as I know about it. I'm pretty amazed at how well the rest works
Right now I'm under a bit of time pressure, but if this makes it into production, I'll spend some time writing documentation
Jordan Halterman
@kuujo
Feb 16 2017 23:03
:-D
atomix/copycat#281
János Pásztor
@janoszen
Feb 16 2017 23:09
Subscribed, awesome
Jordan Halterman
@kuujo
Feb 16 2017 23:10
I’ll be pushing a release candidate today
which we need for other tests anyways