These are chat archives for atomix/atomix

6th
Jul 2018
william.z
@zwillim
Jul 06 2018 06:42

Hi all. I got an exception when sending an event message.
This is the client code:

        CompletableFuture<String> rtn = atomix.getEventService().send("test_event", "test_msg");
        System.out.println("test_event result:" + rtn.join());

And this is the Exception:

Exception in thread "main" java.util.concurrent.CompletionException: io.atomix.cluster.messaging.MessagingException$NoRemoteHandler: No remote message handler registered for this message
    at java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:375)
    at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
    at cn.ac.iie.di.atomix2test.atomix.n.Client.main(Client.java:41)
Caused by: io.atomix.cluster.messaging.MessagingException$NoRemoteHandler: No remote message handler registered for this message
    at io.atomix.cluster.messaging.impl.DefaultClusterEventService.send(DefaultClusterEventService.java:134)
    at io.atomix.cluster.messaging.ClusterEventService.send(ClusterEventService.java:98)
    at cn.ac.iie.di.atomix2test.atomix.n.Client.main(Client.java:40)

And my server does subscribe to the topic:

         atomix.getEventService().subscribe("test_event", (String message) -> {
            System.out.println("get event from test_event:" + message);
            return CompletableFuture.completedFuture("result from server1_" + message);
        });

I found that the event did succeed once, because I saw this in the server log:

        get event from test_event:test_msg

So I'm confused about where the error is.
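The stack trace above shows `join()` throwing a `CompletionException` whose cause is the messaging error. A minimal sketch of unwrapping that cause on the client side follows. It uses only `java.util.concurrent`; `NoHandlerException` and `joinAndDescribe` are hypothetical stand-ins for illustration, not Atomix classes or methods.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class JoinUnwrapSketch {
    // Hypothetical stand-in for the messaging failure; NOT an Atomix class.
    static class NoHandlerException extends RuntimeException {
        NoHandlerException(String message) { super(message); }
    }

    // Returns the reply, or a short description of the underlying failure.
    static String joinAndDescribe(CompletableFuture<String> future) {
        try {
            return future.join();
        } catch (CompletionException e) {
            // join() wraps the original failure; the real error is the cause.
            Throwable cause = e.getCause() != null ? e.getCause() : e;
            return "failed: " + cause.getClass().getSimpleName() + ": " + cause.getMessage();
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> ok =
                CompletableFuture.completedFuture("result from server1_test_msg");
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(
                new NoHandlerException("No remote message handler registered for this message"));

        System.out.println(joinAndDescribe(ok));
        System.out.println(joinAndDescribe(failed));
    }
}
```

Catching the wrapper and logging the cause keeps the client running across repeated sends, which makes it easier to see which attempt starts failing.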

Jordan Halterman
@kuujo
Jul 06 2018 07:11
The event succeeds once and then begins returning those exceptions?
or the message is received but the sender gets back an error?
william.z
@zwillim
Jul 06 2018 07:30
It succeeds once and then begins returning those exceptions.
I ran the test client twice and succeeded once.
Jordan Halterman
@kuujo
Jul 06 2018 07:31
gotta try to reproduce it in a unit test
william.z
@zwillim
Jul 06 2018 07:34
it succeeded again when I deleted the ".data" directory
and then the exceptions appeared again
william.z
@zwillim
Jul 06 2018 07:40
Does the subscription accept only one message and then close automatically?
Jordan Halterman
@kuujo
Jul 06 2018 07:40
no
Also, the .data directory is only used by the core, not clustering, so it shouldn’t have any impact on messaging
william.z
@zwillim
Jul 06 2018 07:50
Are the implementations of Event Service and Messaging Service similar? Messaging Service does not have this problem.
Jordan Halterman
@kuujo
Jul 06 2018 07:51
ClusterEventService uses ClusterMessagingService
it’s just most likely sending the message to the wrong node or something
just need to reproduce it in a unit test
william.z
@zwillim
Jul 06 2018 07:55
You mean I should write a unit test and try it again?
I'm not a native English speaker, so I may sometimes misunderstand.
Jordan Halterman
@kuujo
Jul 06 2018 08:12
I mean write a unit test that reproduces it so we can debug it
  @Test
  public void testEventService() throws Exception {
    Atomix atomix1 = startAtomix(1, Arrays.asList(1, 2), Profile.dataGrid()).get(30, TimeUnit.SECONDS);
    Atomix atomix2 = startAtomix(2, Arrays.asList(1, 2), Profile.dataGrid()).get(30, TimeUnit.SECONDS);
    atomix1.getEventService().subscribe("test", message -> CompletableFuture.completedFuture("world!")).join();
    assertEquals("world!", atomix2.getEventService().send("test", "Hello").join());
    assertEquals("world!", atomix2.getEventService().send("test", "Hello").join());
    assertEquals("world!", atomix2.getEventService().send("test", "Hello").join());
    assertEquals("world!", atomix2.getEventService().send("test", "Hello").join());
    assertEquals("world!", atomix2.getEventService().send("test", "Hello").join());
  }
this test passes
so the question is, what does this do differently?
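For a reproducer of the "succeeds once, then fails" symptom, it can help to record which attempt fails first. The sketch below does that with plain `CompletableFuture` suppliers standing in for the real send call. `firstFailure` and the `flaky` sender are hypothetical names, and no Atomix API is used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class RepeatSendSketch {
    // Calls send up to `attempts` times. Returns the 1-based index of the
    // first attempt that completed exceptionally, or -1 if none did.
    static int firstFailure(Supplier<CompletableFuture<String>> send, int attempts) {
        for (int i = 1; i <= attempts; i++) {
            try {
                send.get().join();
            } catch (CompletionException e) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in sender (assumption): succeeds on the first call only.
        Supplier<CompletableFuture<String>> flaky = () -> {
            CompletableFuture<String> reply = new CompletableFuture<>();
            if (calls.incrementAndGet() == 1) {
                reply.complete("world!");
            } else {
                reply.completeExceptionally(new IllegalStateException("no handler"));
            }
            return reply;
        };
        System.out.println(firstFailure(flaky, 5)); // prints 2
    }
}
```

In a real test the supplier would wrap the actual send, so the same loop reports the failing attempt without changing the rest of the test.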
william.z
@zwillim
Jul 06 2018 08:13
I use this :
builder.addProfile(Profile.CONSENSUS);
Does this matter?
Jordan Halterman
@kuujo
Jul 06 2018 08:14
Doesn’t make a difference. The messaging is really done in the cluster package which doesn’t even know anything about primitives or protocols
it uses a gossip protocol to replicate subscriptions
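The idea that subscriptions are replicated by gossip can be pictured with a toy model in which each node keeps its own view of who subscribes to a topic and only learns about remote subscriptions when views are merged. This is only an illustration under that assumption: the `Node` class and its methods are hypothetical and do not reflect how Atomix implements it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

class GossipSubscriptionSketch {
    // Toy node (hypothetical): keeps a local view of topic -> subscriber ids.
    static class Node {
        final String id;
        final Map<String, Set<String>> view = new HashMap<>();

        Node(String id) { this.id = id; }

        // Record a local subscription; peers learn of it only through gossip.
        void subscribe(String topic) {
            view.computeIfAbsent(topic, t -> new TreeSet<>()).add(id);
        }

        // One gossip exchange: merge the peer's view into this node's view.
        void mergeFrom(Node peer) {
            peer.view.forEach((topic, ids) ->
                    view.computeIfAbsent(topic, t -> new TreeSet<>()).addAll(ids));
        }

        // Pick a subscriber this node currently knows about, if any.
        Optional<String> pickSubscriber(String topic) {
            Set<String> ids = view.get(topic);
            if (ids == null || ids.isEmpty()) {
                return Optional.empty();
            }
            return Optional.of(ids.iterator().next());
        }
    }

    public static void main(String[] args) {
        Node server = new Node("server1");
        Node client = new Node("client");
        server.subscribe("test_event");

        // Before any exchange the client's view has no subscriber for the topic.
        System.out.println(client.pickSubscriber("test_event").isPresent());
        client.mergeFrom(server);
        // After merging, the client knows that server1 subscribes.
        System.out.println(client.pickSubscriber("test_event").orElse("none"));
    }
}
```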
william.z
@zwillim
Jul 06 2018 08:15
I found another problem..
Johno Crawford
@johnou
Jul 06 2018 08:15
@zwillim create a JUnit test like the example above that shows the problem
william.z
@zwillim
Jul 06 2018 08:15
I start one server, and sometimes I cannot find the leader
@johnou got it, thanks :)
leadership:Leadership{leader=null, candidates=[]}
Johno Crawford
@johnou
Jul 06 2018 08:17
that means that there are no nodes available in your cluster that are eligible to become a leader
william.z
@zwillim
Jul 06 2018 08:18
I start only one server, and the server log shows the connection is ok:
ClusterMembershipEvent{type=MEMBER_ADDED, subject=StatefulMember{id=client, address=127.0.0.1:5000, metadata={}}, time=1530864899607}
ClusterMembershipEvent{type=MEMBER_REMOVED, subject=StatefulMember{id=client, address=127.0.0.1:5000, metadata={}}, time=1530865027149}
I'm confused..
Jordan Halterman
@kuujo
Jul 06 2018 08:19
Well, there are a couple of different leadership elections that happen. One is in Raft, which requires that a majority of each partition be started before a leader can be elected. So if the Raft partition group is configured with more than one member then more than one node will be required to elect a leader. The other uses Raft to elect primary-backup partition leaders. The Leadership log looks like it’s from the latter.
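The majority requirement described above comes down to simple arithmetic: a group of n members needs floor(n/2) + 1 of them running. A small sketch, with hypothetical helper names that are not part of Atomix:

```java
class QuorumSketch {
    // Smallest number of members that forms a strict majority of the group.
    static int quorum(int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("group must have at least one member");
        }
        return groupSize / 2 + 1;
    }

    // True if enough members are running for a majority to exist.
    static boolean canElectLeader(int groupSize, int running) {
        return running >= quorum(groupSize);
    }

    public static void main(String[] args) {
        for (int size = 1; size <= 5; size++) {
            System.out.println("members=" + size + " quorum=" + quorum(size));
        }
        // A group configured with two members cannot elect with only one running.
        System.out.println(canElectLeader(2, 1)); // prints false
    }
}
```

Under this rule a group with a single configured member can elect a leader on its own, while a group configured with two members needs both.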
william.z
@zwillim
Jul 06 2018 08:20
Actually, only one node is configured.
        builder.withLocalMember(Member.builder("client")
                .withAddress("127.0.0.1:" + 5000)
                .build());
        builder.withMembers(
                Member.builder("server1")
                        .withAddress("127.0.0.1:5001")
                        .build());
Johno Crawford
@johnou
Jul 06 2018 08:20
but again, to save you time and us time, we would need a reproducer project or JUnit test. You could file a bug report on GitHub if you manage to put together some code that shows the problem, one issue per problem
Jordan Halterman
@kuujo
Jul 06 2018 08:21
that’s two nodes
touche
Well… one server, assuming the server is the one configured with partitions
william.z
@zwillim
Jul 06 2018 08:21
        final String server = "server1";
        int port = 5001;
        builder.withLocalMember(Member.builder(server)
                .withAddress("127.0.0.1:" + port)
                .build());
        builder.withMembers(
                Member.builder("server1")
                        .withAddress("127.0.0.1:5001")
                        .build());
@johnou ok, I will file a bug report
Jordan Halterman
@kuujo
Jul 06 2018 08:22
Need the whole reproducer like @johnou says. There are too many variables. This all depends on how the partition groups are configured. It seems like the Leadership log is from a leader elector primitive, but if you’re using Profile.CONSENSUS then there shouldn’t even be any leader elector primitives created in the cluster
Johno Crawford
@johnou
Jul 06 2018 08:23
don't forget the code which clearly demonstrates the problem, otherwise it would be like finding a needle in a haystack
william.z
@zwillim
Jul 06 2018 08:24
Ok, I'll try another protocol and file a bug report
thanks a lot ! @kuujo @johnou