These are chat archives for atomix/atomix

9th
Dec 2016
Roman Pearah
@neverfox
Dec 09 2016 01:41
@belowm queue.poll()
belowm
@belowm
Dec 09 2016 09:45

@neverfox Thanks for the reply. The future returned by queue.poll() seems to complete immediately even if there is no item on the queue:

queue.poll().thenAccept(msg -> {
    System.out.println("retrieved " + msg );
});

Gives me:

retrieved null

I would like the future to be pending, until an item becomes available (or an error occurs). Of course I could also poll the queue periodically, but that seems odd to me.

Thiago Santos
@thiagoss
Dec 09 2016 19:54

Hi, I've been having an issue when using atomix/copycat with ONOS. Basically, the client's session connection and the servers' view of it get out of sync: the client tries to connect to one server, times out, and tries another one that succeeds, but the events reach the Leader in a different order.

Some more details here: https://jira.onosproject.org/browse/ONOS-5347?focusedCommentId=14167&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14167

I'm not sure how/where to fix this.

Jon Hall
@jhall11
Dec 09 2016 20:12
Is this an accurate summary of the events:
  1. Client 3 sends connect request to Server 3
  2. Client 3 times out on the connect request to Server 3
  3. Client 3 sends connect request to Server 2
  4. Server 2 accepts the connection from Client 3
  5. Server 3 accepts the connection from Client 3
oops
Jon Hall
@jhall11
Dec 09 2016 20:34
Can you tell which connection the server(s) are trying to use?
I’m thinking the servers should try to recover the session, but I’m not sure if that would have any side effects
Thiago Santos
@thiagoss
Dec 09 2016 20:40
What do you mean by "which connection"? Do you mean at the end? Client 3 is sending its messages to Server 2, as that was the last connection it attempted. But Server 2's Session Context for Client 3 has no connection, so it never gets to send a PublishRequest, and Client 3 never sees any events anymore.
Jon Hall
@jhall11
Dec 09 2016 20:59
So the client only has a connection to server 2 (it cleans up the old one to server 3 in step 2). At step 4, server 2 has a connection to client 3, correct? Does it give this up at step 5? What does server 3 think by the end?
Thiago Santos
@thiagoss
Dec 09 2016 21:03
Server 3 is the last to receive the ConnectEntry from the Leader and believes the Client is connected to it.
And yes, Server 2 gives up the connection at step 5.
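The sequence discussed so far can be modeled with a toy example. This is only a sketch of the ordering problem, not Copycat's actual code: `ConnectRaceSketch`, its `ConnectEntry` class, and `apply` are hypothetical names, and the only assumption is that every server applies connect entries in log order, so the last entry for a client wins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConnectRaceSketch {
    /** Hypothetical stand-in for a connect entry committed to the leader's log. */
    static final class ConnectEntry {
        final String client;
        final String server;

        ConnectEntry(String client, String server) {
            this.client = client;
            this.server = server;
        }
    }

    /** Applies entries in log order; the last entry seen for a client wins. */
    static Map<String, String> apply(List<ConnectEntry> log) {
        Map<String, String> owner = new HashMap<>();
        for (ConnectEntry entry : log) {
            owner.put(entry.client, entry.server);
        }
        return owner;
    }

    public static void main(String[] args) {
        List<ConnectEntry> log = new ArrayList<>();
        // The client tried Server 3 first and Server 2 second, but the entry
        // produced by Server 2 is committed before the one from Server 3.
        log.add(new ConnectEntry("client3", "server2"));
        log.add(new ConnectEntry("client3", "server3"));

        // The cluster ends up believing the client is on Server 3, while the
        // client itself only holds a connection to Server 2.
        System.out.println(apply(log).get("client3")); // prints "server3"
    }
}
```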
Jon Hall
@jhall11
Dec 09 2016 21:06
So is the client still sending keep-alives to server 2, and is server 2 appending them? If not, the session should expire and it should attempt to recover (or close, depending on the recovery strategy).
Thiago Santos
@thiagoss
Dec 09 2016 21:08
3 - Received KeepAliveResponse[status=OK, error=null, leader=/172.17.0.3:9876, members=[/172.17.0.2:9876, /172.17.0.4:9876, /172.17.0.3:9876]]
It actually gets a response from Server 2 (172.17.0.3).
But you might be right, maybe the server shouldn't respond to keep alive from clients that aren't connected to it?
Jon Hall
@jhall11
Dec 09 2016 21:29
@kuujo, Any thoughts on this?
Jon Hall
@jhall11
Dec 09 2016 22:07
It seems like there are two issues:
  1. A server should not respond to keep-alive requests from clients it isn’t connected to (the case where the server doesn’t think a client is connected, but the client does). Responding keeps the client thinking it is still connected when it really isn’t. If we fix this, the system should recover.
  2. This is what led to the situation: when a server receives a connect request from a client, all previous connect requests from the same client should become invalid. I’m not sure how to enforce this, though. We shouldn’t rely on timestamps, but would the log indexes be in the correct order if one of the servers is overutilized?
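One way to picture both points is the toy model below. It is a sketch under a stated assumption, not something Copycat is known to do: each connect request carries a counter that the client increments on every attempt, so ordering does not depend on timestamps or on which server is slower. `SessionRegistrySketch`, `applyConnect`, `acceptKeepAlive`, and the `seq` parameter are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class SessionRegistrySketch {
    // Highest client-assigned connect counter applied so far, per client (assumed field).
    private final Map<String, Long> latestSeq = new HashMap<>();
    // Server currently considered to hold the client's connection.
    private final Map<String, String> owner = new HashMap<>();

    /** Point 2: applies a connect entry unless a newer attempt was already applied. */
    boolean applyConnect(String client, String server, long seq) {
        Long seen = latestSeq.get(client);
        if (seen != null && seq <= seen) {
            return false; // stale request from an earlier attempt, ignore it
        }
        latestSeq.put(client, seq);
        owner.put(client, server);
        return true;
    }

    /** Point 1: only the server that holds the connection acknowledges keep-alives. */
    boolean acceptKeepAlive(String client, String server) {
        return server.equals(owner.get(client));
    }

    public static void main(String[] args) {
        SessionRegistrySketch registry = new SessionRegistrySketch();
        // Attempt 1 went to Server 3 and attempt 2 to Server 2,
        // but the entries are committed in the opposite order.
        System.out.println(registry.applyConnect("client3", "server2", 2)); // prints "true"
        System.out.println(registry.applyConnect("client3", "server3", 1)); // prints "false"
        System.out.println(registry.acceptKeepAlive("client3", "server2")); // prints "true"
        System.out.println(registry.acceptKeepAlive("client3", "server3")); // prints "false"
    }
}
```

With the stale entry ignored, the servers and the client agree that the connection lives on Server 2. Whether a client-side counter fits the real protocol is an open question here.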
Roman Pearah
@neverfox
Dec 09 2016 22:51
@belowm I see. No, DistributedQueue doesn't implement take, which would be similar to what you're looking for, but as far as I know, implementations of blocking queues just loop on isEmpty and then poll the value.
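A take-like future can be approximated on the caller's side by re-polling until a non-null item arrives. The sketch below is a minimal illustration, not part of Atomix: `AsyncTakeSketch`, `take`, and the `delayMillis` parameter are hypothetical, and the only assumption about the queue is the one stated earlier in the chat, that `poll()` returns a future which completes with null when the queue is empty. A local `ConcurrentLinkedQueue` stands in for the distributed queue so the example runs on its own.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class AsyncTakeSketch {
    /** Returns a future that stays pending until poll yields a non-null item or fails. */
    static <T> CompletableFuture<T> take(Supplier<CompletableFuture<T>> poll,
                                         ScheduledExecutorService scheduler,
                                         long delayMillis) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(poll, scheduler, delayMillis, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> poll,
                                    ScheduledExecutorService scheduler,
                                    long delayMillis,
                                    CompletableFuture<T> result) {
        if (result.isDone()) {
            return; // the caller cancelled or completed the future, stop polling
        }
        poll.get().whenComplete((item, error) -> {
            if (error != null) {
                result.completeExceptionally(error);
            } else if (item != null) {
                result.complete(item);
            } else {
                // Queue was empty: try again after a short delay.
                scheduler.schedule(() -> attempt(poll, scheduler, delayMillis, result),
                                   delayMillis, TimeUnit.MILLISECONDS);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        ConcurrentLinkedQueue<String> local = new ConcurrentLinkedQueue<>();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        CompletableFuture<String> pending =
            take(() -> CompletableFuture.completedFuture(local.poll()), scheduler, 10);

        // An item shows up a little later; the pending future then completes.
        scheduler.schedule(() -> { local.add("hello"); }, 50, TimeUnit.MILLISECONDS);

        System.out.println("retrieved " + pending.get(2, TimeUnit.SECONDS)); // prints "retrieved hello"
        scheduler.shutdown();
    }
}
```

With a real queue the call would look like `take(() -> queue.poll(), scheduler, 100)`. This is still periodic polling under the hood, just hidden behind one future, and an item polled right after the caller cancels the future would be dropped.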