These are chat archives for atomix/atomix

23rd Mar 2016
HeroesHannibal
@HeroesHannibal
Mar 23 2016 01:55

Hi! I'm trying out the sample code for leader election and I'm running into an issue with the following callback function:

group.election().onElection(leader -> {
      if (leader.equals(member)) {
        System.out.println("Elected leader!");
      }
    });

Even though the callback function is being called whenever the leader is elected, it's failing the check in the if statement, and I'm wondering why that might be. I noticed that when I printed leader.leader().address() it was null, which doesn't seem right?

Jordan Halterman
@kuujo
Mar 23 2016 01:57
Yeah… the address can be null… that address is actually for direct communication and is configured via the DistributedGroup.Options. If no group Address is configured then you just can’t do direct messaging… but to your real question
so, the onElection callback will be called any time a leader is elected. On exactly one node, the elected leader will be the local member. So you can compare leader.id() with member.id() to determine whether the local member is the leader
HeroesHannibal
@HeroesHannibal
Mar 23 2016 02:09
ah! thank you! Updating the if condition to if (leader.leader().id().equals(member.id())) works!
Jordan Halterman
@kuujo
Mar 23 2016 02:10
Ugh, so it must be a bug in the equals implementation
return object instanceof GroupMember && ((GroupMember) object).memberId.equals(memberId);
Weird
Oooh, it's because the argument was changed to a GroupTerm
So if (term.leader().equals(member)) should work
It's just the example that needs to be updated
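For reference, a minimal sketch of what the updated example would look like with that fix (using the term passed to onElection, as discussed above):

group.election().onElection(term -> {
  // The callback argument is a GroupTerm, not a GroupMember, so compare
  // the term's leader to the local member rather than the term itself.
  if (term.leader().equals(member)) {
    System.out.println("Elected leader!");
  }
});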
HeroesHannibal
@HeroesHannibal
Mar 23 2016 02:20
Ah, I see. Another question! :) So I posted on the GitHub issues page about seeing timeouts, and I'm seeing some kind of timeout frequently now. I've uploaded my log to pastebin since there seems to be a character limit here: http://pastebin.com/4ywtdkBN. Any idea why I may be seeing this frequently?
The log for the other server is:
[copycat-server-/10.144.144.21:6055-copycat] INFO io.atomix.catalyst.transport.NettyServer - Binding to 10.144.144.21/10.144.144.21:6055
[copycat-server-/10.144.144.21:6055-copycat] INFO io.atomix.catalyst.transport.NettyServer - Listening at /10.144.144.21:6055
[copycat-server-/10.144.144.21:6055-copycat] INFO io.atomix.copycat.server.state.ServerContext - 10.144.144.21/10.144.144.21:6055 - Transitioning to FOLLOWER
[copycat-server-/10.144.144.21:6055-copycat] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler002/10.144.144.22:6055
[catalyst-event-loop-1] INFO io.atomix.catalyst.transport.NettyClient - Connected to scheduler002/10.144.144.22:6055
[copycat-server-/10.144.144.21:6055-copycat] INFO io.atomix.copycat.server.state.ServerContext - 10.144.144.21/10.144.144.21:6055 - Found leader scheduler002/10.144.144.22:6055
[copycat-server-/10.144.144.21:6055-copycat] INFO io.atomix.copycat.server.state.ServerContext - 10.144.144.21/10.144.144.21:6055 - Transitioning to INACTIVE
[copycat-server-/10.144.144.21:6055-copycat] INFO io.atomix.copycat.server.state.ServerStateMachine - 10.144.144.21/10.144.144.21:6055 - Taking snapshot 1

I've also noticed that if the leader election happens successfully without any timeouts, and I kill the server using ^C, the other server keeps outputting these messages:

[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to scheduler001/127.0.1.1:6055
[copycat-client-2] INFO io.atomix.catalyst.transport.NettyClient - Connecting to /10.144.144.22:6055

Is there some sort of timeout the inactive server waits for before realizing the leader is down? Can I configure this?

Jordan Halterman
@kuujo
Mar 23 2016 02:24
Hmm
Yeah, that's the election timeout. Decreasing it will increase the frequency of heartbeats from the leader, but it can result in faster convergence after a failure
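If you want to experiment with it, something along these lines should work when building the replica (a rough sketch; the withElectionTimeout/withHeartbeatInterval builder methods and the values are assumptions based on this era of the Atomix/Copycat builders, and localAddress, remoteAddresses, and storage are placeholders for your own setup):

AtomixReplica replica = AtomixReplica.builder(localAddress, remoteAddresses)
  .withTransport(new NettyTransport())
  .withStorage(storage)
  // Followers wait roughly this long without a heartbeat before starting an election.
  .withElectionTimeout(Duration.ofMillis(500))
  // The leader's heartbeat period; keep it well below the election timeout.
  .withHeartbeatInterval(Duration.ofMillis(100))
  .build();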
Jordan Halterman
@kuujo
Mar 23 2016 02:31
one sec

Those servers seem to be timing out because the other servers are unreachable.

[copycat-server-/10.144.144.22:6055-copycat] INFO io.atomix.copycat.server.state.FollowerState - 10.144.144.22/10.144.144.22:6055 - Polling members [ServerMember[type=ACTIVE, status=UNAVAILABLE, serverAddress=scheduler001/10.144.144.21:6055, clientAddress=null]]
[copycat-server-/10.144.144.22:6055-copycat] WARN io.atomix.copycat.server.state.FollowerState - 10.144.144.22/10.144.144.22:6055 - request timed out

/10.144.144.22:6055 is requesting a vote from 10.144.144.21:6055, but 10.144.144.21:6055 is not responding. It seems like neither server is able to communicate with the other

HeroesHannibal
@HeroesHannibal
Mar 23 2016 02:39
Hmm, so it's an issue with the network then?
Does the elected leader need heartbeats from the servers that elected it in order to stay Leader?
Jordan Halterman
@kuujo
Mar 23 2016 02:42
Nope…
When no leader exists, each server will time out at some semi-random time and request votes from all the other servers. The cluster will elect the most up-to-date leader. Once that leader is elected, the leader will send periodic heartbeats to followers. If a follower doesn’t hear from the leader for electionTimeout, the election will start all over again.
You can enable DEBUG logging for much more useful log messages
You’ll see the entire Raft protocol running - sending heartbeats and entries between nodes, election timeouts, etc
HeroesHannibal
@HeroesHannibal
Mar 23 2016 02:50
Hmm, so do you have any ideas what I can do about the instability?
Jordan Halterman
@kuujo
Mar 23 2016 02:51
Do you see that when you run the leader election example?
HeroesHannibal
@HeroesHannibal
Mar 23 2016 02:52
Yeah
Jordan Halterman
@kuujo
Mar 23 2016 02:52
hmm
let me double check on my own machine since I haven’t run that example in a while
HeroesHannibal
@HeroesHannibal
Mar 23 2016 02:55
sounds good
Jordan Halterman
@kuujo
Mar 23 2016 03:00
Still working well for me. What exact commands are you running? I’m running these three commands:
java -jar target/atomix-leader-election.jar logs/server1 localhost:5000 localhost:5001 localhost:5002
java -jar target/atomix-leader-election.jar logs/server2 localhost:5001 localhost:5000 localhost:5002
java -jar target/atomix-leader-election.jar logs/server3 localhost:5002 localhost:5000 localhost:5001
I start three servers, each of which writes to a separate log directory… the first host:port is the server’s host:port and the rest are remote host:ports
are you doing it over a network?
Jordan Halterman
@kuujo
Mar 23 2016 04:02
I wonder if this is a difference between our networks. I'm not having an issue and typically users don't. Would be interesting to figure this one out
Sometimes users start three servers with the same log directory which causes erratic behavior since they're all reading from and writing to the same log, but that doesn't seem like what's happening here
It could be differences in environment. We spent a lot of time working through some of those issues months ago and haven't seen them at all recently. There was a race condition in NettyTransport that, once fixed, resolved all instances of those issues we could find
Joachim De Beule
@joachimdb
Mar 23 2016 12:22
Question: I'm slowly running out of memory on my replicas (not on the clients, it seems). I'm using memory storage on all replicas, but the only thing I do is define a group and a bunch of topics, and I periodically send messages over the topics. (Messages don't always arrive on all group members, but I'm not sure that is relevant?) Could this be related to Atomix? Should I perhaps use disk storage instead?
BTW, I'm frequently seeing the same kind of timeouts as reported by @HeroesHannibal as well..
Richard Pijnenburg
@electrical
Mar 23 2016 15:48
@joachimdb It also stores keep-alive/ping messages, but there should be a cleanup process running periodically
Joachim De Beule
@joachimdb
Mar 23 2016 16:25
Hmm, can I configure the cleanup or something? I see memory increase on all replicas (5) until all memory is eaten, but not on the clients (15), so it seems rather likely that it's related to the replicas somehow, no?
Jordan Halterman
@kuujo
Mar 23 2016 16:27
Yeah... You will see memory grow until the log is compacted if you're using memory storage. Even if you don't write anything, logs will still grow for session information. This is necessary to provide linearizable semantics. Once a segment fills up and the log compaction timer fires, the log compactor will log a message like "Compacting segment x with MINOR compaction." During compaction, the segment will be rewritten to remove all entries that are no longer needed (e.g. all but the last keep-alive for each session, acked tasks, etc.).
This can be controlled by setting maxSegmentSize, minorCompactionInterval, and majorCompactionInterval when creating the Storage object. By default the segment sizes are something big, like 32MB I think
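For example, a rough sketch of what that might look like (builder method names assumed from this era of Copycat; the values are just illustrative):

Storage storage = Storage.builder()
  .withStorageLevel(StorageLevel.MEMORY)               // or StorageLevel.DISK
  .withMaxSegmentSize(1024 * 1024 * 8)                  // smaller segments fill up and get compacted sooner
  .withMinorCompactionInterval(Duration.ofMinutes(1))
  .withMajorCompactionInterval(Duration.ofMinutes(15))
  .build();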
But TBH it has only been tested extensively with DISK storage. Over the last couple weeks I've been running it with something like 10k writes/sec and monitoring the logs. I did fix a bug during that time but I don't remember if it was in the last Copycat release.
Joachim De Beule
@joachimdb
Mar 23 2016 16:30
k, so better use disk storage to be safe then?
Jordan Halterman
@kuujo
Mar 23 2016 16:33
You should check the logs for the compaction message. I don't think there should be any issue with memory storage; they both use the same abstractions. If you don't see a message saying n compaction tasks are running, it may just be a configuration issue
Joachim De Beule
@joachimdb
Mar 23 2016 16:35
k, will look into that then. Thanks!
Jordan Halterman
@kuujo
Mar 23 2016 16:38
I'm not sure what to think about the timeouts yet. It's normal to see them under high load, and I do see them when testing under load as well. The timeouts are purposely set relatively low to allow the cluster to converge more quickly. I'm wondering if the DNS lookup issue that was fixed could be contributing to them.
Going to run some tests right now
Jordan Halterman
@kuujo
Mar 23 2016 16:48
We should add better logging to compaction to say: which segments were selected, how many segments were removed, and how much space was freed
Madan Jampani
@madjam
Mar 23 2016 17:52
Hi people!
Jordan Halterman
@kuujo
Mar 23 2016 17:54
Oh hey @madjam! Was wondering if you were going to join us
Richard Pijnenburg
@electrical
Mar 23 2016 17:54
Heya @madjam
Madan Jampani
@madjam
Mar 23 2016 17:54
Looks like all the action is here :)
Jordan Halterman
@kuujo
Mar 23 2016 17:54
Indeed
Madan Jampani
@madjam
Mar 23 2016 17:54
Good to be here!
Let's talk about an interesting behavior
I suspect it's a bug
I have just seen a ServerSession go through the following state transition: OPEN -> UNSTABLE -> EXPIRED -> UNSTABLE -> OPEN
Does that seem odd?
The setup is a 3-node cluster where each node runs a client/server combo (basically a replica), and I unceremoniously restart 2 of the nodes. The above behavior is in the node/client that was not killed.
Madan Jampani
@madjam
Mar 23 2016 18:02
The reason an expired session sticks around is that there are server commits that hold references to it. Later, the state machine suspects the expired session, and that changes the state to UNSTABLE. So @kuujo, what I did not understand was the reason for this reference tracking.
Oops! Did I lose everyone :worried:
Jordan Halterman
@kuujo
Mar 23 2016 18:08
Sorry I'm here... Simultaneously scheduling meetings :-)
Jordan Halterman
@kuujo
Mar 23 2016 18:14
So... the session states are a bit less meaningful on the server side. But a session shouldn't go to EXPIRED and then back to UNSTABLE; that does seem like a potential bug. In the state machine, once the session timeout has elapsed without a keep-alive, state machines will mark the session as UNSTABLE. But only the leader can expire a session, by committing an UnregisterEntry. This is necessary because incremental log compaction can leave gaps in time in the log. Once a server removes a bunch of keep-alives, it can think a session expired when it wasn't... those entries were just compacted. But since the leader has the most up-to-date log, it's up to the leader to say when the session is actually expired by committing that entry. That will transition the ServerSession state to EXPIRED, from which it should never change
Madan Jampani
@madjam
Mar 23 2016 18:19
So only the leader having the authority to expire sessions makes good sense, and that is what is happening. However, the session continues to stick around in ServerSessionManager because there are active references to the ServerSession from various ServerCommit entries. That prevents it from purging the session state. Later, when another command is processed, the leader checks to see if any sessions need to be suspected (purely based on last keep-alive times); this logic loops through all sessions (including the expired ones that were not purged) and, if a session is inactive, simply marks it as UNSTABLE.
Jordan Halterman
@kuujo
Mar 23 2016 18:51
Ahh gotcha. Makes sense
Let me poke in the code and see if I see this
It just looks like the session is never being removed from ServerSessionManager? I don't see where that's happening, but I need to stop looking at code on my phone so much :-P
Jordan Halterman
@kuujo
Mar 23 2016 18:56
It should be removed. It's fine for Commit to hold a reference to an expired session. State machines that expect to be able to communicate through the session should monitor expiration by implementing SessionListener
But I don't think it should be in ServerSessionManager after expiration
Hmm actually let me think about this...
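On the SessionListener point, a rough sketch of what that monitoring might look like in a state machine (assuming the register/unregister/expire/close callbacks of this Copycat version):

public class MyStateMachine extends StateMachine implements SessionListener {
  @Override
  public void register(ServerSession session) { /* session opened */ }

  @Override
  public void unregister(ServerSession session) { /* session closed by the client */ }

  @Override
  public void expire(ServerSession session) {
    // The leader has expired the session; stop publishing events to it.
  }

  @Override
  public void close(ServerSession session) { /* session removed */ }
}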
Jordan Halterman
@kuujo
Mar 23 2016 18:58
Ahh thanks
So I see the problem
Jordan Halterman
@kuujo
Mar 23 2016 19:12
unregisterSession just needs to be outside the if statement. The if statement checks whether all entries related to the session have been released by the state machine. This was a bug I fixed a while back: the entry that created the session can't be removed before all the entries committed by the session are, to ensure linearizable semantics can be maintained with the in-memory session state.
Long story short we should move the unregisterSession call outside the if statement
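A hypothetical sketch of the shape of that fix (illustrative names, not the actual Copycat source):

private void onSessionClose() {
  // Always remove the session from the session manager...
  sessions.unregisterSession(id);
  // ...but only release the register/unregister entries once no commits
  // still reference the session, so log replay stays consistent.
  if (references == 0) {
    registerCommit.close();
    unregisterCommit.close();
  }
}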
Madan Jampani
@madjam
Mar 23 2016 19:14
Gotcha. That was exactly what I had in mind as well. Good to get a confirmation :)
Is there still any reason for the if (references == 0) check?
Madan Jampani
@madjam
Mar 23 2016 19:23
On second thought, that is needed... so that everything remains consistent on log replay
Jordan Halterman
@kuujo
Mar 23 2016 20:58
Right
So references obviously refers to the number of Commits for that session that haven't been closed by the state machine. The references == 0 check is done on close to check that no more commits are open, so the register/unregister commits can also be released. Each time a Commit is closed it does those same checks. Once the session is closed and all of its commits are closed, those two entries will be released. That ensures that, on replay of the state machine, the session will always be created as long as there are entries relevant to the session still present in the log, so the ServerSession can properly track the command sequence number and ensure the same command is not applied twice, either initially or on replay after compaction
Jordan Halterman
@kuujo
Mar 23 2016 21:06
This actually reminds me that part of the algorithm is not documented in the internals docs yet
This is a good find :-)
Madan Jampani
@madjam
Mar 23 2016 21:11
Cool. That makes good sense. I see that ServerSessionContext::release does the appropriate clean up when the last commit (referencing the session) is cleaned.
If you don't mind I want to jump from server to client session tracking :)
Madan Jampani
@madjam
Mar 23 2016 21:17
What is the reason for this schedule call upfront? The problem I noticed is that when the server responds with an UNKNOWN_SESSION_ERROR we do not cancel the scheduled keep-alive task. We therefore keep sending keep-alives, and the server keeps responding with UNKNOWN_SESSION_ERROR
Jordan Halterman
@kuujo
Mar 23 2016 21:21
Ahh I see what you’re saying… yeah I don’t think that needs to be there.
the UNKNOWN_SESSION_ERROR should cause it to be cancelled, but might as well not schedule it in the first place
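A hypothetical sketch of that client-side behavior (illustrative names, not the actual Copycat client code): schedule the next keep-alive only after a successful response, and stop when the session is unknown:

private void keepAlive() {
  connection.sendKeepAlive().whenComplete((response, error) -> {
    if (error == null && response.isOk()) {
      // Schedule the next keep-alive only after this one succeeds.
      keepAliveTimer = context.schedule(keepAliveInterval, this::keepAlive);
    } else {
      // UNKNOWN_SESSION_ERROR or another failure: don't reschedule;
      // surface the expired session instead of retrying forever.
      onSessionExpired();
    }
  });
}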
Jordan Halterman
@kuujo
Mar 23 2016 22:38
@madjam want me to submit those PRs?
Madan Jampani
@madjam
Mar 23 2016 22:54
@kuujo: I just did.
atomix/copycat#198
Jordan Halterman
@kuujo
Mar 23 2016 22:54
excellent
Madan Jampani
@madjam
Mar 23 2016 22:59
Even with these fixes, I am still seeing some unexpected behavior on random node restarts. I strongly suspect that even those are related to session life cycle management. I will post updates when I find something concrete for you to look at.
Jordan Halterman
@kuujo
Mar 23 2016 22:59
excellent
Jordan Halterman
@kuujo
Mar 23 2016 23:22
Really interested in what else is going on