These are chat archives for akkadotnet/akka.net

19th
Jun 2017
Shikha Madaan
@smadaan2
Jun 19 2017 01:25
I want to modify the Date response header that spray generates, but I can't: my attempt adds a new Date header instead of replacing the existing one, so I end up with a duplicate Date response header.
 responseHeader(`Date`, new DateTime()) {
  complete()
}
Stephen Newman
@goodisontoffee
Jun 19 2017 08:25
@Aaronontheweb Yeah, I took a look at the JVM akka source and saw that those were placed in the system guardian's hierarchy - I shall make some changes to my tests to reflect :)
Marc Piechura
@marcpiechura
Jun 19 2017 08:30
@smadaan2 I think you're in the wrong room, this one is for Akka on .Net not JVM ;-)
chipdice
@chipdice
Jun 19 2017 14:18
@Horusiath - The only issue I'm having so far is that when I publish my first message, it gets lost unless I put a sleep in after I have received a MemberUp for that node. I'm unclear as to what I can use as a trigger to verify that I am fully connected and can publish that first message. As for the question, my first PubSub will not carry a lot of messages, but I am setting up a lot of topics (approx. 800). I'm also looking at a project that will have a lot more volume and about the same number of topics. I'm collecting the metrics for that now. For that, I need to know what sort of volume it can handle.
Aaron Stannard
@Aaronontheweb
Jun 19 2017 14:50
got to the bottom of that node stuck joining issue yesterday
was able to use the Petabridge.Cmd log tailing feature to capture the output
and see what's going wrong
[DEBUG][6/18/2017 5:42:22 PM][Thread 0003][[akka://webcrawler/system/cluster/core/daemon#1695848106]] [Initialized] Received GossipStatus(from=UniqueAddress: (akka.tcp://webcrawler@127.0.0.1:16666, 927344630), version=VectorClock(akka.tcp://webcrawler@127.0.0.1:4053-1233580835->9))
that's what a normal piece of gossip typically looks like in a recently started cluster that has had a few nodes join
the only node who appears in the vector clock is the leader
and there's been 9 generations of gossip thus far, mostly from nodes joining the cluster
the only thing that increments the vector clock is a change in the cluster's state
otherwise there's no need to update it
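(Editor's sketch, not Akka.NET code: a minimal Python model of the vector clock behaviour described above. It assumes a clock is a plain mapping from node address to counter; the helper names `bump` and `compare` are hypothetical and chosen only for illustration.)

```python
from typing import Dict

# Assumption: a vector clock is modelled as {node address: counter}.
VectorClock = Dict[str, int]


def bump(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter incremented by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify `a` relative to `b`: 'before', 'after', 'same' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_behind = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    b_behind = any(b.get(n, 0) < a.get(n, 0) for n in nodes)
    if a_behind and b_behind:
        return "concurrent"
    if a_behind:
        return "before"
    if b_behind:
        return "after"
    return "same"


leader = "akka.tcp://webcrawler@127.0.0.1:4053"

# Nine cluster state changes, all recorded by the leader, as in the log line above.
clock: VectorClock = {}
for _ in range(9):
    clock = bump(clock, leader)

# A clock that has seen one fewer state change is strictly older.
older = {leader: 8}
relation = compare(older, clock)
```

With only the leader in the clock, every comparison is a simple "before" or "after", which is why gossip in a quiet cluster converges without merges.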
so, the key thing that causes the node stuck joining issue is network instability... this is because the vector clock takes on a very different shape when an Unreachable event fires
[DEBUG][6/18/2017 4:51:59 PM][Thread 0029][[akka://webcrawler/system/cluster/core/daemon#589039921]] "Couldn't establish a causal relationship between "remote" gossip and "local" gossip - Remote[Gossip(members = [Member(address = akka.tcp://webcrawler@127.0.0.1:4053, status = Up, role=[lighthouse], upNumber=1), Member(address = akka.tcp://webcrawler@127.0.0.1:16666, status = Up, role=[web], upNumber=3), Member(address = akka.tcp://webcrawler@127.0.0.1:51251, status = Up, role=[tracker], upNumber=2), Member(address = akka.tcp://webcrawler@127.0.0.1:51279, status = Up, role=[crawler], upNumber=4), Member(address = akka.tcp://webcrawler@127.0.0.1:51515, status = Up, role=[crawler], upNumber=5), Member(address = akka.tcp://webcrawler@127.0.0.1:51522, status = Up, role=[crawler], upNumber=6), Member(address = akka.tcp://webcrawler@127.0.0.1:51528, status = Up, role=[crawler], upNumber=7)], overview = GossipOverview(seen=[UniqueAddress: (akka.tcp://webcrawler@127.0.0.1:51251, 1406624544)], reachability=Reachability([akka.tcp://webcrawler@127.0.0.1:51251 -> UniqueAddress: (akka.tcp://webcrawler@127.0.0.1:51522, 2035728246): Unreachable [Unreachable] (3)])), version = VectorClock(akka.tcp://webcrawler@127.0.0.1:16666-992966279->2, akka.tcp://webcrawler@127.0.0.1:4053-158093790->30, akka.tcp://webcrawler@127.0.0.1:51251-1406624544->3, akka.tcp://webcrawler@127.0.0.1:51279-700901421->2, akka.tcp://webcrawler@127.0.0.1:51290-1038231532->1)] - Local[Gossip(members = [Member(address = akka.tcp://webcrawler@127.0.0.1:4053, status = Up, role=[lighthouse], upNumber=1), Member(address = akka.tcp://webcrawler@127.0.0.1:16666, status = Up, role=[web], upNumber=3), Member(address = akka.tcp://webcrawler@127.0.0.1:51251, status = Up, role=[tracker], upNumber=2), Member(address = akka.tcp://webcrawler@127.0.0.1:51279, status = Up, role=[crawler], upNumber=4), Member(address = akka.tcp://webcrawler@127.0.0.1:51515, status = Up, role=[crawler], upNumber=5), Member(address = 
akka.tcp://webcrawler@127.0.0.1:51522, status = Up, role=[crawler], upNumber=6), Member(address = akka.tcp://webcrawler@127.0.0.1:51528, status = Up, role=[crawler], upNumber=7)], overview = GossipOverview(seen=[UniqueAddress: (akka.tcp://webcrawler@127.0.0.1:4053, 158093790)], reachability=Reachability([akka.tcp://webcrawler@127.0.0.1:4053 -> UniqueAddress: (akka.tcp://webcrawler@127.0.0.1:51522, 2035728246): Unreachable [Unreachable] (3)])), version = VectorClock(akka.tcp://webcrawler@127.0.0.1:16666-992966279->2, akka.tcp://webcrawler@127.0.0.1:4053-158093790->31, akka.tcp://webcrawler@127.0.0.1:51251-1406624544->2, akka.tcp://webcrawler@127.0.0.1:51279-700901421->2, akka.tcp://webcrawler@127.0.0.1:51290-1038231532->1)] - merged them into [Gossip(members = [Member(address = akka.tcp://webcrawler@127.0.0.1:4053, status = Up, role=[lighthouse], upNumber=1), Member(address = akka.tcp://webcrawler@127.0.0.1:16666, status = Up, role=[web], upNumber=3), Member(address = akka.tcp://webcrawler@127.0.0.1:51251, status = Up, role=[tracker], upNumber=2), Member(address = akka.tcp://webcrawler@127.0.0.1:51279, status = Up, role=[crawler], upNumber=4), Member(address = akka.tcp://webcrawler@127.0.0.1:51515, status = Up, role=[crawler], upNumber=5), Member(address = akka.tcp://webcrawler@127.0.0.1:51522, status = Up, role=[crawler], upNumber=6), Member(address = akka.tcp://webcrawler@127.0.0.1:51528, status = Up, role=[crawler], upNumber=7)], overview = GossipOverview(seen=[], reachability=Reachability([akka.tcp://webcrawler@127.0.0.1:4053 -> UniqueAddress: (akka.tcp://webcrawler@127.0.0.1:51522, 2035728246): Unreachable [Unreachable] (3)][akka.tcp://webcrawler@127.0.0.1:51251 -> UniqueAddress: (akka.tcp://webcrawler@127.0.0.1:51522, 2035728246): Unreachable [Unreachable] (3)])), version = VectorClock(akka.tcp://webcrawler@127.0.0.1:16666-992966279->2, akka.tcp://webcrawler@127.0.0.1:4053-158093790->31, akka.tcp://webcrawler@127.0.0.1:51251-1406624544->3, 
akka.tcp://webcrawler@127.0.0.1:51279-700901421->2, akka.tcp://webcrawler@127.0.0.1:51290-1038231532->1)]"
apologies for the WALL-O-TEXT
this is the leader of the cluster seeing conflicting pieces of gossip, which have to be resolved via merge
this is because each node reports individually that it can't contact the node I removed from the cluster
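(Editor's sketch, not Akka.NET code: the conflict in the log above, reduced to the vector clocks alone. Node addresses are shortened to their port numbers, the counters are copied from the Remote and Local gossip in the log, and the `compare` and `merge` helpers are hypothetical. The merge shown is the usual pointwise maximum, which reproduces the version in the merged gossip.)

```python
from typing import Dict

VectorClock = Dict[str, int]


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify `a` relative to `b`: 'before', 'after', 'same' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_behind = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    b_behind = any(b.get(n, 0) < a.get(n, 0) for n in nodes)
    if a_behind and b_behind:
        return "concurrent"
    if a_behind:
        return "before"
    if b_behind:
        return "after"
    return "same"


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Pointwise maximum of two clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


# Counters taken from the log; keys are the ports of the node addresses.
remote = {"16666": 2, "4053": 30, "51251": 3, "51279": 2, "51290": 1}
local = {"16666": 2, "4053": 31, "51251": 2, "51279": 2, "51290": 1}

# Remote is ahead on 51251 and local is ahead on 4053, so neither
# clock happened before the other and the gossip has to be merged.
relation = compare(remote, local)
merged = merge(remote, local)
```

Here `relation` is "concurrent", and `merged` carries 4053 at 31 and 51251 at 3, matching the version in the "merged them into" part of the log.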
Aaron Stannard
@Aaronontheweb
Jun 19 2017 14:56
[DEBUG][6/18/2017 4:52:03 PM][Thread 0036][[akka://webcrawler/system/cluster/core/daemon#589039921]] [Initialized] Received GossipStatus(from=UniqueAddress: (akka.tcp://webcrawler@127.0.0.1:16666, 992966279), version=VectorClock(akka.tcp://webcrawler@127.0.0.1:16666-992966279->4, akka.tcp://webcrawler@127.0.0.1:4053-158093790->31, akka.tcp://webcrawler@127.0.0.1:51251-1406624544->4, akka.tcp://webcrawler@127.0.0.1:51279-700901421->4, akka.tcp://webcrawler@127.0.0.1:51290-1038231532->1, akka.tcp://webcrawler@127.0.0.1:51515-1026537112->2))
the vector clock now includes version data for all nodes
instead of just the leader
at some point in the future, if I remove one or down one of these nodes who appear in the vector clock... guess what
their version data is still contained inside the vector clock... it doesn't get pruned properly
this causes a false positive that makes the older vector clock appear newer than the correct status
hence why nodes can get stuck joining
and I've also been able to verify that behavior via a model-based test on the VectorClock class itself
the fix that is needed is reviewing the code we added in https://github.com/akkadotnet/akka.net/pull/2107/ as part of 1.1
where the pruning is supposed to occur
rebooting the leader is effective as a workaround right now because that effectively purges the vector clock
and discards the old entities
which allows the versions to be compared correctly in the future
so anyway, this should be a simple fix - going to take a look at the pruning code and see what needs fixing there
but this is definitely the issue
the MBT confirmed it
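(Editor's sketch, not the Akka.NET implementation or the fix under discussion: one way a stale entry can distort a vector clock comparison. The node names, counters, and the `prune` and `compare` helpers are all hypothetical; the point is only that an entry left behind for a removed node keeps an older clock from being recognised as older.)

```python
from typing import Dict

VectorClock = Dict[str, int]


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify `a` relative to `b`: 'before', 'after', 'same' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_behind = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    b_behind = any(b.get(n, 0) < a.get(n, 0) for n in nodes)
    if a_behind and b_behind:
        return "concurrent"
    if a_behind:
        return "before"
    if b_behind:
        return "after"
    return "same"


def prune(clock: VectorClock, removed_node: str) -> VectorClock:
    """Return a copy of `clock` without the entry for `removed_node`."""
    return {n: c for n, c in clock.items() if n != removed_node}


# Hypothetical clocks: `stale` still carries an entry for a node that has
# left the cluster; `fresh` is newer gossip that no longer mentions it.
stale = {"leader": 31, "removed-node": 4}
fresh = {"leader": 32}

# The leftover entry makes the older clock look ahead on one node,
# so it is not classified as strictly older.
unpruned_relation = compare(stale, fresh)

# Once the removed node is pruned, the ordering is unambiguous.
pruned_relation = compare(prune(stale, "removed-node"), fresh)
```

In this toy case `unpruned_relation` is "concurrent" while `pruned_relation` is "before": the old clock is only recognised as old once the removed node's entry is gone.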
Aaron Stannard
@Aaronontheweb
Jun 19 2017 15:03
provided that I don't get sucked back into accounting / administrative hell again today, I'm hoping to have a fix tonight
Jose Carlos Marquez
@oeaoaueaa
Jun 19 2017 15:20
that's great! looking forward to testing it
Alex Gibson
@crucifieddreams
Jun 19 2017 15:31
@Aaronontheweb This is good news :) look forward to testing out a fix