Ricky Blankenaufulland
@ZoolWay
@crucifieddreams If you list the cluster members from another node (I made myself some kind of monitoring and admin tool), is the exiting node really removed from the cluster? I get most problems like that when the original node was not really removed; in that case I manually down it.
Maxim Cherednik
@maxcherednik
if the node restarts, it should kick out the old one
Alex Gibson
@crucifieddreams
When a node gets into this state it doesn't leave cleanly. It tries to. I have a monitor running in all service discovery nodes (2 of them). They both report the cluster status that they see. When this problem happens the cluster status is everything UP and everything Seen. The leader gets the request that the node is exiting, and it logs every second that it is moving the node to exiting, but the node never exits. After 15 seconds my Windows service will kill the service and failure detection will kick in. I manually down the node, although just starting it again causes the cluster to see the new node and remove the old one. The cluster monitors both report the node is removed. The node rejoins and gets stuck.
Ricky Blankenaufulland
@ZoolWay
Does the Windows service wait until the exit has completed before it shuts itself down? I had lots of problems without graceful shutdown; I even put some examples up on GitHub showing how I got it working. This is the code for a Windows service using TopShelf: https://github.com/ZoolWay/akka-net-cluster-graceful-shutdown-samples/blob/master/TopShelfNode3/Worker.cs
The critical part is to delay process exit until the member is really removed. @crucifieddreams
Otherwise Windows will restart the service immediately, which is too early.
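
A minimal, language-neutral sketch of the "delay process exit until removal is confirmed" pattern, written in Python for brevity. It is not the Akka.NET API and not the code from the linked sample: `ShutdownCoordinator`, `leave_cluster`, and `on_member_removed` are hypothetical names standing in for whatever leave call and member-removed callback the cluster library provides.

```python
import threading


class ShutdownCoordinator:
    """Blocks process exit until the cluster confirms this member was removed."""

    def __init__(self, leave_cluster, timeout_seconds=None):
        # leave_cluster: hypothetical callable that asks the cluster to remove this node.
        self._leave_cluster = leave_cluster
        # None means wait indefinitely; a number bounds the wait in seconds.
        self._timeout = timeout_seconds
        self._removed = threading.Event()

    def on_member_removed(self):
        # Wire this to the cluster's "member removed" notification (assumed to exist).
        self._removed.set()

    def stop(self):
        """Request a leave, then wait. Returns True only if removal was confirmed."""
        self._leave_cluster()
        return self._removed.wait(self._timeout)
```

The service's stop handler would call `stop()` and only let the process exit once it returns. A `False` result means the wait timed out before removal was confirmed, which is the case where an immediate restart would come too early.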
Alex Gibson
@crucifieddreams
That's a good call we have similar code
But it times out after 15 seconds
Thanks :)
Maxim Cherednik
@maxcherednik
It's not that fast - sometimes it takes time.
Put more just to see how it goes.
the cluster will move it down only if it operates well
Alex Gibson
@crucifieddreams
I will remove the timeout from the
Maxim Cherednik
@maxcherednik
if there are other nodes missing - you will get a timeout
Alex Gibson
@crucifieddreams
Manual reset event we have there
And see how that goes
Maxim Cherednik
@maxcherednik
another thing with this approach - if node never joined the cluster and you try to stop it - it will get stuck forever here
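
One way to picture that edge case, as a hedged Python sketch rather than Akka.NET code: if the node never joined, no removal notification can arrive, so an unbounded wait never returns. The function name `stop_node`, the `has_joined` flag, and the returned strings are all hypothetical.

```python
import threading


def stop_node(has_joined, leave_cluster, removed, timeout_seconds=None):
    """Return 'never-joined', 'removed', or 'timed-out' (illustrative labels only)."""
    if not has_joined:
        # No membership exists, so waiting for a removal event would block forever.
        return "never-joined"
    leave_cluster()
    # removed is a threading.Event set by the (assumed) member-removed callback.
    return "removed" if removed.wait(timeout_seconds) else "timed-out"
```

Checking membership before waiting avoids the hang, and a finite `timeout_seconds` is a second safety net for the case where the removal event is lost.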
Bartosz Sypytkowski
@Horusiath

@maxcherednik

if node never joined the cluster and you try to stop it - it will get stuck forever here

This sounds like a design issue /cc @Aaronontheweb

Maxim Cherednik
@maxcherednik
yep - i just didn't have time to report :)
Alex Gibson
@crucifieddreams
Interesting, that is a useful piece of information. I didn't realise that.
:) thanks
Maxim Cherednik
@maxcherednik
btw Alex, just create an empty cluster without any logic and try to play around with all those edge cases
it helped me a lot
and 1.1.3 is way cleaner in terms of logging.
Alex Gibson
@crucifieddreams
It might be worth upgrading while I am making changes. Thanks for all your help folks.
Aaron Stannard
@Aaronontheweb
@crucifieddreams @maxcherednik akkadotnet/akka.net#2347 would that fix it?
sounds like that's what you need
Maxim Cherednik
@maxcherednik
Maybe, but I am not sure :)
Alex Gibson
@crucifieddreams
Looks promising, certainly it was an issue I wasn't aware of. I am just running some tests to see if that is the problem I am seeing.
Aaron Stannard
@Aaronontheweb
In this state it doesn't leave cleanly. It tries to. I have a monitor running in all service discovery nodes (2 of them). They both report the cluster status that they see. When this problem happens the cluster status is everything UP and everything Seen. The leader gets the request that the node is exiting, and it logs every second that it is moving the node to exiting, but the node never exits.
whoops
there we go
so I've suspected that we have an issue with MemberRemoved not firing correctly
I've not been sure under what circumstances this occurs
no idea if that report gets shown to guests or not
but either way, this is the flaky test report for
ClusterSpec.A_cluster_must_complete_LeaveAsync_task_upon_being_removed
that information you just mentioned is very helpful. Confirms for me that this is a bug.
that under some circumstances, the MemberRemoved event is not received or processed correctly
if you have some logs from that situation you described, that would be helpful
Aaron Stannard
@Aaronontheweb
opened an issue, #2492
Alex Gibson
@crucifieddreams
I'll gather up some logs of what we see and post them on the issue log. Thanks!
Thomas Tomanek
@thomastomanek
#2491 has been opened btw
Chris Ochs
@gamemachine
so back to trying to debug why DistributedPubSub isn't working for me. basically after some time period publish just stops working. I enabled DEBUG logging and for a while I see 'Received Akka.Cluster.GossipStatus' messages, and then it just stops after some time, and that's when publish stops working also
Chris Ochs
@gamemachine
so more testing it looks like the connection is coincidental
Chris Ochs
@gamemachine
so it looks like possibly some bad logic in pruning. If I bump up pub-sub.removed-time-to-live the issue seems to go away. So what I was seeing is that pubsub worked until I unsubscribed from a topic, and that seemed to start a countdown where at the end, I couldn't publish to any topic. It's like the topic just isn't there anymore.
but you also can't resub to it either, it's like when it gets pruned it's then in a bad state where it's just not functional at all anymore
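
For reference, a sketch of how that setting could be raised in HOCON. The value is an arbitrary example, not a recommendation, and the full path `akka.cluster.pub-sub` is assumed from the setting name mentioned above.

```hocon
akka.cluster.pub-sub {
  # Removed entries are pruned after this duration.
  # 600s is an illustrative value chosen only to make pruning happen later.
  removed-time-to-live = 600s
}
```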
Aaron Stannard
@Aaronontheweb
@gamemachine would you mind capturing this in an issue?
can take a look at the pruning logic and see what's up there
but outlining the steps you took to produce the error and the behavior you've observed would be valuable to capture
Chris Ochs
@gamemachine
not at all, but given it seems time based it's difficult to debug without running from source.