Ricky Blankenaufulland
@ZoolWay
I cannot use the generic overload as the type is determined from string configuration and I only have it in a variable of type Type. But I found another Props.Create() overload helpful, and it fixes the logger:
Props props = Props.Create(actorType, para1, para2);
var actor = Context.ActorOf(props);
But thanks for your input!
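For reference, the non-generic overload here is Props.Create(Type type, params object[] args), which is what makes this work when the actor type is only known at runtime. A minimal sketch, with the type name and constructor arguments made up:

// Resolve the actor type from configuration at runtime (assembly-qualified name is just an example).
Type actorType = Type.GetType("MyApp.Actors.MyConfiguredActor, MyApp");

// Non-generic overload: Props.Create(Type type, params object[] args)
Props props = Props.Create(actorType, "some-setting", 42);
IActorRef actor = Context.ActorOf(props, "my-configured-actor");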
Arjen Smits
@Danthar
Ah, well, I was about to suggest reflection binding and such to generate a typed delegate, as in my example, which you could then execute.
But I forgot there is a simpler API :P
Ricky Blankenaufulland
@ZoolWay
:D
@crucifieddreams Restarting of that node in your case means the node leaves the cluster gracefully, the process terminates, and a new process starts?
Alex Gibson
@crucifieddreams
Yes exactly :), it's a restart of a Windows service with code for a graceful exit. When a node is about to get into this state the leader sees the exit (it logs this every second) but never carries out the exit process. The node restarts, the leader removes the old incarnation, and then the new node won't rejoin (the leader tries to bring it up and logs this fact every second). Other nodes join happily even while the rejoining node is stuck at Joining, which is confusing as the behaviour seems like a convergence problem.
Maxim Cherednik
@maxcherednik
@crucifieddreams ports are static?
Alex Gibson
@crucifieddreams
Yes, all configured statically in HOCON. We set up 53500-53520 as our port range and each node uses a different port in that range. Ten nodes live on a single server and the other nodes are split across two other servers. The nodes don't share ports; each has its own allocated port, even when running on another server.
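For context, the static port is just the transport port in the remoting section of the HOCON. A rough sketch of one node's config on the 1.1.x line (system name, hostname and seed-node address are placeholders):

akka {
  actor.provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
  remote.helios.tcp {
    hostname = "10.0.0.1"   # placeholder host
    port = 53500            # one fixed port per node from the 53500-53520 range
  }
  cluster {
    seed-nodes = ["akka.tcp://MySystem@10.0.0.1:53500"]   # placeholder system name and address
  }
}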
Maxim Cherednik
@maxcherednik
Usually a node gets stuck in the Joining state if the cluster does not have convergence. Are you sure that while that node was stuck in Joining, it was the only node that was out?
Lutando Ngqakaza
@Lutando

I am getting some odd logs when I schedule a message using Quartz; this is what the logger says to me:

[DEBUG][2017/01/31 12:28:14 PM][Thread 0037][akka://MySystem/user/my-system/my-coordinator/5kcfZkKW0ku4Uk-A6j8MFA/MPp3gd5y8EK1m-8snEuZZA] Unhandled message from akka://MySystem/user/quartz : DEFAULT.f6bdcd16-9950-41d1-894a-9453368679d2 with trigger DEFAULT.d3e56bf7-2c8d-48a6-bf3e-86a6646924d9/MPp3gd5y8EK1m-8snEuZZA has been created.
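The log line looks like a confirmation reply from the scheduler actor arriving at an actor that has no handler for it. A rough sketch of explicitly catching such replies in a ReceiveActor; the actor and message names here are made up:

using Akka.Actor;
using Akka.Event;

public class DoWork { } // hypothetical application message

public class MyCoordinator : ReceiveActor
{
    public MyCoordinator()
    {
        Receive<DoWork>(msg =>
        {
            // real work goes here
        });

        // Catch-all: log and drop anything else, e.g. the scheduler's "has been created" confirmation,
        // so it no longer shows up as "Unhandled message" in the debug log.
        ReceiveAny(msg => Context.GetLogger().Debug("Ignoring {0} from {1}", msg, Sender));
    }
}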

Ricky Blankenaufulland
@ZoolWay
@crucifieddreams If you list the cluster members from another node (I made myself some kind of monitoring and admin tool), is the exiting node really removed from the cluster? I get most problems like that when the original was not really removed; I manually down it in that case.
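The kind of check being described can be done from any running node via the cluster extension. A rough sketch, assuming system is the local ActorSystem and the downed address is made up:

var cluster = Akka.Cluster.Cluster.Get(system);

// Inspect what this node currently believes the membership looks like.
foreach (var member in cluster.State.Members)
    Console.WriteLine($"{member.Address} -> {member.Status}");

// If an old incarnation is still listed, down it manually.
cluster.Down(Address.Parse("akka.tcp://MySystem@10.0.0.5:53501"));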
Maxim Cherednik
@maxcherednik
if the node restarts, it should kick out the old one
Alex Gibson
@crucifieddreams
When a node gets into this state it doesn't leave cleanly. It tries to. I have a monitor running in both service discovery nodes (2 of them), and they both report the cluster status that they see. When this problem happens the cluster status is everything Up and everything Seen. The leader gets the request that the node is exiting, and it logs every second that it is moving the node to Exiting, but it never exits. After 15 seconds my Windows service will kill the process and failure detection will kick in. I manually down the node, although just starting it again causes the cluster to see the new node and remove the old one. Both cluster monitors report that the node is removed. The node rejoins and gets stuck.
Ricky Blankenaufulland
@ZoolWay
Does the Windows service wait until the exit has completed before it shuts itself down? I had lots of problems without graceful shutdown; I even put some examples up on GitHub showing how I got it working. This is the code for a Windows service using TopShelf: https://github.com/ZoolWay/akka-net-cluster-graceful-shutdown-samples/blob/master/TopShelfNode3/Worker.cs
The critical part is to wait with the process exit until the member is really removed. @crucifieddreams
Windows will restart it immediately otherwise, which is too early.
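The pattern being described boils down to blocking the service's stop callback until the cluster confirms removal. A rough sketch of the idea (not the exact code from the linked repo), assuming system is the node's ActorSystem:

var cluster = Akka.Cluster.Cluster.Get(system);
var removed = new ManualResetEventSlim(false);

// Fires once the leader has moved this member to Removed.
cluster.RegisterOnMemberRemoved(() => removed.Set());

// Ask to leave: Leaving -> Exiting -> Removed.
cluster.Leave(cluster.SelfAddress);

// Hold the Windows service's Stop() until removal is confirmed
// (the 30-second cap here is an arbitrary safety net, not taken from the sample).
removed.Wait(TimeSpan.FromSeconds(30));
system.Terminate().Wait();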
Alex Gibson
@crucifieddreams
That's a good call, we have similar code
But it times out after 15 seconds
Thanks :)
Maxim Cherednik
@maxcherednik
It's not that fast - sometimes it takes time.
Put in a longer timeout just to see how it goes.
The cluster will only move it down if it is operating well.
Alex Gibson
@crucifieddreams
I will remove the timeout from the
Maxim Cherednik
@maxcherednik
if there are other nodes missing - you will get a timeout
Alex Gibson
@crucifieddreams
Manual reset event we have there
And see how that goes
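In terms of the reset-event sketch above, that just means waiting without an upper bound instead of giving up after 15 seconds:

// Instead of bailing out early...
// removed.Wait(TimeSpan.FromSeconds(15));

// ...block until the MemberRemoved callback has actually fired.
removed.Wait();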
Maxim Cherednik
@maxcherednik
Another thing with this approach: if the node never joined the cluster and you try to stop it, it will get stuck forever here.
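One way to avoid hanging forever when the node never actually joined is to cap the wait yourself rather than trusting the leave task to complete. A rough sketch using plain Task composition; the 30-second guard and the helper name are assumptions:

// using Akka.Actor; using System; using System.Threading.Tasks;
static async Task ShutdownAsync(ActorSystem system)
{
    var cluster = Akka.Cluster.Cluster.Get(system);

    var leave = cluster.LeaveAsync(); // completes once this member reaches Removed
    var finished = await Task.WhenAny(leave, Task.Delay(TimeSpan.FromSeconds(30)));

    if (finished != leave)
    {
        // MemberRemoved never arrived - for example, the node never joined in the first place.
        // Tear the system down anyway instead of blocking forever.
    }

    await system.Terminate();
}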
Bartosz Sypytkowski
@Horusiath

@maxcherednik
"if node never joined the cluster and you try to stop it - it will get stuck forever here"
This sounds like a design issue /cc @Aaronontheweb
Maxim Cherednik
@maxcherednik
Yep - I just didn't have time to report it :)
Alex Gibson
@crucifieddreams
Interesting, that is a useful piece of information; I didn't realise that.
:) thanks
Maxim Cherednik
@maxcherednik
btw Alex, just create an empty cluster without any logic and try to play around with all those edge cases
it helped me a lot
and 1.1.3 is way cleaner in terms of logging.
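Something like the following is enough to reproduce the join/leave edge cases without any application logic: a bare console node with only the cluster plumbing, so every log line is about membership. System name, hostname, port and seed node are placeholders, and helios.tcp is assumed for the 1.1.x line:

using System;
using Akka.Actor;
using Akka.Configuration;

class Program
{
    static void Main()
    {
        var config = ConfigurationFactory.ParseString(@"
            akka {
              actor.provider = ""Akka.Cluster.ClusterActorRefProvider, Akka.Cluster""
              remote.helios.tcp {
                hostname = ""127.0.0.1""
                port = 53500
              }
              cluster.seed-nodes = [""akka.tcp://TestSystem@127.0.0.1:53500""]
            }");

        var system = ActorSystem.Create("TestSystem", config);
        Console.WriteLine("Node up - press ENTER to leave and exit.");
        Console.ReadLine();

        var cluster = Akka.Cluster.Cluster.Get(system);
        cluster.RegisterOnMemberRemoved(() => system.Terminate());
        cluster.Leave(cluster.SelfAddress);

        // Cap the wait so a node that never joined doesn't hang here forever.
        system.WhenTerminated.Wait(TimeSpan.FromSeconds(30));
    }
}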
Alex Gibson
@crucifieddreams
It might be worth upgrading while I am making changes. Thanks for all your help folks.
Aaron Stannard
@Aaronontheweb
@crucifieddreams @maxcherednik akkadotnet/akka.net#2347 would that fix it?
sounds like that's what you need
Maxim Cherednik
@maxcherednik
Maybe, but I am not sure :)
Alex Gibson
@crucifieddreams
Looks promising, certainly it was an issue I wasn't aware of. I am just running some tests to see if that is the problem I am seeing.
Aaron Stannard
@Aaronontheweb
In this state it doesn't leave cleanly. It tries to. I have a monitor running in both service discovery nodes (2 of them), and they both report the cluster status that they see. When this problem happens the cluster status is everything Up and everything Seen. The leader gets the request that the node is exiting, and it logs every second that it is moving the node to Exiting, but it never exits.
whoops
there we go
so I've suspected that we have an issue with MemberRemoved not firing correctly
I've not been sure under what circumstances this occurs
no idea if that report gets shown to guests or not
but either way, this is the flaky test report for
ClusterSpec.A_cluster_must_complete_LeaveAsync_task_upon_being_removed
that information you just mentioned is very helpful. Confirms for me that this is a bug.
that under some circumstances, the MemberRemoved event is not received or processed correctly
if you have some logs from that situation you described, that would be helpful
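For collecting those logs, a small listener actor that records membership events is usually enough. A rough sketch; the actor name is a placeholder and it does nothing beyond logging:

using Akka.Actor;
using Akka.Cluster;
using Akka.Event;

public class ClusterEventLogger : ReceiveActor
{
    private readonly ILoggingAdapter _log;

    public ClusterEventLogger()
    {
        _log = Context.GetLogger();

        Receive<ClusterEvent.MemberRemoved>(m =>
            _log.Info("MemberRemoved: {0}, previous status {1}", m.Member.Address, m.PreviousStatus));

        // Log every other membership transition as well (Joining, Up, Leaving, Exiting, ...).
        Receive<ClusterEvent.IMemberEvent>(m => _log.Info("Member event: {0}", m));
    }

    protected override void PreStart()
    {
        Cluster.Get(Context.System)
            .Subscribe(Self, ClusterEvent.InitialStateAsEvents, new[] { typeof(ClusterEvent.IMemberEvent) });
    }

    protected override void PostStop()
    {
        Cluster.Get(Context.System).Unsubscribe(Self);
    }
}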