These are chat archives for akkadotnet/akka.net

11th Jul 2016
Ivan R. Perez
@irperez
Jul 11 2016 04:57
@Aaronontheweb I've put in a pull request for Lighthouse. Upgrading it to Akka 1.1. petabridge/lighthouse#19
Ivan R. Perez
@irperez
Jul 11 2016 05:30
I've also updated Akka.Logger.NLog to 1.1 akkadotnet/Akka.Logger.NLog#11
Peter Bergman
@peter-bannerflow
Jul 11 2016 07:40
If anyone feels like helping out with a cluster router question, check this out: https://stackoverflow.com/questions/38301328/akka-net-round-robin-group-router-only-routing-to-one-routee
It's basically the same question that I posted here a couple of days ago, but I never got it to work properly...
Bartosz Sypytkowski
@Horusiath
Jul 11 2016 07:44
@peter-bannerflow I'm not sure, but how does the router know where it can find the routees? A relative actor path is sufficient within the local node, but on remote nodes a full address is usually needed
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 07:47
@Horusiath To my understanding, when a node joins the cluster the router checks whether that node has the requested role and whether it has any potential routees. If it still needs routees (compared to max-nr-of-instances), it adds them. If any node providing routees leaves the cluster, the router removes its routees.
There is a message you can send to the router to get a list of the current routees. Maybe that can help finding out if the router really has no other routees.
Also, you should try to provide nr-of-instances = 10 here
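[Editor's note] A minimal cluster group-router config along the lines being discussed might look like the sketch below. The router name /workerRouter, the worker role, and the routee path are placeholders, not taken from the conversation:

```hocon
akka.actor.deployment {
  /workerRouter {
    router = round-robin-group
    routees.paths = ["/user/workers/worker0"]  # relative path, looked up on each matching node
    nr-of-instances = 10                       # total routee cap across the cluster; the default is low
    cluster {
      enabled = on
      use-role = worker                        # only nodes carrying this role contribute routees
      allow-local-routees = off                # rules out routing only to a local instance
    }
  }
}
```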
Bartosz Sypytkowski
@Horusiath
Jul 11 2016 07:48
@ZoolWay so if you have node A with actor under path /user/workers/worker0 and node B with another actor under path /user/workers/worker0, which one will join the router?
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 07:48
maybe it defaults to 1, and once it got its first routee out of all the potential routee paths, it sticks with only that one.
@Horusiath Assuming both nodes have the same role? Assuming nr-of-instances is greater than 1?
I think the local routees will join first
but as long as nr-of-instances is not fulfilled, all potential cluster nodes with the role should provide routees
providing fully qualified routee addresses would result in a non-scalable cluster
that might be true if using akka.remote only but with cluster the node addresses should not matter
Bartosz Sypytkowski
@Horusiath
Jul 11 2016 07:53
taking a short look at the source, it looks like what I said
@ZoolWay a fully qualified path isn't in any way less scalable than specifying routees in config in the first place
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 07:54
but fully qualified paths contain IP addresses, don't they? oO
also I wonder why transferring messages works at all. As he told us, the messages are going through to the other node - they are just not round-robin...
Bartosz Sypytkowski
@Horusiath
Jul 11 2016 07:55
so? Routees in config also assume that those actors have been incarnated on each node
too little info to say why that works with a single actor - maybe there is one local instance already?
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 07:56
If it's with an IP, I cannot exchange a node for another one on another IP address without changing the configuration file. That should be possible because of the role concept, I believe
Peter Bergman
@peter-bannerflow
Jul 11 2016 07:56
@ZoolWay Yes, the messages seem to go through to the other node (which is also on another machine). I'll try adding the instance number property to see if that helps
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 07:57
Hm, seems like it is not one local instance then
I always make sure of that with allow-local-routees = off
Bartosz Sypytkowski
@Horusiath
Jul 11 2016 07:59
@peter-bannerflow are you sure your actors have been created?
Peter Bergman
@peter-bannerflow
Jul 11 2016 08:01
@Horusiath I think so yes, after creating each worker (which is done by sending a msg to the actor supervising all workers) I print out the children of the supervisor, and they seem to all be there
Alright, so after adding nr-of-instances = 10 round robin works :) thanks for the help!
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 08:07
So it defaults to 1 and does not accept additional routees after the first one joined. At least that's what I would assume now.
Peter Bergman
@peter-bannerflow
Jul 11 2016 08:12
Yep, looks like that
voltcode
@voltcode
Jul 11 2016 09:05
how's .net fringe so far @Aaronontheweb? Will there be any akka-related videos made available later on?
Alex Valuyskiy
@alexvaluyskiy
Jul 11 2016 10:28
@Horusiath could we merge it? akkadotnet/Akka.Persistence.MySQL#5
Bartosz Sypytkowski
@Horusiath
Jul 11 2016 10:35
merged
Alex Valuyskiy
@alexvaluyskiy
Jul 11 2016 10:35
thanks
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 12:57
Is there documentation on how to gracefully shut down nodes? cluster.Leave() does not seem to have any effect on nodes other than SelfAddress. The state of the nodes does not change and I seem to be unable to get rid of unreachable nodes
Max
@maxpaj
Jul 11 2016 13:26
@ZoolWay I've been digging in the same hole as you... I made a post about it in the Google Group today (https://groups.google.com/forum/#!topic/akkadotnet-user-list/UmNeyu3Xujc)
I'll do a post on StackOverflow too and link it in here when it's up
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 13:28
That would be great. I have observed it even without Topshelf, I must say - with a pure ASP.NET Core project, after the webhost terminated.
voltcode
@voltcode
Jul 11 2016 13:38
@maxpaj @ZoolWay > I think you have to use the Cluster.RegisterOnMemberRemoved method (https://github.com/akkadotnet/akka.net/blob/dev/src/core/Akka.Cluster/Cluster.cs#L259) to wait for the cluster.Leave operation to finish. Although at this point, I haven't upgraded to 1.1 yet so I'm just assuming that the method made it into the release (it didn't seem to exist in 1.0.8)
Sorry, this may be my lack of skill with gitter. You can do a search in gitter for "Cluster.Leave" and pick the second or third result from the top
that's the result I meant^
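[Editor's note] A sketch of the pattern voltcode describes - blocking until this node has actually been removed before terminating - might look like the following. This is an illustration against the Akka.NET 1.1 API as discussed above, not a verified drop-in; the class and method names are made up:

```csharp
using System;
using System.Threading;
using Akka.Actor;
using Akka.Cluster;

public static class GracefulShutdown
{
    public static void LeaveAndWait(ActorSystem system)
    {
        var cluster = Cluster.Get(system);
        var removed = new ManualResetEventSlim();

        // The callback fires once this member reaches the Removed state.
        cluster.RegisterOnMemberRemoved(() => removed.Set());
        cluster.Leave(cluster.SelfAddress);

        // Wait (with a timeout) instead of a fixed Thread.Sleep.
        removed.Wait(TimeSpan.FromSeconds(30));
        system.Terminate().Wait();
    }
}
```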
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 13:39
I guess you are basically right. But I have a running node showing all other nodes, and when I call Leave on the address of one of the other nodes, it will not leave even after many minutes.
voltcode
@voltcode
Jul 11 2016 13:40
Hmm, I never made another node leave the cluster, I only used it in a way where node itself says it's leaving.
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 13:41
I just got a little idea...
voltcode
@voltcode
Jul 11 2016 13:41
tell the node to leave?
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 13:41
When I do cluster.Down() before cluster.Leave() - it seems to work... will have another test though...
voltcode
@voltcode
Jul 11 2016 13:42
seems superfluous to down the node if you just want it to leave
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 13:53
I agree
But when handling unreachable, obsolete nodes I can now get rid of them this way
When it comes to self-leave for a graceful termination, it does not seem to work out
RegisterOnMemberRemoved is there and is executed, but the cluster state still lists the node after the process has ended
The IntelliSense help on Leave notes that it might be necessary to call Down when network failures occur or similar. That might explain it
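[Editor's note] Getting rid of unreachable, obsolete nodes with Down, as described above, could look roughly like this sketch. It assumes `system` is the running ActorSystem; in practice you would only down nodes you know are permanently gone:

```csharp
// Manually down every currently unreachable member so the cluster can forget them.
var cluster = Cluster.Get(system);
foreach (var member in cluster.State.Unreachable)
{
    // Marks the member as Down; the cluster then removes it from membership.
    cluster.Down(member.Address);
}
```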
Max
@maxpaj
Jul 11 2016 14:27
@ZoolWay if you happen to find a solution, please post here http://stackoverflow.com/questions/38309461/akka-net-cluster-node-graceful-shutdown
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 14:27
sure
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 15:17
@maxpaj Worked it out in a WPF app, will try out ASP.NET webhost and Topshelf tomorrow...
basically I am doing the same thing, but in normal shutdown mode the app does not give the actor system time to handle all the leaving stuff
and the same goes for the Topshelf and webhost shutdown, I guess
Chris G. Stevens
@cgstevens
Jul 11 2016 15:20
I ended up putting a Thread.Sleep(5000) after I do a .Leave or .Down to give everything a little bit of time to react.
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 15:35
I tried that in the ASP.NET Core ApplicationStopping handler but it did not seem to do the job, not sure why
Chris G. Stevens
@cgstevens
Jul 11 2016 15:37
This is what I have...

protected void Application_Stop()
{
    SystemActors.ClusterHelper.Tell(new ClusterHelper.RemoveMember());

    Thread.Sleep(5000); // Give the Remove time to actually remove...

    ClusterSystem.Terminate();
}

Which then the RemoveMember does...

Receive<RemoveMember>(mem =>
{
    _logger.Warning("Service stopping; Issuing a Cluster.Leave() command for following address: {0}", Cluster.SelfAddress);
    Cluster.Leave(Cluster.SelfAddress);
});
Ricky Blankenaufulland
@ZoolWay
Jul 11 2016 15:42
I see, but here it seems that the app does not wait for Application_Stop() to complete before it terminates. Not sure how that is possible
Ah, ASP.NET Core executes the ApplicationStopping handler on another thread and the main thread drops back to Program.cs. Not sure why they did it that way, without waiting for the stopping event to complete
Have to go for today. But thanks for your input Chris!
Chris G. Stevens
@cgstevens
Jul 11 2016 15:45
thanks for the input as well... I didn't know that
Aaron Stannard
@Aaronontheweb
Jul 11 2016 18:51
@maxpaj I'll take a look at that issue
I've been at .NET Fringe since Saturday, so I've fallen behind on checking in on Gitter / SO
Aaron Stannard
@Aaronontheweb
Jul 11 2016 18:57
@maxpaj ah, I see - you're worried about the other node's exceptions
I tried to change the logging on that in 1.1 to make it less scary
but as long as it shows that the node has "left" and those exceptions you're seeing are "ShutdownAssociation" exceptions
then they're harmless
TL;DR: the EndpointManager logs an exception even for graceful shutdowns
on the other nodes
I tried to make it stop doing that for ShutdownAssociation cases
which are thrown only when a node intentionally shuts down
looks like I must have missed a spot
reason why we throw an exception: we use the supervision strategy of the EndpointManager to guarantee that the underlying hierarchy of remoting system actors gets cleaned up after an association terminates
so we don't leak resources
the hierarchy looks like: EndpointManager --> ReliableDeliverySupervisor --> EndpointWriter --> EndpointReader
and the EndpointReader is what detects the disassociation / shutdown
so that exception has to travel up the stack a few times
if you could post the exception you're seeing onto the question I could give you a better answer