These are chat archives for akkadotnet/akka.net

31st
Oct 2017
Juan Lorenzo Hinojosa Hernandez
@MindD3v
Oct 31 2017 00:33

Hi guys,

I just ran into a weird scenario: I have a cluster with two nodes, each running as a Windows Service. Node A contains an actor that holds a reference to an actor on node B (a remote actor). Stopping the Windows service on node B performs a coordinated shutdown on node B. When I start the node B Windows service again, it rejoins the cluster, but the actor that holds the reference to the actor on node B doesn’t work anymore.

It’s not the reference to the remote actor that stops working, it’s the actor holding that reference. It won’t receive or send messages anymore.

Any idea what’s happening?

Arjen Smits
@Danthar
Oct 31 2017 07:27
Could it be that node B has a different address now? (Different port or something)
Jose Carlos Marquez
@oeaoaueaa
Oct 31 2017 10:53
did the actor error because it was not able to "talk" to the remote actor, and then get recreated?
Bart de Boer
@boekabart
Oct 31 2017 10:53
@MindD3v is it possible that it somehow died, restarted, and because of that lost the reference or 'became' a different state?
... ok, so, 10 hours and 20 minutes after the question, 2 people suddenly give the same suggestion in 1 second...?
Jose Carlos Marquez
@oeaoaueaa
Oct 31 2017 10:56
timezone differences :smile:
Jose Carlos Marquez
@oeaoaueaa
Oct 31 2017 11:03
can anyone review/merge AkkaNetContrib/Akka.DI.Ninject#11 please?
Garrard Kitchen
@garrardkitchen
Oct 31 2017 11:42
This is a little out there but... does a node consistently becoming unreachable just after 9 hours resonate with anybody (v1.2.0)? I have 2 nodes in a cluster across 2 different machines with the same role [consumer]; one of them always becomes unreachable just after 9 hours, but the other can go on functioning for days. Any pointers on what to check for?
Jose Carlos Marquez
@oeaoaueaa
Oct 31 2017 11:57
are you using Lighthouse? could it be a network split?
Garrard Kitchen
@garrardkitchen
Oct 31 2017 12:09
Hi @oeaoaueaa, similar yes, but my own version of it. The seed logs don't suggest a split brain scenario
Kris Schepers
@schepersk
Oct 31 2017 12:51
Why is Akka.TestKit a dependency for Akka.Persistence.SqlServer?
Bart de Boer
@boekabart
Oct 31 2017 12:56
That must be a 1.3.2 regression...
Aaron Stannard
@Aaronontheweb
Oct 31 2017 13:16
lol
yikes
I should have caught that one on the PR
ugh
Bart de Boer
@boekabart
Oct 31 2017 13:18
Finally, proof that @Aaronontheweb is just human after all!
Aaron Stannard
@Aaronontheweb
Oct 31 2017 13:19
@boekabart lol!
Aaron Stannard
@Aaronontheweb
Oct 31 2017 13:26
I wish these were as low-magnitude as my screw-ups get :p
Aaron Stannard
@Aaronontheweb
Oct 31 2017 14:16
@oeaoaueaa that's live on NuGet now btw
Jose Carlos Marquez
@oeaoaueaa
Oct 31 2017 15:10
that's great thanks @Aaronontheweb , sorry for breaking the build the first time
Aaron Stannard
@Aaronontheweb
Oct 31 2017 15:10
all good
build systems are an under-appreciated artform
especially when you have to do wacky stuff like multi-targeting
Jose Carlos Marquez
@oeaoaueaa
Oct 31 2017 15:11
it's easy to miss something, and then you get all green but no nugets
RoBiK75
@RoBiK75
Oct 31 2017 15:43
hi! I have a question regarding the use of a dispatcher. I have an actor that creates a context that is bound to a thread. I can use the PinnedDispatcher for this, but I also need the actor to create children that run on the same thread, because they need to work within the context created by the parent. I need to create multiple instances of such parent-child groups, each having a dedicated thread. How could I accomplish this?
Juan Lorenzo Hinojosa Hernandez
@MindD3v
Oct 31 2017 15:55
@Danthar @boekabart the problem is not node B. I have an actor listening to cluster events; when a new member comes up, it refreshes the reference (even if it has a new address). I'm also watching the remote actor with Context.Watch(remoteActor); and handling the local actor state based on that. But the local actor (the one that holds the reference to the remote actor) is the one that doesn't receive/send messages.
Bart de Boer
@boekabart
Oct 31 2017 17:16
is it stuck waiting for an async action to complete? How do you 'know' it's not receiving? (and why would you expect it to send anything without first receiving smth)
Juan Lorenzo Hinojosa Hernandez
@MindD3v
Oct 31 2017 17:24

so this is the scenario, I have these 3 actors: MyActor (nodeA), LocalActor (nodeA), RemoteActor (nodeB). After nodeB is stopped and started, LocalActor receives a new reference for RemoteActor, and then I consider my environment ready. At this point, MyActor should be able to send messages to LocalActor, and LocalActor will send a message to RemoteActor.

By send message I'm referring to: _actor.Tell(new WhateverMessage());

my logs show that MyActor is doing the _actor.Tell(); but LocalActor never receives the message, and I also don't see those messages going to deadletters
Bart de Boer
@boekabart
Oct 31 2017 17:32
So I'm thinking it's "stuck" in the mailbox of LocalActor, because LocalActor is still dealing with a previous message. Possible?
Aaron Stannard
@Aaronontheweb
Oct 31 2017 17:33
@oeaoaueaa just spent about 45 minutes doing the same thing with the NUnit testkit plugin
same type of issue lol
speaking of which
Juan Lorenzo Hinojosa Hernandez
@MindD3v
Oct 31 2017 17:36
@boekabart possible, I'm using the IWithUnboundedStash.. I'll try to monitor the mailbox and stash
thanks :)
Bart de Boer
@boekabart
Oct 31 2017 17:37
That shouldn't be related UNLESS you stash the message (but you should be able to log/breakpoint that fact)
Juan Lorenzo Hinojosa Hernandez
@MindD3v
Oct 31 2017 17:38
I'm stashing the messages, but I'm also unstashing whenever I detect a change in the state of the actor
Bart de Boer
@boekabart
Oct 31 2017 17:49
Unstashing all?
Juan Lorenzo Hinojosa Hernandez
@MindD3v
Oct 31 2017 17:50
yes :) Stash.UnstashAll();
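
For readers following the stash discussion, here is a minimal sketch of the kind of actor being described, not @MindD3v's actual code. The `RemoteReady` message and the forwarding logic are assumptions made up for illustration; `IWithUnboundedStash`, `Stash.Stash()`, `Stash.UnstashAll()` and `Context.Watch` are the Akka.NET calls named in the conversation.

```csharp
using Akka.Actor;

// Hypothetical message: the cluster listener hands over a fresh remote reference.
public sealed class RemoteReady
{
    public RemoteReady(IActorRef remote) { Remote = remote; }
    public IActorRef Remote { get; }
}

public class LocalActor : UntypedActor, IWithUnboundedStash
{
    // Populated by Akka.NET when the actor is created.
    public IStash Stash { get; set; }

    private IActorRef _remote;

    protected override void OnReceive(object message)
    {
        switch (message)
        {
            case RemoteReady ready:
                _remote = ready.Remote;
                Context.Watch(_remote);
                Stash.UnstashAll();   // replay everything parked while not ready
                break;

            case Terminated t when t.ActorRef.Equals(_remote):
                _remote = null;       // back to the "not ready" state
                break;

            case Terminated _:
                break;                // notification for an older reference, ignore

            default:
                if (_remote == null)
                    Stash.Stash();    // parked until the next RemoteReady arrives
                else
                    _remote.Forward(message);
                break;
        }
    }
}
```

In a shape like this, a message sent while `_remote` is null produces neither a visible receive nor a dead letter: it sits in the stash until something triggers `UnstashAll()`. Logging next to the `Stash.Stash()` call would show whether that is what is happening.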
Bart de Boer
@boekabart
Oct 31 2017 17:52
I think you'll find smth there when you log everything happening in your LocalActor. 99.9% it's the actor, not the system
Juan Lorenzo Hinojosa Hernandez
@MindD3v
Oct 31 2017 17:56
ok, thanks, I'll add some logging there; worst case, I'll try to reproduce this outside of our application
Bart de Boer
@boekabart
Oct 31 2017 22:08
Strange thing - in unit tests for a persistent actor (testing 'recovery after restart' behaviour), this happens about 50% of the time:
[WARNING][31-10-2017 22:05:39][Thread 0013][[akka://AsyncTest/user/$d#1447415842]] Restart caused by exception System.Exception: Actor restart request received.
[ERROR][31-10-2017 22:05:39][Thread 0013][akka://AsyncTest/user/$d] Stack empty.
Cause: System.InvalidOperationException: Stack empty.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.Stack`1.Peek()
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)
[WARNING][31-10-2017 22:05:39][Thread 0013][akka://AsyncTest/user/$d/$a] DeadLetter from [akka://AsyncTest/user/$d/$a#205270925] to [akka://AsyncTest/user/$d/$a#205270925]: <Received dead system message: <Suspend>>
The code for the restart:
            ((IInternalActorRef)Self).Restart(new Exception("Actor restart request received."));
is that code illegal, causing the aforementioned problem?
Bart de Boer
@boekabart
Oct 31 2017 22:18
Bottom line is, the actor never seems to get to PostRestart
[DEBUG][31-10-2017 22:19:19][Thread 0007][EventStream(AsyncTest)] Default Loggers started
PreStart
PreRestart
[WARNING][31-10-2017 22:19:20][Thread 0011][[akka://AsyncTest/user/$d#1274098419]] Restart caused by exception System.Exception: Actor restart request received.
PostStop
PostStop
[ERROR][31-10-2017 22:19:20][Thread 0011][akka://AsyncTest/user/$d] Stack empty.
Cause: System.InvalidOperationException: Stack empty.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.Stack`1.Peek()
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)
[WARNING][31-10-2017 22:19:20][Thread 0011][akka://AsyncTest/user/$d/$a] DeadLetter from [akka://AsyncTest/user/$d/$a#1818256669] to [akka://AsyncTest/user/$d/$a#1818256669]: <Received dead system message: <Suspend>>
Isn't that weird!? (just added Console.WriteLine calls to all the Pre/Post overrides, then calling base.)
The times the test passes, the sequence looks like this:
PreRestart
[WARNING][31-10-2017 22:21:33][Thread 0011][[akka://AsyncTest/user/$d#951848533]] Restart caused by exception System.Exception: Actor restart request received.
PostStop
PostRestart
PreStart
I guess the PostStop is for the old instance; in the failing test, at least one of the two is
Bart de Boer
@boekabart
Oct 31 2017 22:24
But why does the new instance sometimes not get its PostRestart, PreStart calls...
Bart de Boer
@boekabart
Oct 31 2017 22:49
In fact, both PostStop calls are on the old instance. There never is a new instance created.
boekabart @boekabart is starting to get the feeling it might be related to running multiple tests that use InMemoryPersistence , in parallel...
Bart de Boer
@boekabart
Oct 31 2017 23:17
... or consecutively in 1 process... can no longer reproduce when running just the failing test alone, but only (intermittently) when running multiple tests in 1 go ...
Bart de Boer
@boekabart
Oct 31 2017 23:31
@Aaronontheweb no clue?
Aaron Stannard
@Aaronontheweb
Oct 31 2017 23:31
@boekabart sorry Bart, have been on a phone call this entire time
I'll see if I can take a look at it in a second once I'm done wrapping up some commits here
Bart de Boer
@boekabart
Oct 31 2017 23:32
gracias
RoBiK75
@RoBiK75
Oct 31 2017 23:34
does anyone have any suggestions regarding my problem?
Bart de Boer
@boekabart
Oct 31 2017 23:37
Can you elaborate why having single threads per 'subtree' is a requirement?
RoBiK75
@RoBiK75
Oct 31 2017 23:40
it's about creating a CUDA context and working with it... a CUDA context is bound to a thread. if I want to make calls to a particular CUDA context, they have to happen on the thread to which that context is bound
and I would like to have child actors for handling multiple CUDA streams
Bart de Boer
@boekabart
Oct 31 2017 23:42
@Aaronontheweb I've changed the 'restart' to simply throw an exception in the Command<> handler; just needed some HOCON to enable top-level actor restarting during unit tests... Since I've made the change, I can't get the error back
So I suppose calling Restart isn't perfectly safe. I got the code from a blog post...
Aaron Stannard
@Aaronontheweb
Oct 31 2017 23:45
ok, can take a look at it now lol
is that code illegal, causing the aforementioned problem?
yeah
basically the restart is something that gets handled by the parent
via supervision
and there's a managed process for cleanly restarting the actor
calling that Restart method on the internal IActorRef is part of it
but there might be another stage missing
IMHO, better way to test that
just have the actor throw the exception upon receiving a message of some kind
and have that actor run as the child of someone else
don't even need a custom supervision strategy or anything
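
A minimal sketch of that suggestion, for illustration only: the actor under test throws when it receives a message, and it runs as the child of another actor so that normal supervision performs the restart. `Crash`, `ChildActor` and `ParentActor` are hypothetical names; the sketch assumes the parent keeps Akka.NET's default supervisor strategy, which restarts a child on an ordinary exception.

```csharp
using System;
using Akka.Actor;

// Hypothetical message that asks the child to fail.
public sealed class Crash { }

public class ChildActor : UntypedActor
{
    protected override void OnReceive(object message)
    {
        if (message is Crash)
        {
            // Throwing from the message handler hands the failure to the
            // parent, which restarts this actor through supervision.
            throw new Exception("Actor restart request received.");
        }
    }

    protected override void PreRestart(Exception reason, object message)
    {
        Console.WriteLine("PreRestart");   // runs on the old instance
        base.PreRestart(reason, message);
    }

    protected override void PostRestart(Exception reason)
    {
        Console.WriteLine("PostRestart");  // runs on the new instance
        base.PostRestart(reason);
    }
}

public class ParentActor : UntypedActor
{
    private IActorRef _child;

    protected override void PreStart()
    {
        // No custom SupervisorStrategy override: the default one is used.
        _child = Context.ActorOf(Props.Create<ChildActor>(), "child");
    }

    protected override void OnReceive(object message)
    {
        _child.Forward(message);
    }
}
```

A test would create `ParentActor`, send it a `Crash`, and then check that the child still answers afterwards, without touching `IInternalActorRef` at all.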
Bart de Boer
@boekabart
Oct 31 2017 23:48
I just pass this into the constructor of TestKit:
$@"akka.actor.guardian-supervisor-strategy = ""{typeof(DefaultSupervisorStrategy).AssemblyQualifiedName}"""
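
Written out as plain HOCON, that interpolated string amounts to a single setting. The value below uses a shortened type name as an assumption; the interpolated version yields the full assembly-qualified name, including version and key token.

```hocon
akka.actor {
  # Assumed short form of typeof(DefaultSupervisorStrategy).AssemblyQualifiedName
  guardian-supervisor-strategy = "Akka.Actor.DefaultSupervisorStrategy, Akka"
}
```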
Aaron Stannard
@Aaronontheweb
Oct 31 2017 23:48
ahhh
was it not doing that by default before?
I don't remember offhand what the default guardian supervision strategy is
btw I want to address your issue too @RoBiK75 but I have to run in a minute here
RoBiK75
@RoBiK75
Oct 31 2017 23:49
sure, no problem
Bart de Boer
@boekabart
Oct 31 2017 23:49
It might be that 'our' default config doesn't, in order to not quietly ignore a crashing Subject-Under-Test
Aaron Stannard
@Aaronontheweb
Oct 31 2017 23:52
which blog post did you get this from?
Aaron Stannard
@Aaronontheweb
Oct 31 2017 23:56
haha
gotta run, but I'll be back later and will try to answer that in more depth