These are chat archives for akkadotnet/akka.net

17th
Apr 2018
Arsene Tochemey GANDOTE
@Tochemey
Apr 17 2018 09:42
Hello, I am having trouble creating actors with .NET Core where the actors need a repository injected at startup.
Whenever the actor accesses the repository interface, I get a System.ObjectDisposedException.
Bartosz Sypytkowski
@Horusiath
Apr 17 2018 10:30
@Tochemey how and when are you creating your actors? Example: if you create an actor in some other managed context (e.g. an HTTP request handler) and use that context's lifecycle to inject the repository, it's quite probable that the repository will be disposed while the actor still lives, as the actor may live longer than the context of the HTTP request.
Arsene Tochemey GANDOTE
@Tochemey
Apr 17 2018 10:30
Ok
Arsene Tochemey GANDOTE
@Tochemey
Apr 17 2018 10:59
@Horusiath Please, is there a way to create actors with DI where the actors have constructor params?
Hyungho Ko
@hhko
Apr 17 2018 15:07
My app got this log.
The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
How can I solve it?
When does it occur?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 15:08
occurs when a system message, which normally has reliable delivery attached to it, fails to get delivered
i.e. stopping an actor
death watching an actor
etc
sending a Termination notice back
way to fix it is to restart the quarantined system
that quarantine signal basically means that the state of your ActorSystem is corrupt as far as the remote system is concerned
you can subscribe to a ThisActorSystemQuarantinedEvent
from the EventStream
you'll receive that signal when you are quarantined by someone else
Hyungho Ko
@hhko
Apr 17 2018 15:10
Yes, I can subscribe to that event.
But how do I restart the ActorSystem at run time?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 15:11
you'd want to have something that sits outside the ActorSystem
that you can invoke from an actor
so in the case of a Topshelf service
I could have a static method I could invoke
that would dispose the old ActorSystem and re-create it
Hyungho Ko
@hhko
Apr 17 2018 15:14
Could you recommend some code or example for it?
I am wondering: while the ActorSystem is being re-created,
another remote actor might send some messages to it.
Is that OK?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 15:17
if you get quarantined by one system
that's only a 1:1 relationship
you might not be quarantined by the others
but I don't have any sample code on-hand unfortunately :(
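A minimal sketch of the approach described above: a host object that sits outside the ActorSystem, plus an actor that invokes it when the quarantine signal arrives. `ActorSystemHost`, `QuarantineWatcher`, and the system name `MySystem` are hypothetical names chosen for illustration. The sketch assumes the event class is `ThisActorSystemQuarantinedEvent` from `Akka.Remote`, and that the hosting service (e.g. a Topshelf service) calls `Start` once at startup.

```csharp
using System.Threading.Tasks;
using Akka.Actor;
using Akka.Configuration;
using Akka.Remote;

// Hypothetical host that lives outside the ActorSystem.
public static class ActorSystemHost
{
    private static readonly object Gate = new object();
    private static ActorSystem _system;
    private static Config _config;

    public static void Start(Config config)
    {
        lock (Gate)
        {
            _config = config;
            _system = CreateSystem(config);
        }
    }

    // Terminates the quarantined system and creates a fresh one.
    public static async Task RestartAsync()
    {
        ActorSystem old;
        lock (Gate)
        {
            old = _system;
            _system = null;
        }

        if (old == null) return; // not started, or a restart is already in progress

        await old.Terminate();

        lock (Gate)
        {
            _system = CreateSystem(_config);
        }
    }

    private static ActorSystem CreateSystem(Config config)
    {
        var system = ActorSystem.Create("MySystem", config); // placeholder name
        system.ActorOf(Props.Create(() => new QuarantineWatcher()), "quarantine-watcher");
        return system;
    }
}

// Listens on the EventStream and asks the host to restart the system.
public class QuarantineWatcher : ReceiveActor
{
    public QuarantineWatcher()
    {
        Receive<ThisActorSystemQuarantinedEvent>(evt =>
        {
            // Run the restart off the actor's thread, since this actor
            // is stopped as part of terminating the old system.
            Task.Run(() => ActorSystemHost.RestartAsync());
        });
    }

    protected override void PreStart()
    {
        Context.System.EventStream.Subscribe(Self, typeof(ThisActorSystemQuarantinedEvent));
    }
}
```

Whether to restart on the first quarantine event or only after several is a design choice this sketch does not address; it restarts immediately.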
Jack Wild
@jackowild
Apr 17 2018 15:23
@Horusiath Please, is there a way to create actors with DI where the actors have constructor params?
I would also like to know the answer to this one.
At the moment I create the actor using Context.ActorOf(Context.DI().Props<MyActor>())
Then send in an initialisation message which contains some state I want the actor to have. I'd rather pass this state into the constructor, along with the dependencies I want resolving using Akka.DI. Is this possible? Or are we asking for too much...
Hyungho Ko
@hhko
Apr 17 2018 15:23
When in the quarantined state, must the ActorSystem be restarted?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 15:46
yep
that's the only way to clear the quarantine
Hyungho Ko
@hhko
Apr 17 2018 16:00
@Aaronontheweb thank you for your consideration.
Bartosz Sypytkowski
@Horusiath
Apr 17 2018 17:08
@jackowild there's no way atm to supply only part of the params
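One possible workaround, sketched under assumptions: bypass Akka.DI for this actor and pass every constructor argument explicitly through `Props.Create`, handing the actor an `IServiceScopeFactory` (from Microsoft.Extensions.DependencyInjection) instead of the repository itself. The actor then resolves the repository in a scope it owns, which may also avoid the ObjectDisposedException discussed earlier. `IOrderRepository`, `OrderActor`, and `CreateProps` are hypothetical names, and the repository is assumed to be registered as a scoped service.

```csharp
using Akka.Actor;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical repository abstraction, assumed to be registered as scoped.
public interface IOrderRepository
{
    void Save(string order);
}

public class OrderActor : ReceiveActor
{
    public OrderActor(IServiceScopeFactory scopeFactory, string customerId)
    {
        Receive<string>(order =>
        {
            // Resolve the repository in a scope owned by this message, so it
            // is not tied to the lifetime of an unrelated HTTP request scope.
            using (var scope = scopeFactory.CreateScope())
            {
                var repository = scope.ServiceProvider.GetRequiredService<IOrderRepository>();
                repository.Save(customerId + ":" + order);
            }
        });
    }

    // All constructor arguments are supplied by the caller; nothing is
    // resolved by Akka.DI here.
    public static Props CreateProps(IServiceScopeFactory scopeFactory, string customerId)
    {
        return Props.Create(() => new OrderActor(scopeFactory, customerId));
    }
}
```

A caller holding the application's `IServiceProvider` could then create the actor with something like `Context.ActorOf(OrderActor.CreateProps(provider.GetRequiredService<IServiceScopeFactory>(), "customer-42"))`, where `provider` and the customer id are placeholders.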
Lutando Ngqakaza
@Lutando
Apr 17 2018 19:46

What's the proper way to test my actors in a clustered environment? This? http://getakka.net/articles/networking/multi-node-test-kit.html

Let's say I want to test a very simple case: a message sent through a cluster proxy is received by an actor on the actual cluster that is bound to receive that message.

Aaron Stannard
@Aaronontheweb
Apr 17 2018 19:46
@Lutando I really need to do some videos on the multi-node testkit
which is the proper way to do integration testing with Akka.Cluster
let me grab you an example from the OSS real fast
so these are the specs we use to integration test Akka.Cluster itself
however, all of the tools for doing this are available on NuGet
https://www.nuget.org/packages/Akka.MultiNodeTestRunner/ - our specialized Xunit2 test runner
works on .NET and .NET Core
simulates a cluster by spawning multiple external processes, each one acting as an individual node
has the ability to do things like introduce packet loss, latency, etc
defines all of the methods needed to create a spec that can be run by the multi node test runner (MNTR)
it's the "happy path" spec for Akka.Cluster
spin up a cluster of N nodes
verify that everyone can join
and that's it
public class SunnyWeatherNodeConfig : MultiNodeConfig
    {
        public RoleName First { get; set; }

        public RoleName Second { get; set; }

        public RoleName Third { get; set; }

        public RoleName Fourth { get; set; }

        public RoleName Fifth { get; set; }

        public SunnyWeatherNodeConfig()
        {
            First = Role("first");
            Second = Role("second");
            Third = Role("third");
            Fourth = Role("fourth");
            Fifth = Role("fifth");

            CommonConfig = ConfigurationFactory.ParseString(@"
                akka.actor.provider = ""Akka.Cluster.ClusterActorRefProvider, Akka.Cluster""
                akka.loggers = [""Akka.TestKit.TestEventListener, Akka.TestKit""]
                akka.loglevel = INFO
                akka.remote.log-remote-lifecycle-events = off
                akka.cluster.failure-detector.monitored-by-nr-of-members = 3
            ");
        }
    }
you subclass MultiNodeConfig and define a spec-specific configuration class
this tells the MNTR how many nodes you need, what they should be called, and how they should have their ActorSystems configured
so this is a five node cluster
public class SunnyWeatherSpec : MultiNodeClusterSpec
    {
        private class Listener : UntypedActor
        {
            private readonly AtomicReference<SortedSet<Member>> _unexpected;

            public Listener(AtomicReference<SortedSet<Member>> unexpected)
            {
                _unexpected = unexpected;
            }

            protected override void OnReceive(object message)
            {
                message.Match()
                    .With<ClusterEvent.IMemberEvent>(evt =>
                    {
                        _unexpected.Value.Add(evt.Member);
                    })
                    .With<ClusterEvent.CurrentClusterState>(() =>
                    {
                        // ignore
                    });
            }
        }

        private readonly SunnyWeatherNodeConfig _config;

        public SunnyWeatherSpec() : this(new SunnyWeatherNodeConfig())
        {
        }

        protected SunnyWeatherSpec(SunnyWeatherNodeConfig config) : base(config, typeof(SunnyWeatherSpec))
        {
            _config = config;
        }

        [MultiNodeFact]
        public void SunnyWeatherSpecs()
        {
            Normal_cluster_must_be_healthy();
        }

        public void Normal_cluster_must_be_healthy()
        {
            // start some
            AwaitClusterUp(_config.First, _config.Second, _config.Third);
            RunOn(() =>
            {
                Log.Debug("3 joined");
            }, _config.First, _config.Second, _config.Third);

            // add a few more
            AwaitClusterUp(Roles.ToArray());
            Log.Debug("5 joined");

            var unexpected = new AtomicReference<SortedSet<Member>>(new SortedSet<Member>());
            Cluster.Subscribe(Sys.ActorOf(Props.Create(() => new Listener(unexpected))), new[]
            {
                typeof(ClusterEvent.IMemberEvent)
            });

            foreach (var n in Enumerable.Range(1, 30))
            {
                EnterBarrier("period-" + n);
                unexpected.Value.Should().BeEmpty();
                AwaitMembersUp(Roles.Count);
                AssertLeaderIn(Roles);
                if (n % 5 == 0)
                {
                    Log.Debug("Passed period [{0}]", n);
                }
                Thread.Sleep(1000);
            }

            EnterBarrier("after");
        }
    }
next you define the spec class itself
it'll take the config type you defined as an argument
Aaron Stannard
@Aaronontheweb
Apr 17 2018 19:51
and then it has some special methods that you can use for writing a distributed unit test
RunOn(() => { }, role);
specifies on which node this code can run
EnterBarrier("barrier-name") is a synchronization barrier across processes
Lutando Ngqakaza
@Lutando
Apr 17 2018 19:52
@Aaronontheweb thanks, this is amazing,
Aaron Stannard
@Aaronontheweb
Apr 17 2018 19:52
if you have 5 nodes in the cluster
all five nodes must cross the barrier at the same time
if one node is still doing work
the other four nodes will wait for it
up until about 30 seconds I think
for running the MNTR executable itself
our build script has a good example of how to do that: https://github.com/akkadotnet/akka.net/blob/dev/build.fsx#L168
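To illustrate how `RunOn` and `EnterBarrier` combine, here is a hedged sketch of a spec in which one node hosts an actor and another node sends it a message. It reuses the `SunnyWeatherNodeConfig` class shown earlier; `EchoActor` and `EchoAcrossNodesSpec` are hypothetical names. It uses a plain `ActorSelection` rather than a cluster proxy, and it assumes `MultiNodeClusterSpec` lives in the `Akka.Cluster.TestKit` namespace and `MultiNodeFact` in `Akka.Remote.TestKit`.

```csharp
using Akka.Actor;
using Akka.Cluster.TestKit;
using Akka.Remote.TestKit;

// Hypothetical actor that replies with whatever it receives.
public class EchoActor : ReceiveActor
{
    public EchoActor()
    {
        ReceiveAny(msg => Sender.Tell(msg));
    }
}

public class EchoAcrossNodesSpec : MultiNodeClusterSpec
{
    private readonly SunnyWeatherNodeConfig _config;

    public EchoAcrossNodesSpec() : this(new SunnyWeatherNodeConfig())
    {
    }

    protected EchoAcrossNodesSpec(SunnyWeatherNodeConfig config)
        : base(config, typeof(EchoAcrossNodesSpec))
    {
        _config = config;
    }

    [MultiNodeFact]
    public void Message_must_reach_actor_on_another_node()
    {
        AwaitClusterUp(_config.First, _config.Second);

        // Only the "first" node hosts the echo actor.
        RunOn(() =>
        {
            Sys.ActorOf(Props.Create(() => new EchoActor()), "echo");
        }, _config.First);

        // Every node waits here until all nodes have reached this point,
        // so the actor exists before anyone messages it.
        EnterBarrier("echo-started");

        // Only the "second" node sends the message and checks the reply.
        RunOn(() =>
        {
            var echo = Sys.ActorSelection(Node(_config.First) / "user" / "echo");
            echo.Tell("ping", TestActor);
            ExpectMsg("ping");
        }, _config.Second);

        EnterBarrier("after");
    }
}
```

The barrier between the two `RunOn` blocks is the important part: without it, the second node could send its message before the first node has created the actor.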
Lutando Ngqakaza
@Lutando
Apr 17 2018 19:54
So MNTR = multi node test x ?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 19:55
yeah
MNTR = multi node test runner
is the abbreviation we use
Lutando Ngqakaza
@Lutando
Apr 17 2018 19:55
oh
Aaron Stannard
@Aaronontheweb
Apr 17 2018 19:55
you can have dozens of MNTR specs defined in the same assembly
each one gets its own cluster
and these specs, by their very nature, take a while to run
and they all use XUnit2
since that's what our custom test runner uses
but it's pretty cool - outside of the Akka.NET OSS project I've helped users put the MNTR to work for things like "what happens if all of these nodes get disconnected for 10 seconds in the middle of this type of operation?"
Lutando Ngqakaza
@Lutando
Apr 17 2018 19:56
I think I will dive into this in detail tomorrow, it's quite important for me to test multi node scenarios
Aaron Stannard
@Aaronontheweb
Apr 17 2018 19:56
so they can verify that stuff like reliable delivery works correctly
and disaster recovery too
had one user who wanted to simulate that messages could be persisted, recovered, and retransmitted after a catastrophic crash
was able to do that with MNTR
Lutando Ngqakaza
@Lutando
Apr 17 2018 19:58
fantastic
Aaron Stannard
@Aaronontheweb
Apr 17 2018 19:58
that's really what this stuff is for
ensuring that your application works even when the network does weird, unexpected, and hopefully infrequent things
I should really get around to doing a video or a tutorial on that stuff
since we don't have much in the way of documentation on it today
Onur Gumus
@OnurGumus
Apr 17 2018 22:51
What are the system messages involved in quarantining?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 22:51
there's only a handful that can be sent over the network
Onur Gumus
@OnurGumus
Apr 17 2018 22:51
For example ?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 22:51
it's usually Context.Watch and the deathwatch stuff
we have reliable delivery built into Akka.Remote
for those messages specifically
and any of the other system messages, like Context.Stop
Onur Gumus
@OnurGumus
Apr 17 2018 22:52
I am also confused about quarantining. I thought quarantining was for actors specifically, but addresses are also quarantined, right?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 22:52
quarantining applies to an entire ActorSystem
here's the rationale
two systems, A and B
both death-watching a bunch of actors on each other
which is pretty standard
A and B start experiencing some connection difficulties
and during those connection difficulties
system B starts killing off actors on its own (i.e. shutting down idle actors according to however the user programmed it)
if those death watch notifications can't make it to system A
and system A can't be notified that those actors on system B are terminated
because the connection difficulties repeatedly disrupt the delivery of those messages, that's when a quarantine gets triggered
i.e. one minor hiccup in the network isn't enough to invoke a quarantine
Onur Gumus
@OnurGumus
Apr 17 2018 22:54
I understand. Suppose I don't watch anything myself; are there any remaining system messages?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 22:54
no
deathwatch is the biggest producer of system messages over the network
second biggest is remote deployments
Onur Gumus
@OnurGumus
Apr 17 2018 22:55
OK, but clustering automatically implies system messages, right?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 22:55
since the Create / Fault / Shutdown messages used to start and stop actors are all system messages
Onur Gumus
@OnurGumus
Apr 17 2018 22:55
I mean, if I don't watch anything myself but I have a cluster, then quarantining is in play again.
Aaron Stannard
@Aaronontheweb
Apr 17 2018 22:55
ah, I see what you're asking
let's make a distinction
ISystemMessage == "a system message"
most messages sent by Akka.NET's internal actors don't implement that interface
that includes the remoting, clustering, and persistence control messages
they're processed just like normal user-land messages
ISystemMessages are part of core Akka
and are used around the area of actor lifecycles
they are responsible for triggering the startup, restart, and shutdown of all actors
therefore, outside of creating / killing / death-watching (lifecycle monitoring) of actors
ISystemMessages don't really get used
Onur Gumus
@OnurGumus
Apr 17 2018 22:58
Okay.
Aaron Stannard
@Aaronontheweb
Apr 17 2018 22:58
if two cluster nodes are gossiping back and forth
and they get disrupted for a long period of time
days, even
they can still heal that broken connection and re-connect without quarantines usually
quarantines are more likely to happen in raw Akka.Remote than Akka.Cluster because the mechanism we use for invoking termination of remotely deployed / monitored actors is coupled to the health of the connection in Akka.Remote
so if you have a period of turbulent network activity, that may cause some DeathWatch notifications to get produced which in turn, may not be delivered because the network is having issues
Akka.Cluster doesn't really suffer from that issue because deathwatch is totally decoupled from connectivity
instead deathwatch is driven by changes in a node's membership status
i.e. only gets triggered when a node leaves the cluster
Onur Gumus
@OnurGumus
Apr 17 2018 23:01
Actually I have a cluster and I am having frequent quarantining issues with it.
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:02
ok, first thing I have to ask: are you using auto-down?
Onur Gumus
@OnurGumus
Apr 17 2018 23:02
No
The thing is, one of the nodes runs some native devices, and I think one of them leaks memory
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:02
oh right, I think I remember you telling me about your app before
Onur Gumus
@OnurGumus
Apr 17 2018 23:02
Which corrupts the CLR, putting it into an inconsistent state.
Yes.
Then even the network connection goes down
I thought quarantining was pretty much standard for a cluster if one of the nodes becomes unresponsive.
I use distributed pub sub via mediator btw
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:04
well, let me take a look again real quick
Onur Gumus
@OnurGumus
Apr 17 2018 23:06
So?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:06
gotta give me a second to write out an explanation man
Onur Gumus
@OnurGumus
Apr 17 2018 23:06
:)
so that spec defines the range of things that can trigger a quarantine in Akka.Cluster
Onur Gumus
@OnurGumus
Apr 17 2018 23:07
I might be wrong with my observations though.
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:07
  1. too many outstanding, unacknowledged system messages
  2. Node that was DOWNED by the cluster
case number 2 is probably the more common one
but you don't care anyway because once that node leaves the cluster it can't rejoin without restarting
Onur Gumus
@OnurGumus
Apr 17 2018 23:08
Since I don't down any node.
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:08
so it's a wash
in that case your issue is probably case 1
Onur Gumus
@OnurGumus
Apr 17 2018 23:09
I wonder what those system messages would be
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:09
there might be system actors deathwatching each other as part of distributed pub sub
mediators watching each other's replicas
been a while since I've looked at the code
Onur Gumus
@OnurGumus
Apr 17 2018 23:09
Are they?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:10
let me look
yep
that'd do it
let me check something in Akka.Remote real quick
Onur Gumus
@OnurGumus
Apr 17 2018 23:11
Yeah. I wonder if this should be the case though.
I mean ideally...
Probably this is not what I would want in most cases. That is, just because I use distributed pub sub, it shouldn't cause quarantining. After all, events are something you publish and forget.
I agree
if you can put some logs up and some details around how you're using it
maybe we're doing something wrong
quarantine should be rare
Onur Gumus
@OnurGumus
Apr 17 2018 23:13
okay. I will check.
Quarantine(member.Address, member.UniqueAddress.Uid);
what is the unique address for a member?
Does each node also have an incarnation, like an actor?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:14
yep
same implementation
ActorSystems have a unique id
Onur Gumus
@OnurGumus
Apr 17 2018 23:14
so the node incarnation is quarantined.
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:15
and that's what we use for detecting restarts
correct
rebooting the node and changing the UID
is all it takes to clear it
Onur Gumus
@OnurGumus
Apr 17 2018 23:15
okay
Another question: I saw heartbeat and failure detection settings for both remoting and clustering
I wonder which one takes effect?

```
transport-failure-detector {

  # FQCN of the failure detector implementation.
  # It must implement akka.remote.FailureDetector and have
  # a public constructor with a com.typesafe.config.Config and
  # akka.actor.EventStream parameter.
  implementation-class = "Akka.Remote.DeadlineFailureDetector,Akka.Remote"

  # How often keep-alive heartbeat messages should be sent to each connection.
  heartbeat-interval = 4 s

  # Number of potentially lost/delayed heartbeats that will be
  # accepted before considering it to be an anomaly.
  # This margin is important to be able to survive sudden, occasional,
  # pauses in heartbeat arrivals, due to for example garbage collect or
  # network drop.
  acceptable-heartbeat-pause = 20 s
}
```

we have this one in transport-failure-detector

and below for cluster:
```
failure-detector {

  # FQCN of the failure detector implementation.
  # It must implement akka.remote.FailureDetector and have
  # a public constructor with a com.typesafe.config.Config and
  # akka.actor.EventStream parameter.
  implementation-class = "Akka.Remote.PhiAccrualFailureDetector, Akka.Remote"

  # How often keep-alive heartbeat messages should be sent to each connection.
  heartbeat-interval = 1 s

  # Defines the failure detector threshold.
  # A low threshold is prone to generate many wrong suspicions but ensures
  # a quick detection in the event of a real crash. Conversely, a high
  # threshold generates fewer mistakes but needs more time to detect
  # actual crashes.
  threshold = 8.0
```

Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:18
so both run at the same time
but Akka.Cluster nodes use the akka.cluster HOCON settings
to monitor each other
Onur Gumus
@OnurGumus
Apr 17 2018 23:19
So if I want my nodes to be more tolerant of GC pauses, which settings should I tune?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:19
if I connected to a node in my Akka.Cluster using just Akka.Remote
I'd end up using the transport-failure-detector
Onur Gumus
@OnurGumus
Apr 17 2018 23:19
or less tolerant
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:19
I know what you mean
you want to keep the connection alive if the heartbeat interval lapses longer?
Onur Gumus
@OnurGumus
Apr 17 2018 23:20
yeah
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:20
I'd change the Akka.Cluster settings to look more like the Akka.Remote ones
Onur Gumus
@OnurGumus
Apr 17 2018 23:20
I am confused with two different heartbeat settings
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:20
the deadline failure detector uses a much simpler algorithm
akka.cluster.failure-detector.acceptable-heartbeat-pause = 30s
that'd give you 10x the amount of leeway you have now
Onur Gumus
@OnurGumus
Apr 17 2018 23:21
But you said go with transport-failure-detector
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:22
I forgot that I could just tune that setting
without getting into more detail than I need to here since I have some other work I need to get back to
Onur Gumus
@OnurGumus
Apr 17 2018 23:22
alright
one last thing not a question but you might be interested in
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:22
I'd probably just pump up the acceptable-heartbeat-pause setting on the akka.cluster HOCON and see if that does it
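As a rough sketch of that suggestion, the setting can be supplied as a HOCON string when the ActorSystem is created. `TolerantClusterConfig` is a hypothetical helper name; the 30 s value is the one mentioned above, and the right value for a given deployment is something to measure rather than assume.

```csharp
using Akka.Configuration;

// Hypothetical helper that builds a config overriding only the
// cluster failure detector's acceptable-heartbeat-pause.
public static class TolerantClusterConfig
{
    public static Config Create()
    {
        return ConfigurationFactory.ParseString(@"
            akka.cluster.failure-detector {
                # value suggested in the chat; allows longer gaps between
                # heartbeats (e.g. long GC pauses) before a node is
                # considered unreachable
                acceptable-heartbeat-pause = 30 s
            }
        ");
    }
}
```

To combine this with an application's existing configuration, something like `TolerantClusterConfig.Create().WithFallback(existingConfig)` should let the override take precedence, where `existingConfig` is a placeholder for whatever config the application already loads. The trade-off is slower detection of nodes that have really crashed.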
Onur Gumus
@OnurGumus
Apr 17 2018 23:23
do you use or know about Visual Studio IntelliTrace?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:23
@/all https://github.com/akkadotnet/akka.net/releases/tag/v1.3.6 - Akka.NET v1.3.6 is now live on NuGet. You can see the full release notes here on the left.
Thanks to all of our contributors who reported bugs, submitted patches, fixed typos and broken links in the docs
You all rock. We appreciate it. Lots of bug fixes in this release.
And most importantly, updated Akka.FSharp to run on .NET Core now.
Onur Gumus
@OnurGumus
Apr 17 2018 23:24
I thought Akka.FSharp is dead
Bartosz suggested we should go with Akkling
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:25
Akka.FSharp is definitely not dead; Akkling is more expressive but is not subject to the strict versioning controls we put on the project
Onur Gumus
@OnurGumus
Apr 17 2018 23:25
okay, have you read my question about IntelliTrace?
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:25
yep
saw that
Onur Gumus
@OnurGumus
Apr 17 2018 23:26
It is a utility in the VS Enterprise edition.
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:26
I used it when I worked at Microsoft
Azure Cloud Services et al
Onur Gumus
@OnurGumus
Apr 17 2018 23:26
Ah ok you know it very well
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:26
wouldn't go thaaaaaaaaaat far :p
but I'm familiar with it
Onur Gumus
@OnurGumus
Apr 17 2018 23:26
I just wrote a tiny tracer for akka
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:26
oh nice!
working well for you?
Onur Gumus
@OnurGumus
Apr 17 2018 23:26
let me share a screenshot
image.png
Basically it collects all ActorRefBase.Tell events
you can go back at that point in time
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:28
that is pretty cool
Onur Gumus
@OnurGumus
Apr 17 2018 23:28
see the state of your application
and the events for the above screenshot were collected without VS, using the standalone collector
I find this incredibly useful when debugging
Aaron Stannard
@Aaronontheweb
Apr 17 2018 23:46
:+1: