These are chat archives for akkadotnet/akka.net

17th
Jun 2016
Aaron Stannard
@Aaronontheweb
Jun 17 2016 00:00
the NetworkStreamTransport was purpose built for Akka.NET
and is much faster than what we could achieve with Helios 2.0 or DotNetty
since it's not a whole framework
Esun Kim
@veblush
Jun 17 2016 00:00
because of the lack of UDP support in DotNetty, I am using Lidgren.Network
but UDP support is coming to DotNetty, and I want to use it for my project. :smile:
Aaron Stannard
@Aaronontheweb
Jun 17 2016 00:02
not sure when it will happen
but it's in-demand
and personally I'd have fun working on that
UDP is fun
Esun Kim
@veblush
Jun 17 2016 00:03
Lidgren.Network is really good for a client environment, but not performant enough for server scenarios.
btw I hope the MS guys write a QUIC implementation for .NET.
QUIC is promising on mobile networks, where traditional TCP easily stumbles.
Stanley Goldman
@StanleyGoldman
Jun 17 2016 01:53
so i've been playing with an akka cluster, and am able to do some useful things with it
i'm trying to get it to run under mono
and the lighthouse runs under mono, but my worker application fails with
StanleyGoldman @StanleyGoldman is trying to look for the right markdown syntax for gitter
Stanley Goldman
@StanleyGoldman
Jun 17 2016 01:55
it fails with an EndpointWriter AssociationError
the full error is there
In the function SetLocal, client.LocalEndPoint is null
i checked for good measure, so is RemoteEndPoint
what would you suggest I look into next?
mono/mono@884680e
wyldebill
@wyldebill
Jun 17 2016 03:25
i have a question about akka.net, clustering and the removal/convergence of a node that is removed from the cluster. i have the pluralsight sample running - a seed node, some worker nodes, and a client that schedules messages to the cluster router. when i kill a node, and there were only 2 or 3 nodes total, i do not see the system 'converge' on the decision to remove the killed node from the cluster. i do see messages about the node being unreachable, and lighthouse messages about disassociation - but the red error messages keep displaying in the other nodes as well as the lighthouse seed node.
frasermolyneux
@frasermolyneux
Jun 17 2016 07:22
Does anyone have some advice on IIS settings to improve integration with the cluster? I've turned recycling off, but sometimes when I load the website a timeout occurs because the node has to join the cluster
Completely non-Akka issue, I know, but if anyone has some ideas
Garrard Kitchen
@garrardkitchen
Jun 17 2016 07:41
@frasermolyneux Yup, I hit this one too. Spent hours/days looking for a workaround. There is a HOCON setting the timeout refers to, but that didn't help me. I had to write code that effectively Terminate()s the cluster system if it hasn't joined (via the Akka.Cluster.Cluster.Get(ClusterSystem).RegisterOnMemberUp callback) within an arbitrary time (e.g. 30 secs). It loops until it does join (with some thread sleep time too - 1s). It works, but there may be an additional delay before the site comes online. It's a complete hack, but like I say, it works!
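Roughly, the hack described above might look like this (a minimal sketch, not production code: `clusterSystem` is an assumed, already-created `ActorSystem`, the flag has no synchronization, and the 30s/1s values are the arbitrary choices mentioned above):

```csharp
// Sketch of the workaround: terminate the ActorSystem if the node
// hasn't joined the cluster within an arbitrary timeout (~30s).
var joined = false;
Akka.Cluster.Cluster.Get(clusterSystem)
    .RegisterOnMemberUp(() => joined = true);

var deadline = DateTime.UtcNow.AddSeconds(30); // arbitrary join timeout
while (!joined)
{
    if (DateTime.UtcNow > deadline)
    {
        clusterSystem.Terminate().Wait(); // give up and tear down
        break;
    }
    System.Threading.Thread.Sleep(1000); // the 1s sleep mentioned above
}
```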
frasermolyneux
@frasermolyneux
Jun 17 2016 07:44
One hacky thing I thought of was just setting up a cron job to poll the website every 30 seconds or so
Garrard Kitchen
@garrardkitchen
Jun 17 2016 07:51
yes, this is similar to something @Aaronontheweb suggested way back when I was trying to deal with it: put the web servers behind a LB so only healthy end-points are naturally available, while hammering all web servers until they join the cluster. Our solution is LB/IaaS agnostic so we couldn't go this route and had to come up with a coded solution. Major hack and not my proudest moment, but it works.
hidavidpeng
@hidavidpeng
Jun 17 2016 09:12
Hi, can I use Helios.DedicatedThreadPool instead of the CLR's default ThreadPool/Task?
It seems to provide about 4X better performance than the default one :)
the dashboard
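One way to try this (a sketch, not a confirmed recommendation) is Akka.NET's ForkJoinDispatcher, which is backed by a dedicated thread pool and configurable via HOCON; the dispatcher name and thread-count here are illustrative:

```hocon
# Sketch: a custom dispatcher backed by a dedicated thread pool
my-forkjoin-dispatcher {
  type = ForkJoinDispatcher
  throughput = 100
  dedicated-thread-pool {
    thread-count = 4        # illustrative; tune for your workload
    deadlock-timeout = 3s
    threadtype = background
  }
}
```

Actors can then be deployed onto it with `.WithDispatcher("my-forkjoin-dispatcher")`.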
Vagif Abilov
@object
Jun 17 2016 12:06
When using the F# API for persistent actors, I see unhandled messages when I take snapshots. This is because an actor is defined as FunPersistentActor<'Command, 'Event, 'State>, so it can't handle SaveSnapshotSuccess and SaveSnapshotFailure messages. Is there a workaround for this? /cc @Horusiath
Stanley Goldman
@StanleyGoldman
Jun 17 2016 15:40
To continue with my debugging on mono socket connections in helios
class MainClass
{
    public static async Task<bool> ConnectTask(Socket client, IPAddress host, int port)
    {
        var connectTask = Task.Factory.FromAsync(
            (callback, state) => client.BeginConnect(host, port, callback, state),
            client.EndConnect,
            TaskCreationOptions.None);

        return await connectTask.ContinueWith(x =>
        {
            var result = x.IsCompleted && !x.IsFaulted && !x.IsCanceled;
            return result;
        }, TaskContinuationOptions.AttachedToParent | TaskContinuationOptions.ExecuteSynchronously);
    }

    public static void Main(string[] args)
    {
        var ipHostInfo = Dns.Resolve(Dns.GetHostName());
        var ipAddress = ipHostInfo.AddressList[0];
        var localEndPoint = new IPEndPoint(ipAddress, 11000);

        var server = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        server.Bind(localEndPoint);
        server.Listen(1);

        Debug.Assert(server.LocalEndPoint != null, "server.LocalEndPoint != null");

        var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
        {
            ReceiveTimeout = 5,
            SendTimeout = 5,
            ReceiveBufferSize = 30
        };

        ConnectTask(client, ipAddress, 11000).ContinueWith(task =>
        {
            var b = client.LocalEndPoint == null;
            Console.WriteLine($"result: {task.Result}");
            Console.WriteLine($"LocalEndPoint == null: {b}");
        }).Wait();
    }
}
I copied that ConnectTask function out of Helios, into this simple example
if I run it in .NET, I get the following output
result: True
LocalEndPoint == null: False
if I run it in Mono, I get the following output
failed to get 100ns ticks
failed to get 100ns ticks
result: True
LocalEndPoint == null: True
and that in essence is what is preventing me from running my cluster in Mono
maybe there are more
I'm hoping maybe @Aaronontheweb can give me some insight
Aaron Stannard
@Aaronontheweb
Jun 17 2016 17:23
@StanleyGoldman bad news on that front
new version of Helios is already out that replaces all of that
but can't be integrated into Akka.NET just yet, since it's a major version change and has breaking API changeds
Akka.NET 1.1 uses it internally
in other words - upgrading an Akka.NET 1.0.8 project to use Helios 2.1 won't work OOB
but when you upgrade to 1.1 it should work; we're about 1-2 weeks out still on 1.1
Stanley Goldman
@StanleyGoldman
Jun 17 2016 17:30
@Aaronontheweb cool, good to hear you've got it fixed, i can wait 2 weeks
Aaron Stannard
@Aaronontheweb
Jun 17 2016 17:31
glad to hear it
good lord my typos were bad in that response
lol
just re-read it
Stanley Goldman
@StanleyGoldman
Jun 17 2016 17:32
we are programmers, not english majors, noone is juding
;)
fitting that i misspell judging in that statement
wyldebill
@wyldebill
Jun 17 2016 18:53
i have a question about akka.net clustering and its behavior. posted it late last night, but did not get any responses. if i have nodes in a cluster and i kill one of them - gossip should figure it out, determine the node is unreachable, and eventually quit trying to use it, right? the gossip works fine when a new node joins the cluster, but there are lots of red errors/warnings when i drop a node from the cluster.
Aaron Stannard
@Aaronontheweb
Jun 17 2016 18:54
a cluster will perpetually try to reconnect to an unreachable node
until you mark the unreachable node as down
Bartosz Sypytkowski
@Horusiath
Jun 17 2016 18:55
@wyldebill there are different scenarios of downing unreachable nodes - the easiest (and most risky) is to set akka.cluster.auto-down-unreachable-after to some time value
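For reference, that setting (risky, for the reasons discussed) looks like this in HOCON; the 30s value is just an example:

```hocon
akka.cluster {
  # easiest but riskiest downing strategy: automatically mark
  # unreachable nodes as down after the given interval
  auto-down-unreachable-after = 30s
}
```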
wyldebill
@wyldebill
Jun 17 2016 19:05
thank you. i'll try that for starters
Aaron Stannard
@Aaronontheweb
Jun 17 2016 19:06
the risk with autodown, as @Horusiath points out
is that if there's any network instability at all
at any point in the course of running your cluster
you will end up with lots of small clusters
that can never talk to each other
and you will inevitably have data corruption and data loss as a result
the reason why the cluster doesn't automatically evict unreachable nodes by default
is because that's the safest behavior
it waits for the end-user's software to issue a Cluster.Down command for unreachable nodes that are truly dead
100% of Dynamo style clusters work this way by default
Apache Cassandra, Riak, etc...
as an alternative - you can try using something like https://github.com/cgstevens/Akka.Cluster.Monitor
where a human being goes in and presses a button to down a node
my preferred solution for this - and this should be made easier when we add a pluggable "downing strategy" to Akka.Cluster
is to have some metadata API I can contact
like the EC2 API on AWS
or ARM on Windows Azure
use that to query the status of an unreachable node
and if that API indicates that the node has been terminated
then down it
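That strategy might be sketched roughly like this; `CloudApi.IsTerminated` is a hypothetical stand-in for the EC2/ARM metadata call, not a real SDK method:

```csharp
// Sketch: down unreachable nodes only after a (hypothetical) cloud
// metadata API confirms the underlying machine is really gone.
var cluster = Akka.Cluster.Cluster.Get(system);
foreach (var member in cluster.State.Unreachable)
{
    if (CloudApi.IsTerminated(member.Address)) // hypothetical call
    {
        cluster.Down(member.Address); // safe to evict: node is truly dead
    }
    // otherwise: notify a human or wait for the partition to heal
}
```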
Aaron Stannard
@Aaronontheweb
Jun 17 2016 19:11
if I can't contact the API or if it tells me that the node is still online
then notify a human or wait for the partition to resolve
one of the tradeoffs you make for high availability is you can't treat the cluster the same way you would a load balancer
since the health of the cluster is determined by the ability of its members to cast votes in favor of adding / removing members
when one of the members disappears without saying it was leaving first
that puts the cluster into a state where it can't unanimously vote anymore
so that has to factor into your plan for how you operate your nodes in a cluster; this will be a lot easier when 1.1 comes out
in 1.1 you'll be able to call Cluster.OnMemberRemoved - which fires after you've called Cluster.Leave and the node has been successfully removed from the cluster
then you can call MyActorSystemObject.Terminate and cleanly shutdown
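A sketch of that shutdown sequence, assuming the hook Aaron mentions is registered via a `RegisterOnMemberRemoved`-style callback (names here are illustrative):

```csharp
// Sketch: leave the cluster cleanly, then terminate only after the
// rest of the cluster has confirmed this node's removal.
var cluster = Akka.Cluster.Cluster.Get(myActorSystem);
cluster.RegisterOnMemberRemoved(() =>
{
    // fires after this node has been removed from cluster membership
    myActorSystem.Terminate();
});
cluster.Leave(cluster.SelfAddress); // announce departure instead of vanishing
```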
Aaron Stannard
@Aaronontheweb
Jun 17 2016 19:17
does that make sense? Didn't mean to post that wall-o-text up there
but wanted to make sure you had the data
qwoz
@qwoz
Jun 17 2016 19:29
@Aaronontheweb you need someone who follows you around and notes down all this insight and documents it into a wiki article :smile:
Aaron Stannard
@Aaronontheweb
Jun 17 2016 19:29
@qwoz lol
qwoz
@qwoz
Jun 17 2016 19:30
maybe Akka.NET 2.0 will have a BrainToWikiActor implementation
Aaron Stannard
@Aaronontheweb
Jun 17 2016 19:30
if it makes you feel any better
after 1.1 ships I'm going to shift gears to documentation and training materials
combination of written docs, youtube videos, and some petabridge blog posts
since this stuff is hard
and documentation takes time
qwoz
@qwoz
Jun 17 2016 19:32
yep, it's the same at my work. I sometimes get into a documentation phase where I write tons. Other times, it's sorely lacking. Now if only code either documented itself, or you could turn documents into code...
wyldebill
@wyldebill
Jun 17 2016 20:01
@Aaronontheweb yes, thanks. that's a lot to take in. still trying to get my head around how this works with actors and clustering. how does the heartbeat health check figure into that then? i had thought that after enough heartbeats return as dead, it would automagically remove it from the cluster.
qwoz
@qwoz
Jun 17 2016 20:27
@wyldebill this is a classic "network partition" problem. Imagine a cluster of only two machines separated by a network that can, on occasion, have an interruption. When the network goes down, enough heartbeats elapse and member A thinks that member B is dead, so A removes B from the cluster and carries on. Meanwhile, member B thinks that member A is dead, so B removes A from the cluster and carries on. Now you have both members A and B thinking that they are the authoritative cluster. Member A takes over member B's responsibilities. Similarly, member B takes over member A's responsibilities. Then the network resumes again, but in the meantime A and B have done redundant work (maybe processing duplicate financial trades) and you have a huge problem to undo.
Without a way to authoritatively identify "yes, I know this machine went offline purposefully" it's generally never a good idea to merely assume the other members have died and should be removed from the cluster.
Aaron Stannard
@Aaronontheweb
Jun 17 2016 20:39
that's a great explanation
Marc Piechura
@marcpiechura
Jun 17 2016 20:39
@qwoz have you resolved your issue with the TestScheduler?
qwoz
@qwoz
Jun 17 2016 20:41
TestScheduler still seems to be an issue, requiring a Timeout in the Expect call when using a scheduled tell.
Marc Piechura
@marcpiechura
Jun 17 2016 20:43
Yeah I responded to your issue that's the reason why I'm asking :)
qwoz
@qwoz
Jun 17 2016 20:45
thanks for the heads-up... I should have some better alerting setup for that stuff!
(turns out I had the wrong email address as a default in github... fixed)
Maciek Misztal
@mmisztal1980
Jun 17 2016 21:02
@qwoz @Aaronontheweb my current employer has graced me with some unwanted overtime, so I've been less active lately; however, I'm working on a blog post describing how to create a SF seed node ;)
@peter-bannerflow In regards to service discovery, for the time being I believe there are 2 options worth pursuing: 1. the SF Naming Service, which exposes a REST api returning cluster information, and 2. an Akka-based pub-sub solution
Aaron Stannard
@Aaronontheweb
Jun 17 2016 21:05
@mmisztal1980 awesome man, can't wait to read it
I'm looking into using SF myself for a project with Akka.Cluster once I ship 1.1
have to overhaul Petabridge's ecommerce system for a ton of reasons. Decided to dogfood Akka.Cluster on it for some enterprise integration stuff :p
Maciek Misztal
@mmisztal1980
Jun 17 2016 21:07
@Aaronontheweb I've got an upcoming week of vacation, so chances are I'll get this done soon (tm)
Aaron Stannard
@Aaronontheweb
Jun 17 2016 21:09
no worries man
my current status:
"how to debug a multi-node test, step 1"
I'm in no rush :p
Maciek Misztal
@mmisztal1980
Jun 17 2016 21:12
yeaaaah, doesn't look like you're bored :D
Aaron Stannard
@Aaronontheweb
Jun 17 2016 21:14
this is actually kind of fun
it's a puzzle
but it takes a while
qwoz
@qwoz
Jun 17 2016 21:38
@mmisztal1980 awesome... in that case, I think I'll spare myself a weekend of banging my head at Service Fabric and wait for your blog post.
Aaron Stannard
@Aaronontheweb
Jun 17 2016 21:41
lol.... that feeling when you're utterly confused as to how a test could possibly fail on that line
then you realize you're looking at the wrong unit test
Maciek Misztal
@mmisztal1980
Jun 17 2016 21:42
lol, yeah - we've recently been searching for an elusive bug, turned out we were hunting in the wrong environment :D
Aaron Stannard
@Aaronontheweb
Jun 17 2016 22:12
@alexvaluyskiy think I just fixed the UID issue
totally caused by me mistranslating Scala pattern matching
Aaron Stannard
@Aaronontheweb
Jun 17 2016 22:19
re-running the entire MNTR test suite locally first
to see if I can reproduce the failures that happened before
Alex Valuyskiy
@alexvaluyskiy
Jun 17 2016 22:30
I still don't have any idea why Scheduler.Schedule and ScheduleOnce were marked as obsolete
qwoz
@qwoz
Jun 17 2016 22:38

In my tests, I create a pool from config using .WithRouter(FromConfig.Instance), which works well. However, if I specify .WithDispatcher(CallingThreadDispatcher.Id) an exception is thrown:

SetUp : Akka.Configuration.ConfigurationException : Dispatcher [akka.test.calling-thread-dispatcher] not configured for routees of path akka://mysystem/user/mypool

Is there a way to do this?

qwoz
@qwoz
Jun 17 2016 22:44
FromConfig.Instance appears to just be a static string akka.test.calling-thread-dispatcher. So it looks like I'm going to need to specify that as a dispatcher in HOCON for all actors.

Is there an equivalent to:

              deployment {
                / {
                  dispatcher = akka.test.calling-thread-dispatcher
                }

for specifying the dispatcher for all user actors that won't crash when the actor system is created? Or do I need to manually specify all top-level actors in the config?

qwoz
@qwoz
Jun 17 2016 22:53
answering my own question, this appears to just be:
akka {
   actor {
      dispatcher = akka.test.calling-thread-dispatcher