These are chat archives for akkadotnet/akka.net

8th
Jul 2016
Arsene Tochemey GANDOTE
@Tochemey
Jul 08 2016 06:25
Hello, and congrats to you guys on releasing Akka.NET 1.1.0. However, I would like someone to educate me a bit on the usability of Akka Streams, since it seems to be the headline feature of the release.
Marc Piechura
@marcpiechura
Jul 08 2016 06:46
@Tochemey have you read the docs http://getakka.net/docs/#akka-streams ?
Ricky Blankenaufulland
@ZoolWay
Jul 08 2016 07:48
Hi! I was hoping for a bit more stability with 1.1.0 and Akka.Cluster. But still I cannot get some common scenarios to work. Maybe I am doing it wrong. So I fell back to try to form a cluster with 3 Lighthouses, each stating the other 2 as seed-nodes.
I understand I get errors when the first joins up and does not find the other two.
But all three continue reporting errors even when all 3 are running
Basically even the last started seed-node says the other two are invalid addresses. So I guess the seed-nodes do not start offering themselves until they have found the other seed-nodes? (looks like a chicken-and-egg problem)
When I have just one lighthouse and a normal member, it can join and with multiple members it works. But I always get errors when members are leaving or when I stop the Lighthouse and start it again - looks like the cluster remains in an invalid state and does not heal itself
Ricky Blankenaufulland
@ZoolWay
Jul 08 2016 08:35
Based on the seed node configuration: http://getakka.net/docs/clustering/cluster-overview#enabling-akka-cluster the seed node should mention itself as a seed-node. But when having multiple seed-nodes, should the seed-nodes option contain only the other seed-nodes, or myself and then the others? Would the order of seed-nodes matter?
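For reference, the JVM Akka cluster documentation (which Akka.NET's clustering is assumed to mirror here) recommends giving every node, seed nodes included, the same `seed-nodes` list in the same order. The first entry is the node that must be running when the cluster is first formed, because it is the one that joins itself. A minimal sketch for three seed nodes, where the system name `MyCluster`, the IP addresses and the port are placeholders rather than values from this chat:

```hocon
akka {
  actor.provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"

  remote.helios.tcp {
    hostname = "10.0.0.1"   # placeholder: this node's own address (differs per node)
    port = 4053
  }

  cluster {
    # Same list, in the same order, on all three seed nodes and on every other member.
    # The first entry is the node that bootstraps the cluster by joining itself.
    seed-nodes = [
      "akka.tcp://MyCluster@10.0.0.1:4053",
      "akka.tcp://MyCluster@10.0.0.2:4053",
      "akka.tcp://MyCluster@10.0.0.3:4053"
    ]
  }
}
```

Under that assumption, only the `hostname` line would change from node to node; the `seed-nodes` list stays identical everywhere.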
Peter Bergman
@peter-bannerflow
Jul 08 2016 09:17
A question about cluster routers (strategy = round-robin-group). I can't get it to round-robin between my routees in a clustered setup. My setup is as follows: Node A starts its actor system in the context of ASP.NET. In there I have an API controller that talks to actor A1; this actor then talks to my router. This router should then round-robin the messages to its routees, which are created on node B. Node B creates its actor system within the context of a Windows service. I have double-checked that all of my routees on node B are actually created (done at service startup). To check which of the routees receives the message sent from the router on node A, I just console-log Self.Path in my Receive handler. For some reason, the same routee gets all the messages and they are not distributed among the routees. Perhaps I am doing something really wrong in my setup... See the config on node A below.
/workerRouter {
  router = round-robin-group # routing strategy
  routees.paths = [
      "/user/workers/worker0",
      "/user/workers/worker1", 
      "/user/workers/worker2",
      "/user/workers/worker3",
      "/user/workers/worker4",
      "/user/workers/worker5",
      "/user/workers/worker6",
      "/user/workers/worker7",
      "/user/workers/worker8",
      "/user/workers/worker9"
   ]
   cluster {
      enabled = on
      #allow-local-routees = on
      use-role = worker
   }
}
Bartosz Sypytkowski
@Horusiath
Jul 08 2016 10:07
@peter-bannerflow shouldn't routees.paths give full paths if the routees are not present on the current machine?
Ricky Blankenaufulland
@ZoolWay
Jul 08 2016 10:11
@peter-bannerflow @Horusiath I don't think so, I am noting them that way too. But I also forbid local routees. And have not checked if he really does round-robin
Is it always the same routee?
Peter Bergman
@peter-bannerflow
Jul 08 2016 10:46
@ZoolWay As far as I can see, it's the same routee every time, although a different routee each time I start the application. I have communication between two other nodes, but through a broadcast-group, and that seems to broadcast to all routees even though I don't specify the full path.
Hmm, never mind the thing about a different routee each time I start the application; it seems to be the same even between restarts.
Peter Bergman
@peter-bannerflow
Jul 08 2016 10:59
Eh now that I think about it, it kinda feels like I should use the full path. I mean, if I have two instances of the worker role up and running, both of them starting worker actors under the same path. How should the router then know where to send the message...
Maciek Misztal
@mmisztal1980
Jul 08 2016 11:01
Hi guys, I've got an Akka.Persistence question, should we choose to migrate from store A to store B in production, how should we proceed with the migration?
Bartosz Sypytkowski
@Horusiath
Jul 08 2016 11:11
@mmisztal1980 in general, write an application that will replay all events from the first store and persist them in the second store
Bartosz Sypytkowski
@Horusiath
Jul 08 2016 11:22
this should be somewhat the code necessary:
foreach(var persistenceId in db.GetAllPersistenceIds()) {
    journalA.Tell(new ReplayMessages(fromSequenceNr: 0L, toSequenceNr: long.MaxValue, max: long.MaxValue, persistenceId: persistenceId, persistentActor: receiver));
}
// inside actor responsible for replaying
Receive<ReplayedMessage>(msg => {
    journalB.Tell(new WriteMessages(new []{  new AtomicWrite(msg.Persistent) }, Self, actorInstanceId: 1));
});
Maciek Misztal
@mmisztal1980
Jul 08 2016 11:36
btw, do we have an Akka.Persistence.ElasticSearch package?
Bartosz Sypytkowski
@Horusiath
Jul 08 2016 11:37
I haven't heard of any
Maciek Misztal
@mmisztal1980
Jul 08 2016 11:38
we're considering using this store, so I might end up coding it ;)
Bartosz Sypytkowski
@Horusiath
Jul 08 2016 11:42
I'm not sure if ElasticSearch is a good fit for an event sourcing model, but I'm no expert on it. Maybe it'll be easier to have a persistence store backed with akka-persistence-query, and use akka-streams to subscribe to incoming events and replay them on ElasticSearch instead?
that way you can maintain shape in the elastic search that fits your read-model, while still having eventsourced write model
Maciek Misztal
@mmisztal1980
Jul 08 2016 11:56
btw, how does Akka.Persistence.Query compare to Even? I haven't seen any docs on it yet?
Bartosz Sypytkowski
@Horusiath
Jul 08 2016 11:57
akka persistence query is basically akka streams on top of akka persistence
so you can build stream workflows where the source of the data is akka.persistence actors
there are not many docs on this on the .NET side (there are some on the JVM however), as this has only arrived lately
Bartosz Sypytkowski
@Horusiath
Jul 08 2016 12:03
the core spec defines 3 types of sources:
  • AllPersistenceIds - which produces stream of persistence ids as they are stored in db.
  • EventsByTags - which produces stream of events marked with specific tags (this is a new feature after 1.1 release)
  • EventsByPersistenceId - which produces a stream of events for a particular persistence id (read: persistent actor)
they can work in two modes:
  • current, which produces only the data present at the moment the source creation method is invoked
  • live, which produces a stream with the existing data and then keeps updating it with new values as they come
Alex Valuyskiy
@alexvaluyskiy
Jul 08 2016 12:10
@Horusiath should I implement some additional functionality in my MySql provider to support EventsByTags?
It doesn't work. But another two Persistence.Query specs are working
Bartosz Sypytkowski
@Horusiath
Jul 08 2016 12:12
@alexvaluyskiy do you have tags column specified? Also you may need custom query for receiving tags
is there any exception attached?
Alex Valuyskiy
@alexvaluyskiy
Jul 08 2016 12:15
false alarm
I just forgot to add event-adapters to the test config
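For context, event adapters are wired up in the journal plugin's HOCON section. A rough sketch, assuming the MySQL plugin is configured under `akka.persistence.journal.mysql`; the adapter class `MyApp.TaggingAdapter` and the event type `MyApp.OrderPlaced` are hypothetical names used only for illustration:

```hocon
akka.persistence.journal.mysql {
  # Register an adapter under a short alias (class name is hypothetical).
  event-adapters {
    tagger = "MyApp.TaggingAdapter, MyApp"
  }

  # Bind event types to that alias (event type is hypothetical).
  event-adapter-bindings {
    "MyApp.OrderPlaced, MyApp" = tagger
  }
}
```

Without a binding along these lines, events would typically be stored without tags, so a query by tag would have nothing to match.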
Alex Valuyskiy
@alexvaluyskiy
Jul 08 2016 12:34
@Horusiath could you review it, please? akkadotnet/Akka.Persistence.MySQL#5
wdspider
@wdspider
Jul 08 2016 13:54

@wdspider
are you using lighthouse?
it blows up right now on localhost

@Aaronontheweb Sorry, work distracted me away yesterday. No, I'm not using Lighthouse as I didn't need all of its extra features. Essentially, I'm using this TopShelf class

    internal class SeedNodeService : ServiceControl
    {
        #region Internal State
        private ActorSystem _actorSystem;
        #endregion

        #region ServiceControl Support
        public bool Start(HostControl hostControl)
        {
            // Start actor system up
            if(_actorSystem == null)
                _actorSystem = ActorSystem.Create("EmbassyActorSystem");

            return true;
        }

        public bool Stop(HostControl hostControl)
        {
            if (_actorSystem != null)
            {
                // TODO: Gracefully leave cluster

                // Shutdown actor system
                _actorSystem.Terminate();
                _actorSystem.WhenTerminated.Wait();

                // Dispose actor system
                _actorSystem.Dispose();
                _actorSystem = null;
            }

            return true;
        }
        #endregion
    }

with this config:

<akka>
  <hocon>
    <![CDATA[
      akka {
        log-config-on-start = on
        loggers = ["Akka.Logger.NLog.NLogLogger, Akka.Logger.NLog"]

        actor {
          provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"

          serializers {
            wire = "Akka.Serialization.WireSerializer, Akka.Serialization.Wire"
          }

          serialization-bindings {
            "System.Object" = wire
          }
        }

        remote {
          helios.tcp {
            transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
            public-hostname = localhost   # this needs to be updated to match deployed to server
            port = 4053
          }
        }

        cluster {
          seed-nodes = ["akka.tcp://EmbassyActorSystem@localhost:4053"]    # this needs to be updated to contain all deployed seed node addresses
          roles = ["SeedNodeV1"]

          role {
            CoreNodeV1.min-nr-of-members = 0
            EnvoyNodeV1.min-nr-of-members = 0
            EPPLinguistNodeV1.min-nr-of-members = 0
          }
        }
      }
    ]]>
  </hocon>
</akka>
troerightside
@troerightside
Jul 08 2016 17:00

I'm having an issue with messages disappearing with .Tell() under moderate to heavy load in a single application.

  • The problem exists in Akka 1.0.8(?) and Akka 1.1
  • All mailboxes are the default mailbox and I'm running in a single process.
  • The loss of messages seems to only occur at one .Tell() in the application.
  • I seem to have something like a 99.5% message send success rate for this single piece of code; and, I've added retry logic around this piece of code that increases its reliability to 99.8%; but, I need even higher than that.

What are common reasons for loss of messages in a single process; and, how do I debug them at this point? The message goes into a .Tell() and never comes out on the Receive<> end.

I usually like to have a minimum, complete example; but, this bit of code is one step in a very long chain across several actors and I don't believe I could reproduce it as simply as creating 3 or 4 actors with a few sleeps.
to11mtm
@to11mtm
Jul 08 2016 17:40
http://getakka.net/docs/concepts/message-delivery-reliability <-- According to this, not a whole lot can go wrong. Without code I can only guess... in our shop issues like that turned out to be either Serialization or the recipient failing on processing and no watching for such behavior in the code.
While it sounds like overkill, maybe it would be a candidate for At Least Once Delivery?
troerightside
@troerightside
Jul 08 2016 17:42
After re-coding it to be AtLeastOnceDelivery (send as many as 10 times with a 10ms delay between each send, until the message is acknowledged), I increased reliability from 99.5% to 99.8%; so, messages that were previously failing to be received suddenly were being received and processed; but, it didn't solve all of the problems. The bigger frustration is that the messages remain unchanged.
I've added debug statements around actor death, as well, and can confirm that the actors aren't throwing exceptions while processing. I attempted to write a custom mailbox handler earlier; but, I ran into trouble actually doing that. I was going to ask about that later.
I'm in the process of attempting to write a minimum, complete code example that will hopefully (or not hopefully?) reproduce the issue.
to11mtm
@to11mtm
Jul 08 2016 17:44
hmmm.... anything indicating severe memory pressure (namely LOH)
troerightside
@troerightside
Jul 08 2016 17:45
LOH?
to11mtm
@to11mtm
Jul 08 2016 17:52
Large Object Heap. Thinking more... I could be totally off point unless you're throwing around huge messages.
https://blogs.msdn.microsoft.com/dotnet/2011/10/03/large-object-heap-improvements-in-net-4-5/ <-- Explains LOH and what I'm talking about better than I can ATM.
troerightside
@troerightside
Jul 08 2016 17:55
I'm throwing around a pointer to a RabbitMQ BasicDeliverEventArgs; but, that should just be a pointer to it, shouldn't it? By default, Akka.NET doesn't serialize every message over and over again, right?
Even still, the messages should be less than 100k. They should be less than 10k, even.
troerightside
@troerightside
Jul 08 2016 18:06
Wouldn't that cause some sort of exception that I could look at, though? How do I get these exceptions to show up?
Bartosz Sypytkowski
@Horusiath
Jul 08 2016 18:19
@troerightside are you logging dead letters?
troerightside
@troerightside
Jul 08 2016 18:19
Yes.
Bartosz Sypytkowski
@Horusiath
Jul 08 2016 18:19
nothing shows up?
troerightside
@troerightside
Jul 08 2016 18:19
At the start of the application, there are a small collection of them, but then never again.
Aaron Stannard
@Aaronontheweb
Jul 08 2016 18:23
that's because the dead letters logging only counts up to the first N messages by default
where N = 10
you can increase the count or leave it unlimited
troerightside
@troerightside
Jul 08 2016 18:23
How?
Aaron Stannard
@Aaronontheweb
Jul 08 2016 18:31
one other thing worth doing
logging unhandled messages
difference between a deadletter and unhandled: deadletter was sent to an actor who died or never existed
unhandled is sent to a live actor who was not programmed to handle that message
looks the same to you as an end-user, but different root causes
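Both kinds of logging mentioned above are switched on from HOCON. A minimal sketch, assuming the standard `akka.log-dead-letters` and `akka.actor.debug.unhandled` settings from the Akka reference configuration; the values chosen here are only an example for a debugging session:

```hocon
akka {
  loglevel = DEBUG                        # unhandled-message logging is emitted at debug level

  log-dead-letters = on                   # on = log every dead letter; a number N logs only the first N; off disables
  log-dead-letters-during-shutdown = on   # also log dead letters produced while the system shuts down

  actor.debug.unhandled = on              # log messages that reach a live actor but match no handler
}
```

Logging every dead letter can be noisy under load, so a larger numeric limit may be a better fit outside of a debugging session.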
troerightside
@troerightside
Jul 08 2016 18:33
I was seeing dead letters at the beginning. The confusing thing to me, though, is that I added some debug code around PostRestart; but, that debug code never triggered. Wouldn't that mean that the issue I'm having in these particular cases isn't a deadletter issue?
I think I might have successfully written a minimum, complete code example that reproduces the issue. The problem, funny enough, is that it reproduces it more often than I see it in production.
Aaron Stannard
@Aaronontheweb
Jul 08 2016 18:37
@ZoolWay light house is fucked up at the moment
I need to fix it
DNS resolution appears to be an issue
Clustering itself is fine
you will not run into the same issues you did before
that's been thoroughly verified with our multi node test suite
in the meantime, disable the code in lighthouse that auto-binds to 127.0.0.1
that's what is causing the problem
and in general
avoid using "localhost"
prefer IPs
you're working with sockets
IPs are what they resolve to anyway
@troerightside by all means, please post it
and do it in an issue
assume that anything in this gitter chat gets deleted by end of day
treat it as ephemeral
issues are persistent
troerightside
@troerightside
Jul 08 2016 18:41
This makes sense.
Aaron Stannard
@Aaronontheweb
Jul 08 2016 18:42
we're going to roll a 1.1.1 out ASAP with some other day 1 patch issues
troerightside
@troerightside
Jul 08 2016 18:42
I'm debating between explaining my code verbosely or not at the moment.
Aaron Stannard
@Aaronontheweb
Jul 08 2016 18:42
do it in stages
sample, error message now
explanation next
you can add a comment
that'll be the difference between whether or not this gets fixed today or next week
troerightside
@troerightside
Jul 08 2016 18:42
The repro is across 4-5 class files :/
Aaron Stannard
@Aaronontheweb
Jul 08 2016 18:42
if it's a bug
and I'm pretty skeptical that it is
as we have users with systems that handle 10,000s of messages per second with strong delivery guarantees
this would have come up before
troerightside
@troerightside
Jul 08 2016 18:43
I'm not sure if it's a bug or not, I just want to know what's going wrong and how I fix it.
Aaron Stannard
@Aaronontheweb
Jul 08 2016 18:45
understood
troerightside
@troerightside
Jul 08 2016 18:55
https://github.com/troerightside/strange-failed-requests/blob/master/ReproducableIssue/Program.cs here is the sample code, well the entry-point to the sample code. The commented out code starting at line 44 seems to always succeed; but, under load, lines 65+, seems to have intermittent failures and dead letters. I don't understand why.
Aaron Stannard
@Aaronontheweb
Jul 08 2016 18:56
oh, it's Ask that's the problem?
troerightside
@troerightside
Jul 08 2016 19:34
No, that was unclear of me, sorry. I was trying to recreate the issue as accurately to our system as I could. In the real version of the code, the problem occurs thusly:
  • I have a debug statement on ReceivingQueueActor, line 30. This line prints.
  • I have a debug statement on SingleRequestActor, line 33. This line does NOT print.
  • I have validated that States[msg.Id] is not null and does represent the intended actor.
troerightside
@troerightside
Jul 08 2016 19:48
I enabled deadletters like you recommended earlier. I'm seeing the failures, still, but no dead letters are being sent (I have a single dead letter earlier in the process, though, so I believe the setting enabled them correctly.)
Aaron Stannard
@Aaronontheweb
Jul 08 2016 19:52
ok, I'll take a look
voltcode
@voltcode
Jul 08 2016 19:59
Has anyone come across a book or larger tutorial about DDD + actor modelling/implementation? I read MS's CQRS book, Evans is next on the list (although I'm a bit afraid it's going to be too theoretical). Any recommendations are highly appreciated.
Aaron Stannard
@Aaronontheweb
Jul 08 2016 20:05
have you read Vaughn Vernon's?
he has lots of good stuff
and is a big Akka / Akka.NET fan
voltcode
@voltcode
Jul 08 2016 20:08
No, I haven't, thanks! Pity it seems that he doesn't update his blog anymore
Aaron Stannard
@Aaronontheweb
Jul 08 2016 20:13
@/all looks like 1.1 introduced a bug with using DNS for inbound connections
still investigating the cause, but it's a confirmed bug now
we'll get a hotfix out for this today
it's not an issue with the actual true, underlying hostname
but rather the public-hostname
does not appear to work as intended
going to determine why shortly
that's a regression we introduced in 1.1... or rather, I introduced :p
binding directly to the IPv6 loopback failed also, which surprises me.
IPv6 support is new in the transport layer as of this release
so that shouldn't affect any existing users, but we use it by default when available now (which our build servers and my development machine seem to have no trouble with)
since we've had some bug reports on Gitter chat for that, wanted to give you all an update