Maxim Cherednik
Cool. Yet I still wonder about the cluster. I wanted to try an empty cluster without any real actors in it. Very simple setup - it was working more or less locally (when all the nodes are on the same machine). Then you clarified that the port should be the same and, of course, that auto-down should be off. I decided to roll out several virtual machines in Azure to try it. It doesn't work, no matter what I do. If I restart a node hard (as if it died), the seed node starts acting up with lots of exceptions.
So basically the cluster formed, but I didn't manage to try the actual failover cases...
Aaron Stannard
If I restart a node hard (as if it died), the seed node starts acting up with lots of exceptions.
the stuff I'm working on now should address that
the socket server sitting underneath Akka.Cluster has a bunch of fun issues that had never been properly classified until recently
race conditions at startup
spent the past two weeks working on an update to that
on top of that, latest release today includes a patch I made to the EndpointRegistry in Akka.Remote
which also caused issues that occurred on node restart
I'm working those issues from the bottom up
dealing with the socket server first, then Akka.Remote's endpoint system, and then finally dealing with things like the cluster daemon
Vladyslav Pyshnenko
Hi @Aaronontheweb, we're using Akka.Cluster in our project, hosted on an Azure Cloud Service, and very often one of the nodes on a worker role doesn't start after a deploy or reboot. In the logs we see the following error: "Failed to startup Cluster. You can try to increase 'akka.actor.creation-timeout'." Is this issue related to everything you described before?
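For reference, the setting named in that error is adjusted via HOCON like this (the value here is illustrative; whether raising it actually helps depends on why the system actors are slow to start):

```hocon
akka {
  actor {
    # how long actor creation may take before the system gives up
    # (raising this is a workaround, not a fix for the underlying slowness)
    creation-timeout = 60s
  }
}
```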
Aaron Stannard
I've seen that issue before - that's unrelated
how many cores are you running those nodes on?
Vladyslav Pyshnenko
Aaron Stannard
ok, you should be good there
is where that issue occurs
the network isn't even a factor at that point
Vladyslav Pyshnenko
yeah, and after that the node is shutting down
Aaron Stannard
might be something off with the sequencing at startup there, but basically one of the system actors failed to start on time
I would start by looking at the system actors there and see if there's a race condition - it's going to be easier to spot than a traditional one, because everything is happening inside actors here. Coverage is probably missing in some edge case where resource A gets a request before it gets something it needs from resource B
and rather than buffering the request / poking resource B, it waits indefinitely and times out
I personally don't have time to look into that now (I am but one man) - but if you file a bug I'll get on it
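The buffering approach described above ("buffering the request / poking resource B") can be sketched with Akka.NET's stash. All names here are illustrative placeholders, not the actual system actors involved:

```csharp
using Akka.Actor;

// Illustrative messages - not the real internal messages involved.
public sealed class Request { }
public sealed class NeedInit { }
public sealed class InitFromB { }   // the thing ResourceA needs from ResourceB

// Instead of waiting indefinitely when a Request arrives too early,
// stash it and replay it once ResourceB has delivered what we need.
public sealed class ResourceA : ReceiveActor, IWithUnboundedStash
{
    public IStash Stash { get; set; }

    public ResourceA(IActorRef resourceB)
    {
        resourceB.Tell(new NeedInit());        // poke resource B up front

        Receive<Request>(_ => Stash.Stash());  // too early: buffer the request
        Receive<InitFromB>(_ =>
        {
            Stash.UnstashAll();                // replay the buffered requests
            Become(Ready);
        });
    }

    private void Ready()
    {
        Receive<Request>(_ => Sender.Tell("handled"));
    }
}
```

With this shape, the race Aaron describes (a timeout because resource A waits forever) becomes a non-issue: early requests are parked rather than dropped.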
Vladyslav Pyshnenko
ok, I will create the bug tomorrow
Thomas Lazar
anyone here know their way around IL generation stuff? any experience?
via Reflection.Emit
Dave Sansum

Any advice on the below would be much appreciated.

I'm currently using a child-per-entity model, and after getting this running locally I'm starting to look into the remoting/clustering elements. It seems the clustering is really geared towards actors that are functional rather than entity-based, and I'm struggling to find any documentation on dynamic systems. What I'm looking for is location transparency: if entity A lives on node A, and node A fails, the entity can be brought up seamlessly on node B. It seems cluster sharding is the right (only) thing for this, but it doesn't seem that mature at the moment, and it depends on Akka.Persistence, which I don't currently require?

Bartosz Sypytkowski
@dave-sansum in your case cluster sharding is the way to go, and unfortunately, at the moment persistence is required in order to work with it
since you need to reliably recover the shard state between nodes in case of crashes or failures
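A rough sketch of what starting a sharded entity region looks like with Akka.Cluster.Sharding (names like `EntityActor` and the envelope type are placeholders, and the Akka.Persistence journal behind it still has to be configured separately):

```csharp
using Akka.Actor;
using Akka.Cluster.Sharding;

// Placeholder envelope carrying an entity id plus the actual payload.
public sealed class EntityEnvelope
{
    public EntityEnvelope(long entityId, object payload)
    {
        EntityId = entityId;
        Payload = payload;
    }
    public long EntityId { get; }
    public object Payload { get; }
}

// Routes each envelope to a shard/entity based on a hash of its id.
public sealed class MyMessageExtractor : HashCodeMessageExtractor
{
    public MyMessageExtractor() : base(maxNumberOfShards: 100) { }

    public override string EntityId(object message) =>
        (message as EntityEnvelope)?.EntityId.ToString();

    public override object EntityMessage(object message) =>
        (message as EntityEnvelope)?.Payload;
}

public static class ShardingSetup
{
    // Run on every node that should host entities; returns the region
    // you Tell() envelopes to, regardless of where the entity lives.
    public static IActorRef StartRegion(ActorSystem system) =>
        ClusterSharding.Get(system).Start(
            typeName: "entities",
            entityProps: Props.Create<EntityActor>(), // your per-entity actor
            settings: ClusterShardingSettings.Create(system),
            messageExtractor: new MyMessageExtractor());
}
```

Messages are then sent as `region.Tell(new EntityEnvelope(42, someCommand))`; the region resolves which node currently hosts entity 42, which is exactly the location transparency being asked about.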
Dave Sansum
thanks @Horusiath
Pablo Castilla
How about cluster singleton?
Dave Sansum
@pablocastilla have you used that yourself? / do you know what the maturity of it is?
Pablo Castilla
No, never tried. I only know that it is slower. @Aaronontheweb maybe knows more
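For anyone comparing the two options: a cluster singleton is set up roughly like this with Akka.Cluster.Tools (a sketch; `MySingleton` is a placeholder, and the "slower" aspect is that all traffic funnels through a proxy to the one node currently hosting the instance):

```csharp
using Akka.Actor;
using Akka.Cluster.Tools.Singleton;

// Placeholder for the actor you want exactly one of in the cluster.
public sealed class MySingleton : ReceiveActor { }

public static class SingletonSetup
{
    // Run on every node that may host the singleton; the cluster elects
    // which node actually runs the instance (hand-off on node failure).
    public static void StartManager(ActorSystem system) =>
        system.ActorOf(ClusterSingletonManager.Props(
            singletonProps: Props.Create<MySingleton>(),
            terminationMessage: PoisonPill.Instance,
            settings: ClusterSingletonManagerSettings.Create(system)),
            name: "my-singleton");

    // Run on any node that wants to talk to the singleton, wherever it lives.
    public static IActorRef StartProxy(ActorSystem system) =>
        system.ActorOf(ClusterSingletonProxy.Props(
            singletonManagerPath: "/user/my-singleton",
            settings: ClusterSingletonProxySettings.Create(system)),
            name: "my-singleton-proxy");
}
```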
Alex Valuyskiy
@Aaronontheweb you fixed a persistence default config in 1.0.8. But it seems Cluster Singleton also doesn't have a default config
Kris Schepers
Hmm, is anyone else noticing this: when a ClusterClientReceptionist is started on every node of a role (running locally on 1 dev machine), those nodes consume all CPU power.
When you run a single node, everything is fine...
Christian Duhard
has anyone ever said that distributed systems are kinda hard? ;)
Hi guys, I have a question related to remoting.
The main question is whether remoting should be resilient/robust against temporary network issues (network partitioning, a host not responding, not receiving any deathwatch heartbeat responses...).
To be more specific, is it acceptable that an ActorSystem can become quarantined because of a temporary network issue?
I see no issue with heartbeat systems that try to detect network problems and drop messages because of them, but I find it problematic that a system gets quarantined because of some temporary network issue. I find this problematic because in Akka it means that the quarantined system needs to restart!
This strikes me as not "Reactive", since no recovery is possible (except the drastic recovery of restarting the actor system, which in a server application is perhaps not possible).
We have an application in production (a lot of clients connecting to one server) that uses remoting, and because of network errors a client marks the remote server system as quarantined.
Which means that that client will not be able to connect until the server restarts/recycles (or at least restarts its actor system, which is not really feasible/desirable).
I have no problem with a "quarantined" state existing, but I do have a problem with something getting quarantined because of (temporary) network errors, or because the deathwatch heartbeat responses are not received. A system should not get corrupted by such errors, and as such should not get quarantined.
What do you guys think about this? Is this a bug that needs to be fixed? (I don't mean that quarantining itself is a bug, but that getting quarantined because of temporary network issues is a possible bug.)
Am I looking at this in the wrong way?
What are the options for handling this (network errors are not that rare a condition)?
My current solution is to set the parameter prune-quarantine-marker-after = 0 s (which is not recommended in the docs!).
I also tried increasing some of the other heartbeat parameters (acceptable-heartbeat-pause in the transport-failure-detector and the watch-failure-detector), but that had more the effect that the system would not recover at all.
If I'm not using the deathwatch monitor, the system can recover (after being gated, it tries to associate/connect again), but with deathwatch enabled (by watching an actor) there is suddenly some interaction that makes it unable to even attempt to re-associate (seems to be a bug), which results in the deathwatch heartbeats getting dropped until the pause threshold is reached, which in turn triggers the quarantining.
Version info: using akka.net (but I also did a test with the version in the dev git branch at the beginning of this week)
Kind regards,
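For anyone reading along, these are the HOCON settings being discussed (the pruning value is the workaround from the message above, which the docs explicitly warn against; the pause values are illustrative, so check the defaults for your version):

```hocon
akka.remote {
  # forget quarantine markers immediately - NOT recommended by the docs
  prune-quarantine-marker-after = 0 s

  # tolerate longer heartbeat gaps before declaring the transport dead
  transport-failure-detector {
    acceptable-heartbeat-pause = 30 s   # illustrative value
  }

  # same idea, but for deathwatch heartbeats between watched actors
  watch-failure-detector {
    acceptable-heartbeat-pause = 20 s   # illustrative value
  }
}
```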
Aaron Stannard
@alexgoeman 1.0.8, which came out yesterday, fixes some known endpoint management issues related to that
but there are also issues with Helios at startup that I'm working on fixing right now
I won't go into detail on them now because I'm not finished with them yet, but Helios has some race conditions on startup that can cause this
@alexvaluyskiy I'm not involved with Akka.Persistence and Akka.Cluster.Sharding much, but it sounds like you and @Horusiath need to come up with a release strategy that maintains configuration integrity between releases
since that's been a persistent issue (no pun intended) across more than one release of those
default configurations should always have explicit, easily understandable regression tests
if you don't have one, that's the easiest place to create a breaking change by accident
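A regression test along those lines might look like this (a sketch using xUnit; `ClusterSingletonManager.DefaultConfig()` here stands in for whatever hook a module exposes to ship its embedded fallback HOCON):

```csharp
using Akka.Cluster.Tools.Singleton;
using Xunit;

public class DefaultConfigSpec
{
    [Fact]
    public void ClusterSingleton_must_ship_a_default_config()
    {
        // If the module forgets to embed its reference HOCON, this fails
        // loudly in CI instead of surfacing at a user's runtime.
        var config = ClusterSingletonManager.DefaultConfig();
        Assert.False(config.GetConfig("akka.cluster.singleton").IsEmpty);
    }
}
```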