Thomas Lazar
@thomaslazar
via Reflection.Emit
Dave Sansum
@dave-sansum

Any advice on the below would be much appreciated.

I'm currently using a child-per-entity model, and after getting this running locally I'm starting to look into the remoting/clustering elements. It seems clustering is really geared towards actors that are functional rather than entity based, and I'm struggling to find any documentation on dynamic systems. What I'm looking for is location transparency: if entity A lives on node A and node A fails, the entity can be brought up seamlessly on node B. It seems cluster sharding is the right (only) thing for this, but it doesn't seem that mature at the moment and it depends on Akka.Persistence, which I don't currently require.

Bartosz Sypytkowski
@Horusiath
@dave-sansum in your case cluster sharding is the way to go, and unfortunately, atm persistence is required in order to work with it
since you need to reliably recover the shard state between nodes in case of crashes or failures
Dave Sansum
@dave-sansum
thanks @Horusiath
Pablo Castilla
@pablocastilla
How about cluster singleton?
Dave Sansum
@dave-sansum
@pablocastilla have you used that yourself? / do you know what the maturity of it is?
Pablo Castilla
@pablocastilla
No, never tried. I only know that it is slower. @Aaronontheweb maybe knows more
Alex Valuyskiy
@alexvaluyskiy
@Aaronontheweb you fixed a persistence default config in 1.0.8. But it seems Cluster Singleton also doesn't have a default config
Kris Schepers
@schepersk
Hmm, anyone else noticing this: When a ClusterClientReceptionist is started on every node of a role (running locally on 1 dev machine), those nodes consume all CPU power.
When you run a single node, everything is fine..
Christian Duhard
@cduhard
has anyone ever said that distributed systems are kinda hard? ;)
alexgoeman
@alexgoeman
Hi guys I have a question related to remoting.
Main question is actually whether remoting should be resilient/robust against temporary network issues (network partitioning, host not responding, not receiving any deathwatch heartbeat responses...).
To be more specific, is it acceptable that an ActorSystem can become quarantined because of a temporary network issue?
I see no issue with heartbeat systems that try to detect issues with the network and drop messages because of detected network issues, but I find it problematic that a system gets quarantined because there were some temporary network issues. I find this problematic because in Akka this means that the quarantined system needs to restart!
I find this not "Reactive", since no recovery is possible (except the really dramatic recovery of restarting the ActorSystem, which in a server application is perhaps not possible).
We have an application in production (a lot of clients connecting to one server) that uses remoting, and because of network errors a client marks the remote server system as quarantined.
Which means that the client will not be able to connect until the server restarts/recycles (or at least restarts its ActorSystem, which is not really feasible/desirable).
I have no problem that a "quarantined" state exists, but I have a problem that something can get quarantined because of (temporary) network errors or because the deathwatch heartbeat responses are not received. A system should not get corrupted by such errors and so should not get quarantined.
What do you guys think about this? Is this a bug that needs to be fixed (I do not mean that quarantining is a bug, but that getting quarantined because of temporary network issues is a possible bug)?
Am I looking at this in the wrong way?
What are the options to handle this (network errors are not that rare a condition)?
My current solution is to set the parameter prune-quarantine-marker-after = 0 s (which is not recommended in the docs!).
I also tried increasing some of the other heartbeat parameters (acceptable-heartbeat-pause in the transport-failure-detector and the watch-failure-detector), but the effect was rather that the system would not recover at all.
If I'm not using the death watch monitor, the system can recover (meaning that after being gated it tries to associate/connect again). But with death watch enabled (by watching an actor), some interaction suddenly makes it unable to reassociate (this seems to be a bug); it does not even try, which results in the death watch heartbeats getting dropped until the pause threshold parameter value is reached, which in turn triggers the quarantining.
Version info: using akka.net 1.0.6.16 (but I also did a test with the version in the dev git branch at the beginning of this week)
Kind regards,
Alex
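For reference, the settings named in the message above sit under akka.remote in HOCON. The following is only a sketch: the key names are the ones quoted in the message, the 0 s value is the workaround described there, and the 30 s pauses are illustrative assumptions rather than defaults or recommendations for any particular Akka.NET release.

```hocon
akka.remote {
  # Workaround from the message above; the docs advise against it.
  prune-quarantine-marker-after = 0 s

  transport-failure-detector {
    # Illustrative value (assumption), not a default or a recommendation.
    acceptable-heartbeat-pause = 30 s
  }

  watch-failure-detector {
    # Illustrative value (assumption); this is the death watch
    # heartbeat pause that, once exceeded, leads to quarantining.
    acceptable-heartbeat-pause = 30 s
  }
}
```

As reported in the message, raising the pauses did not by itself let the system recover, so treat this as a map of where the knobs are rather than a fix.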
Aaron Stannard
@Aaronontheweb
@alexgoeman 1.0.8, which came out yesterday, fixes some known endpoint management issues related to that
but there are also issues with Helios at startup that I'm working on fixing right now
I won't go into detail on them now because I'm not finished with them yet, but Helios has some race conditions on startup that can cause this
@alexvaluyskiy I'm not involved with Akka.Persistence and Akka.Cluster.Sharding much, but it sounds like you and @Horusiath need to come up with a release strategy that maintains configuration integrity between releases
since that's been a persistent issue (no pun intended) across more than one release of those
default configurations should always have explicit, easily understandable regression tests
if you don't have one, that's the easiest place to create a breaking change by accident
and compared to most of the test suite, they're 100x easier tests to write than virtually anything else
I'd be happy to help, but I'm operating with very limited bandwidth. I'm pretty focused on getting Akka.Cluster and its dependencies out of beta
alexgoeman
@alexgoeman
@Aaronontheweb: Do you then agree in principle that death watch failure should not trigger quarantining? (PS: I also tested with 1.0.8, using the latest version I could get via GitHub, and still had recovery issues. Do you mean that more changes were made yesterday, or that those changes were not yet available in git?)
Aaron Stannard
@Aaronontheweb
"you then agree in principle that death watch failure should not trigger quarantining?"
I 100% do not agree with that
totally depends on when it happens
if it happens during startup, if the node you're connecting to can't complete the handshake for whatever reason
quarantining is the right thing to do
as I said, there are issues down the stack I'm working on right now
that I believe are responsible for this
check back with me later - there were no additional changes made yesterday other than those published. You can easily check that by taking a look at number of commits since release on Github
alexgoeman
@alexgoeman
@Aaronontheweb: So because a handshake procedure cannot be completed, why do you assume corruption? You can clean up any resources linked to the connection and just retry later.
Aaron Stannard
@Aaronontheweb
search the codebase for HopelessAssocationException and read the source
if you want an explanation
quarantines only happen as a result of repeat failures typically - I don't remember the entire flow for it offhand
by default Akka.Remote will gate a connection temporarily during an unplanned failure
in order to give the other side time to recover
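The length of that temporary gate is configurable. A minimal sketch, assuming the retry-gate-closed-for key from the Akka.Remote reference configuration is present in the version in use; the value shown is illustrative, not a recommendation.

```hocon
akka.remote {
  # How long a gated connection stays closed before Akka.Remote
  # attempts to associate again. Illustrative value (assumption).
  retry-gate-closed-for = 5 s
}
```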
alexgoeman
@alexgoeman
@Aaronontheweb: In the network configuration file I found some documentation saying that deathwatch triggers quarantining (I tested this). And not after a few deathwatch failures, but immediately when a configurable pause in heartbeat responses is exceeded. So by just disconnecting the network for long enough, the other system will get quarantined. There is no corruption then, but when the connection can be established again, one system refuses to connect to the other just because it is quarantined. I have no problem with the gating, since it avoids unnecessary communication, and once communication is possible again and the gating is over, reconnecting succeeds. Which is a good thing. But the deathwatch system marking the other system as quarantined makes recovery impossible.
alexgoeman
@alexgoeman
@Aaronontheweb: I tried looking for "HopelessAssocationException" (just downloaded the zip file and opened it in Visual Studio), but could not find that class/string in the project. Am I looking in the wrong place?
Marc Piechura
@marcpiechura
@alexgoeman search without "Exception"
Corneliu
@corneliutusnea
Guys, any news on this issue: akkadotnet/akka.net#1700? It seems to affect every cluster that disconnects.
Pablo Castilla
@pablocastilla
I want to start developing a big headend system for an electric utility using akka.net. Are clustering, remoting and persistence production ready? (We would develop an Oracle persistence plugin.)
Do you have good experiences with actor-per-entity in the IoT world? Thanks for helping
Bartosz Sypytkowski
@Horusiath
@pablocastilla there are production users of akka clusters. When it comes to persistence, it's pretty solid right now, but the SQL-based plugins are still changing, and some more changes will probably happen after Akka.Persistence.Query and Akka.Streams come out. In that case a migration path is usually described for them.
the rest of the persistence plugins usually have far less aggressive changes
Corneliu
@corneliutusnea
@pablocastilla I'm testing Akka Clustering 1.0.7 beta and I'm hitting akkadotnet/akka.net#1700 and it's killing me :( just worth knowing
Bartosz Sypytkowski
@Horusiath
@corneliutusnea personally I've never used routers in any scenario
Corneliu
@corneliutusnea
I'm not using any router, the cluster just does not come back alive