These are chat archives for akkadotnet/akka.net

22nd Jun 2018
Vagif Abilov
@object
Jun 22 2018 09:29
We have installed multiple nodes of a service that uses persistent actors (and cluster sharding), and now we see millions of log entries: "Resolve of path sequence [/"system/sharding/<some_path>] failed".
Millions really: approximately 1 million such entries every hour.
The cluster works and messages get handled. We frequently passivate our aggregate root actors after a message is handled, and that's when it happens. In another installation we don't passivate actors often, and there we haven't seen a single such error.
We have "remember-entities = on" setting in our Akka hocon file.
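For context, the setting in question lives under the sharding section of the HOCON config; a minimal sketch (only `remember-entities = on` is confirmed from the conversation, the surrounding structure is just the standard Akka.NET layout):

```hocon
akka.cluster.sharding {
  # entities are automatically re-created after a shard rebalance
  # or node restart, instead of on the next message
  remember-entities = on
}
```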
Bartosz Sypytkowski
@Horusiath
Jun 22 2018 09:46
@object have you initialized shard region on that system?
Plus, are you sure you're not trying to refer to a sharded actor using actor refs/actor paths? (For sharded actors the actor ref can change over time.)
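For reference, sharded entities are normally reached through the shard region rather than through a cached `IActorRef` or an actor path, since the entity can be passivated and re-created on another node. A minimal sketch of the usual wiring (the `AggregateActor` and `MyMessageExtractor` types, the `"aggregate"` type name, and the message routing are illustrative placeholders, not the users' actual code):

```csharp
using Akka.Actor;
using Akka.Cluster.Sharding;

// Start the shard region once per node at startup; this must happen
// before any messages are sent to sharded entities on that node.
IActorRef region = ClusterSharding.Get(system).Start(
    typeName: "aggregate",
    entityProps: Props.Create(() => new AggregateActor()),
    settings: ClusterShardingSettings.Create(system),
    messageExtractor: new MyMessageExtractor());

// Always route messages through the region: it locates the entity
// even after the actor has been passivated and re-created elsewhere.
region.Tell(command);
```

The key point Bartosz is making is that a plain `ActorSelection` against a `/system/sharding/...` path bypasses this resolution and breaks once the entity moves or is passivated.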
Vagif Abilov
@object
Jun 22 2018 09:59
I don't think we are trying to refer to them by path, but I will check. I guess the shard region is initialized; how else would things work?
Vagif Abilov
@object
Jun 22 2018 10:21
And judging from the logs, these entries are not written as part of a call from our code. We can have periods of inactivity with no messages sent to the system, yet the logs are still flooded with "resolve of path failed" entries.
During the night, with relatively low business activity, these messages were written at a rate of 300 entries per second.
Vagif Abilov
@object
Jun 22 2018 10:30
But we might have a place where we passivate actors and then some kind of callback tries to reach them, in a perpetual loop since it's failing.
I'll investigate it further, thanks for the tip
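For reference, the passivation being discussed usually looks like the following inside the sharded entity: the actor asks its parent shard to stop it, so that messages arriving during shutdown are buffered by the shard rather than dead-lettered. A minimal sketch (the actor name and the two-minute timeout are illustrative, not taken from the users' code):

```csharp
using System;
using Akka.Actor;
using Akka.Cluster.Sharding;

public class AggregateActor : ReceiveActor
{
    public AggregateActor()
    {
        // Passivate after two minutes of inactivity (interval illustrative).
        Context.SetReceiveTimeout(TimeSpan.FromMinutes(2));

        Receive<ReceiveTimeout>(_ =>
            // Ask the parent shard to stop this entity gracefully;
            // the shard buffers any messages that arrive meanwhile.
            Context.Parent.Tell(new Passivate(PoisonPill.Instance)));
    }
}
```

If some callback instead holds a direct reference or path to the entity and keeps retrying after passivation, that would match the loop Vagif suspects.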
Aaron Stannard
@Aaronontheweb
Jun 22 2018 12:02
But we might have a place where we passivate actors and then there is some kind of callback that tries to reach them. In a perpetual loop since it's failing.
and this is user code, right?
Bartosz Sypytkowski
@Horusiath
Jun 22 2018 12:55
@object based on your description it sounds like it has something to do with actor passivation frequency, right?
Vagif Abilov
@object
Jun 22 2018 13:08
No Aaron, I checked the code and it does not retry. There is a select that should result in Unhandled if the path is not resolved, but these selects happen at a rate of up to hundreds of messages a minute, not hundreds a second.
It looks like the same path is tried repeatedly, and we don't seem to have such code.
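For reference, the kind of one-shot select Vagif describes would look roughly like this: a single resolve attempt whose failure is swallowed rather than retried, so by itself it cannot account for a repeating flood. (The path and handling are illustrative, not the users' actual code.)

```csharp
using System;
using Akka.Actor;

// One-shot resolution of a possibly-gone actor: if the path no longer
// resolves (e.g. the entity was passivated), the task faults once and
// the message is simply treated as unhandled -- no retry loop.
ActorSelection selection = Context.ActorSelection("/user/some/path");
selection.ResolveOne(TimeSpan.FromSeconds(1)).ContinueWith(t =>
{
    if (t.IsFaulted)
    {
        // Path could not be resolved; drop the message, do not retry.
    }
});
```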
fleed
@fleed
Jun 22 2018 13:31
@Aaronontheweb back to the question of some days ago (Akka.Net + Azure Service Fabric): could this example https://github.com/petabridge/akkadotnet-code-samples/tree/master/Cluster.WebCrawler be applied to Azure SF? Is the Akka.Cluster compatible?
Vagif Abilov
@object
Jun 22 2018 15:32
@Horusiath I thought it had to do with passivation frequency, but increasing the passivation interval to a long period (so actors are passivated less often) didn't help.
Vagif Abilov
@object
Jun 22 2018 16:32
Created an issue describing the flood of log entries from internal Akka classes: akkadotnet/akka.net#3522
One of our servers writes log messages "Forwarding request for shard" at a rate of 1000 messages/sec.
Aaron Stannard
@Aaronontheweb
Jun 22 2018 17:52
@fleed yep
it sure could
I'm not experienced with Service Fabric though
so maybe someone like @stijnherreman or @mmisztal1980 could give you some pointers on how best to go about it
Aaron Stannard
@Aaronontheweb
Jun 22 2018 17:59
@object that bug sounds like it's not doing any harm to your app but it's still incredibly annoying - just assigned it to myself
left a question for you on the GH issue you opened up
I'm going to be taking a look at some Cluster.Sharding issues for other reasons this week
so may as well take a crack at this while I'm at it
Vagif Abilov
@object
Jun 22 2018 18:21
@Aaronontheweb sounds good. I am reading through our logs to find more info for you. The issue becomes more than just annoying once the log servers run out of disk space: once it starts writing these entries, it generates several GB of logs every hour.
Aaron Stannard
@Aaronontheweb
Jun 22 2018 18:26
:grimacing:
no bueno
no reachability issues or anything either?
from what you wrote it sounds like the cluster is pretty stable
just entities popping in and out
(which is normal)
Vagif Abilov
@object
Jun 22 2018 19:08
Yes the cluster is now stable.
Vagif Abilov
@object
Jun 22 2018 19:31
Checking gossip status messages, it looks like they are only received from nodes in the same role. E.g. if I have a cluster with 4 nodes (2 roles x 2 instances), then all gossip messages in the log are between nodes of the same role; I don't see any messages between roles A and B. Is this how it's supposed to be?
Aaron Stannard
@Aaronontheweb
Jun 22 2018 21:17
do you have different logging settings on different roles?
Vagif Abilov
@object
Jun 22 2018 21:57
All the same.