We have installed multiple nodes of a service that uses persistent actors (and cluster sharding), and now we see millions of log entries "Resolve of path sequence [/"system/sharding/<some_path>] failed".
Literally millions: approximately 1 million such entries every hour.
The cluster works and messages get handled. We frequently passivate our aggregate root actors after a message is handled, and that is when it happens. In another installation we don't passivate actors often, and there we haven't seen a single such error.
We have the "remember-entities = on" setting in our Akka HOCON file.
And judging from the logs, these entries are not written as part of a call from our code. We can have periods of inactivity with no messages sent to the system, yet the logs are still flooded with "resolve of path failed" entries.
During the night, with relatively low business activity, these messages were written at a rate of 300 entries per second.
No, Aaron, I checked the code and it does not retry. There is a select that should result in Unhandled if the path is not resolved. But these selects happen at a rate of up to hundreds of messages a minute, not hundreds a second.
It looks like the same path is tried repeatedly, and we don't seem to have any code that does that.
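To check whether it really is the same path being retried, the entries can be tallied per path. This is only a minimal sketch: the regular expression assumes each entry contains the text quoted earlier ("Resolve of path sequence [...] failed"), and the function name `count_failed_paths` is just an illustrative helper, not part of Akka.

```python
import re
from collections import Counter

# Assumed message shape, taken from the log entry quoted in this thread.
# Adjust the pattern if the real log layout differs.
PATTERN = re.compile(r"Resolve of path sequence \[(?P<path>[^\]]*)\] failed")


def count_failed_paths(lines):
    """Count how many times each unresolved path appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts
```

Passing an open log file to `count_failed_paths` and printing `most_common(10)` on the result should show whether a handful of paths account for most of the entries.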
@Aaronontheweb sounds good. I am reading our logs to find more info for you. The issue becomes more than just annoying once the log servers run out of disk space. Once it starts writing these log entries, it generates several GB of logs every hour.
Checking the gossip status messages, it looks like they are only received from nodes in the same role. E.g. if I have a cluster with 4 nodes (2 roles x 2 instances), then all gossip messages in the log are between nodes of the same role, and I don't see any messages between roles A and B. Is this how it's supposed to be?