I have 2 websites that are part of my cluster. Sometimes one website becomes unreachable, and the logic on all of the other members determines that after 120 seconds they should call Cluster.Down(ThatWebsiteAddress). I can see that all of the members get a MemberRemoved for that website, and it is then reported as Down.
My problem is that when my other service detects that this website is down and has been removed from the cluster, it tries to restart that website. I can see it trying to join, and I get this message: [[akka://MyService/system/cluster/core/daemon]] - New incarnation of existing member [UniqueAddress: (akka.tcp://My@22.214.171.124:57771, 303375918)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.
But the existing member never gets removed from the cluster, so the new one is never able to join. I can have another member call .Leave(ThatWebAddress) and do another .Down(ThatWebAddress), but it still never gets removed. Basically, I have a ClusterStatus actor that monitors the status of the cluster from its own view and determines whether it needs to restart itself, or whether a member that has been unreachable for x seconds needs to be downed. If so, it shuts that service down and writes to the event log so that Solarwinds can determine whether the service or website needs to be started back up.
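
The "unreachable for x seconds, then down it" rule described above can be sketched roughly as follows. This is a simplified illustration, not the actual ClusterStatus actor: the `UnreachableTracker` class and its method names are hypothetical, member addresses are treated as plain strings, and all Akka.NET wiring is left out so that the timing logic stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (an assumption, not taken from the real service):
// it only records how long each member address has been unreachable.
public sealed class UnreachableTracker
{
    private readonly TimeSpan _downAfter;
    private readonly Dictionary<string, DateTime> _unreachableSince =
        new Dictionary<string, DateTime>();

    public UnreachableTracker(TimeSpan downAfter)
    {
        _downAfter = downAfter;
    }

    // Record the first time an address was seen as unreachable.
    // Repeated notifications do not reset the clock.
    public void MarkUnreachable(string address, DateTime now)
    {
        if (!_unreachableSince.ContainsKey(address))
        {
            _unreachableSince[address] = now;
        }
    }

    // Forget an address once it is reachable again or has been removed.
    public void Clear(string address)
    {
        _unreachableSince.Remove(address);
    }

    // Addresses that have been unreachable for at least the threshold.
    public IReadOnlyList<string> DueForDowning(DateTime now)
    {
        return _unreachableSince
            .Where(kv => now - kv.Value >= _downAfter)
            .Select(kv => kv.Key)
            .ToList();
    }
}
```

Assuming Akka.NET, an actor subscribed to cluster events would presumably call `MarkUnreachable` on `ClusterEvent.UnreachableMember`, call `Clear` on `ClusterEvent.ReachableMember` and `ClusterEvent.MemberRemoved`, and pass each address returned by `DueForDowning` to `Cluster.Down` on a periodic tick. With a 120-second threshold, a member first marked unreachable at time T is returned from T + 120 seconds onward.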