These are chat archives for akkadotnet/akka.net

22nd
Dec 2017
Ricky Blankenaufulland
@ZoolWay
Dec 22 2017 08:21
Hi! Is it possible to identify the actor type (e.g. its class name) when I only have the path, for debugging purposes? pbm shows me some actors with generated names ($a...) and I do not know who they are :sweat:
Ricky Blankenaufulland
@ZoolWay
Dec 22 2017 10:12
I am also looking into how to handle the Akka.Streams Materializer correctly. Using Context.Materializer() I get a new one every time, and the materializers never seem to shut themselves down. Should I have multiple of them at all? Should I use one per stream? Must I shut them down?
Paweł Bańka
@pmbanka
Dec 22 2017 12:27

I have a weird situation with cluster and sharding. I have a system with 2 lighthouses and 2 client nodes (the clients share a role). This scenario happened this morning:

  1. Clients are turned off
  2. Clients are turned on (normal redeployment)
  3. Lighthouses pick up the clients and the leader moves them to UP
  4. One of the clients goes on to register with the coordinator and does its job
  5. But the other ends up in a loop of warning messages

    Couldn't join seed nodes after [X] attempts, will try again. seed-nodes=["akka.tcp://firstLighthouse,akka.tcp://secondLighthouse"]

interleaved with multiple

`Trying to register to coordinator at [""], but no acknowledgement. Total [1] buffered messages.`

There are no logs about the failing client on the lighthouse nodes whatsoever (just a few `Receiving gossip from` messages). Has anyone seen such a thing?

Some context: the cluster also has other clients with a different role, which also use sharding, but they fail with

`Exception in ReceiveRecover when replaying event type ["Akka.Cluster.Sharding.PersistentShardCoordinator+ShardHomeAllocated"] with sequence number [63] for persistenceId ["/system/sharding/potion_commandCoordinator/singleton/coordinator"]`

with inner exception

`Region [akka://Oddjob/system/sharding/potion_command#212437374] not registered\r\nParameter name: e`

which seems unrelated, since it is a different coordinator and different cluster (and sharding) role...

Bartosz Sypytkowski
@Horusiath
Dec 22 2017 13:42
@ZoolWay
  1. When you're creating actors, you're free to give them any name you like ;)
  2. It's enough to have one materializer per actor system. If you have one actor system per process, you can also keep the materializer in a static field.
@pmbanka first you need to resolve the problem with the node joining via the seed nodes. Without that you won't be able to establish the cluster.
Hard to tell what the reason is, but most probably a connection between the two machines cannot be established.
If you can, try to isolate the issue and show it on GitHub; it will be easier to identify the problem that way
Paweł Bańka
@pmbanka
Dec 22 2017 13:47
@Horusiath thanks for the tip. My biggest head-scratcher is how it is possible that the lighthouses didn't mark the node as unreachable/down/whatever (it was UP all the time), while the node stated that it couldn't join the seed nodes, without any explanation :/
Bartosz Sypytkowski
@Horusiath
Dec 22 2017 13:49
if the connection between the nodes couldn't be established, there's not much more that could be explained
check that your actor systems' remote addresses and ports are configured correctly
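A quick way to rule out the connectivity problem described above is to check whether a plain TCP connection to the other node's remoting port can be opened at all, from each machine and in both directions. A minimal Python sketch, assuming the port is the one from that node's `dot-netty.tcp` config; it only shows that something accepts TCP connections there, not that the Akka.NET handshake succeeds.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        # create_connection resolves the host name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, run `is_reachable("maodatest01", 1965)` on a lighthouse machine, and the same check against each lighthouse's host and port on the client machine (the lighthouse addresses are not given in the thread).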
Ricky Blankenaufulland
@ZoolWay
Dec 22 2017 13:50
@Horusiath Thanks for your input. I have some actors with generated names like $a, so how can I find out that /user/parent/processor/fetcher/something/$a is an instance of the class MyPublisherForgotToCloseMeActor? Giving them names is not possible for one-time actors which are created on demand, process a message and then terminate themselves. There could be one or six thousand of them, and names are not allowed to be ambiguous ;)
Bartosz Sypytkowski
@Horusiath
Dec 22 2017 13:50
other types of errors usually come with some information
Ricky Blankenaufulland
@ZoolWay
Dec 22 2017 13:50
About the materializer: that's great to hear
Bartosz Sypytkowski
@Horusiath
Dec 22 2017 13:51
@ZoolWay you cannot check the actor type without using reflection. From Akka's perspective, the system doesn't care about actor types, e.g. all F# actors share the same type.
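Since the type can't be read from the path, one workaround (not something proposed in the thread) is to name on-demand actors yourself with the type name plus a counter, which keeps names unique among siblings. A minimal Python sketch of that naming scheme; `ActorNamer` is a hypothetical helper, and it assumes one namer per parent actor so that sibling names never collide.

```python
import itertools


class ActorNamer:
    """Hypothetical helper that builds sibling-unique actor names embedding a type name."""

    def __init__(self) -> None:
        # One counter per namer; use one namer per parent so siblings never clash.
        self._counter = itertools.count(1)

    def name_for(self, type_name: str) -> str:
        # Names starting with "$" are reserved for generated names, so never emit one.
        if type_name.startswith("$"):
            raise ValueError("actor names must not start with '$'")
        return f"{type_name}-{next(self._counter)}"
```

In Akka.NET the generated string would be passed as the name when creating the child (for example `Context.ActorOf(props, name)`), so pbm would show a path ending in `MyPublisherForgotToCloseMeActor-1` rather than `$a`.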
Paweł Bańka
@pmbanka
Dec 22 2017 13:53
@Horusiath ok, will do (but I believe I checked them before and they seemed ok). But how was the seed node able to mark the client node as [UP] when the client node says it can't contact the seed node? I'd expect it to either work both ways or not at all. Maybe the connection was established at the very beginning and then broken, but I doubt it, since the `Couldn't join seed nodes after` warning keeps repeating from then on
One more thing: do you know if `Couldn't join seed nodes after` refers to both nodes, or to any one of them?
I mean, does the client node need to see all of the seed nodes?
Bartosz Sypytkowski
@Horusiath
Dec 22 2017 13:55
@pmbanka the actor system will try to join the seed nodes one by one (if the first one doesn't respond, it will try another one). When a node has been marked as UP, it means that all nodes in the cluster have confirmed that the joining node has connected. So most probably the UP node you see in the logs is not the same as the one which returns the couldn't join message
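To make the "any one seed is enough" point concrete (it also bears on whether the client must see all seed nodes), here is a deliberately simplified Python model of the join loop described above. It is a sketch under assumptions, not Akka.NET's implementation: `responds` stands in for "this seed node acknowledged the join request", and the real join process differs in detail, such as how seeds are contacted and how long it waits between rounds.

```python
from typing import Callable, Optional, Sequence


def try_join(seed_nodes: Sequence[str],
             responds: Callable[[str], bool],
             max_attempts: int = 3) -> Optional[str]:
    """Return the first seed node that acknowledges the join, or None if none ever does."""
    for attempt in range(1, max_attempts + 1):
        for seed in seed_nodes:
            if responds(seed):
                return seed  # a single acknowledging seed node is enough to join
        # No seed answered in this round: this is when the warning shows up.
        print(f"Couldn't join seed nodes after [{attempt}] attempts, will try again.")
    return None
```

In this model the client needs only one reachable seed node; the warning means that none of them answered during that round.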
Paweł Bańka
@pmbanka
Dec 22 2017 14:02

Huh. The sequence is:

  1. December 22nd 2017, 10:27:27.773

     The client logs its config with

         remote : {
           dot-netty : {
             tcp : {
               hostname : maodatest01
               port : 1965
             }
           }
         }

  2. December 22nd 2017, 10:27:30.836

     The lighthouse logs

         Information - MAODATEST01 Leader is moving node ["akka.tcp://Oddjob@maodatest01:1965"] to [Up]

     and then, later, there is some gossip with the node

  3. BUT there is no `Welcome from` message on the client node, so it seems like either the lighthouse moved it to UP but the client node didn't notice, or the log message got lost somewhere :)
Maybe an unrelated question: we have client nodes, but we don't really care how many of them run on a machine, and we can have multiple nodes on the same machine. So it is natural to use 0 as the port for these nodes. But this doesn't work well with our firewall setup, over which we don't have much control (we can open some port ranges, though). Is there a reliable way to make a node pick an unused port from a constrained pool? I have some hacky logic for doing that, but maybe it's just conceptually wrong and is the cause of my issues?
Bartosz Sypytkowski
@Horusiath
Dec 22 2017 14:18
@pmbanka if you pick port 0, the OS will assign a port; I don't know whether it's aware of firewall constraints in that case.
Paweł Bańka
@pmbanka
Dec 22 2017 14:21
@Horusiath it isn't, and I didn't find any way to specify "pick a port between 100 and 200". Have you ever seen anyone with such a problem? I doubt I'm the only one, but maybe everyone is already dockerizing their apps and avoiding problems like that altogether :)
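One workaround for a firewall-approved range is to probe the range before starting the actor system and write the first free port into `akka.remote.dot-netty.tcp.port`. A minimal Python sketch of the probing step, assuming the allowed range is known up front. Note the built-in race: the probe releases the port before the actor system binds it, so another process can grab it in between, and startup should retry with the next port if binding fails.

```python
import socket
from typing import Optional


def find_free_port(host: str, first: int, last: int) -> Optional[int]:
    """Return the first port in [first, last] that can be bound on host right now, or None."""
    for port in range(first, last + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # already in use (or not permitted): try the next port
            # The probe socket closes when the with-block exits, freeing the port again.
            return port
    return None
```

With several nodes on one machine, each node started after the previous one has bound its port would skip the occupied ports and pick the next free one in the range.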
Bartosz Sypytkowski
@Horusiath
Dec 22 2017 14:22
I know people have this kind of problem, but I can't recall a solution; maybe @Aaronontheweb could help here
Paweł Bańka
@pmbanka
Dec 22 2017 14:22
btw, I have a similar problem with Petabridge.Cmd: it also requires a static port, which creates issues when you want to run 2 nodes on the same machine
anyway, I'll try to configure static ports everywhere and see if that changes anything
thanks for the help :)
Aaron Stannard
@Aaronontheweb
Dec 22 2017 18:17
yeah, that's the nature of the beast with these things unfortunately
every bound socket needs a unique address / port pair