These are chat archives for akkadotnet/akka.net

19th
Aug 2016
Ralf
@Ralf1108
Aug 19 2016 11:29
Hi, I am currently testing a simple scenario regarding cluster sharding: one seed node and two worker nodes. The seed node is started first and is responsible for the ClusterSingletonManager. Then the worker nodes join the cluster. Using the sharding region to distribute work is working. When I shut down a worker node and restart it, the cluster works as before. But when I shut down the seed node I get warnings like:
[WARNING][19.08.2016 11:28:57][Thread 0005][[akka://sharded-cluster-system/user/sharding/worker#1857914423]] Trying to register to coordinator at [user/sharding/workerCoordinator/singleton/coordinator], but no acknowledgement. Total [43] buffered messages.
and the cluster is then broken.... How to heal the cluster? I thought the seed node is only required for adding/removing cluster members... so running members should be able to continue their work?
Oh... correction.... the cluster still works :-) But the warning is repeatedly printed to the console :D
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 11:31
@Ralf1108 have you set some node downing strategy? i.e. setting up akka.cluster.auto-down-unreachable-after HOCON config?
Ralf
@Ralf1108
Aug 19 2016 11:31
yes --> auto-down-unreachable-after = 3s
the warning is still printed to console
[WARNING][19.08.2016 11:32:39][Thread 0021][[akka://sharded-cluster-system/user/sharding/worker#1857914423]] Trying to register to coordinator at [user/sharding/workerCoordinator/singleton/coordinator], but no acknowledgement. Total [118] buffered messages.
and the buffered message count is increasing as I send more messages to the shard region
and it seems that each node can process the messages locally but they are not distributed to other nodes
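For reference, the downing setting mentioned above is a HOCON key under `akka.cluster`. A minimal fragment, assuming nothing beyond the 3s value quoted in the conversation:

```hocon
akka.cluster {
  # Unreachable members are automatically marked as down after this timeout.
  # 3s is the value quoted in the chat, not a recommendation.
  auto-down-unreachable-after = 3s
}
```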
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 11:35
yes, the problem is probably that the cluster singleton has not migrated to another node (by default it's instantiated on the oldest one) - I remember there was something about that - @alex-kondrashov do you recall the problem with cluster singleton node migration?
messages to shards living on the same node will be propagated anyway - cluster shard coordinator is not necessary here
Ralf
@Ralf1108
Aug 19 2016 11:36
yes makes sense
if you want I can share the code with you. It's a solution and you only have to hit F5 for reproduction :-)
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 11:36
I think, I'll be able to recreate it anyway
Ralf
@Ralf1108
Aug 19 2016 11:49
do worker nodes try to reconnect to seed nodes in case the seed node goes down?
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 11:50
if you're connecting using seed-nodes config, then yes
Ralf
@Ralf1108
Aug 19 2016 11:50
or how can you join the cluster which remains on the "seedless" nodes?
ok, I'll try this
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 11:51
only that, if the seed node is dead they won't join it
and if there was only one seed node, when it comes back up it will probably start another cluster ;)
that's why it's advised to have at least 2-3 seed nodes
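A sketch of what a multi-seed setup could look like in HOCON. The system name is taken from the log lines earlier in this conversation; the hosts and port are made-up placeholders:

```hocon
akka.cluster {
  # Every node lists the same seed nodes; only one of them needs to be alive
  # for a new node to join. Addresses below are illustrative assumptions.
  seed-nodes = [
    "akka.tcp://sharded-cluster-system@10.0.0.1:4053",
    "akka.tcp://sharded-cluster-system@10.0.0.2:4053",
    "akka.tcp://sharded-cluster-system@10.0.0.3:4053"
  ]
}
```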
Ralf
@Ralf1108
Aug 19 2016 11:53
so in production.... if all seed nodes and redundant seed nodes are lost the cluster is lost :D
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 11:54
if you've seen the cluster sharding example from the core repository, there is a sample which uses sqlite for sharing information about nodes being part of the cluster - the only downside is that it doesn't do any cleanup of nodes that have shut down ungracefully
Ralf
@Ralf1108
Aug 19 2016 11:55
if there is somewhere a list of all running nodes then new nodes could try to join a remaining worker node?
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 11:55
so when new node tries to join, it reads addresses of existing cluster nodes from shared sqlite database and tries to join to them, as if they were seed nodes
Ralf
@Ralf1108
Aug 19 2016 11:55
so all in all seed node is no special type of node. its only meaning is that its address is known at startup time
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 11:56
yes, to join to cluster you only need an address of one node, that you know is part of the cluster (and is live ofc)
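The idea of joining through any known live member can be sketched in C# roughly as follows. This is an illustration, not the code of the sqlite sample: `ClusterJoiner` and `JoinKnownNodes` are hypothetical names, and where the address strings come from (a shared database, a config service) is left out.

```csharp
using System.Collections.Immutable;
using System.Linq;
using Akka.Actor;

public static class ClusterJoiner
{
    // knownAddresses: addresses of nodes believed to be live cluster members,
    // e.g. "akka.tcp://sharded-cluster-system@10.0.0.2:4053" (assumed values).
    public static void JoinKnownNodes(ActorSystem system, params string[] knownAddresses)
    {
        var cluster = Akka.Cluster.Cluster.Get(system);

        // Treat the known members as if they were configured seed nodes.
        var addresses = knownAddresses
            .Select(a => Address.Parse(a))
            .ToImmutableList();

        cluster.JoinSeedNodes(addresses);
    }
}
```

With this approach the `seed-nodes` list in HOCON can stay empty, since the join is triggered from code once the addresses are known.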
Ralf
@Ralf1108
Aug 19 2016 11:56
great :-)
so then the open issue is this "Trying to register to coordinator at" warning message if all seed nodes went down. thx for the support!
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 11:58
no problem :D
NeotericDev
@NeotericDev
Aug 19 2016 12:16
Hi all, I was searching for Distributed Data (CRDT) in Akka.NET but found no clues. Is CRDT available in the .NET port of Akka?
Here in gitter I saw a conversation related to CRDT in 2015.
CRDT.PNG
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 12:20
@NeotericDev ddata is work in progress, quite close to finish AFAIK @maxim-s will know more
NeotericDev
@NeotericDev
Aug 19 2016 12:21
Oh, that's cool. Can I know the expected release date?
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 12:26
no idea. The original creator had to leave his contribution before it was finished, and I don't know how much time Maxim can spend on that. The rest of the contributors were more focused on bug fixing and getting the existing plugins out of beta. But I think this area is still open to contributions
NeotericDev
@NeotericDev
Aug 19 2016 12:30
@Horusiath Are there any other options, like some kind of in-memory database for replicating data across all the nodes?
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 12:30
only third-party for now
NeotericDev
@NeotericDev
Aug 19 2016 12:31
can you please name some of these?
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 12:33
Riak for example - they were one of the first people to implement CRDTs in real systems
NeotericDev
@NeotericDev
Aug 19 2016 12:56
thanks for the info. Will look into it. :)
NeotericDev
@NeotericDev
Aug 19 2016 13:16
If Akka.NET DData is about to get finished, can I start using it now (even though it may have some issues) so that I can get used to it by the time it gets finished?
Bartosz Sypytkowski
@Horusiath
Aug 19 2016 13:26
@NeotericDev I don't think so. It's hard to say whose fork has the most mature version right now
NeotericDev
@NeotericDev
Aug 19 2016 14:09
Okay then, it seems we all have to wait until it gets finished.
Maciek Misztal
@mmisztal1980
Aug 19 2016 16:48
hi all, can anyone give me a quick hint on how to block the current thread, waiting for an actorsystem to terminate while providing an external cancellation token?
@mmisztal1980 ActorSystem.WhenTerminated.Wait()
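Since `WhenTerminated` is a plain `Task`, the cancellation token from the question can be passed to the standard `Task.Wait(CancellationToken)` overload. A small sketch; the `TerminationWaiter` helper is a hypothetical name, not part of Akka.NET:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TerminationWaiter
{
    // Blocks the calling thread until the task completes or the token is cancelled.
    // Returns true if the task completed, false if the wait was cancelled first.
    public static bool WaitForTermination(Task whenTerminated, CancellationToken token)
    {
        try
        {
            whenTerminated.Wait(token);
            return true;
        }
        catch (OperationCanceledException)
        {
            return false;
        }
    }
}
```

It would be called as `TerminationWaiter.WaitForTermination(system.WhenTerminated, cts.Token)`, where `system` is the running `ActorSystem` and `cts` a `CancellationTokenSource` owned by the caller.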
Maciek Misztal
@mmisztal1980
Aug 19 2016 17:28
cheers!
Chris G. Stevens
@cgstevens
Aug 19 2016 17:35
Anyone ever get this error? When this happens my cluster starts to become unstable.

Akka.Remote.EndpointException: Error while decoding incoming Akka PDU ---> System.ArgumentOutOfRangeException: Length cannot be less than zero.
Parameter name: length
   at System.String.Substring(Int32 startIndex, Int32 length)
   at Akka.Remote.RemoteSystemDaemon.GetChild(IEnumerable`1 name)
   at Akka.Actor.LocalActorRef.GetChild(IEnumerable`1 name)
   at Akka.Actor.LocalActorRefProvider.ResolveActorRef(IInternalActorRef actorRef, IReadOnlyCollection`1 pathElements)
   at Akka.Remote.RemoteActorRefProvider.ResolveActorRefWithLocalAddress(String path, Address localAddress)
   at Akka.Remote.Transport.AkkaPduProtobuffCodec.DecodeMessage(ByteString raw, RemoteActorRefProvider provider, Address localAddress)
   at Akka.Remote.EndpointReader.TryDecodeMessageAndAck(ByteString pdu)
   --- End of inner exception stack trace ---
   at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level)
   at Akka.Remote.EndpointWriter.<SupervisorStrategy>b__20_0(Exception ex)
   at Akka.Actor.LocalOnlyDecider.Decide(Exception cause)
   at Akka.Actor.OneForOneStrategy.Handle(IActorRef child, Exception x)
   at Akka.Actor.SupervisorStrategy.HandleFailure(ActorCell actorCell, Exception cause, ChildRestartStats failedChildStats, IReadOnlyCollection`1 allChildren)
   at Akka.Actor.ActorCell.HandleFailed(Failed f)
   at Akka.Actor.ActorCell.SysMsgInvokeAll(EarliestFirstSystemMessageList messages, Int32 currentState)
Aaron Stannard
@Aaronontheweb
Aug 19 2016 17:35
is this happening with a remotely deployed actor by any chance?
looks like what's happening to me, just by eye-balling it
Chris G. Stevens
@cgstevens
Aug 19 2016 17:36
I think so
Aaron Stannard
@Aaronontheweb
Aug 19 2016 17:37
is that the PDU is being decoded just fine
Chris G. Stevens
@cgstevens
Aug 19 2016 17:37
I have a singleton that is deploying an actor.. that message is on the deployed system
Aaron Stannard
@Aaronontheweb
Aug 19 2016 17:37
problem is when we try to resolve the remotely deployed actor ref
which happens during decoding
we decode the message + the actorref for the local destination for that message
latter part appears to be failing
Chris G. Stevens
@cgstevens
Aug 19 2016 17:37
Ok.. so I do know one of my issues is that I shouldn't have my singleton Iref and should be using the proxy
Aaron Stannard
@Aaronontheweb
Aug 19 2016 17:38
mind logging a bug for this? I'll take a look today. Would you mind also including the details about how this singleton is doing the deployment?
sounds to me like the "special case" code we use for handling remote deployments is what goofed here
Chris G. Stevens
@cgstevens
Aug 19 2016 17:39
I sure can. We are in QA and supposed to go to prod in a few weeks
Aaron Stannard
@Aaronontheweb
Aug 19 2016 17:39
cool - we'll probably be shipping a minor update soon anyway
fixing some IPV6 issues for Mono and other stuff that's been patched since 1.1.1
sounds like this might be an issue with Cluster.Singleton
either that or the Akka.Remote resolution code should be throwing a different error message
if the actor doesn't actually exist
Chris G. Stevens
@cgstevens
Aug 19 2016 17:41
I have been trying to move out what I am doing to my github as I have updated my monitor and I have it working again.
Vagif Abilov
@object
Aug 19 2016 17:43
How do you guys implement your bootstrappers for applications with various actors? In DI manner where actors that are supposed to be created at start time are spawned from a large piece of code? Or using actors for this purpose so it
Aaron Stannard
@Aaronontheweb
Aug 19 2016 17:43
bootstrappers for Akka.Cluster? Or just for spawning your top-level actors?
Chris G. Stevens
@cgstevens
Aug 19 2016 17:43
@Aaronontheweb Thank you very much! I will get this documented
Vagif Abilov
@object
Aug 19 2016 17:43
... so there is only some root actor(s) created in a startup code and they will take care of the rest.
No not clustering scenario, single process.
Aaron Stannard
@Aaronontheweb
Aug 19 2016 17:44
@object I think either approach would be fine. Personally I prefer to do it all inside my Startup.cs or my Topshelf Service class
Chris G. Stevens
@cgstevens
Aug 19 2016 17:44
I agree
Aaron Stannard
@Aaronontheweb
Aug 19 2016 17:45
makes it easier to reason about for someone who's reading the code for the first time
Vagif Abilov
@object
Aug 19 2016 17:45
This is how we do it now but we are facing some issues with actors not being available for others.
I.e. some actors once created start consuming some external events. But bootstrapping is not finished yet, so use of ActorSelection doesn
Aaron Stannard
@Aaronontheweb
Aug 19 2016 17:46
ah
in that case you might be better off having an actor responsible for starting those up
Vagif Abilov
@object
Aug 19 2016 17:46
doesn't guarantee the ready actor.
Aaron Stannard
@Aaronontheweb
Aug 19 2016 17:46
gives you more control over the startup flow
and you can buffer those messages until those actors have started
Vagif Abilov
@object
Aug 19 2016 17:47
I am leaning towards this, can't think of a better option even though I like the idea of having a common bootstrapper.
In the past our actors had to wait for special message (Subscribe) to start consuming external events but this damages actor restart: if actor fails it should be able to restart from the same Props.
So now we design Props so they have everything ready for an actor to do active work. The only problem is startup phase.
I was wondering if there is some known pattern of dealing with this situation.
Chris G. Stevens
@cgstevens
Aug 19 2016 18:05
For me to get around your issue for subscribing..
Since I don't know when another service may come online I have my singleton send a subscribe message every x seconds
I return back to that actor and .Watch() it so I can get the terminate message. Then if that happens I can subscribe again.
I ended up doing that before my singleton so I am going to swap it and the members will send to the proxy.
Vagif Abilov
@object
Aug 19 2016 18:14
@cgstevens but can't the actor enter some pending state (using Become) and turn itself into active after the awaited actor is up and running? Instead of sending multiple messages? Perhaps sending messages periodically is more robust but using FSM instead is less chatty and cleaner design-wise.
Chris G. Stevens
@cgstevens
Aug 19 2016 18:15
totally agree
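The pending/active idea from this exchange could look roughly like the sketch below. The message types `DependencyReady` and `ExternalEvent` and the actor name are hypothetical; the sketch stashes anything that arrives before the awaited actor is known, and uses `Watch` to fall back to the pending state if that actor stops.

```csharp
using Akka.Actor;

// Hypothetical messages for this sketch.
public sealed class DependencyReady
{
    public DependencyReady(IActorRef dependency) { Dependency = dependency; }
    public IActorRef Dependency { get; }
}

public sealed class ExternalEvent
{
    public ExternalEvent(string payload) { Payload = payload; }
    public string Payload { get; }
}

public class EventConsumer : ReceiveActor, IWithUnboundedStash
{
    public IStash Stash { get; set; }

    public EventConsumer()
    {
        Pending();
    }

    // Pending: buffer everything until the awaited actor announces itself.
    private void Pending()
    {
        Receive<DependencyReady>(ready =>
        {
            Context.Watch(ready.Dependency);
            Become(() => Active(ready.Dependency));
            Stash.UnstashAll();
        });
        ReceiveAny(_ => Stash.Stash());
    }

    // Active: forward events; go back to Pending if the dependency terminates.
    private void Active(IActorRef dependency)
    {
        Receive<ExternalEvent>(evt => dependency.Tell(evt.Payload));
        Receive<Terminated>(_ => Become(Pending));
    }
}
```

After a restart the actor begins in the pending state again, so something still has to send it a fresh `DependencyReady`; that part of the startup flow is not covered here.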
Jeff
@jpierson
Aug 19 2016 18:27
I'm attempting to get at-least-once delivery concepts worked into a cluster with only measured success. I'm dealing with a scenario where I want singleton actors within each node to register with a singleton somewhere else in the cluster (think pub/sub) with at-least-once message guarantees in both directions (register & publish). My first attempt is to build a sort of proxy actor on top of AtLeastOnceDeliveryReceiveActor in which potentially remote communication can go through. The issue I'm running into at the moment is that after subscribe, the subscription manager singleton is sending back messages to Context.Sender (which is my proxy) but at the proxy I'm seeing messages of type UnconfirmedDelivery... Is there any documentation on UnconfirmedDelivery and how it is used by AtLeastOnceDeliveryReceiveActor?
the API docs are updated on each release here: http://getakka.net/docs/ (top of the page)
Falsifiable, after 1 test (0 shrinks) (StdGen (533903721,296194251)):
Original:
::1:1337

---- System.UriFormatException : Can not parse an ActorPath: akka.tcp://foo@[::1]:1337/user/foo
correct me if I'm wrong @garrardkitchen @JeffCyr but.... akka.tcp://foo@[::1]:1337/user/foo is indeed a valid IPV6 URI, is it not?
spec failure I'm getting back from Mono in our Akka.Remote test suite
Jeff
@jpierson
Aug 19 2016 18:39
Thanks, those docs don't really tell much about the role of the UnconfirmedDelivery message though and how it is used. I'll poke around a bit more through the docs and if I can't figure it out I'll post a more specific question to Stack Overflow.
Aaron Stannard
@Aaronontheweb
Aug 19 2016 18:40
btw, the description you posted is hard to follow - you'd be better off including a diagram for that
like I had trouble visualizing that
Jeff
@jpierson
Aug 19 2016 18:40
Got it, thanks again.
Aaron Stannard
@Aaronontheweb
Aug 19 2016 18:41
Mono, why do you hate IPV6?
did it kill your dog or something?
that spec doesn't fail on Windows with .NET 4.5
yep, sure looks like that's valid: http://stackoverflow.com/a/20402797/377476
wonder if the issue is that Mono throws up when we use the @ symbol
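To see how a bracketed IPv6 address like the one in the failing spec splits into its parts, here is a sketch using plain `System.Uri`. It is not the Akka.NET `ActorPath` parser, and behaviour on the Mono version discussed above may differ; `Ipv6UriDemo` is a hypothetical helper name.

```csharp
using System;

public static class Ipv6UriDemo
{
    // Splits an actor-path-like URI into system name, host and port.
    // Returns false if System.Uri does not accept the string as an absolute URI.
    public static bool TrySplit(string path, out string system, out string host, out int port)
    {
        system = null;
        host = null;
        port = -1;

        Uri uri;
        if (!Uri.TryCreate(path, UriKind.Absolute, out uri))
            return false;

        system = uri.UserInfo;   // the part before '@'
        host = uri.DnsSafeHost;  // the host part of the authority
        port = uri.Port;
        return true;
    }
}
```

Calling `Ipv6UriDemo.TrySplit("akka.tcp://foo@[::1]:1337/user/foo", out system, out host, out port)` is one way to check on a given runtime whether the `@` or the brackets are what trips the parser.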
Maciek Misztal
@mmisztal1980
Aug 19 2016 18:52
@Aaronontheweb correct me if I'm wrong, but was there a way to determine if the current cluster node (with a role) is the role leader? I can't seem to find it in the docs
Aaron Stannard
@Aaronontheweb
Aug 19 2016 20:50
those two
should do the trick
Maciek Misztal
@mmisztal1980
Aug 19 2016 20:57
RoleLeader(string) seems to be what I'm after
Aaron Stannard
@Aaronontheweb
Aug 19 2016 21:03
ah
yeah
that would do it
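A sketch of the `RoleLeader(string)` check, assuming it is read from the cluster state snapshot exposed by the `Cluster` extension. `RoleLeaderCheck` is a hypothetical helper, and the answer reflects the node's current view of the cluster, which can lag behind membership changes.

```csharp
using Akka.Actor;

public static class RoleLeaderCheck
{
    // True if this node currently sees itself as the leader for the given role.
    public static bool IsRoleLeader(ActorSystem system, string role)
    {
        var cluster = Akka.Cluster.Cluster.Get(system);

        // RoleLeader may return null when no leader is known for the role yet.
        var roleLeader = cluster.State.RoleLeader(role);
        return roleLeader != null && roleLeader.Equals(cluster.SelfAddress);
    }
}
```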
Maciek Misztal
@mmisztal1980
Aug 19 2016 21:08
awesome
Jeff
@jpierson
Aug 19 2016 21:41
@Aaronontheweb I was able to get through the issue earlier with getting AtLeastOnceDeliveryReceiveActor to work for me. I still see a ton of UnconfirmedWarning<unconfirmedDeliveries: 1> entries in the log that I'm not sure about but I'm getting further.
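For context on those log entries: as far as the at-least-once delivery API goes, an `UnconfirmedWarning` is sent to the delivering actor itself when a message has been redelivered a number of times without `ConfirmDelivery` being called for its delivery id. A rough sketch of the sending side follows; the message types and actor name are hypothetical, and persistence of sent/confirmed events is skipped for brevity, so this version does not survive restarts.

```csharp
using System;
using Akka.Actor;
using Akka.Persistence;

// Hypothetical messages for this sketch.
public sealed class Publish
{
    public Publish(string text) { Text = text; }
    public string Text { get; }
}

public sealed class Envelope
{
    public Envelope(long deliveryId, string text) { DeliveryId = deliveryId; Text = text; }
    public long DeliveryId { get; }
    public string Text { get; }
}

public sealed class Confirm
{
    public Confirm(long deliveryId) { DeliveryId = deliveryId; }
    public long DeliveryId { get; }
}

public class ReliableSender : AtLeastOnceDeliveryReceiveActor
{
    public ReliableSender(ActorPath destination)
    {
        // Deliver keeps resending the envelope until it is confirmed.
        Command<Publish>(msg =>
        {
            Deliver(destination, deliveryId => new Envelope(deliveryId, msg.Text));
        });

        // The receiver is expected to reply with Confirm(envelope.DeliveryId).
        Command<Confirm>(ack =>
        {
            ConfirmDelivery(ack.DeliveryId);
        });

        // Raised when deliveries stay unconfirmed after repeated attempts.
        Command<UnconfirmedWarning>(warning =>
        {
            foreach (var unconfirmed in warning.UnconfirmedDeliveries)
                Console.WriteLine("Still unconfirmed: {0} -> {1}",
                    unconfirmed.DeliveryId, unconfirmed.Destination);
        });
    }

    public override string PersistenceId => "reliable-sender";
}
```

If the warnings keep appearing, one thing to check is whether the receiving side actually sends the confirmation back to the delivering actor rather than to an intermediate proxy.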