These are chat archives for akkadotnet/akka.net

5th
Jul 2016
Arsene Tochemey GANDOTE
@Tochemey
Jul 05 2016 07:20
Hello, can I use try/catch in my receive handlers, or is there a better way to handle errors?
Peter Bergman
@peter-bannerflow
Jul 05 2016 09:28

Hi, I have a case where I see the following message in my lighthouse log:
New incarnation of existing member [UniqueAddress: (akka.tcp://bf@10.0.0.5:5052, 1430999384)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.

That message repeats and doesn't seem to stop. It was triggered by first shutting down an existing node in the cluster and then starting it up again. Should the cluster resolve this "by itself", or is there something wrong? In the log of the joining node, I can see that the node starts up and starts remoting, but I never get a welcome message from the cluster.

Using Akka.Cluster v1.0.9.236-beta
Also, in the lighthouse log I can see that the leader is unable to perform its duties due to the unreachable node, but I guess that is expected until the node has properly joined the cluster again?
Alex Valuyskiy
@alexvaluyskiy
Jul 05 2016 10:04
@Tochemey I know of two implementations of Persistence.Redis, but neither of them is ready for production
Arsene Tochemey GANDOTE
@Tochemey
Jul 05 2016 10:16
ok
Nathan Johnstone
@nrjohnstone
Jul 05 2016 11:34
@Tochemey in our implementations of Akka.NET here, we have been avoiding try/catch in our receive handlers and using actor supervision instead
the supervisor then decides, based on the exception, whether to tell the actor to resume, restart etc.
this feels a lot nicer and more future-proof than try/catch blocks
Arsene Tochemey GANDOTE
@Tochemey
Jul 05 2016 11:36
@nrjohnstone What if you are connecting to a remote TCP server and an IOException is thrown?
Nathan Johnstone
@nrjohnstone
Jul 05 2016 12:02
yep, so I'd just add a case for the IOException in the supervision strategy of your actor's parent and take whatever action is appropriate
maybe restart the actor and send it a specific message to have it back off and retry the connection in N minutes or something?
we are pretty new with Akka however, so maybe some of the experts can recommend different approaches
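The supervision approach described above can be modelled roughly as follows. This is a minimal, language-neutral sketch in Python, not Akka.NET code: `Directive`, `decide`, `Supervisor` and `ConnectionActor` are hypothetical names, and the mapping of exception types to directives is an assumption made for illustration.

```python
from enum import Enum


class Directive(Enum):
    # Akka also has a Stop directive; it is omitted to keep the sketch short.
    RESUME = "resume"
    RESTART = "restart"
    ESCALATE = "escalate"


def decide(error: Exception) -> Directive:
    """Hypothetical decider: map a child's failure to a directive."""
    if isinstance(error, OSError):       # e.g. a dropped TCP connection
        return Directive.RESTART
    if isinstance(error, ValueError):    # bad message, child state still valid
        return Directive.RESUME
    return Directive.ESCALATE            # unknown failure: pass it upwards


class ConnectionActor:
    """Toy child actor; note that the handler contains no try/except."""

    def __init__(self):
        self.sent = []

    def receive(self, message):
        if message == "drop":
            raise ConnectionResetError("connection reset")
        if not isinstance(message, str):
            raise ValueError("unsupported message")
        self.sent.append(message)


class Supervisor:
    """Toy parent: it, not the child, reacts to the child's failures."""

    def __init__(self, child_factory):
        self._factory = child_factory
        self.child = child_factory()
        self.restarts = 0

    def tell(self, message):
        try:
            self.child.receive(message)
        except Exception as error:
            directive = decide(error)
            if directive is Directive.RESTART:
                self.child = self._factory()   # fresh instance, old state discarded
                self.restarts += 1
            elif directive is Directive.ESCALATE:
                raise
            return directive                   # RESUME keeps the child and its state
        return None
```

With this model, `Supervisor(ConnectionActor).tell("drop")` replaces the child with a fresh instance, while a malformed message leaves the child and its state untouched.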
Arsene Tochemey GANDOTE
@Tochemey
Jul 05 2016 12:28
I can see that all the persistence layers are still in beta. Any advice?
Max
@maxpaj
Jul 05 2016 12:40
Hi guys, here's a noob question: Will my cluster still be visible to other joining nodes if my Lighthouse instances go down and restart? Do cluster nodes gossip to seed nodes at some interval? I've briefly looked through the documentation of Lighthouse but it doesn't say much about this. But since Lighthouse too is a cluster node, it should receive gossip, right?
Bartosz Sypytkowski
@Horusiath
Jul 05 2016 12:43
@maxpaj the cluster is P2P, which means that it won't fail if any of the nodes goes down. However, to join the cluster, a joining node has to know the address of at least one other node that is actively part of the cluster.
@Tochemey what advice do you need?
Arsene Tochemey GANDOTE
@Tochemey
Jul 05 2016 12:45
@Horusiath concerning the persistence layers of Akka.NET: should I use them or implement my own?
Max
@maxpaj
Jul 05 2016 12:46
@Horusiath Alright, thanks!
Bartosz Sypytkowski
@Horusiath
Jul 05 2016 12:49
@Tochemey that depends on your requirements. Akka.Persistence uses event sourcing, so it's quite an advanced way of persisting state. If you only need simple snapshots, you can easily implement them yourself.
Arsene Tochemey GANDOTE
@Tochemey
Jul 05 2016 12:50
@Horusiath Pardon me, I don't understand what a snapshot and a journal are. All I know is schemas, tables and so forth... Can you please educate me a bit?
Bartosz Sypytkowski
@Horusiath
Jul 05 2016 12:51
Are you familiar with the event sourcing concept?
Arsene Tochemey GANDOTE
@Tochemey
Jul 05 2016 12:52
I have never heard of it.
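For readers who are also new to the idea, here is a rough sketch of what "journal" and "snapshot" mean in event sourcing. It is a toy Python model under stated assumptions, not the Akka.Persistence API: `Account`, `EventStore` and the event shapes are hypothetical. The journal is an append-only list of events, a snapshot is a saved copy of the state at a given position in the journal, and recovery loads the snapshot and replays only the events recorded after it.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """State that is rebuilt from events rather than stored as a table row."""
    balance: int = 0

    def apply(self, event: dict) -> None:
        if event["type"] == "deposited":
            self.balance += event["amount"]
        elif event["type"] == "withdrawn":
            self.balance -= event["amount"]


@dataclass
class EventStore:
    journal: list = field(default_factory=list)  # append-only event log
    snapshot: tuple = (0, 0)                     # (events covered, balance at that point)

    def persist(self, event: dict) -> None:
        self.journal.append(event)

    def save_snapshot(self, account: Account) -> None:
        self.snapshot = (len(self.journal), account.balance)

    def recover(self) -> Account:
        covered, balance = self.snapshot
        account = Account(balance)
        for event in self.journal[covered:]:     # replay only what the snapshot lacks
            account.apply(event)
        return account
```

Without any snapshot, `recover()` simply replays the whole journal from the start; the snapshot only shortens recovery, it does not change the result.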
Kevin McFarlane
@kevinmcfarlane
Jul 05 2016 13:02
@nrjohnstone "I'd just add a case for the IOException in the supervision strategy of your actor's parent and take whatever action is appropriate"
Yes, that's what I've done in my prototype application. Takes a little practice depending on how your application is structured.
Vagif Abilov
@object
Jul 05 2016 13:12
When using a supervision strategy with the F# API, is it possible at all to replay the message after the actor is restarted? It doesn't look like it. With C# the unprocessed message is passed as an argument to the PreRestart call, so the newly re-created actor has a chance to retry it. But with the F# API there is no access to the PreRestart method, and stashing the last message doesn't work because the stash container is gone. Does this mean that the restart-with-replay scenario is not supported with the F# API?
Bartosz Sypytkowski
@Horusiath
Jul 05 2016 13:14
@object unfortunately it isn't supported. I'd use Akkling for that.
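The restart-with-replay behaviour described for the C# API can be modelled roughly like this. It is a toy Python sketch, not Akka.NET or Akkling code: `FlakyResource`, `Worker`, `pre_restart` and `run` are hypothetical names, and the assumption is that the failed message is handed to a pre-restart hook which puts it back into a mailbox that outlives the actor instance.

```python
from collections import deque


class FlakyResource:
    """Fails on the first call only; stands in for a transient fault."""

    def __init__(self):
        self.calls = 0

    def use(self, item: str) -> str:
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("transient failure")
        return item.upper()


class Worker:
    def __init__(self, resource, mailbox, results):
        self.resource = resource
        self.mailbox = mailbox
        self.results = results

    def receive(self, message):
        self.results.append(self.resource.use(message))

    def pre_restart(self, reason, message):
        # Re-enqueue the message that caused the failure so that the
        # replacement instance gets a chance to retry it.
        if message is not None:
            self.mailbox.append(message)


def run(mailbox: deque, resource: FlakyResource) -> list:
    results = []
    worker = Worker(resource, mailbox, results)
    while mailbox:
        message = mailbox.popleft()
        try:
            worker.receive(message)
        except Exception as reason:
            worker.pre_restart(reason, message)
            # Restart: a fresh instance, but the same mailbox.
            worker = Worker(resource, mailbox, results)
    return results
```

Because the failed message is appended to the end of the mailbox, it is retried after messages that were already queued, so ordering is not preserved. A message that fails every time would loop forever in this sketch; a real design would need a retry limit.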
Vagif Abilov
@object
Jul 05 2016 13:20
@Horusiath interesting. So Akkling supports that? Another reason to switch.
@Horusiath and are there any plans to incorporate Akkling into core Akka.NET?
Nathan Johnstone
@nrjohnstone
Jul 05 2016 13:37
@kevinmcfarlane : great, I was hoping we were on the right track in thinking about allowing supervision to take care of things and designing our actors to be robust in terms of restarting/resuming etc ...
while we are talking about supervision strategies: I noticed that the current repo has some nice methods for customizing the supervision strategy on routers defined in the configuration. These were not in the current release (1.0.8). Does anyone know when a new release is planned? For now we have removed our router configuration from the HOCON because we needed the strategy to be OneForOne instead of the default Escalate, which was causing the entire router to be restarted (as the doco clearly states will happen ;0)
Bartosz Sypytkowski
@Horusiath
Jul 05 2016 14:09
@object not anytime soon
Max
@maxpaj
Jul 05 2016 14:37
Hey guys, is there a best practice for sharing message model definitions between our node projects? In Visual Studio we can reference models/classes between projects, so we can have one project that shares its models with all other projects (GlobalProject.MessageModel). But doesn't this mean that whenever we update something in that project, all of the affected projects need to be updated and all affected nodes redeployed? The other choice we see is to keep a separate definition of each message model in every individual project that uses it, so we have Project1.MessageModel1 and Project2.MessageModel1.
Bartosz Sypytkowski
@Horusiath
Jul 05 2016 15:02
@maxpaj I'd go for the 1st option: having a dedicated project for message definitions. Depending on how intrusive the message changes are, serializers are able to tolerate minor differences in message schema between versions. If your schema changes fit within that scope, you can update your cluster nodes incrementally without any additional work. If the changes are more aggressive, you may use cluster roles to annotate each cluster node with a specific version and set the configuration/logic of your application so that nodes with different versions don't talk to each other.
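Both ideas from the message above can be sketched in a few lines. This is a toy Python model under stated assumptions, not a real serializer or the Akka.Cluster API: `OrderPlaced`, `decode`, `may_talk` and the `msg-v` role prefix are hypothetical. The first part shows a decoder that tolerates minor schema drift (unknown fields are ignored, missing optional fields fall back to defaults); the second shows version roles used to keep nodes with different message versions apart.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class OrderPlaced:
    order_id: str
    quantity: int = 1   # assumed to be added in v2; the default keeps v1 payloads valid


def decode(payload: str) -> OrderPlaced:
    """Tolerant decode: drop fields this version does not know about."""
    raw = json.loads(payload)
    known = {f.name for f in fields(OrderPlaced)}
    return OrderPlaced(**{k: v for k, v in raw.items() if k in known})


def message_version(roles: set) -> set:
    """Pick out the hypothetical version roles, e.g. 'msg-v2'."""
    return {role for role in roles if role.startswith("msg-v")}


def may_talk(roles_a: set, roles_b: set) -> bool:
    """Only let nodes exchange messages when their version roles match."""
    return message_version(roles_a) == message_version(roles_b)
```

For example, `decode('{"order_id": "a1"}')` yields a quantity of 1, and a payload carrying an extra field from a newer version still decodes on an older node.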
Aaron Stannard
@Aaronontheweb
Jul 05 2016 16:06
@maxpaj internal NuGet package
is the best way to go
and what @Horusiath suggested
wdspider
@wdspider
Jul 05 2016 16:07
Can you explain why an internal nuget package rather than just a project ref is a better way to go?
Aaron Stannard
@Aaronontheweb
Jul 05 2016 16:08
if you can deploy all of your services from a single solution file
then you don't need a nuget package
if you have multiple solutions however, a NuGet package on a private feed becomes significantly easier to manage
wdspider
@wdspider
Jul 05 2016 16:08
ah... yes
Damian Reeves
@DamianReeves
Jul 05 2016 17:21
Any timeline on 1.1? Midweek?
Aaron Stannard
@Aaronontheweb
Jul 05 2016 17:21
staging it today
going to leave it open briefly for a "speak now or forever hold your peace" discussion
doing some work now to clean up some of the logging stuff that we've been running in Helios 2.1
all of the mandatory bits of the milestone were finished as of yesterday
Jordan S. Jones
@jordansjones
Jul 05 2016 17:24
I'm trying to reproduce, via unit tests, a bug I think I found in the DistributedPubSubMediator class when it is used from the ClusterClient, and I noticed that all of the ClusterClient tests are commented out. Does anyone know why this might be?
Aaron Stannard
@Aaronontheweb
Jul 05 2016 17:25
I know @alexvaluyskiy has been working on some of those issues @jordansjones
he might know
Jordan S. Jones
@jordansjones
Jul 05 2016 17:25
Thanks @Aaronontheweb . I'll wait to see if he responds.
Damian Reeves
@DamianReeves
Jul 05 2016 17:26
can't wait. great work as usual
Aaron Stannard
@Aaronontheweb
Jul 05 2016 19:34
good news: just did an acceptance test with the upgrade to Akka.Cluster using WebCrawler / Lighthouse
upgraded the Lighthouse node only to use Akka.NET 1.1 (the latest 1.0.9 package on our nightly build server, which we should have re-numbered :\ )
and left the crawler nodes to run the older software (1.0.6 in this case)
everything ran fine - no issues at all
so that verifies that the modifications we made to some system messages and remote deployment are fine
going to do a couple more tests like this ;)
so I can include any upgrade steps in the release notes
want to verify that you can upgrade a running cluster with no downtime
Aaron Stannard
@Aaronontheweb
Jul 05 2016 19:39
oh man, awesome
just saw the upgraded cluster nodes go through a rejoin scenario that caused a lot of trouble for people using 1.0.8
when a node gets quarantined and can't rejoin even after a restart
saw the leader automatically down the old instance and mark the new instance as up as soon as it joined
running the 1.1 software
right now I'm running half the services using 1.1, other half still running the old software
Sean Gilliam
@sean-gilliam
Jul 05 2016 19:58
nice
Curtis Swartzentruber
@skills0
Jul 05 2016 20:41
very happy to hear that, that's one of the specific issues we are hoping is fixed in 1.1
Jordan S. Jones
@jordansjones
Jul 05 2016 21:39
So why was the ReadView property on Cluster made internal?
Aaron Stannard
@Aaronontheweb
Jul 05 2016 21:39
it's been replaced by CurrentClusterState
ReadView has a bunch of properties on it that aren't meant to be publicly accessible
you can get the same usable public-facing data from CurrentClusterState
Jordan S. Jones
@jordansjones
Jul 05 2016 21:41
Ahh.. good to know. Thanks.
Aaron Stannard
@Aaronontheweb
Jul 05 2016 21:45
@alexvaluyskiy might need your help with this one: akkadotnet/akka.net#2138
can reliably reproduce it
don't have a spec to reproduce it yet
but button-clicking through an app I sure can
Aaron Stannard
@Aaronontheweb
Jul 05 2016 21:59
I can verify that this works for non-seed nodes / non-leaders
so this is a special case
Aaron Stannard
@Aaronontheweb
Jul 05 2016 22:36
weird, we have a multi-node test that tests specifically for this
RestartFirstSeedNodeSpec
Aaron Stannard
@Aaronontheweb
Jul 05 2016 23:01
ok, been able to reproduce the issue with non-seed nodes too
there's a state management bug here
seeing gossip get ignored from the unreachable node
even though it's back up and able to perform
something needs to unset the unreachable flag
and it isn't happening
halting the release until this gets fixed