These are chat archives for akkadotnet/akka.net

11th
Aug 2017
Ilya Komendantov
@IlyaKomendantov_twitter
Aug 11 2017 08:36
Is there a way to force the read journal to perform a read as soon as a request comes in, instead of waiting out the whole refresh-interval?
mrrd
@mrrd
Aug 11 2017 08:41
Is there any way, other than restarting an actor system, to reassociate with a cluster once it has become disassociated? Or a way of preventing it from becoming disassociated in the first place, and allowing it to reconnect and continue communicating after network outages? I'm finding that once an actor system has been disassociated, even though it is trying to connect back, it is rejected until it is completely restarted.
I would really like to avoid needing to restart the actor system as it seems a very heavy handed option especially if it is busy doing things still.
actor system obviously referring to a particular node
Robert Stiff
@uatec
Aug 11 2017 09:19
you can change the timeout
or write a timeout on a system to restart it automatically after becoming the only system in the cluster
mrrd
@mrrd
Aug 11 2017 10:32
The thing is I do not want to have to restart it entirely as that would stop all of the actors currently working just so it can re-establish communications
Robert Stiff
@uatec
Aug 11 2017 10:33
i think the principle is that if a system is out of contact, then the cluster as a whole cannot assume that it is running at all
it doesn't know if it's disconnected, or crashed, or what
mrrd
@mrrd
Aug 11 2017 10:34
The root of the problem is that when the disassociated node attempts to re-associate with the other nodes, they immediately drop the TCP socket, which seems wrong to me
Robert Stiff
@uatec
Aug 11 2017 10:34
maybe there is a cluster event you can subscribe to
the timeout and the number of retries is configurable though, no?
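A minimal sketch of the settings this probably refers to, assuming the standard Akka.NET HOCON keys. The values are illustrative only, not recommendations, and should be checked against the reference configuration for the version in use:

```
akka {
  cluster {
    failure-detector {
      # How long heartbeats may be missing before a node is
      # considered unreachable (illustrative value).
      acceptable-heartbeat-pause = 10s
      # A higher threshold tolerates more heartbeat jitter before
      # a node is marked unreachable (illustrative value).
      threshold = 12.0
    }
  }
  remote {
    # How long an address stays gated after a failed association
    # attempt (assumed default: 5s).
    retry-gate-closed-for = 5s
  }
}
```

Raising these values only delays a node being marked unreachable. Per the log message quoted later in this conversation, a UID that has already been quarantined still requires the remote actor system to be restarted.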
mrrd
@mrrd
Aug 11 2017 10:35
And it just gets into a cycle of that node continually trying to connect but effectively being told no go away
Robert Stiff
@uatec
Aug 11 2017 10:36
shrugs, i don't know, sorry
i'm rather new to this
mrrd
@mrrd
Aug 11 2017 10:38
It is trying to retry, that's the problem, but for some strange reason it is not allowed to re-associate until it's been completely restarted. I really need to find a solution to this, as this behaviour just doesn't seem correct
Maxim Cherednik
@maxcherednik
Aug 11 2017 10:38
actually, I was checking this recently, on the log file there is a message like this:
2017-08-04 15:10:09.250 [19] [(null)] INFO  Akka.Event.DummyClassForStringSources - Quarantined address [akka.tcp://riskengine@127.0.0.1:4055] is still unreachable or has not been restarted. Keeping it quarantined.
Arjen Smits
@Danthar
Aug 11 2017 10:39
This is the default AP behavior of Akka cluster
mrrd
@mrrd
Aug 11 2017 10:39
Unfortunately not, no. When examining the logs on either side, there are entries showing the socket being terminated immediately after attempting to associate
Arjen Smits
@Danthar
Aug 11 2017 10:39
Available, Partition tolerant
Maxim Cherednik
@maxcherednik
Aug 11 2017 10:39
After some time the system actually moves it into the Quarantined state. The same happens on the other side of the cluster.
mrrd
@mrrd
Aug 11 2017 10:40
But when the node is completely restarted it is allowed to connect again
Maxim Cherednik
@maxcherednik
Aug 11 2017 10:40
yes
Arjen Smits
@Danthar
Aug 11 2017 10:40
@mrrd it would help if you can post some logs, via a gist or something
mrrd
@mrrd
Aug 11 2017 10:40
I'll see if i can dig some out
Maxim Cherednik
@maxcherednik
Aug 11 2017 10:41
and here is another message:
2017-08-04 15:26:32.526 [27] [(null)] WARN  Akka.Event.DummyClassForStringSources - Association to [akka.tcp://riskengine@127.0.0.1:4055] having UID [1937162489] is irrecoverably failed. UID is now quarantined and all messages to this UID will be delivered to dead letters. Remote actorsystem must be restarted to recover from this situation.
mrrd
@mrrd
Aug 11 2017 10:42
It's not doing it at the moment, as I've completely restarted the entire cluster (it working is mission critical) and it's now working. But I've noticed that after some time it starts breaking down, often even when things like deployments have happened
Maxim Cherednik
@maxcherednik
Aug 11 2017 10:42
hm
@Danthar @Aaronontheweb Do you guys know why when the node is under Quarantine the actual tcp connection is not closed? Is this expected?
mrrd
@mrrd
Aug 11 2017 10:46
running a search through the logfiles now. As they are set to debug logging this could take a little time, so bear with me ;-)
Maxim Cherednik
@maxcherednik
Aug 11 2017 10:48
Actually, what I was doing to understand the behavior of the cluster: I set up a simple cluster of 4 nodes without any logic inside and started taking nodes down (gracefully or by killing them). This makes it easy to comprehend, since the only logs you have are from Akka itself.
Arjen Smits
@Danthar
Aug 11 2017 10:54
@maxcherednik tcp connection !== socket
Sockets may remain open
but that does not mean there is an actual tcp connection with another system
Maxim Cherednik
@maxcherednik
Aug 11 2017 10:54
ah, right. I actually didn't see if the connection is still open
mrrd
@mrrd
Aug 11 2017 12:01

right before it started was:

2017-08-01 05:18:56.6305|ERROR|Akka.Remote.Transport.DotNetty.TcpServerHandler|Error caught channel [::ffff:127.0.0.1]:4053->[::ffff:127.0.0.1]:62119|System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
at DotNetty.Transport.Channels.Sockets.SocketChannelAsyncOperation.Validate()
at DotNetty.Transport.Channels.Sockets.AbstractSocketByteChannel.SocketByteChannelUnsafe.FinishRead(SocketChannelAsyncOperation operation)
2017-08-01 05:18:56.6461|ERROR|Akka.Remote.Transport.DotNetty.TcpClientHandler|Error caught channel [::ffff:127.0.0.1]:62123->[::ffff:127.0.0.1]:8094|System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
at DotNetty.Transport.Channels.Sockets.SocketChannelAsyncOperation.Validate()
at DotNetty.Transport.Channels.Sockets.AbstractSocketByteChannel.SocketByteChannelUnsafe.FinishRead(SocketChannelAsyncOperation operation)
2017-08-01 05:18:56.6461|WARN|Akka.Remote.ReliableDeliverySupervisor|Association with remote system akka.tcp://system@127.0.0.1:8094 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointDisassociatedException: Disassociated
at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level, Boolean needToThrow)
at Akka.Remote.EndpointWriter.Unhandled(Object message)
at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction`1 partialAction)
at Akka.Actor.ReceiveActor.<>c__DisplayClass11_0.<Become>b__0(Object m)
at Akka.Actor.ActorCell.<>c__DisplayClass112_0.<Akka.Actor.IUntypedActorContext.Become>b__0(Object m)
at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
at Akka.Actor.ActorCell.ReceiveMessage(Object message)
at Akka.Actor.ActorCell.ReceivedTerminated(Terminated t)
at Akka.Actor.ActorCell.AutoReceiveMessage(Envelope envelope)
at Akka.Actor.ActorCell.Invoke(Envelope envelope)
--- End of stack trace from previous location where exception was thrown ---
at Akka.Actor.ActorCell.HandleFailed(Failed f)
at Akka.Actor.ActorCell.SysMsgInvokeAll(EarliestFirstSystemMessageList messages, Int32 currentState)]|

one of the messages repeated over and over in the older logs is the following. It was repeated every few seconds from 05:19:03 until the end of the day's log:

2017-08-01 23:59:51.4909|WARN|Akka.Remote.EndpointWriter|AssociationError [akka.tcp://system@127.0.0.1:4053] -> akka.tcp://system@127.0.0.1:8094: Error [No connection could be made because the target machine actively refused it tcp://system@127.0.0.1:8094] []|
2017-08-01 23:59:51.4909|WARN|Akka.Event.DummyClassForStringSources|Tried to associate with unreachable remote address [akka.tcp://system@127.0.0.1:8094]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [No connection could be made because the target machine actively refused it tcp://system@127.0.0.1:8094] |

mrrd
@mrrd
Aug 11 2017 12:11

the strange thing is that on the other end I can see logs like this, but nothing about it denying connections. It even indicates it is associated:

2017-08-01 08:39:52.3878 DEBUG Associated [akka.tcp://system@127.0.0.1:8094] <- akka.tcp://system@127.0.0.1:4053

all very strange

it also indicates it was still in contact with other nodes during the day
it even received heartbeats from the node with the connection errors at 08:39:57, despite the other end having errors saying it couldn't connect. Just going to have to keep a close eye on it I guess
hence why I am getting confused as to what is going on ;-)
mrrd
@mrrd
Aug 11 2017 12:17
see if it starts reoccurring
Aaron Stannard
@Aaronontheweb
Aug 11 2017 14:15
@maxcherednik we're launching 1.3 today so Gitter chat is going to be less responsive than usual
Maxim Cherednik
@maxcherednik
Aug 11 2017 14:15
:)
Aaron Stannard
@Aaronontheweb
Aug 11 2017 14:15
expect little to no responses in here today
not because we don't care :heart:
Eduards Brown
@EduardsBrown
Aug 11 2017 14:16
akka .net 1.3?
Maxim Cherednik
@maxcherednik
Aug 11 2017 14:16
np :)
fingers crossed
Aaron Stannard
@Aaronontheweb
Aug 11 2017 14:16
but because we're pushing this thing out of the drydock lol
yep
Akka.NET 1.3
Eduards Brown
@EduardsBrown
Aug 11 2017 14:16
Nice, I was just googling when was that coming out xD
Robert Stiff
@uatec
Aug 11 2017 14:16
do you have release notes yet?
Lealand Vettleson
@spankr
Aug 11 2017 15:20
:shipit:
Aaron Stannard
@Aaronontheweb
Aug 11 2017 16:08
@uatec working on it
the TL;DR; version
Akka.Persistence is now a stable module
no longer in beta
ditto for Akka.Persistence.Query
Akka.IO has been redesigned around SocketAsyncEventArgs and has performance that is comparable to DotNetty
we've upgraded from Google Protobuf2 to Protobuf3, which means that 1.3 won't be backwards compatible with 1.2 on the network
and the reason we did that is that all Akka.NET core packages now fully support .NET Standard
and Protobuf2 will not be backported to support .NET Standard
and in general, the performance of Akka.Remote has improved significantly
David Rivera
@mithril52
Aug 11 2017 16:11
Awesome. I assume the Akka.Remote performance enhancements translate over to Akka.Cluster?
Aaron Stannard
@Aaronontheweb
Aug 11 2017 16:11
my benchmark on the 1.3 branch began at like 8k msg / s. Now comfortably pushing about 53k msg / s.
yes it will
we have a lot more perf work we can do to Akka.Remote
David Rivera
@mithril52
Aug 11 2017 16:12
Nice
Aaron Stannard
@Aaronontheweb
Aug 11 2017 16:12
but none of it was a blocking issue for 1.3
so I expect there will be more rounds of optimization there in the future
and perf optimization for Akka.Persistence too
it never stops, really
one thing that kind of sucks is we had to disable compiler optimization for Akka.Streams due to a bug in the Roslyn compiler for C# 7 (edit: on .NET Standard / Core)
so the perf of Akka.Streams will be affected negatively by that. But the newest stuff coming out for .NET Standard 2.0 (final as of 48 hours ago) will address that
the optimizer was producing invalid IL that resulted in a bunch of nonsense NullReferenceExceptions
David Rivera
@mithril52
Aug 11 2017 16:14
Which serializer do you use for those tests btw?
Aaron Stannard
@Aaronontheweb
Aug 11 2017 16:15
in our benchmark we're just using the built-in string serializer
so the real perf is going to be lower once JSON.NET or Hyperion kicks in
since you have the whole polymorphism thing going on
David Rivera
@mithril52
Aug 11 2017 16:15
ok
Aaron Stannard
@Aaronontheweb
Aug 11 2017 16:16
most of what we improved in Akka.Remote is how it selects and uses its serializer
and we also reduced the number of objects it allocates pretty significantly in one area there
and then we were able to reduce the amount of address parsing, which turned out to be pretty expensive
the hottest paths in your Akka.Remote / Cluster applications (i.e. the actors who are sending and receiving the most messages) will see the biggest benefit from these changes
since they are implemented using LRU caching structures
David Rivera
@mithril52
Aug 11 2017 16:17
My hottest path is likely to be a consistent hash pool router
Aaron Stannard
@Aaronontheweb
Aug 11 2017 16:18
well in that case, all of the routees on the end of that router
are going to end up sitting pretty high up in the LRU cache
so we won't be parsing their addresses over and over again
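The effect described here can be sketched with a small LRU-cached parser. This is an illustration in Python, not Akka.NET's implementation: `Address`, `parse_address`, and the cache size of 1024 are assumptions made for the example.

```python
from functools import lru_cache
from typing import NamedTuple


class Address(NamedTuple):
    """Hypothetical parsed form of an address such as akka.tcp://system@host:port."""
    protocol: str
    system: str
    host: str
    port: int


@lru_cache(maxsize=1024)  # assumed size; least recently used entries are evicted first
def parse_address(text: str) -> Address:
    """Parse an address string; repeated calls with the same string hit the cache."""
    protocol, rest = text.split("://", 1)
    system, host_port = rest.split("@", 1)
    host, port = host_port.rsplit(":", 1)
    return Address(protocol, system, host, int(port))
```

Calling `parse_address` repeatedly with the same routee address parses the string only once. Later calls are served from the cache, and `parse_address.cache_info()` reports the hit and miss counts, so frequently used addresses stay cached while rarely used ones are evicted.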
Aaron Stannard
@Aaronontheweb
Aug 11 2017 17:11
@/all also, one more big announcement for today: https://twitter.com/dotnetfdn/status/896054195425378305
Roberto Vespa
@wasphub
Aug 11 2017 17:31
hello
great news today :smile:
I have a question about akka persistence and sql server
I'm using the myget feed for dotnet core, there I do not see the sqlserver package
Aaron Stannard
@Aaronontheweb
Aug 11 2017 17:33
@wasphub we haven't updated it for .NET Core yet
Roberto Vespa
@wasphub
Aug 11 2017 17:33
ok
Aaron Stannard
@Aaronontheweb
Aug 11 2017 17:33
we'll be starting work on getting all of the individual persistence plugins updated and out of beta too
once this release is live on NuGet
Roberto Vespa
@wasphub
Aug 11 2017 17:34
cool, in the meantime what does persistence.query.sql do?
trying to find an alternative I can use in dev now
on core
Aaron Stannard
@Aaronontheweb
Aug 11 2017 17:39
don't have a good answer for you yet
the functionality will remain the same as it is now, but we haven't started porting any plugins aside from SQLite to .NET Core
Roberto Vespa
@wasphub
Aug 11 2017 17:40
ok, I'll see how to handle it for the time being, I might use Sqlite until you release the official bits including persistence
thx Aaron :smile:
Robert Stiff
@uatec
Aug 11 2017 19:44
@Aaronontheweb That's good news. A colleague of mine tweeted that to me because he knows i've been working with akka.net all week
Guillermo
@mompox
Aug 11 2017 20:21
Hi, we are having a problem with remote. I posted it in StackOverflow, could someone take a look? https://stackoverflow.com/questions/45643151/node-in-cluster-stops-listening-for-connections Thanks
Aaron Stannard
@Aaronontheweb
Aug 11 2017 20:25
@mompox just left a comment on there
need more log info
but the configuration you posted is invalid if that's what it really looks like
Guillermo
@mompox
Aug 11 2017 20:38
Thanks! I added more information from the logs and added a comment
Aaron Stannard
@Aaronontheweb
Aug 11 2017 20:38
step 1 of V1.3 release: akkadotnet/akka.net#2964
Kenneth Ito
@kennethito
Aug 11 2017 22:40
Did Akka.Remote 1.3 get released? It's not showing up on nuget although I see many of the other packages listing 1.3. Nuget caching?
Aaron Stannard
@Aaronontheweb
Aug 11 2017 22:40
not yet - had some technical difficulties with the NuGet publish step
should be up soon though
Kenneth Ito
@kennethito
Aug 11 2017 22:41
Ah, awesome. Thanks for the update
Aaron Stannard
@Aaronontheweb
Aug 11 2017 23:09
@/all Akka.NET 1.3 is now live on NuGet: https://twitter.com/AkkaDotNET/status/896146083117543425
Kenneth Ito
@kennethito
Aug 11 2017 23:10
grats ; )