These are chat archives for akkadotnet/akka.net

12th
Apr 2017
Chris Ochs
@gamemachine
Apr 12 2017 00:00
I know this is kind of a point of contention between akka/orleans, at least in java land, but after using java akka for a long time and recently playing around with orleans, orleans is just much easier to work with for what I think are the most common use cases. Enough so that akka should really provide functionality to mimic the orleans behavior
I guess sharding actually kind of does that mostly
Chris Ochs
@gamemachine
Apr 12 2017 00:43
so in the docs, what does the shardId refer to? And it doesn't say anything about how entities are created, just how to send them a message
Aaron Stannard
@Aaronontheweb
Apr 12 2017 15:09
@ronnyek I'm using Akka.IO directly in a new product
the new version of it we're still working on in #2405 will be much better
since that uses socket async event args
instead of the java-looking API it uses under the hood now
@/all FYI, Akka.NET 1.2 is now released: https://github.com/akkadotnet/akka.net/releases/tag/v1.2
available on NuGet
Jose Carlos Marquez
@oeaoaueaa
Apr 12 2017 15:35
nice! is akkadotnet/akka.net#2584 fixed in it?
Aaron Stannard
@Aaronontheweb
Apr 12 2017 15:36
@oeaoaueaa still working on that one
planning on doing a maintenance release shortly thereafter
but we fixed a ton of other stuff in Akka.Cluster and other parts
that issue is nasty because it appears to be that two objects with the same data have different hashcode values because they're in different AppDomains - specifically the gossip versions of individual nodes
Jose Carlos Marquez
@oeaoaueaa
Apr 12 2017 15:37
we are downloading it now
Aaron Stannard
@Aaronontheweb
Apr 12 2017 15:37
I've been writing some gnarly tests that go through and systematically rule out different parts of that system, but I haven't had a chance to do more work on that since Friday
but that one and the port exhaustion issue with Akka.Cluster are the two bugs at the top of my personal list
but yep, wanted to let you know I haven't forgotten about it
:heart:
Jose Carlos Marquez
@oeaoaueaa
Apr 12 2017 15:39
thanks!
I thought that the "joining" issue could be caused by the boxes in that environment being in two different networks with their clocks off, but I tried in a newly built environment and was able to reproduce it, although it happens less often
Aaron Stannard
@Aaronontheweb
Apr 12 2017 15:42
the reason why the issue appears to be random
is a bit of a mystery to me
the best explanation I've come up with
is that the GetHashCode function used in the VectorClock will usually be consistent
but there's a chance it won't be
I've rewritten that chunk of code to be computed from the sum of the characters in the VectorClock name
rather than having it determined automatically by string.GetHashCode
since the latter can include random values potentially
Bartosz Sypytkowski
@Horusiath
Apr 12 2017 15:44
lol, wut?
The hash code itself is not guaranteed to be stable. Hash codes for identical strings can differ across versions of the .NET Framework and across platforms (such as 32-bit and 64-bit) for a single version of the .NET Framework. In some cases, they can even differ by application domain.
the last sentence is the killer
I'd had to deal with this before when we were using the hashcode for persistence stuff in a non-Akka project
so what you have to do is write the HashCode function to be something that is stable
Bartosz Sypytkowski
@Horusiath
Apr 12 2017 15:46
jesus fucking christ, and I thought that the fact that TypeHandle for the same Type differs between applications is crazy
Aaron Stannard
@Aaronontheweb
Apr 12 2017 15:47
I iterate over the array of characters in the string and compute the hashcode as a product of the bytes
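A minimal sketch of the idea described above: derive the hash from the string's own characters with a fixed algorithm, so the value does not depend on the runtime, bitness, or AppDomain. This is not the code from the actual fix. It swaps in FNV-1a over the UTF-8 bytes purely for illustration, and `StableHash` and `Compute` are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical helper; NOT the Akka.Cluster implementation.
public static class StableHash
{
    // Standard 32-bit FNV-1a constants.
    private const uint OffsetBasis = 2166136261;
    private const uint Prime = 16777619;

    // Returns the same value for the same string on every platform and in
    // every AppDomain, because it only depends on the UTF-8 bytes.
    public static int Compute(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        unchecked
        {
            uint hash = OffsetBasis;
            foreach (byte b in Encoding.UTF8.GetBytes(value))
            {
                hash ^= b;      // mix in the next byte
                hash *= Prime;  // wrap-around multiplication is intended
            }
            return (int)hash;
        }
    }
}
```

A type that needs a stable hash would call something like `StableHash.Compute(name)` from its own `GetHashCode` override instead of `name.GetHashCode()`.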
you know what the worst part is?
when I was reviewing the original Akka.Cluster port code for VectorClock three years ago
I noticed this line
and remembered the bad experience I had with hashcodes not being stable
and wondered if this would be an issue
when I dug into this on Friday I was like "ohhhhhhhhhhh yeah"
"whoops"
anyway, I'm not sure if this is it
Jose Carlos Marquez
@oeaoaueaa
Apr 12 2017 15:49
.net gotchas
Aaron Stannard
@Aaronontheweb
Apr 12 2017 15:49
since the hashcode value should translate to the same thing after being serialized back into each individual node's appdomain
buuuuuuuut since I don't know for sure
I'm testing it anyway
some of the logs I received from @oeaoaueaa and one other user related to that join bug
made it really clear that what's happening is the node that is joining always appears as a new entry in the leader's gossip
which indicates that the key for that node isn't being properly merged into the gossip, and stupid hashcode issues are exactly the sort of thing that can cause that
either that or someone implemented .equals incorrectly
or not at all
the latter being the case with the VectorClock class
fixed that in my local branch
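To illustrate why a missing `Equals` can make the same node show up as a new entry, here is a small sketch. The types `NoEquality`, `WithEquality`, and `EqualityDemo` are hypothetical stand-ins, not the real `VectorClock` or gossip classes; the dictionary only plays the role of a keyed collection being merged.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type that relies on default reference equality.
public sealed class NoEquality
{
    public string Node { get; }
    public NoEquality(string node) { Node = node; }
}

// Same data, but with value-based Equals and a deterministic GetHashCode.
public sealed class WithEquality : IEquatable<WithEquality>
{
    public string Node { get; }
    public WithEquality(string node) { Node = node; }

    public bool Equals(WithEquality other) =>
        other != null && string.Equals(Node, other.Node, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as WithEquality);

    public override int GetHashCode()
    {
        unchecked
        {
            int h = 17;
            foreach (char c in Node) h = h * 31 + c; // depends only on the characters
            return h;
        }
    }
}

public static class EqualityDemo
{
    // Writes two updates under keys built from identical data but as separate
    // instances (as happens after deserialization), then counts the entries.
    public static int CountAfterTwoUpdates<T>(Func<T> makeKey)
    {
        var entries = new Dictionary<T, int>();
        entries[makeKey()] = 1;
        entries[makeKey()] = 2;
        return entries.Count;
    }
}
```

With `NoEquality` the second update lands in a second entry; with `WithEquality` it overwrites the first, which is the merge behaviour one would expect.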
I have a pair of FsCheck model based tests I'm working on
that are designed to reproduce this issue and verify, exhaustively, that all combinations of merging and gossip updates work as expected
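A rough sketch of what a property-style check of merging could look like. It does not use FsCheck: it is a hand-rolled randomized loop with a seeded `Random`, and the "clock" is a simplified node-name-to-counter dictionary, not the real Akka.Cluster types. All names here (`MergeProperties`, `Merge`, `CountFailures`) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeProperties
{
    // Simplified merge: keep the higher counter for every node seen in either clock.
    public static Dictionary<string, long> Merge(
        IReadOnlyDictionary<string, long> a, IReadOnlyDictionary<string, long> b)
    {
        var result = a.ToDictionary(kv => kv.Key, kv => kv.Value);
        foreach (var kv in b)
        {
            long existing;
            if (!result.TryGetValue(kv.Key, out existing) || kv.Value > existing)
                result[kv.Key] = kv.Value;
        }
        return result;
    }

    public static bool SameEntries(
        IReadOnlyDictionary<string, long> a, IReadOnlyDictionary<string, long> b) =>
        a.Count == b.Count &&
        a.All(kv => b.TryGetValue(kv.Key, out var v) && v == kv.Value);

    private static Dictionary<string, long> RandomClock(Random rng)
    {
        var clock = new Dictionary<string, long>();
        int size = rng.Next(0, 4);
        for (int i = 0; i < size; i++)
            clock["node-" + rng.Next(0, 4)] = rng.Next(1, 10);
        return clock;
    }

    // Checks commutativity, idempotence and associativity on random inputs
    // and returns how many of the `runs` cases violated at least one of them.
    public static int CountFailures(int seed, int runs)
    {
        var rng = new Random(seed);
        int failures = 0;
        for (int i = 0; i < runs; i++)
        {
            var a = RandomClock(rng);
            var b = RandomClock(rng);
            var c = RandomClock(rng);

            bool commutative = SameEntries(Merge(a, b), Merge(b, a));
            bool idempotent = SameEntries(Merge(a, a), a);
            bool associative = SameEntries(Merge(Merge(a, b), c), Merge(a, Merge(b, c)));

            if (!(commutative && idempotent && associative)) failures++;
        }
        return failures;
    }
}
```

A library like FsCheck would add input shrinking and smarter generators on top of this; the loop only shows the shape of the properties being checked.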
Jose Carlos Marquez
@oeaoaueaa
Apr 12 2017 15:58
would it help if I log the hash code in each node in the environment where this happens?
Aaron Stannard
@Aaronontheweb
Apr 12 2017 15:58
offhand, I'm not sure if you can get access to this object
since it's part of the messages that the internal cluster daemons use
Bartosz Sypytkowski
@Horusiath
Apr 12 2017 15:59
@Aaronontheweb FsCheck could potentially detect this if used across AppDomains... eventually
Aaron Stannard
@Aaronontheweb
Apr 12 2017 15:59
@Horusiath my plan is to randomize the hashcodes and see what happens
voltcode
@voltcode
Apr 12 2017 16:15
congrats on 1.2 !!!
just got the e-mail
will be testing soon
posted to HN to get us more exposure! upvotes everybody!
Franky Ostyn
@FOstyn
Apr 12 2017 16:22
@Aaronontheweb
Congrats Aaron. Thanks, you and everyone, for the great job. :+1:
Let's try it now :fire:
Roman Alexis Anastasini
@foliba
Apr 12 2017 16:52
Yay for 1.2!
Kudos :-) great job indeed.
Alex Gibson
@crucifieddreams
Apr 12 2017 18:04
Coordinated shutdown is a welcome addition, thanks for that Aaron and the team :+1: . Looking forward to seeing the result of your hashcode tests.
Maciek Misztal
@mmisztal1980
Apr 12 2017 19:18
grats!
@Aaronontheweb did you guys implement the DowningProviders yet?
Bartosz Sypytkowski
@Horusiath
Apr 12 2017 19:24
@mmisztal1980 there is an API exposed, but we haven't created any impl besides auto-down yet. Usually they are pretty easy to make, maybe just "slightly" more irritating to write meaningful tests for
Maciek Misztal
@mmisztal1980
Apr 12 2017 19:32
@Horusiath Q: about the ssl support in DotNetty, does the config model support reading the certificate from an Azure KeyVault?
Bartosz Sypytkowski
@Horusiath
Apr 12 2017 19:33
I don't know how this works in Azure ;)
but underneath it's the standard way of loading an X509Certificate2 (path to pfx + password + flags)
Maciek Misztal
@mmisztal1980
Apr 12 2017 22:04
@Horusiath is it plugin based? can it be extended if needed?
Aaron Stannard
@Aaronontheweb
Apr 12 2017 22:11
@mmisztal1980 you know, that's a good question
DotNetty itself is very extensible
with Azure KeyVault, is it possible to pull the key out and put it somewhere where the file system can reach it?
haven't worked with it at all myself
that's what the TLS settings take
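A sketch of the "put it where the file system can reach it" idea, assuming the certificate can be obtained as raw PFX bytes from wherever it is stored. How to fetch those bytes from Azure KeyVault is not shown and not assumed here. `PfxFile`, `WriteToTempFile`, and `Load` are hypothetical names; the only framework call relied on is the `X509Certificate2(string, string, X509KeyStorageFlags)` constructor.

```csharp
using System;
using System.IO;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper; not part of Akka.NET or DotNetty.
public static class PfxFile
{
    // Writes PFX bytes obtained elsewhere to a uniquely named temp file and
    // returns its path, so a path-based TLS setting can point at it.
    public static string WriteToTempFile(byte[] pfxBytes)
    {
        if (pfxBytes == null) throw new ArgumentNullException(nameof(pfxBytes));

        string path = Path.Combine(
            Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".pfx");
        File.WriteAllBytes(path, pfxBytes);
        return path;
    }

    // The path + password + flags way of loading a certificate.
    public static X509Certificate2 Load(
        string path, string password, X509KeyStorageFlags flags)
    {
        return new X509Certificate2(path, password, flags);
    }
}
```

A private key written to disk this way should be deleted once it is no longer needed, and the temp directory's permissions are worth checking first.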
Aaron Stannard
@Aaronontheweb
Apr 12 2017 22:32
ah dear god
I don't know what the odds of this happening are, but found an easy edge case that could explain that earlier hashcode problem
From UniqueAddress
        /// <inheritdoc cref="object.GetHashCode"/>
        public override int GetHashCode()
        {
            return Uid;
        }
if that UID is the same for two nodes
that would totally cause a problem
the UID is supposed to be unique per-actor system
and should be able to change across restarts
I doubt that's it
since the UIDs have been in use for a long time
but I know there's an instance when a node is first joining where its UID is unknown initially
that could do it
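Whether a shared or placeholder UID actually causes trouble depends on how `Equals` is written, which the snippet above doesn't show. The sketch below uses a hypothetical `NodeKey` whose `GetHashCode` returns the UID as in that snippet, and it assumes an `Equals` that compares both address and UID. Under that assumption, an equal hash code alone does not merge two keys, but the same address with two different UIDs (for example an initial placeholder and the real one) counts as two nodes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type; the Equals implementation is an assumption.
public sealed class NodeKey : IEquatable<NodeKey>
{
    public string Address { get; }
    public int Uid { get; }

    public NodeKey(string address, int uid)
    {
        Address = address;
        Uid = uid;
    }

    public bool Equals(NodeKey other) =>
        other != null &&
        Uid == other.Uid &&
        string.Equals(Address, other.Address, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as NodeKey);

    // Same approach as the snippet above: the hash code is just the UID.
    public override int GetHashCode() => Uid;
}

public static class UidDemo
{
    // Number of keys a hash-based collection treats as different.
    public static int DistinctKeys(params NodeKey[] keys) =>
        new HashSet<NodeKey>(keys).Count;
}
```

For example, `UidDemo.DistinctKeys(new NodeKey("a:1", 0), new NodeKey("a:1", 7))` counts two keys for what is really one node, while two different addresses that share a UID are still kept apart by `Equals`.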