These are chat archives for akkadotnet/akka.net

13th
Dec 2016
Daniel Little
@lavinski_twitter
Dec 13 2016 00:04
It also seems to be causing a problem with Persist, which is proving difficult to debug
Aaron Stannard
@Aaronontheweb
Dec 13 2016 00:45
would anyone here find an FsCheck + Akka.TestKit integration useful?
currently using one I rolled for a project internally, hadn't thought about OSSing it
destroys and recreates the ActorSystem between each setup of the FsCheck state machine model
Aaron Stannard
@Aaronontheweb
Dec 13 2016 00:50
@to11mtm > Really can't overstate how the proj has done more to re-invigorate us about developing for .NET than anything Microsoft has attempted to do the last 3 years.
this is the highest praise I can imagine for those of us who've worked on the project
thank you. It's done the same for us too.
(well, I assume it has :p )
Daniel Little
@lavinski_twitter
Dec 13 2016 01:47
@Horusiath Figured out the issue I was having with Persist: having a "return! loop state" at the bottom of the function after return Persist(x) causes everything to (silently) stop working
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 07:07
@lavinski_twitter regarding Ask - there's a <? operator which works in a similar way (you can specify a global timeout using the akka.actor.ask-timeout config). Regarding persistence - there is an example here you could use as a reference. But indeed, the stopping behavior sounds strange. Could you provide code to reproduce it?
Arsene Tochemey GANDOTE
@Tochemey
Dec 13 2016 07:07
Please, any idea when the new release of Akka.NET (with .NET Core support) will be out?
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 07:08
@Tochemey afaik we still need to change 2 things at least: 1) fix a very tricky bug in akka-io 2) move to DotNetty as a transport layer
Arsene Tochemey GANDOTE
@Tochemey
Dec 13 2016 07:09
Wow
So let us say next year.
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 08:41
@Horusiath I am stuck with my configuration right now. If I use the non-clustered configuration like
var jobCoordinatorRouter = batchJobActorSystem.ActorOf(batchJobActorSystem.DI().Props<JobCoordinatorActor>().WithRouter(new RoundRobinPool(5)), "JobCoordinator") everything runs as expected. On the other hand if I change the code to
jobCoordinatorRouter = batchJobActorSystem.ActorOf(batchJobActorSystem.DI().Props<JobCoordinatorActor>().WithRouter(FromConfig.Instance), "JobCoordinator") using the following configuration:
deployment {
  /JobCoordinator {
    router = round-robin-pool
    nr-of-instances = 20
    cluster {
      enabled = on
      allow-local-routees = on
      use-role = prototype
      max-nr-of-instances-per-node = 5
    }
  }
}
I receive dead letters and my application stops processing. I do not understand why the behaviour of my application changes if I only make the changes stated above? I mean I do not add another node to the cluster yet, everything is still running on a single node. Any help would be highly appreciated.
W.Zwiers
@woozer
Dec 13 2016 09:03
Hi all, question
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 09:03
@claudiobernasconi I see that you have allow-local-routees=on and use-role=prototype. Does your batchJobActorSystem have a prototype role?
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 09:04
@Horusiath this is my cluster configuration:
cluster {
  seed-nodes = ["akka.tcp://BatchJobActorSystem@127.0.0.1:4053"]
  roles = ["prototype"]
}
W.Zwiers
@woozer
Dec 13 2016 09:05
akka runs fine, except when I use Tell with at-least-once messaging: my Windows server keeps using a lot of memory after my message run (150000 messages). Messages get delivered, but the memory is not released after the run for some reason.
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 09:06
@claudiobernasconi I assume, you have followed the cluster router docs?
@woozer are you confirming messages sent using Deliver()?
W.Zwiers
@woozer
Dec 13 2016 09:09
@claudiobernasconi Yes, i use ConfirmDelivery(deliveryId); (returns true)
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 09:09
@Horusiath At least I read it twice and double-checked whether I made some obvious errors applying it to my solution. But because I have no experience in using Akka.Net I would still say there is a possibility that I did something wrong. What I do not understand is why the behaviour changes if I switch these statements. I thought that it would run just the same as long as I do not add a second node to the cluster, which is not the case. If it would help I could upload my source code. There is not much code yet.
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 09:13
@claudiobernasconi I'll make a sample
@woozer can you share the code on gist? Maybe it's a bug
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 09:18
@Horusiath All right, it seems like I did an obvious mistake then?
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 09:41
@claudiobernasconi I've got a simple sample: https://gist.github.com/Horusiath/7337490cba3740bbe8e6f33aa699f348
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 09:57
@Horusiath Thank you very much for your example. I adapted that sample to my solution, but I still get those dead letters. Seems like setting up the cluster is fine, but I have some problems within my actors. I'll show you the code of my JobCoordinatorActor as well as my WorkerActor.
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 09:57
ok
ignore the protected override void Unhandled(object message)
that was just a test
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 10:01
@Horusiath here it is: https://gist.github.com/claudiobernasconi/cd9460cdd39a13eb96887748a40d87ff Maybe I did something obviously wrong there?
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 10:03
@woozer are you aware that ReceiveRecover will be called each time your actor is started/restarted? That means if you've persisted 10 000 IRemoteMessage requests and restart that actor, it will replay all those events and call Deliver for each one of them
I've created sample of at-least-once delivery actor back in the days
W.Zwiers
@woozer
Dec 13 2016 10:05
I assume receive recover will not be called if messages are successfully confirmed?
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 10:05
@Horusiath This is the monitor view (you'll recognize that I send messages to a monitor in the code): https://www.dropbox.com/s/swistjmjo6ix0zk/monitor.png?dl=0 You can see that it processes about 5k messages before it stops. And you can see that it stops at random points.
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 10:07
@woozer receive recover is not associated with message confirmation. It will simply replay all persisted events that have not been explicitly removed.
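A minimal sketch of why recovery replays everything, assuming the AtLeastOnceDeliveryActor base class from Akka.Persistence. The MessageSent, MessageConfirmed, Envelope and Ack types and the ReliableSender actor are hypothetical names made up for illustration. Both the "sent" and the "confirmed" facts are persisted as events, so on restart the actor replays both and only redelivers what was never confirmed; the events themselves stay in the journal until they are explicitly deleted.

```csharp
using Akka.Actor;
using Akka.Persistence;

// Hypothetical event and message types, for illustration only.
public sealed class MessageSent
{
    public MessageSent(string payload) { Payload = payload; }
    public string Payload { get; }
}

public sealed class MessageConfirmed
{
    public MessageConfirmed(long deliveryId) { DeliveryId = deliveryId; }
    public long DeliveryId { get; }
}

public sealed class Envelope
{
    public Envelope(long deliveryId, string payload) { DeliveryId = deliveryId; Payload = payload; }
    public long DeliveryId { get; }
    public string Payload { get; }
}

public sealed class Ack
{
    public Ack(long deliveryId) { DeliveryId = deliveryId; }
    public long DeliveryId { get; }
}

public class ReliableSender : AtLeastOnceDeliveryActor
{
    private readonly ActorPath _destination;

    public ReliableSender(ActorPath destination) { _destination = destination; }

    public override string PersistenceId => "reliable-sender"; // assumed id

    protected override bool ReceiveCommand(object message)
    {
        var text = message as string;
        if (text != null)
        {
            Persist(new MessageSent(text), e => UpdateState(e));
            return true;
        }

        var ack = message as Ack;
        if (ack != null)
        {
            // Persist the confirmation too, so that it is replayed on recovery.
            Persist(new MessageConfirmed(ack.DeliveryId), e => UpdateState(e));
            return true;
        }

        return false;
    }

    // Called for every persisted event each time the actor (re)starts.
    protected override bool ReceiveRecover(object message)
    {
        return UpdateState(message);
    }

    private bool UpdateState(object evt)
    {
        var sent = evt as MessageSent;
        if (sent != null)
        {
            Deliver(_destination, deliveryId => new Envelope(deliveryId, sent.Payload));
            return true;
        }

        var confirmed = evt as MessageConfirmed;
        if (confirmed != null)
        {
            ConfirmDelivery(confirmed.DeliveryId);
            return true;
        }

        return false;
    }
}
```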
W.Zwiers
@woozer
Dec 13 2016 10:10
My debug memory snapshot has 38818 LinkedListNode<Akka.Persistence.IPersistentRepresentation> items, so I guess akka persists my message data
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 10:11
if you're using the in-memory journal (which is the default), it will "persist" every event in a linked list
but as akka.persistence uses eventsourcing, it will never remove them
see the example I've attached, I'm optimizing persistence there by creating snapshots every X persisted events. Also, when a snapshot save is confirmed, I'm removing all events prior to it
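The snapshot-then-delete idea could look roughly like this. It is a sketch, not the attached sample: CounterActor, Increment, Incremented and the interval of 100 are hypothetical, and it assumes the single-argument DeleteMessages(toSequenceNr) overload (older Akka.Persistence versions may expect an extra parameter).

```csharp
using System;
using Akka.Persistence;

// Hypothetical command and event types, for illustration only.
public sealed class Increment { }
public sealed class Incremented { }

public class CounterActor : ReceivePersistentActor
{
    // The "X" from the message above; 100 is an arbitrary value.
    private const int SnapshotInterval = 100;
    private int _count;

    public override string PersistenceId => "counter-1"; // assumed id

    public CounterActor()
    {
        // Recovery starts from the latest snapshot (if any) ...
        Recover<SnapshotOffer>(offer =>
        {
            _count = Convert.ToInt32(offer.Snapshot);
        });

        // ... and then replays only the events that are still in the journal.
        Recover<Incremented>(evt =>
        {
            _count++;
        });

        Command<Increment>(cmd =>
        {
            Persist(new Incremented(), evt =>
            {
                _count++;
                if (LastSequenceNr % SnapshotInterval == 0)
                {
                    SaveSnapshot(_count);
                }
            });
        });

        // Once the snapshot is safely stored, the events it covers can go.
        Command<SaveSnapshotSuccess>(success =>
        {
            DeleteMessages(success.Metadata.SequenceNr);
        });
    }
}
```

With this shape the journal only ever holds the events written since the last confirmed snapshot, which is what keeps the in-memory linked list from growing without bound.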
W.Zwiers
@woozer
Dec 13 2016 10:14
i'll take a look at it now
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 10:22
@claudiobernasconi
  • for sure you don't need to inject ILogger ;) Akka has built-in support for logging - to initialize the logger, just call Context.GetLogger().
  • also I don't think that Context.ActorSelection("akka.tcp://MonitorActorSystem@127.0.0.1:8091/user/JobMonitor") is a good option. It would be better to send a direct actor ref there (but you'd need to stop using DI for it)
also if messages are dead lettered, it would be great to see what dead letters are logging here
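A small sketch combining both suggestions, assuming the standard Akka.Event API: the actor gets its logger from Context.GetLogger() and subscribes to DeadLetter on the event stream so that dead-lettered messages show up in the log. The DeadLetterLogger name is made up for illustration.

```csharp
using Akka.Actor;
using Akka.Event;

// Hypothetical helper actor: logs every dead letter published on the event stream.
public class DeadLetterLogger : ReceiveActor
{
    // Built-in logging adapter, no DI needed.
    private readonly ILoggingAdapter _log = Context.GetLogger();

    public DeadLetterLogger()
    {
        Receive<DeadLetter>(dl =>
        {
            _log.Warning("Dead letter {0} from {1} to {2}", dl.Message, dl.Sender, dl.Recipient);
        });
    }

    protected override void PreStart()
    {
        // Ask the event stream to forward all DeadLetter events to this actor.
        Context.System.EventStream.Subscribe(Self, typeof(DeadLetter));
    }
}
```

Starting it once per actor system, e.g. system.ActorOf(Props.Create(() => new DeadLetterLogger()), "dead-letter-logger"), should be enough.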
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 10:29
@Horusiath Thank you very much for your feedback. I'll refactor my code to use the built in logger and I'll try to log those dead lettered messages so that I can show them to you. Just one more question for now: Do you consider DI a bad thing using Akka.Net? In our industry we are very used to use DI all over in desktop application and even in web backends. I understand that if I want to pass a IActorRef to the JobCoordinatorActor and the WorkerActor that I cannot use DI anymore.
W.Zwiers
@woozer
Dec 13 2016 10:49
@Horusiath which version of akka does your sample use? IsRecovering in the UpdateState method is an invalid parameter
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 10:51
Sorry to ask this question, but... is it not possible to log into a file on a local hard drive with the built-in logger? Seems like I have to use NLog or another library to log to the filesystem, right?
W.Zwiers
@woozer
Dec 13 2016 10:53
@claudiobernasconi I use DI to inject my own logger in my actors
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 10:54
@woozer this is exactly what I did before, but @Horusiath suggested using the built-in logger support, so I decided to go down this road. I'll give NLog a try then. If it's set up correctly I'll get many more features so I won't complain =)
W.Zwiers
@woozer
Dec 13 2016 10:55
True ;) good luck, should work, i had to use our existing logger
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 10:58
@woozer Thanks. Seems like I currently run into every possible problem. I'll need some luck. After installing the Akka.Logger.NLog NuGet package I get a warning: The referenced component Akka.Logger.NLog could not be found. All other packages work fine.
Arjen Smits
@Danthar
Dec 13 2016 11:02
@claudiobernasconi @woozer dont use DI to inject your custom logger. Use one of the many integrations available for akka.net.
W.Zwiers
@woozer
Dec 13 2016 11:03
Will put that on my backlog
Arjen Smits
@Danthar
Dec 13 2016 11:03
@woozer And even if you have a custom logger, it's not that hard to build your own integration for it. Just look at one of the existing integrations
if done right, it should not be more than 10 lines of code
so to speak :P
W.Zwiers
@woozer
Dec 13 2016 11:04
i'm using a local nuget package for logging, i'll have to look into it
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 11:04
@Danthar thanks. I will definitely go down that road. All I want is a log on my filesystem. NLog seems to have problems with my Visual Studio solution right now. I'll have to try another one.
Arjen Smits
@Danthar
Dec 13 2016 11:05
@claudiobernasconi I prefer serilog myself because of the ease of configuration.
In regards to your question about using DI and Akka.
To be perfectly honest, i wish we never provided those integrations.
Conceptually, if you want to properly support DI in Akka in a way that is idiomatic
it means quite a few DI frameworks simply won't make the cut
and
you are going to clash on a conceptual level with how an Actor framework works
The first place where you notice this mismatch between concepts is when you find yourself wanting to inject an IActorRef
I still use DI in some places, but when I do, it's via the ServiceLocator pattern
and in such a way that I'm either not responsible for the lifecycle of the dependency, regardless of what my actor does, or I take full responsibility
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 11:12
@Danthar Seems pretty reasonable to me. I'll try to avoid DI for my prototype then. What is the preferred way to provide constructor arguments like IActorRef if I do not use DI? Actors are created by the Props mechanism and the call to the system.ActorOf() method, so how do I provide the actual arguments for the actors being created?
Arjen Smits
@Danthar
Dec 13 2016 11:13
Via the Props object. It's in the docs. But I'll write a quick sample. Hang on
        Props.Create(() => new MyActor("test", true, 3));
        Props.Create<MyActor>("test", true, 3);
        system.ActorOf<MyActor>();
It's all done via Props. The last line is simply showing off a convenience method for when your actor has no constructor args
the same way I'm passing a string, bool and int as constructor args, you can obviously pass an IActorRef
It works the same with child actors, only then you're calling Context.ActorOf(childprops) instead of system.ActorOf()
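For the IActorRef case specifically, a sketch under the same approach might look like this. MonitorActor, WorkerActor and the actor/system names are hypothetical stand-ins, not the code from the gist.

```csharp
using Akka.Actor;

// Hypothetical actors, for illustration only.
public class MonitorActor : ReceiveActor
{
    public MonitorActor()
    {
        Receive<string>(status => { /* record the status somewhere */ });
    }
}

public class WorkerActor : ReceiveActor
{
    // The dependency arrives as a plain constructor argument.
    public WorkerActor(IActorRef monitor)
    {
        Receive<string>(job =>
        {
            monitor.Tell("done: " + job);
        });
    }
}

public static class Program
{
    public static void Main()
    {
        var system = ActorSystem.Create("demo");

        var monitor = system.ActorOf(Props.Create(() => new MonitorActor()), "monitor");

        // The IActorRef is passed through Props like any other constructor argument.
        var worker = system.ActorOf(Props.Create(() => new WorkerActor(monitor)), "worker");

        worker.Tell("job-1");
    }
}
```

Using the lambda form keeps the constructor call visible to refactoring tools, since it is an ordinary constructor expression.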
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 11:17
Are the first and the second line resulting in the same behaviour? I guess I'll use the first one so that I can use tools like ReSharper to find any constructor usages.
Arjen Smits
@Danthar
Dec 13 2016 11:23
Yes they are the same
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 11:31
@Danthar Thank you very much. At the moment I am trying to integrate log4net as my logger. How do I have to provide the configuration for log4net? Seems like I cannot find any documentation. In the HOCON file? If so, how? Which are the available settings? Sorry for questions like these, it just seems like I have been walking backwards for a few days. :-/
Arjen Smits
@Danthar
Dec 13 2016 11:31
The same as any logger integration. As explained here: http://getakka.net/docs/Logging
That documentation is not complete, as in, it does not list every logging integration there is, but it's the same for each logger
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 11:44
@woozer Eventsourced.IsRecovering - every PersistentActor (also AtLeastOnceDeliveryActor) has this built in
W.Zwiers
@woozer
Dec 13 2016 11:45
@Horusiath i meant: .With<MessageSent<T>>(sent => Deliver(sent.Recipient.Path, deliveryId => new Delivery<T>(sent.Payload, deliveryId), IsRecovering))
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 11:45
eh, my bad - this parameter is not necessary inside Deliver
it's recognized implicitly
W.Zwiers
@woozer
Dec 13 2016 11:45
perhaps in an early version ;P
Kris Schepers
@schepersk
Dec 13 2016 12:06
hmmm.. I don't like seeing this in my production logs.. Ideas?
2016-12-12 21:16:10.3435|WARN|Akka.Persistence.Journal.ReplayFilter|Invalid replayed event [3] in wrong order from writer [] with PersistenceId [LoonbonVerwerkingAR@7f7af3f0-9aa8-4ac3-ab1f-047b74d2a13d]|
2016-12-12 21:16:10.3435|WARN|Akka.Persistence.Journal.ReplayFilter|Invalid replayed event [2] in wrong order from writer [] with PersistenceId [LoonbonVerwerkingAR@7f7af3f0-9aa8-4ac3-ab1f-047b74d2a13d]|
@Aaronontheweb Well, the issues aren't solved yet, but I followed @Horusiath's advice and added logging for the dead letter messages. Since then, the system has behaved as expected. So we're waiting for the problem to reappear :-)
Kris Schepers
@schepersk
Dec 13 2016 12:11
But if anything else, the cluster is a very fragile thing.. it falls apart regularly. Might be load on the underlying infrastructure, or network issues others are also experiencing.
Team members are losing confidence though..
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 12:16
@schepersk what are the reasons of cluster failures?
Kris Schepers
@schepersk
Dec 13 2016 12:22
@Horusiath This is only part of our problems at the moment, but cluster failures seem to be mostly related to load on servers and unexpected node shutdowns.
But I'm more worried about the persistence errors above..
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 12:30
@schepersk this may be a solution for your problem: akkadotnet/akka.net#2374
the possible reason is that your events are replayed out of order from the database, while this order must be maintained
this will be fixed in the upcoming release (also the new, faster batching SQL journal should be out by then), but for now you can create your custom journal and override the query executor's ByPersistenceIdSql property by adding ORDER BY SequenceNr ASC at the end of it.
Kris Schepers
@schepersk
Dec 13 2016 12:37
Hmm, seems like a big fat bug then.. Might also explain why I'm seeing lots of these out of order errors when restarting a node with cluster sharding..
So, in many cases all goes well, and sometimes they are out of order.. weird
Custom journal, sure, but that hit me in the face more than once :-)
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 12:39
I've proposed temporary solution, until the next release kicks in ;P
Kris Schepers
@schepersk
Dec 13 2016 12:40
@Horusiath Don't get me wrong eh, your help is much appreciated!
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 12:41
don't worry ;)
Kris Schepers
@schepersk
Dec 13 2016 13:03
:blush:
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 13:03

All right guys, I know I ask many questions, but I will contribute to the community and write a few blog posts for other beginners afterwards. For 2 hours now I have been trying to set up log4net as my logger. My configuration looks like: loggers = ["Akka.Logger.log4net.Log4NetLogger, Akka.Logger.log4net"] and the section of my log4net configuration looks like:

<log4net>
  <root>
    <level value="DEBUG" />
    <appender-ref ref="Prototype"/>
  </root>
  <appender name="Prototype" type="log4net.Appender.RollingFileAppender" >
    <file value="D:\\temp\\Prototype.log" />
    <appendToFile value="true" />
    <lockingModel type="log4net.Appender.FileAppender+MinimalLock" />
    <rollingStyle value="Date" />
    <maximumFileSize value="2048KB" />
    <datePattern value="yyyyMMdd" />
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%date [%thread] %level %logger - %message%newline" />
    </layout>
  </appender>
</log4net>

Problem: No log file written, no error, no clue where to look at.

Kris Schepers
@schepersk
Dec 13 2016 13:03
any idea when this new release will be released? ;-)
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 13:08
I checked that the logger is found. If I change the name of the logger class in the HOCON config it throws an exception as expected. So the configuration looks ok. If I remove the logger configuration in the HOCON, everything logs to the console as expected as well. But I cannot log to a file, which would be necessary to provide details for my problem with the dead lettered messages.
Kris Schepers
@schepersk
Dec 13 2016 13:20
@claudiobernasconi Did you specify the loglevel in HOCON?
akka {
  loglevel = DEBUG
  loggers = ["Akka.Logger.NLog.NLogLogger, Akka.Logger.NLog"]
}
Can you log from your own code to the log file?
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 13:22
@schepersk this is my App.config containing both the Akka.net HOCON configuration and the Log4net configuration sections: https://gist.github.com/claudiobernasconi/4acbe62a7bd290e41efd1cabc1d09d4c
Any custom log messages appear on the console, if I remove the "loggers" entry of the HOCON configuration but do not show up in the file (because there won't be a file written to disc).
Kris Schepers
@schepersk
Dec 13 2016 13:33
From what I remember when configuring logging (using NLog) is that when your messages still appear in the console when you've not defined a console "target", the "custom" logger isn't being used..
Have you imported all the required nuget packages?
Claudio Bernasconi
@claudiobernasconi
Dec 13 2016 13:46
@schepersk Yes I have installed Akka.Logger.log4net which brings all the dependencies with it. The log messages disappear in the console which indicates that they go "somewhere", but the expected log file won't be written to the disc.
W.Zwiers
@woozer
Dec 13 2016 13:54
@Horusiath making progress, but I get an error when I start my service: akka can't access my snapshot file, it is in use according to the logging.... clues?
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 13:56
@woozer have you configured snapshot store?
W.Zwiers
@woozer
Dec 13 2016 13:57
no, I left the config at the default, it is trying to write to my debug directory
I see some snapshot files there though
e.g. snapshot-AtLeastOnce-599-636172336353221307
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 13:59
maybe you've run a process several times?
W.Zwiers
@woozer
Dec 13 2016 14:00
no, it is running in debug mode, no multiple instances
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 14:01
does the error say it cannot find a snapshot file, or that it can't read from it?
W.Zwiers
@woozer
Dec 13 2016 14:01
can't read, it is in use by another process........ windows
some thread issue perhaps?
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 14:03
it's strange
W.Zwiers
@woozer
Dec 13 2016 14:06
does not crash, the service continues after the error
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 14:08
if I remember correctly there was some sort of fallback mechanism: if the latest snapshot can't be recovered, it will try a previous one - but that's also a fragile case, as if you removed some events from the journal just before the latest confirmed snapshot, you may miss some data
In general I wouldn't advise using the memory journal or local snapshot store for anything other than dev/testing
Bartosz Sypytkowski
@Horusiath
Dec 13 2016 14:19
I can think of 1 potential race that can happen, although I'd need to check if that's the case:
  1. Actor A sends a SaveSnapshot request
  2. Snapshot store receives a SaveSnapshot request
  3. Snapshot store creates a new file and starts writing snapshot data to it
  4. In the meantime Actor A restarts
  5. Actor A starts the recovery procedure by sending a LoadSnapshot request
  6. Snapshot store receives the LoadSnapshot request
  7. At this point writing data to disk has not yet ended. Yet a snapshot file is already created, and data is being written to it. But read/write may happen in parallel.
  8. Snapshot store sees that the latest snapshot is the one it is currently writing to. But it's not checking if writing has ended. It tries to read a file that is locked for write. This is what can cause the failure.
I'm not 100% sure the issue works that way. But if it does, a change in the impl would be needed. The most reliable solution would be simply writing into a temporary file (i.e. add a .tmp to the end of the snapshot file name) and moving it to the valid path once the write has been confirmed. Whelp: it already works that way xD
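The write-to-temp-then-move technique in general form, as a sketch using only System.IO. This is not the actual Akka.NET local snapshot store code; SnapshotFileWriter and WriteThenPublish are hypothetical names.

```csharp
using System.IO;

// Hypothetical helper illustrating the technique, not Akka.NET's implementation.
public static class SnapshotFileWriter
{
    public static void WriteThenPublish(string finalPath, byte[] data)
    {
        // Readers only ever look for finalPath, so they never see a half-written file.
        var tempPath = finalPath + ".tmp";

        File.WriteAllBytes(tempPath, data);

        // Make the file visible under its real name only after the write has finished.
        // Assumes finalPath does not exist yet (snapshot file names are unique per
        // sequence number and timestamp); File.Move throws if it already does.
        File.Move(tempPath, finalPath);
    }
}
```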
W.Zwiers
@woozer
Dec 13 2016 14:26
Thanks for looking, i'm debugging to see if i can find the error
Sean Gilliam
@sean-gilliam
Dec 13 2016 18:58
@claudiobernasconi Does the application have permissions to write to the directory specified in the config (D:\temp) ?
if log4net can't write to the file, then it will drop the message
Sean Farrow
@SeanFarrow
Dec 13 2016 20:26
@all, a quick PipeTo query: if I pipe the result of a task back to Self, does the piped message go to the front of the mailbox, or can other sent messages be processed before the task result?
Aaron Stannard
@Aaronontheweb
Dec 13 2016 20:54
other messages can be processed @SeanFarrow
big part of the idea behind PipeTo is that the actor can be free to process other types of work while it waits for a response to come back
the result of the PipeTo operation is queued onto the back of the actor's mailbox
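A minimal sketch of that pattern, assuming the PipeTo extension method from Akka.Actor. FetcherActor, FetchResult and FetchAsync are hypothetical, and the Task.Delay stands in for real I/O.

```csharp
using System.Threading.Tasks;
using Akka.Actor;

// Hypothetical result message, for illustration only.
public sealed class FetchResult
{
    public FetchResult(string body) { Body = body; }
    public string Body { get; }
}

public class FetcherActor : ReceiveActor
{
    public FetcherActor()
    {
        Receive<string>(url =>
        {
            // Start the task and return immediately; the actor is free to handle
            // whatever else is in its mailbox while the task runs.
            FetchAsync(url).PipeTo(Self);
        });

        // The task result arrives later as an ordinary message, queued behind
        // anything that reached the mailbox before it.
        Receive<FetchResult>(result =>
        {
            // handle result.Body here
        });
    }

    // Stand-in for a real asynchronous operation.
    private static async Task<FetchResult> FetchAsync(string url)
    {
        await Task.Delay(100);
        return new FetchResult("response for " + url);
    }
}
```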
Andrew Young
@ayoung
Dec 13 2016 21:32
@Aaronontheweb can you send me some information about how you guys are testing performance? specifically how you ensure that changes made to code aren't negatively impacting performance.
Aaron Stannard
@Aaronontheweb
Dec 13 2016 21:33
@ayoung we use NBench for this https://github.com/petabridge/NBench
Andrew Young
@ayoung
Dec 13 2016 21:33
awesome. thanks.
Aaron Stannard
@Aaronontheweb
Dec 13 2016 21:33
basically we set a floor for the performance and generate a test failure if we miss the performance target
Andrew Young
@ayoung
Dec 13 2016 21:34
how do you come up with that initial target value?
Aaron Stannard
@Aaronontheweb
Dec 13 2016 21:34
observed how fast it ran and set a baseline I guess
NBench produces output reports that we save as artifacts on our build server
you can see all of the markdown files with raw data, trace events, averages, and per-second measurements
so you can run NBench without an assertion
which is how most of our performance specs run
and use that to gather a rough baseline
Andrew Young
@ayoung
Dec 13 2016 21:37
let me play around with that a bit.
Andrew Young
@ayoung
Dec 13 2016 21:45
@Aaronontheweb the target values must be tied to a specific environment setup right?
Aaron Stannard
@Aaronontheweb
Dec 13 2016 21:46
yeah
ours are calibrated to our build server
Andrew Young
@ayoung
Dec 13 2016 21:46
k
Aaron Stannard
@Aaronontheweb
Dec 13 2016 21:46
you'll get different results on machines that have different workloads and hardware
Andrew Young
@ayoung
Dec 13 2016 21:46
do you ever get false positives?
Aaron Stannard
@Aaronontheweb
Dec 13 2016 21:46
eventually we're going to put a SaaS service behind NBench that factors in hardware profiles and relative change
yeah, we do
Andrew Young
@ayoung
Dec 13 2016 21:47
how do you deal with those
Aaron Stannard
@Aaronontheweb
Dec 13 2016 21:47
found an NBench bug that's responsible for generating really low benchmark values
in some edge cases
(long story)
the best way to deal with false positives is to try to increase the number of times the benchmark is run
so you get an average
benchmarking concurrent code is difficult
Andrew Young
@ayoung
Dec 13 2016 21:47
well, i guess i'm saying that even between multiple runs, you might get slightly different results.
i see.
Aaron Stannard
@Aaronontheweb
Dec 13 2016 21:48
because you can't completely eliminate some of the unpredictability of the OS
you can specify on the NBench attributes how many times you want it to run
I recommend using a prime number in the double digits for benchmarks that are fairly simple
single digits for benchmarks that run for a long period of time
basically you want to use the law of averages to help smooth out the graph over time
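For reference, a throughput spec with a performance floor might be sketched like this, following the attribute style from the NBench README. The counter name, the 13 iterations and the 1,000,000 ops/sec floor are made-up values; a real floor would come from baseline runs on your own hardware.

```csharp
using NBench;

public class CounterThroughputSpec
{
    private Counter _counter;

    [PerfSetup]
    public void Setup(BenchmarkContext context)
    {
        // "Operations" is an arbitrary counter name; it must match the assertion below.
        _counter = context.GetCounter("Operations");
    }

    // 13 iterations: a double-digit prime, as suggested above for simple benchmarks.
    [PerfBenchmark(
        Description = "Hypothetical throughput spec with a performance floor",
        NumberOfIterations = 13,
        RunMode = RunMode.Throughput,
        RunTimeMilliseconds = 1000,
        TestMode = TestMode.Test)]
    // The floor: the spec fails if throughput drops below this (made-up) value.
    [CounterThroughputAssertion("Operations", MustBe.GreaterThan, 1000000.0d)]
    public void Benchmark()
    {
        _counter.Increment();
    }
}
```

To gather a baseline without failing the build, the same spec can presumably be run with TestMode = TestMode.Measurement and a [CounterMeasurement("Operations")] attribute instead of the assertion.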
the NBench bug I found this week that was giving us false positives basically screwed up the duration of the benchmark for RunMode = Throughput benchmarks
the code that estimates how many iterations of a benchmark it takes to fill one second, roughly, was way underestimating things
because of JIT overhead and other issues
so it never set the threshold high enough to be measured properly
going to fix that soon
basically need to discard the first estimate due to JIT
and then average 2-3 more
before I get the final value we'll use for the actual runs that affect the assertions
Andrew Young
@ayoung
Dec 13 2016 21:51
makes sense.
how do you determine what types of routines you should be benching? or what tests would provide the most value? so like in a line-of-business application, would you make a test to follow a message in the creation of an entity?
Aaron Stannard
@Aaronontheweb
Dec 13 2016 21:53
I've done a few different types
over the past year since I wrote NBench
benchmarking a framework is different than benchmarking an application
Andrew Young
@ayoung
Dec 13 2016 21:53
right.
Aaron Stannard
@Aaronontheweb
Dec 13 2016 21:53
but some of the same ideas apply
for me, I always want to have benchmarks to safeguard performance-critical areas
like Akka.Remote throughput
because if you don't measure that, you don't control it
that's really a type of integration test
because Akka.Remote touches a ton of different components
including our socket library
our dispatcher and mailbox systems
our actors
that's a lot closer to what you would benchmark in an application
end-to-end client-to-server response time, for instance
it measures all of the accumulated changes made to the underlying system
Andrew Young
@ayoung
Dec 13 2016 21:55
ok got it. so basically we want to set up a staging environment with all the components
then run the performance testing there
Aaron Stannard
@Aaronontheweb
Dec 13 2016 21:56
if a significant performance drop happens there, alarm bells should go off before that pull request is ever merged
yeah, exactly
use a consistent environment for benchmarking
one of our problems is we use Azure auto-scaling agents for our builds
those throw false positives all the time
because even the same generation of hardware on Azure has lots of subtle changes
and those will show up in our results
Andrew Young
@ayoung
Dec 13 2016 21:57
so do you just tell it to run the tests again?
Aaron Stannard
@Aaronontheweb
Dec 13 2016 21:57
so we'll likely change that in the future to use a piece of dedicated hardware for our official benchmarks
yeah we do
we've had issues before where we got consistent benchmark failures
one recently was that a change that was made to how Ask works resulted in a 100s of MB increase in log messages
and the resulting GC (I assume) from that crushed our Akka.Streams benchmarks
that's NBench doing its job right there
wouldn't have caught the problem otherwise
Andrew Young
@ayoung
Dec 13 2016 21:59
yep. we want to get our test suite to a point where we can catch things like that.
from my talk about it at .NET Fringe this year
explains the rationale, best practices, etc
there's a lot more here in this area I want to do to make it even better
but most .NET developers (or really, any developers) aren't even running performance tests of any kind, let alone doing it on every pull request as part of a continuous integration workflow
so by that standard this is lightyears ahead of what most are doing
Andrew Young
@ayoung
Dec 13 2016 22:09
thanks.