These are chat archives for exceptionless/Discuss

21st
Oct 2015
Blake Niemyjski
@niemyjski
Oct 21 2015 02:27 UTC
ES6 Overview in 350 Bullet Points
dang..
I want that now
should have wrote our spa app in typescript
lol
Chris Proud
@chrisproud
Oct 21 2015 19:19 UTC
just tried to replicate exceptionless/Exceptionless.Net#62 on a warm spare (read exactly the same as production) but I wasn't able too repo. I hit the server with 300 threads per second to create 404 exceptions, then recycled and even killed the process to force it to restart. Worked fine!
Blake Niemyjski
@niemyjski
Oct 21 2015 19:20 UTC
:(
I’d check the event logs on prod and enable client logging to see what is throwing the exception
maybe it will give some insight
Chris Proud
@chrisproud
Oct 21 2015 19:21 UTC
the recent exceptions are inthe memory dump- i'll pm it over now
Chris Proud
@chrisproud
Oct 21 2015 19:28 UTC

the client logging doesn't say much - not sure how that could help

DefaultEventQueue: Sent 50 events to "http://....";

Blake Niemyjski
@niemyjski
Oct 21 2015 19:29 UTC
but when an exception was thrown.. did it include it in the log
Chris Proud
@chrisproud
Oct 21 2015 19:29 UTC
not the 404
Eric J. Smith
@ejsmith
Oct 21 2015 19:30 UTC
What version of the app are you running?
We recently did a round of perf updates and some of them involved tight loops with queue maintenance.
Chris Proud
@chrisproud
Oct 21 2015 19:31 UTC
client lib is 3.1.1405
Blake Niemyjski
@niemyjski
Oct 21 2015 19:32 UTC
his app is crashing
web app not our app
Eric J. Smith
@ejsmith
Oct 21 2015 19:32 UTC
ahh
Blake Niemyjski
@niemyjski
Oct 21 2015 19:34 UTC
@chrisproud what was the server memory usage at?
Chris Proud
@chrisproud
Oct 21 2015 19:37 UTC
at the point it went nuts yesterday, 71% usage out of 96G
Eric J. Smith
@ejsmith
Oct 21 2015 19:40 UTC
so you are thinking that our client is causing this?
Blake Niemyjski
@niemyjski
Oct 21 2015 19:40 UTC
so there def is some kind of multithreaded issue there.. but if you look at the report.. it says this:
This thread is waiting for .net garbage collection to finish. Thread 67 triggered the garbage collection.The gargage collector thread wont start doing its work till the time the threads which have pre-emptive GC disabled have finished executing. The following threads have pre-emptive GC disabled 85,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,124,125,126,127,129,130,131,132,133,134,135,136,137,138,139,140,141,142,144,145,146,147,149,150,151,152,153,157,
Function
[[HelperMethodFrame]]
System.Diagnostics.NtProcessInfoHelper.GetProcessInfos()+51
System.Diagnostics.Process.EnsureState(State)+f5
System.Diagnostics.Process.get_PrivateMemorySize64()+15
NewRelic.Agent.Core.Samplers.MemorySampler+ProcessMemoryReporter.GetMemory()+37
NewRelic.Agent.Core.Samplers.MemorySampler.Run(System.Object, Boolean)+47
NewRelic.Agent.Core.Samplers.Sampler.Callback()+15
NewRelic.Agent.Core.Threading.ScheduledExecutor+Future.DoWorkOnAgentWorkQueue(System.Object)+60
NewRelic.Agent.Core.Threading.ProcessingQueue`1[[System.__Canon, mscorlib]].ProcessQueueWork()+1a0
System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+232
System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+9
System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+4f
System.Threading.ThreadHelper.ThreadStart()+4d
[[GCFrame]]
[[DebuggerU2MCatchHandlerFrame]]
[[ContextTransitionFrame]]
[[DebuggerU2MCatchHandlerFrame]]
new relic is getting the process info which I’m guessing is really expensive and it’s triggering a GC
so from that text, .net is putting this thread to sleep while waiting for the GC to start
clr!CLREventBase::WaitEx+70
clr!SVR::gc_heap::wait_for_gc_done+55
clr!SVR::GCHeap::GarbageCollectGeneration+f5
clr!SVR::gc_heap::try_allocate_more_space+2be26d
clr!SVR::gc_heap::allocate_more_space+40
clr!SVR::gc_heap::allocate_large_object+96
clr!SVR::GCHeap::Alloc+32e40f
clr!JIT_NewArr1+46a
[[HelperMethodFrame]]
System.Diagnostics.NtProcessInfoHelper.GetProcessInfos()+51
System.Diagnostics.Process.EnsureState(State)+f5
seems like all those threads are iterating our collection
Function
System.Collections.Generic.Dictionary2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].FindEntry(System.__Canon)+cd System.Collections.Generic.Dictionary2[[System.Canon, mscorlib],[System.Canon, mscorlib]].ContainsKey(System.__Canon)+9
Exceptionless.ExceptionlessConfigurationExtensions.GetInstallId(Exceptionless.ExceptionlessConfiguration)+86
Exceptionless.Plugins.Default.EnvironmentInfoPlugin.Run(Exceptionless.Plugins.EventPluginContext)+cd
Exceptionless.Plugins.EventPluginManager.Run(Exceptionless.Plugins.EventPluginContext)+14b
Exceptionless.ExceptionlessClient.SubmitEvent(Exceptionless.Models.Event, Exceptionless.Plugins.ContextData)+238
System.Web.HttpApplication.RaiseOnError()+87
System.Web.HttpApplication.RecordError(System.Exception)+381
System.Web.HttpApplication.RecordError(System.Exception)+2c9
Eric J. Smith
@ejsmith
Oct 21 2015 19:43 UTC
System.Canon… what the heck is that?
Blake Niemyjski
@niemyjski
Oct 21 2015 19:45 UTC
interesting looking through this log.. looks like NewRelic is really causing a lot of GC hits..
@ejsmith notsure
ou should not read anything in the method argument type name. System.Canon is an implementation detail related to the way generics are implemented in the CLR. I don't know the exact usage for it but strongly suspect it is used by Ngen.exe, the optimization tool in .NET that pre-jits assemblies. Generics are a problem in pre-jitting since the concrete type is instantiated at runtime. You'll get multiple copies of a method that takes an argument type of a type parameter. Just one method that handles any reference type, additional methods for each value type, if any. System.Canon can be the substitute type that's the place holder for any reference type, allowing Ngen.exe to prejit that method even though it cannot guess what actual type will be used at runtime. Something like that.
Eric J. Smith
@ejsmith
Oct 21 2015 19:48 UTC
interesting
Blake Niemyjski
@niemyjski
Oct 21 2015 19:49 UTC
@chrisproud can you remove or disable newrelic and see if this issue goes away?
Eric J. Smith
@ejsmith
Oct 21 2015 19:49 UTC
@chrisproud yeah, need to try to isolate the problem.
Blake Niemyjski
@niemyjski
Oct 21 2015 19:49 UTC
if it does go away then at least I know that I need to configure new relic to get this to work and I can work with their support
Eric J. Smith
@ejsmith
Oct 21 2015 19:49 UTC
thinking it might be newrelic and exceptionless combo?
Blake Niemyjski
@niemyjski
Oct 21 2015 19:50 UTC
maybe.. there were a ton of gc frames in the stack traces and it was always new relic calling these methods like get process info, get file info things like that
thinking all of those are really expensive and he said his app startup was causing this issue.. more than likely his app is throwing exceptions and new relic is taking some hits on startup, collecting meta data, etc (us as well handling these errors)
Chris Proud
@chrisproud
Oct 21 2015 19:53 UTC
i'll see if I can disable new relic for the offending app - we use NR for metrics on lots of apps so I can't really disable it for all.
Eric J. Smith
@ejsmith
Oct 21 2015 19:54 UTC
so if you disable exceptionless you don’t have the issue?
Chris Proud
@chrisproud
Oct 21 2015 19:56 UTC
I've only had the issue once so far (but it was pretty bad all but killed the server) - there are 45 exceptionless threads that the debug tool marks as suspect
Eric J. Smith
@ejsmith
Oct 21 2015 19:56 UTC
45 exceptionless threads?
why would we ever have a bunch of threads @niemyjski
Blake Niemyjski
@niemyjski
Oct 21 2015 19:57 UTC
We shouldn’t ever launch any new threads. would only happen if asp.net had an error on every request thread
but they were all paused because the GC was collected and it was waiting for some of ours to finish before resuming
I wrote a test that calls this plugin 10000 times
and never once did it block
I’m looking at all of our code
would be nice if we had a concurrent dictionary
might help but I still don’t know what’s causing it other than a call to the GC
Blake Niemyjski
@niemyjski
Oct 21 2015 20:02 UTC
[Fact]
public void EnvironmentInfo_CanRunInParallel() {
var client = new ExceptionlessClient();
var ev = new Event { Type = Event.KnownTypes.SessionStart };
var plugin = new EnvironmentInfoPlugin();
        Parallel.For(0, 10000, i => {
            if (i % 20 == 0)
                Task.Factory.StartNew(GC.Collect);

            var context = new EventPluginContext(client, ev);
            plugin.Run(context);
            Assert.Equal(1, context.Event.Data.Count);
            Assert.NotNull(context.Event.Data[Event.KnownDataKeys.EnvironmentInfo]);
        });
    }
passes
We are calling ProcessQueueAsync().Wait() which kicks off a new thread with Task.Factory.StartNew()
that’s only triggered on app exit
Blake Niemyjski
@niemyjski
Oct 21 2015 20:08 UTC
hmm
Chris Proud
@chrisproud
Oct 21 2015 20:11 UTC
Hey guys, want to put this to one side until it happens again?
i can also follow up with new relic if you think there might be a conflict of sorts?
Blake Niemyjski
@niemyjski
Oct 21 2015 20:15 UTC
yeah, I’ll keep an eye out for this as well.
but if it happens again can you check see if turning off new relic makes it go away
also
just forgot what I was going to say
oh
can you log onto the server and go through the event viewer application logs for yesterday or when it happened and see what kind of errors were being logged
this might be a huge help
just found a good foundatio bug
System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
at System.Collections.Concurrent.ConcurrentDictionary`2.get_Item(System.Collections.Concurrent.TKey key) at offset 16
at Foundatio.Caching.InMemoryCacheClient.<DoMaintenanceAsync>d__37.MoveNext() at offset 77
        foreach (string key in _memory.Keys) {
            var expiresAt = _memory[key].ExpiresAt;
Chris Proud
@chrisproud
Oct 21 2015 20:21 UTC
just checked the event log - only event in system is the process being recycled. In the app log there is the event for the profiler loading successfully then timeouts from newrelic and perfcounters. no exceptions logged.
lets park this - thanks for all your time, really appreciate it. Get in touch if you need anything.
Blake Niemyjski
@niemyjski
Oct 21 2015 20:26 UTC
no problem
if you see this again
please jump on here and message me
Chris Proud
@chrisproud
Oct 21 2015 20:26 UTC
shall I close the issue in github?
Blake Niemyjski
@niemyjski
Oct 21 2015 20:26 UTC
also feel free to shoot me an email bniemyjski@gmail.com if it’s after hours
and I’ll try my best to jump on a remote session and debug it live with you
Chris Proud
@chrisproud
Oct 21 2015 20:27 UTC
will do - thanks
Blake Niemyjski
@niemyjski
Oct 21 2015 20:27 UTC
sure, lets reopen it if it happens again, or someone else sees it. crappy thing is I show where it’s seems like it’s stalled but we have a serious amount of multi threaded code in exceptionless and many deployments and never saw this.. so it has me perplexed
If I get some free time this weekend. I’ll try configuring an app with new relic and see if I can reproduce
We hope you continue to use exceptionless and see benefit from it.
really really sucks that our module might of had soemthing to do with your prod app crashing..
Chris Proud
@chrisproud
Oct 21 2015 20:33 UTC
we will be - exceptionless is a great tool so no plans to change.
Blake Niemyjski
@niemyjski
Oct 21 2015 20:40 UTC
exceptionless/Foundatio@2372dcc
:)
@chrisproud :+1:
@ejsmith check out my last commit..
@chrisproud I just added my test case for that.. I’ll add to it if I find anything else.
Eric J. Smith
@ejsmith
Oct 21 2015 20:47 UTC
cool