@Avasil Here are my notes! https://gist.github.com/c0e731eefd0cc8c82bbbfd9197d08474 Again, massive credit to @rossabaker for doing all the real legwork on figuring this stuff out and benchmarking it. I basically just distilled and came up with some requirements.
So the raw mechanics of how you get the trace boil down to `new Throwable().getStackTrace()`, which will give you an `Array[StackTraceElement]`. That's where the easy part ends. Doing this is very expensive, and you need to do it eagerly inside the definition of `async` (at a minimum). Naively implemented, this would result in a lot of overhead every time you call these functions.
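To make the raw mechanics concrete, here is a minimal sketch of the eager capture. The object and method names (`CaptureDemo`, `captureTrace`) are made up for illustration and are not part of Cats Effect.

```scala
object CaptureDemo {
  // Eagerly captures the current call stack. Filling in the stack trace is
  // the expensive part, so calling this on every combinator adds up quickly.
  def captureTrace(): Array[StackTraceElement] =
    new Throwable().getStackTrace()

  // Renders the first few frames, innermost first.
  def render(frames: Array[StackTraceElement], limit: Int): String =
    frames.take(limit).map(fr => s"  at $fr").mkString("\n")
}
```

The first element of the array is the frame that constructed the `Throwable`, so a real tracer would have to skip its own frames before reporting anything.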
So the idea is to not do it naively. If you call `getClass` on the `Function1` passed to `map`, you're going to get something which reflects the definition site of the function passed to the method in question. Note that there are some caveats with this:

    val f: Int => Boolean = _ % 2 == 0
    ioa.map(f)
       .map(f)

If you trace that with this technique, both `map` calls will have the same "call site". I really think that's fine though, and arguably even more useful than the alternative. Anyway…
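A small sketch of that caveat, assuming a hypothetical `siteKey` helper that stands in for whatever `map` would use as its call-site key:

```scala
object CallSiteDemo {
  val f: Int => Boolean = _ % 2 == 0
  // Same body as f, but written at a different place in the source.
  val g: Int => Boolean = _ % 2 == 0

  // Hypothetical stand-in: the key a traced combinator would derive from
  // the function it receives.
  def siteKey[A, B](fn: A => B): Class[_] = fn.getClass
}
```

Passing `f` twice yields the same key both times, which is why the two `map(f)` calls are indistinguishable. On the JVM, separately written lambdas such as `f` and `g` normally get separate classes, so they would be traced as different sites.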
You need to figure out which stack frame entry actually represents the true call site, and this is where things get very tricky with Cats Effect because the `f` in question may be threaded through some other methods, such as monad transformers, libraries like fs2, etc. My spec suggests applying some heuristics to the name of the class you get from the `Function1` to take an educated guess, and then going with the first best fit. Note that these heuristics can be somewhat expensive at runtime if you need them to be, because you're only doing it once!

Use the `Class` as a cache key in a global (static) cache. Note that the size of this cache is bounded by the number of distinct call sites in the program, which is not really that many when you think about it. Cache misses are expensive, but they only happen once. Cache lookups are very, very fast. Note that `ConcurrentHashMap` is heavily read-optimized.

The spec references a "slug" mode for tracing, which would basically disable caching entirely. The reason to disable caching is so that you can capture more than one stack frame, which would allow us to give really robust traces when people really need to dig into things. Like we can say things like, "the last few constructors which generated this `IO` were this `map`, which had this stack trace, and this `flatMap`, which had this other stack trace, etc". I imagine this being printed as like a nested bullet list, but hopefully you see where I'm going with this. Slug mode would have a lot of overhead and obviously would only be used when debugging in a dev environment, but that's still really useful! And giving these kinds of robust traces would eliminate the information loss that we would otherwise suffer from with just a single stack frame per call site.
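The cached (non-slug) path could be sketched roughly as follows. Everything here is an assumption for illustration: the `TraceCache` name, the prefix list used as the heuristic, and the `misses` counter, which exists only to make the caching visible.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

object TraceCache {
  // Hypothetical heuristic: frames from these packages are assumed to be
  // library plumbing rather than the user's call site.
  private val libraryPrefixes = List("cats.", "fs2.", "scala.", "java.")

  private val cache = new ConcurrentHashMap[Class[_], Option[StackTraceElement]]()

  // Counts cache misses, purely for demonstration.
  val misses = new AtomicInteger(0)

  // First frame that does not look like library code ("first best fit").
  def pickFrame(frames: Array[StackTraceElement]): Option[StackTraceElement] =
    frames.find(fr => !libraryPrefixes.exists(p => fr.getClassName.startsWith(p)))

  // Looks up the call site by the function's class; the stack is captured
  // only on a miss.
  def callSite(fn: AnyRef): Option[StackTraceElement] = {
    val key = fn.getClass
    val cached = cache.get(key)
    if (cached != null) cached
    else {
      misses.incrementAndGet()
      val own = getClass.getName // skip this tracer's own frames
      val frames = new Throwable().getStackTrace().filterNot(_.getClassName.startsWith(own))
      val computed = pickFrame(frames)
      val previous = cache.putIfAbsent(key, computed)
      if (previous != null) previous else computed
    }
  }
}
```

A slug-style mode would bypass `cache` and keep the whole frame array per constructor instead of a single element.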
Trace information should be stored in the `IO` constructors themselves, which avoids the need to maintain a separate data structure representing a "backtrace". In a sense, it's abusing `ArrayStack` to represent the trace indirectly. This is cool, but unfortunately `map` fusion completely defeats it. I note this in my spec, and I have hypothesized that map fusion in `IO` is entirely pointless in practice and probably doesn't result in any measurable performance gains. Turning it off in `IO` and then running a sophisticated benchmark suite (like fs2's or Monix's) on top of `IO` without map fusion, and then again with, should be sufficient evidence to decide. If we can't remove map fusion, then (annoyingly!) we either need to have a separate nested stack structure inside of the `IOMap` node, or we need to only trace the top-most fused `map`. Either is probably okay, but not as nice as the unfused alternative. Needs measuring.
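As a toy illustration of storing the trace in the constructors, here is a deliberately tiny ADT. It is not the real `IO` encoding; `MiniIO`, `Pure`, `Map`, and the `site` field are assumptions made up for this sketch, and there is no fusion.

```scala
object MiniIO {
  sealed trait IO[A] {
    // Each map allocates its own node and records the function's class as
    // the call-site key.
    def map[B](f: A => B): IO[B] = Map(this, f, f.getClass)
  }
  final case class Pure[A](a: A) extends IO[A]
  final case class Map[E, A](source: IO[E], f: E => A, site: Class[_]) extends IO[A]

  // Walks the nodes, most recent constructor first, collecting the sites.
  def sites(io: IO[_]): List[Class[_]] = io match {
    case Map(source, _, site) => site :: sites(source)
    case Pure(_)              => Nil
  }
}
```

With one node per `map`, each call keeps its own slot. If two adjacent `Map` nodes were fused into one composed function, there would be only a single `site` slot left for both, which is the information loss described above.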
@Avasil The biggest problem with all this is configuration. You can't just thread its configuration through the runloop because some of these calls happen before the `IO` is actually running (for example, in `val ioa = pure(42).flatMap(f)`, the `flatMap` call site must run before the `IO` starts executing). So that kicks out some things, unfortunately. ZIO does this nice thing where they have like a `notrace` function or something (I can't remember what it's called) which presumably threads through the run loop, but there's no possible way it can disable tracing for `val` examples like this one.
So my spec proposes a two-pronged solution: runloop-threaded configuration, with global defaults set by a system property. So you can still disable tracing entirely if you need to, but the default way you interact with it is via the runloop-based configuration (which has nice lexical properties and a better API).
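The global-default half could look something like the sketch below. The property name and the "enabled unless explicitly false" rule are assumptions for illustration, not what the spec prescribes.

```scala
object TracingConfig {
  // Hypothetical property name, used only in this sketch.
  val PropertyName = "example.io.tracing.enabled"

  // Assumed rule: tracing is on unless the property is explicitly "false".
  def parse(raw: Option[String]): Boolean =
    raw.forall(_.trim.toLowerCase != "false")

  // Read once, on first access, so the value is available at construction
  // time (e.g. inside flatMap), before any runloop exists.
  lazy val globalDefault: Boolean = parse(sys.props.get(PropertyName))
}
```

Under these assumptions, starting the JVM with `-Dexample.io.tracing.enabled=false` would turn off capture even for `val`-level call sites that the runloop-threaded setting cannot reach.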
@Avasil Oh, one final bit of trickiness: you can't call `getClass` on a thunked value in Scala and expect to get the class of the thunk, which is what you need, for example, to trace `>>`. So you're probably going to need to implement a tiny helper in Java that has the class metadata to trick scalac into passing the thunk along without evaluation. In other words, what scalac does when it sees the following:

    def foo(s: => String) = bar(s)
    def bar(s: => String) = ...

`bar` gets the raw thunk that was passed to `foo`, without re-wrapping. If you can define `bar` in Java, then you can take that thunk (which will be of type `scala.Function0`) and call `getClass` on it without forcing. You don't have that option in Scala, since calling `s.getClass` will force the thunk and give you the `Class` of its contents.
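The forcing behaviour is easy to observe in plain Scala. This sketch only demonstrates the problem, not the Java workaround; `ThunkDemo` and its members are illustrative names.

```scala
object ThunkDemo {
  // Flipped as a side effect when the by-name argument is evaluated.
  var forced = false

  // Calling getClass on a by-name parameter evaluates it first, so this
  // returns the class of the resulting String, not of the thunk.
  def classOfByName(s: => String): Class[_] = s.getClass

  def run(): Class[_] = classOfByName { forced = true; "hello" }
}
```

After `ThunkDemo.run()`, `forced` is `true` and the returned class is `classOf[String]`, which is useless as a call-site key.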
is Simon Marlow's note on maskUninterruptable: every use should be viewed with extreme suspicion.
well, we can say: every use of uncancelable that ignores restore should be ....
`maskUninterruptable` still has `restore` as well, and Simon suggests it be viewed with unconditional suspicion.
The "no special cases" thing is very appealing, and in a sense `cancelable` would create an unbounded number of special cases. I strongly suspect though that the only people who would use it would be authors of things like `Deferred`, and most code would just ignore it. That would certainly be the best practice.
`Ref` but I'm not sure how I could implement the waiting part. Is there something like that which already exists, or is there a good example that can help me build this?
`F[A]` that would be potentially problematic
`Deferred` you actually already made one!