@Avasil Here are my notes! https://gist.github.com/c0e731eefd0cc8c82bbbfd9197d08474 Again, massive credit to @rossabaker for doing all the real legwork on figuring this stuff out and benchmarking it. I basically just distilled and came up with some requirements.
So the raw mechanics of how you get the trace boil down to new Throwable().getStackTrace(), which will give you an Array[StackTraceElement]. That's where the easy part ends. Doing this is very expensive, and you need to do it eagerly inside the definition of flatMap, map, delay, and async (at a minimum). Naively implemented, this would result in a lot of overhead every time you call these functions.
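A minimal sketch of the raw capture step, just to make the mechanics concrete. The object and method names here (StackCapture, capture) are made up for illustration and are not part of Cats Effect:

```scala
object StackCapture {
  // Build a Throwable purely as a carrier for the current stack; nothing is thrown.
  // This is the expensive call that a naive tracer would repeat on every map/flatMap.
  def capture(): Array[StackTraceElement] =
    new Throwable().getStackTrace()

  def main(args: Array[String]): Unit = {
    val frames = capture()
    // The first frame is `capture` itself; the frames after it belong to the callers.
    frames.take(3).foreach(println)
  }
}
```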
So the idea is to not do it naively. If you call getClass on the Function1 passed to flatMap/map, you're going to get something which is reflective of the definition site of the function passed to the method in question. Note that there are some caveats with this:
val f: Int => Int = _ * 2
ioa.map(f)
  .map(f)
If you trace that with this technique, both map calls will have the same "call site". I really think that's fine though, and arguably even more useful than the alternative. Anyway…
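To illustrate the caveat, here is a small sketch in which a toy Traced type (hypothetical, standing in for IO) records the class of each function passed to map. Reusing the same function value gives the same Class both times, so both calls collapse to one "call site":

```scala
object SharedCallSite {
  // Hypothetical stand-in for IO: remembers the class of every function passed to map.
  final case class Traced[A](value: A, sites: List[Class[_]]) {
    def map[B](fn: A => B): Traced[B] =
      Traced(fn(value), fn.getClass :: sites)
  }

  val f: Int => Int = _ * 2

  def main(args: Array[String]): Unit = {
    val t = Traced(1, Nil).map(f).map(f)
    // Two map calls were recorded, but they share a single distinct class.
    println(s"recorded=${t.sites.size} distinct=${t.sites.distinct.size}")
  }
}
```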
You need to figure out which stack frame entry actually represents the true call site, and this is where things get very tricky with Cats Effect because the f in question may be threaded through some other methods, such as monad transformers, libraries like fs2, etc. My spec suggests applying some heuristics to the name of the class you get from the Function1 to take an educated guess, and then go with the first best fit. Note that these heuristics can be somewhat expensive at runtime if you need them to be, because you're only doing it once!
Use the Class as a cache key in a global (static) cache. Note that the size of this cache is bounded by the number of distinct call sites in the program, which is not really that many when you think about it. Cache misses are expensive, but they only happen once. Cache lookups are very, very fast. Note that ConcurrentHashMap is heavily read-optimized.
The spec references a "slug" mode for tracing, which would basically disable caching entirely. The reason to disable caching is so that you can capture more than one stack frame, which would allow us to give really robust traces when people really need to dig into things. For example, we could say things like, "the last few constructors which generated this IO were this map, which had this stack trace, and this flatMap, which had this other stack trace, etc". I imagine this being printed as a nested bullet list, but hopefully you see where I'm going with this. Slug mode would have a lot of overhead and obviously would only be used when debugging in a dev environment, but that's still really useful! And giving these kinds of robust traces would eliminate the information loss that we would otherwise suffer from with just a single stack frame per call site.
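A rough sketch of the cached (non-slug) path, assuming a ConcurrentHashMap keyed by the function's Class. The names TraceCache, lookup and findCallSite are hypothetical, and the "heuristic" is a deliberately trivial placeholder (first frame outside the cache itself), not the one from the spec:

```scala
import java.util.concurrent.ConcurrentHashMap

object TraceCache {
  // Global cache: one entry per distinct function class, i.e. per call site.
  private val cache = new ConcurrentHashMap[Class[_], StackTraceElement]()

  private val ownClassName: String = getClass.getName

  // Expensive path, taken only on a cache miss.
  // Placeholder heuristic: pick the first frame that is not inside this object.
  private def findCallSite(): StackTraceElement = {
    val frames = new Throwable().getStackTrace()
    frames
      .find(frame => frame.getClassName != ownClassName)
      .getOrElse(new StackTraceElement("unknown", "unknown", null, -1))
  }

  def lookup(fn: AnyRef): StackTraceElement = {
    val key = fn.getClass
    val cached = cache.get(key) // fast path: a plain read
    if (cached != null) cached
    else {
      val computed = findCallSite()
      // If another thread raced us, keep its entry so every caller sees the same frame.
      val prior = cache.putIfAbsent(key, computed)
      if (prior != null) prior else computed
    }
  }
}
```

The second lookup for the same function class never touches the stack, which is where the savings would come from.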
Trace information should be stored in the IO constructors themselves, which avoids the need to maintain a separate data structure representing a "backtrace". In a sense, it's abusing ArrayStack to represent the trace indirectly. This is cool, but unfortunately Map fusion completely defeats it. I note this in my spec, and I have hypothesized that map fusion in IO is entirely pointless in practice and probably doesn't result in any measurable performance gains. Turning it off in IO and then running a sophisticated benchmark suite (like fs2's or Monix's) on top of IO without map fusion, and then again with, should be sufficient evidence to decide. If we can't remove map fusion, then (annoyingly!) we either need to have a separate nested stack structure inside of the IOMap node, or we need to only trace the top-most fused map. Either is probably okay, but not as nice as the unfused alternative. Needs measuring.
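As a toy illustration of storing the trace in the constructors, here is a sketch with a made-up MiniIO type (not the real IO encoding). Each unfused MapNode carries its own tag, here simply the function's Class as a stand-in for a cached frame, so walking the nodes recovers one entry per map:

```scala
object TracedNodes {
  // Hypothetical miniature IO: only pure values and map, no run loop.
  sealed trait MiniIO[+A] {
    // Unfused map: every call allocates its own node, so every call keeps its own tag.
    def map[B](f: A => B): MiniIO[B] = MapNode(this, f, f.getClass)
  }
  final case class Pure[+A](value: A) extends MiniIO[A]
  final case class MapNode[A, +B](source: MiniIO[A], f: A => B, tag: Class[_])
      extends MiniIO[B]

  // Walk outward-in and collect the tags; the node structure itself is the "backtrace".
  def traces(io: MiniIO[Any]): List[Class[_]] = io match {
    case Pure(_)              => Nil
    case MapNode(src, _, tag) => tag :: traces(src)
  }
}
```

If the two maps were fused into a single node holding f andThen g, only one tag would survive, which is the information loss described above.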
@Avasil The biggest problem with all this is configuration. You can't just thread the tracing configuration through the runloop because some of these calls happen before the IO is actually running. (For example, in val ioa = pure(42).flatMap(f), the flatMap call site must run before the IO starts executing.) So that kicks out some things, unfortunately. ZIO does this nice thing where they have like a notrace function or something (I can't remember what it's called) which presumably threads through the run loop, but there's no possible way it can disable tracing for val examples like this one.
So my spec proposes a two-pronged solution: runloop-threaded configuration, with global defaults set by a system property. So you can still disable tracing entirely if you need to, but the default way you interact with it is via the runloop based configuration (which has nice lexical properties and a better API).
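A small sketch of the "global defaults set by a system property" half. Everything named here is an assumption for illustration: the property name example.io.tracing.mode, the three modes, and the choice of Cached as the fallback are not taken from the spec:

```scala
object TracingConfig {
  sealed trait Mode
  case object Disabled extends Mode
  case object Cached extends Mode
  case object Slug extends Mode

  // Hypothetical property name; the real one is not specified in this discussion.
  val PropertyName = "example.io.tracing.mode"

  // Unknown or missing values fall back to Cached (an assumption, not the spec's default).
  def parse(raw: Option[String]): Mode =
    raw.map(_.trim.toLowerCase) match {
      case Some("disabled") => Disabled
      case Some("slug")     => Slug
      case _                => Cached
    }

  // Read once, so constructors that run before the runloop starts can still consult it.
  lazy val globalDefault: Mode = parse(Option(System.getProperty(PropertyName)))
}
```

With something like this, running with -Dexample.io.tracing.mode=disabled would switch tracing off even for val definitions built outside the runloop, and the runloop-threaded configuration would layer on top.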
@Avasil Oh, one final bit of trickiness: you can't call getClass on a thunked value in Scala and expect to get the class of the thunk, which is what you need. For example, to trace delay or >>. So you're probably going to need to implement a tiny helper in Java that has the class metadata to trick scalac into passing the thunk along without evaluation. In other words, what scalac does when it sees the following:
def foo(s: => String) = bar(s)
def bar(s: => String) = ...
bar gets the raw thunk that was passed to foo, without re-wrapping. If you can define bar in Java, then you can take that thunk (which will be of type scala.Function0) and call getClass on it without forcing. You don't have that option in Scala, since calling s.getClass will force the thunk and give you the Class of its contents.
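A quick sketch showing the forcing behaviour from the Scala side (the names ThunkForcing and classOfArg are made up). Calling getClass on a by-name parameter evaluates it and reports the class of the result, not of the thunk:

```scala
object ThunkForcing {
  // Counts how many times the by-name argument below gets evaluated.
  var forced: Int = 0

  // From Scala there is no way to reach the underlying Function0 here:
  // mentioning `s` evaluates it, so this returns the class of the String.
  def classOfArg(s: => String): Class[_] = s.getClass

  def demo(): (Class[_], Int) = {
    forced = 0
    val cls = classOfArg { forced += 1; "hello" }
    (cls, forced)
  }

  def main(args: Array[String]): Unit = {
    val (cls, count) = demo()
    println(s"class=${cls.getName} forced=$count")
  }
}
```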
@SystemFw
is Simon Marlow's note on maskUninterruptable: every use should be viewed with extreme suspicion.
well, we can say: every use of uncancelable that ignores restore should be ....
Fair, though maskUninterruptable still has restore as well, and Simon suggests it be viewed with unconditional suspicion.
The "no special cases" thing is very appealing, and in a sense cancelable would create an unbounded number of special cases. I strongly suspect though that the only people who would use it would be authors of things like Deferred, and most code would just ignore it. That would certainly be the best practice.
Ref
but I'm not sure how I could make the wait part. Is there something like that which already exists, or is there a good example that can help me make this?
F[A]
that would be potentially problematic
Ref
+ Deferred
you actually already made one !
Bracket
instance
Bracket
for any MonadError