aikn
@aikn
No problem @avibryant 3p is good too. Thanks
Avi Bryant
@avibryant
@aikn ok wonderful. 3pm tomorrow at that starbucks.
Andrew Valencik
@valencik
I'd like to use Rainier on Scala 2.13, last I checked I think tut was a holdback. Should I just wait for Rainier 0.3? Should I try and help migrate Rainier from tut to mdoc? It looks like 0.3 changes a lot of the docs.
Andrew Valencik
@valencik
Actually it looks like evilplot is another issue. I think I'll open a tracking issue
Darren Wilkinson
@darrenjw
Happy New Year to all Rainier folks! I decided to end the year by seeing if I could figure out how to use the compute graph. I made some progress, but I'm pretty confused. I have a few questions, but to start off, I'm getting incorrect (and inconsistent) gradients for the first example that I've tried. It's probably just my confusion, but I'm not sure.
import com.stripe.rainier.compute._
// Create a function of three variables
// Example from my APTS notes (p.42 of main notes):
// https://www.staff.ncl.ac.uk/d.j.wilkinson/teaching/apts-sc/
// which is in fact from Nocedal and Wright (2006)...
val x0 = Real.variable()
val x1 = Real.variable()
val x2 = Real.variable()
val y = ( x0*x1*(x2.sin) + (x0*x1).exp )/x2
println(y)
// evaluate using an evaluator (slow)
val eval = new Evaluator(Map(x0 -> 1.0, x1 -> 2.0, x2 -> math.Pi/2.0))
val ey = eval.toDouble(y)
println(ey)
// compile the function for fast evaluation
val cy = Compiler.default.compile(List(x0, x1, x2), y)
val eyc = cy(Array(1.0, 2.0, math.Pi/2.0)) // fast
println(eyc)
// gradients
println(y.gradient.map(eval.toDouble(_))) // WRONG?! BUG?! *******
// compiled gradients
val cg = Compiler.withGradient("y", y, List(x0, x1, x2))
// have gradient functions, but not actually compiled?!
val cg0 = cg.head // function
val cgt = cg.tail // gradients
println(eval.toDouble(cg0._2)) // slow evaluation?
println(cgt.map(e => eval.toDouble(e._2))) // slow evaluation (but correct)?
// now compile the gradient functions?
val cg0c = Compiler.default.compile(List(x0, x1, x2), cg0._2)
println(cg0c(Array(1.0, 2.0, math.Pi/2.0))) // fast
val cgtc = cgt.map(e => Compiler.default.compile(List(x0, x1, x2), e._2))
println(cgtc.map(_(Array(1.0, 2.0, math.Pi/2.0)))) // fast (and correct)
The issue is that if I use .gradient to get the gradient vector of my function of three variables, the first two gradients seem to be switched. But if I compile the gradients, using Compiler.withGradient, they seem to be correct. The example is easy enough to do by hand, but it's actually a well-known example that I already have notes on, if anyone is interested.
The above code is for the latest version on GitHub. For 0.2.3, just replace Real.variable() with new Variable, and the behaviour is exactly the same.
Avi Bryant
@avibryant
@darrenjw sorry, I just saw this! Responded on github issues, but for the record here, y.gradient should match the ordering of y.variables, but is otherwise unspecified.
Darren Wilkinson
@darrenjw
Excellent - thanks for the clarification.
Darren Wilkinson
@darrenjw
To follow-up, I'm interested in the usual case, where I want repeated function and gradient evaluation to be as fast as possible, and so I'm happy to pay a one-off compilation cost. The docs on this are a bit sparse. Is the way I'm doing it above approximately correct? By trial-and-error, I'm using Compiler.withGradient to get gradient functions, each of which I then compile again with Compiler.default.compile to get compiled versions that I can then evaluate efficiently? Somehow this feels like it isn't quite how reverse-mode AD is supposed to work. Surely I shouldn't be evaluating the components of the gradient separately? Shouldn't I get the full gradient vector in one go? Am I missing a function/method somewhere?
Avi Bryant
@avibryant
@darrenjw yes, but there's not a super convenient API at the moment
what you want, I think, is Compiler.default.compile(List(x0, x1, x2), Compiler.withGradient("y", y, List(x0, x1, x2)))
which will give you a CompiledFunction
say, cf
then you need to allocate val globalBuf = new Array[Double](cf.numGlobals)
and finally, you need to make sure to call cf.output(input, globalBuf, i) in order for i=0..3
to get your density and your 3 gradient elements
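A self-contained sketch of that calling convention may help. Everything here is hand-rolled for the y above (the name OutputProtocolSketch and its body are made up, not Rainier's actual generated code): i = 0 runs the forward sweep and caches intermediates in the globals buffer, and i = 1..3 read those cached values to return the gradient components.

```scala
// Hypothetical stand-in for the CompiledFunction calling convention described
// above (not Rainier's actual generated code). The function is
// y = (x0*x1*sin(x2) + exp(x0*x1)) / x2 from the earlier example.
object OutputProtocolSketch {
  val numGlobals = 3

  def output(input: Array[Double], globals: Array[Double], i: Int): Double = {
    val x0 = input(0); val x1 = input(1); val x2 = input(2)
    i match {
      case 0 => // forward sweep: cache subexpressions the gradients will reuse
        globals(0) = math.sin(x2)
        globals(1) = math.exp(x0 * x1)
        globals(2) = (x0 * x1 * globals(0) + globals(1)) / x2
        globals(2) // y itself
      case 1 => (x1 * globals(0) + x1 * globals(1)) / x2      // dy/dx0
      case 2 => (x0 * globals(0) + x0 * globals(1)) / x2      // dy/dx1
      case 3 => x0 * x1 * math.cos(x2) / x2 - globals(2) / x2 // dy/dx2
    }
  }
}

// usage: same input and buffer throughout, i strictly in order
val input = Array(1.0, 2.0, math.Pi / 2.0)
val globalBuf = new Array[Double](OutputProtocolSketch.numGlobals)
val out = (0 to 3).map(i => OutputProtocolSketch.output(input, globalBuf, i))
// out(0) is y; out(1..3) are the three gradient components
```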
Avi Bryant
@avibryant
this gets more complicated still when you have a vector of observations that you want to run through (vs having everything inlined into the compute graph)
Darren Wilkinson
@darrenjw
exactly - what if I have a new List(x0,x1,x2) that I want the value and gradient for? Presumably I don't need to re-compile?
Avi Bryant
@avibryant
er, wait - x0, x1, x2 are already variables, right? you certainly don't need to re-compile for each Array(1.0, 2.0, math.Pi/2.0)
that becomes the input you provide to cf.output
what I was referencing above is when you want a mini-batch (or full batch) of data that you want to sum the density over
Darren Wilkinson
@darrenjw
OK - that's fine. But what triggers the forward-backward sweep of the AD? How does cf.output "know" that the input has changed and it has to re-do the sweep? When it's called for i=0?
Avi Bryant
@avibryant
calling it with i=n will store values in globalBuf that will be needed by i>n
so what you need to do to guarantee correctness is to call it with the same input and globalBuf, for each i in order
and then you can change the input values and do that again (it's fine to reuse the globalBuf since it'll be overwritten)
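That protocol can be wrapped in a small helper. This is a hypothetical sketch, not a Rainier API: the `output` parameter stands in for cf.output, and the buffer is allocated once and reused across inputs.

```scala
// Hypothetical helper around the calling convention described above.
// `output` stands in for cf.output; one globals buffer is allocated once and
// reused (overwritten) for every input; i must run 0 .. numOutputs-1 in order
// because i = 0 caches values that the later i read.
def densityAndGrad(
    output: (Array[Double], Array[Double], Int) => Double,
    numGlobals: Int,
    numOutputs: Int
)(inputs: Seq[Array[Double]]): Seq[Array[Double]] = {
  val globals = new Array[Double](numGlobals)
  inputs.map { in =>
    Array.tabulate(numOutputs)(i => output(in, globals, i))
  }
}
```

With something like this, recompiling is never needed for new input values; only the input array changes between calls.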
Darren Wilkinson
@darrenjw
OK - that makes sense. I'll give it a go. Thanks!
Avi Bryant
@avibryant
let me know if it works!
Darren Wilkinson
@darrenjw
Will do!
Darren Wilkinson
@darrenjw
It works. Thanks again.
Kai(luo) Wang
@kailuowang
hi @avibryant Just noticed that rainier-cats was removed in stripe/rainier#441 along with rainier-scalacheck. The PR didn't give much specifics about this removal. I am asking because we use them in our project and am happy to help with maintaining the cats module if the burden of which is the main reason for its removal. Thanks!
Avi Bryant
@avibryant
@kailuowang yes, sorry about doing that without warning. I'd be happy to bring it back if someone were willing to maintain it. I did think that since RandomVariable is gone, which was the most central monad, that maybe it wasn't that useful anymore.
Kai(luo) Wang
@kailuowang
Thanks. That's good to know. I just started migration to 0.3. I will find out if a cats integration is still useful for our projects soon and get back to you.
Darren Wilkinson
@darrenjw
Relatedly, no flatMap on Real?! :-O It's taking me a while to get my head around 0.3...
Avi Bryant
@avibryant
@darrenjw it's not needed! And there are real ergonomic advantages in having the Real values available unwrapped to be referenced in generators etc without everything having to be a monolithic for-comprehension.
the only ergonomic/type-safety disadvantage is that it will compile if you use latent (née param) in places that are inappropriate, like during posterior prediction, where we've already done the sampling and so the prior will be ignored. But that can still cause runtime errors. And the trade-off is more than worth it IMO.
Kai(luo) Wang
@kailuowang

One thing RandomVariable provided was the ability to generalize the composition of Models and their variables, which is useful when writing generalized libraries that are somewhat model agnostic. Is there a replacement for supporting such generalized composition?
To better illustrate my thinking, it's tempting for me, based on my own use cases, to write a replacement for RandomVariable as something like

 case class RandomVariable[A](v: A, model: Model) {
   def mapWith[B, T](that: RandomVariable[B])(f: (A, B) => T): RandomVariable[T] =
     RandomVariable(f(v, that.v), model.merge(that.model))
 }

Would such a thing make sense with the new design?
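For shape-checking the proposal only, here it is with toy stand-ins (this Model is NOT Rainier's; it just records which parameters it owns, with merge as set union):

```scala
// Toy stand-ins purely to type-check the shape of the proposed wrapper;
// Rainier's real Model is much more than this.
final case class Model(params: Set[String]) {
  def merge(that: Model): Model = Model(params ++ that.params)
}

final case class RandomVariable[A](v: A, model: Model) {
  def mapWith[B, T](that: RandomVariable[B])(f: (A, B) => T): RandomVariable[T] =
    RandomVariable(f(v, that.v), model.merge(that.model))
}

// composing two wrapped values merges their models
val mu    = RandomVariable(1.0, Model(Set("mu")))
val sigma = RandomVariable(2.0, Model(Set("sigma")))
val sum   = mu.mapWith(sigma)(_ + _)
```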

Darren Wilkinson
@darrenjw
I have a related question. The docs suggest that there is an overhead associated with merging models. Is this significant? It seems like folds over collections of models will be necessary for many hierarchical models.
Avi Bryant
@avibryant
sorry that I didn't see these messages earlier, I need to figure out how to get my gitter notifications right. (meta: is gitter the right venue? happy to move to anything)
@kailuowang I agree that higher level wrappers like you're illustrating there would be useful. Model is, deliberately, a little bit lower level. But I'm hoping that you'll find it easier and more flexible to define those now. The problem we were having before had to do with the (mandatory) stack of RandomVariable[Generator[T]]. It's quite awkward to force people into this kind of monad transformer stack. I'd rather see what kinds of abstractions (monadic or otherwise) people come up with for a bit; maybe down the line we can officially upstream one.
@darrenjw the overhead isn't new, and is effectively just the question of how vectorizable the code is.
if you have a model that does some kind of recursive fold over the data points, then necessarily each data point will be considered singularly
Avi Bryant
@avibryant
if you can express it instead as a map rather than a fold, that can be more tractable
in particular, this is to do with the emitted code size: folds get "unrolled" in the DAG, whereas maps don't have to be
and if your DAG gets too big that can cause problems all down the line (autodiff, compilation, execution)
however, it's not obvious to me that hierarchical models will necessarily require folds, so a concrete example would be helpful.
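As a toy illustration of that unrolling (nothing Rainier-specific; Node, Add, and VecSum are made up here): a fold over n data points builds a chain of n nodes in the DAG, while a vectorized sum can stay a single node whose size is independent of n.

```scala
// Toy compute-graph nodes, just to make the code-size point concrete.
sealed trait Node { def size: Int }
final case class Leaf(x: Double) extends Node { val size = 1 }
final case class Add(l: Node, r: Node) extends Node {
  def size = l.size + r.size + 1
}
// a single vectorized node, regardless of how much data it sums over
final case class VecSum(xs: Seq[Double]) extends Node { val size = 1 }

val data = Seq.fill(1000)(1.0)
// fold: unrolls into a chain of 1000 Add nodes (plus 1001 leaves)
val unrolled: Node = data.foldLeft(Leaf(0.0): Node)((acc, x) => Add(acc, Leaf(x)))
// vectorized: one node, independent of data size
val vectorized: Node = VecSum(data)
```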
Darren Wilkinson
@darrenjw
Your tutorial example (Vectors and Variables) illustrates it quite well. I think it's most natural to fit each group on the data for that group, then merge the models. Sure, you can unroll everything and do the conditioning with one Model.observe, but that doesn't seem so compositional, and won't necessarily adapt easily to more complex scenarios/data structures. I'm also thinking about DLMs, where it would probably be most natural to condition each state on its observation, then foldLeft over the resulting sequence of models. But again, you could build the full prior model and condition on all of the data in one go. Am I right in thinking that, for good performance, it is best to build the full prior model and then condition on all of the data in one go with a single Model.observe, in order to avoid merging models, if possible?
Kai(luo) Wang
@kailuowang
@avibryant thanks for the reply. I feel that it's an improvement that the current API removes the mandatory monadic composition for most users. And it seems easier to define compositional constructs than what I saw inside the old RandomVariable. I am going to play with some constructs in my projects and see if anything turns out worth upstreaming.
Avi Bryant
@avibryant
@darrenjw yes, though to be clear: merging, say, 10s of models is fine. Merging probably hundreds or certainly thousands of models is going to cause performance problems, and to the extent that you can express it in a single Model.observe, it will be better.