Avi Bryant
@avibryant
if you have a model that does some kind of recursive fold over the data points, then necessarily each data point will be handled individually
if you can express it instead as a map rather than a fold, that can be more tractable
in particular, this is to do with the emitted code size: folds get "unrolled" in the DAG, whereas maps don't have to be
and if your DAG gets too big that can cause problems all down the line (autodiff, compilation, execution)
however, it's not obvious to me that hierarchical models will necessarily require folds, so a concrete example would be helpful.
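A minimal sketch of the contrast, assuming Rainier's Real type and its usual numeric conversions; the arithmetic itself is made up for illustration:

```scala
import com.stripe.rainier.compute.Real

val xs = List(1.0, 2.0, 3.0)

// Fold: step i depends on step i-1, so the dependency chain gets
// "unrolled" into the DAG, one cluster of nodes per data point.
val folded: Real = xs.foldLeft(Real.zero) { (acc, x) => acc * acc + x }

// Map: each element is independent of the others, so the emitted
// code for one element can be shared rather than unrolled.
val mapped: List[Real] = xs.map { x => Real(x) * 2 }
```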
Darren Wilkinson
@darrenjw
Your tutorial example (Vectors and Variables) illustrates it quite well. I think it's most natural to fit each group on the data for that group, then merge the models. Sure, you can unroll everything and do the conditioning with one Model.observe, but that doesn't seem so compositional, and won't necessarily adapt easily to more complex scenarios/data structures. I'm also thinking about DLMs, where it would probably be most natural to condition each state on its observation, then foldLeft over the resulting sequence of models. But again, you could build the full prior model and condition on all of the data in one go. Am I right in thinking that, for good performance, it is best to build the full prior model and condition on all of the data with a single Model.observe, avoiding model merging where possible?
Kai(luo) Wang
@kailuowang
@avibryant thanks for the reply. I feel it's an improvement that the current API removes the mandatory monadic composition for most users. And it seems easier to define compositional constructs than with what I saw inside the old RandomVariable. I am going to play with some constructs in my projects and see if anything turns out to be worth upstreaming.
Avi Bryant
@avibryant
@darrenjw yes, though to be clear: merging, say, tens of models is fine. Merging hundreds, and certainly thousands, of models is going to cause performance problems, and to the extent that you can express it in a single Model.observe, it will be better.
I guess I can imagine eventually adding something like Model.fold
but for the moment my brain hurts thinking about how to do that.
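A sketch of the two styles under discussion, assuming the Model.observe and merge API referred to in this thread; the model itself is invented for illustration:

```scala
import com.stripe.rainier.core._

val mu = Normal(0, 1).latent
val groups: List[List[Double]] = List(List(1.0, 2.0), List(3.0, 4.0))

// One model per group, merged afterwards: fine for tens of groups,
// but each group's likelihood is compiled separately.
val merged: Model =
  groups.map { g => Model.observe(g, Normal(mu, 1)) }.reduce(_.merge(_))

// A single observe over all of the data: one likelihood to compile.
val single: Model = Model.observe(groups.flatten, Normal(mu, 1))
```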
Darren Wilkinson
@darrenjw
That makes sense. Is Model.merge associative? i.e. would it make sense to define a Semigroup instance for |+| syntax?
Avi Bryant
@avibryant
yes, associative and commutative
Model is just keeping track of a set of likelihood functions
(and the associated observations)
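Such an instance would be a one-liner with cats; a hypothetical sketch, not part of the library:

```scala
import cats.Semigroup
import cats.syntax.semigroup._
import com.stripe.rainier.core.Model

// Lawful given that merge is associative, as stated above; since merge
// is also commutative, a CommutativeSemigroup would work too.
implicit val modelSemigroup: Semigroup[Model] =
  Semigroup.instance(_.merge(_))

// usage: modelA |+| modelB |+| modelC
```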
Avi Bryant
@avibryant
so, if you think of one model as $\sum_{i=1}^n L(\theta; x_i)$
then merged models are something like
$\sum_{m=1}^k \sum_{i=1}^n L_m(\theta; x_{mi})$
Avi Bryant
@avibryant
and because each $L_m$ has to be compiled separately, it's better to have small $k$ and large $n$ than vice versa
Darren Wilkinson
@darrenjw
Yes. I have been getting some warnings about stuff being too big to JIT. I guess this is why.
Avi Bryant
@avibryant
so in general the compiler breaks things into individual methods that are under the JIT threshold,
but there is an issue right now where, if n_models * n_parameters gets large enough, the generated top-level dispatch method is too large to JIT
I'll try to fix that soon
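For diagnosing those warnings, standard HotSpot flags (not Rainier-specific) will show what fails to compile; for example, via SBT_OPTS or a .jvmopts file:

```
-XX:+PrintCompilation                               # log what the JIT compiles or skips
-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining   # flags "too large" methods
```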
Avi Bryant
@avibryant
BTW: I have been working on multivariate normal support, as well as mass matrix adaptation. As far as I can tell all the math is right, but I'm having a lot of problems with poor mixing when I try to actually fit a covariance matrix from data.
Darren Wilkinson
@darrenjw
Have you had a look at how they do it in Stan, or other libraries? People often just fit a diagonal mass matrix. If you are going to learn a full covariance matrix, I would use a shrinkage estimator to try and keep all of the eigenvalues away from zero.
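A minimal sketch of the kind of linear shrinkage meant here: pull the sample covariance toward a scaled identity, which bounds the smallest eigenvalue away from zero. The fixed lambda is a simplification; Ledoit-Wolf-style estimators choose it from the data:

```scala
// (1 - lambda) * S + lambda * (tr(S)/p) * I. Since the target's
// eigenvalue tr(S)/p is positive, every eigenvalue of the result is
// at least lambda * tr(S)/p, i.e. bounded away from zero.
def shrinkCovariance(s: Array[Array[Double]], lambda: Double): Array[Array[Double]] = {
  val p = s.length
  val mu = (0 until p).map(i => s(i)(i)).sum / p // average eigenvalue
  Array.tabulate(p, p) { (i, j) =>
    (1 - lambda) * s(i)(j) + (if (i == j) lambda * mu else 0.0)
  }
}
```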
Avi Bryant
@avibryant
I modeled it after what pymc3 does. I know Stan at least optionally does a full mass matrix, but I'd love to hear more about the shrinkage estimator if you have any links.
it's a good point though that I should try starting with a diagonal mass matrix adaptation
which I haven't tried
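For reference, diagonal adaptation typically just tracks a running per-coordinate variance of the warmup draws with Welford's algorithm, roughly the scheme pymc3 and Stan use; a minimal sketch with illustrative names:

```scala
// Online mean/variance per coordinate over warmup draws.
final class WelfordVariance(dim: Int) {
  private var n = 0L
  private val mean = new Array[Double](dim)
  private val m2   = new Array[Double](dim)

  def update(q: Array[Double]): Unit = {
    n += 1
    var i = 0
    while (i < dim) {
      val delta = q(i) - mean(i)
      mean(i) += delta / n
      m2(i) += delta * (q(i) - mean(i))
      i += 1
    }
  }

  // Estimated posterior variances, used as the *inverse* mass matrix.
  def variances: Array[Double] = m2.map(_ / math.max(n - 1, 1L))
}
```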
Avi Bryant
@avibryant
thanks
I should have been more precise, though. My mixing problems are when trying to sample the covariance matrix of multivariate normal data, using an LKJ prior. I had hoped that learning the mass matrix would help but it's not obvious that it does
(one thing that's confusing to think about when debugging is that the mass matrix then ends up being the covariances of the elements in the [cholesky decomposition of the] covariance matrix of the data)
anyway it feels like some kind of numerical instability, maybe, which is always hard to track down
Kai(luo) Wang
@kailuowang
@avibryant quick question: the ability to plot in the REPL was removed, right? The only way to plot now is in a notebook, right?
Avi Bryant
@avibryant
yeah, though again that could be easily reintroduced and probably should be. It was just easier for me to delete with a broad brush and reintroduce as needed. Do you actually use that?
@kailuowang ^
Darren Wilkinson
@darrenjw
I would use that, too!
i.e. have "show" use EvilPlot's "displayPlot" method to pop up the plot from the Scala REPL. Does that make sense?
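For context, a sketch of that flow using EvilPlot's documented displayPlot helper; the plot data is made up:

```scala
import com.cibo.evilplot._
import com.cibo.evilplot.plot._
import com.cibo.evilplot.numeric.Point
import com.cibo.evilplot.plot.aesthetics.DefaultTheme._

// Opens a desktop window with the rendered plot, straight from the REPL.
displayPlot(ScatterPlot(Seq(Point(1, 1), Point(2, 4), Point(3, 9))).render())
```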
Avi Bryant
@avibryant
oh I see, rather than ascii plots like we had before?
yes that would be great
created stripe/rainier#488, I probably won't work on this right now but PRs very welcome
(the original motivation for the ascii plots was for tut docs, but mdoc makes it easier to support images)
Avi Bryant
@avibryant
BTW do you guys use ammonite or sbt console or what, as a REPL?
Darren Wilkinson
@darrenjw
I typically just use sbt console (old-school, I know)
Avi Bryant
@avibryant
ok I asked this to some others via DM but posting here in case maybe @darrenjw has thoughts:
[screenshot: Screen Shot 2020-02-25 at 11.05.33 AM.png]
this is me trying to understand something basic about mass matrix adaptation and HMC
Darren Wilkinson
@darrenjw
I'm not sure I completely understand the problem. But when thinking about this stuff conceptually, I often find it helpful to think about the dynamics of the process in continuous time. In continuous time there is no leapfrog, no step size, and no step-size adaptation, but there are still good and bad mass matrices, and still good and bad integration times, though these could be related.
Avi Bryant
@avibryant
yes, agreed. But in the discrete world, I guess one way of framing my question is: is there any reason we don't normalize mass matrices (e.g., for a diagonal matrix, scale it so that the elements are all <= 1)?
I realize this is a bit hand-wavy, but it seems like that would be numerically beneficial to the leapfrog
but it's clearly not common practice so I fear I'm missing something.
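For concreteness, a minimal leapfrog step with a diagonal mass matrix, showing where the matrix enters numerically; names are illustrative, not Rainier internals:

```scala
// One leapfrog step for HMC: the potential gradient drives the
// momentum updates, and the inverse mass matrix scales the position update.
def leapfrog(
    q: Array[Double],        // position (parameters)
    p: Array[Double],        // momentum
    eps: Double,             // step size
    massDiag: Array[Double], // diagonal of the mass matrix M
    gradU: Array[Double] => Array[Double] // gradient of the potential energy
): (Array[Double], Array[Double]) = {
  val n  = q.length
  val g0 = gradU(q)
  val pHalf = Array.tabulate(n)(i => p(i) - 0.5 * eps * g0(i))
  val qNew  = Array.tabulate(n)(i => q(i) + eps * pHalf(i) / massDiag(i))
  val g1 = gradU(qNew)
  val pNew  = Array.tabulate(n)(i => pHalf(i) - 0.5 * eps * g1(i))
  (qNew, pNew)
}
```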