Avi Bryant
@avibryant
$\sum_{m=1}^k \sum_{i=1}^n L_m(\theta;x_{mi})$
and because each $L_m$ has to be compiled separately, it's better to have small $k$ and large $n$ than vice versa
Darren Wilkinson
@darrenjw
Yes. I have been getting some warnings about stuff being too big to JIT. I guess this is why.
Avi Bryant
@avibryant
so in general the compiler breaks things into individual methods that are under the JIT threshold,
but there is an issue right now where if n_models * n_parameters gets large enough, there's a top-level dispatch method that gets generated too large to JIT
I'll try to fix that soon
Avi Bryant
@avibryant
BTW: I have been working on multivariate normal support, as well as mass matrix adaptation. As far as I can tell all the math is right, but I'm having a lot of problems with poor mixing when I try to actually fit a covariance matrix from data.
Darren Wilkinson
@darrenjw
Have you had a look at how they do it in Stan, or other libraries? People often just fit a diagonal mass matrix. If you are going to learn a full covariance matrix, I would use a shrinkage estimator to try and keep all of the eigenvalues away from zero.
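A minimal sketch of the kind of shrinkage mentioned above, assuming the simplest variant: linear shrinkage of the sample covariance toward a scaled identity, with a hand-picked intensity. The function name `shrink_covariance` and the `intensity` parameter are hypothetical, and this is not necessarily the estimator Darren has in mind. Data-driven ways of choosing the intensity exist but are not shown.

```python
from typing import List

Matrix = List[List[float]]


def shrink_covariance(sample_cov: Matrix, intensity: float) -> Matrix:
    """Return (1 - intensity) * S + intensity * mu * I, with mu = trace(S) / d."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    d = len(sample_cov)
    # mu is the average variance, so the shrinkage target keeps the total variance.
    mu = sum(sample_cov[i][i] for i in range(d)) / d
    return [
        [
            (1.0 - intensity) * sample_cov[i][j] + (intensity * mu if i == j else 0.0)
            for j in range(d)
        ]
        for i in range(d)
    ]


# A singular sample covariance (eigenvalues 0 and 2), as an illustration.
singular = [[1.0, 1.0], [1.0, 1.0]]
shrunk = shrink_covariance(singular, intensity=0.1)
```

Each eigenvalue e of the input becomes (1 - intensity) * e + intensity * mu, so for a positive semi-definite input with positive trace, every eigenvalue of the result is at least intensity * mu.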
Avi Bryant
@avibryant
I modeled it after what PyMC3 does. I know Stan at least optionally does a full mass matrix, but I'd love to hear more about the shrinkage estimator if you have any links.
it's a good point though that I should try starting with a diagonal mass matrix adaptation
which I haven't tried
Darren Wilkinson
@darrenjw
Avi Bryant
@avibryant
thanks
I should have been more precise, though. My mixing problems are when trying to sample the covariance matrix of multivariate normal data, using an LKJ prior. I had hoped that learning the mass matrix would help but it's not obvious that it does
(one thing that's confusing to think about when debugging is that the mass matrix then ends up being the covariances of the elements in the [cholesky decomposition of the] covariance matrix of the data)
anyway it feels like some kind of numerical instability, maybe, which is always hard to track down
Kai(luo) Wang
@kailuowang
@avibryant quick question: the ability to plot in the REPL has been removed, right? The only way to plot now is in a notebook, right?
Avi Bryant
@avibryant
yeah, though again that could be easily reintroduced and probably should be. It was just easier for me to delete with a broad brush and reintroduce as needed. Do you actually use that?
@kailuowang ^
Darren Wilkinson
@darrenjw
I would use that, too!
i.e. have "show" use EvilPlot's "displayPlot" method to pop up the plot from the Scala REPL. Does that make sense?
Avi Bryant
@avibryant
oh I see, rather than ascii plots like we had before?
yes that would be great
created stripe/rainier#488, I probably won't work on this right now but PRs very welcome
(the original motivation for the ascii plots was for tut docs, but mdoc makes it easier to support images)
Avi Bryant
@avibryant
BTW do you guys use ammonite or sbt console or what, as a REPL?
Darren Wilkinson
@darrenjw
I typically just use sbt console (old-school, I know)
Avi Bryant
@avibryant
ok I asked this to some others via DM but posting here in case maybe @darrenjw has thoughts:
this is me trying to understand something basic about mass matrix adaptation and HMC
Darren Wilkinson
@darrenjw
I'm not sure I completely understand the problem. But when thinking about this stuff conceptually, I often find it helpful to think about the dynamics of the process in continuous time. In continuous time there is no leap frog, no step size, and there is no step-size adaptation, but there are still good and bad mass matrices, and still good and bad integration times, though these could be related.
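For reference, the continuous-time picture can be written down directly. Taking $U(q) = -\log \pi(q)$ for a target density $\pi$, momentum $p$, and mass matrix $M$ (symbols chosen here for illustration, not taken from the chat), the Hamiltonian and its equations of motion are:

$$H(q, p) = U(q) + \tfrac{1}{2}\, p^\top M^{-1} p, \qquad \frac{dq}{dt} = M^{-1} p, \qquad \frac{dp}{dt} = -\nabla U(q)$$

Only $M$ and the total integration time appear here. The step size enters only once this flow is discretized, for example by the leapfrog integrator.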
Avi Bryant
@avibryant
yes, agreed. But in the discrete world, I guess one way of framing my question is: is there any reason we don't normalize mass matrices (eg, for a diagonal matrix, such that the elements are all <= 1?)
I realize this is a bit hand wavy but it seems like that's beneficial numerically to the leapfrog
but it's clearly not common practice so I fear I'm missing something.
I guess maybe, more correctly, such that the diagonal elements treated as a vector have a length of 1?
Darren Wilkinson
@darrenjw
Or trace 1, would have been my instinct (total variance).
But it feels like it shouldn't be necessary
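The two normalizations floated here can be sketched as follows, assuming a diagonal mass matrix stored as a plain list of positive numbers. The function names are hypothetical and nothing here is Rainier's API.

```python
import math
from typing import List


def normalize_trace(diag: List[float]) -> List[float]:
    """Rescale the diagonal so its elements sum to 1 (trace 1)."""
    total = sum(diag)
    return [v / total for v in diag]


def normalize_length(diag: List[float]) -> List[float]:
    """Rescale the diagonal so that, read as a vector, it has Euclidean length 1."""
    norm = math.sqrt(sum(v * v for v in diag))
    return [v / norm for v in diag]


# Illustrative per-parameter variances spanning several orders of magnitude.
variances = [0.01, 1.0, 100.0]
by_trace = normalize_trace(variances)
by_length = normalize_length(variances)
```

Both variants divide every element by the same constant, so the ratios between parameters are unchanged and only the overall scale moves.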
Avi Bryant
@avibryant
so apart from empirical observations, which could be related to implementation bugs etc, the reason it seems theoretically useful to me is that
(thinking out loud a bit here)
Darren Wilkinson
@darrenjw
It feels like in the case of an MVN target, the mass matrix should reflect the covariance/precision matrix, and not some normalised version. But this isn't something I've implemented in practice, so I could easily be missing something important...
Avi Bryant
@avibryant
in the leap frog, we advance the parameters by stepSize * (momentum * variance)
and then we advance the momentum by stepSize * gradient(parameters)
where the gradient is scaled to a delta of 1 on the parameters
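The update described above can be sketched as follows, assuming the usual half-step / full-step / half-step leapfrog and a diagonal mass matrix whose inverse is the per-parameter variance estimate. All names are hypothetical, and this is an illustration, not Rainier's implementation.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def leapfrog(
    q: Vector,
    p: Vector,
    variances: Vector,
    step_size: float,
    grad_log_density: Callable[[Vector], Vector],
    n_steps: int = 1,
) -> Tuple[Vector, Vector]:
    """Run n_steps leapfrog steps; variances is the diagonal of the inverse mass matrix."""
    q, p = list(q), list(p)
    for _ in range(n_steps):
        # Half step on the momentum, using the gradient of the log density.
        g = grad_log_density(q)
        p = [pi + 0.5 * step_size * gi for pi, gi in zip(p, g)]
        # Full step on the parameters: stepSize * (momentum * variance).
        q = [qi + step_size * vi * pi for qi, vi, pi in zip(q, variances, p)]
        # Second half step on the momentum at the new position.
        g = grad_log_density(q)
        p = [pi + 0.5 * step_size * gi for pi, gi in zip(p, g)]
    return q, p


# Toy target: independent normals with very different scales.
target_var = [0.01, 100.0]


def grad_log_density(q: Vector) -> Vector:
    return [-qi / v for qi, v in zip(q, target_var)]


q0, p0 = [0.1, 10.0], [10.0, 0.1]
q1, p1 = leapfrog(q0, p0, target_var, 0.1, grad_log_density, n_steps=20)

# Same trajectory with every variance scaled by c and the step size and
# initial momentum divided by sqrt(c).
c = 100.0
q2, _ = leapfrog(
    q0,
    [pi / math.sqrt(c) for pi in p0],
    [c * v for v in target_var],
    0.1 / math.sqrt(c),
    grad_log_density,
    n_steps=20,
)
```

In this sketch `q1` and `q2` agree up to rounding, which suggests that a uniform rescaling of the mass matrix is something step-size adaptation can absorb. The relative scales between parameters are what change the trajectory.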
Darren Wilkinson
@darrenjw
Yes it's certainly true that bad choices can make the numerics stiffer, but I feel like the good choices make the numerics better.
Avi Bryant
@avibryant
and I guess the thing that's getting me right now is that if variance is very far from 1,
does that turn a good choice of stepSize for the parameter full step
into a bad choice of stepSize for the momentum half step?
I guess I can't really justify why normalizing it would be any better or worse.
Darren Wilkinson
@darrenjw
But my feeling is that you would only choose a variance far from one if different step sizes for the parameters and momentum are appropriate. But I'm seriously hand-waving at this point!
Avi Bryant
@avibryant
ok, I can see the intuition there, I think: if a given parameter has low variance, then its component of the gradient will also be small