
can we make this "smarter" than just implementing individual combinations of XYZ

regarding how to proceed, @hyiltiz: you could read up a little on mlr3 and R6 and help add in estimators? Then you get the cross-validation selection etc. for free from mlr3 core benchmarking

i.e., help implement some MLE, MAP etc.; we show you the ropes of how to do it, and show how out of that you get an easy cross-validation benchmark selection as you describe, using mlr3 benchmark functionality in a few lines

might get you to your goal easier and quicker than hoping the functionality magically appears (we already spent our 3 wishes on the mlr3proba interface design)

I like @fkiraly's idea of specifying the learner as the full composition of parameter, distribution selection and fitting (XYZ stuff above), and as @RaphaelS1 also suggested, I should definitely try out mlr3 at some point. I am happy to help implement some of the estimators. There is a balance that is a bit hard to master well: if we go full-blown optimization over parameters, distribution families and even CV over learners (hierarchical model fitting, then prediction, then selection), a) we step into what RStan already does quite well and is specialized to do, and b) we could also end up either writing too many equations from Wikipedia into the package, or even implementing a rudimentary symbolic math engine. I think overlapping with the functionality provided by Stan or a symbolic engine should probably be strictly out of scope. On the other hand, we could keep popular packages and workflows in mind (RStan, glm's family option, ggplot's stat_ function/family options etc.) so this package can easily plug into those.

Makes sense - perhaps direct implementations for the "simple" choices such as MLE for distribution X, and "vendor interfaces" for XYZ compositions via RStan etc. - that is, estimators that are generic, "eat" RStan code and produce an mlr estimator. The latter is probably substantially harder than the former (no precedent so far for Bayesian packages), so maybe it's best to start with hard-coded "simple" ones?

I think we have been distracted from the original distr6 discussion. And I think that any discussion about the mlr3proba implementation is premature, as we will first need to internally structure and clearly document the difference between our own Bayesian learners and vendor interfaces to RStan etc. However, I do think that this conversation has reinforced Franz's original point that we could (and maybe should) fully abstract estimation to mlr3proba, as @hyiltiz has been our first open use-case and clearly requires methods that distr6 alone cannot produce. Having said that, if we could offer MLE in distr6, which is a feature that has been requested several times, then we could consider distr6 complete for the majority of users, who will not feel comfortable with the ML interface of mlr3proba. I don't see the harm in adding one function (yes, a function, not an object), with defaults, and writing in the documentation "for more advanced control, see mlr3proba".

would it be just dispatch in your design, @RaphaelS1 ?

S3?

or custom dispatch?

Not even dispatch, just a simple function, see the sample code here: https://github.com/alan-turing-institute/distr6/issues/4#issuecomment-591857477

you don't need to optimize etc.

this just always optimizes
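For concreteness, here is a minimal sketch of what such an always-optimizing, single-function MLE could look like (my own illustration, not the code from the linked issue; the name `fit_mle` and the use of `optim` are assumptions):

```r
# Hypothetical sketch: numerically maximise the log-likelihood by
# minimising the negative log-likelihood with optim(). No dispatch,
# no analytical shortcuts -- this just always optimizes.
fit_mle <- function(x, logpdf, lower, upper) {
  nll <- function(theta) -sum(logpdf(x, theta))
  optim(par = (lower + upper) / 2, fn = nll,
        method = "Brent", lower = lower, upper = upper)$par
}

# Example: Poisson rate; the MLE is known to equal the sample mean.
x <- c(2, 3, 1, 4, 2)
fit_mle(x, function(x, lambda) dpois(x, lambda, log = TRUE),
        lower = 1e-8, upper = 100)  # ≈ mean(x) = 2.4
```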

I agree with @RaphaelS1 that we may be discussing prematurely (thx Sir. Knuth). As a user, having MLE in distr6 makes a huge difference. It should be correct and work, i.e. correctly handle the p(x)=0 and p(x)=1 cases. As for efficient MLE using properties of specific distributions as @fkiraly is suggesting (via S3 or dispatch etc.), that can probably wait and can even be delegated to `mlr3proba` to handle.
@fkiraly

but in many cases the MLE is very explicit

Yes, but I don't necessarily think having a sub-standard, more numerical version is a huge problem in `distr6`, as long as we clearly point to `mlr3proba`. We don't need dispatch as we can always call `$pdf(log = TRUE)`; however, I do think adding an analytical `logpdf` to `ExoticStatistics` is overdue.

@hyiltiz

As a user, having MLE in distr6 makes a huge difference

This is useful to hear thanks :)

thx Sir. Knuth

Out of interest what specific reference/quote is this to?

Sir. Knuth

No idea - Donald Knuth's algorithmics books perhaps?

Out of interest what specific reference/quote is this to?

https://wiki.c2.com/?PrematureOptimization

Premature optimization is the root of all evil.

If it were premature optimization, one would have to implicitly assume that the gradient-descent version and the explicit version are identical except for performance.

This is in general false! Gradient minimizers have in general completely different statistical properties.

Often they run into local minima and are incorrect.
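As a reproducible illustration of this point (my own example, not from the thread): the Cauchy location likelihood is a textbook multimodal case, so a quasi-Newton minimiser can settle in different optima depending on where it starts:

```r
set.seed(42)
x <- rcauchy(5, location = 10)  # small heavy-tailed sample
nll <- function(mu) -sum(dcauchy(x, location = mu, log = TRUE))

# Same data, three starting points: BFGS may converge to different
# local minima of the (multimodal) negative log-likelihood.
sapply(c(-50, 0, 10), function(start) optim(start, nll, method = "BFGS")$par)
```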

I think it is a case of requirements primarily, and thinking about which mathematical objects you want.

A common example of "bad architecture" (or anti-pattern) is the "god object" or "god function" that does everything.

though in many cases the more efficient way to compute the MLE is also the more accurate one...

By accurate, if you mean algorithmic correctness (no systematic error or bias in the fits etc.), I do agree that it is always needed; if you mean higher precision of a correctly estimated value, I think something as modest as 6 digits should be quite enough for most applications. If we can detect that the optimum may be local (e.g. if the -logL surface isn't convex), it is better to emit a warning (e.g. "Estimate may be a local minimum; try a better optimization procedure").

it should not take 10 minutes and then give the "right" result only in the sense of being accurate to 2 digits or so

So it seems we can add the sample implementation, after fixing the cases for p=0 and p=1? https://github.com/alan-turing-institute/distr6/issues/4#issuecomment-591857477

@fkiraly Continuing that GitHub issue #4 here: what's the log-likelihood of a coin toss coming up "neither heads nor tails"? Without ridiculing ourselves with a coin toss standing tall on its side or never falling to the ground etc., that probability is simply 0, yet log() is undefined at 0. log(p=1) is not really a property of some specific distribution; for any distribution there always exist events with p=0, and these are not really rare in real-world data. E.g. in a letter-typing experiment, I ask people to type a letter that shows up on the screen very briefly (3 letters are shown per second, so they need to press 3 letters per second). It is possible that the letter "V" was never pressed (simply because it was never sampled to be shown, or because the person missed every V that was presented). These kinds of "no observation" cases are quite frequent in real-world data, and can easily give rise to a p=0 event. A general-purpose MLE based on logL should be able to handle them all; the hack I provided is very common in the behavioural sciences. I am keen to learn a non-hacky, correct and general way to deal with this if you can propose an alternative.
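The clamping hack referred to can be sketched as follows (`eps` is an arbitrary small constant, an assumption of this illustration rather than any fixed convention):

```r
# Clamp probabilities into [eps, 1 - eps] so log() stays finite even
# for "no observation" events with p = 0 (or saturated events with p = 1).
clamp_log <- function(p, eps = 1e-10) log(pmin(pmax(p, eps), 1 - eps))

log(0)        # -Inf: one zero-probability event sinks the whole sum
clamp_log(0)  # log(1e-10) ≈ -23.03: finite, so optimisation can proceed
```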

Just to confirm something quickly: base R deals with this case by returning `log(p) = -Inf` for p = 0. Do you both think this is a good or bad idea? Your answers will make a very big difference to the distr6 implementation. If you think it is a bad idea, then we cannot use base R `log = TRUE`, and *all* distributions will need a custom-written `logPdf`. If you think it's a good idea, then we need to scrap the distr6 workaround of calling `log` on `pdf` when `log = TRUE`, and instead just pass `log = TRUE` through if the internal function allows it, and otherwise return an error.
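To make the base R behaviour concrete (plain `stats` functions, nothing distr6-specific): `log = TRUE` evaluates an analytical log-density, which also sidesteps the underflow that the naive `log(pdf(x))` route hits:

```r
log(dunif(2))           # log(0) = -Inf for an impossible event, as base R defines it
dnorm(50, log = TRUE)   # analytical log-density: about -1250.9, finite
log(dnorm(50))          # dnorm(50) underflows to 0, so this is -Inf
```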
Dawid AP, Musio M (2014), Theory and Applications of Proper Scoring Rules, section 4

@RaphaelS1

Just to confirm something quickly, base R deals with this case by returning log(p) = -Inf, for p = 0. Do you both think this is a good or bad idea?

I think it is a good idea since it is correct.

I don't think we should scrap the design - replacing the default log in cases where there is an efficient alternative is the way to go IMO.

@fkiraly The current design doesn't use analytical base R. Basically there are three choices:

1. Have a new method `logPdf`, which is either in all distributions or added in `CoreStatistics`;
2. Use the `CoreStatistics` decorator to replace `pdf`, when possible, with a function that includes an analytical `log` expression;
3. Change the default behaviour of `log = TRUE` so that instead of calling the pdf and wrapping this in `log()`, it first calls an analytical expression if one is provided; this can be quite a quick search using a private flag, `.log = logical(1)`.

Just to confirm something quickly, base R deals with this case by returning `log(p) = -Inf`, for p = 0. Do you both think this is a good or bad idea?

Personally, I think it is a very good idea as it is the (mathematically) correct one.

Empirically speaking, for a single sample for which p=0, I'd expect log(p) to go to -Inf without losing much predictive power anyway; it becomes truly bad when such a single sample breaks down the whole estimation process over many samples, most of which aren't edge cases. The need to "fix" this comes from an empirical perspective when estimating parameters/selecting models, so we probably only need to fix it empirically, rather than changing the fundamentals.

In which case, as we are all in agreement about what is mathematically (and analytically) correct, this is the only way to proceed in the core interface. If we choose to include some basic MLE in distr6 then we can provide an optional `eps` argument for handling edge cases, but the default will be to ignore this. We can handle this better in `mlr3proba`, but that is outside the remit of `distr6`.