    fkiraly
    @fkiraly
    There's also an interesting "composition" problem: "estimate parameters X of distribution Y using method Z" is a full learner
    can we make this "smarter" than just implementing individual combinations of XYZ
    regarding how to proceed, @hyiltiz : you could read up a little on mlr3 and R6 and help add estimators? Then you get the cross-validated selection etc. for free from mlr3 core benchmarking
    i.e., help implement some MLE, MAP etc.; we show you the ropes of how to do it, and show how out of that you get an easy cross-validated benchmark selection, as you describe, using mlr3 benchmark functionality in a few lines
    might get you to your goal more easily and quickly than hoping the functionality magically appears (we already spent our 3 wishes, on the mlr3proba interface design)
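
    For context, a rough sketch of the mlr3 benchmarking workflow being alluded to here. The task and learners below are stock stand-ins (the parametric estimators under discussion don't exist yet), so this only shows the shape of the "few lines":

```r
library(mlr3)

# stand-ins: the XYZ distribution estimators would slot in here once implemented
task     <- tsk("sonar")
learners <- lrns(c("classif.rpart", "classif.featureless"))

# cross-validated comparison "for free" from mlr3 core benchmarking
design <- benchmark_grid(task, learners, rsmp("cv", folds = 5))
bmr    <- benchmark(design)
bmr$aggregate(msr("classif.acc"))  # pick the best learner by CV score
```
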
    fkiraly
    @fkiraly
    also you could be the first person to make a proper object-oriented parametric estimation interface in R for distributions that's interoperable with mlr
    Hörmet Yiltiz
    @hyiltiz
    I like @fkiraly's idea of specifying the learner as the full composition of parameter, distribution selection, and fitting (the XYZ stuff above), and as @RaphaelS1 also suggested, I should definitely try out mlr3 at some point. I am happy to help implement some of the estimators. There is a balance that is a bit hard to strike: if we go full-blown optimization over parameters, distribution families, and even CV over learners (hierarchical model fitting, then prediction, then selection), a) we step into what RStan already does quite well and is specialized to do, and b) we could also end up either writing too many equations from Wikipedia into the package, or even implementing a rudimentary symbolic math engine. I think overlapping with the functionality provided by Stan or a symbolic engine should probably be strictly out of scope. On the other hand, we could keep popular packages and workflows in mind (RStan, glm's family option, ggplot's stat_ function/family options, etc.) so this package can easily plug into them.
    fkiraly
    @fkiraly
    Makes sense - perhaps direct implementations for the "simple" choices such as MLE for distribution X, and "vendor interfaces" for XYZ compositions via RStan etc. - that is, estimators that are generic, "eat" RStan code, and produce an mlr estimator. The latter is probably substantially harder than the former (no precedent so far for Bayesian packages), so maybe it's best to start with hard-coded "simple" ones?
    Raphael Sonabend
    @RaphaelS1
    I think we have been distracted from the original distr6 discussion. And I think that any discussion of the mlr3proba implementation is premature, as we first need to internally structure, and clearly document, the difference between our own Bayesian learners and vendor interfaces to RStan etc. However, I do think this conversation has reinforced Franz's original point that we could (and maybe should) fully abstract estimation to mlr3proba, as @hyiltiz has been our first open use-case and clearly requires methods that distr6 alone cannot produce. Having said that, if we could offer MLE in distr6, which is a feature that has been requested several times, then we could consider distr6 complete for the majority of users, who will not feel comfortable with the ML interface of mlr3proba. I don't see the harm in adding one function (yes, function, not object), with defaults, and writing in the documentation "for more advanced control, see mlr3proba".
    fkiraly
    @fkiraly
    hm, if a lot of people want that feature from distr6, I agree - there is a case to be made for "simple" estimators to ship with it. Though let's think carefully about whether this should be a function, a method, or a class. I currently don't like adding it as a method to the distrs.
    would it be just dispatch in your design, @RaphaelS1 ?
    S3?
    or custom dispatch?
    Raphael Sonabend
    @RaphaelS1
    Not even dispatch, just a simple function, see the sample code here: https://github.com/alan-turing-institute/distr6/issues/4#issuecomment-591857477
    fkiraly
    @fkiraly
    but in many cases the MLE is very explicit
    you don't need to optimize etc.
    this just always optimizes
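
    To illustrate the point, a minimal sketch for a Normal sample: the MLE is explicit (the sample mean and the biased sample standard deviation), so a generic optimiser is unnecessary. The `nll` helper below is illustrative, not the linked sample code:

```r
set.seed(1)
x <- rnorm(100, mean = 2, sd = 3)

# explicit (closed-form) MLE for the Normal: no optimisation needed
mu_hat    <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))

# generic numeric MLE of the kind that "just always optimizes"
nll <- function(par) -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))
fit <- optim(c(0, 1), nll, method = "L-BFGS-B", lower = c(-Inf, 1e-8))

rbind(explicit = c(mu_hat, sigma_hat), numeric = fit$par)  # near-identical here
```
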
    Hörmet Yiltiz
    @hyiltiz
    I agree with @RaphaelS1 that we may be discussing this prematurely (thx Sir. Knuth). As a user, having MLE in distr6 makes a huge difference. It should be correct and work, i.e. correctly handle the p(x)=0 or p(x)=1 cases. As for efficient MLE using properties of specific distributions, as @fkiraly is suggesting (via S3 or dispatch etc.), that can probably wait and can even be delegated to mlr3proba.
    Raphael Sonabend
    @RaphaelS1

    @fkiraly

    but in many cases the MLE is very explicit

    Yes, but I don't necessarily think having a sub-standard, more numerical version in distr6 is a huge problem, as long as we clearly point to mlr3proba. We don't need dispatch as we can always call $pdf(log = TRUE); however, I do think adding an analytical logpdf to ExoticStatistics is overdue.
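
    An aside on why an analytical logpdf matters beyond the p = 0 case: the plain pdf can underflow to zero in double precision even when the log-density is perfectly finite. With the standard Normal in base R:

```r
log(dnorm(40))         # -Inf: dnorm(40) ~ exp(-800) underflows to 0
dnorm(40, log = TRUE)  # -800.9189: the analytical log-density is fine
```
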

    @hyiltiz

    As a user, having MLE in distr6 makes a huge difference

    This is useful to hear thanks :)

    thx Sir. Knuth

    Out of interest what specific reference/quote is this to?

    fkiraly
    @fkiraly
    regarding logpdf - I think this should be in the core, not in exotic, since it is very important in many applications

    Sir. Knuth

    No idea - Donald Knuth's algorithmics books perhaps?

    Hörmet Yiltiz
    @hyiltiz

    Out of interest what specific reference/quote is this to?

    https://wiki.c2.com/?PrematureOptimization
    Premature optimization is the root of all evil.

    fkiraly
    @fkiraly
    I don't think this is a case of p.o. - I care more about correctness and accurate representation.
    If it were p.o., one would have to implicitly assume that the gradient descent version and the explicit version are identical except for performance.
    This is in general false! Gradient minimizers have, in general, completely different statistical properties.
    Often they end up in local minima and give incorrect results.
    I think it is a case of requirements primarily, and thinking about which mathematical objects you want.
    Hörmet Yiltiz
    @hyiltiz
    No, I meant that choosing between dispatch and S3 may be too early to decide, given that we don't even have the feature implemented yet and aren't quite sure where the bottlenecks are
    fkiraly
    @fkiraly
    But that too is not primarily a question of efficiency, it is rather one of architecture.
    A common example of "bad architecture" (or anti-pattern) is the "god object" or "god function" that does everything.
    Hörmet Yiltiz
    @hyiltiz
    yes, but the architecture under discussion may not even be needed; a not-so-efficient but correct MLE, plus a pointer in the docs, would probably suffice
    fkiraly
    @fkiraly
    exactly, we haven't even settled on "what's needed" - this is what I mean by "requirements"
    though in many cases the more efficient way to compute the MLE is also the more accurate one...
    Hörmet Yiltiz
    @hyiltiz
    By accurate, if you mean algorithm correctness (no systematic error or bias in the fits etc.), I do agree that it is always needed; if you meant higher precision of a correctly estimated value, I think something as poor as 6 digits should be quite enough for most applications. If we can detect a local optimum (e.g. if the -logL surface isn't convex), it would be better to emit a warning (e.g. "Estimate may be a local minimum; try better optimisation procedures").
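
    One hedged sketch of that warning idea: rerun the optimiser from several starting points, and warn when the runs disagree, which hints at local minima. `multi_start_mle` is a hypothetical helper, not a distr6 function:

```r
multi_start_mle <- function(nll, starts, tol = 1e-6) {
  # fit from each starting point and compare the achieved minima
  fits <- lapply(starts, function(s) optim(s, nll, method = "BFGS"))
  vals <- vapply(fits, function(f) f$value, numeric(1))
  if (diff(range(vals)) > tol)
    warning("Estimate may be a local minimum; try better optimisation procedures.")
  fits[[which.min(vals)]]  # return the best fit found
}

# e.g. a deliberately bimodal objective with local minima near -2 and +2
nll <- function(par) (par^2 - 4)^2 + par
fit <- multi_start_mle(nll, starts = list(-3, 0, 3))  # warns; returns global fit
```
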
    fkiraly
    @fkiraly

    correctness (no systematic error or bias in the fits etc.)

    yes, that's what I mean

    it should not take 10 min and then give the "right" result only in the sense of being 2 digits or so accurate
    Hörmet Yiltiz
    @hyiltiz
    So it seems we can add the sample implementation, after fixing the cases for p=0 and p=1? https://github.com/alan-turing-institute/distr6/issues/4#issuecomment-591857477
    Hörmet Yiltiz
    @hyiltiz
    @fkiraly Continuing that GitHub issue #4 here: what's the log-likelihood of a coin toss coming up "neither heads nor tails"? Without ridiculing ourselves with a coin standing tall on its side or never falling to the ground etc., that probability is simply 0, yet log() is undefined at 0. And p=0 is not really a property of some specific distribution; for any distribution there always exist events with p=0, and they are not really rare in real-world data. E.g. in a letter-typing experiment, I ask people to type a letter that shows up on the screen very briefly (3 letters per second, so they need to press 3 letters per second). It is possible that the letter "V" was never pressed (simply because it was never sampled to be shown, or because the person missed every "V" that was presented). These kinds of "no observation" cases are quite frequent in real-world data and can easily give rise to a p=0 event. A general-purpose MLE based on logL should be able to handle them all; the hack I provided is very common in the behavioral sciences. I am keen to learn a non-hacky, correct, and general way to deal with this if you can propose an alternative.
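
    A concrete illustration of the "no observation" problem, assuming a simple Bernoulli model; the clamping constant is the hack in question:

```r
x <- c(1, 1, 1, 0)  # coin tosses: 3 heads, 1 tail
p <- 0              # candidate parameter assigning probability 0 to heads

# one impossible observation drives the whole log-likelihood to -Inf
sum(dbinom(x, size = 1, prob = p, log = TRUE))  # -Inf

# the common clamping hack: bound probabilities away from 0
eps <- 1e-10
sum(log(pmax(dbinom(x, size = 1, prob = p), eps)))  # finite (~ -69.1)
```
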
    Raphael Sonabend
    @RaphaelS1
    Just to confirm something quickly, base R deals with this case by returning log(p) = -Inf, for p = 0. Do you both think this is a good or bad idea? Your answers will make a very big difference to the distr6 implementation. Because if you think it is a bad idea, then we cannot use base R's log = TRUE, and all distributions will need a custom-written logPdf. If you think it's a good idea, then we need to scrap the distr6 workaround of calling log() on the pdf when log = TRUE, and instead pass log = TRUE through if the internal function allows it, otherwise return an error.
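
    For reference, the base R convention in question:

```r
log(0)                                     # -Inf: base R's convention for p = 0
dbinom(1, size = 1, prob = 0, log = TRUE)  # -Inf for an impossible event
dnorm(0, log = TRUE)                       # -0.9189: analytical log-density
```
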
    fkiraly
    @fkiraly
    minimize empirical loss or regularized empirical loss for a strictly proper scoring rule suitable for mixed distributions, e.g., the integrated Brier score
    Dawid AP, Musio M (2014), "Theory and Applications of Proper Scoring Rules", section 4
    @RaphaelS1

    Just to confirm something quickly, base R deals with this case by returning log(p) = -Inf, for p = 0. Do you both think this is a good or bad idea?

    I think it is a good idea since it is correct.

    I don't think we should scrap the design - replacing the default log in cases where there is an efficient alternative is the way to go i.m.o.
    Raphael Sonabend
    @RaphaelS1
    @fkiraly The current design doesn't use base R's analytical log. Basically there are three choices: 1. have a new method logPdf, which is either in all distributions or added in CoreStatistics; 2. use the CoreStatistics decorator to replace pdf, where possible, with a function that includes the analytical log expression; 3. change the default behaviour of log = TRUE so that instead of calling the pdf and wrapping it in log(), it first calls an analytical expression if one is provided; this can be a quick check using a private flag, .log = logical(1)
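
    A minimal sketch of what option 3 could look like, assuming a simplified R6 class; the names (.log, .logPdf) are illustrative, not the actual distr6 internals:

```r
library(R6)

Distribution <- R6Class("Distribution",
  public = list(
    initialize = function(pdf, logPdf = NULL) {
      private$.pdf    <- pdf
      private$.logPdf <- logPdf
      private$.log    <- !is.null(logPdf)  # private flag: analytical log available?
    },
    pdf = function(x, log = FALSE) {
      if (!log) return(private$.pdf(x))
      if (private$.log) {
        private$.logPdf(x)    # analytical expression takes priority
      } else {
        log(private$.pdf(x))  # fallback: wrap the pdf in log()
      }
    }
  ),
  private = list(.pdf = NULL, .logPdf = NULL, .log = FALSE)
)

# usage: a Normal with an analytical log-density supplied
N <- Distribution$new(pdf = dnorm, logPdf = function(x) dnorm(x, log = TRUE))
N$pdf(2, log = TRUE)  # calls the analytical expression, no log() wrapping
```
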
    fkiraly
    @fkiraly
    I think no. 3 is the least error-prone for the user (no additional line of code or knowledge of another method required), and no. 1 would be the most consistent with the interface philosophy. But I would be happy with either.
    Hörmet Yiltiz
    @hyiltiz

    Just to confirm something quickly, base R deals with this case by returning log(p) = -Inf, for p = 0. Do you both think this is a good or bad idea?

    Personally, I think it is a very good idea as it is the (mathematically) correct one.

    Empirically speaking, for a single sample for which p=0, I'd expect log(p) to go to -Inf without losing much predictive power anyway; it becomes truly bad when that single sample breaks down the whole estimation process for a dataset where most samples aren't edge cases. The need to "fix" this comes from an empirical perspective, when estimating parameters/selecting models, so we probably only need to fix it empirically, rather than changing the fundamentals.

    Raphael Sonabend
    @RaphaelS1
    In which case, as we are all in agreement about what is mathematically (and analytically) correct, this is the only way to proceed in the core interface. If we choose to include some basic MLE in distr6 then we can provide an optional eps argument for handling edge cases, but the default will be to ignore it. We can handle this better in mlr3proba, but that is outside the remit of distr6
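
    A hypothetical signature for that design (illustrative only, not the actual distr6 API): the default eps = NULL propagates log(0) = -Inf, the mathematically correct behaviour, and a user-supplied eps opts in to edge-case clamping:

```r
loglik <- function(p, eps = NULL) {
  if (!is.null(eps)) p <- pmax(p, eps)  # opt-in clamping only
  sum(log(p))
}

loglik(c(0.5, 0))               # -Inf: default, mathematically correct
loglik(c(0.5, 0), eps = 1e-10)  # finite: explicit edge-case handling
```
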
    Hörmet Yiltiz
    @hyiltiz
    Sure, that sounds good enough :D