    Jim Pivarski
    @jpivarski
    Names like __xyz__ are reserved in C.
    Jonas Eschle
    @mayou36

    Ok, I was not wrong with the above: from the docs

    to quote (about __*__)

    System-defined names. These names are defined by the interpreter and its implementation (including the standard library). (...) More will likely be defined in future versions of Python. Any use of __*__ names, in any context, that does not follow explicitly documented use, is subject to breakage without warning.

    Jim Pivarski
    @jpivarski
    Interesting. This is at odds with what libraries do, including Numpy.
    Although the surrounding context for this quote is talking about what modules import, and therefore names defined in module scope. Above, we're talking about names defined in a class's static scope (a nested namespace).
    Jonas Eschle
    @mayou36
    So to me there are two points here: the high level discussion which results in: _function is not good and __function__ is not good either.
    Then there is the practical solution:
    • having the pattern "public-calls-private", documenting it well (as SciPy and TensorFlow Probability do) and saying explicitly that this is part of the internal API is fine
    • the same goes for "fancy enough" __*__ method names (as e.g. NumPy uses): Python will most likely never define an __array_function__ itself, and if it did, they would rename it slightly (given the importance of NumPy). But realistically, this won't happen anyway

    Although the surrounding context for this quote is talking about what modules import, and therefore names defined in module scope. Above, we're talking about names defined in a class's static scope (a nested namespace).

    Not really. a) it's basically the same in Python :) and b), the entry below it (two leading underscores, __*) talks explicitly about class-private names

    I think there is no real distinction made about class/modules there, right?
    But I wonder now more about the rationale behind NumPy's decision to go for that, tbh
    Jim Pivarski
    @jpivarski
    This is why I think it's good to put the library's name in the name of the protocol: __awkward_serialize__. It would be bad if two protocols picked the same name.
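
A minimal sketch of the naming point above, in plain Python. The library names "alpha" and "beta", the hook names `__alpha_serialize__` and `__beta_serialize__`, and the helper functions are all hypothetical; this is not Awkward Array's actual API. It only illustrates that when each protocol carries its library's name, one class can implement both without a clash.

```python
# Hypothetical sketch: two libraries, each looking for its own hook name.

def alpha_serialize(obj):
    """Pretend 'alpha' library entry point: uses only its own hook."""
    hook = getattr(type(obj), "__alpha_serialize__", None)
    if hook is None:
        raise TypeError(f"{type(obj).__name__} lacks __alpha_serialize__")
    return hook(obj)


def beta_serialize(obj):
    """Pretend 'beta' library entry point: uses a differently named hook."""
    hook = getattr(type(obj), "__beta_serialize__", None)
    if hook is None:
        raise TypeError(f"{type(obj).__name__} lacks __beta_serialize__")
    return hook(obj)


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # One class, two protocols; the library prefix keeps them apart.
    def __alpha_serialize__(self):
        return {"x": self.x, "y": self.y}

    def __beta_serialize__(self):
        return f"{self.x},{self.y}"


p = Point(1, 2)
alpha_result = alpha_serialize(p)
beta_result = beta_serialize(p)
```

Had both libraries picked a bare `__serialize__`, `Point` could satisfy only one of them.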
    Jonas Eschle
    @mayou36
    Anyway, for practical use cases, it's at least as safe (if not safer, IMHO) to add a guarantee where there is actually none (saying these _* methods are stable) instead of using a non-guaranteed feature. But realistically, I agree: adding the name of the library (__*__) really makes things "safe" again.
    Jim Pivarski
    @jpivarski
    Well, disregarding the issue of __*__ (I've been trying to think of other protocols, not defined by the standard library and not defined by Numpy, but by others...), it would be wrong to expect users to override _*.
    __*__ might be a rule defined by the Python docs but disregarded by the community, but _* is in agreement: it's for internal APIs with no guarantee of stability.
    Thanks for pointing me to the documentation about __*__. If I can't think of any examples beyond the standard library and Numpy (__array__, __array_interface__, __array_ufunc__, __array_function__!), then perhaps Numpy has special status, it's "extended standard library..."
    Jonas Eschle
    @mayou36
    Hm, I don't quite agree: how many libraries really disrespect the (__*__)? Many?
    And the same goes for the (_*): SciPy and TensorFlow (Probability) use it, so it's also not "respected"

    Thanks for pointing me to the documentation about __*__. If I can't think of any examples beyond the standard library and Numpy (__array__, __array_interface__, __array_ufunc__, __array_function__!), then perhaps Numpy has special status, it's "extended standard library..."

    Yes, NumPy's special status could go on... I don't know of any other, and I remember having had long discussions about it already

    But I agree with your concern on the _*, it's not "golden"
    Jim Pivarski
    @jpivarski
    I haven't been able to think of any non-standard library, non-Numpy examples, which is what's making me rethink that. But about _* (underscore first): what do SciPy and TensorFlow expect you to override?
    Jonas Eschle
    @mayou36
    The public method with a leading underscore. The exact same use case as in zfit
    like _pdf etc
    Jim Pivarski
    @jpivarski
    I mean, what are the names of the methods in SciPy and TensorFlow so that I can look them up?
    Jonas Eschle
    @mayou36

    The links get you directly there: Scipy and TensorFlow Probability (and TensorFlow itself)

    E.g. (TFP): prob (_prob), log_prob,...
    (scipy): _logpdf, _cdf, _logcdf, _ppf, _rvs, _isf, _sf, _logsf

    Jim Pivarski
    @jpivarski
    And this (though I'd like to find official documentation; all I'm finding so far are StackOverflow instructions).
    Jonas Eschle
    @mayou36

    The advantage of this method is for the case where you need "preprocessing":

    • the naming is clear. You could have another convention (user_pdf or whatever), but why not _*?
    • it is private in the class API. So it is clear from the outside not to call it.

    I think the question of _* boils down to whom it should be private to: the subclasser or the invoker. Saying that a user can only override public methods implies that a user's method is always exposed (unless you add another layer of indirection, of course). The question that remains: is there a practical, equivalent way to do it?

    (Not talking about e.g. registering integrals etc.; these mechanisms are complementary to me)
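
A rough sketch of the "public-calls-private" pattern with preprocessing, as described above. The classes `Distribution` and `Exponential` are hypothetical stand-ins written for illustration; they are not SciPy's, TensorFlow Probability's, or zfit's real implementation. The public `pdf` does the shared input handling, and the subclass author only overrides `_pdf`.

```python
import math


class Distribution:
    """Hypothetical base class: public method wraps an overridable hook."""

    def pdf(self, x):
        # Shared preprocessing that every subclass gets for free.
        x = float(x)
        if math.isnan(x):
            raise ValueError("x must not be NaN")
        return self._pdf(x)

    def _pdf(self, x):
        # Meant to be overridden by subclass authors, not called by end users.
        raise NotImplementedError("subclasses override _pdf")


class Exponential(Distribution):
    def __init__(self, rate):
        self.rate = rate

    def _pdf(self, x):
        # Receives an already validated float.
        return self.rate * math.exp(-self.rate * x) if x >= 0 else 0.0


dist = Exponential(rate=2.0)
value_at_zero = dist.pdf("0")   # the string is converted by the public method
value_negative = dist.pdf(-1)   # outside the support
```

Here the leading underscore signals "do not call this from outside", while the name itself is still part of a contract that the base class has to keep stable.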

    The links get you directly there: Scipy and TensorFlow Probability (and TensorFlow itself)

    exactly. The official documentation is following the link above

    Jim Pivarski
    @jpivarski
    Yes, there is this issue with multiple levels of "privateness."
    Chris Burr
    @chrisburr
    To add another way of doing it: IPython uses single underscores on both sides
    Jim Pivarski
    @jpivarski
    There you go: those links do it.

    So SciPy and TensorFlow really do require "people who are defining new probability distributions" (one kind of user) to override methods starting with a single underscore so that "people who use those probability distributions" (another kind of user) know that they're not supposed to call them directly. It's exactly the pattern I was advocating the __*__ for, but there's a rule against that in the central Python documentation.

    I stand corrected.

    It's a use of "privateness" that is unrelated to stability of API: the protocols defined by _* can't change for the same reason that protocols defined by __*__ can't change.
    Jonas Eschle
    @mayou36

    Yes, there is this issue with multiple levels of "privateness."

    Yes, a "private but stable" basically is missing

    @chrisburr that's interesting, so basically a convention for "private but stable"

    It's a use of "privateness" that is unrelated to stability of API: the protocols defined by _* can't change for the same reason that protocols defined by __*__ can't change.

    Fully agree!

    Jim Pivarski
    @jpivarski
    Yeah, maybe _*_ should fill this gap.
    And maybe I should start using it: _awkward_serialize_.
    And I should stop thinking of Numpy as a library and consider it part of the standard library, as far as these kinds of conventions go.
    Henry Schreiner
    @henryiii
    Note: Numpy is large enough that they can be sure Python never adds a __*__ name matching the one they are using; but that’s only because they are Numpy - some parts of the Python syntax, like @ and , were added for Numpy. Other users should never add a name like that (at least they are not supposed to)
    I think _*_ might work for what you want to do.
    (And following Tensorflow, etc. is fine too, though that seems a bit odd to me)
    Jonas Eschle
    @mayou36

    I think _*_ might work for what you want to do.

    This is an undocumented convention though, right? Does anyone besides IPython use it, do you happen to know?

    Chris Burr
    @chrisburr
    I'm not aware of anyone else but it feels like something that should be standardised (regardless of how)
    I wonder if the __*__ rule could be relaxed to allow libraries to implement __*_*__
    Jonas Eschle
    @mayou36

    Tbh, I am not convinced by the __*_*__ in general. For the numpy or uproot use case, that looks good, but for the general subclassing case (as in scipy, zfit,....), where overriding by the end user is the usual case, it seems too (unnecessarily) cumbersome for a user. _*_ sounds better to me and is, AFAIK, not taken. It would be defined as "private but stable" to distinguish it from "_*".

    The _*_ is also independent of any library name and better for any parser tool (autodocs?)

    But yes, it seems worth standardising

    Jim Pivarski
    @jpivarski
    _*_ has the virtue of being completely outside Python's reserved namespace (whereas __*_*__ is only out of the likely namespace) and it connotes the right thing: it resembles the __*__ that users (for some special definition of "user") are accustomed to overriding. _* looks like a protected method, which looks odd, even when SciPy and TensorFlow do it. With IPython's _*_ methods, the intent is clear.
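
A small sketch of the `_*_` style discussed above. `_repr_html_` is one of the hook names IPython's rich display looks for; the `Table` class and the `render` function here are hypothetical stand-ins for a frontend, written only to show how such a hook gets discovered.

```python
class Table:
    def __init__(self, rows):
        self.rows = rows

    def _repr_html_(self):
        # Single underscores on both sides: a hook the class author
        # overrides, sitting outside Python's reserved __*__ namespace.
        cells = "".join(f"<tr><td>{row}</td></tr>" for row in self.rows)
        return f"<table>{cells}</table>"


def render(obj):
    """Hypothetical frontend: use the hook if present, else fall back to repr()."""
    hook = getattr(obj, "_repr_html_", None)
    if callable(hook):
        return hook()
    return repr(obj)


html = render(Table(["a", "b"]))
plain = render(42)  # ints define no such hook, so repr() is used
```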
    Hans Dembinski
    @HDembinski

    Crazy, it is possible to replace a base class dynamically in Python:

    class Bar(object):
        def __init__(self):
            self.bar_init = 1
        def __str__(self):
            return "Bar %i" % self.bar_init
    
    class Baz(object):
        def __init__(self):
            self.baz_init = 2
        def __str__(self):
            return "Baz %i" % self.baz_init
    
    class Foo(Bar):
        def __init__(self, use_baz):
            if use_baz:
                Foo.__bases__ = (Baz,)
            super(Foo, self).__init__()
    
    print(Foo(0))
    print(Foo(1))

    Output:

    Bar 1
    Baz 2
    Henry Schreiner
    @henryiii
    PSA: don’t do this. This will probably break things. Classes are objects; there is just one “instance” of each class
    >>> a = Foo(0)
    >>> b = Foo(1)
    >>> print(a)
    Baz 2
    >>> print(b)
    Baz 2
    Henry Schreiner
    @henryiii
    (There are special cases where this might be useful, btw, like adding a base class only if a package is installed; just use it very carefully)
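
A sketch of why the warning holds, reusing the classes from the example above and recording what happens in variables instead of printing. Nothing new is assumed beyond those class definitions. It shows that assigning to `Foo.__bases__` changes the single shared class object, so instances created before the swap are affected too.

```python
class Bar:
    def __init__(self):
        self.bar_init = 1

    def __str__(self):
        return "Bar %i" % self.bar_init


class Baz:
    def __init__(self):
        self.baz_init = 2

    def __str__(self):
        return "Baz %i" % self.baz_init


class Foo(Bar):
    def __init__(self, use_baz):
        if use_baz:
            Foo.__bases__ = (Baz,)  # mutates the one shared class object
        super(Foo, self).__init__()


a = Foo(0)
before = str(a)                            # formatted by Bar.__str__
b = Foo(1)                                 # rebases Foo for every instance

mro_after = [cls.__name__ for cls in type(a).__mro__]
a_is_baz = isinstance(a, Baz)              # the older instance changed type
a_has_baz_state = hasattr(a, "baz_init")   # but Baz.__init__ never ran on it
```

One detail the sketch makes visible: an instance created before the swap keeps only the attributes its old `__init__` set, so methods inherited from the new base can fail on it.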
    Hans Dembinski
    @HDembinski
    Ok, this doesn't work :(
    Jim Pivarski
    @jpivarski
    You can also assign to self.__class__ to dynamically change an object's type after it has been created. Classes in Python mean much less than they do in a statically compiled language: they're really just namespaces for methods and "static" member data that Python consults if it fails to find a name on the object itself.
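
A minimal sketch of the `__class__` assignment mentioned above. The classes `Draft` and `Published` are hypothetical; the example assumes two ordinary Python classes with compatible layouts (no `__slots__`, no C-level base), which is the case where this assignment is allowed.

```python
class Draft:
    def __init__(self, text):
        self.text = text

    def describe(self):
        return "draft: " + self.text


class Published:
    def describe(self):
        return "published: " + self.text


doc = Draft("hello")
first = doc.describe()

# Only the method namespace is swapped; the instance's own data stays put.
doc.__class__ = Published
second = doc.describe()
instance_data = dict(doc.__dict__)
```

Unlike assigning to `__bases__`, this affects just the one object, which makes it somewhat easier to reason about.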