    Angus Hollands
    @agoose77:matrix.org
    [m]
    It feels like this is a wider conversation that we need to be having as a community. I know that Sébastien is quite opinionated (and in general that's probably a good thing in the packaging space), but as more people use poetry, things are going to get gnarlier. (makes mental note to start unpinning my libraries)
    Henry Schreiner
    @henryiii
    It also forces the library to update rapidly, as they have to do a new release as soon as a new major version of any dependency comes out. Often, it’s beyond the resources of most open-source projects. It’s really a disaster.
    Angus Hollands
    @agoose77:matrix.org
    [m]
    @henryiii: yeah, that was my planned approach before we had this conversation
    Henry Schreiner
    @henryiii
    This is an old discussion that has come up on Poetry before - I think they are not interested in changing the recommendations or the defaults. https://github.com/python-poetry/poetry/discussions/3757
    Poetry even pins pytest by default! If you want to support Python 3.10, you have to update pytest; there was a Python change that broke pytest (I think there’s some discussion about possibly reverting it, but the idea is still true)

    Nice, hadn’t read this from the discussion:

    I agree with Henry's point of view, default capping is a bad idea. To be specific, the worst thing about defaulting to a cap is that there is no way to distinguish between affirmative knowledge that a package fails above a certain version (I'm thinking pydantic and python 3.10) and a cap based on untested presumptions.
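    For reference, a minimal Python sketch of what the default caret cap discussed above means in specifier terms (the pytest version numbers are illustrative, not taken from the chat): the cap silently excludes the next major release, whether or not it would actually have worked.

    ```python
    # Hedged illustration: Poetry's default "pytest = '^6.2'" expands to the
    # capped specifier below, while an uncapped lower bound does not exclude
    # the next major version.
    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    caret_cap = SpecifierSet(">=6.2,<7.0")   # what the caret default means
    open_bound = SpecifierSet(">=6.2")       # an uncapped lower bound

    print(Version("7.0.0") in caret_cap)     # False: next major excluded by default
    print(Version("7.0.0") in open_bound)    # True
    ```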

    Jonas Eschle
    @mayou36

    It also forces the library to update rapidly, as they have to do a new release as soon as a new major version of any dependency comes out. Often, it’s beyond the resources of most open-source projects. It’s really a disaster.

    This is a double-edged sword, however: in general I agree with leaving versions free if that works (and definitely restricting when needed; I've had multiple releases of old versions that just broke, I very much agree with @HDembinski's point here and always restrict a couple of dependencies; recently even Jinja broke the doc builds...). But if letting versions float freely is the norm, libraries may also be less inclined to make breaking changes, which would sometimes be a good thing and sometimes problematic (Python 2 vs 3 is definitely a good thing but yeah, a problematic transition...). Expecting version compatibility across all versions in general inherently limits the ability to reasonably get rid of legacy code or behavior.

    Just to say: I really like the uproot approach here! That is a pretty explicit pinning of versions while allowing the co-existence of multiple versions to avoid any dependency conflict, like having the best of both worlds. If this can be formalised better and we see an import iminuit1, iminuit2, that seems quite favorable to me.

    Angus Hollands
    @agoose77:matrix.org
    [m]
    Hmm, actually I don't really know the reason behind separating the uproot packages at the package level. I assume it was a decision on the repo level to avoid merging issues etc. from a different code-base (and then using a new PyPI package to make the 1-1 mapping between PyPI and GitHub clearer). Does anyone know why this was done (including Jim)?
    Henry Schreiner
    @henryiii
    To be clear, if you know there is or very likely will be a problem with a major update, yes, limit, that’s fine. I’m just saying you should never “pin by default”; if you don’t use a library’s internals heavily, etc., you definitely should not add an upper limit. (For a library - not such a big problem with an application.)
    Henry Schreiner
    @henryiii
    “ideal” SemVer provides deprecation warnings in patch versions that then become removed items in later versions (Python mostly does this, “minor” versions are really major versions in SemVer terms), so as long as you keep up to date in CI, you can correct problems before a major version is released, and smoothly transition. “I’m going to break everyone today” upgrades are painful no matter what.
    Another problem - if you limit your major versions, you’ll be delayed in fixing and supporting the new major version when it comes out. If you do break on click 8, but you’ve pinned click 7, you might a) not know it breaks, and b) forget to update the pin, until users start opening issues asking why their solve returned “no solve possible” or something worse with old versions and missing wheels, etc. CI will still be grabbing the old version since you’ve pinned it.
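    A minimal sketch of the deprecate-then-remove pattern described above (the function names are hypothetical): the old entry point keeps working through a deprecation period and emits a warning, so downstream projects that keep CI up to date can adapt before the removal release.

    ```python
    import warnings

    def new_api(x):
        """The replacement entry point."""
        return 2 * x

    def old_api(x):
        """Deprecated alias, kept around for a release cycle before removal."""
        warnings.warn(
            "old_api() is deprecated; use new_api() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_api(x)
    ```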
    Miguel Crispim Romao
    @romanovzky
    Hi all. I have a project where I need to use uproot to read tabular TTrees, but I also wanted to use uproot-methods to do 4-momentum manipulation. However, uproot-methods is only dependent on the old awkward. Should I then use the deprecated versions, or is there an uproot-methods compatible with the newer awkward?
    Chris Burr
    @chrisburr
    Miguel Crispim Romao
    @romanovzky
    Thanks, I will look into it. Does it have helper functions like uproot_methods.TLorentzVectorArray.from_ptetaphim? This type of functionality is what I really loved about uproot-methods
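    For context, a hedged sketch of the uproot-methods call being asked about, building a Lorentz-vector array from pt/eta/phi/mass columns (the values are made up for illustration):

    ```python
    import numpy as np
    import uproot_methods  # the legacy package, tied to the old awkward

    pt   = np.array([10.0, 20.0, 30.0])
    eta  = np.array([0.5, -1.2, 2.0])
    phi  = np.array([0.1, 1.5, -2.4])
    mass = np.array([0.105, 0.105, 0.105])

    vecs = uproot_methods.TLorentzVectorArray.from_ptetaphim(pt, eta, phi, mass)
    print(vecs.pt)    # kinematic components
    print(vecs.mass)  # round-trips the masses
    ```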
    Eduardo Rodrigues
    @eduardo-rodrigues
    Yep.
    Miguel Crispim Romao
    @romanovzky
    Gorgeous stuff, cheers
    Eduardo Rodrigues
    @eduardo-rodrigues
    (The ecosystem is evolving and the blanks are being filled ...)
    I mean in terms of required functionality.
    Jim Pivarski
    @jpivarski
    This is not an argument against 100% test coverage (100 is better than any other number), but it's still possible for API-breaking changes to occur with 100% test coverage. For one thing, there's extreme value handling (NaN and Inf), though the hypothesis library looks like it does a good job of generating tests for those cases, above and beyond simple line-coverage. But you ca
    (Sorry—should be in the thread—moving there.)
    (I don't know how to use Gitter on a phone.)
    Jim Pivarski
    @jpivarski
    There are API "choices" that are arbitrarily subtle to define, and you can have changes in things you didn't even know were rules. Here's one that I am aware of: in an Awkward ListArray, having starts[i] == stops[i] is defined to mean that list i has zero length. Okay, but what if either starts[i] or stops[i] is negative or greater than or equal to the length of the content? In any other case, that would be an error. Should it be an error when starts and stops are equal? I decided that it should not be, and therefore I have to include tests where they're equal but greater than any value in the content. If I hadn't thought of it, no line coverage or even extreme value tester like hypothesis would have caught it, because it involves a dependency between two array elements that the automated tools don't know might be related. It requires human attention and can't be fully guaranteed by any automated system. THAT'S the sort of thing that can slip through version changes despite the vigilance of the testers.
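    A minimal sketch of that edge case using the Awkward Array 1.x layout API (the values are made up): when starts[i] == stops[i], list i is empty, even though both indices point past the end of the content.

    ```python
    import numpy as np
    import awkward as ak  # Awkward Array 1.x

    content = ak.layout.NumpyArray(np.array([1.1, 2.2, 3.3]))
    starts = ak.layout.Index64(np.array([0, 3, 99], dtype=np.int64))  # 99 is out of range,
    stops  = ak.layout.Index64(np.array([3, 3, 99], dtype=np.int64))  # but starts[i] == stops[i]

    layout = ak.layout.ListArray64(starts, stops, content)
    print(ak.to_list(ak.Array(layout)))  # [[1.1, 2.2, 3.3], [], []] -- no error raised
    ```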
    Matthew Feickert
    @matthewfeickert
    @HDembinski you might see some more monolens :star: s or traffic in the coming month as monolens was mentioned on the PythonBytes podcast. :)
    Matthew Feickert
    @matthewfeickert
    Out of curiosity, who here is using relative imports vs. explicit imports in their libraries (I guess I could just look, but my guess is there might be discussion)? This was motivated in my brain today by https://youtu.be/uwtupH7LJco
    Angus Hollands
    @agoose77:matrix.org
    [m]

    @matthewfeickert in what has become a bit of a trend for me at the moment, I'm re-evaluating all of this again myself.

    I prefer writing relative imports but I enjoy reading absolute imports (where only the modules are imported rather than their contents)

    I'm sticking with 'bare' (import functions rather than modules) relative imports myself, though. Mainly because I'm writing numerical code, and the qualified name hurts readability.
    I'm still not sure exactly whether I'll end up changing my mind, but my current reasoning is that any modern IDE or web viewer for GitHub has a code-symbols feature, so it's not hard to find out where some symbol comes from. This motivates me to prioritise readability with unqualified names.
    Henry Schreiner
    @henryiii
    Depends on what package I’m in. Some packages require all fully qualified absolute imports in their style, like build. Some require relative imports (often due to vendorability, I think, though also readability). Importing objects from a package instead of the package itself can cause circular dependency issues, so I’m leaning more toward importing modules if I write something new. Either as relative imports or absolute.
    So from . import module or import package.module. For things in the stdlib, I’m fine to pull objects in, as you can’t get into a circular situation there, and the names are commonly known. It makes typing so much more readable, so I’m a bit torn on importing local classes.
    Searching and cross-linking and such is handled by IDEs and even GitHub to a degree, so not too worried there. You can do a smart search-and-replace in PyCharm, for example; it doesn’t matter how you’ve imported things for that.
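    To make the styles under discussion concrete, here is a hedged sketch of what each looks like inside a hypothetical mypkg/analysis.py (the package, module, and function names are made up, and the snippet only runs inside such a package):

    ```python
    # Contents of a hypothetical mypkg/analysis.py

    # 1) Relative module import ("from . import module"):
    from . import histograms
    h1 = histograms.make_hist()

    # 2) Absolute module import ("import package.module"):
    import mypkg.histograms
    h2 = mypkg.histograms.make_hist()

    # 3) "Bare" object import: shortest at the call site, but the style most
    #    likely to hit circular-import problems if mypkg.histograms also
    #    imports names from this module at import time.
    from .histograms import make_hist
    h3 = make_hist()
    ```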
    Angus Hollands
    @agoose77:matrix.org
    [m]
    Something I've come to realise is that writing good numeric/scikit libraries is quite different from writing something a little more 'traditional'. I learned Python outside of HEP, and I've had to re-evaluate my idea of best practices a fair amount.
    Matthew Feickert
    @matthewfeickert
    Yeah, I know that pyhf has some circular import warnings that haven't seemed to bite us yet, but similar to @agoose77:matrix.org I'm thinking of just switching everything over to absolute imports now that the repository structure isn't undergoing much change anymore
    Cédric Hernalsteens
    @chernals
    Getting started now with aghast, boost::histograms and hist ... high hopes and expectations :D
    Cédric Hernalsteens
    @chernals
    Why do we need to convert boost histograms to numpy first within aghast? Like boost histogram ==> ROOT or vice versa requires the intermediate numpy step, right?
    Jim Pivarski
    @jpivarski

    @chernals Before you get too excited about Aghast, it's a nearly deprecated project (just haven't mustered the courage to pull the trigger yet). It hasn't been touched in a long time. The intention was to unify histograms into a common serialization, and histograms still do need better interoperability on a serialized level (we want to be able to save Boost histograms without losing their special features by converting them into ROOT v6 histograms, etc.), but the Aghast model won't allow for irregular data, such as you might get in sparse or categorical histograms. Nick made a good point here: scikit-hep/aghast#10 which could be solved by replacing FlatBuffers as the serialization with Awkward Array or Arrow. Since that issue was in 2019, Awkward Array needed more development to make that happen (Arrow, too, that long ago). Now it could be done.

    I'm only warning you against getting too attached to Aghast—Boost histogram and hist are very active, mature, up-to-date projects.

    Cédric Hernalsteens
    @chernals
    Thanks @jpivarski for all the info.

    That's a bit sad indeed but now I'm prepared.

    The features I'm mostly interested in are :

    • adding histograms like aghast was providing
    • reading root histograms from a root file and then manipulating them as pyboost histograms or "hist" histograms in Python
    I imagine this is supported by boost-histogram and by hist?
    Cédric Hernalsteens
    @chernals
    For my second point this is trivial with uproot4 and the to_hist method :)
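    A hedged sketch of that conversion, assuming a file data.root containing a 1-D histogram under the key h_pt (both names are made up):

    ```python
    import uproot  # uproot 4

    with uproot.open("data.root") as f:
        h = f["h_pt"].to_hist()  # returns a hist.Hist backed by boost-histogram

    print(h.axes[0].edges)  # bin edges carried over from the ROOT histogram
    print(h.values())       # bin contents (without flow bins)
    ```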
    Cédric Hernalsteens
    @chernals
    And this is a zero-copy operation, correct ?
    Jim Pivarski
    @jpivarski

    @chernals Aghast was intended as a superset of all histogram features, but that's nearly equal to boost-histogram's features, so your first bullet point should be satisfied by that.

    As for zero-copy, the bin contents are either a zero-copy view or a direct memcpy-style copy, but this doesn't matter much because histograms are small. You have to get into multi-MB scales before you start to notice the throughput of the copy, compared to the cost of translating the metadata.
    Henry Schreiner
    @henryiii
    Making a hist/boost-histogram from something else does always make a copy (usually small, so usually not a problem), but boost-histogram has to allocate its own memory (currently). I haven’t put the effort into making it work using an existing piece of memory, since you’d still have to have correctly structured memory, which is very unlikely due to the flow bins. And a slice will need to be a new piece of memory anyway, also due to the flow bins.
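    A minimal sketch of the flow-bin point: boost-histogram's storage carries an underflow and an overflow bin per axis, so an arbitrary pre-existing NumPy buffer is very unlikely to already have the right layout.

    ```python
    import numpy as np
    import boost_histogram as bh

    h = bh.Histogram(bh.axis.Regular(10, 0.0, 1.0))
    h.fill(np.random.uniform(size=1000))

    print(h.view(flow=False).shape)  # (10,)  just the visible bins
    print(h.view(flow=True).shape)   # (12,)  underflow + 10 bins + overflow
    ```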
    Andrzej Novak
    @andrzejnovak
    To be fair having a from_root/from_numpy would be really convenient in hist
    Cédric Hernalsteens
    @chernals
    Thanks @jpivarski and @henryiii! I'm a modest user, so in any case it will be all good for my use cases.
    And certainly a major improvement compared to the quadruple for-loop one of our students used to copy from ROOT to boost-histogram ;)
    Henry Schreiner
    @henryiii
    A from_root would be useful, and maybe from_numpy (though that could possibly be solved just as well with good examples?) - maybe @amangoel185 can be talked into it. :) Also, the new feature in a PR to UHI will make this nicer, too.
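    For a sense of what a from_numpy helper would replace, here is a hedged sketch of the manual round-trip with today's API (the data are made up): take np.histogram output and copy it into a hist.Hist with matching edges.

    ```python
    import numpy as np
    import hist

    counts, edges = np.histogram(np.random.normal(size=1000), bins=20)

    h = hist.Hist(hist.axis.Variable(edges))
    h[...] = counts  # copy the visible bin contents (flow bins stay at zero)

    print(int(h.sum()), counts.sum())  # same total
    ```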
    Henry Schreiner
    @henryiii
    Adding flit and trampolim PEP 621 support to scikit-hep/cookie :slight_smile: scikit-hep/cookie#21
    Henry Schreiner
    @henryiii
    UHI 0.3.0 is out; if you use the converter utility it provides, it now supports PyROOT. :)
    Andrzej Novak
    @andrzejnovak
    :confetti_ball: