    revkarol
    @revkarol

    Hi @all We are looking for PhD students in physics, computer science, and data science to attend a three-day OpenHack in September to analyze real physics data from the LHCb experiment at CERN using Microsoft AI technologies.

    An OpenHack is challenge-based rather than instruction-based. Students will work directly with physicists from CERN and Cloud Advocates from Microsoft. They will progress through these challenges to analyze data from LHCb and search for the “unexpected” in particle collisions:

    Data exploration and visualization
    Classification and anomaly detection
    Source control and automation
    AML experimentation
    AML for hyperparameter tuning
    Real-world application of data

    The OpenHack will be held Sept. 11-13 in northern Italy at Fondazione Bruno Kessler, a scientific research institute affiliated with CERN. Students need pay only for their travel and lodging – there is no registration fee for the OpenHack itself. We will help find lodging.

    The registration form is here. Please encourage your students to attend this unique training event and to contact monicar@microsoft.com with any questions.

    Eduardo Rodrigues
    @eduardo-rodrigues
    Hi @revkarol, FYI I've just sent this information to the HSF forum mailing list, and it got through (I was afraid it would bounce, as with my previous attempt).
    Are there any other contacts apart from the one above, from Microsoft?
    Eduardo Rodrigues
    @eduardo-rodrigues

    To @all:
    Registration for the PyHEP 2019 workshop has been extended to September 15th.

    As a reminder, the registration fee for the 2.5 days has been set at £80. It includes the venue, lunches, dinners, and refreshments.
    We still have rooms at Cosener’s House, the venue, available on a first-come-first-served basis.

    The agenda is also shaping up, with talks confirmed on topics ranging from histogramming, statistical methods, and distributed workflows
    to visualisation and even GPU programming. Two speakers from industry are confirmed, including our keynote speaker on the PyViz visualisation project.

    Since the PyHEP series is all about growing a community, this year we’re also including a session of lightning talks
    where 30 people can present any topic of their choosing for 3 minutes, with a single slide, as a way for everyone,
    especially newcomers and early-career researchers, to introduce themselves.

    Community members can also propose presentations on any topic (email: pyhep2019-organisation@cern.ch).
    We are particularly interested in new(-ish) packages of broad relevance.

    Note that partial travel support for some U.S. participants (in particular, students and early-career postdocs)
    may be available from the IRIS-HEP institute. Please contact Peter Elmer (Peter.Elmer@cern.ch) to enquire about details.

    More details can be found on the indico page https://indico.cern.ch/e/PyHEP2019
    or from the PyHEP WG homepage http://hepsoftwarefoundation.org/activities/pyhep.html.
    You can also join the PyHEP WG Gitter channel (https://gitter.im/HSF/PyHEP) and/or
    the HSF forum (https://groups.google.com/forum/#!forum/hsf-forum) to get more information about the workshop and community.

    Hope to see you there!
    Eduardo Rodrigues & Ben Krikler, for the organising committee

    Eduardo Rodrigues
    @eduardo-rodrigues

    HSF PyHEP WG topical meeting on fitting tools, Sep. 11th @ 17h CET

    Dear Python enthusiasts,

    The HSF PyHEP WG is restarting activities post-Summer with topical meetings (not to be confused with the workshop in the UK ;-)).

    The first one will be on the hot and important topic of fitting (tools)! It will take place on Wednesday September 11th at 17h CET.
    The agenda, which you can find at https://indico.cern.ch/event/834210/, contains 2 presentations,
    one from HEP, and one from an astroparticle physics community colleague:

    • The zfit project, Jonas Eschle (Universitaet Zuerich)
    • Numpy-based Python fitting frameworks Astropy & Sherpa, Christoph Deil (MPI for Nuclear Physics, Heidelberg)

    Take this opportunity of cross-exchange to come and discuss needs, technical design, functionality requirements, etc.!

    Hoping to see you there!
    Eduardo, for the PyHEP WG conveners

    P.S.: Note that a second topical meeting on fitting tools will likely happen as a follow-up.

    benkrikler
    @benkrikler
    Has anyone here ever been involved with Hacktoberfest: https://hacktoberfest.digitalocean.com/ ?
    Luke Kreczko
    @kreczko

    Has anyone here ever been involved with Hacktoberfest: https://hacktoberfest.digitalocean.com/ ?

    have two t-shirts that say "yes, I have"

    Pratyush Das
    @reikdas
    @benkrikler Yes :)
    benkrikler
    @benkrikler
    Cool! I'm definitely going to join this year. And promise not to contribute to just my own project :p
    benkrikler
    @benkrikler
    I've just heard through the UK's Software Sustainability Institute of the US' Better Scientific Software community. They have a fellowship scheme for researchers that are affiliated to a US institute which lasts for a year and provides funds for specific activities. The application for 2020 is now open until mid-October: https://bssw.io/. Share around and let's see if we can't get some particle physicists on it :)
    benkrikler
    @benkrikler
    It's also open for all career stages from PhD students to senior professionals
    Jim Pivarski
    @jpivarski
    I reloaded my PyPI and GitHub statistics (notebook here) and there are a few interesting take-aways: this seems to be the year of Python for HEP.
    pip-installations on Scientific Linux distributions (i.e. a subset dominated by physicists):
    Same restriction (Scientific Linux distributions only), but now consider all PyPI packages:
    Something happened in 2017, but it wasn't Numpy-based and wasn't sustained like this year.
    Now instead of identifying physicists by choosing "Scientific Linux" as the distribution for PyPI, choose a subset of physicists by looking at the GitHub users who forked CMSSW. What languages are their non-fork repositories in?
    Jim Pivarski
    @jpivarski
    When CMSSW went on GitHub (presumably May 2013), most of these users were writing C/C++, but now it's an even mix with Python. To clarify, let's normalize this stacked time histogram:
    C/C++ went from 60% in May 2013 to about 20% now. To make this easier to read, let's focus on three cases: "C/C++", "Python", and "Jupyter".
    The Python fraction actually hasn't been increasing; it's primarily Jupyter. Jupyter notebooks can be any language, but in another study I downloaded them all and counted instances of "include" (C/C++) and "import" (Python), and those Jupyter notebooks are overwhelmingly Python.
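A minimal sketch of how an "include" versus "import" count over notebooks could be done. This is not the script used for the study above. It assumes each notebook is available as JSON text in the standard .ipynb layout (a top-level "cells" list whose entries have "cell_type" and "source"), and the function names and regular expressions are hypothetical choices for illustration.

```python
import json
import re

# Assumption: a C/C++ cell is recognised by "#include" at the start of a line,
# a Python cell by "import x" or "from x import y" at the start of a line.
INCLUDE_RE = re.compile(r"^\s*#\s*include\b", re.MULTILINE)
IMPORT_RE = re.compile(r"^\s*(?:import\s+\w|from\s+[\w.]+\s+import\b)", re.MULTILINE)


def count_markers(notebook_text):
    """Return (n_include, n_import) counted over the code cells of one notebook."""
    notebook = json.loads(notebook_text)
    n_include = n_import = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # "source" may be a list of lines
            source = "".join(source)
        n_include += len(INCLUDE_RE.findall(source))
        n_import += len(IMPORT_RE.findall(source))
    return n_include, n_import


def guess_language(notebook_text):
    """Label a notebook by whichever marker occurs more often."""
    n_include, n_import = count_markers(notebook_text)
    if n_include > n_import:
        return "C/C++"
    if n_import > n_include:
        return "Python"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical notebook content, for illustration only.
    example = json.dumps({
        "cells": [
            {"cell_type": "code", "source": ["import numpy as np\n", "x = np.arange(3)\n"]},
        ]
    })
    print(guess_language(example))
```

Counting only code cells avoids misclassifying a notebook whose markdown merely mentions one of the keywords.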
    Luke Kreczko
    @kreczko
    @jpivarski thanks for the overview, this is interesting to see.
    Eduardo Rodrigues
    @eduardo-rodrigues
    This is very interesting. Thanks @jpivarski for sharing.
    Tai Sakuma
    @TaiSakuma
    that is interesting
    Chris Burr
    @chrisburr

    The Python fraction actually hasn't been increasing

    I wonder if this is caused by people making more repositories when using notebooks. For example looking at @jpivarski's GitHub there are 9 classified as Python and 17 as Jupyter

    benkrikler
    @benkrikler
    It could be nice to put something like this on the PyHEP web-page?
    When you look at the language of a repo, do you just consider the dominant language for that repository or do you add all languages used in that repo, weighted by fraction of the repo, or by the number of lines of code, etc?
    Eduardo Rodrigues
    @eduardo-rodrigues
    Why not. At least this deserves a little report at the next HSF coord meeting, as PyHEP WG input.
    benkrikler
    @benkrikler
    It would be really interesting to study language per commit as well. If a repository is 50 / 50 C++ / python, but activity on the python side has picked up in the last few months (without changing the python line count much) that would be nice to see. I realise that's a lot more work to unpick though, since you need to check commit diffs but would give an additional angle on this trend
    Henry Schreiner
    @henryiii
    I would also keep in mind Scientific Linux is disappearing
    Jim Pivarski
    @jpivarski

    I would also keep in mind Scientific Linux is disappearing

    Right, which is too bad, given this one useful feature of being able to identify physicists in pip downloads data!

    @benkrikler The "language" is whatever GitHub decides the dominant language is, according to its algorithms. You can see that a large chunk, maybe 15% (yellow) is "(unknown)". These might be mixed repos. In the JSON response to the curl request for all of a user's repositories, it provides a "languages_url" with a "percent by file" breakdown of a repo's files by language, which could be used to do a more fine-grained study, at the cost of more curl requests. (An authenticated user gets 5000 per hour; I'd have to divide that over a few hours.) In the original study in March, I did that—but the results were not much different from the coarse-grained study, so I didn't go into that detail again.
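A rough sketch of the difference between the coarse-grained count (one dominant language per repository) and a fine-grained count that splits each repository across its languages. It assumes the per-repository breakdowns have already been fetched, one dictionary per repository mapping language name to size (GitHub's REST documentation describes the languages endpoint as returning bytes of code per language). The fetching step is omitted, the function names are hypothetical, and the sample numbers are invented for illustration.

```python
from collections import Counter, defaultdict

UNKNOWN = "(unknown)"


def dominant_language_counts(repos):
    """Coarse-grained: each repository counts once, for its largest language."""
    counts = Counter()
    for languages in repos:
        label = max(languages, key=languages.get) if languages else UNKNOWN
        counts[label] += 1
    return dict(counts)


def weighted_language_fractions(repos):
    """Fine-grained: each repository has weight 1, split by its language shares.

    The result is normalised by the number of repositories, so it sums to 1.
    """
    weights = defaultdict(float)
    for languages in repos:
        total = sum(languages.values())
        if total == 0:
            weights[UNKNOWN] += 1.0  # no detected language at all
            continue
        for language, size in languages.items():
            weights[language] += size / total
    n_repos = len(repos)
    if n_repos == 0:
        return {}
    return {language: w / n_repos for language, w in weights.items()}


if __name__ == "__main__":
    # Invented sample data: three repositories of one hypothetical user.
    sample = [{"Python": 300, "C++": 100}, {"C++": 50}, {}]
    print(dominant_language_counts(sample))
    print(weighted_language_fractions(sample))
```

Giving every repository the same total weight keeps one very large repository from dominating the result; weighting by raw size instead would be an equally defensible choice.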
    Jim Pivarski
    @jpivarski
    @chrisburr By language, my repos are or will be predominantly "TeX/LaTeX", since every talk is a separate repo. This is a pattern I got into with Overleaf, and since Overleaf stopped hosting its own git repository, deferring us to GitHub instead, I'm now filling up my GitHub account with lots of tiny TeX repos.
    @eduardo-rodrigues I had been preparing this for PyHEP at Oxford (I wanted one plot for uproot/awkward usage), but this digression into Python usage overall would get off-point for that talk. I could present this at a coordination meeting, allowing both talks to be more on point. I missed the first coordination meeting (the one with zfit), so I'd like to get these on my calendar. When is the next one?
    Eduardo Rodrigues
    @eduardo-rodrigues
    Hi @jpivarski, the date of the next WG meeting is not yet decided. To be discussed …
    benkrikler
    @benkrikler
    Does anyone have or know of a good entry-level numpy and / or uproot tutorial for a student who is transitioning from C++ to python?
    Hans Dembinski
    @HDembinski
    Hey all, I did some experiments with allocation in pybind11, that I want to share:
    https://github.com/HDembinski/pybind11_allocation_cost_demo
    It is all in the readme; feel free to check out the code and try the benchmarks for yourself.
    tl;dr: avoid allocating temporary objects from the Heap if you can, but don't worry about it too much. In most cases, it won't make a big difference.
    Jim Pivarski
    @jpivarski
    benkrikler
    @benkrikler
    Thanks Jim, that's great! Love the "why python" bit of the first notebook too. Can I re-use some of that material (with appropriate citations) in some up-coming talks of mine?
    Jim Pivarski
    @jpivarski
    @benkrikler You can reuse the material without citations. Except for the parts from Jake Vanderpl
    Jake Vanderplas—which I cited as "stolen from" because I didn't explicitly get permission from him. (I think you can do the same, with citation.) They are good graphics, though.
    (Gitter submits a message when you go into another app to check on the spelling of something...)
    Hans Dembinski
    @HDembinski
    I have a contributor for boost::histogram who wants to remain anonymous. It turns out that this is surprisingly difficult.
    https://opensource.stackexchange.com/questions/7147/declaring-copyright-anonymously?newreg=5ec7ccfb79d947bd8cdbd3208f2b880c
    Eduardo Rodrigues
    @eduardo-rodrigues
    To avoid the pain, can you now ask him to give you full ownership? Unless he is making a major contribution, I would not bother to handle this special case, being pragmatic.
    Hans Dembinski
    @HDembinski
    To my understanding, this doesn't solve the problem, because there has to be a legally valid documented way of transferring the copyright. Any legally valid transfer of ownership also requires the other party to be identified - to my understanding.
    Hans Dembinski
    @HDembinski
    Putting things in the public domain is not easy, because many jurisdictions by default give full copyright to the creator if no statement is made. One has to actively declare to waive the rights, but such a statement remains ambiguous if the person who waives their rights remains anonymous.
    If this were valid, one could copy some copyrighted code and then publish it anonymously in the public domain. It would still be illegal to do this, but there would be no one to prosecute and hold accountable.
    Any party who uses such code would still be at risk of lawsuits, without being able to shift the blame elsewhere. So companies would not risk using such code.
    Eduardo Rodrigues
    @eduardo-rodrigues
    I understand that route is tricky. Brute force: if I email and give you a piece of code for you to commit as yourself, basically, then that's the end of it, no, since all seems effectively to be yours, and I'm agreeing? The fact that it came from me is irrelevant by construction. Otherwise it gets vicious, I take your comments ...
    Hans Dembinski
    @HDembinski
    If you send me some code that I publish for you under my name, I could be liable for copyright infringement. You could decide to later sue me or worse, your code could be intellectual property of a third party, which could then sue me.
    I can prevent the first case by maintaining a legal record of that transfer of the copyright, but this is additional hassle for me. I would still get in trouble in the second case, even though I could defend myself with that legal record of the copyright transfer.