Luke Kreczko
@kreczko
Comparing the two solutions:
yours: 28366.0 microseconds
stackoverflow: 10999.0 microseconds
@JelleAalbers Where does the 10000 come from?
it's interesting how many solutions exist for the same operation
Jelle Aalbers
@JelleAalbers
It's just a placeholder, you can put in whatever maximum value you expect. If your numbers are e.g. arbitrary-sized floats I guess this solution doesn't work. Though you can probably replace 'squash' with some other function (if you want to go really overboard you can use some cryptographic hash function, though then forget about speed) :-)
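For readers who missed the earlier messages, here is a minimal sketch of the idea. The original problem and the original `squash` are not shown in this part of the log, so this assumes `squash` packs each pair of non-negative integers into a single integer key, and that the goal is to find which rows of one array also occur in another. `MAX_VALUE`, `rows_in` and the example arrays are made up for illustration.

```python
import numpy as np

MAX_VALUE = 10_000  # placeholder: must be larger than every value in the arrays


def squash(pairs, max_value=MAX_VALUE):
    """Pack each (a, b) row into one integer key: a * max_value + b.

    The key is only unique if 0 <= a, b < max_value, hence the check.
    """
    pairs = np.asarray(pairs, dtype=np.int64)
    if pairs.min() < 0 or pairs.max() >= max_value:
        raise ValueError("all values must lie in [0, max_value)")
    return pairs[:, 0] * max_value + pairs[:, 1]


def rows_in(a, b, max_value=MAX_VALUE):
    """Boolean mask telling which rows of `a` also occur as rows of `b`."""
    return np.isin(squash(a, max_value), squash(b, max_value))


a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[3, 4], [1, 2], [7, 8]])
mask = rows_in(a, b)
print(mask)
```

The range check is the reason this breaks down for arbitrary floats: without a known bound, two different pairs can end up with the same key.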
Jonas Eschle
@mayou36

Comparing the two solutions:
yours: 28366.0 microseconds
stackoverflow: 10999.0 microseconds

So essentially the same speed for this exact problem. At this point, what matters is: do you call it once or a million times? How big is your array really? That's when things like presorting can make a difference. My advice: use whichever method you understand/like better (conceptually, not for its speed) and only try to improve on it if it proves to be a bottleneck.
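A rough way to check whether a function is a bottleneck is to time it on realistic array sizes with the standard-library `timeit` module. The sketch below is only an illustration: the two solutions being compared are not shown here, so `np.unique` and `np.sort` are stand-ins, and `time_call` is a hypothetical helper.

```python
import timeit

import numpy as np


def time_call(func, *args, repeat=3, number=5):
    """Return the best per-call time in microseconds."""
    timer = timeit.Timer(lambda: func(*args))
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


rng = np.random.default_rng(0)
for size in (1_000, 100_000):
    data = rng.integers(0, 10_000, size=size)
    # np.unique and np.sort are stand-ins for the functions being compared
    print(
        f"n={size}: np.unique {time_call(np.unique, data):.1f} us, "
        f"np.sort {time_call(np.sort, data):.1f} us"
    )
```

Taking the minimum over several repeats reduces noise from other processes; the ranking of two methods can still change with array size, so it is worth timing the sizes you really use.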

Luke Kreczko
@kreczko
speed is important :). Your solution sped up the function by almost a factor 300 :)
benkrikler
@benkrikler
Exciting news everyone:

Registration is now open for PyHEP 2019, in Abingdon, UK, from the 16th to 18th of October! The registration fee for the 2.5 days has been set at £80; it includes the venue, lunches, dinners, and refreshments. We also have about 46 rooms at Cosener’s House, available on a first-come-first-served basis. The actual payment system will not be online for a few more days, however, so you’ll only be able to complete registration then including the room booking.

The agenda is also shaping up with talks confirmed on topics ranging from histogramming, statistical methods, distributed workflows, visualisation, and even GPU-programming. Several speakers from industry are confirmed, including our keynote speaker on the PyViz library.

Since the PyHEP series is all about growing a “Python in High Energy Physics” community, this year we’re also including a session of lightning talks where 30 people can present any topic of their choosing for 3 minutes with a single slide as a way for everyone, especially newcomers and early-career researchers, to introduce themselves.

Community members can also propose presentations on any topic (email: pyhep2019-organisation@cern.ch). We are particularly interested in new(-ish) packages of broad relevance.

More details can be found on the indico page (https://indico.cern.ch/e/PyHEP2019) or from the PyHEP WG homepage http://hepsoftwarefoundation.org/activities/pyhep.html. You can also join the HSF forum (https://groups.google.com/forum/#!forum/hsf-forum) to get more information about the workshop and community.

Help us spread the word! :slight_smile:
Hans Dembinski
@HDembinski
pyhepmc-ng 0.4.2 was released today
Hans Dembinski
@HDembinski
@benkrikler I tried to complete my registration today, but during checkout I was not offered any payment options. There is a combobox which is supposed to show the options, but it is empty in my case. Is this a problem on my end or ...?
Chris Burr
@chrisburr
we are still finalising the payment system and will let you know when this is available at the email address you used to register
Hans Dembinski
@HDembinski
Yes, but that was 12 days ago :) And Ben said: "however, so you’ll only be able to complete registration then including the room booking."
Not true in my case...
I can't book any rooms.
benkrikler
@benkrikler
Thanks Hans and Chris. Chris is correct. The payment system is still not set up (the company we've had to use have been extremely difficult to work with, I've been trying to reach them by phone every working day for the last week). We'll let you know straight away, I expect to have it in the next day or two.
Hans Dembinski
@HDembinski
I am sorry to hear that Ben :(
It is no problem for me, I was just surprised, thanks for clearing this up!
Eduardo Rodrigues
@eduardo-rodrigues
Kind reminder on the PyHEP workshop: the available slots are being filled up at a nice pace, so don't delay registration too much, if you intend to come and participate - we hope you do! See https://indico.cern.ch/e/PyHEP2019
Chris Tunnell
@tunnell
@/all Interested in a Pythonic Postdoc? The XENON Dark Matter experiment software stack is in Python and there's a job to work in that direction (with ML component): https://jobs.rice.edu/postings/20856
If you know somebody who might be interested, feel free to share.
revkarol
@revkarol

Hi @all We are looking for PhD students in physics, computer science, and data science to attend a three-day OpenHack in September to analyze real physics data from the LHCb experiment at CERN using Microsoft AI technologies.

An OpenHack is challenge-based rather than instruction-based. Students will work directly with physicists from CERN and Cloud Advocates from Microsoft. They will progress through these challenges to analyze data from LHCb and search for the “unexpected” in particle collisions:

Data exploration and visualization
Classification and anomaly detection
Source control and automation
AML experimentation
AML for hyperparameter tuning
Real-world application of data

The OpenHack will be held Sept. 11-13 in northern Italy at Fondazione Bruno Kessler, a scientific research institute affiliated with CERN. Students need pay only for their travel and lodging – there is no registration fee for the OpenHack itself. We will help find lodging.

The registration form is here. Please encourage your students to attend this unique training event and to contact monicar@microsoft.com with any questions.

Eduardo Rodrigues
@eduardo-rodrigues
Hi @revkarol, FYI I've just sent this information to the HSF forum mailing list, and it got through (I was afraid that it would bounce as with my previous attempt).
Are there any other contacts apart from the one above, from Microsoft?
Eduardo Rodrigues
@eduardo-rodrigues

To @all:
Registration for the PyHEP 2019 workshop has been extended to September 15th.

As a reminder, the registration fee for the 2.5 days has been set at £80. It includes the venue, lunches, dinners, and refreshments.
We still have rooms available at Cosener’s House, the venue, available on a first-come-first-served basis.

The agenda is also shaping up with talks confirmed on topics ranging from histogramming, statistical methods, distributed workflows,
visualisation, and even GPU-programming. Two speakers from industry are confirmed, including our keynote speaker on the PyViz visualisation project.

Since the PyHEP series is all about growing a community, this year we’re also including a session of lightning talks
where 30 people can present any topic of their choosing for 3 minutes, with a single slide, as a way for everyone,
especially newcomers and early-career researchers, to introduce themselves.

Community members can also propose presentations on any topic (email: pyhep2019-organisation@cern.ch).
We are particularly interested in new(-ish) packages of broad relevance.

Note that partial travel support for some U.S. participants (in particular, students and early-career postdocs)

More details can be found on the indico page https://indico.cern.ch/e/PyHEP2019
or from the PyHEP WG homepage http://hepsoftwarefoundation.org/activities/pyhep.html.
You can also join the PyHEP WG Gitter channel (https://gitter.im/HSF/PyHEP) and/or

Hope to see you there!
Eduardo Rodrigues & Ben Krikler, for the organising committee

Eduardo Rodrigues
@eduardo-rodrigues

HSF PyHEP WG topical meeting on fitting tools, Sep. 11th @ 17h CET

Dear Python enthusiasts,

The HSF PyHEP WG is restarting activities post-Summer with topical meetings (not to be confused with the workshop in the UK ;-)).

The first one will be on the hot and important topic of fitting (tools)! It will take place on Wednesday September 11th at 17h CET.
The agenda, which you can find at https://indico.cern.ch/event/834210/, contains 2 presentations,
one from HEP, and one from an astroparticle physics community colleague:

• The zfit project, Jonas Eschle (Universitaet Zuerich)
• Numpy-based Python fitting frameworks Astropy & Sherpa, Christoph Deil (MPI for Nuclear Physics, Heidelberg)

Take this opportunity of cross-exchange to come and discuss needs, technical design, functionality requirements, etc.!

Hoping to see you there!
Eduardo, for the PyHEP WG conveners

P.S.: Note that a second topical meeting on fitting tools will likely happen as a follow-up.

benkrikler
@benkrikler
Has anyone here ever been involved with Hacktoberfest: https://hacktoberfest.digitalocean.com/ ?
Luke Kreczko
@kreczko

Has anyone here ever been involved with Hacktoberfest: https://hacktoberfest.digitalocean.com/ ?

have two t-shirts that say "yes, I have"

Pratyush Das
@reikdas
@benkrikler Yes :)
benkrikler
@benkrikler
Cool! I'm definitely going to join this year. And promise not to contribute to just my own project :p
benkrikler
@benkrikler
I've just heard through the UK's Software Sustainability Institute of the US' Better Scientific Software community. They have a fellowship scheme for researchers that are affiliated to a US institute which lasts for a year and provides funds for specific activities. The application for 2020 is now open until mid-October: https://bssw.io/. Share around and let's see if we can't get some particle physicists on it :)
benkrikler
@benkrikler
It's also open for all career stages from PhD students to senior professionals
Jim Pivarski
@jpivarski
I reloaded my PyPI and GitHub statistics (notebook here) and there are a few interesting take-aways: this seems to be the year of Python for HEP.
pip-installations on Scientific Linux distributions (i.e. a subset dominated by physicists):
Same restriction (Scientific Linux distributions only), but now consider all PyPI packages:
Something happened in 2017, but it wasn't Numpy-based and wasn't sustained like this year.
Now instead of identifying physicists by choosing "Scientific Linux" as the distribution for PyPI, choose a subset of physicists by looking at the GitHub users who forked CMSSW. What languages are their non-fork repositories in?
Jim Pivarski
@jpivarski
When CMSSW went on GitHub (presumably May 2013), most of these users were writing C/C++, but now it's an even mix with Python. To clarify, let's normalize this stacked time histogram:
C/C++ went from 60% in May 2013 to about 20% now. To make this easier to read, let's focus on three cases: "C/C++", "Python", and "Jupyter".
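Normalizing a stacked time histogram means dividing each time bin by its total, so that every bin shows language fractions instead of raw counts. A minimal sketch with NumPy follows; the counts are made-up numbers for illustration, not the data behind the plots, and `normalize_rows` is a hypothetical helper.

```python
import numpy as np

# Hypothetical repository counts: one row per time bin, one column per language
languages = ["C/C++", "Python", "Jupyter"]
counts = np.array(
    [
        [60, 30, 10],
        [50, 50, 25],
        [20, 40, 40],
    ]
)


def normalize_rows(counts):
    """Convert per-bin counts into per-bin fractions (each row sums to 1)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Empty bins stay at zero instead of producing a division by zero
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


fractions = normalize_rows(counts)
for row in fractions:
    print(dict(zip(languages, row.round(2))))
```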
The Python fraction actually hasn't been increasing; it's primarily Jupyter. Jupyter notebooks can be any language, but in another study I downloaded them all and counted instances of "include" (C/C++) and "import" (Python), and those Jupyter notebooks are overwhelmingly Python.
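A count like this could be sketched as follows. This is not the code from the study: it assumes notebooks in the standard `.ipynb` JSON layout (a top-level `cells` list whose entries have `cell_type` and `source`), only looks at code cells, and only counts lines that start with `#include`, `import` or `from`. The name `count_markers` and the example notebook are hypothetical.

```python
import json
import re

INCLUDE = re.compile(r"^\s*#\s*include\b")       # C/C++ marker
IMPORT = re.compile(r"^\s*(?:import|from)\s+\w")  # Python marker


def count_markers(notebook_json):
    """Count C/C++ '#include' lines and Python import lines in code cells."""
    notebook = json.loads(notebook_json)
    counts = {"include": 0, "import": 0}
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a string or a list of lines
            source = "".join(source)
        for line in source.splitlines():
            if INCLUDE.match(line):
                counts["include"] += 1
            elif IMPORT.match(line):
                counts["import"] += 1
    return counts


example = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["import is only mentioned here\n"]},
            {
                "cell_type": "code",
                "source": ["import numpy as np\n", "from math import pi\n", "x = 1\n"],
            },
            {"cell_type": "code", "source": "#include <vector>\n"},
        ]
    }
)
print(count_markers(example))
```

Restricting the match to the start of a line in code cells is a design choice of this sketch, so that prose mentioning "import" in markdown cells is not counted.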
Luke Kreczko
@kreczko
@jpivarski thanks for the overview, this is interesting to see.
Eduardo Rodrigues
@eduardo-rodrigues
This is very interesting. Thanks @jpivarski for sharing.
Tai Sakuma
@TaiSakuma
that is interesting
Chris Burr
@chrisburr

The Python fraction actually hasn't been increasing

I wonder if this is caused by people making more repositories when using notebooks. For example looking at @jpivarski's GitHub there are 9 classified as Python and 17 as Jupyter

benkrikler
@benkrikler
It could be nice to put something like this on the PyHEP web-page?
When you look at the language of a repo, do you just consider the dominant language for that repository or do you add all languages used in that repo, weighted by fraction of the repo, or by the number of lines of code, etc?
Eduardo Rodrigues
@eduardo-rodrigues
Why not. At least this deserves a little report at the next HSF coord meeting, as PyHEP WG input.
benkrikler
@benkrikler
It would be really interesting to study language per commit as well. If a repository is 50 / 50 C++ / Python, but activity on the Python side has picked up in the last few months (without changing the Python line count much) that would be nice to see. I realise that's a lot more work to unpick though, since you need to check commit diffs, but it would give an additional angle on this trend.
Henry Schreiner
@henryiii
I would also keep in mind Scientific Linux is disappearing
Jim Pivarski
@jpivarski

I would also keep in mind Scientific Linux is disappearing

Right, which is too bad, given this one useful feature of being able to identify physicists in pip downloads data!

@benkrikler The "language" is whatever GitHub decides the dominant language is, according to its algorithms. You can see that a large chunk, maybe 15% (yellow) is "(unknown)". These might be mixed repos. In the JSON response to the curl request for all of a user's repositories, it provides a "languages_url" with a "percent by file" breakdown of a repo's files by language, which could be used to do a more fine-grained study, at the cost of more curl requests. (An authenticated user gets 5000 per hour; I'd have to divide that over a few hours.) In the original study in March, I did that—but the results were not much different from the coarse-grained study, so I didn't go into that detail again.
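The fine-grained version could look roughly like the sketch below. It is not the code from the study and makes no network requests: it assumes each response from a repo's "languages_url" has already been fetched and is a mapping from language name to a size (the GitHub REST API documents these values as bytes of code). Each non-fork repository gets a total weight of 1, split across its languages. The names `language_fractions`, `aggregate` and `hours_needed`, and the example data, are hypothetical.

```python
import math
from collections import defaultdict


def language_fractions(sizes):
    """Turn {language: size} into {language: fraction of the repo}."""
    total = sum(sizes.values())
    if total == 0:
        return {}
    return {lang: size / total for lang, size in sizes.items()}


def aggregate(repos):
    """Sum per-repo language fractions over non-fork repositories.

    `repos` is an iterable of (is_fork, sizes) pairs, where `sizes` is the
    already-fetched language breakdown of one repository.
    """
    totals = defaultdict(float)
    for is_fork, sizes in repos:
        if is_fork:
            continue
        for lang, fraction in language_fractions(sizes).items():
            totals[lang] += fraction
    return dict(totals)


def hours_needed(n_requests, per_hour=5000):
    """Hours required to stay within an hourly request limit."""
    return math.ceil(n_requests / per_hour)


repos = [
    (False, {"Python": 750, "C++": 250}),
    (False, {"Jupyter Notebook": 1000}),
    (True, {"C++": 5000}),  # fork: ignored
]
weights = aggregate(repos)
print(weights)
print(hours_needed(12_000))
```

Compared with taking only the dominant language, a mixed repository contributes to several languages here, which is the weighting by fraction that @benkrikler asked about.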