    Jim Pivarski
    @jpivarski
    https://github.com/jpivarski/jupyter-performance-studies/blob/master/github-physicists.ipynb has the analysis but not the raw data or the GitHub REST calls; I need to package those up. (It was a work in progress.)
    But I'd like to know about other seed repos to round out the definition of "physicist GitHub user". This sample is pretty close to 100% CMS, and maybe there's a peculiarity in that sample.
    benkrikler
    @benkrikler
    From your last plot there, I'd say the fraction of "python" has been stable, but C++ has been losing to "Jupyter notebooks". I wonder how much of this is Jupyter with the C++/ROOT kernel and how much is Jupyter with the Python kernel. But yeah, this is really nice!
    The only two other collaborations I've been involved with use private gitlab instances for their main repos, so I don't think that would help! You could seed with all repositories that have keywords like Dark Matter / LHC / particle physics / high-energy physics in the title or their descriptions and at least N (=10?) forks.
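A rough sketch of that seeding idea, using the `in:name,description` and `forks:>=N` qualifiers of GitHub's repository search. The helper name `seed_queries`, the keyword list, and the threshold of 10 are illustrative assumptions, not part of the existing analysis scripts.

```python
from urllib.parse import urlencode

# Hypothetical keyword list, taken from the suggestion above.
KEYWORDS = ["dark matter", "LHC", "particle physics", "high-energy physics"]

def seed_queries(keywords=KEYWORDS, min_forks=10):
    """Build one GitHub repository-search URL per keyword."""
    urls = []
    for kw in keywords:
        # Quote each keyword so multi-word terms are matched as a phrase.
        q = f'"{kw}" in:name,description forks:>={min_forks}'
        urls.append("https://api.github.com/search/repositories?" + urlencode({"q": q}))
    return urls

for url in seed_queries():
    print(url)
```

The owners and contributors of the repositories returned by these queries could then be added to the seed list, in the same way as for the collaboration repositories.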
    Nicholas Smith
    @nsmith-
    does gitlab @ cern publish stats like this?
    Jim Pivarski
    @jpivarski
    @benkrikler That's a good point; Python has been growing slowly and Jupyter has been growing quickly. Determining if those Jupyter Notebooks contain C++ code or Python code would require cloning the repos or otherwise getting the content of the files. Maybe. Let me think about that. (Maybe I can do a GitHub search by file language and contains "import" vs "include"?)
    Raymond Ehlers
    @raymondEhlers
    @jpivarski We also fork our collaboration software at ALICE: https://github.com/alisw/AliPhysics/ (You could also look at alisw/AliRoot, but I think you'll get a better snapshot of analyzers with AliPhysics)
    Jim Pivarski
    @jpivarski
    @raymondEhlers Thanks! I'll include that.

    Meanwhile, I think I've answered the question about Jupyter Notebooks: they seem to be exclusively Python. I can do searches through the API, though they're more rate-limited and I have to wait longer, so I have the gathering script print out results as it goes.

    For each repo that GitHub labels as "Jupyter Notebook", I do two searches: one for the word "include" and the other for the word "import". If imports outnumber includes, I label it as Python. I've manually followed up a few cases with non-zero "includes"; they've all been in markdown cells.

    FredStober/sandbox                                 0 vs 10        Python
    michelif/bayesian_opt_skopt                        0 vs 1        Python
    michelif/HHbbgg_ETH                                1 vs 18        Python
    michelif/quickMLTests                              0 vs 0        ???
    terrenceedmonds/titanic                            0 vs 2        Python
    hbakhshi/Analysis13TeV                             0 vs 0        ???
    clint-richardson/BU-TheBus                         0 vs 1        Python
    clint-richardson/NBA-Data                          0 vs 0        ???
    clint-richardson/X53AnalysisDemo                   0 vs 2        Python
    zhangzc11/Pi0Net                                   0 vs 1        Python
    joseph-taylor/LjLabBook                            0 vs 5        Python
    mukundvarma/kaggle-instacart                       1 vs 2        Python
    A-lxe/study-csc-daq-rate                           0 vs 1        Python
    vlimant/summer15-ArashJofrehei                     0 vs 61        Python
    vlimant/summer15-Irene                             0 vs 12        Python
    vlimant/summer15-MarinaKolosova                    1 vs 45        Python
    vlimant/summer15-SahandSeif                        0 vs 25        Python
    vlimant/summer16-NikolausHowe                      0 vs 24        Python
    vlimant/surf17-tutorial                            0 vs 4        Python
    vlimant/surf18-tutorial                            0 vs 7        Python
    davidlange6/gsocStudentSolutions                   0 vs 3        Python
    davidlange6/toy_notebooks                          2 vs 14        Python
    jbueghly/hzg_analysis                              0 vs 5        Python
    kaylanb/skinapp                                    0 vs 0        ???
    kaylanb/thesis_code                                0 vs 2        Python
    lihux25/Projects                                   1 vs 3        Python
    ArnabPurohit/Machine-Learning-applications-in-HEP  0 vs 1        Python
    nhaubrich/biophysics                               0 vs 8        Python
    emc5ud/rosalind-solutions                          1 vs 6        Python
    nmehrle/echelle                                    0 vs 1        Python
    zaixingmao/retina                                  1 vs 2        Python
    fmanteca/ImageClassification                       0 vs 1        Python
    alkaplan/jupyter-notebooks                         2 vs 6        Python
    patrykel/multi-tracking-notebooks                  0 vs 6        Python
    patrykel/MultitrackingMasterProject                0 vs 2        Python
    cfangmeier/HHC                                     0 vs 2        Python
    cfangmeier/Small                                   1 vs 4        Python
    cfangmeier/TTTT                                    0 vs 1        Python
    cfangmeier/UNL-Gantry-Encapsulation-Monitoring     0 vs 2        Python
    jiafulow/L1TMuonDocsNov2018                        1 vs 1        ???
    jiafulow/L1TMuonSimulationsMar2017                 3 vs 18        Python
    jiafulow/UF-slurm                                  0 vs 0        ???
    monttj/computational-physics                       1 vs 12        Python
    NJManganelli/TaggerTest                            0 vs 1        Python
    sciencecw/cmsjupyter                               1 vs 14        Python
    lecriste/first-binder                              0 vs 1        Python
    mzanetti79/LaboratoryOfComputationalPhysics        0 vs 11        Python
    mzanetti79/ML-INFN                                 2 vs 34        Python
    mzanetti79/MLCC18                                  0 vs 23        Python
    bpenning/jupyter_repo                              0 vs 22        Python
    bencammett/ML_project_Comp2                        6 vs 12        Python
    hbprosper/ENHEP                                    1 vs 8        Python
    hbprosper/eshep_tutorials                          0 vs 13        Python
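The labelling rule behind the listing above can be sketched as follows. This is only an illustration, not the actual gathering script: the function names are made up, the "C++" label for the case where includes outnumber imports is an assumption (the rule as stated only covers the Python case), and the hit counts are assumed to come from the `total_count` field of an authenticated, rate-limited GitHub code-search request.

```python
from urllib.parse import urlencode

def code_search_url(repo, word):
    """URL for a GitHub code search of `word` restricted to one repo."""
    return "https://api.github.com/search/code?" + urlencode({"q": f"{word} repo:{repo}"})

def classify(n_include, n_import):
    """Label a notebook repo from its "include" and "import" hit counts."""
    if n_import > n_include:
        return "Python"
    if n_include > n_import:
        return "C++"  # assumption: the mirror image of the Python rule
    return "???"      # tie, including 0 vs 0: not enough evidence

# Counts copied from three rows of the listing above.
for repo, n_include, n_import in [
    ("FredStober/sandbox", 0, 10),
    ("jiafulow/L1TMuonDocsNov2018", 1, 1),
    ("bencammett/ML_project_Comp2", 6, 12),
]:
    print(f"{repo:<50} {n_include} vs {n_import:<8} {classify(n_include, n_import)}")
```

Printing each row as soon as it is classified matches the "print out results as it goes" approach, which is useful when every search has to wait for the rate limit.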
    Jonas Eschle
    @jonas-eschle
    So we crossed the point already and physics analysis is dominated by python? :)
    Jim Pivarski
    @jpivarski
    Since it looks like we can add the Jupyter count to the Python column, yes. That just happened.

    Actually, in my slow-moving scan of Jupyter notebooks, I've finally come across two legitimate C++ Jupyter repos: https://github.com/gudrutis/jupyter-book-tutorials/search?utf8=%E2%9C%93&q=include&type= and https://github.com/javadebadi/learning_cpp_again/search?q=include&unscoped_q=include

    These are the first 2 out of 91.

    Henry Schreiner
    @henryiii
    This won’t work in C++20. :)
    Jim Pivarski
    @jpivarski
    Nope. That other analysis that identifies physicists by having "Scientific" in their Linux distribution name will fail in the near future, too.
    @raymondEhlers I have results from Alice, and it's quite different. Alice is considerably more C/C++ than Python.
    [Attached plots: github-alice-lin.png, github-alice-log.png, github-alice-frac.png]

    It would be very interesting to find out what the other collaborations are doing. I've looked into the GitLab API—it functions on gitlab.cern.ch, but I wasn't able to repeat any of these queries without figuring out its (different) authentication mechanism. And even then, I might have to be a member of a collaboration to see its users. If there's a culture of "in-development analysis is private, even from other members of the collaboration," then there might not be anything any one user can do to get a global picture.

    Does anyone have any other suggestions? (GitHub preferred; I already have the scripts.)

    Luke Kreczko
    @kreczko

    @jpivarski Collaborations like LZ use mostly C++ (private GitLab); Xenon1T uses mostly Python (on GitHub).

    You can always try to get representatives from the collaboration to run a script to give you the breakdown if you want the exact numbers

    Doug Davis
    @douglasdavis
    I would guess ATLAS is considerably more C++ than Python, but the balance is shifting.
    Hard to measure with ATLAS heavily using gitlab.cern.ch and most members defaulting to private repos
    Raymond Ehlers
    @raymondEhlers
    @jpivarski Thanks for sharing! That's about what I would have guessed. I've tried to encourage python, but with only some success :-)
    Luke Kreczko
    @kreczko
    @raymondEhlers "Python is slow", huh? ;)
    Martin Ritter
    @daritter
    @jpivarski I would not dare to judge how the distribution is in Belle2. I can say we only teach python (pandas,mpl) for beginners but there's a large fraction of people coming over from Belle and they have a very high inertia and prefer to use "ROOT macros". However I can tell you that any Belle2 member that would put their analysis on github/gitlab would be definitely the ones using python for analysis so I'd expect a heavy bias there.
    Hans Dembinski
    @HDembinski
    @jpivarski Thank you for this awesome analysis. According to the voluntary survey 2018 that I conducted within the LHCb collaboration, half of the LHCb members use mainly Python. It is similar to your CMS results.
    Hard data such as yours (even with caveats; perhaps Python users prefer GitHub?) is even more convincing than personal statements.
    Jim Pivarski
    @jpivarski
    @HDembinski Could you point me to that survey?
    I am trying to dig up the URL of the actual poll now...
    Some of the free-form text answers are quite interesting :)
    Matthew Feickert
    @matthewfeickert

    @HDembinski While trying to figure out the issues that pyhf is having with iminuit this weekend, I've run into a problem: installing iminuit fails in an Ubuntu 18.04 Docker image that has Python 3.6.8 installed from source. I have a short Gist that describes what's going on, and any thoughts you have on what might be going wrong would be great:

    https://gist.github.com/matthewfeickert/284cbddc4a60aca2dcda29c354189b35

    Martin Ritter
    @daritter
    @matthewfeickert Sounds like the .so was compiled with gcc instead of g++. Your install_python.sh passes gcc as --with-cxx-main; maybe that should be g++?
    Matthew Feickert
    @matthewfeickert
    @daritter Thanks for the quick reply. That's a very interesting point. Let me flip that and see how things go (I'll report back in the morning, CST). Thanks!
    Matthew Feickert
    @matthewfeickert
    @daritter Actually, just tried swapping out my CXX_VERSION="$(which gcc)" for CXX_VERSION="$(which g++)" and tried to rebuild the Docker image, but this just results in CPython complaining loudly and then failing during the build. So unless I'm doing something stupid I guess that needs to be gcc.
    Martin Ritter
    @daritter
    Then I have no clear idea. I usually don't specify with-cxx-main flag. What does your sysconfig.get_config_var('CXX') and LDCXXSHARED say?
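For reference, a small sketch of how to check those values from inside the interpreter that was built. The helper name `compiler_config` and the extra `CC` and `LDSHARED` entries are illustrative additions; `CXX` and `LDCXXSHARED` are the two variables asked about.

```python
import sysconfig

def compiler_config(names=("CC", "CXX", "LDSHARED", "LDCXXSHARED")):
    """Return the compiler-related build variables recorded by this Python."""
    # get_config_var returns None for variables the build did not record
    # (for example on Windows builds).
    return {name: sysconfig.get_config_var(name) for name in names}

for name, value in compiler_config().items():
    print(f"{name:12} = {value}")
```

If `CXX` reports a C compiler such as gcc, C++ extension modules built through distutils would be compiled and linked with it, which is the kind of mismatch suspected here.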
    Henry Schreiner
    @henryiii
    From what I understand here, this will also be picked up from CXX if not given, so it really seems like it should be g++. Never tried passing it explicitly either.
    Henry Schreiner
    @henryiii
    Looks like the problem is a bug in Python: https://bugs.python.org/issue23644 - it seems to be trying to build with stdatomic, which is C only.
    Matthew Feickert
    @matthewfeickert

    From the looks of it @henryiii's and @daritter 's suggestion of removing the with-cxx-main flag seems to do the trick. I rebuilt and was able to build through iminuit in the container. :+1: I'll need to do some experimentation with the configure options, but I found the following from the old SVN Python 2.7 trunk very helpful

    --with-cxx-main=<compiler>: If you plan to use C++ extension modules, then -- on some platforms -- you need to compile python's main() function with the C++ compiler. With this option, make will use <compiler> to compile main() and to link the python executable. It is likely that the resulting executable depends on the C++ runtime library of <compiler>. (The default is --without-cxx-main.)

    There are platforms that do not require you to build Python with a C++ compiler in order to use C++ extension modules. E.g., x86 Linux with ELF shared binaries and GCC 3.x, 4.x is such a platform. We recommend that you configure Python --without-cxx-main on those platforms because a mismatch between the C++ compiler version used to build Python and to build a C++ extension module is likely to cause a crash at runtime.

    The Python installation also stores the variable CXX that determines, e.g., the C++ compiler distutils calls by default to build C++ extensions. If you set CXX on the configure command line to any string of non-zero length, then configure won't change CXX. If you do not preset CXX but pass --with-cxx-main=<compiler>, then configure sets CXX=<compiler>. In all other cases, configure looks for a C++ compiler by some common names (c++, g++, gcc, CC, cxx, cc++, cl) and sets CXX to the first compiler it finds. If it does not find any C++ compiler, then it sets CXX="".

    Similarly, if you want to change the command used to link the python executable, then set LINKCC on the configure command line.
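The CXX selection rules in the quoted text can be restated as a short Python sketch. This is only a paraphrase of the description, not the actual configure script; the function name `resolve_cxx` and its parameters are hypothetical.

```python
import shutil

# Candidate names, in the order listed in the quoted text.
CANDIDATES = ("c++", "g++", "gcc", "CC", "cxx", "cc++", "cl")

def resolve_cxx(preset_cxx="", with_cxx_main=None, which=shutil.which):
    """Mimic the CXX selection rules described in the quoted text."""
    if preset_cxx:             # CXX preset to a non-empty string: left unchanged
        return preset_cxx
    if with_cxx_main:          # --with-cxx-main=<compiler> sets CXX=<compiler>
        return with_cxx_main
    for name in CANDIDATES:    # otherwise take the first compiler that is found
        if which(name):
            return name
    return ""                  # no C++ compiler found

print(resolve_cxx(preset_cxx="clang++"))
print(resolve_cxx(with_cxx_main="gcc"))
print(repr(resolve_cxx(which=lambda name: None)))
```

The second call shows why passing gcc through --with-cxx-main matters: with no preset CXX, the C compiler ends up recorded as the C++ compiler for extension builds.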

    Kinda unfortunate that I can't find that level of detail in modern Python, but maybe I'm not searching hard enough through CPython's GitHub

    Matthew Feickert
    @matthewfeickert

    @henryiii This information on https://bugs.python.org/issue23644 is also very nice. Thanks for taking the time to go find it!

    I don't think that CPython can be built by g++. - STINNER Victor

    This was exactly why I originally had it set to which gcc, but it seems that not explicitly setting compiler flags is the way to go.

    Matthew Feickert
    @matthewfeickert

    @daritter @henryiii Thanks to your help the problem is now resolved: matthewfeickert/Docker-Python3-Ubuntu#3

    @HDembinski Please ignore my ping as the issue no longer exists.

    Henry Schreiner
    @henryiii
    I have just released a series of posts over Azure DevOps, ending with a tutorial on building wheels for a non-trivial binary package (boost-histogram). Start here if you are interested!
    Matthew Feickert
    @matthewfeickert
    These look beautifully written @henryiii!
    Patrick Bos
    @egpbos
    impressive posts @henryiii!
    benkrikler
    @benkrikler
    @henryiii I tried setting up Azure CI with GitHub for a project a few days ago and looked at your work in scikit-hep/particle for some guidance. However, I couldn't get llvmlite to install for Linux and Python 2. Have you faced this issue at all?
    Henry Schreiner
    @henryiii
    I haven’t tried (I guess this is for Numba?), but llvmlite does at least have a manylinux1 Python 2.7 wheel, so I would have expected it to download that and “Just Work”. What problem were you seeing?
    benkrikler
    @benkrikler
    Yes, that's right, it was for Numba. I was installing it with pip and didn't actually try conda; I suppose that would be more reliable. The default Linux agent was only finding LLVM 3.8, but llvmlite requires 7.0 or greater.
    Henry Schreiner
    @henryiii
    The easiest way would be to add a container image and use docker - should be a one line addition. I'll check a pip installer later. Python 3 worked, I guess from your comment?