    Jim Pivarski
    I combined C and C++ into a single category (I don't think GitHub can clearly distinguish them), and we see that C/C++ and Python are reaching a crossing point right now. Also, we see that a lot of physicists have TeX repositories (makes sense; I have a lot of those, too), and Jupyter Notebooks are on the rise as well.
    To generalize this beyond CMSSW, does anyone have a suggestion of other repositories to use as a seed, other than cms-sw/cmssw? Do the other collaborations encourage all their members to fork the collaboration software?
    Jonas Eschle
    Actually, if you could extract how many Jupyter notebooks are run with Python, we may have even crossed this already
    But very nice, quite interesting
    Jim Pivarski
    GitHub doesn't give that information. It just says "Jupyter Notebook", which they must do some weighting for because in the "languages by number of lines" breakdown, Jupyter wins because its JSON is verbose.
    https://github.com/jpivarski/jupyter-performance-studies/blob/master/github-physicists.ipynb has the analysis but not the raw data or the GitHub REST calls; I need to package those up. (It was a work in progress.)
    But I'd like to know about other seed repos to round out the definition of "physicist GitHub user". This sample is pretty close to 100% CMS, and maybe there's a peculiarity in that sample.
    From your last plot there, I'd say the fraction of "Python" has been stable, but C++ has been losing to "Jupyter Notebook". I wonder how much of this is Jupyter with a C++/ROOT kernel versus Jupyter with the Python kernel. But yeah, this is really nice!
    The only two other collaborations I've been involved with use private gitlab instances for their main repos, so I don't think that would help! You could seed with all repositories that have keywords like Dark Matter / LHC / particle physics / high-energy physics in the title or their descriptions and at least N (=10?) forks.
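As a rough sketch of that seeding idea, one could build GitHub repository-search queries that match a keyword in the repository name or description and require a minimum number of forks. The helper names `build_seed_query` and `build_seed_url`, the keyword list, and the threshold of 10 are illustrative assumptions; nobody in this thread ran these queries.

```python
from urllib.parse import urlencode

# Keywords taken from the suggestion above; the exact list is a judgment call.
KEYWORDS = ["Dark Matter", "LHC", "particle physics", "high-energy physics"]


def build_seed_query(keyword, min_forks=10):
    """Hypothetical helper: search string matching `keyword` in the repo
    name or description, restricted to repos with at least `min_forks` forks."""
    return f'"{keyword}" in:name,description forks:>={min_forks}'


def build_seed_url(keyword, min_forks=10):
    """Hypothetical helper: URL-encode the query for the repository search endpoint."""
    query = urlencode({"q": build_seed_query(keyword, min_forks)})
    return f"https://api.github.com/search/repositories?{query}"


if __name__ == "__main__":
    # Only prints the URLs; sending the requests (and handling rate limits) is left out.
    for keyword in KEYWORDS:
        print(build_seed_url(keyword))
```

The fork threshold is the knob to tune: too low and personal projects leak in, too high and smaller collaborations drop out.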
    Nicholas Smith
    does gitlab @ cern publish stats like this?
    Jim Pivarski
    @benkrikler That's a good point; Python has been growing slowly and Jupyter has been growing quickly. Determining if those Jupyter Notebooks contain C++ code or Python code would require cloning the repos or otherwise getting the content of the files. Maybe. Let me think about that. (Maybe I can do a GitHub search by file language and contains "import" vs "include"?)
    Raymond Ehlers
    @jpivarski We also fork our collaboration software at ALICE: https://github.com/alisw/AliPhysics/ (You could also look at alisw/AliRoot, but I think you'll get a better snapshot of analyzers with AliPhysics)
    Jim Pivarski
    @raymondEhlers Thanks! I'll include that.

    Meanwhile, I think I've answered the question about Jupyter Notebooks: they seem to be exclusively Python. I can do searches through the API, though they're more rate-limited and I have to wait longer, so I have the gathering script print out results as it goes.

    For each repo that GitHub labels as "Jupyter Notebook", I do two searches: one for the word "include" and the other for the word "import". If imports outnumber includes, I label it as Python. I've manually followed up a few cases with non-zero "includes"; they've all been in markdown cells.

    FredStober/sandbox                                 0 vs 10        Python
    michelif/bayesian_opt_skopt                        0 vs 1        Python
    michelif/HHbbgg_ETH                                1 vs 18        Python
    michelif/quickMLTests                              0 vs 0        ???
    terrenceedmonds/titanic                            0 vs 2        Python
    hbakhshi/Analysis13TeV                             0 vs 0        ???
    clint-richardson/BU-TheBus                         0 vs 1        Python
    clint-richardson/NBA-Data                          0 vs 0        ???
    clint-richardson/X53AnalysisDemo                   0 vs 2        Python
    zhangzc11/Pi0Net                                   0 vs 1        Python
    joseph-taylor/LjLabBook                            0 vs 5        Python
    mukundvarma/kaggle-instacart                       1 vs 2        Python
    A-lxe/study-csc-daq-rate                           0 vs 1        Python
    vlimant/summer15-ArashJofrehei                     0 vs 61        Python
    vlimant/summer15-Irene                             0 vs 12        Python
    vlimant/summer15-MarinaKolosova                    1 vs 45        Python
    vlimant/summer15-SahandSeif                        0 vs 25        Python
    vlimant/summer16-NikolausHowe                      0 vs 24        Python
    vlimant/surf17-tutorial                            0 vs 4        Python
    vlimant/surf18-tutorial                            0 vs 7        Python
    davidlange6/gsocStudentSolutions                   0 vs 3        Python
    davidlange6/toy_notebooks                          2 vs 14        Python
    jbueghly/hzg_analysis                              0 vs 5        Python
    kaylanb/skinapp                                    0 vs 0        ???
    kaylanb/thesis_code                                0 vs 2        Python
    lihux25/Projects                                   1 vs 3        Python
    ArnabPurohit/Machine-Learning-applications-in-HEP  0 vs 1        Python
    nhaubrich/biophysics                               0 vs 8        Python
    emc5ud/rosalind-solutions                          1 vs 6        Python
    nmehrle/echelle                                    0 vs 1        Python
    zaixingmao/retina                                  1 vs 2        Python
    fmanteca/ImageClassification                       0 vs 1        Python
    alkaplan/jupyter-notebooks                         2 vs 6        Python
    patrykel/multi-tracking-notebooks                  0 vs 6        Python
    patrykel/MultitrackingMasterProject                0 vs 2        Python
    cfangmeier/HHC                                     0 vs 2        Python
    cfangmeier/Small                                   1 vs 4        Python
    cfangmeier/TTTT                                    0 vs 1        Python
    cfangmeier/UNL-Gantry-Encapsulation-Monitoring     0 vs 2        Python
    jiafulow/L1TMuonDocsNov2018                        1 vs 1        ???
    jiafulow/L1TMuonSimulationsMar2017                 3 vs 18        Python
    jiafulow/UF-slurm                                  0 vs 0        ???
    monttj/computational-physics                       1 vs 12        Python
    NJManganelli/TaggerTest                            0 vs 1        Python
    sciencecw/cmsjupyter                               1 vs 14        Python
    lecriste/first-binder                              0 vs 1        Python
    mzanetti79/LaboratoryOfComputationalPhysics        0 vs 11        Python
    mzanetti79/ML-INFN                                 2 vs 34        Python
    mzanetti79/MLCC18                                  0 vs 23        Python
    bpenning/jupyter_repo                              0 vs 22        Python
    bencammett/ML_project_Comp2                        6 vs 12        Python
    hbprosper/ENHEP                                    1 vs 8        Python
    hbprosper/eshep_tutorials                          0 vs 13        Python
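The labeling rule behind the listing above can be sketched roughly as follows. This is not the actual gathering script, which has not been posted. The helper names `search_url` and `label` are hypothetical, the query shape (`<word> repo:<owner>/<name>` sent to the code search endpoint) is an assumption, and treating every case where imports do not outnumber includes as "???" is inferred from the tied rows in the listing.

```python
from urllib.parse import urlencode


def search_url(repo, word):
    """Hypothetical helper: code-search URL for `word` restricted to one repo.
    The real script's query may differ; the request itself is not sent here."""
    return "https://api.github.com/search/code?" + urlencode({"q": f"{word} repo:{repo}"})


def label(includes, imports):
    """Label a repo as Python only if imports outnumber includes.
    Anything else (ties, zero hits) stays undecided, as in the listing."""
    return "Python" if imports > includes else "???"


if __name__ == "__main__":
    # Three rows copied from the listing: (repo, includes, imports).
    rows = [
        ("FredStober/sandbox", 0, 10),
        ("michelif/quickMLTests", 0, 0),
        ("jiafulow/L1TMuonDocsNov2018", 1, 1),
    ]
    for repo, includes, imports in rows:
        print(f"{repo:<50} {includes} vs {imports}  {label(includes, imports)}")
```

Repos where includes win would still need a manual look, since the non-zero "include" hits checked so far were all in markdown cells.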
    Jonas Eschle
    So we crossed the point already and physics analysis is dominated by python? :)
    Jim Pivarski
    Since it looks like we can add the Jupyter count to the Python column, yes. That just happened.

    Actually, in my slow-moving scan of Jupyter notebooks, I've finally come across two legitimate C++ Jupyter repos: https://github.com/gudrutis/jupyter-book-tutorials/search?utf8=%E2%9C%93&q=include&type= and https://github.com/javadebadi/learning_cpp_again/search?q=include&unscoped_q=include

    These are the first 2 out of 91.

    Henry Schreiner
    This won’t work in C++20. :)
    Jim Pivarski
    Nope. That other analysis that identifies physicists by having "Scientific" in their Linux distribution name will fail in the near future, too.
    @raymondEhlers I have results from Alice, and it's quite different. Alice is considerably more C/C++ than Python.

    It would be very interesting to find out what the other collaborations are doing. I've looked into the GitLab API—it functions on gitlab.cern.ch, but I wasn't able to repeat any of these queries without figuring out its (different) authentication mechanism. And even then, I might have to be a member of a collaboration to see its users. If there's a culture of "in-development analysis is private, even from other members of the collaboration," then there might not be anything any one user can do to get a global picture.

    Does anyone have any other suggestions? (GitHub preferred; I already have the scripts.)

    Luke Kreczko

    @jpivarski Collaborations like LZ use mostly C++ (private GitLab), Xenon1T mostly Python (on GitHub).

    You can always try to get representatives from the collaboration to run a script to give you the breakdown if you want the exact numbers

    Doug Davis
    I would guess ATLAS is considerably more C++ than Python, but the balance is shifting.
    Hard to measure with ATLAS heavily using gitlab.cern.ch and most members defaulting to private repos
    Raymond Ehlers
    @jpivarski Thanks for sharing! That's about what I would have guessed. I've tried to encourage python, but with only some success :-)
    Luke Kreczko
    @raymondEhlers "Python is slow", huh? ;)
    Martin Ritter
    @jpivarski I would not dare to judge how the distribution is in Belle2. I can say we only teach python (pandas,mpl) for beginners but there's a large fraction of people coming over from Belle and they have a very high inertia and prefer to use "ROOT macros". However I can tell you that any Belle2 member that would put their analysis on github/gitlab would be definitely the ones using python for analysis so I'd expect a heavy bias there.
    Hans Dembinski
    @jpivarski Thank you for this awesome analysis. According to the voluntary survey 2018 that I conducted within the LHCb collaboration, half of the LHCb members use mainly Python. It is similar to your CMS results.
    Hard data such as yours (even with caveats; perhaps Python users prefer GitHub??) is even more convincing than personal statements
    Jim Pivarski
    @HDembinski Could you point me to that survey?
    I am trying to dig up the URL of the actual poll now...
    Some of the free-form text answers are quite interesting :)
    Matthew Feickert

    @HDembinski While trying to figure out the issues that pyhf is having with iminuit this weekend, I've run into a problem: installing iminuit fails in an Ubuntu 18.04 Docker image with Python 3.6.8 installed from source. I have a short Gist that describes what's going on, and if you have any thoughts on what might be going wrong, that would be great:


    Martin Ritter
    @matthewfeickert Sounds like the .so would have been compiled with gcc instead of g++. Your install_python.sh passes gcc as --with-cxx-main; maybe that should be g++?
    Matthew Feickert
    @daritter Thanks for the quick reply. That's a very interesting point. Let me flip that and see how things go (I'll report back in the morning, CST). Thanks!
    Matthew Feickert
    @daritter Actually, just tried swapping out my CXX_VERSION="$(which gcc)" for CXX_VERSION="$(which g++)" and tried to rebuild the Docker image, but this just results in CPython complaining loudly and then failing during the build. So unless I'm doing something stupid I guess that needs to be gcc.
    Martin Ritter
    Then I have no clear idea. I usually don't specify the with-cxx-main flag. What do your sysconfig.get_config_var('CXX') and LDCXXSHARED say?
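For reference, a minimal way to print those values from the interpreter in question. `CXX` and `LDCXXSHARED` are the variables asked about above; `CC` and `LDSHARED` are added here only for comparison, and any of them may come back as `None` on platforms that do not define them.

```python
import sysconfig

# Build-time settings recorded when this Python was configured and compiled.
NAMES = ("CC", "CXX", "LDSHARED", "LDCXXSHARED")


def compiler_config():
    """Return the recorded compiler/linker settings as a dict (None if undefined)."""
    return {name: sysconfig.get_config_var(name) for name in NAMES}


if __name__ == "__main__":
    for name, value in compiler_config().items():
        print(f"{name} = {value}")
```

Run it with the Python built inside the Docker image, not the host's Python, since the values reflect how that particular interpreter was built.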
    Henry Schreiner
    From what I understand here, this will also be picked up from CXX if not given, so it really seems like it should be g++. Never tried passing it explicitly either.
    Henry Schreiner
    Looks like the problem is a bug in Python: https://bugs.python.org/issue23644 - it seems to be trying to build with stdatomic, which is C++ only.
    Matthew Feickert

    From the looks of it, @henryiii's and @daritter's suggestion of removing the with-cxx-main flag seems to do the trick. I rebuilt and was able to build through iminuit in the container. :+1: I'll need to do some experimentation with the configure options, but I found the following from the old SVN Python 2.7 trunk very helpful:

    --with-cxx-main=<compiler>: If you plan to use C++ extension modules, then -- on some platforms -- you need to compile python's main() function with the C++ compiler. With this option, make will use <compiler> to compile main() and to link the python executable. It is likely that the resulting executable depends on the C++ runtime library of <compiler>. (The default is --without-cxx-main.)

    There are platforms that do not require you to build Python with a C++ compiler in order to use C++ extension modules. E.g., x86 Linux with ELF shared binaries and GCC 3.x, 4.x is such a platform. We recommend that you configure Python --without-cxx-main on those platforms because a mismatch between the C++ compiler version used to build Python and to build a C++ extension module is likely to cause a crash at runtime.

    The Python installation also stores the variable CXX that determines, e.g., the C++ compiler distutils calls by default to build C++ extensions. If you set CXX on the configure command line to any string of non-zero length, then configure won't change CXX. If you do not preset CXX but pass --with-cxx-main=<compiler>, then configure sets CXX=<compiler>. In all other cases, configure looks for a C++ compiler by some common names (c++, g++, gcc, CC, cxx, cc++, cl) and sets CXX to the first compiler it finds. If it does not find any C++ compiler, then it sets CXX="".

    Similarly, if you want to change the command used to link the python executable, then set LINKCC on the configure command line.

    Kinda unfortunate that I can't find that level of detail in modern Python, but maybe I'm not searching hard enough through CPython's GitHub

    Matthew Feickert

    @henryiii This information on https://bugs.python.org/issue23644 is also very nice. Thanks for taking the time to go find it!

    I don't think that CPython can be built by g++. - STINNER Victor

    This was exactly why I originally had it set to which gcc, but it seems that not explicitly setting compiler flags is the way to go.

    Matthew Feickert

    @daritter @henryiii Thanks to your help the problem is now resolved: matthewfeickert/Docker-Python3-Ubuntu#3

    @HDembinski Please ignore my ping as the issue no longer exists.

    Henry Schreiner
    I have just released a series of posts on Azure DevOps, ending with a tutorial on building wheels for a non-trivial binary package (boost-histogram). Start here if you are interested!
    Matthew Feickert
    These look beautifully written @henryiii!
    Patrick Bos
    impressive posts @henryiii!