Jim B
@jimbrooke
if we have arrays of per-event objects, we know how to write the code...
Luke Kreczko
@kreczko
then you need to remove a loop at a time. I've done a similar exercise with https://gitlab.cern.ch/fast-hep/public/challenges/-/tree/master/challenges%2Ffast_selection%2Fchallengers (selecting on jets):
2 loops (events, jets) --> vectorised jets (1 loop) --> vectorised events (fully numpy)
speed changes, ofc (~1 million events)
 | 1 | cpp_for_loops    |   7.45213 |
 | 2 | numpy            |  21.7677  |
 | 3 | simple_for_loop  |  55.3772  |
 | 4 | simple_for_loops | 262.806   |
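A minimal sketch of removing one loop at a time, using only NumPy. The toy jet pT values, the thresholds and the function names are made up for illustration and are not taken from the linked challenge; the jagged structure is emulated here with a flat array plus per-event counts.

```python
import numpy as np

# Hypothetical toy data: jet pT for 4 events, stored flat with per-event counts.
jet_pt = np.array([50.0, 20.0, 35.0, 10.0, 60.0, 45.0, 31.0])
counts = np.array([2, 1, 0, 4])
offsets = np.concatenate(([0], np.cumsum(counts)))
PT_MIN, N_MIN = 30.0, 2  # assumed selection: at least 2 jets above 30


def two_loops():
    """Loop over events and over the jets in each event."""
    passed = []
    for i in range(len(counts)):
        n_good = 0
        for pt in jet_pt[offsets[i]:offsets[i + 1]]:
            if pt > PT_MIN:
                n_good += 1
        passed.append(n_good >= N_MIN)
    return np.array(passed, dtype=bool)


def one_loop():
    """Vectorise over jets, keep the loop over events."""
    return np.array(
        [np.count_nonzero(jet_pt[offsets[i]:offsets[i + 1]] > PT_MIN) >= N_MIN
         for i in range(len(counts))],
        dtype=bool,
    )


def no_loops():
    """Vectorise over events as well: count passing jets per event."""
    event_index = np.repeat(np.arange(len(counts)), counts)
    good = (jet_pt > PT_MIN).astype(float)
    n_good = np.bincount(event_index, weights=good, minlength=len(counts))
    return n_good >= N_MIN


print(two_loops(), one_loop(), no_loops())
```

All three functions should return the same per-event mask; only the amount of work done in the Python interpreter changes.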
Jim B
@jimbrooke
OK, thanks.
Will Taylor
@wctaylor_gitlab
@benkrikler Well, I was trying it on LUX data - each event is confined to have 10 pulses. So each field is an N x 10 array, where N is the number of events. In the example above, np.shape() returns (285, 10). It does work on LZ data, which does look to be jagged arrays
So would there be a straightforward way to have it work on LUX data, or that would involve some massive restructuring of things on the backend?
benkrikler
@benkrikler

Hi everyone, we're having an issue using np.diff on a jagged array, anyone have any alternatives for a jagged array or have any advice?

@gr16799, what are you expecting to see here? I would imagine this is a 2D jagged array with a property of some object (e.g. the amplitude or time of a pulse or hit) and a varying number of such objects per event. And then do you want to compare the difference between adjacent objects within each event (e.g. the time between consecutive hits within an event)?

If so, Luke's suggestion of flattening the arrays first will only partially work because you'll end up with the difference between the last object of one event and the first object from the next event. Typically, for simple 1D arrays, the difference should return an array with one fewer element than the input. If what you want is what I describe above, then this would translate to one fewer element per event. Either way, you'd need to be careful with the flattening --> np.diff --> unflattening procedure.
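A minimal sketch of the flattening --> np.diff --> unflattening procedure, assuming the jagged array is available as a flat content array plus per-event counts. The helper name `jagged_diff` is hypothetical and this is plain NumPy, not an awkward-array API; it drops the differences that straddle two events.

```python
import numpy as np


def jagged_diff(content, counts):
    """Per-event np.diff for a jagged array given as flat content + counts.

    Returns the flat differences and the new per-event counts.
    """
    content = np.asarray(content)
    counts = np.asarray(counts)
    flat_diff = np.diff(content)

    # flat_diff[i] = content[i + 1] - content[i]; it crosses an event
    # boundary when i is the last flat index of an event.
    stops = np.cumsum(counts)
    boundaries = stops[:-1] - 1
    boundaries = boundaries[(boundaries >= 0) & (boundaries < len(flat_diff))]

    keep = np.ones(len(flat_diff), dtype=bool)
    keep[boundaries] = False

    # Each non-empty event loses one element; empty events stay empty.
    new_counts = np.maximum(counts - 1, 0)
    return flat_diff[keep], new_counts


content = [0, 3, 1, 4, 10, 4, 2, 90, 234234, 2]
counts = [2, 1, 4, 0, 3]
diffs, diff_counts = jagged_diff(content, counts)
print(diffs, diff_counts)
```

The returned counts can be used to rebuild the jagged structure from the flat differences.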
benkrikler
@benkrikler

is there a pattern for looping over the axis=0 indices of a jaggedarray, and accessing the axis=1 contents as 1D arrays ? clearly this will be inefficient, but it might be less of a headstretcher

@jimbrooke I think this is just a single for loop over the original 2D jagged array, e.g.

import awkward  # awkward-array 0.x, which provides JaggedArray

anArray = awkward.JaggedArray.fromiter([[0, 3], [1], [4, 10, 4, 2], [], [90, 234234, 2]])

for event in anArray:
    print(event)

which should show:

[0, 3]
[1]
[4, 10, 4, 2]
[]
[90, 234234, 2]
But that will definitely not be very performant. You can either think of moving towards vectorised manipulations, or you could write the for loop and use something like numba to just-in-time compile it: https://numba.pydata.org/
benkrikler
@benkrikler

So would there be a straightforward way to have it work on LUX data, or that would involve some massive restructuring of things on the backend?

@wctaylor_gitlab From what you add there, my guess is that it's probably a relatively simple change. I suspect that this is some issue with it being a numpy array rather than a 2D jagged array that happens to have fixed lengths for the second dimension, but I'd need to see more. Do you have a test file and carpenter config I can look at? Feel free to send over privately if you prefer

Will Taylor
@wctaylor_gitlab
Sure, I'll DM it
Jim B
@jimbrooke

But that will definitely not be very performant. You can either think of moving towards vectorised manipulations, or you could write the for loop and use something like numba to just-in-time compile it: https://numba.pydata.org/

Thanks. Yeah, I had realised that i just need to ask the jaggedarray for an iterator

for sure vectorised manipulations would be preferable, but some numpy functions seem to be incompatible with jagged arrays, which makes it... harder
benkrikler
@benkrikler
Yeah, I think awkward 1.0 will make this sort of thing better. I'm not sure what the current timeline is for that though. Should become official ~soon I believe
Another way that occurred to me for the diff stuff (assuming that my interpretation above was right) is to do:
diff = anArray[:, 1:] - anArray[:, :-1]
Jim B
@jimbrooke
ah, yeah - i'll try that maybe
benkrikler
@benkrikler
For the example array I showed above, I get:
In [6]: diff = anArray[:, 1:] - anArray[:, :-1]

In [7]: diff
Out[7]: <JaggedArray [[3] [] [6 -6 -2] [] [234144 -234232]] at 0x7fee39ceba50>
JatGreer
@JatGreer

Hi,
Pieter Keenan has been trying to get event selection with fast-carpenter going but has run into some errors that we don't know how to solve. Would somebody be able to help us try to understand? Forwarding the details, yaml and error below:

stages:
    - end_times: fast_carpenter.Define
    - event_selection: fast_carpenter.selection.CutFlow
    - histogram: fast_carpenter.BinnedDataframe

end_times:
    variables:
        - Hit_End_Tick: Hit_Size + Hit_Start_Tick

event_selection:
    selection:
        All:
            - Hit_True_GenType == 1
            - Hit_True_MarleyIndex == 0

histogram:
    binning:
      - {in: Hit_Start_Tick, out: hitStartTick}
      - {in: Hit_Size, out: hitSize}
      - {in: Hit_End_Tick, out: hitEndTick}

and error:

fast_carpenter singlelist.yml etick.yml 
2020-04-21 14:54:53,411 - alphatwirl.misc.deprecation - WARNING - alphatwirl.misc.deprecation.MultiprocessingDropbox.g(): the option "progressbar" is deprecated. "progressbar=True" is given. use atpbar.disable() instead to turn off progress bars
WARNING:alphatwirl.misc.deprecation:alphatwirl.misc.deprecation.MultiprocessingDropbox.g(): the option "progressbar" is deprecated. "progressbar=True" is given. use atpbar.disable() instead to turn off progress bars
   0.00%                                          |        0 /        1 |:  simulation                       
Process Worker-1:
Traceback (most recent call last):
  File "/software/zj19427/miniconda/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/software/zj19427/miniconda/lib/python3.7/site-packages/mantichora/hubmp.py", line 235, in run
    self._run_tasks()
  File "/software/zj19427/miniconda/lib/python3.7/site-packages/mantichora/hubmp.py", line 255, in _run_tasks
    result = task_func()
  File "/software/zj19427/miniconda/lib/python3.7/site-packages/mantichora/main.py", line 18, in __call__
    return self.task(*self.args, **self.kwargs)
  File "/usersc/zj19427/.local/lib/python3.7/site-packages/alphatwirl/concurrently/CommunicationChannel.py", line 16, in __call__
    return self.task(*self.args, **self.kwargs)
  File "/usersc/zj19427/.local/lib/python3.7/site-packages/alphatwirl/loop/EventLoop.py", line 45, in __call__
    self.reader.event(event)
  File "/usersc/zj19427/.local/lib/python3.7/site-packages/alphatwirl/loop/ReaderComposite.py", line 43, in event
    if reader.event(event) is False:
  File "/software/zj19427/miniconda/lib/python3.7/site-packages/fast_carpenter/selection/stage.py", line 222, in event
    chunk.tree.apply_mask(new_mask)
  File "/software/zj19427/miniconda/lib/python3.7/site-packages/fast_carpenter/masked_tree.py", line 77, in apply_mask
    self._mask = _normalise_mask(new_mask, len(self.tree))
  File "/software/zj19427/miniconda/lib/python3.7/site-packages/fast_carpenter/masked_tree.py", line 100, in _normalise_mask
    raise RuntimeError("mask is not a numpy array, a list, or a tuple")
RuntimeError: mask is not a numpy array, a list, or a tuple

which is where the code hangs. The file works fine if I remove the event_selection stage. I'm working off these examples: https://fast-carpenter.readthedocs.io/en/latest/processing_config.html https://fast-carpenter.readthedocs.io/en/latest/api/fast_carpenter.selection.html .

Jim B
@jimbrooke
not sure i properly understand the error, but one issue with that yaml config is that Hit_True_GenType and Hit_True_MarleyIndex are jagged arrays - so i doubt carpenter understands how to interpret them as an event mask
assuming you want to select events where there is at least one Hit_True_GenType == 0, you can use the 'reduce/formula' paradigm (see the CMS example)
Jim B
@jimbrooke
(not sure i really understand what reduce:0 means though )
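For illustration, this is what an "any"-style reduction from a jagged per-hit condition to a flat per-event mask does, sketched in plain NumPy. It is not fast-carpenter's implementation; the toy values and the helper name `any_per_event` are hypothetical, and the jagged array is emulated with flat content plus per-event counts.

```python
import numpy as np


def any_per_event(hit_mask, counts):
    """Return one boolean per event: True if any hit in the event passes."""
    hit_mask = np.asarray(hit_mask, dtype=bool)
    counts = np.asarray(counts)
    # Label every hit with the index of the event it belongs to.
    event_index = np.repeat(np.arange(len(counts)), counts)
    # Count passing hits per event; empty events get a count of zero.
    n_true = np.bincount(
        event_index, weights=hit_mask.astype(float), minlength=len(counts)
    )
    return n_true > 0


# Hypothetical per-hit generator types for 4 events with 2, 1, 0 and 3 hits.
gen_type = np.array([1, 4, 4, 1, 1, 4])
counts = np.array([2, 1, 0, 3])

event_mask = any_per_event(gen_type == 1, counts)
print(event_mask)  # one entry per event, usable as an event selection
```

The point is that a selection needs exactly one boolean per event, whereas `Hit_True_GenType == 1` on its own gives one boolean per hit.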
pjkeenan
@pjkeenan
Hi All,
I think what I am asking about is 2 tree support in carpenter, but I’m not entirely sure.
I am currently adding 2 trees from a single root file to the file list yaml using curator. I am then performing some action with a scribbler and outputting a new variable for each event, (e.g. finding the minimum hit start time in the event).
As far as I can tell this action is performed for all events in tree 1, and then for all events in tree 2. Is there a way to perform the same action on the equivalent event in both trees at the same time, i.e. find the minimum hit start time in event x across either tree 1 or tree 2?
Luke Kreczko
@kreczko
Hi @pjkeenan, unfortunately not at the moment. I've been working on such an implementation in https://github.com/kreczko/fast-carpenter/tree/kreczko-dataspace-part1 but it is not feature-complete yet.
Your use case should work, but selections applied to multiple trees do not work yet. If you don't mind using experimental code, it might do the job for you.
Jim B
@jimbrooke
@pjkeenan sounds like it is worth trying that, and see if it works. if not, we can combine the two trees in ROOT before running fast
pjkeenan
@pjkeenan
@jimbrooke sounds like a good plan to me
pjkeenan
@pjkeenan
Hi @kreczko, I think I've got your branch and I think I am correctly pointing everything towards it. In order to get carpenter to look through two trees simultaneously (as opposed to tree 1 then tree 2) what will I need to change in my yaml/scribbler files?
Luke Kreczko
@kreczko

@pjkeenan For the data yaml you will need to add the tree paths, e.g.:
https://github.com/kreczko/fast-carpenter/blob/kreczko-dataspace-part1/examples/cms_l1t_data.yml#L10

In the processing YAML you need to use the full variable name (e.g. folder.treename.branchname), e.g.
https://github.com/kreczko/fast-carpenter/blob/kreczko-dataspace-part1/examples/cms_l1t_processing.yml#L18

There, l1CaloTowerTree.L1CaloTowerTree.L1CaloTower.iet is folder.treename.branch.subbranch
If you have a data example (e.g. a file with just 1 event) that you can share (e.g. on DICE), then I can have a look and add it to my tests
if @jimbrooke is OK with it, ofc.
I am trying to construct a use-case library to make sure I do not miss something
pjkeenan
@pjkeenan

Hi, @kreczko thanks for all that.

If you have a data example (e.g. a file with just 1 event) that you can share (e.g. on DICE), then I can have a look and add it to my tests
There is an example on sc01 of 1 file with 100 events at /storage/zj19427/MCAnalyserWork/SNNu/sl610000evtSNNuMCC11_BTwide20_ST/30181862_0/MC_SN_trigprim_FPGAHitprodmarley_nue_spectrum_radiological_timedep_hudepohl_11.2M_3perevent_dune10kt_1x2x6_10808352_36_20180814T060920_g4_detsim-16bb4_35046acf-32a6-4727-a565-a3d30d1fd5e4.root

Using your experimental carpenter code I have run into this error that I don't understand (YAMLs below)

fast_carpenter 2treereadlist.yml 2treereadconf.yml 

2020-05-12 15:15:54,360 - alphatwirl.misc.deprecation - WARNING - alphatwirl.misc.deprecation.MultiprocessingDropbox.g(): the option "progressbar" is deprecated. "progressbar=True" is given. use atpbar.disable() instead to turn off progress bars
WARNING:alphatwirl.misc.deprecation:alphatwirl.misc.deprecation.MultiprocessingDropbox.g(): the option "progressbar" is deprecated. "progressbar=True" is given. use atpbar.disable() instead to turn off progress bars
<class 'fast_carpenter.dataspace.DataSpaceView'>
<class 'fast_carpenter.dataspace.DataSpaceView'>
<class 'fast_carpenter.dataspace.DataSpaceView'>
   0.00%                                          |        0 /        1 |:  simulation                       Process WorkerFork-1:
   0.00%                                          |        0 /        1 |:  simulation                       
Traceback (most recent call last):
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/mantichora-0.10.0-py3.8.egg/mantichora/hubmp.py", line 248, in run
    self._run_tasks()
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/mantichora-0.10.0-py3.8.egg/mantichora/hubmp.py", line 268, in _run_tasks
    result = task_func()
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/mantichora-0.10.0-py3.8.egg/mantichora/main.py", line 19, in __call__
    return self.task(*self.args, **self.kwargs)
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/alphatwirl-0.25.2-py3.8.egg/alphatwirl/concurrently/CommunicationChannel.py", line 16, in __call__
    return self.task(*self.args, **self.kwargs)
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/alphatwirl-0.25.2-py3.8.egg/alphatwirl/loop/EventLoop.py", line 45, in __call__
    self.reader.event(event)
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/alphatwirl-0.25.2-py3.8.egg/alphatwirl/loop/ReaderComposite.py", line 43, in event
    if reader.event(event) is False:
  File "/usersc/zj19427/fast-carpenter/fast_carpenter/define/variables.py", line 72, in event
    result = full_evaluate(chunk.tree, expression, fill_missing,
  File "/usersc/zj19427/fast-carpenter/fast_carpenter/define/variables.py", line 143, in full_evaluate
    result = evaluate(tree, expression)
  File "/usersc/zj19427/fast-carpenter/fast_carpenter/expressions.py", line 118, in evaluate
    result = numexpr.evaluate(cleaned_expression, local_dict=adaptor)
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/numexpr-2.7.1-py3.8-linux-x86_64.egg/numexpr/necompiler.py", line 821, in evaluate
    signature = [(name, getType(arg)) for (name, arg) in
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/numexpr-2.7.1-py3.8-linux-x86_64.egg/numexpr/necompiler.py", line 821, in <listcomp>
    signature = [(name, getType(arg)) for (name, arg) in
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/numexpr-2.7.1-py3.8-linux-x86_64.egg/numexpr/necompiler.py", line 703, in getType
    raise ValueError("unknown type %s" % a.dtype.name)
ValueError: unknown type object
pjkeenan
@pjkeenan
  - eventtype: mc
    files:
      - /storage/zj19427/MCAnalyserWork/SNNu/sl610000evtSNNuMCC11_BTwide20_ST/30181862_0/MC_SN_trigprim_FPGAHitprodmarley_nue_spectrum_radiological_timedep_hudepohl_11.2M_3perevent_dune10kt_1x2x6_10808352_36_20180814T060920_g4_detsim-16bb4_35046acf-32a6-4727-a565-a3d30d1fd5e4.root
    name: simulation
    nevents:
      mcanalyser20/MCTree: 100
      mcanalyser10ind/MCTree: 100
    nfiles: 1
    tree:
      - mcanalyser20/MCTree
      - mcanalyser10ind/MCTree
  - BasicVars: fast_carpenter.Define
  - HitVars: 2tree-analysis4.HitVars
  - IndNorm: fast_carpenter.summary.BinnedDataframe
  - ColNorm: fast_carpenter.summary.BinnedDataframe

BasicVars:
  variables:
    - isColMarley0: (mcanalyser20.MCTree.Hit_True_GenType == 1) & (mcanalyser20.MCTree.Hit_True_MarleyIndex==0) & (mcanalyser20.MCTree.Hit_View == 2)
    - isIndMarley0: (mcanalyser10ind.MCTree.Hit_True_GenType == 1) & (mcanalyser10indHit_True_MarleyIndex==0) & (mcanalyser10indHit_View < 2)
    - isAr39: (mcanalyser20.MCTree.Hit_True_GenType == 4)

HitVars:
          #{mask: isAr39, out_var: Norm_Start_Tick}
          {mask1: isIndMarley0, mask2: isColMarley0, out_var1: Norm_Start_Tick_Ind, out_var2: Norm_Start_Tick_Col}

#diff:
#    variables:
#        - : ""

IndNorm:
  dataset_col: True  
  binning:
      - {in: Norm_Start_Tick_Ind, out: norm_start_tick_ind}

ColNorm:
  dataset_col: True
  binning:
      - {in: Norm_Start_Tick_Col, out: norm_start_tick_col}
I am also using a scribbler which produces the Norm_Start_Tick variables - let me know if it is useful to see this... Thanks in advance
pjkeenan
@pjkeenan

Hi @kreczko - a couple more things, Jim had the idea that the multitree carpenter code might not work with expressions of the above type (e.g. the isColMarley0 line) in the processing yaml, or that there might be a different syntax. Is this the case?

Secondly, to get around this I set the masks directly in the scribbler but this gave the error

$ fast_carpenter 2treereadlist.yml 2treereadconf.yml 
2020-05-14 15:34:30,257 - alphatwirl.misc.deprecation - WARNING - alphatwirl.misc.deprecation.MultiprocessingDropbox.g(): the option "progressbar" is deprecated. "progressbar=True" is given. use atpbar.disable() instead to turn off progress bars
WARNING:alphatwirl.misc.deprecation:alphatwirl.misc.deprecation.MultiprocessingDropbox.g(): the option "progressbar" is deprecated. "progressbar=True" is given. use atpbar.disable() instead to turn off progress bars
   0.00%                                          |        0 /        1 |:  simulation                       Process WorkerFork-1:
   0.00%                                          |        0 /        1 |:  simulation                       
Traceback (most recent call last):
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/mantichora-0.10.0-py3.8.egg/mantichora/hubmp.py", line 248, in run
    self._run_tasks()
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/mantichora-0.10.0-py3.8.egg/mantichora/hubmp.py", line 268, in _run_tasks
    result = task_func()
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/mantichora-0.10.0-py3.8.egg/mantichora/main.py", line 19, in __call__
    return self.task(*self.args, **self.kwargs)
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/alphatwirl-0.25.2-py3.8.egg/alphatwirl/concurrently/CommunicationChannel.py", line 16, in __call__
    return self.task(*self.args, **self.kwargs)
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/alphatwirl-0.25.2-py3.8.egg/alphatwirl/loop/EventLoop.py", line 45, in __call__
    self.reader.event(event)
  File "/software/zj19427/miniconda/envs/pk_py3/lib/python3.8/site-packages/alphatwirl-0.25.2-py3.8.egg/alphatwirl/loop/ReaderComposite.py", line 43, in event
    if reader.event(event) is False:
  File "/usersc/zj19427/faster-fast-hep-analysis/ExperimentalCode/2tree-analysis4/__init__.py", line 32, in event
    Hit_True_GenType, Hit_True_MarleyIndex, Hit_Start_Tick, Hit_View = chunk.tree.arrays(self.branches, outputtype=tuple)
AttributeError: 'DataSpace' object has no attribute 'arrays'

Again, should there be different syntax in the scribbler or are scribblers not compatible at this time?
Cheers

Luke Kreczko
@kreczko
@pjkeenan I am trying to keep the syntax mostly the same - I will add your example now and see if I can reproduce it
pjkeenan
@pjkeenan
@kreczko Thanks - let me know if anything isn't clear or is missing!
Luke Kreczko
@kreczko
I am currently trying to run without 2tree-analysis4.HitVars and debugging it from there. I am planning to spend two full days on the multi-tree stuff, so I will let you know hopefully sooner rather than later
pjkeenan
@pjkeenan
@kreczko that's brilliant, thanks. Note there are missing . on the isIndMarley0 line of my processing config yaml that I can't now edit (code broke before this point though)... Let me know if having the 2tree-analysis4.HitVars scribbler would be helpful to you
Luke Kreczko
@kreczko
@pjkeenan I think I mostly got it to work. I am playing with your example right now and had to do a few corrections. E.g. for Norm_Start_Tick_Ind and Norm_Start_Tick_Col you need to define them first:
var_def:
    variables:
        - Norm_Start_Tick_Ind: 'mcanalyser10ind.MCTree.Hit_Start_Tick'
        - Norm_Start_Tick: 'mcanalyser20.MCTree.Hit_Start_Tick'
I am preparing a pull request with the new features now, so it should get a proper run of testing before I can tag a development release
Jim B
@jimbrooke
does anyone have an example of producing a turn-on curve in FAST ? i assume this has been done...
Jim B
@jimbrooke
in particular, i'm interested in what tools people use for calculating uncertainties (binomial or otherwise)
Andrew Naylor
@asnaylor
Are FAST-HEP planning to move to uproot4 and awkward1? If so when?
JatGreer
@JatGreer

@benkrikler @kreczko I have a case where it would be useful to me to count the file number which is being processed by the scribbler (https://gitlab.cern.ch/fast-hep/public/fast_cms_public_tutorial/blob/master/cms_hep_tutorial/__init__.py).

I have a number of files and it appears to me that each chunk (actually each call of chunk.tree.arrays) being processed is the data from a single file (I have 100 events per root file), so I assumed event() is called once per file. However, if I declare a variable in the __init__() function and try to increment it by 1 for each time the event() function is called, the counter doesn't get incremented (I'm testing with only 3 root files). Please could you help me understand if there is some way of counting the file number or something equivalent?

Luke Kreczko
@kreczko
@JatGreer not sure the modules stay alive for more than one event() call. @benkrikler ?
but also: chunk is called >=1 times per file (depending on the size) so it might not be what you want
Luke Kreczko
@kreczko
the best place to add something like this would be in the bookkeeping (e.g. datasets.yml)
JatGreer
@JatGreer
@kreczko
Do you mean the yaml that is created by fast-curator?
It may be easier for me to introduce a file number to my root tree before using fast-hep by using uproot.recreate, although it's new to me. Might that be a better option?