Jelle Aalbers
@JelleAalbers
Maybe this relates to what was brought up a few straxfernos ago (I forget by whom), namely that we are not checking whether all data fits in the chunk time bounds, just the last element: https://github.com/AxFoundation/strax/blob/master/strax/chunk.py#L68
Maybe there is somehow a record in the middle of the chunk with a bad endtime. Or maybe only in records, not raw_records. Then a hit is found in it, hits get sorted by time, and now the problem comes to the surface.
Of course records are supposed to be fixed-length and sorted by time, so this should still not happen.
But it's easy to check: change that line to strax.endtime(self.data).max() and reprocess the run. If this hypothesis is right, it will crash earlier, on some raw_records / records chunk.
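Concretely, the difference between the two checks is roughly the following (a sketch; the surrounding code in chunk.py is paraphrased, not quoted):

    # current check: only the last element's endtime is compared to the chunk end
    assert strax.endtime(self.data[-1]) <= self.end
    # suggested check: every element must fit within the chunk time bounds
    assert strax.endtime(self.data).max() <= self.end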
Daniel Wenz
@WenzDaniel

They are sorted by time, not end time, right? So you could get something like this:

r1 |---------|
r2 s |----| last one

s is a spacer since whitespace does not work

...

Of course records are supposed to be fixed-length and sorted by time, so this should still not happen.

The data field is fixed length, but the record length itself is not, is it?

Jelle Aalbers
@JelleAalbers
Yes, you're right, the last record of a pulse has a variable length
So if a record extends past its chunk boundary, but another channel has a much shorter record that starts a little later, strax would miss it
Daniel Wenz
@WenzDaniel

Dear all who are currently reviewing the nveto plugins and hitlet functions: thanks a lot. Please note that I uploaded a new notebook which explains in detail the purpose of the different hitlet functions and how they are embedded in the nveto plugins. You can find the corresponding comment here: https://github.com/AxFoundation/strax/pull/275#issuecomment-656025873

The very same notebook also serves as a mini-introduction to strax(en) for the nveto subgroup. Hence some comments might be less interesting for the reviewers.

All issues raised so far have been resolved. I hope this new notebook will help speed up the reviewing process, so that more people of the nveto subgroup can start playing with the data without installing their own strax(en).

Joran Angevaare
@jorana

But it's easy to check: change that line to strax.endtime(self.data).max() and reprocess the run. If this hypothesis is right it will crash earlier on some raw_records / records chunk

@JelleAalbers you are right. We end up with this:

ValueError: Attempt to create chunk [008543.raw_records: 1594281342sec 999999000 ns - 1594281348sec 499999000 ns, 5836752 items, 258.9 MB/s] whose data ends late at 1594281348499999220
Jelle Aalbers
@JelleAalbers
Great! Well, not great that there is an error, but at least we are starting to understand it :-)
Looks like there is a similar problem here https://github.com/XENONnT/straxen/blob/master/straxen/plugins/daqreader.py#L160, so we are not doing our sanity checks on the DAQ output correctly
Of course the other possibility is that the DAQ is producing fine output but we're doing something wrong in the splitting in the DAQReader
Darryl Masson
@darrylmasson
Records are sorted by start time, so we can't guarantee that the final record in a chunk has the highest endtime
Jelle Aalbers
@JelleAalbers
Indeed, we should not assume this, but it looks like we do in the DAQreader's last_end computation. This isn't just used for the sanity check but also for the splitting
So this is most likely the bug
Joran Angevaare
@jorana
let me do a quick fix to see if this solves the issue
Joran Angevaare
@jorana
alright thanks Jelle and Daniel! It seems fixed. Here are the corresponding PRs:
AxFoundation/strax#281
XENONnT/straxen#146
Daniel Wenz
@WenzDaniel
Okay I will check right away
Daniel Wenz
@WenzDaniel

alright thanks Jelle and Daniel! It seems fixed. Here are the corresponding PRs:
AxFoundation/strax#281
XENONnT/straxen#146

Looks fine, but maybe we can do a bit better for raw_records at least.

Joran Angevaare
@jorana
@all could someone help us and update the bleeding-edge env so that the fast response team can look at the latest GXe data? (the lineage has changed)
Joran Angevaare
@jorana
In the meantime you could load the data with fuzzy_for=tuple(<plugin names>) as an argument, i.e. st = straxen.contexts.xenonnt_online(fuzzy_for=tuple(<plugin names>), <other things>)
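For example (a sketch; the run id and plugin names are placeholders, and this assumes xenonnt_online passes fuzzy_for on to the strax context as described above):

    import straxen

    # ignore lineage mismatches for the listed data types
    st = straxen.contexts.xenonnt_online(fuzzy_for=('peaklets', 'event_info'))
    df = st.get_df('008543', 'event_info')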
Jason Brodsky
@jpbrodsky
I am working on a plugin to merge two streams of Monte Carlo data. It should exhaust one stream, and if the other stream still has entries, it should just ignore the remaining entries.
This requires overriding the default behavior of Plugin.cleanup
Which normally objects if any input runs out while another input has data remaining
    def cleanup(self, iters, wait_for):
        for d in iters.keys():
            if not self._fetch_chunk(d, iters):
                print(f'Source {d} is finished.')
            else:
                print(f'Source {d} is not exhausted, but was stopped early since another source finished first')
                # shut down the thread for source d
What should I do to replace that comment at the end?
I could just loop _fetch_chunk() until the source is exhausted, but that doesn't sound like a great plan
since I don't need this source any more
and don't want to spend time asking it for more data
I do need to shut down the thread(s) associated with the unexhausted source
since right now the analysis cannot complete because that thread is just sitting around
Jelle Aalbers
@JelleAalbers

Hi Jason, as you likely guessed there is no nice support for this in strax yet; we assume each plugin always exhausts all its inputs. Even if your plugin doesn't, some other plugin or saver may want all of the data for that input, so it's risky to let just one plugin flip a 'kill switch'.

Of course shutting things down is always possible. You could define a custom exception class (class JasonsKillSwitchActivated(Exception): pass) and .throw() it into the iterator/generator for the input you no longer need. That will kill that input's mailbox, which will shut down the thread writing to it, and anyone else as soon as they try to read from it.

You might have to modify the mailbox code to propagate the exception to mailboxes further upstream, if you have any (i.e. use kill with upstream = True when your exception comes along). If the exception shows up on the screen, you could look where the printing happens and similarly bypass it when your custom exception is caught. If there is another plugin or saver reading the forcefully stopped datatype, it will show the exception in its metadata, but you probably want that behavior (the data is now incomplete). If/when we replace the mailbox system with another concurrency backend, these modifications would have to be revisited.
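As a minimal, generic illustration of the .throw() idea on a plain Python generator (the strax mailbox machinery is more involved; all names here are made up):

    class JasonsKillSwitchActivated(Exception):
        pass

    def chunk_source():
        # stand-in for an input iterator producing chunks
        i = 0
        try:
            while True:
                yield f'chunk {i}'
                i += 1
        except JasonsKillSwitchActivated:
            # clean up here, then stop producing
            return

    it = chunk_source()
    print(next(it))  # 'chunk 0'
    try:
        it.throw(JasonsKillSwitchActivated())
    except StopIteration:
        print('source shut down')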

Yossi Mosbacher
@jmosbacher
@darrylmasson I implemented what we talked about for auto-selecting from multiple compressions here, any chance you can review this and maybe test that it works if you happen to have some chunks larger than 2GB lying around?
Jason Brodsky
@jpbrodsky
Thanks, Jelle, that's straightforward. Just throwing StopIteration seems to work fine as well.
Jelle Aalbers
@JelleAalbers
Ah nice, maybe that shuts it down in a cleaner way too, since that is the usual way the end of the stream is communicated
Darryl Masson
@darrylmasson
@jmosbacher if you open a PR it'll make reviewing it much simpler
Jason Brodsky
@jpbrodsky
what's the right thing to do if I'd like two plugins to share a config setting?
so a single update to that config setting would set the same value to both plugins and trigger rerunning both of them
Sophia Andaloro
@sophiaandaloro
You should be able to do st.set_config() and specify your config settings in there... then when you run st.make() or st.get_array() etc., you should trigger reprocessing of the data because you reset your context. Every time I've had plugins share a config setting, this method has worked for me.
I think you can also specify this when you set the context, that is, st = straxen.contexts.your_context(some_settings). I just always do it afterwards for ease.
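As a sketch of that pattern (the option name and plugins are made up, and the class bodies are elided): both plugins declare the same option, and a single st.set_config call then changes it for both, updating their lineage so both get reprocessed:

    import strax

    @strax.takes_config(
        strax.Option('shared_setting', default=1000, help='Used by both plugins'))
    class PluginA(strax.Plugin):
        ...  # provides, dtype, compute, etc.

    @strax.takes_config(
        strax.Option('shared_setting', default=1000, help='Used by both plugins'))
    class PluginB(strax.Plugin):
        ...  # provides, dtype, compute, etc.

    # one update affects every registered plugin that takes this option
    st.set_config(dict(shared_setting=1200))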
Jason Brodsky
@jpbrodsky
thanks, Sophia
When I set up a @strax.takes_config(...) I set defaults for the config options
if I'm using the same config in two plugins, can I set the default in just one of them?
Jason Brodsky
@jpbrodsky
Also, if I have a parameter that is derived from the configuration, e.g. the maximum drift time (which I am deriving from the field strength), is there a preferred way to calculate that once, and share the value with other plugins?
(and recalculate it if the config changes)
one (maybe silly?) idea: have a plugin that computes this parameter and produces a one-entry output. Other plugins would then depend on this output. But this doesn't seem like a good match for Strax's timeline syncing of dependencies.
Sophia Andaloro
@sophiaandaloro
To your first question, I believe the answer is no, unless you set two different contexts for the two different targets you want to calculate; I don't believe you can do that otherwise. Someone correct me if I'm wrong. Second question: not sure, unless you just build a "plugin" of sorts to calculate and return it. You're right though, the dependencies might be an issue, so someone should weigh in if there's a better method I don't know about.
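On the derived-parameter question, one alternative to a dedicated one-entry plugin (a sketch; drift_time_from_field and the option name are hypothetical) is to derive the value in the plugin's setup() method, which runs after the config is resolved, so it is recomputed whenever the relevant option changes. The downside is that each plugin needing the value derives it itself rather than sharing it:

    @strax.takes_config(
        strax.Option('drift_field', default=100.0, help='Drift field in V/cm'))
    class MyEventPlugin(strax.Plugin):
        ...  # provides, dtype, depends_on, etc.

        def setup(self):
            # derived once per (re)configuration of the context
            self.max_drift_time = drift_time_from_field(self.config['drift_field'])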
Peter Gaemers
@petergaemers

@jpbrodsky

if I'm using the same config in two plugins, can I set the default in just one of them?

That doesn't work. You'll need to set the default in the context config (st.set_config(...)), or else it will crash.