    Jim Pivarski
    @jpivarski
    It's evidently multiplying the itemsize by the product of the shape. So I guess that's what we have to do. That makes numbytes less useful as a way to measure memory usage.
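A minimal sketch of why "itemsize times the product of the shape" can overstate memory usage. It uses plain NumPy as a stand-in (an assumption; the message above doesn't say which object's byte count was being inspected): a broadcast view reports a large `nbytes` even though it shares a single 8-byte buffer.

```python
import numpy as np

# One float64 value: 8 bytes of actual storage.
base = np.zeros(1, dtype=np.float64)

# A read-only broadcast view with stride 0; no new memory is allocated.
view = np.broadcast_to(base, (1000,))

# nbytes is computed from the logical shape, not from the underlying buffer.
reported = view.nbytes
computed = view.itemsize * int(np.prod(view.shape))

print(base.nbytes, reported, computed, view.strides)
```

So a shape-based byte count is an upper bound on the logical size, not a measurement of what is actually allocated.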
    Angus Hollands
    @agoose77:matrix.org
    [m]
    Ah fab
    (I was checking about range support)
    I just re-read your message; it made sense the first time round
    Angus Hollands
    @agoose77:matrix.org
    [m]
    @jpivarski: I was writing a PR to add a context manager for deprecations, but then I realised we're basically implementing features that already exist in the warnings module. Is the deprecations_as_warnings public api something we can break?
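For reference, a rough sketch of what the standard `warnings` module already offers here, which is presumably what "features that already exist" refers to. The `old_api` function is hypothetical and only stands in for a deprecated call; none of this is Awkward's own API.

```python
import warnings


def old_api():
    # Hypothetical deprecated function, for illustration only.
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)
    return 42


# Temporarily escalate deprecation warnings to errors.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        old_api()
        raised = False
    except DeprecationWarning:
        raised = True

# Temporarily record deprecation warnings instead of raising or printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    result = old_api()

print(raised, result, [str(w.message) for w in caught])
```

The filters are restored when each `with` block exits, so the escalation stays scoped to the block.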
    Angus Hollands
    @agoose77:matrix.org
    [m]
    @jpivarski: will do - I'll make sure to check against py2!
    Angus Hollands
    @agoose77:matrix.org
    [m]
    @jpivarski: I've been running into some problems when using Awkward with Dask concerning the incompatibility of forms (when using partitioned arrays) / schemas (when using Parquet).
    From your knowledge, does the parquet schema depend upon the order of the fields in a record array?
    Angus Hollands
    @agoose77:matrix.org
    [m]
    When deserialising arrays in a dataset, the partitioned array failed because the partitions didn't have the same schemas
    That doesn't sound quite right so it might have been related to it. I'm using dask and it's been a busy 24 hrs 😂
    I'm now using hdf5 as the parquet stuff is a little unstable rn
    It's on my to do list, but I'm not quite ready at the moment to tackle it
    Jim Pivarski
    @jpivarski
    Partitions have to have the same schemas. If that didn't raise an error message, it should have.
    Angus Hollands
    @agoose77:matrix.org
    [m]
    Yeah that sounds about right
    Angus Hollands
    @agoose77:matrix.org
    [m]
    @jpivarski: I'm running into a problem concerning partitioning and could use some input
    I've implemented a "dataset" on top of to_buffers (& h5py), and load it using a partitioned virtual array.
    Angus Hollands
    @agoose77:matrix.org
    [m]
    I think the schemas are varying between regular arrays and listoffsetarrays. Is there any application agnostic way to resolve this?
    The main issue is that I don't have all of the partitions that are written to disk in memory at the same time/process. I'm currently thinking my options are either:
    1. run another pass over the data and cast to the first form
    2. manually specify the form in the writer
    3. convert at read-time to a common form
    Angus Hollands
    @agoose77:matrix.org
    [m]
    I also am guessing that ak.packed is what has introduced these bugs, because it can change the layout according to the data
    Jim Pivarski
    @jpivarski
    Regular and irregular lists are different types, so if ak.packed is not making a distinction between them, then that's a bug. An input method, like ak.from_iter or ak.from_numpy, always makes data of one type or the other, so it shouldn't be the case that a partition of data that "happens to be regular" would make the partitions differ on this point: the input method would always create regular or irregular typed data regardless of the content of the input.
    Angus Hollands
    @agoose77:matrix.org
    [m]
    Right, that makes sense - I was spitballing after a long day trying to move my analysis to awkward + dask. I've filed an issue now that I have worked out the cause!
    Lukas
    @lukasheinrich
    hi
    is there an easy way to rename fields of an existing array
    Angus Hollands
    @agoose77:matrix.org
    [m]
    @lukasheinrich: I think the easiest way is just to use ak.with_name and create a new RecordArray. Awkward arrays are supposed to be immutable (which they mostly are, apart from a few areas), so anything you do needs ultimately to create a new array.
    Lukas
    @lukasheinrich
    do you have an example?
    how to use with_name?
    Angus Hollands
    @agoose77:matrix.org
    [m]
    Sure
    ak.with_name just adds / replaces the __record__ (name) of the first RecordArray it encounters. It will visit the entire layout of the array, starting from the root (outermost dimension)
    with_new_name = ak.with_name(array, "MyArray")
    If your RecordArray is nested inside of some other records, then you need to build a new array:
    to_rename = array.some.nested.recordarray
    with_new_name = ak.with_name(to_rename, "MyArray")
    new_array = ak.with_field(array, with_new_name, ("some", "nested", "recordarray"))
    The latter code just pulls out the record array that you want to rename, gives it the new name, and then rebuilds the original array such that the renamed recordarray is in the right location
    Lukas
    @lukasheinrich
    hm - does this just change the name of the record array type?
    does this actually change the field names of the recordarray?
    i.e imagine I have this
    Lukas
    @lukasheinrich
    a = ak.zip({'some_clunky_name': ak.Array([[1,2,3],[],[4,5]]), 'ok_name': ak.Array([[1,2,3],[],[4,5]])})         
    <ListOffsetArray64>
        <offsets><Index64 i="[0 3 3 5]" offset="0" length="4" at="0x7fe04712a600"/></offsets>
        <content><RecordArray length="5">
            <field index="0" key="some_clunky_name">
                <NumpyArray format="l" shape="5" data="1 2 3 4 5" at="0x7fe045af1800"/>
            </field>
            <field index="1" key="ok_name">
                <NumpyArray format="l" shape="5" data="1 2 3 4 5" at="0x7fe04710fe00"/>
            </field>
        </RecordArray></content>
    </ListOffsetArray64>
    and I want to change some_clunky_name -> to something better
    (but in reality it will be deeply nested... is there an easy way to change the field?)
    Jim Pivarski
    @jpivarski
    @agoose77:matrix.org and @lukasheinrich ak.with_name changes the name of the record (e.g. rename "Electron" as "Muon"). It does not change the names of the fields.
    To change the field names, re-zipping it is probably the best option.
    ak.unzip and ak.fields are your friends, here.
    >>> a = ak.zip({'some_clunky_name': ak.Array([[1,2,3],[],[4,5]])})
    >>> ak.fields(a)
    ['some_clunky_name']
    >>> ak.unzip(a)
    (<Array [[1, 2, 3], [], [4, 5]] type='3 * var * int64'>,)
    >>> b = ak.zip(dict(zip(["better_name"], ak.unzip(a))))
    >>> b
    <Array [[{better_name: 1, ... better_name: 5}]] type='3 * var * {"better_name": ...'>
    >>> b.layout
    <ListOffsetArray64>
        <offsets><Index64 i="[0 3 3 5]" offset="0" length="4" at="0x56208dee9320"/></offsets>
        <content><RecordArray length="5">
            <field index="0" key="better_name">
                <NumpyArray format="l" shape="5" data="1 2 3 4 5" at="0x56208df4a170"/>
            </field>
        </RecordArray></content>
    </ListOffsetArray64>
    Angus Hollands
    @agoose77:matrix.org
    [m]
    @lukasheinrich @jpivarski sorry, another example of me not reading the question. Running a bit low on sleep the last week!
    Andrew Naylor
    @asnaylor

    I'm trying to run a Numba function over Awkward Arrays on Dask. The function works fine outside of Dask, but Dask complains of a TypingError when trying to run the function with dask.delayed:

    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    non-precise type pyobject
    During: typing of argument at <ipython-input-6-0f4baa26940d> (75)
    
    File "<ipython-input-6-0f4baa26940d>", line 75:
    <source missing, REPL/exec in use?>
    
    This error may have been caused by the following argument(s):
    - argument 0: cannot determine Numba type of <class 'awkward.highlevel.Array'>
    - argument 1: cannot determine Numba type of <class 'awkward.highlevel.Array'>

    I'd read online that sometimes with Numba you need to explicitly define the signature for the inputs and outputs. How do you do that for Awkward Arrays?

    Jim Pivarski
    @jpivarski
    Not in this case (nb.vectorize is the only one that sometimes needs it, depending on what you're doing). In this case, it's because the remote workers don't have the Awkward-Numba definitions. That's an installation/configuration thing, but you can try to force it with this line:
    
    
    ak.numba.register()
    If it's installed on the remote Dask workers, but the "entry point" isn't set correctly for some reason, this will fix it. If it's just not installed on the remote Dask workers, then this will give a more useful error message.
    Andrew Naylor
    @asnaylor
    Ah, thank you for such a quick response @jpivarski, that line of magic fixed it; the function works on Dask now