    Simon Perkins
    Is this thing on?
    Seems so!
    Simon Perkins
    OK, so to reiterate, having the same column in group_cols and taql_where causes problems
    As described in ska-sa/dask-ms#94
    As I understand it from IanHeywood/shadeMS#13, you wish to plot RE vs IM per antenna,
    and each antenna is split across ANTENNA1 and ANTENNA2 columns
    sjperkins @sjperkins cogitates

    > OK, so to reiterate, having the same column in group_cols and taql_where causes problems

    My (not necessarily correct) understanding is the opposite, i.e. TaQL does not work unless the columns involved in the query are in the group_cols list

    If I want to iterate over antennas using TaQL then I need both ANTENNA1 and ANTENNA2 to be in the group_cols list.
    If they are not in the list then TaQL returns nothing. The same applies to SCAN_NUMBER: I have to add it to the group_cols list to iterate over scans.
    It makes no difference whether they are in the list of columns to be returned.
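    Under that reading, a per-antenna query would be built like the sketch below. This is only a sketch: `antenna_taql` is a hypothetical helper, and it assumes a dask-ms version where `xds_from_ms` accepts the `taql_where` keyword alongside `group_cols`.

    ```python
    def antenna_taql(ant):
        """Build a TaQL WHERE clause selecting all baselines involving `ant`."""
        return f"ANTENNA1=={ant} || ANTENNA2=={ant}"

    # Hypothetical usage: note that ANTENNA1 and ANTENNA2 must also appear
    # in group_cols, otherwise (per the behaviour described above) the
    # query returns no rows.
    #
    # datasets = xds_from_ms("AF0236_spw01.ms",
    #                        group_cols=["FIELD_ID", "DATA_DESC_ID",
    #                                    "ANTENNA1", "ANTENNA2"],
    #                        taql_where=antenna_taql(5))
    ```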
    Here's something similar to what I was trying to do, but using pyrap. There's no need for ANTENNA1 or ANTENNA2 to get involved at all except during the query.
    In [1]: from pyrap.tables import table
    In [2]: tt = table('AF0236_spw01.ms')
    Successful readonly open of default-locked table AF0236_spw01.ms: 26 columns, 683354 rows
    In [3]: subtab = tt.query(query='ANTENNA1==5 || ANTENNA2==5', columns='DATA')
    In [4]: subtab.colnames()
    Out[4]: ['DATA']
    In [5]: subtab.getcol('DATA').shape
    Out[5]: (50496, 1, 4)
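    The row selection that TaQL query performs can be sketched with plain NumPy boolean masks. The arrays here are toy stand-ins, not the real Measurement Set columns:

    ```python
    import numpy as np

    # Toy stand-ins for the ANTENNA1/ANTENNA2 columns (hypothetical values)
    ant1 = np.array([0, 0, 1, 2, 5])
    ant2 = np.array([1, 5, 2, 3, 6])
    data = np.arange(5)

    # Rows where antenna 5 appears on either side of the baseline,
    # mirroring the TaQL query 'ANTENNA1==5 || ANTENNA2==5'
    sel = (ant1 == 5) | (ant2 == 5)
    subset = data[sel]
    ```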
    Simon Perkins
    Hey @IanHeywood. This is the general approach I've come up with thus far
    import argparse

    import dask
    import dask.array as da
    from daskms import xds_from_ms


    def create_parser():
        p = argparse.ArgumentParser()
        p.add_argument("ms")
        p.add_argument("-rc", "--row-chunks", default=1000, type=int)
        return p


    def script():
        args = create_parser().parse_args()
        # One dataset per unique (FIELD_ID, DATA_DESC_ID) combination
        datasets = xds_from_ms(args.ms, chunks={'row': args.row_chunks})
        ds_data = []

        for ds in datasets:
            # Find the unique antenna values across the
            # ANTENNA1 and ANTENNA2 columns
            uant1 = da.unique(ds.ANTENNA1.data)
            uant2 = da.unique(ds.ANTENNA2.data)
            uants = da.unique(da.concatenate([uant1, uant2]))

            # At this point, compute the unique antenna values --
            # we need concrete values in order to loop over them
            uants = dask.compute(uants)[0]

            ant_data = {}

            # Continue to construct a lazy expression for
            # each antenna's data
            for a in uants:
                # Select rows where ANTENNA1 == a or ANTENNA2 == a
                sel = da.logical_or(ds.ANTENNA1.data == a,
                                    ds.ANTENNA2.data == a)
                # Select data at the relevant rows
                data = ds.DATA.data[sel]
                # TODO
                # This only exists to prevent memory explosions by
                # reducing to a single value. Remove it in actual code
                data = da.nanmean(data)
                # Stash the lazy expression in a dictionary with the
                # antenna number as the key
                ant_data[a] = data

            # Add lazy antenna expression data for this dataset
            ds_data.append(ant_data)

        # Now actually compute all the lazy expressions
        # (dask traverses native python structures looking for
        #  dask objects)
        return dask.compute(ds_data)[0]


    if __name__ == "__main__":
        script()
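    The comment about dask traversing native Python structures can be shown in isolation. A minimal sketch, using a toy dask array rather than Measurement Set data:

    ```python
    import dask
    import dask.array as da

    x = da.ones(10, chunks=5)

    # A native Python structure holding lazy dask expressions,
    # analogous to the per-antenna dictionary in the script above
    lazy = {"sum": x.sum(), "mean": [x.mean()]}

    # dask.compute walks dicts/lists/tuples, evaluates every dask
    # object it finds, and returns the same structure with
    # concrete values substituted in
    result = dask.compute(lazy)[0]
    ```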
    I think you'd need to expand on the datashader bits
    Simon Perkins
    @IanHeywood Please let me know if you feel the example is incomplete. I feel that it's enough to demonstrate the approach but your mileage may vary