    Thomas Rolinger
    @thomasrolinger
    chpl version 1.22.1
    Copyright 2020 Hewlett Packard Enterprise Development LP
    Copyright 2004-2019 Cray Inc.
    (See LICENSE file for more details)
    HYDRA build details:
        Version:                                 3.2.1
        Release Date:                            General Availability Release
        CC:                              gcc   -fPIC
        CXX:                             g++
        F77:                             gfortran
        F90:                             gfortran
        Configure options:                       '--disable-option-checking' '--prefix=/cm/shared/apps/mvapich2/gcc/64/2.3' '--with-pm=mpirun:hydra' '--with-hwloc' '--enable-sharedlibs=gcc' 'CC=gcc' 'CFLAGS= -fPIC -DNDEBUG -DNVALGRIND -O2' 'CXX=g++' 'FC=gfortran' 'F77=gfortran' '--cache-file=/dev/null' '--srcdir=.' 'LDFLAGS=-L/lib -L/lib -L/lib -Wl,-rpath,/lib -L/lib -Wl,-rpath,/lib -L/lib -L/lib' 'LIBS=-libmad -lrdmacm -libumad -libverbs -ldl -lrt -lm -lpthread ' 'CPPFLAGS= -I/root/rpmbuild/BUILD/mvapich2-2.3/src/mpl/include -I/root/rpmbuild/BUILD/mvapich2-2.3/src/mpl/include -I/root/rpmbuild/BUILD/mvapich2-2.3/src/openpa/src -I/root/rpmbuild/BUILD/mvapich2-2.3/src/openpa/src -D_REENTRANT -I/root/rpmbuild/BUILD/mvapich2-2.3/src/mpi/romio/include -I/include -I/include -I/include -I/include' 'MPLLIBNAME=mpl'
        Process Manager:                         pmi
        Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
        Topology libraries available:            hwloc
        Resource management kernels available:   user slurm ll lsf sge pbs cobalt
        Checkpointing libraries available:
        Demux engines available:                 poll select
    Elliot Ronaghan
    @ronawho
    If it's an issue like chapel-lang/chapel#13082, you could see if using a different spawner helps. If mpi is available, gasnet will use mpirun by default, but you can override that by setting export GASNET_IBV_SPAWNER=ssh (that does require that you can do passwordless ssh to compute nodes; you can test that with a 1-node salloc followed by ssh $SLURM_NODELIST)
    Thomas Rolinger
    @thomasrolinger
    I can get the nodes via ssh with a password, so I'll give that a shot.
    Elliot Ronaghan
    @ronawho
    I think it'll require passwordless ssh. Should be able to enable that with cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys (and if you don't have an existing keypair, you can create with ssh-keygen -t rsa)
    Thomas Rolinger
    @thomasrolinger
    sorry, I meant "no" password!
    Elliot Ronaghan
    @ronawho
    :+1:
    Thomas Rolinger
    @thomasrolinger
    ok, switching to the ssh spawner now prints the right number of PUs
    Elliot Ronaghan
    @ronawho
    Ok, cool -- would you mind creating an issue for the mpi spawner getting the wrong number of cores?
    As far as benchmarks, here are some from that performance page starting with least to most communication:
    # No communication
    chpl test/release/examples/benchmarks/hpcc/stream.chpl --fast
    ./stream -nl 2
    
    # Stencil with minimal communication
    chpl --fast test/studies/prk/Stencil/optimized/stencil-opt.chpl -sorder="sqrt(16e9*numLocales / 8):int"
    ./stencil-opt -nl 2
    
    # Lots of fine-grained PUTs/GETs (expect this to perform poorly on InfiniBand)
    chpl --fast test/release/examples/benchmarks/hpcc/ra.chpl -suseOn=false -sN_U="2**(n-10)"
    ./ra -nl 2
    Thomas Rolinger
    @thomasrolinger
    Sure, I'll most likely create it tomorrow morning (I'm on the east coast, so it's about the end of my day).
    I'll check out those benchmarks as well and probably follow-up if my own application is to blame for the performance I am seeing. Thanks!
    Elliot Ronaghan
    @ronawho
    Sounds good, let me know what you find. Always curious about performance, and more recently we've been trying to work on infiniband performance so it's particularly relevant.
    hokiegeek2
    @hokiegeek2
    Does anyone have a code example for explicitly controlling which elements of a Chapel array are written to which locale?
    Thomas Rolinger
    @thomasrolinger

    That is something I'd also be interested in seeing. I've only figured out that you can alter the targetLocales argument for, say, a Block dist to control block-to-locale mapping. But that is fairly coarse-grained control and isn't an element-by-element solution.

    To assign the elements that would normally go to locale 0 to now be on locale 3, the code could look something like:

    use BlockDist;
    
    const Space = {0..7, 0..7};
    var tLocales = Locales;
    tLocales[0] = tLocales[3];
    var D: domain(2) dmapped Block(boundingBox=Space, targetLocales=tLocales) = Space;
    var mat: [D] int;
    
    forall elem in mat {
        elem = elem.locale.id;
    }
    
    writeln(mat);

    And the output would be:

    3 3 3 3 1 1 1 1
    3 3 3 3 1 1 1 1
    3 3 3 3 1 1 1 1
    3 3 3 3 1 1 1 1
    2 2 2 2 3 3 3 3
    2 2 2 2 3 3 3 3
    2 2 2 2 3 3 3 3
    2 2 2 2 3 3 3 3
    hokiegeek2
    @hokiegeek2
    @thomasrolinger Nice! I was gonna ask you for a code example...and there it is.
    hokiegeek2
    @hokiegeek2
    @bradcray To provide more context to my post earlier today at 0902, I am attempting to write strings to hdf5, and I'd like (and may need, depending upon the use case) the capability to manually assign uint(8) elements to locales to ensure all uint(8) elements of a word are written to the same locale. Sounds like @thomasrolinger is also interested. Is this possible in Chapel?
    Brad Chamberlain
    @bradcray
    @hokiegeek2: I don’t think our block-distributed arrays have any way of providing more user-level control over where the partitioning takes place (e.g., “make sure the blocks are a multiple of 4”) beyond the default “as evenly as possible” algorithm. We’ve talked about a “cut” distribution as a variation on block that would permit the user to specify where the hyperplanes defining the blocks should take place, and someone in the community managed to prototype this some number of years back, but sadly the code never made it back onto master.
    It would probably not be terribly difficult to clone and modify the Block distribution to introduce such constraints oneself, though it's not pretty code to wade through. With a guide (or some persuasion that we should take it on), I don't think it would be too hard. IIRC, there are a few places in the code that would need to change: the place that determines what block each locale owns, and the code that maps a given index to a given locale.
    Engin Kayraklioglu
    @e-kayrakli
    @bradcray — While reading @hokiegeek2’s question, I was thinking that what’s needed is a distributed string rather than a distributed array of strings per se. And my thinking was that it could be achieved by a wrapper around the string/bytes type, where “writing” it would mean doing on statements on locales, copying slices of the base string to that locale and writing it there. And similarly, reading would do a similar operation in reverse. However, I don’t know much about hdf5 and may be overlooking something
    Brad Chamberlain
    @bradcray
    Ah, sorry, I missed the reference to string and was thinking of “word” in terms of “fixed-size chunk of memory”, not “natural language word”, which is definitely a more complex case than what I was suggesting, since it’s more sensitive to the actual array data.
    That definitely feels like a much heavier “lift” to me, though your approach sounds intriguing (essentially just-in-time communication + buffering + local write, right?)
    hokiegeek2
    @hokiegeek2
    @e-kayrakli yup, if that's possible, a distributed string is exactly what's needed!
    @bradcray cool, excellent explanation, thanks!
    Engin Kayraklioglu
    @e-kayrakli
    @bradcray — Yeah, it is pretty much just that
    What I imagine is something like:
    record distString {
      forwarding var s: string;
      // some other fields to describe “distribution”
    
      proc writeToHDF5() {
        coforall l in Locales do on l {
           // get local slice, something like
           const myLocalSlice = s[this.localSliceDescriptorRangeWhatever];
           // now just write myLocalSlice locally somehow
        }
      }
    }
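    One hedged way to fill in the slice-descriptor placeholder above, assuming the 1-based string indexing of Chapel 1.22 (localChunk is a made-up helper for illustration, not a real API):

    // Which 1-based slice of a len-codepoint string a given locale
    // would own under a simple block partitioning.
    proc localChunk(len: int, locId: int, nLoc: int): range {
      const per = (len + nLoc - 1) / nLoc;  // ceiling division
      const lo = locId * per + 1;           // 1-based start
      const hi = min(len, lo + per - 1);
      return lo..hi;
    }
    
    // e.g. a 10-codepoint string over 4 locales:
    //   locale 0 -> 1..3, locale 1 -> 4..6, locale 2 -> 7..9, locale 3 -> 10..10

    Inside the on block, the slice could then be something like s[localChunk(s.length, here.id, numLocales)].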
    hokiegeek2
    @hokiegeek2
    @bradcray @e-kayrakli What I was thinking, if we stick with the current construct of arrays of things (where, in this case, things = strings), was to key on the delimiter (which for strings I think is always 0:uint(8)), have a max BlockDist size, and, if the next sequence of non-null characters exceeds that max, write it to the next block
    I don't think this approach is that complicated, but I am definitely a Chapel novice, so I may be way off, dunno
    Brad Chamberlain
    @bradcray
    I think “array of strings” is easy to get partitioned correctly, but will have each string in its own heap buffer (where the array elements are essentially the string meta-data pointing to the heap). Where Arkouda gets complicated is that it uses arrays of uints and then imposes a string interpretation on them, so the language knows nothing about the strings; it’s all embedded in the application-level meta-data.
    hokiegeek2
    @hokiegeek2
    @bradcray Yeah, gotcha. I think the key thing is that I need to handle the use case where a user just reads in one file
    Brad Chamberlain
    @bradcray
    Having a block distribution where the partitions are sensitive to the data in the array is complicated because it’s a chicken-and-egg problem: “Make me a 100000-element array that’s block distributed based on running this function over the 100000 elements.” If you could store all the elements in one place to reason about them, you wouldn’t need to make the array distributed to begin with. And when you can’t, it’s hard to know how to partition the array, since you’d essentially be doing it on the fly as you populated its elements. Tricky.
    What I like about Engin’s proposal is that it leaves the array as-is and then does the localization lazily / on-the-fly / when needed at the I/O boundaries.
    hokiegeek2
    @hokiegeek2
    And that's why I am asking. If that's not a valid use case, then we may be able to emulate what happens when arrays of "strings" AKA uint(8) sequences are written to openmem(), which is proven to work just fine.
    @bradcray yeah, if what @e-kayrakli is proposing is doable, that's definitely the way to go
    Engin Kayraklioglu
    @e-kayrakli
    I also want to add that I don’t know how Arkouda strings work. I just know a bit about Chapel strings :) In other words, I don’t see any major roadblock in what I am suggesting, but I don’t know how applicable it is to your use case
    hokiegeek2
    @hokiegeek2
    And @e-kayrakli's proposal is what I am doing on the fly now: grabbing the local slices and manipulating 'em. But... the key thing is that his approach should cover all of my use cases, including when the user wants to read from 1..n locales, not the entire dataset
    Engin Kayraklioglu
    @e-kayrakli
    And get a string that is only part of the data?
    hokiegeek2
    @hokiegeek2
    @e-kayrakli AFAIK, Arkouda strings = Chapel strings = uint(8) arrays?
    Engin Kayraklioglu
    @e-kayrakli
    I remember something about segmented arrays (?) at some point that made Arkouda strings = special uint(8) arrays. But I could be wrong
    hokiegeek2
    @hokiegeek2
    @e-kayrakli Yup, you're correct re: Arkouda strings = segmented arrays. I don't, however, have to get one string. This is for writing "strings" to/reading from hdf5, and I am thinking I need to make it so each string AKA array of uint(8) elements goes to the same locale if I need to enable reading strings from one locale only.
    Brad Chamberlain
    @bradcray
    @hokiegeek2: Not really, unfortunately. I mean, Chapel strings are buffers of uint(8)s at their core, but there’s metadata that describes them and implements their string-ness which isn’t present in Arkouda.
    hokiegeek2
    @hokiegeek2
    @e-kayrakli you are correct! Given your answer, I am guessing Chapel strings ain't the same
    @bradcray gotcha, okay
    Brad Chamberlain
    @bradcray
    Think of a Chapel string as a record pointing to a uint(8) buffer. An array of Chapel strings is an array of records, each of which points off to somewhere else. So the string metadata is consecutive in memory and blocked such that no string is spanning locales, but the strings themselves are all disjoint on the heap.
    Arkouda’s strings are block-distributed arrays of uint(8)s which Arkouda interprets as strings by storing segments of where each string begins. So all the string data is consecutive in memory; but a string may be split between locales.
    Each approach has tradeoffs, obviously.
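    The two layouts can be sketched side by side in a toy example (values and names here are made up for illustration):

    use BlockDist;
    
    // Layout 1: an array of Chapel strings, i.e. an array of records
    // whose data lives in per-string heap buffers; the metadata is
    // blocked so no string spans locales, but the buffers are disjoint.
    var asRecords: [0..2] string = ["hi", "there", "ok"];
    
    // Layout 2 (Arkouda-style): one flat block-distributed uint(8)
    // array holding all bytes back to back, plus a "segments" array of
    // start offsets that imposes string boundaries at the app level.
    const Space = {0..8};
    const D = Space dmapped Block(boundingBox=Space);
    var values: [D] uint(8);
    // the bytes of "hithereok"
    for (i, b) in zip(0..8, [104, 105, 116, 104, 101, 114, 101, 111, 107]) do
      values[i] = b: uint(8);
    var segments = [0, 2, 7];  // "hi" at 0, "there" at 2, "ok" at 7
    // With 2+ locales, "there" (bytes 2..6) may straddle a boundary.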
    hokiegeek2
    @hokiegeek2
    @bradcray ah, yes, excellent summary, thank you!
    hokiegeek2
    @hokiegeek2
    @bradcray @e-kayrakli thank you both for allowing me to borrow your respective brains -> excellent discussion that really helped
    So @e-kayrakli, when are you getting started, LOL
    Engin Kayraklioglu
    @e-kayrakli
    :) FWIW, looking at Brad’s description of Arkouda strings, you may need to adapt what I was suggesting to Arkouda strings. i.e. keep storing them using regular Block dist as it already is, but locally buffer and write that buffer when you need to write it. Reading may get a bit complicated though. i.e. I am not sure how to answer the question “I read these bytes here, but where do they go??”, but it sounds like you may have an answer by looking at Arkouda’s string implementation
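    A rough shape of that buffer-then-write idea (purely a sketch; the actual HDF5 call is left as a placeholder since it isn't specified here):

    // Hedged sketch: each locale copies its local block of the
    // distributed byte array into a local buffer, then writes that
    // buffer out locally. writeBufferToHDF5 is hypothetical.
    proc writeLocalBuffers(values: [] uint(8)) {
      coforall loc in Locales do on loc {
        const myInds = values.localSubdomain();
        var buf: [myInds] uint(8) = values[myInds];  // local copy
        // writeBufferToHDF5(buf);
      }
    }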
    hokiegeek2
    @hokiegeek2
    @e-kayrakli Gotcha, cool, excellent info, will ponder this a bit more. Thanks again to both you and @bradcray!