    David Melkumov
    @Dmelkumo
    Hi!
    I was wondering if it's possible to dmap/distribute a domain after initializing it? I'm essentially trying to create an array and perform some operations on it while it's on one locale, then apply a block distribution for further use. I read that an array's domain can't be swapped after declaration, so I'm assuming that altering its domain is how I'd accomplish this. Here's sort of an example of what I'm trying to do in case this makes it any clearer:
    var d: domain(1) = {0..19};
    var arr: [d] int = 1;
    // do something that would have the effect of having done 'var d: domain(1) dmapped Block(boundingBox={0..19}) = {0..19}' at the start instead to distribute array
    8 replies
    David Melkumov
    @Dmelkumo
    Also related to that, is there a way to distribute a 2d array in blocks where each block is 1 or more rows? I'd like to have it distributed similarly to an array of arrays where the outer array is using a block distribution.
    // I want to do this but in a single 2d array rather than an array of arrays
    var d: domain(1) dmapped Block(boundingBox={0..19}) = {0..19};
    var arr: [d] [0..19] int;
    3 replies
    Tom Westerhout
    @twesterhout:matrix.org

    If I have a distributed array such as:

      const box = {0 ..# 20};
      const dom = box dmapped Block(box, Locales);
      var arr : [dom] int;

    what would be the simplest way to get C pointers to all subDomains? I.e. I can do something like this:

    var arrPtrs : [0 ..# numLocales] c_ptr(int);
    coforall loc in Locales do on loc {
      arrPtrs[loc.id] = c_ptrTo(arr[arr.localSubdomain().low]);
    }

    but is there a better way that avoids remote task spawns?

    Tom Westerhout
    @twesterhout:matrix.org

    Found a way, but it's relying on some implementation details...

      var arrPtrs : [0 ..# numLocales] c_ptr(arr.eltType);
      for loc in Locales do {
        arrPtrs[loc.id] = arr.locArr[loc.id].myElems._value.data:c_void_ptr:c_ptr(arr.eltType);
      }

    This generates a few remote cache-gets and get_nbs, but no remote tasks are spawned, yay!

    Tom Westerhout
    @twesterhout:matrix.org

    A question: what are the chances of RVO firing when returning a tuple? I.e. something like this

    proc shouldRVO() {
      var A : [1 .. 10] int;
      const B = otherComputation();
      return (A, B);
    }

    Is there a way to force A to not be copied? I tried doing Memory.Initialization.moveToValue(A), but it gave me a segfault :/

    Brad Chamberlain
    @bradcray
    @twesterhout:matrix.org : With respect to your latest question, since A and B are local variables, they should logically be copied out since they’ll be de-allocated at the end of the routine. The compiler then might have a chance of optimizing the pattern by “stealing” the array memory back to the callsite rather than copying and de-allocating it, but I don’t happen to know offhand how well that works in the presence of tuples today. You could probably get a sense of how well or poorly this works by tracking the memory allocations across the call to see whether an array’s worth of value was allocated/deallocated? Or @mppf may happen to know (as he did most of the work on this optimization).
    4 replies
    With respect to your previous question, my head went most quickly to the two techniques you ended up with. There may be a way to do it even more cheaply. This could be reasonable to open a feature request for, to avoid having to rely on internals like this.
    Tom Westerhout
    @twesterhout:matrix.org
    Thanks @mppf, #18077 is exactly what I stumbled upon (I was trying to return a tuple of two distributed arrays). The out intent requires me to declare the variable outside of the function, doesn't it? So I lose the type deduction :( but it might be a good workaround until #18077 is implemented.
    2 replies
    Zhihui Du
    @zhihuidu
    @bradcray, hi Brad, I have a question about the forall/coforall constructs. If I have the following code:
    forall (iteration 1) {
      forall (iteration 2) {
      }
    }
    What will Chapel do for the inner forall (iteration 2)? Obviously, the code can exploit more parallelism. I just want to know: if we have enough parallel resources, can Chapel run all of the iterations in parallel, or will Chapel execute the inner forall sequentially? Thanks!
    Thomas Rolinger
    @thomasrolinger
    @zhihuidu this link has some info that may help answer, though Brad or someone else can provide updates: https://stackoverflow.com/questions/51350695/are-there-any-benefits-or-drawbacks-to-using-nested-forall-loops
    13 replies
    asianintel
    @asianintel:matrix.org
    use Map;
    
    record A {
        param a: int;
    }
    
    class AbstractB {
        proc getA(): A {
            halt("Virtual Method");
            return new A(1);
        }
    }
    
    class B: AbstractB {
        var class_a: A;
    
        override proc getA(): A {
            return this.class_a;
        }
    }
    
    var m = new map(string, shared AbstractB);
    m.add("t1", new shared B(new A(3)));
    writeln(m.getValue("t1").getA());
    So, in a function, I need to get an object of class B and extract class A from it. The abstract class is needed to be able to store it into a map. class A unfortunately needs to have multiple param fields in it. The getA function will obviously error at compile with a conflicting return type error since A is a generic type and different values of a will sort of be a new type in itself. How would I go about writing getA so it returns appropriately?
    3 replies
    Josh Milthorpe
    @milthorpe

    This is not really a question or request, more of a grumble: I wanted to define a procedure over an array of tuples, where one of the tuple components is of a generic type. I believe this should be done as follows:

    proc f(a: [] (?t, int)) { }
    
    // example instantiation for tuple of (real, int)
    var realArr = { (3.0, 1) };
    f( realArr );

    When I compile the above code with Chapel 1.27, I get an error message I can easily understand:

    genericTupleArray.chpl:1: In function 'f':
    genericTupleArray.chpl:1: error: Query expressions are not currently supported in this context
      genericTupleArray.chpl:1: called as f(a: [domain(1,int(64),false)] (real(64),int(64)))
    note: generic instantiations are underlined in the above callstack

    However, if I actually try to refer to type t anywhere in the procedure -- e.g. var big: max(t); -- I get a more confusing compiler message:

    genericTupleArray.chpl:1: In function 'f':
    genericTupleArray.chpl:2: error: 't' used before defined
    genericTupleArray.chpl:1: note: defined here

    Obviously, the second compile error was the one I actually saw first, and it confused me for a long while until I deleted all uses of t from the body of the procedure.
    The first compiler message seems to suggest that query expressions may eventually be supported for arrays of composite type. Is there an open GitHub issue that relates to this feature?

    Brad Chamberlain
    @bradcray
    Hi Josh @milthorpe — That’s a really interesting behavior, and I think it’d definitely be worth filing an issue with this observation to improve the quality of errors by generating the first error message first, or instead of the second.
    I think it’s correct that we’d like to support more general pattern matching like this over time than we do today, but don’t know offhand whether there’s an existing GitHub issue for it or not. I don’t think it’s a recent one, if there is one. If you can’t find one with a perfunctory search, I wouldn’t feel shy about filing a feature request for it.
    Brad Chamberlain
    @bradcray

    I was going to suggest a workaround for this in case you hadn’t already found one, but am finding other reasons to grumble instead. Specifically, I wanted to be able to write:

    proc f(a: [] ?et) where isTupleType(et) && et.size == 2 && et(1) == int {
      type t = et(0);
    }

    but it looks as though this form of indexing into tuple types is not supported (or it’s too late for me to get the invocation right).

    3 replies
    Here’s what I came up with instead, and am not particularly happy with (due to the need to declare the variable dummy):
    proc f(a: [] ?et) where isTupleType(et) && et.size == 2 {
      var dummy: et;
      if dummy(1).type != int then
        compilerError("the second element of the tuples must be int");
      type t = dummy(0).type;
      writeln(t:string);
    }
    
    var realArr = [ (3.0, 1), ];
    f( realArr );
    1 reply
    Note that I changed the declaration of realArr to use square brackets rather than curly brackets, as the latter would make it a domain rather than an array. I also used a trailing comma for (minor) style preference on my part, and because I’m never sure whether single-element array literals like [ (3.0, 1) ] will work. But removing it, it seems to.
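The curly-versus-square-bracket distinction mentioned above can be seen in a small sketch. It uses plain ints rather than the tuples from the conversation, and the variable names are illustrative only. It assumes the `isDomainType` and `isArrayType` type queries are available, as they are in recent Chapel releases.

```chapel
// Curly braces build a domain (an index set); square brackets build an array.
var asAssocDom = {1, 2, 3};   // associative domain holding the indices 1, 2, 3
var asRectDom  = {1..3};      // rectangular domain over the range 1..3
var asArray    = [1, 2, 3];   // array of three ints

writeln(isDomainType(asAssocDom.type));  // true
writeln(isDomainType(asRectDom.type));   // true
writeln(isArrayType(asArray.type));      // true
writeln(isArrayType(asAssocDom.type));   // false
```

So a call such as f(realArr) only matches a formal declared as an array if realArr was written with square brackets.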
    Luca Ferranti
    @lucaferranti
    Hi there, I noticed chapel is currently not on exercism. Do you think it might be interesting / valuable to have a chapel track there? Might increase visibility of the language (yeah might be a bit of a crazy idea, I know :) )
    8 replies
    npadmana
    @npadmana
    Hi all - is there a way for CHPL_MODULE_PATH to search all subdirectories of a path? I currently end up putting all modules into a single directory, but was wondering if there was a better way to organize these?
    Brad Chamberlain
    @bradcray
    @npadmana : Not at present, that I’m aware of. Though you should be able to put multiple directories manually into the path, I believe/hope?
    npadmana
    @npadmana
    @bradcray - yes, I can do that... just wanted to see if there was something else...
    Brad Chamberlain
    @bradcray
    Not at present I’m afraid. It would be a reasonable feature request. I’m not aware of a precedent for it in other compilers I’m familiar with, which is why I think the current behavior is as it is.
    Note that for specific patterns like “these modules should be submodules of this other module”, there is the fairly new / fairly unused include statement which permits modules in subdirectories to be brought in using a specific pattern.
    npadmana
    @npadmana
    I was vaguely aware of this effort -- what is the best place to read about this? And I know there was some discussion about submodules living in directories with the same name as the parent module -- did that converge on a stable version?
    Brad Chamberlain
    @bradcray
    That's correct, and the same feature as include. I think the best reference is: https://chapel-lang.org/docs/technotes/module_include.html
    1 reply
    Thomas Rolinger
    @thomasrolinger
    Given a type that is known to be an atomic (e.g., type t = atomic int), is there a way to "extract" the fact that it is based on an int? Brute force approach would be to have a select statement that goes through the possible atomic types (there aren't too many, right?). This doesn't need to be super clean, as it is very behind-the-scenes code, but anything that already exists would be helpful.
    2 replies
    Josh Milthorpe
    @milthorpe
    Is there a way to get debug symbols for optimized code? It looks like --fast disables -g, as if I use both, I don't get debug symbols for the application code
    3 replies
    David Melkumov
    @Dmelkumo

    Is there a way to perform a minloc reduction but only over certain elements in an array/its domain? Or would I need to create a filtered copy of that array and then perform the reduction on it?

    Also, if I were doing something like this where I had to store the filtered copy, is there a way to have the resulting array be distributed?

    var d: domain(1) dmapped Block({0..19}, Locales) = {0..19};
    var arr: [d] int = 0..19;
    var arrFiltered = [i in d] if arr[i] % 2 == 0 then arr[i];
    5 replies
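One possible way to restrict a minloc reduction to certain elements, without changing the index set, is to replace the excluded elements by a sentinel that can never win. The following is a minimal sketch on a small local array, not necessarily what the thread replies recommended. The names `masked`, `oddVal` and `oddIdx` are made up for illustration, and the sketch assumes at least one element passes the filter (otherwise the result is max(int) at an arbitrary index).

```chapel
var arr = [7, 4, 9, 2, 5, 8];   // array literals are indexed from 0

// Plain minloc: reduce over (value, index) pairs.
var (minVal, minIdx) = minloc reduce zip(arr, arr.domain);
writeln(minVal, " at ", minIdx);   // 2 at 3

// minloc over the odd elements only: every excluded element is replaced
// by max(int), so it cannot be selected as the minimum.
var masked = [i in arr.domain] if arr[i] % 2 == 1 then arr[i] else max(int);
var (oddVal, oddIdx) = minloc reduce zip(masked, arr.domain);
writeln(oddVal, " at ", oddIdx);   // 5 at 4
```

Because the masked array keeps the original indices, the index returned by the reduction refers directly to a position in arr.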
    LightPegasus
    @LightPegasus
    Hello, I am trying to use Distributed Bag, but for some reason when I want to balance the bag to be used across my locales, it doesn't actually do that. I am wondering why it was putting everything onto the last locale and how to fix it. Is this a bug?
    use DistributedBag;
     var resGraph: [0..5, 0..5] int;
     resGraph[0,..] = [0,12,13,0,0,0];
     resGraph[1,..] = [0,0,10,12,0,0];
     resGraph[2,..] = [0,4,0,0,14,0];
     resGraph[3,..] = [0,0,9,0,0,20];
     resGraph[4,..] = [0,0,0,7,0,4];
     resGraph[5,..] = [0,0,0,0,0,0];
     var bag = new DistBag((int, int), Locales);
    
     bag.add((1,0));
     bag.add((3,1));
     bag.add((5,3));
    
     coforall loc in Locales do
       on loc {
         bag.balance();
         forall i in bag {
             writeln("Locale: ", i.locale.id, " => ", resGraph[i(1), i(0)]);    
         }   
      }
    3 replies
    David Melkumov
    @Dmelkumo

    Hi, I was wondering if anyone had some insight as to why this section gets slower with added locales?

            findTimer.start();
            var minVal = (1000000, -1);
            forall i in d with (min reduce minVal) {
                if !inTree[i] && dist[i] < minVal(0) {
                    minVal = (dist[i], i);
                }
            }
            findTimer.stop();

    inTree and dist are both 1d arrays of ints using the same block distributed domain (d). I'm also using a tuple for minVal to keep track of the index of the minimum value.

    16 replies
    LightPegasus
    @LightPegasus
    I am trying to implement a parallel version of the Ford-Fulkerson algorithm (Edmonds-Karp Algorithm version). I am using forall to split the work on my array of tuples across my locales. I cannot figure out why it is creating such a large overhead i.e. (parallel: 0.09s for 10 vertices vs. serial: 0.008s for 10 vertices). Is there a better way to parallelize it? Should I use a different distribution?
    /* Function that implements the Ford-Fulkerson Max Flow algorithm
     * 
     * Return: the maximum flow from s to t
     * resGraph: an adjacency matrix that contains the capacities
     * s: source vertex
     * t: sink vertex
     */
    
    proc FordFulkerson(resGraph: [], s: int, t: int) 
    { 
      // array that stores the path by BFS
      var parent: [0..V-1] int;
      var max_flow: int = 0; // no flow initially
    
      while (bfs(resGraph, s, t, parent)) {   
        // Find the minimum residual capacity of the edges
        var path_flow: int = max(int);
        var q = new list((int, int));  
    
        var v: int = t;
        while (v != s) {
          q.append((v, parent[v]));
          v = parent[v];
        }   
    
        const Space = {0..q.size-1}; 
        var D = Space dmapped Block(Space);
        var A: [D] (int, int) = q.toArray();
    
        forall i in A with (min reduce path_flow) do
          path_flow = min(path_flow, resGraph[i(1), i(0)]);
    
        // Update residual capacities of the edges and reverse edges along the path
        forall i in A with (ref resGraph) {
          resGraph[i(1), i(0)] -= path_flow;
          resGraph[i(0), i(1)] += path_flow;
        }
    
        // Add path flow to overall flow
        max_flow += path_flow;
    
      }
      return max_flow;
    } //End of the FordFulkerson function
    12 replies
    Michael Merrill
    @mhmerrill
    what is the recommended way of breaking out of a coforall?
    and stopping all the tasks
    we ended up calling Errors.exit(0) to exit the program in this case but this feels a little icky, I guess we could throw an exception from the thread that wants to stop all the tasks...
    1 reply
    Michael Merrill
    @mhmerrill
    I guess we could also share a var and poll it...
    1 reply
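The "share a var and poll it" idea could look roughly like the sketch below, which uses an atomic bool as the shared flag. The task count, the chunking and the `target` value are invented for illustration. Note that break only leaves the inner for loop; the coforall itself still waits for every task to finish, which each task does promptly once it sees the flag.

```chapel
config const numTasks = 4;
config const target = 250;   // hypothetical value that one task is looking for

var stop: atomic bool;       // default-initialized to false
var foundBy: atomic int;
foundBy.write(-1);

coforall tid in 0..#numTasks {
  // Each task scans its own chunk of 100 consecutive values.
  const lo = tid * 100;
  for i in lo..#100 {
    if stop.read() then break;   // another task has asked everyone to stop
    if i == target {
      foundBy.write(tid);
      stop.write(true);          // tell the other tasks to stop
      break;
    }
  }
}

writeln("found by task ", foundBy.read());   // found by task 2
```

Atomic variables are passed to tasks by reference by default, so no explicit task intent is needed for stop or foundBy here.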
    LightPegasus
    @LightPegasus
    I am working on a parallel BFS algorithm. I get the correct answer when running my program that is using said algorithm, but the time it takes to run is worse than running the algorithm serially. I am wondering if anyone has any suggestions on what to work on or how to improve upon it. My code is on my GitHub: https://github.com/LightPegasus/Ford-Fulkerson
    Thomas Rolinger
    @thomasrolinger
    @LightPegasus I’d suggest looking at the code in listing 5 in this paper: https://ieeexplore.ieee.org/document/9721333 it is far from the best performing code but it should give you somewhere to start from. In general, a distributed bag is not likely going to do what you want for BFS. The paper describes an approach to use aggregation to make the performance better but it is a bit out of date for how we could do it today. If you’re interested in that approach, let me know.
    Thomas Rolinger
    @thomasrolinger
    Another issue not specific to BFS is that your graph data structure is a dense/full 2D matrix rather than a compressed/sparse matrix. So you’re spending tons of time in the forall on line 49 iterating over every vertex to find neighbors. A compressed representation only stores the non-zeros (the edges in the graph). That way you can easily access a given vertex’s neighbors. Also, the graph/matrix is not distributed, so you will have a ton of remote communication in that forall when you access the graph from any locale besides locale 0.
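To make the dense-versus-compressed point concrete, here is a minimal, serial sketch that converts the 6x6 adjacency matrix posted earlier in this conversation into a hand-rolled compressed sparse row (CSR) layout. The names `rowStart`, `colIdx`, `cap` and `next` are illustrative; this is not the code from the linked paper and it does not use Chapel's built-in sparse domains.

```chapel
const n = 6;
var dense: [0..#n, 0..#n] int;
dense[0, ..] = [0, 12, 13, 0, 0, 0];
dense[1, ..] = [0, 0, 10, 12, 0, 0];
dense[2, ..] = [0, 4, 0, 0, 14, 0];
dense[3, ..] = [0, 0, 9, 0, 0, 20];
dense[4, ..] = [0, 0, 0, 7, 0, 4];
dense[5, ..] = [0, 0, 0, 0, 0, 0];

// Slots rowStart[u] .. rowStart[u+1]-1 hold the outgoing edges of vertex u.
var rowStart: [0..n] int;
for (u, v) in dense.domain do
  if dense[u, v] != 0 then rowStart[u+1] += 1;   // count edges per row
for u in 1..n do
  rowStart[u] += rowStart[u-1];                  // prefix sum of the counts

const nnz = rowStart[n];       // number of stored edges
var colIdx: [0..#nnz] int;     // destination vertex of each edge
var cap: [0..#nnz] int;        // capacity of each edge

// Second pass: fill the slots row by row.
var next: [0..#n] int = rowStart[0..#n];
for (u, v) in dense.domain {
  if dense[u, v] != 0 {
    colIdx[next[u]] = v;
    cap[next[u]] = dense[u, v];
    next[u] += 1;
  }
}

// Visiting the neighbors of vertex 2 now touches only its 2 edges
// instead of scanning a full row of n entries.
for k in rowStart[2]..rowStart[3]-1 do
  writeln("2 -> ", colIdx[k], " (capacity ", cap[k], ")");
```

For this graph the layout stores 10 edges instead of 36 matrix entries; the saving grows with the number of vertices when the graph is sparse.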
    David Melkumov
    @Dmelkumo
    I already asked this in a thread, but I thought I'd ask again for visibility: how would I use a minloc reduction intent in a forall loop? Would I need to zip an array and its domain to iterate, and then have one variable for the min and one for the index?
    11 replies
    Tom Westerhout
    @twesterhout:matrix.org

    I have a weird segmentation fault:

    proc getBlockPtrs(arr) {
      logDebug("getBlockPtrs(", arr, ")");
      type eltType = arr.eltType.eltType;
      var ptrs : [0 ..# arr.size] c_ptr(eltType);
      for i in arr.dim(0) {
        ref locBlock = arr[i];
        if locBlock.dom.size > 0 {
          logDebug("if");
          ref x = locBlock.data[locBlock.dom.low];
          ptrs[i] = __primitive("_wide_get_addr", x):c_ptr(eltType);
          logDebug("end if");
        }
      }
      logDebug("returning");
      return ptrs;
    }
    
    proc finalizeInitialization(...) {
        logDebug(_dataPtrs);
        logDebug("assigning...");
        _dataPtrs = getBlockPtrs(_locBlocks);
        logDebug("finalizeInitialization is done!");
    }

    This code prints everything except for "finalizeInitialization is done!" and fails with a segmentation fault. It seems like the error happens during array assignment. Are there techniques to debug that?

    EDIT: Oh yeah, forgot to mention that the type of _dataPtrs is [0 ..# 1] c_ptr(real(64)).

    Tom Westerhout
    @twesterhout:matrix.org
    Interestingly, when I change the getBlockPtrs function to receive ptrs by reference rather than return it, the error disappears...
    Lydia Duncan
    @lydia-duncan
    Hmm. My guess is that the pointers being stored in ptrs are more local than you’d want them to be. They could potentially be referring to a copied version of what is sent into arr that is local to getBlockPtrs. Or maybe the ptrs array is getting cleaned up in such a way that impacts what’s being returned as well
    Tom Westerhout
    @twesterhout:matrix.org
    I'd think so as well, but the failure happens even before my code got a chance to use ptrs. In other words, the segfault happens when I try to assign to _dataPtrs rather than when trying to dereference one of the pointers.
    The code seems to consistently fail to copy arrays of c_ptr(uint(64)) of length 1. Could there be some magic conversion for size-one arrays that's taking place?
    Currently, I have to trace all the uses of the array and do the following:
      // This crashes:
      var basisStatesPtrs : [0 ..# numLocales] c_ptr(uint(64)) = basisStates._dataPtrs;
      // This works:
      var basisStatesPtrs : [0 ..# numLocales] c_ptr(uint(64)) = noinit;
      c_memcpy(c_ptrTo(basisStatesPtrs), c_const_ptrTo(basisStates._dataPtrs),
               numLocales:c_size_t * c_sizeof(c_ptr(uint(64))));
    And I'm compiling with CHPL_COMM=none so numLocales==1
    2 replies
    Tom Westerhout
    @twesterhout:matrix.org
    @benharsh: actually, if you have a minute, could you try reproducing the segfault on your side? Since you already have access to my code it should be relatively painless, I think. Could you try doing git pull and then compiling the second example: make bin/Example02. There is now a flag --enableSegFault which will cause the program to crash (twesterhout/distributed-matvec@5efe6ff). The Chapel environment that I'm using is this one: https://github.com/twesterhout/distributed-matvec/blob/master/env/setup-env-comm-none.sh
    6 replies
    Tom Westerhout
    @twesterhout:matrix.org
    That's great! Should I open an issue about it or have you done so already?
    1 reply
    Oliver Alvarado Rodriguez
    @alvaradoo
    Hi everyone! I am currently trying to find a way to create an array A of different size on all my locales. For example, say I have 2 locales, I want the domain of A to be 0..3 on locale0 and the domain of A to be 2..10 on locale1. Is there any straightforward way of doing this? I’ve been trying some stuff out but have not been getting it to work as expected.
    13 replies
    Tom Westerhout
    @twesterhout:matrix.org
    A random question: is anybody using Chapel with Nix? (I've checked, and there appears to be no Chapel package in nixpkgs) Has this ever been considered? and if not, would there be interest in it?
    5 replies
    npadmana
    @npadmana
    Hi all -- what is the best way to read data from a pipe (i.e. start a process and pipe its output into Chapel)?