Michael Merrill
@mhmerrill
// this might turn into
proc foo(lst : borrowed GenericList) : borrowed GenericValue
{
//...
}

this currently turns into something like:

// this might turn into
proc foo(lst : borrowed shared GenericListClass(shared GenericValueClass)) : borrowed shared GenericValueClass
{
  //...
}

which is ungood ;-)

9 replies
Luca Ferranti
@lucaferranti

Good day everyone! Can someone help me understand why the following happens?

record R {
  var dom: domain(1);
  var num: real;
  var arr: [dom] real;
}

record S {
  var num : real;
  var num2 : real;
}

proc foo(a : real, b : real) {return new S(a, b);}

proc foo(val : real, grad : [?D] real) {return new R(D, val, grad);}

var a = foo(1.0, 2.0),
    b = foo(1.0, [1.0, 2.0]);

writeln("a = ", a, " with type ", a.type : string);
writeln("b = ", b, " with type ", b.type : string);

var c = foo(1, 2), // here int implicitly converted to real
    d = foo(1, [2, 3]); // something strange happens here.

writeln("c = ", c, " with type ", c.type : string);
writeln("d = ", d, " with type ", d.type : string);

produces the output

a = (num = 1.0, num2 = 2.0) with type S
b = (dom = {0..1}, num = 1.0, arr = 1.0 2.0) with type R
c = (num = 1.0, num2 = 2.0) with type S
d = (num = 1.0, num2 = 2.0) (num = 1.0, num2 = 3.0) with type [domain(1,int(64),false)] S

The first two variables, a and b, are the "trivial cases": everything goes as expected. Then things get interesting. When I create c, the int arguments are automatically converted to real and the right record is constructed. Based on this, I would have expected similar behavior for d, but instead it seems that the function call was automatically broadcast over each element of the array.

16 replies
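
A likely explanation, offered as a sketch rather than a summary of the thread: `[2, 3]` is an array of `int`, and Chapel does not implicitly convert an `int` array to a `real` array the way it converts a scalar `int` to `real`. The array overload therefore does not apply, while the scalar overload `foo(real, real)` does apply once the call is promoted over the array elements, which matches the output shown for d. Assuming that diagnosis, declaring the array with a `real` element type before the call should select the overload that returns `R`. The variable name `grad` and its bounds `0..1` are chosen here for illustration.

```chapel
record R {
  var dom: domain(1);
  var num: real;
  var arr: [dom] real;
}

record S {
  var num: real;
  var num2: real;
}

proc foo(a: real, b: real) { return new S(a, b); }
proc foo(val: real, grad: [?D] real) { return new R(D, val, grad); }

// Give the array a real element type up front, so that the array overload
// applies and the scalar overload is not promoted over the elements.
var grad: [0..1] real = [2, 3];
var d = foo(1, grad);

writeln("d = ", d, " with type ", d.type: string);
```

Writing the literal with real elements, as in the call that produced b, is the other obvious option.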
Guillaume Helbecque
@Guillaume-Helbecque

Hello everyone!
I faced a strange issue, reproduced in the following code:

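proc main(): int
{
  coforall loc in Locales do on loc {
    // Assumption: the inner loop header is missing from the post as archived;
    // the task count used here is a guess.
    coforall tid in 0..#here.maxTaskPar {
      writeln("hello from task ", tid, " of locale ", loc.id);

      while true {}
    }
  }

  return 0;
}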

I expect each thread of each locale to say hello to me before being infinitely blocked. However, only threads of locale 0 do this. As if the threads of other locales were not created. If I remove the while statement, this is ok. If I add an explicit global barrier before the while, that's also correct.
Can anyone explain to me what's happening?

9 replies
Tom Westerhout
@twesterhout:matrix.org
[m]
I'm trying to create a wrapper for an external C library (i.e. I'm not really supposed to change its code), and that library defines a type member attribute in a struct. How can I make such a struct on the Chapel side? (since type is a keyword, extern record isn't happy with it)
Tom Westerhout
@twesterhout:matrix.org
[m]
It worked :) Thanks, @lydia-duncan !
Tom Westerhout
@twesterhout:matrix.org
[m]

I'm running Chapel on a 2-socket AMD Epyc 7502. Each socket has 32 cores, i.e. 64 total, but for some reason qthreads only starts 32 shepherds and prints the following warning when I try to force 64 threads: "warning: QTHREADS: Reduced numThreadsPerLocale=64 to 32 to prevent oversubscription of the system."
lscpu shows the following:

CPU(s):              64
On-line CPU(s) list: 0-63
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        2

Do you have any suggestions for how to debug this behavior?

6 replies
David Melkumov
@Dmelkumo
Hi!
I was wondering if it's possible to dmap/distribute a domain after initializing it? I'm essentially trying to create an array and perform some operations on it while it's on one locale, then apply a block distribution for further use. I read that an array's domain can't be swapped after declaration, so I'm assuming that altering its domain is how I'd accomplish this. Here's sort of an example of what I'm trying to do in case this makes it any clearer:
var d: domain(1) = {0..19};
var arr: [d] int = 1;
// do something that would have the effect of having done 'var d: domain(1) dmapped Block(boundingBox={0..19}) = {0..19}' at the start instead to distribute array
8 replies
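
One way to get that effect, sketched under the assumption that copying the data once is acceptable: leave the original array alone, declare a second domain with the same indices and a Block distribution, and initialize a new array over it from the old one. The names `dd` and `darr` are introduced here for illustration. `Block` is the spelling used in this discussion; newer Chapel releases spell the distribution `blockDist`.

```chapel
use BlockDist;

// Start with a non-distributed domain and array.
var d: domain(1) = {0..19};
var arr: [d] int = 1;

// ... work on arr while it lives entirely on one locale ...

// Declare a Block-distributed domain with the same indices and copy into it.
const dd = d dmapped Block(boundingBox=d);
var darr: [dd] int = arr;

// Each element is now updated by the locale that owns it.
forall x in darr do
  x += here.id;

writeln(darr);
```

This does not change the distribution of `arr` itself; it only produces a distributed copy for the later phase.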
David Melkumov
@Dmelkumo
Also related to that, is there a way to distribute a 2d array in blocks where each block is 1 or more rows? I'd like to have it distributed similarly to an array of arrays where the outer array is using a block distribution.
// I want to do this but in a single 2d array rather than an array of arrays
var d: domain(1) dmapped Block(boundingBox={0..19}) = {0..19};
var arr: [d] [0..19] int;
3 replies
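
A possible approach, assuming the Block distribution's `targetLocales` argument accepts a 2D array of locales: reshape `Locales` into a `numLocales` by 1 grid, so that only the row dimension is split and every locale owns whole rows. The names `Space` and `rowGrid` are introduced here for illustration.

```chapel
use BlockDist;

const Space = {0..19, 0..19};

// Arrange the locales as a numLocales x 1 grid so only dimension 0 is split.
const rowGrid = reshape(Locales, {0..#numLocales, 0..0});

const D = Space dmapped Block(boundingBox=Space, targetLocales=rowGrid);
var arr: [D] int;

// Record which locale owns each element; whole rows should share an owner.
forall (i, j) in D do
  arr[i, j] = here.id;

writeln(arr);
```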
Tom Westerhout
@twesterhout:matrix.org
[m]

If I have a distributed array such as:

const box = {0 ..# 20};
const dom = box dmapped Block(box, Locales);
var arr : [dom] int;

what would be the simplest way to get C pointers to all subDomains? I.e. I can do something like this:

var arrPtrs : [0 ..# numLocales] c_ptr(int);
coforall loc in Locales do on loc {
  arrPtrs[loc.id] = c_ptrTo(arr[arr.localSubdomain().low]);
}

but is there a better way that avoids remote task spawns?

Tom Westerhout
@twesterhout:matrix.org
[m]

Found a way, but it's relying on some implementation details...

var arrPtrs : [0 ..# numLocales] c_ptr(arr.eltType);
for loc in Locales do {
  arrPtrs[loc.id] = arr.locArr[loc.id].myElems._value.data:c_void_ptr:c_ptr(arr.eltType);
}

This generates a few remote cache-gets and get_nbs, but no remote tasks are spawned, yay!

Tom Westerhout
@twesterhout:matrix.org
[m]

A question: what are the chances of RVO firing when returning a tuple? I.e. something like this

proc shouldRVO() {
  var A : [1 .. 10] int;
  const B = otherComputation();
  return (A, B);
}

Is there a way to force A to not be copied? I tried doing Memory.Initialization.moveToValue(A), but it gave me a segfault :/

@twesterhout:matrix.org : With respect to your latest question, since A and B are local variables, they should logically be copied out since they’ll be de-allocated at the end of the routine. The compiler then might have a chance of optimizing the pattern by “stealing” the array memory back to the callsite rather than copying and de-allocating it, but I don’t happen to know offhand how well that works in the presence of tuples today. You could probably get a sense of how well or poorly this works by tracking the memory allocations across the call to see whether an array’s worth of value was allocated/deallocated? Or @mppf may happen to know (as he did most of the work on this optimization).
4 replies
With respect to your previous question, my head went most quickly to the two techniques you ended up with. There may even be a way to do it even more cheaply. This could be reasonable to open a feature request for, to avoid having to rely on internals like this.
Tom Westerhout
@twesterhout:matrix.org
[m]
Thanks @mppf, #18077 is exactly what I stumbled upon (I was trying to return a tuple of two distributed arrays). The out intent requires me to declare the variable outside of the function, doesn't it? So I lose the type deduction :( but it might be a good workaround until #18077 is implemented.
2 replies
Zhihui Du
@zhihuidu
@bradcray, hi Brad, I have a question about the forall/coforall constructs. If I have the following code
forall (iteration 1) {
  forall (iteration 2) {
  }
}
what will Chapel do with the inner forall (iteration 2)? Obviously, the code could exploit more parallelism. I just want to know: if we have enough parallel resources, can Chapel run all of the iterations in parallel, or will Chapel execute the inner forall sequentially? Thanks!
Thomas Rolinger
@thomasrolinger
13 replies
asianintel
@asianintel:matrix.org
[m]
use Map;

record A {
  param a: int;
}

class AbstractB {
  proc getA(): A {
    halt("Virtual Method");
    return new A(1);
  }
}

class B: AbstractB {
  var class_a: A;

  override proc getA(): A {
    return this.class_a;
  }
}

var m = new map(string, shared AbstractB);
writeln(m.getValue("t1").getA());
So, in a function, I need to get an object of class B and extract the A record from it. The abstract class is needed so that the objects can be stored in a map. Unfortunately, A needs to have multiple param fields. The getA function will obviously fail at compile time with a conflicting return type error, since A is a generic type and each value of a is effectively a new type. How would I go about writing getA so that it returns the appropriate type?
3 replies
Josh Milthorpe
@milthorpe

This is not really a question or request, more of a grumble: I wanted to define a procedure over an array of tuples, where one of the tuple components is of a generic type. I believe this should be done as follows:

proc f(a: [] (?t, int)) { }

// example instantiation for tuple of (real, int)
var realArr = { (3.0, 1) };
f( realArr );

When I compile the above code with Chapel 1.27, I get an error message I can easily understand:

genericTupleArray.chpl:1: In function 'f':
genericTupleArray.chpl:1: error: Query expressions are not currently supported in this context
genericTupleArray.chpl:1: called as f(a: [domain(1,int(64),false)] (real(64),int(64)))
note: generic instantiations are underlined in the above callstack

However, if I actually try to refer to type t anywhere in the procedure -- e.g. var big: max(t); -- I get a more confusing compiler message:

genericTupleArray.chpl:1: In function 'f':
genericTupleArray.chpl:2: error: 't' used before defined
genericTupleArray.chpl:1: note: defined here

Obviously, the second compile error was the one I actually saw first, and it confused me for a long while until I deleted all uses of t from the body of the procedure.
The first compiler message seems to suggest that query expressions may eventually be supported for arrays of composite type. Is there an open GitHub issue that relates to this feature?

Hi Josh @milthorpe — That’s a really interesting behavior, and I think it’d definitely be worth filing an issue with this observation to improve the quality of errors by generating the first error message first, or instead of the second.
I think it’s correct that we’d like to support more general pattern matching like this over time than we do today, but don’t know offhand whether there’s an existing GitHub issue for it or not. I don’t think it’s a recent one, if there is one. If you can’t find one with a perfunctory search, I wouldn’t feel shy about filing a feature request for it.

I was going to suggest a workaround for this in case you hadn’t already found one, but am finding other reasons to grumble instead. Specifically, I wanted to be able to write:

proc f(a: [] ?et) where isTupleType(et) && et.size == 2 && et(1) == int {
type t = et(0);
}

but it looks as though this form of indexing into tuple types is not supported (or it’s too late for me to get the invocation right).

3 replies
Here’s what I came up with instead, and am not particularly happy with (due to the need to declare the variable dummy):
proc f(a: [] ?et) where isTupleType(et) && et.size == 2 {
  var dummy: et;
  if dummy(1).type != int then
    compilerError("the second element of the tuples must be int");
  type t = dummy(0).type;
  writeln(t:string);
}

var realArr = [ (3.0, 1), ];
f( realArr );
Note that I changed the declaration of realArr to use square brackets rather than curly brackets, as the latter would make it a domain rather than an array. I also used a trailing comma for (minor) style preference on my part, and because I’m never sure whether single-element array literals like [ (3.0, 1) ] will work. But removing it, it seems to.
Luca Ferranti
@lucaferranti
Hi there, I noticed Chapel is currently not on Exercism. Do you think it might be interesting / valuable to have a Chapel track there? It might increase the visibility of the language (yeah, might be a bit of a crazy idea, I know :) )
12 replies
Hi all - is there a way for CHPL_MODULE_PATH to search all subdirectories of a path? I currently end up putting all modules into a single directory, but was wondering if there was a better way to organize these?
@npadmana : Not at present, that I’m aware of. Though you should be able to put multiple directories manually into the path, I believe/hope?
@bradcray - yes, I can do that... just wanted to see if there was something else...
Not at present I’m afraid. It would be a reasonable feature request. I’m not aware of a precedent for it in other compilers I’m familiar with, which is why I think the current behavior is as it is.
Note that for specific patterns like “these modules should be submodules of this other module”, there is the fairly new / fairly unused include statement which permits modules in subdirectories to be brought in using a specific pattern.
I was vaguely aware of this effort -- what is the best place to read about this? And I know there was some discussion about submodules living in directories with the same name as the parent module -- did that converge on a stable version?
That's correct, and the same feature as include. I think the best reference is: https://chapel-lang.org/docs/technotes/module_include.html
Thomas Rolinger
@thomasrolinger
Given a type that is known to be an atomic (e.g., type t = atomic int), is there a way to "extract" the fact that it is based on an int? The brute-force approach would be a select statement that goes through the possible atomic types (there aren't too many, right?). This doesn't need to be super clean, as it is very behind-the-scenes code, but anything that already exists would be helpful.
2 replies
Josh Milthorpe
@milthorpe
Is there a way to get debug symbols for optimized code? It looks like --fast disables -g, as if I use both, I don't get debug symbols for the application code
3 replies
David Melkumov
@Dmelkumo

Is there a way to perform a minloc reduction but only over certain elements in an array/its domain? Or would I need to create a filtered copy of that array and then perform the reduction on it?

Also, if I were doing something like this where I had to store the filtered copy, is there a way to have the resulting array be distributed?

var d: domain(1) dmapped Block({0..19}, Locales) = {0..19};
var arr: [d] int = 0..19;
var arrFiltered = [i in d] if arr[i] % 2 == 0 then arr[i];
5 replies
LightPegasus
@LightPegasus
Hello, I am trying to use Distributed Bag, but for some reason when I want to balance the bag to be used across my locales, it doesn't actually do that. I am wondering why it was putting everything onto the last locale and how to fix it. Is this a bug?
use DistributedBag;
var resGraph: [0..5, 0..5] int;
resGraph[0,..] = [0,12,13,0,0,0];
resGraph[1,..] = [0,0,10,12,0,0];
resGraph[2,..] = [0,4,0,0,14,0];
resGraph[3,..] = [0,0,9,0,0,20];
resGraph[4,..] = [0,0,0,7,0,4];
resGraph[5,..] = [0,0,0,0,0,0];
var bag = new DistBag((int, int), Locales);

coforall loc in Locales do
  on loc {
    bag.balance();
    forall i in bag {
      writeln("Locale: ", i.locale.id, " => ", resGraph[i(1), i(0)]);
    }
  }
3 replies
David Melkumov
@Dmelkumo

Hi, I was wondering if anyone had some insight as to why this section gets slower with added locales?

findTimer.start();
var minVal = (1000000, -1);
forall i in d with (min reduce minVal) {
  if !inTree[i] && dist[i] < minVal(0) {
    minVal = (dist[i], i);
  }
}
findTimer.stop();

inTree and dist are both 1d arrays of ints using the same block distributed domain (d). I'm also using a tuple for minVal to keep track of the index of the minimum value.

16 replies
LightPegasus
@LightPegasus
I am trying to implement a parallel version of the Ford-Fulkerson algorithm (the Edmonds-Karp version). I am using forall to split the work on my array of tuples across my locales. I cannot figure out why it is creating such a large overhead (parallel: 0.09s for 10 vertices vs. serial: 0.008s for 10 vertices). Is there a better way to parallelize it? Should I use a different distribution?
/* Function that implements the Ford-Fulkerson Max Flow algorithm
 *
 * Return: the maximum flow from s to t
 * resGraph: an adjacency matrix that contains the capacities
 * s: source vertex
 * t: sink vertex
 */

proc FordFulkerson(resGraph: [], s: int, t: int)
{
  // array that stores the path by BFS
  var parent: [0..V-1] int;
  var max_flow: int = 0; // no flow initially

  while (bfs(resGraph, s, t, parent)) {
    // Find the minimum residual capacity of the edges
    var path_flow: int = max(int);
    var q = new list((int, int));

    var v: int = t;
    while (v != s) {
      q.append((v, parent[v]));
      v = parent[v];
    }

    const Space = {0..q.size-1};
    var D = Space dmapped Block(Space);
    var A: [D] (int, int) = q.toArray();

    forall i in A with (min reduce path_flow) do
      path_flow = min(path_flow, resGraph[i(1), i(0)]);

    // Update residual capacities of the edges and reverse edges along the path
    forall i in A with (ref resGraph) {
      resGraph[i(1), i(0)] -= path_flow;
      resGraph[i(0), i(1)] += path_flow;
    }

    // Add path flow to overall flow
    max_flow += path_flow;

  }
  return max_flow;
} //End of the FordFulkerson function
12 replies
Michael Merrill
@mhmerrill
what is the recommended way of breaking out of a coforall?
we ended up calling Errors.exit(0) to exit the program in this case but this feels a little icky, I guess we could throw an exception from the thread that wants to stop all the tasks...
Michael Merrill
@mhmerrill
I guess we could also share a var and poll it...
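
A minimal sketch of that last idea, assuming a cooperative stop is acceptable: share an `atomic bool` and have every task check it between units of work. The flag name `stop`, the step counter, and the rule that task 0 raises the flag after 1000 steps are all made up for illustration. The task count is kept at `here.maxTaskPar` so that the polling tasks are unlikely to starve the one that sets the flag.

```chapel
// Shared flag that any task can set to ask the others to stop.
var stop: atomic bool;

coforall tid in 0..#here.maxTaskPar {
  var steps = 0;
  while !stop.read() {
    steps += 1;                // stand-in for one unit of real work
    if tid == 0 && steps == 1000 then
      stop.write(true);        // task 0 decides that everyone should stop
  }
}

writeln("all tasks have stopped");
```

The coforall still joins all of its tasks; the flag only lets each task leave its own loop early, so tasks that block inside a long unit of work will not notice it until they return to the check.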
LightPegasus
@LightPegasus
I am working on a parallel BFS algorithm. I get the correct answer when running my program with this algorithm, but the run time is worse than when running the algorithm serially. I am wondering if anyone has any suggestions on what to work on or how to improve it. My code is on my GitHub: https://github.com/LightPegasus/Ford-Fulkerson
Thomas Rolinger
@thomasrolinger
@LightPegasus I’d suggest looking at the code in listing 5 in this paper: https://ieeexplore.ieee.org/document/9721333 it is far from the best performing code but it should give you somewhere to start from. In general, a distributed bag is not likely going to do what you want for BFS. The paper describes an approach to use aggregation to make the performance better but it is a bit out of date for how we could do it today. If you’re interested in that approach, let me know.
Thomas Rolinger
@thomasrolinger
Another issue not specific to BFS is that your graph data structure is a dense/full 2D matrix rather than a compressed/sparse matrix. So you’re spending tons of time in the forall on line 49 iterating over every vertex to find neighbors. A compressed representation only stores the non-zeros (the edges in the graph). That way you can easily access a given vertex’s neighbors. Also, the graph/matrix is not distributed, so you will have a ton of remote communication in that forall when you access the graph from any locale besides locale 0.
David Melkumov
@Dmelkumo
I already asked this in a thread, but I thought I'd ask again for visibility: how would I use a minloc reduction intent in a forall loop? Would I need to zip an array and its domain to iterate, and then have one variable for the min and one for the index?
11 replies
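
For reference, a sketch of the expression form of the reduction, which sidesteps the question of the forall intent: `minloc` reduces (value, index) pairs, so zipping the array with its domain yields both the minimum and where it occurs. The arrays `arr`, `keep`, and `masked` are made up for illustration, and mapping excluded elements to `max(int)` is one assumed way to restrict the reduction to certain elements.

```chapel
var arr = [7, 3, 9, 3, 5];

// minloc reduces (value, index) pairs to the smallest value and its index.
var (minVal, minIdx) = minloc reduce zip(arr, arr.domain);
writeln("min ", minVal, " at index ", minIdx);

// To consider only some elements, map the excluded ones to a sentinel value.
var keep = [false, true, true, false, true];
var masked = [i in arr.domain] if keep[i] then arr[i] else max(int);
var (fVal, fIdx) = minloc reduce zip(masked, arr.domain);
writeln("filtered min ", fVal, " at index ", fIdx);
```

The sentinel approach stores a full-size temporary; if no element passes the filter, the result is the sentinel itself, which the caller would need to check for.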
Tom Westerhout
@twesterhout:matrix.org
[m]

I have a weird segmentation fault:

proc getBlockPtrs(arr) {
  logDebug("getBlockPtrs(", arr, ")");
  type eltType = arr.eltType.eltType;
  var ptrs : [0 ..# arr.size] c_ptr(eltType);
  for i in arr.dim(0) {
    ref locBlock = arr[i];
    if locBlock.dom.size > 0 {
      logDebug("if");
      ref x = locBlock.data[locBlock.dom.low];
      logDebug("end if");
    }
  }
  logDebug("returning");
  return ptrs;
}

proc finalizeInitialization(...) {
  logDebug(_dataPtrs);
  logDebug("assigning...");
  _dataPtrs = getBlockPtrs(_locBlocks);
  logDebug("finalizeInitialization is done!");
}

This code prints everything except for "finalizeInitialization is done!" and fails with a segmentation fault. It seems like the error happens during array assignment. Are there techniques to debug that?

EDIT: Oh yeah, forgot to mention that the type of _dataPtrs is [0 ..# 1] c_ptr(real(64)).

Tom Westerhout
@twesterhout:matrix.org
[m]
Interestingly, when I change the getBlockPtrs function to receive ptrs by reference rather than return it, the error disappears...
Lydia Duncan
@lydia-duncan
Hmm. My guess is that the pointers being stored in ptrs are more local than you’d want them to be. They could potentially be referring to a copied version of what is sent into arr that is local to getBlockPtrs. Or maybe the ptrs array is getting cleaned up in such a way that impacts what’s being returned as well
Tom Westerhout
@twesterhout:matrix.org
[m]
I'd think so as well, but the failure happens even before my code got a chance to use ptrs. In other words, the segfault happens when I try to assign to _dataPtrs rather than when trying to dereference one of the pointers.