LightPegasus
@LightPegasus
I am trying to implement a parallel version of the Ford-Fulkerson algorithm (the Edmonds-Karp variant). I am using forall to split the work on my array of tuples across my locales. I cannot figure out why it is creating such a large overhead (parallel: 0.09s vs. serial: 0.008s for 10 vertices). Is there a better way to parallelize it? Should I use a different distribution?
/* Function that implements the Ford-Fulkerson Max Flow algorithm
 *
 * Return: the maximum flow from s to t
 * resGraph: an adjacency matrix that contains the capacities
 * s: source vertex
 * t: sink vertex
 */

use List, BlockDist; // modules providing `list` and `Block`

proc FordFulkerson(resGraph: [], s: int, t: int) {
  // array that stores the path found by BFS
  var parent: [0..V-1] int;
  var max_flow: int = 0; // no flow initially

  while bfs(resGraph, s, t, parent) {
    // Find the minimum residual capacity of the edges along the path
    var path_flow: int = max(int);
    var q = new list((int, int));

    var v: int = t;
    while v != s {
      q.append((v, parent[v]));
      v = parent[v];
    }

    const Space = {0..q.size-1};
    var D = Space dmapped Block(Space);
    var A: [D] (int, int) = q.toArray();

    forall i in A with (min reduce path_flow) do
      path_flow = min(path_flow, resGraph[i(1), i(0)]);

    // Update residual capacities of the edges and reverse edges along the path
    forall i in A with (ref resGraph) {
      resGraph[i(1), i(0)] -= path_flow;
      resGraph[i(0), i(1)] += path_flow;
    }

    // Add path flow to overall flow
    max_flow += path_flow;
  }
  return max_flow;
} // End of the FordFulkerson function
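One likely source of the overhead in the code above is that a new Block-distributed domain and array are built for every augmenting path, which is at most V-1 edges long; the distribution setup and cross-locale communication dwarf that little bit of work. A sketch of the same step kept local (same variable names as the code above, assuming the path fits comfortably on one locale):

```chapel
// For short augmenting paths, a plain local array avoids the cost of
// creating a distributed domain/array on every BFS iteration.
var A: [0..#q.size] (int, int) = q.toArray();

var path_flow = max(int);
forall i in A with (min reduce path_flow) do
  path_flow = min(path_flow, resGraph[i(1), i(0)]);
```

Whether this wins depends on path length; for graphs this small, the serial loop may still be fastest.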
Michael Merrill
@mhmerrill
what is the recommended way of breaking out of a coforall?
we ended up calling Errors.exit(0) to exit the program in this case but this feels a little icky, I guess we could throw an exception from the thread that wants to stop all the tasks...
Michael Merrill
@mhmerrill
I guess we could also share a var and poll it...
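A minimal sketch of that shared-flag approach: each task polls an atomic bool and winds down once any task sets it (the `foundIt` predicate is hypothetical, purely for illustration):

```chapel
var done: atomic bool;

coforall tid in 0..#here.maxTaskPar {
  while !done.read() {
    // ... do one chunk of this task's work ...
    if foundIt(tid) then   // hypothetical "we can stop now" check
      done.write(true);    // every other task sees this on its next poll
  }
}
```

This is cooperative cancellation, so responsiveness depends on how often each task reaches the poll.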
LightPegasus
@LightPegasus
I am working on a parallel BFS algorithm. I get the correct answer when running my program that uses said algorithm, but it runs slower than the serial version. I am wondering if anyone has any suggestions on what to work on or how to improve it. My code is on my GitHub: https://github.com/LightPegasus/Ford-Fulkerson
Thomas Rolinger
@thomasrolinger
@LightPegasus I’d suggest looking at the code in listing 5 in this paper: https://ieeexplore.ieee.org/document/9721333 it is far from the best performing code but it should give you somewhere to start from. In general, a distributed bag is not likely going to do what you want for BFS. The paper describes an approach to use aggregation to make the performance better but it is a bit out of date for how we could do it today. If you’re interested in that approach, let me know.
Thomas Rolinger
@thomasrolinger
Another issue not specific to BFS is that your graph data structure is a dense/full 2D matrix rather than a compressed/sparse matrix. So you’re spending tons of time in the forall on line 49 iterating over every vertex to find neighbors. A compressed representation only stores the non-zeros (the edges in the graph). That way you can easily access a given vertex’s neighbors. Also, the graph/matrix is not distributed, so you will have a ton of remote communication in that forall when you access the graph from any locale besides locale 0.
David Melkumov
@Dmelkumo
I already asked this in a thread, but I thought I'd ask again for visibility: how would I use a minloc reduction intent in a forall loop? Would I need to zip an array and its domain to iterate, and then have one variable for the min and one for the index?
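For the reduce-expression form (as opposed to the intent form), a sketch of the documented idiom is to apply `minloc` to a zip of the array and its domain, which yields a (value, index) tuple:

```chapel
var A: [0..9] int;
for i in 0..9 do A[i] = abs(i - 6) + 2;  // sample data; minimum of 2 at i = 6

// minloc over zip(values, indices) reduces to a (minValue, minIndex) tuple
var (minVal, minIdx) = minloc reduce zip(A, A.domain);

writeln(minVal, " at ", minIdx);  // 2 at 6
```

For the forall-intent version, the same pairing idea applies: iterate over the zip and reduce the (value, index) tuple, rather than tracking two separate variables.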
Tom Westerhout
@twesterhout:matrix.org

I have a weird segmentation fault:

proc getBlockPtrs(arr) {
  logDebug("getBlockPtrs(", arr, ")");
  type eltType = arr.eltType.eltType;
  var ptrs : [0 ..# arr.size] c_ptr(eltType);
  for i in arr.dim(0) {
    ref locBlock = arr[i];
    if locBlock.dom.size > 0 {
      logDebug("if");
      ref x = locBlock.data[locBlock.dom.low];
      logDebug("end if");
    }
  }
  logDebug("returning");
  return ptrs;
}

proc finalizeInitialization(...) {
  logDebug(_dataPtrs);
  logDebug("assigning...");
  _dataPtrs = getBlockPtrs(_locBlocks);
  logDebug("finalizeInitialization is done!");
}

This code prints everything except for "finalizeInitialization is done!" and fails with a segmentation fault. It seems like the error happens during array assignment. Are there techniques to debug that?

EDIT: Oh yeah, forgot to mention that the type of _dataPtrs is [0 ..# 1] c_ptr(real(64)).

Tom Westerhout
@twesterhout:matrix.org
Interestingly, when I change the getBlockPtrs function to receive ptrs by reference rather than return it, the error disappears...
Lydia Duncan
@lydia-duncan
Hmm. My guess is that the pointers being stored in ptrs are more local than you’d want them to be. They could potentially be referring to a copied version of what is sent into arr that is local to getBlockPtrs. Or maybe the ptrs array is getting cleaned up in such a way that impacts what’s being returned as well
Tom Westerhout
@twesterhout:matrix.org
I'd think so as well, but the failure happens even before my code got a chance to use ptrs. In other words, the segfault happens when I try to assign to _dataPtrs rather than when trying to dereference one of the pointers.
The code seems to consistently fail to copy arrays of c_ptr(uint(64)) of length 1. Could there be some magic conversion for size-one arrays that's taking place?
Currently, I have to trace all the uses of the array and do the following:
// This crashes:
var basisStatesPtrs : [0 ..# numLocales] c_ptr(uint(64)) = basisStates._dataPtrs;

// This works:
var basisStatesPtrs : [0 ..# numLocales] c_ptr(uint(64)) = noinit;
c_memcpy(c_ptrTo(basisStatesPtrs), c_const_ptrTo(basisStates._dataPtrs),
         numLocales:c_size_t * c_sizeof(c_ptr(uint(64))));
And I'm compiling with CHPL_COMM=none so numLocales==1
Tom Westerhout
@twesterhout:matrix.org
@benharsh: actually, if you have a minute, could you try reproducing the segfault on your side? Since you already have access to my code it should be relatively painless, I think. Could you try doing git pull and then compiling the second example: make bin/Example02. There is now a flag --enableSegFault which will cause the program to crash (twesterhout/distributed-matvec@5efe6ff). The Chapel environment that I'm using is this one: https://github.com/twesterhout/distributed-matvec/blob/master/env/setup-env-comm-none.sh
Tom Westerhout
@twesterhout:matrix.org
That's great! Should I open an issue about it or have you done so already?
Hi everyone! I am currently trying to find a way to create an array A with a different size on each of my locales. For example, say I have 2 locales: I want the domain of A to be 0..3 on locale 0 and 2..10 on locale 1. Is there any straightforward way of doing this? I've been trying some things out but have not gotten it to work as expected.
Tom Westerhout
@twesterhout:matrix.org
A random question: is anybody using Chapel with Nix? (I've checked, and there appears to be no Chapel package in nixpkgs) Has this ever been considered? and if not, would there be interest in it?
Hi all -- what is the best way to read data from a pipe (i.e. start a process and pipe its output into Chapel)?
@npadmana: I don’t do this often enough myself to be able to tell you immediately, but would look to the Subprocess module if I were trying to do this: https://chapel-lang.org/docs/modules/standard/Subprocess.html
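A small sketch along those lines, following the Subprocess docs (recent versions spell the enum `pipeStyle.pipe` and the method `readLine`; older releases used `PIPE` and `readline`):

```chapel
use Subprocess;

// Spawn a child process and capture its stdout through a pipe.
var sub = spawn(["ls", "-l"], stdout=pipeStyle.pipe);

var line: string;
while sub.stdout.readLine(line) do
  write(line);          // each line already ends in a newline

sub.wait();             // reap the child process
```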
Zhihui Du
@zhihuidu

@bradcray When an array A is evenly distributed onto many locales and each locale only works on its local part of A, this is perfect:

coforall loc in Locales {
  on loc {
    // work on the elements of A assigned to the current locale
  }
}

However, if we just do it like this:

forall a in A {
  // work on a
}

it seems that only locale 0 will execute the loop. I wonder: for the elements of A that were not assigned to locale 0, will locale 0 first get the data from the other locales, work on the elements, and then send them back, or will it ask the other locales to work directly on their own elements?

Thomas Rolinger
@thomasrolinger
@zhihuidu When you say "A is evenly distributed" do you mean via something like BlockDist or other built-in distribution? If that is the case, the forall approach should execute a given iteration on the locale where a is located. How are you determining that only locale 0 is executing the loop?
Zhihui Du
@zhihuidu

"A is evenly distributed" means there is no load-balancing problem. For example, the size of A is 100 and we have 4 locales with a block distribution.
You mean that for the code

forall a in A {
  // work on a
}

even though I did not put it inside

coforall loc in Locales {
  on loc {

  }
}

the compiler can automatically assign the work to different locales?
I assumed that all code not inside a coforall loc in Locales { on loc { } } structure would execute only on locale 0. Is this correct?

@thomasrolinger
Thomas Rolinger
@thomasrolinger

The default iterators provided for BlockDist arrays/domains will essentially implement the behavior you have for the coforall. So when you say forall a in A and A is indeed a block distributed array, it will do what you want (i.e., execute across the locales).

See the first example shown here: https://chapel-lang.org/docs/modules/dists/BlockDist.html

Zhihui Du
@zhihuidu
Thanks for the example.
So you mean

forall a in A do
  a = a.locale.id;

will be executed on all locales based on A's distribution? I thought only locale 0 would execute the forall loop. Maybe I am wrong.
Thomas Rolinger
@thomasrolinger
Chapel is not SPMD, so yes, locale 0 will "execute" the forall assuming it isn't nested in another structure like a coforall. The forall iterations are chunked up and assigned to tasks located across the locales. Those iterations are then executed on the different locales. Brad and others may have an easier method to explain the overall parallelism model.
Zhihui Du
@zhihuidu
If we have a case like

forall i in 1..100 {
}

it is hard to see how Chapel can assign the iterations to different locales. I am not sure how the compiler works at the low level.
If we access multiple distributed arrays inside the forall loop, will the Chapel compiler first get all the values from the other locales to locale 0, calculate the result, and then update the values on the other locales (so the other locales only provide and receive data, doing no calculation), or can the compiler ask the other locales to do some iterations locally?
Thomas Rolinger
@thomasrolinger
In the case of that forall loop that simply iterates over a range, it does not assign iterations across locales. As I mentioned earlier, block distributed arrays and domains have default iterators that do the "magic" of distributing work
Zhihui Du
@zhihuidu
Thanks! So only the resources of locale 0 can be used for the parallel iterations.
Thomas Rolinger
@thomasrolinger
In the above forall i in 1..100, yes, the default behavior is to use just the cores/resources of the locale the task is currently on (so not necessarily locale 0: if you had written on Locales[1] { forall i in 1..100 ... }, the loop would execute on locale 1).
Zhihui Du
@zhihuidu
I understand this now.
My question is: if we have to access data on different locales in the loop, how does the compiler handle it?
Thomas Rolinger
@thomasrolinger
At runtime, there are checks to see whether data is local or remote (some checks can be done at compile time in specific cases, to statically determine that something will be local). For remote accesses, communication is performed (i.e., a get or put).
Zhihui Du
@zhihuidu
I see. Thanks!
Kevin Waters
@kwaters4
What are the current best practices for writing a file in binary now that iostyle is deprecated? Is that still being developed?
Is channel.writeBinary the only current alternative?
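If writeBinary is available in your version, a minimal sketch looks like this (enum and method names vary across recent releases, e.g. `iomode` vs. `ioMode`, so check the IO docs for your release):

```chapel
use IO;

var f = open("data.bin", ioMode.cw);
var w = f.writer();
w.writeBinary(42:int(32));  // writes the raw 4 bytes of the value
w.close();
f.close();
```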
@zhihuidu : I’m only seeing this conversation now. Let me take a different stab at @thomasrolinger’s very excellent answers above, just to try and re-enforce some key things:
In Chapel, the behavior of a parallel loop like forall i in expr… depends entirely on expr. Specifically, if expr is an invocation of a parallel iterator, the recipe for how many tasks to use for the loop, where they should run, and how the loop’s iterations should be divided between those tasks is defined within the iterator. If expr is instead a variable, then there will be a default parallel iterator defined on that variable’s type which specifies these things (and if there is not, the parallel loop will not be legal).
As a specific example, if expr is a range (like 1..n) or domain (like {1..n, 1..n}) or array (like var A: [1..n] real;) then we will invoke the default parallel iterator for the corresponding type.
The default parallel iterators for ranges, domains, and non-distributed arrays will only ever use local resources to execute the iterations of the loop. Thus, forall i in 1..n or forall i in {1..n, 1..n} or forall a in A (for my A above) will only use local tasks to execute iterations.
By contrast, parallel iterators over distributed domains or arrays (like Block- or Cyclic-distributed arrays) will use tasks on all locales that own a part of the distributed index space.
And then, as Thomas says, orthogonal to all this, any task can refer to any variable within its lexical scope, whether it is local or remote, due to Chapel’s global address space. It can also reason about where that variable is using the .locale query and compare that to where it is running using the built-in here variable.
Have a good weekend!
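These points can be seen with a tiny experiment (a sketch using the `dmapped Block` syntax that appears earlier in this discussion; run with more than one locale to see the effect):

```chapel
use BlockDist;

const D = {1..16} dmapped Block({1..16});
var A: [D] int;

// The default parallel iterator of a Block-distributed array runs each
// iteration on the locale that owns that element, so `here.id` varies.
forall a in A do
  a = here.id;

writeln(A);  // with 2 locales: 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
```

Replacing A with a non-distributed array of the same size would print all zeros (or all of whatever `here.id` is where the loop runs), since only local tasks would be used.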
Zhihui Du
@zhihuidu
@bradcray Thanks, your answer makes me realize that my understanding of the parallel forall loop was limited. Based on your description, I have learned more about Chapel's iterator functions. If A is a distributed array, even though I did not put the forall i in A inside a coforall loc in Locales { on loc { } } structure, the default parallel iterator knows that A is a distributed array, so the loop over A will be executed on the different locales instead of just on locale 0. Is this correct?

@zhihuidu : That’s correct (and in fact, you would not want to put the forall into a coforall loc in Locales on loc loop, or you’d end up doing a bunch of redundant work, since each task on each locale would invoke A’s default parallel iterator).

Think of forall a in A as translating to forall a in A.defaultParallelIterator() where .defaultParallelIterator() is currently spelled .these() in Chapel.

Zhihui Du
@zhihuidu
@bradcray Thanks. Now I understand why I got worse performance when I put a forall a in A inside a coforall loc in Locales on loc loop. At first, I thought the forall a in A loop could only run on locale 0.
Hi all -- I think this is related to the binary I/O question from above -- is there a way to read an entire array in binary from a file (without looping over it element by element, assuming a contiguous array etc)? channel.read used to do this, but it doesn't look like it works, and readBinary doesn't seem to accept an array?
Tom Westerhout
@twesterhout:matrix.org

A question. I have two nested coforall loops (one over nodes and one over threads). There is quite a bit of communication happening between nodes, but no remote tasks are spawned except for "fast executeOn", (I think) no allocations are performed, just computations. The peculiar thing is that I have to insert a few calls to chpl_task_yield to have the code terminate -- without them it just hangs. What could be causing such behavior?

EDIT: One other thing, there is no oversubscription, so tasks are not competing for cores.

Tom Westerhout
@twesterhout:matrix.org
@ronawho: I went back and inserted some if statements to conditionally enable the yield statements, and I can't reproduce the issue anymore 🙈 So let's ignore it for now, and if I manage to reproduce it in a reliable way, I'll ping you again. Sorry for the noise

Hi all (again!)... I was trying to do a reduction over a user-defined record as below, but I ran into an issue because there wasn't a way to define the initial state of the reduction variable... is there a simple way around this?

record R {
  const x : int;
  var ii : int;

  proc init(x0: int) {
    this.x = x0;
    this.complete();
  }
}

operator +(lhs: R, rhs: R) {
  var ret = new R(lhs.x);
  ret.ii = lhs.ii + rhs.ii;
  return ret;
}

var h = new R(10);

coforall ii in 0..4 with (+ reduce h) {
  h.ii += ii;
}

writeln(h);

fails to compile with

[dev-0.1] 22:20 histogram$ chpl test.chpl
test.chpl:17: error: unresolved call 'R.init()'
test.chpl:5: note: this candidate did not match: R.init(x0: int)
$CHPL_HOME/modules/internal/ChapelReduce.chpl:140: note: because call does not supply enough arguments
test.chpl:5: note: it is missing a value for formal 'x0: int(64)'
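The error arises because the reduction needs to build an identity value via a zero-argument R.init(), which disappears once a user initializer is defined. One workaround sketch: give every field a default value so the compiler-generated zero-argument initializer comes back (note this also turns x into a var field, so it changes the record's interface and may not fit every design):

```chapel
record R {
  var x: int = 0;   // default value restores a usable zero-arg R.init()
  var ii: int = 0;
}

operator +(lhs: R, rhs: R) {
  var ret = new R(lhs.x);
  ret.ii = lhs.ii + rhs.ii;
  return ret;
}

var h = new R(10);  // compiler-generated init: x = 10, ii = 0

coforall ii in 0..4 with (+ reduce h) {
  h.ii += ii;
}

writeln(h);
```

A heavier-weight alternative is to write a custom reduction class whose `identity` accessor constructs whatever initial state you need.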