These are chat archives for elemental/chat

3rd
Dec 2016
Jack Poulson
@poulson
Dec 03 2016 02:49
it just timed out
was it a debug build or running in an overloaded VM?
Ryan H. Lewis
@rhl-
Dec 03 2016 05:07
Maybe the latter. That seems most reasonable
Drew Lewis
@calewis
Dec 03 2016 12:49
@rhl- I found the QueueUpdate, but I am not entirely sure how it works. For the Block DistMatrices is the Entry type a Matrix<T>? I see that Abstract.hpp uses Entry<T> types, but doesn't declare that type or include a header for it?
Ryan H. Lewis
@rhl-
Dec 03 2016 16:41
@poulson should be able to answer this. I think that may be true, if QueueUpdate works the way @poulson was suggesting earlier
Entry may be defined elsewhere besides Abstract.hpp
Best thing to do is try it and see if there is an error
Jack Poulson
@poulson
Dec 03 2016 17:55
@calewis El::Entry<T> is defined in El/core/types.hpp which is implicitly included beforehand. It is simply a struct of {i,j,value}. You can also directly pass in the triplet: https://github.com/elemental/Elemental/blob/a5c2c2ff49bd7de2caa5faddf87a26bacfdd7f5b/include/El/core/DistMatrix/Abstract.hpp#L226
An example would be:
// Increase the j'th column of an m x n matrix by alpha
A.ReserveUpdates(m);
for( Int i=0; i<m; ++i )
    A.QueueUpdate( i, j, alpha );
A.ProcessQueues();
The communication all happens within ProcessQueues in a batch
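[Editor's sketch] The queue-then-process pattern above can be illustrated without MPI. The block below is a single-process mock, not Elemental code: MockEntry, MockQueuedMatrix, Get, and NumQueued are hypothetical names made up for illustration, and the real ProcessQueues exchanges the queued entries between processes rather than looping locally. It only shows that nothing is applied until the queue is processed, and that queued values are added to the existing entries as in the "increase the j'th column" example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the {i,j,value} triplet described above.
// This is a stand-in for illustration, not El::Entry<T> itself.
template<typename T>
struct MockEntry { int i; int j; T value; };

// Hypothetical single-process stand-in for a distributed matrix.
// Storage is column-major with leading dimension equal to the height.
template<typename T>
class MockQueuedMatrix
{
public:
    MockQueuedMatrix( int height, int width )
    : height_(height), width_(width),
      data_(static_cast<std::size_t>(height)*static_cast<std::size_t>(width), T(0))
    { }

    // Reserve space so that queueing does not repeatedly reallocate.
    void ReserveUpdates( std::size_t numUpdates )
    { queue_.reserve( queue_.size()+numUpdates ); }

    // Record the update; the matrix itself is not touched yet.
    void QueueUpdate( int i, int j, T value )
    {
        assert( i >= 0 && i < height_ && j >= 0 && j < width_ );
        queue_.push_back( MockEntry<T>{ i, j, value } );
    }

    // Apply every queued update in one batch, then empty the queue.
    // (In a distributed setting this is where communication would occur.)
    void ProcessQueues()
    {
        for( const auto& entry : queue_ )
            data_[Index(entry.i,entry.j)] += entry.value;
        queue_.clear();
    }

    T Get( int i, int j ) const { return data_[Index(i,j)]; }
    std::size_t NumQueued() const { return queue_.size(); }

private:
    std::size_t Index( int i, int j ) const
    {
        return static_cast<std::size_t>(i) +
               static_cast<std::size_t>(j)*static_cast<std::size_t>(height_);
    }

    int height_, width_;
    std::vector<T> data_;
    std::vector<MockEntry<T>> queue_;
};
```

With this mock, queueing m updates for column j and then calling ProcessQueues once changes the column in a single batch, which is the shape of the five-line example above.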
Drew Lewis
@calewis
Dec 03 2016 18:50
@poulson Is there a way to just write a local matrix using something like A.local_block(block_row_index, block_col_index, ptr_to_data); Sort of like std::copy?
Jack Poulson
@poulson
Dec 03 2016 18:57
do you mean without communication? I'm not sure what you're asking
you can directly manipulate A.Matrix()
or A.SetLocal( iLocal, jLocal, value )
Drew Lewis
@calewis
Dec 03 2016 18:59
@poulson Yes, I have local data that I want to write to A, but I want to do it in a way that is more efficient than writing by element. Is it possible to get a pointer to the first element of a block, so that I could just call copy?
Jack Poulson
@poulson
Dec 03 2016 19:00
in what way is writing by element inefficient relative to what is going to be done with Elemental?
you can access the direct local buffers if you like, but I suspect you are overoptimizing
A.Buffer(iLocal,jLocal) returns what you're looking for
which should be used with A.LDim()
but this only works if the data was already located on the appropriate processes
more generally an MPI_Alltoallv needs to be called behind the scenes (and this call should dwarf the cost of the local copy)
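[Editor's sketch] The buffer-plus-leading-dimension idea above can be shown on plain arrays. The block below does not call Elemental: CopyBlockColumnMajor is a hypothetical helper, `dest` plays the role of the pointer returned by A.Buffer(iLocal,jLocal), and `ldim` plays the role of A.LDim(). It assumes both the source block and the destination are column-major (the BLAS/LAPACK convention) and that the data is already on the owning process.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a contiguous, column-major height x width block into a column-major
// destination whose consecutive columns are ldim entries apart.
// 'dest' must point at the top-left entry of the target block.
template<typename T>
void CopyBlockColumnMajor
( const T* src, std::size_t height, std::size_t width,
  T* dest, std::size_t ldim )
{
    // The destination columns must be at least as tall as the block.
    assert( ldim >= height );
    for( std::size_t j=0; j<width; ++j )
    {
        // Column j of the source starts at src + j*height, while
        // column j of the destination starts at dest + j*ldim.
        std::copy( src + j*height, src + (j+1)*height, dest + j*ldim );
    }
}
```

A single std::copy over height*width entries is only equivalent to this loop when ldim equals the block height; otherwise the copy has to proceed column by column.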
Drew Lewis
@calewis
Dec 03 2016 19:05
I probably am overoptimizing, but it just seemed simpler to write a single function that gets the Matrix buffer and copies with something like copy(A.Buffer(i,j), my_struct.data(), block_size); Eventually I want to flatten tensors into Matrix Blocks, and this type of interface is easier to write than looping over a variable number of dimensions.
Jack Poulson
@poulson
Dec 03 2016 19:05
ah, if it is a utility routine then you want to do the equivalent of std::copy on A.Buffer()
if it is one-off, I would recommend QueueUpdates
or a for loop over A.SetLocal
Drew Lewis
@calewis
Dec 03 2016 19:09
It's going to be a routine for copying our distributed tensor blocks into DistMatrix blocks. My plan was to communicate our blocks myself to wherever the DistMatrix block is stored and then copy it. Would QueueUpdate be more efficient? I had in mind something like
auto my_block = My_mat.find(block_index);
std::copy(my_block.data(), my_block.data() + my_block.size(), A.Buffer(block_index[0], block_index[1]));
Where find will do communication.
Jack Poulson
@poulson
Dec 03 2016 19:12
QueueUpdate should be the baseline; an optimized routine could certainly be significantly faster
but it would be silly to not start with the five-line implementation that should Just Work
the primary unnecessary overhead of just using QueueUpdate is that it sends the row and column metadata for each entry, when it is only needed for each block
it would make sense to add an analogue of QueueUpdate for BlockMatrix called QueueBlockUpdate
where each item is an (i,j) pair for the top-left entry of the matrix, the height and width of the matrix, and the matrix itself
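[Editor's sketch] QueueBlockUpdate is only proposed in this conversation, so the block below is a guess at what a queued item might look like, not an existing Elemental type. QueuedBlock, MakeQueuedBlock, and the two counting helpers are hypothetical names, and column-major storage of the block data is an assumption. The counting helpers make the metadata argument concrete: two indices per entry for entrywise queueing versus four integers per block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical queue item for the proposed QueueBlockUpdate:
// the (i,j) of the block's top-left entry, its dimensions, and its data.
template<typename T>
struct QueuedBlock
{
    int i, j;             // global row/column of the top-left entry
    int height, width;    // dimensions of the block
    std::vector<T> data;  // height*width values (column-major is assumed)
};

// Build a queue item by copying height*width contiguous values from src.
template<typename T>
QueuedBlock<T> MakeQueuedBlock
( int i, int j, int height, int width, const T* src )
{
    assert( height >= 0 && width >= 0 );
    QueuedBlock<T> block{ i, j, height, width, {} };
    const std::size_t size =
      static_cast<std::size_t>(height)*static_cast<std::size_t>(width);
    block.data.assign( src, src + size );
    return block;
}

// Index metadata sent when every entry carries its own (i,j) pair.
inline std::size_t EntrywiseMetadataCount( int height, int width )
{
    return 2*static_cast<std::size_t>(height)*static_cast<std::size_t>(width);
}

// Index metadata sent once per block: i, j, height, and width.
inline std::size_t BlockwiseMetadataCount()
{ return 4; }
```

For a 32 x 32 block, the entrywise count is 2048 integers against 4 for the blockwise item, which is the overhead being described above.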
Drew Lewis
@calewis
Dec 03 2016 19:16
Thanks for the help. Also, one last question: are the linear algebra routines implemented in terms of block matrices, or will they redistribute to element-based ones? If there is going to be a redistribution for things like EVD and SVD, should I just use QueueUpdate into an element-based matrix?
Jack Poulson
@poulson
Dec 03 2016 19:20
they are implemented for element-wise (with the exception of the Hessenberg Schur decomposition) but the interface supports all distributions and redistributes as necessary using El::DistMatrix[Read][Write]Proxy
the redistribution should usually be ignorable for eigensolvers/SVD relative to the operation itself
Drew Lewis
@calewis
Dec 03 2016 19:20
Ok, thanks for all the help. If I run into any more issues I'll come back here.
Jack Poulson
@poulson
Dec 03 2016 19:21
I have been systematically going through and updating the interfaces of routines from accepting ElementalMatrix to AbstractDistMatrix, but please let me know if you run into an issue
Ryan H. Lewis
@rhl-
Dec 03 2016 21:08
I’ve created #208 and #209 for disabling PMRRR and METIS. I’ve gone ahead and committed .appveyor.yml to master and disabled the Windows CI because it's a bit too far off.
Jack Poulson
@poulson
Dec 03 2016 21:09
Disabling METIS would be a pain as it would cut off a huge portion of functionality
I don't think it's reasonable until there are native implementations
Ryan H. Lewis
@rhl-
Dec 03 2016 21:12
Ok, let's close it in favor of such a ticket then.
also, does: -fsanitize=undefined by itself fail?
Jack Poulson
@poulson
Dec 03 2016 21:13
please go easy on all of the ticket creation/deletion
-fsanitize=undefined doesn't work on OS X for me yet
(as in, it's Clang bugs)
and GCC doesn't work on recent OS X either
Ryan H. Lewis
@rhl-
Dec 03 2016 21:14
I see
Jack Poulson
@poulson
Dec 03 2016 21:14
so I need to test on a Linux box
Ryan H. Lewis
@rhl-
Dec 03 2016 21:14
docker?
Jack Poulson
@poulson
Dec 03 2016 21:14
I have a linux box at home
Ryan H. Lewis
@rhl-
Dec 03 2016 21:14
oh, ok
Jack Poulson
@poulson
Dec 03 2016 21:14
I think it was honestly a punt by your colleague
I will go through the motions but am pretty confident it's a GCC bug
it will be bug number 500 uncovered by the project
Ryan H. Lewis
@rhl-
Dec 03 2016 21:16
500?
Jack Poulson
@poulson
Dec 03 2016 21:16
there were a lot of IBM compiler bugs :-p
Ryan H. Lewis
@rhl-
Dec 03 2016 21:16
haha
Jack Poulson
@poulson
Dec 03 2016 21:16
I'm exaggerating, but this is honestly at least bug number 20
Ryan H. Lewis
@rhl-
Dec 03 2016 21:17
we could flag it on github
Jack Poulson
@poulson
Dec 03 2016 21:17
and quite a few Intel compiler bugs
ubsan seems to be somewhat rough in Clang
(e.g., being non-functional on recent versions of OS X)