These are chat archives for elemental/chat

Mar 2017
Ryan H. Lewis
Mar 17 2017 04:36
What's wrong with .Get(i,j) or GetLocal()?
There is also the QueueUpdate()/ProcessQueues stuff
Jack Poulson
Mar 17 2017 06:48
Hi @gpavanb, you are correct that there is no Proxy for El::DistMultiVec. But the point of El::DistMatrixReadProxy is that El::DistMatrix supports a wide variety of different data distributions and some routines are most effective in a particular one; it would be inconvenient for users if the interface only accepted that particular distribution.
So I am not sure what functionality from El::DistMatrixReadProxy you're looking for, since El::DistMultiVec only supports one distribution.
And, for what it's worth, El::DistMultiVec<T> is a quasi-deprecated, but not yet replaced, class that has a data distribution roughly equivalent to El::DistMatrix<T,El::VC,El::STAR,El::BLOCK>.
El::DistMultiVec<T> is a holdover from some work from several years ago. The long-term plan is to reimplement/generalize El::DistSparseMatrix<T> to support as many different 2D distributions as El::DistMatrix, and to delete El::DistSparseMatrix's legacy dependencies on El::DistMultiVec in favor of switching over to El::DistMatrix in the process.
I've been in the middle of the quagmire of revamping Elemental's Interior Point Methods for a while, and I'm not very close to finishing yet either
Pavan B Govindaraju
Mar 17 2017 12:04

Thanks @poulson, that was indeed helpful.

To elaborate on the particular functionality I am looking for: the problem I am dealing with is an El::DistSparseMatrix solve, which I am content to perform using El::LinearSolve. However, the solution vector ideally needs to be available to every processor in its entirety. (These values go, in a non-contiguous fashion, into an array.)

I am aiming for scalability, and using calls like Get or GetLocal, as mentioned by @rhl-, causes just the distribution of the solution vector to be far slower than the matrix solve itself. I would appreciate an appropriate suggestion for

1) The sparse matrix distribution
2) The method of accessing the solution vector: Get, GetLocal, or an appropriate ReadProxy
(The documentation mentions the presence of a more generic ReadProxy in Sec. 3.13.1, which doesn't seem to be supported)

Also, @rhl-, I am not quite sure what you meant by using QueueUpdate, as the update here needs to be applied to a plain array and not an Elemental object.

Pavan B Govindaraju
Mar 17 2017 12:21

More than that, there seem to be too many parameters for me to play with:

1) Number of processors (ideally, this should be the only independent quantity)
2) Grid dimensions (which in turn decide the number of processors in each grid block)
3) Block size

Also, the DistSparseMatrix might prefer one kind of breakdown as opposed to the DistMultiVec. It seems like a fine balance of parameters which don't necessarily yield the optimal setting for each part of the program, viz.,
a) matrix creation
b) linear solve
c) redistribution of solution

but cumulatively lead to the best overall performance

Jack Poulson
Mar 17 2017 17:05
I would recommend using the El::DistMultiVec<T>::QueuePull and El::DistMultiVec<T>::ProcessPullQueue routines rather than individually calling El::DistMatrix<T>::Get, which generally involves a broadcast per entry.
Also, I just reread the Sec. 3.13.1 documentation and am not sure what you're referring to that isn't supported.
Jack Poulson
Mar 17 2017 17:11
In terms of being able to configure grid dimensions: are you suggesting it is a problem that they can be configured? I certainly wish there were a single ideal, machine- and algorithm-independent choice, but unfortunately one does not exist. If you do not care about performance optimization, the default, near-square dimensions should work decently well.
The algorithmic block size is likewise not algorithm- or machine-independent, but the default value should be reasonably good.
Further, El::DistSparseMatrix<T> and El::DistMultiVec<T> are the roughest edges in the library right now and are in the wonderful state of being deprecated but not yet replaced.
I wouldn't expect very good performance from distributed sparse matrix-vector products with El::DistSparseMatrix<T> right now.
But the performance of the sparse-direct solver itself is completely unrelated to the details of the El::DistSparseMatrix<T>, with the exception of how the multifrontal tree is initialized from said matrix.