These are chat archives for elemental/chat
./SVD: error while loading shared libraries: libEl.so: cannot open shared object file: No such file or directory
PATH=/usr/local/lib64/:$PATH ; export PATH did not solve it. Then I tried putting the SVD executable in the same directory as libEl.so (i.e. /usr/local/lib64), but got the same error.
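Note that PATH only controls where the shell looks for executables; shared libraries like libEl.so are resolved by the dynamic loader, which consults LD_LIBRARY_PATH (or the ldconfig cache). A minimal sketch of the usual fix, assuming libEl.so really lives in /usr/local/lib64:

```shell
# The dynamic loader ignores PATH; point LD_LIBRARY_PATH at the
# directory containing libEl.so instead.
export LD_LIBRARY_PATH=/usr/local/lib64:$LD_LIBRARY_PATH
# then re-run: ./SVD --height 300 --width 300

# Alternatively, register the directory system-wide (requires root):
#   echo /usr/local/lib64 | sudo tee /etc/ld.so.conf.d/elemental.conf
#   sudo ldconfig
```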
./SVD --height 300 --width 300 or
mpiexec ./SVD --height 300 --width 300 takes just 0.13 s to run, while
mpiexec -n 2 ./SVD --height 300 --width 300 takes 33 s,
mpiexec -n 4 ./SVD --height 300 --width 300 takes 66 s, and
mpiexec -n 8 ./SVD --height 300 --width 300 takes 138 s? So the execution times are doubling. It is also interesting that even if I set the number of processes to 2 or 4, all 8 threads of my Core i7 processor are at full load.
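Taking the reported timings at face value (33 s, 66 s, 138 s), each step in process count roughly doubles the wall time; a quick check:

```shell
# Ratios between successive SVD timings reported above (seconds).
awk 'BEGIN {
  printf "33s -> 66s:  %.2fx\n", 66 / 33
  printf "66s -> 138s: %.2fx\n", 138 / 66
}'
# prints 2.00x and 2.09x
```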
export OMP_NUM_THREADS=1 should fix it
export OPENBLAS_NUM_THREADS=1 gives the same result. Our university cluster has about 5000 cores. Does Elemental scale that far?
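The full-load symptom suggests each MPI rank is spawning its own BLAS/OpenMP thread team, oversubscribing the cores. A sketch of pinning both knobs before launching (the mpiexec line is the same SVD run as above):

```shell
# Give each MPI rank a single compute thread so that 4 ranks do not
# oversubscribe the 4 physical cores (8 hyperthreads) of a Core i7.
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
# then: mpiexec -n 4 ./SVD --height 300 --width 300
```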
mpiexec -n 1 ./SVD) and
tests/blas_like/Gemm as a sanity check
5 n^3 flops, running at
1e10 flops/second would take 13.5 seconds
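That arithmetic checks out, assuming n = 3000 (as in the Gemm runs below):

```shell
# 5 n^3 flops at 1e10 flop/s, for n = 3000:
awk 'BEGIN { printf "%.1f seconds\n", 5 * 3000^3 / 1e10 }'
# prints "13.5 seconds"
```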
tests/blas_like/Gemm, as it gives the local
Gemm speed and the
Results with OMP_NUM_THREADS=4, OPENBLAS_NUM_THREADS=4,
mpiexec -n 1 ./Gemm --m 3000 --n 3000 for real matrices:
float: Stationary A algorithm - Finished in 0.0533109 seconds (33.7642 GFlop/s)
double: Stationary A algorithm - Finished in 0.112217 seconds (16.0404 GFlop/s)
quad: Stationary A algorithm - Finished in 66.826 seconds (0.0269356 GFlop/s)
Results with OMP_NUM_THREADS=1, OPENBLAS_NUM_THREADS=1,
mpiexec -n 4 ./Gemm --m 3000 --n 3000 for real matrices:
float: Stationary A algorithm - Finished in 0.0983532 seconds (18.3014 GFlop/s)
double: Stationary A algorithm - Finished in 0.22003 seconds (8.1807 GFlop/s)
quad: Stationary A algorithm - Finished in 20.5049 seconds (0.0877838 GFlop/s)
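Comparing the two configurations above for float and double (one rank with four BLAS threads vs. four ranks with one thread each), the multi-process run is roughly twice as slow; quad goes the other way, presumably because software quad precision gets no benefit from the threaded BLAS:

```shell
# Slowdown of the 4-rank / 1-thread runs relative to the
# 1-rank / 4-thread runs, from the timings reported above.
awk 'BEGIN {
  printf "float:  %.2fx slower\n", 0.0983532 / 0.0533109
  printf "double: %.2fx slower\n", 0.22003   / 0.112217
}'
# prints 1.84x and 1.96x
```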
-bind-to hwthread option and then the
-bind-to core option. Both of them resulted in basically no speed-up in the bandwidth.
Bidiag is scalable but SVD is not; that is a very strong sign that there could be a bug in the new distributed D&C