These are chat archives for elemental/chat

25th
Feb 2017
Aidan Dang
@AidanGG
Feb 25 2017 04:33
Hi Jack, I've uploaded another matrix (670 x 670 real double, 64-bit Int, 2x2 grid) that fails with "SecularLast solver did not converge in 400 iterations": https://drive.google.com/open?id=0B3uyvVfsuP46X3J3N3VnQzlqb00
I'll try it on Debug to see if I have issues there.
Aidan Dang
@AidanGG
Feb 25 2017 05:26
Ok, so it does seem to be failing for me on Debug as well.
Jack Poulson
@poulson
Feb 25 2017 07:57
Are the other matrices passing?
Aidan Dang
@AidanGG
Feb 25 2017 07:58
Sorry, which other matrices, the last ones I sent you? Those ones work fine, but I can check that again.
Jack Poulson
@poulson
Feb 25 2017 07:59
no need to check again, I just wanted to clarify
Aidan Dang
@AidanGG
Feb 25 2017 08:02
The last one I sent is OK for me.
Jack Poulson
@poulson
Feb 25 2017 09:15
hmm, that succeeds for me with an up-to-date build using D&C
Aidan Dang
@AidanGG
Feb 25 2017 09:16
It could just be an issue with my GCC version. I'll check it on my local cluster with a different GCC.
Jack Poulson
@poulson
Feb 25 2017 09:17
are you sure that you are linking to an install of HEAD?
and picking up the right headers?
Aidan Dang
@AidanGG
Feb 25 2017 09:17
Yep, I did a clean build before sending it to you.
Jack Poulson
@poulson
Feb 25 2017 09:17
with that said, floating-point differs on different machines and you could be hitting a rare corner case that I'm not
would you mind running with the 'progress' field of 'SecularSVDCtrl' set to 'true'?
Aidan Dang
@AidanGG
Feb 25 2017 09:18
Sure, I'll try that.
Aidan Dang
@AidanGG
Feb 25 2017 09:36
If I increase the secular iteration limit, the output ends with:
Relative interval is [0.979796,0.979796], sigmaEst=0.979796
Stepped out of bounds
Relative interval is [0.979796,0.979796], sigmaEst=0.979796
Stepped out of bounds
Relative interval is [0.979796,0.979796], sigmaEst=0.979796
Stepped out of bounds
Relative interval is [0.979796,0.979796], sigmaEst=0.979796
Stepped out of bounds
Relative interval is [0.979796,0.979796], sigmaEst=0.979796
Stepped out of bounds
Relative interval is [0.979796,0.979796], sigmaEst=0.979796
Stepped out of bounds
Relative interval is [0.979796,0.979796], sigmaEst=0.979796
Stepped out of bounds
Relative interval is [0.979796,0.979796], sigmaEst=0.979796
Stepped out of bounds
Relative interval is [0.979796,0.979796], sigmaEst=0.979796
Stepped out of bounds
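A side note on reading the log above: both interval endpoints are shown with six significant digits, which is the default precision of C++ output streams. Assuming the progress output uses that default (not confirmed here), the two endpoints may still differ in later digits. A minimal sketch, with a hypothetical helper name, that formats an interval at a chosen precision:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Elemental): format an interval with a
// chosen number of significant digits.
std::string FormatInterval(double lower, double upper, int digits)
{
    std::ostringstream os;
    os << std::setprecision(digits) << "[" << lower << "," << upper << "]";
    return os.str();
}
```

With 17 significant digits (enough to distinguish any two distinct doubles), endpoints that look identical at six digits can be told apart.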
Aidan Dang
@AidanGG
Feb 25 2017 13:04
Running on local cluster seems OK, so I'll just put it down to one of those rare issues.
Jack Poulson
@poulson
Feb 25 2017 18:21
does everything converge with one MPI process?
and, since process 2 seems to be the one having issues on the first machine (in what I assume is a four-process run), would you mind sending me the output with only process 2 having the secularCtrl.progress flag set to true?
also, the output you sent doesn't have any error messages printed except for that process 2 aborted, which is strange
I think you're seeing a lot of noise from other processes and that it would be useful to only have process 2 print the secularCtrl.progress = true information
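One way to limit the progress output to a single process is to set the flag from the process rank. The sketch below uses a stand-in struct rather than Elemental's actual control structure: only the field name progress comes from the messages above, while SecularCtrlStandIn, MakeCtrlForRank, and maxIterations are hypothetical names chosen for illustration.

```cpp
// Hypothetical stand-in for the solver's control structure; only the
// 'progress' field name is taken from the discussion.
struct SecularCtrlStandIn
{
    int maxIterations = 400; // assumed name; 400 matches the reported limit
    bool progress = false;   // print per-iteration information when true
};

// Build a control structure that enables progress output on one rank only.
SecularCtrlStandIn MakeCtrlForRank(int rank, int verboseRank)
{
    SecularCtrlStandIn ctrl;
    ctrl.progress = (rank == verboseRank);
    return ctrl;
}
```

In an MPI run, rank would come from MPI_Comm_rank, and verboseRank would be 2 for the case described here.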
Jack Poulson
@poulson
Feb 25 2017 19:13
Also, if you recompile after inserting:
            Print( d, "d" );
            Output("rho=",rho);
            Print( z, "z" );
at line 1214 of src/lapack_like/spectral/SecularSVD.cpp, then it should print the diagonal plus rank-one update that caused the problem
I recently added such an output for most of the secular solves, but not for the very last index (which is what you're seeing)
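For readers unfamiliar with the term, here is a minimal sketch of what a "diagonal plus rank-one update" secular solve for the last index looks like, assuming the problem has the usual form: the updated singular values are the square roots of the eigenvalues of diag(d)^2 + rho*z*z^T, with d nonnegative and sorted ascending, rho > 0, and a nonzero last entry of z. This is plain bisection, not Elemental's solver, and the function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Secular function for the eigenvalues of diag(d)^2 + rho*z*z^T:
//   f(lambda) = 1 + rho * sum_i z_i^2 / (d_i^2 - lambda).
double SecularFunction(const std::vector<double>& d,
                       const std::vector<double>& z,
                       double rho, double lambda)
{
    double sum = 0;
    for (std::size_t i = 0; i < d.size(); ++i)
        sum += z[i]*z[i] / (d[i]*d[i] - lambda);
    return 1 + rho*sum;
}

// Bisection for the largest root, which lies in
// (d_last^2, d_last^2 + rho*||z||^2]. Returns sigma = sqrt(lambda).
// Assumes d is non-empty, nonnegative, and sorted ascending.
double LastSingularValueSketch(const std::vector<double>& d,
                               const std::vector<double>& z,
                               double rho, int maxIterations = 400)
{
    double zNormSq = 0;
    for (double zi : z)
        zNormSq += zi*zi;
    const double dLast = d.back();
    double lower = dLast*dLast;
    double upper = lower + rho*zNormSq;
    for (int iter = 0; iter < maxIterations; ++iter)
    {
        const double mid = lower + (upper - lower)/2;
        // Stop once the interval cannot be split further in floating point.
        if (mid <= lower || mid >= upper)
            break;
        // f is increasing on this interval, so a negative value means the
        // root is to the right of mid.
        if (SecularFunction(d, z, rho, mid) < 0)
            lower = mid;
        else
            upper = mid;
    }
    return std::sqrt(upper);
}
```

Production solvers use faster, safeguarded rational iterations instead of bisection, but the printed d, rho, and z are exactly the inputs such a sketch would need to reproduce a failing case in isolation.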