These are chat archives for elemental/chat

13th Nov 2016
Ryan H. Lewis
@rhl-
Nov 13 2016 20:09
I can ask Steve to look at the apple toolchain issues
If we can send him something
Jack Poulson
@poulson
Nov 13 2016 20:54
I am looking into it now; Valgrind still doesn't seem to work on Sierra
but the crash appears to be happening in View
and only when compiling with -O3
it might be a copy elision issue, which would be a nightmare to debug, but hopefully it's something simpler
Ryan H. Lewis
@rhl-
Nov 13 2016 20:57
Heading home to work on BFGS shortly
Jack Poulson
@poulson
Nov 13 2016 20:58
let me know if I can provide more feedback on the PR
Ryan H. Lewis
@rhl-
Nov 13 2016 21:01
Well I need to rewrite the line search entirely
Based on those papers
Jack Poulson
@poulson
Nov 13 2016 21:08
there's no rush on good engineering
Ryan H. Lewis
@rhl-
Nov 13 2016 21:36
@poulson i took another look at the TSVD we were working on. I’m not sure how we would implement InfinityNorm(_) for an operator.
Jack Poulson
@poulson
Nov 13 2016 21:49
Hager-Higham-Tisseur algorithm
the segfaults disappear when print statements are added to View and Matrix::Attach...
Jack Poulson
@poulson
Nov 13 2016 23:01
it appears the print statement in Matrix::operator()( Range I, Range J) was all that mattered
I am convinced this is an issue with GCC and the OS X linker disagreeing
Ryan H. Lewis
@rhl-
Nov 13 2016 23:05
you absolutely can’t do that
mix gcc and osx linkers
why would that be happening
is the block 1-norm power method or an equivalent method already in Elemental?
also, thanks for the reference :)
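(For reference, a minimal sketch of the Hager-style 1-norm estimator, i.e. the t = 1 case of Higham-Tisseur; the names and types here are illustrative, not Elemental's API. It only needs products with A and its transpose, so it applies to an abstract operator, and since ||A||_inf = ||A^T||_1, swapping the two callables estimates the infinity norm:)

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Apply = std::function<Vec(const Vec&)>;

// Estimate ||A||_1 using only matvecs with A and A^T.
// Pass (AT, A) instead to estimate ||A||_inf = ||A^T||_1.
double OneNormEstimate(int n, const Apply& A, const Apply& AT, int maxIter = 5)
{
    Vec x(n, 1.0 / n); // start at the center of the unit 1-norm ball
    double est = 0;
    for (int iter = 0; iter < maxIter; ++iter)
    {
        Vec y = A(x);
        est = 0;
        for (double yi : y) est += std::abs(yi); // current estimate ||A x||_1

        Vec xi(n);
        for (int i = 0; i < n; ++i) xi[i] = (y[i] >= 0 ? 1.0 : -1.0);
        Vec z = AT(xi); // subgradient information

        int j = 0;
        double zmax = 0, ztx = 0;
        for (int i = 0; i < n; ++i)
        {
            ztx += z[i] * x[i];
            if (std::abs(z[i]) > zmax) { zmax = std::abs(z[i]); j = i; }
        }
        if (zmax <= ztx) break; // local maximizer reached
        std::fill(x.begin(), x.end(), 0.0);
        x[j] = 1.0; // move to the most promising unit vector
    }
    return est;
}
```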
Jack Poulson
@poulson
Nov 13 2016 23:08
@rhl- I think it is a backend issue with the linker, but I am not confident
Ryan H. Lewis
@rhl-
Nov 13 2016 23:09
are you using the gcc linker with gcc?
Jack Poulson
@poulson
Nov 13 2016 23:11
yes, whatever CMake suggests
nothing is funny there
Ryan H. Lewis
@rhl-
Nov 13 2016 23:16
@poulson why do you need 16 byte boundary alignment?
Jack Poulson
@poulson
Nov 13 2016 23:17
because the 128-bit SSE commands assume it
it is not something I specifically requested
also, GCC since 4.5 demands that, except in special cases, all stacks are aligned on 16-byte boundaries
Ryan H. Lewis
@rhl-
Nov 13 2016 23:19
weird..
Jack Poulson
@poulson
Nov 13 2016 23:19
though in cases where it believes other alignments do not pose a problem when the optimization level is high enough, it relaxes this constraint
I believe this is going haywire on OS X with -O3 for Matrix::operator()( Range<Int>, Range<Int> )
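(A minimal illustration of the constraint being discussed, not Elemental code: GCC keeps the stack 16-byte aligned so that aligned 128-bit SSE loads/stores of locals are safe; an object the compiler spills with an aligned move must sit on a 16-byte boundary or the instruction faults.)

```cpp
#include <cstdint>

// A type the compiler may legally access with aligned 128-bit moves.
struct alignas(16) Packed { double v[2]; };

// True when p sits on a 16-byte boundary, as movaps-style
// instructions require.
bool IsAligned16(const void* p)
{ return reinterpret_cast<std::uintptr_t>(p) % 16 == 0; }
```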
Ryan H. Lewis
@rhl-
Nov 13 2016 23:22
that's so weird. that code does very little
why is there even SSE involved there?
Jack Poulson
@poulson
Nov 13 2016 23:23
it is just the compiler choosing to move data to/from the stack in 128-bit chunks because it is faster
Ryan H. Lewis
@rhl-
Nov 13 2016 23:23
oh, curious.
@poulson i’ve updated my line search routine now. I am not 100% confident it is correct yet, but I am following the writeup in N&W more carefully
Jack Poulson
@poulson
Nov 13 2016 23:25
+1
Ryan H. Lewis
@rhl-
Nov 13 2016 23:26
their algorithm 3.6 for Zoom is somewhat vague and requires careful attention
in particular, there is an “Interpolation” step, which I initially used bisection for, i’ve now upgraded it to a quadratic interpolant, followed by a potential cubic interpolant.
i do the cubic only if the quadratic value doesn’t provide sufficient decrease
they had a few alternative suggestions on line searches, but, they don’t explain how to combine them with Zoom for BFGS
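(A sketch of one such interpolation step inside Zoom, under the usual N&W setup: fit a quadratic to phi(aLo), phi'(aLo), and phi(aHi), take its minimizer, and fall back to bisection when the result is not safely interior. The function name and safeguard margin here are hypothetical, not the actual routine:)

```cpp
#include <algorithm>
#include <cmath>

// Minimize the quadratic matching phi(aLo), phi'(aLo), phi(aHi).
double QuadraticInterp(double aLo, double phiLo, double dPhiLo,
                       double aHi, double phiHi)
{
    const double d = aHi - aLo;
    const double c = (phiHi - phiLo - dPhiLo * d) / (d * d);
    if (c <= 0)                // no interior minimum: bisect instead
        return aLo + 0.5 * d;
    double aStar = aLo - dPhiLo / (2 * c);
    // Safeguard: reject steps that collapse onto an endpoint.
    const double lower = std::min(aLo, aHi), upper = std::max(aLo, aHi);
    const double margin = 0.1 * std::abs(d);
    if (aStar < lower + margin || aStar > upper - margin)
        aStar = aLo + 0.5 * d;
    return aStar;
}
```

On phi(a) = (a-1)^2 with aLo = 0 and aHi = 2 this recovers the exact minimizer a = 1 in one step, which is why one expects it to beat bisection on smooth problems.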
Jack Poulson
@poulson
Nov 13 2016 23:27
have you observed that making a difference?
Ryan H. Lewis
@rhl-
Nov 13 2016 23:28
Well, I observed that I could generate examples where my existing BFGS would fail to converge
and when it fails it breaks down in really horrible ways
one would think that the quadratic/cubic interpolants can’t be worse than bisection.
Looking at the papers referenced (at least the ones I can actually read) there should be ways to improve the performance (and maybe stability?) of the update steps themselves, although, I regard this as a detail.
Jack Poulson
@poulson
Nov 13 2016 23:32
higher-order interpolation assumes more smoothness
Ryan H. Lewis
@rhl-
Nov 13 2016 23:32
I guess I’m thinking of convex problems
Jack Poulson
@poulson
Nov 13 2016 23:34
l1 penalties are common in convex opt. but are non-smooth
Ryan H. Lewis
@rhl-
Nov 13 2016 23:39
yeah, well, how would you use BFGS in the first place
you need a gradient
l1 is not differentiable..
Jack Poulson
@poulson
Nov 13 2016 23:47
twice-differentiable is preferred for BFGS but not required: http://www.cs.nyu.edu/~overton/papers/pdffiles/bfgs_inexactLS.pdf
I need to give that paper a read
Ryan H. Lewis
@rhl-
Nov 13 2016 23:48
it seems that for l1 regularized logistic regression people recommend not using BFGS
and using a modified quasi-newton approach
with the observation that the l1 function is usually differentiable.
Jack Poulson
@poulson
Nov 13 2016 23:53
it would be interesting to check the claim at the bottom of pg. 24 that the failure is due to rounding error by running in high-precision arithmetic
Ryan H. Lewis
@rhl-
Nov 13 2016 23:55
Writing unit tests for BFGS has proven to be challenging
like how do you test the lineSearch routine?
as a subproblem, what are even some equivalence classes of inputs which are reasonable to try to cover?
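(One answer, as a sketch: since the actual lineSearch signature isn't pinned down here, the reusable piece is a checker for the strong Wolfe conditions; any candidate step on a function with a known answer can then be asserted against it. The function name and default constants below are illustrative:)

```cpp
#include <cmath>

// Strong Wolfe check for a 1-D restriction phi(a) = f(x + a p):
//   sufficient decrease: phi(a) <= phi(0) + c1 * a * phi'(0)
//   curvature:           |phi'(a)| <= c2 * |phi'(0)|
bool SatisfiesStrongWolfe(double phi0, double dPhi0,
                          double phiAlpha, double dPhiAlpha,
                          double alpha, double c1 = 1e-4, double c2 = 0.9)
{
    const bool sufficientDecrease = phiAlpha <= phi0 + c1 * alpha * dPhi0;
    const bool curvature = std::abs(dPhiAlpha) <= c2 * std::abs(dPhi0);
    return sufficientDecrease && curvature;
}
```

For phi(a) = (a-1)^2 the exact step a = 1 passes, while a tiny step like a = 0.01 decreases phi but fails the curvature condition, which gives two natural equivalence classes to cover.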
Jack Poulson
@poulson
Nov 13 2016 23:57
conjecture 7.1 of the linked paper is that BFGS with random initialization converges with probability one if the function is semialgebraic and locally lipschitz
l1 penalties should be both lipschitz and semialgebraic
I haven't thought about semialgebraic functions much, but it seems that the graph being a finite union of sets cut out by polynomial inequalities should be enough (and the l1 penalty should be representable with 2^n linear inequalities)