Well, I observed that I could generate examples where my existing BFGS would fail to converge
and when it fails it breaks down in really horrible ways
one would think that the quadratic/cubic interpolants can’t be worse than bisection.
Looking at the papers referenced (at least the ones I can actually read) there should be ways to improve the performance (and maybe stability?) of the update steps themselves, although, I regard this as a detail.
conjecture 7.1 of the linked paper is that BFGS with random initialization converges with probability one if the function is semi algebraic and locally lipschitz
l1 penalties should be both lipschitz and semialgebraic
I haven't thought about semialgebraic functions much, but it seems that the graph being a finite union of polynomial inequalities should be enough (and the l1 penalty should be representable with 2^n linear inequalities)