    Lorenzo Stella
    @lostella
    here IndAffineUnscaled is a very simple modification of IndAffine, where the three lines that do the scaling are commented out
    the very big difference is in allocations: this is due to the fact that A and b are not scaled in-place in IndAffine
    Lorenzo Stella
    @lostella
    In fact by doing the scaling in-place, allocations get much closer to those of IndAffineUnscaled, and times as well
    but of course in that case one is modifying user-provided data
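    A minimal sketch of the in-place vs. allocating trade-off being discussed (the names here are illustrative, not the actual IndAffine internals):

    ```julia
    using LinearAlgebra

    # Hypothetical row scaling, done in place: no scaled copies of A and b
    # are allocated, but the caller's data is mutated.
    function scale_rows!(A::Matrix{Float64}, b::Vector{Float64})
        for i in 1:size(A, 1)
            s = norm(view(A, i, :))
            if s > 0
                A[i, :] ./= s
                b[i] /= s
            end
        end
        return A, b
    end

    A = [3.0 4.0; 0.0 2.0]
    b = [5.0, 4.0]
    scale_rows!(A, b)   # A and b are now row-normalized in place
    ```

    The allocating variant would instead work on `copy(A)` and `copy(b)`, which keeps the user's data intact at the cost of the extra allocations seen in the benchmark.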
    Lorenzo Stella
    @lostella
    in any case I think this does not prevent us from merging julia0.6 into master, if everything in that branch is up-to-date with the new language features, and then address these other issues
    Mattias Fält
    @mfalt
    This benchmark looks ok, no need to wait with the merge
    I'll try to see if I can find the problematic example
    Mattias Fält
    @mfalt
    Great work on the package and the docs @lostella !
    Lorenzo Stella
    @lostella
    hey, thank you!
    note that the docs automatically deploy: each time something is pushed to master (or merged) then one of the Travis processes writes the compiled docs on the gh-pages branch upon test completion
    also: the “stable” version of the docs keeps track of the last tagged commit, while the “latest” refers to what’s currently on master
    (so for now the “stable” one is not there: we will create a new tag very soon I guess)
    Mattias Fält
    @mfalt
    Hey, I found a bug in my code which was incredibly hard to find. It turns out that the vector form for PSD cones is a bit different from what we are using in this package
    The vector is supposed to store the off-diagonal elements multiplied by sqrt(2) if we want it to be compatible with, for example, Convex.jl and MathProgBase
    Mattias Fält
    @mfalt

    This means that we have to multiply all off-diagonal elements by 1/sqrt(2) before we treat the elements as elements of a matrix, i.e. in prox_naive the line

    X, v = prox_naive(f, Symmetric(X), gamma)

    should be replaced by

    #scale off-diagonal by 1/sqrt(2)
    X, v = prox_naive(f, Symmetric(X), gamma)
    #scale off-diagonal by sqrt(2)

    or, equivalently (and more efficiently), by working with sqrt(2)*X and scaling back:

    n = size(X, 1)
    for i = 1:n
      X[i,i] *= sqrt(2)
    end
    X, v = prox_naive(f, Symmetric(X), gamma)
    for i = 1:n
      X[i,i] /= sqrt(2)
    end
    Mattias Fält
    @mfalt
    @lostella I saw that you are considering removing the structured tests as we have them now. I think there is definitely a point in being able to write specific tests for some functions, but I also think that the structured tests add a lot of value
    Lorenzo Stella
    @lostella
    Hey @mfalt, yes, in fact I’m a bit uncertain about this
    Mattias Fält
    @mfalt
    If possible, it would be nice if we could make sure that it is still easy to automatically test a lot of things. For example, it would be nice to check if the function accepts complex numbers and automatically test that case too (if that's possible)
    Lorenzo Stella
    @lostella
    I’m just trying to find a nicer way (more modular) to organize tests
    Mattias Fält
    @mfalt
    With the structure we have, it should also be fairly simple to have some sort of automatic benchmark of the functions (which we talked about a while ago)
    Lorenzo Stella
    @lostella
    well, so far I have just been isolating those prox_test and call_test methods, so as to be able to use them in whatever test script — they just function as replacements for call and prox that do checks internally
    you’re right
    let’s see if we can find a more modular way to do all this though
    because a long list of dictionaries is not really handy :D
    Mattias Fält
    @mfalt
    yeah, I think you might be right that there is a point in doing it in a more modular way
    as long as we test with prox_test, we could add additional functionality to prox_test that benchmarks and logs, for example
    Lorenzo Stella
    @lostella
    then, the way I see it, the most general structured test routine (instead of having test_calls and test_results and possibly others) would go like this:
    • for a given function f and input x, compute f(x), y = prox(f, x) (and gfx = gradient(f, x) if defined) and verify that they satisfy some prescribed properties
    prescribed properties can be whatever
    y being equal to some point, f(x) or f(y) having some value
    or y satisfying some optimality condition (approximately)
    now for each constructor one should be able to describe the whole suite of test cases
    in a modular way, and then have a script run through the union of all suites of test cases
    Mattias Fält
    @mfalt
    This sounds like a good structure
    Lorenzo Stella
    @lostella
    then yes, every call to f or prox or gradient will be done through the _test methods, in which we could log performance data as well, as you suggest
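    A sketch of what such a wrapper could look like (everything here is hypothetical, not the package's actual prox_test): forward the call, verify a basic prox inequality, and record timing.

    ```julia
    using LinearAlgebra

    const PERF_LOG = Tuple{String, Float64}[]

    # Hypothetical prox_test-style wrapper: runs prox, checks that the prox
    # objective at y is no worse than at x itself, and logs the elapsed time.
    function prox_test(name, prox, f, x, gamma)
        t = @elapsed y = prox(f, x, gamma)
        @assert f(y) + norm(y - x)^2 / (2 * gamma) <= f(x) + 1e-12
        push!(PERF_LOG, (name, t))
        return y
    end

    # Example with f = ||.||_1, whose prox is soft-thresholding
    f(x) = norm(x, 1)
    soft(f, x, gamma) = sign.(x) .* max.(abs.(x) .- gamma, 0)
    y = prox_test("L1", soft, f, [3.0, -0.5], 1.0)   # y == [2.0, 0.0]
    ```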
    Dmitry Volodin
    @mi-volodin
    I am making a
    Dmitry Volodin
    @mi-volodin
    Sorry, bad connection.
    I have deduced from the discussion above that I should make test_graph the main test file for my case, with all kinds of tests, including a copy of the one placed in test_calls. Right?
    So once it becomes modular, it still uses common framework functions like prox_test etc.
    Dmitry Volodin
    @mi-volodin
    @mfalt @lostella
    I think I need your advice. See 2606-2608 of current linux build in Travis CI.
    The IndAffine test failed because prox did not return a point within the set.
    Surprisingly, it worked fine in the macOS build.
    Possibly the randomization is different, so the complex matrix generated on Linux is different. But anyway, f(prox(f,c)) == Inf, which is not good
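    One way to make such failures reproducible is to seed the RNG before generating the test matrix (hedged: this pins down each run, though different Julia versions or builds may still produce different streams). On Julia 0.6 this was `srand(seed)`; on 1.x:

    ```julia
    using Random

    Random.seed!(123)
    A1 = randn(3, 3) + im * randn(3, 3)   # complex test matrix

    Random.seed!(123)
    A2 = randn(3, 3) + im * randn(3, 3)   # same seed => same matrix

    A1 == A2
    ```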
    patwa67
    @patwa67
    I was looking at the paper Giselsson et al. (2016) where a line search is suggested for the ADMM algorithm. I would be very interested to see an example of how this is implemented, for example in the ADMM lasso. It looks straightforward, but there are some sign differences compared to the example here that I don't understand. Does Mattias have any code available?
    Mattias Fält
    @mfalt
    I will take a look and see if I have any code ready for that line search. My plan is that this and other things will be available in https://github.com/mfalt/FirstOrderSolvers.jl if it is not already.
    But that package is currently under development
    Mattias Fält
    @mfalt
    @patwa67 I don't think there is a good implementation out there yet, but it is worth mentioning that the article is mainly about presenting the possibility of doing line search; it is not necessarily best to do line search in the residual direction as proposed. We are still trying to evaluate what the best way to do it is. I'm happy to answer any questions though. Also, take a look at the SuperMann algorithm, another way of doing line search with quasi-Newton directions:
    https://arxiv.org/abs/1609.06955
    I have one implementation here:
    https://github.com/mfalt/FirstOrderSolvers.jl/blob/supermann/src/solvers/supermann.jl
    but it is not tested properly (the whole package is under development, and this is on a separate branch)
    patwa67
    @patwa67

    I have used the following code for the ADMM lasso:

    # Line search (A, b, x, z assumed already defined)
    c = 0.5
    lr = 1.0
    loss(z) = 0.5*norm(A*z - b)^2
    grad = A'*(A*x - b)
    while loss(z) > (loss(x) + grad'*(-x) + (1.0/(2.0*lr))*norm(-x)^2)
      lr = lr * c
      println(lr)
    end

    It seems to work, but I don't know if it is theoretically correct. It is based on the Armijo rule for projected gradient ideas in Bertsekas (2016).
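    For comparison, a hedged sketch of the textbook backtracking condition for proximal gradient on the lasso: the candidate z is recomputed at each trial step, and the direction in the bound is z - x (rather than -x as in the snippet above). The problem data is made up for illustration.

    ```julia
    using LinearAlgebra

    soft(x, t) = sign.(x) .* max.(abs.(x) .- t, 0)  # prox of t*||.||_1

    # Backtracking: shrink lr until the quadratic upper bound holds at the
    # candidate z = prox(x - lr*grad), then return z and the accepted step.
    function backtrack(loss, grad, x, prox; c = 0.5, lr = 1.0)
        z = prox(x - lr * grad, lr)
        while loss(z) > loss(x) + dot(grad, z - x) + norm(z - x)^2 / (2 * lr)
            lr *= c
            z = prox(x - lr * grad, lr)
        end
        return z, lr
    end

    A = [1.0 0.0; 0.0 2.0]; b = [1.0, -1.0]; lam = 0.1
    loss(z) = 0.5 * norm(A * z - b)^2
    x = [0.5, 0.5]
    grad = A' * (A * x - b)
    z, lr = backtrack(loss, grad, x, (v, t) -> soft(v, t * lam))
    ```

    Here the accepted step ends up at 1/L, with L the largest eigenvalue of A'A, which matches the usual sufficient-decrease analysis.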

    patwa67
    @patwa67
    There is a paper (Proximal Algorithms in Statistics and Machine Learning, 2015, Stat.Sci., DOI: 10.1214/15-STS530) that contains some interesting proximal operators (Table 1) that maybe could be of interest for this package.
    Mattias Fält
    @mfalt
    Are we lacking a function for the sum of a quadratic and an affine indicator function, i.e. (1/2)x^T Q x + i(Ax=b)?
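    If such a function were added, its prox has a closed form via a single KKT system: for f(x) = (1/2)x'Qx + i(Ax=b), prox_{γf}(z) solves (I + γQ)x + A'λ = z together with Ax = b. A hedged sketch (the function name is hypothetical, not a package API):

    ```julia
    using LinearAlgebra

    # Hypothetical prox for f(x) = (1/2)x'Qx + indicator{Ax = b}:
    # solve the KKT system  [I + γQ  A'; A  0] [x; λ] = [z; b].
    function prox_quad_affine(Q, A, b, z, gamma)
        n, m = size(Q, 1), size(A, 1)
        K = [(I + gamma * Q) A'; A zeros(m, m)]
        sol = K \ [z; b]
        return sol[1:n]
    end

    Q = [2.0 0.0; 0.0 2.0]
    A = [1.0 1.0]; b = [1.0]
    x = prox_quad_affine(Q, A, b, [0.0, 0.0], 0.5)
    A * x   # feasible: equals b
    ```

    With Q positive semidefinite and A of full row rank the KKT matrix is nonsingular; for repeated prox calls one would factor it once and reuse the factorization.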