Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Activity
    Arie Gurfinkel
    @agurfinkel
    benchmarks vs competition: I propose (a modified) HWMCC model
    1. we collect all benchmarks that can potentially be converted to something the community is interested in. They leave in separate repos and are curated by the group interested in them (e.g., VMT benchmarks from Alberto). We don't worry about format. The community of interest will, since otherwise their benchmarks will not be used by tool builders.
    1. Each competition, we select subset of benchmarks for each category. I don't know the selection process. We will figure it out as we go. The goal is to be somewhat representative, reduce noise (i.e., similar benchmarks), while reducing running time (i.e., you should not need a huge cluster and years of cpu time to reproduce competition run for a few tools)
    As such, right now, we only need to worry what format we want for the competition benchmarks. We will then ensure that during our selection process we end up only with benchmarks that fit the format. This is likely to require conversion tools, but they can be highly specialized to the benchmarks we have now (which is much easier)
    single vs multi-query: it is easier to judge when solving a benchmark == solving one query
    Arie Gurfinkel
    @agurfinkel
    As Philipp points out, it is easy to encode many queries into one. Sometimes, it is useful to decode single query into many (based on tools that produced the benchmark)
    in HWMCC, there are separate tracks: single and multiple for single and multiple properties, respectively. Most tools participate in single track only. Multi-property track has more complicated rules (to address @dddejan concerns) and is geared towards solvers that do something special for multiple queries (as opposed to encode them into a single query). There is also a question of interpretation -- with multiple properties in HWMCC is to detect the subset of queries that are true. This is harder to define for CHC.
    In summary, for this year, let's come up with the format(s) for the competition. Let's populate initial benchmark set for the competition. Figure out how to run everything (starexec, rules, collecting results, t-shirts for participants, sponsors?!, ) etc. And once we have everything going, next year we can discuss how to tweak things to go to the direction community is more interested in (SMT-LIB logic specification, support by other tools (i.e., pysmt), categories, tracks, etc.)
    Arie Gurfinkel
    @agurfinkel
    @/all thanks for commenting on the format. I've incorporated the comments and started a short discussion. The discussion requires your vote, please do so as quickly as you can. For now, I'll maintain the discussion in an ad-hoc google doc. We can use fancier technology if this will start to get out of hand.
    @AdrienChampion , can you look into creating scripts to check that benchmark is in our suggested format, and a convert into that format when it is not hard
    @NikolajBjorner suggested to use z3 with python bindings for that. That would be my choice. @dddejan already tried it out (there is a link to his repo somewhere here)
    @/all suggestions and volunteers are welcome ;)
    Dejan Jovanović
    @dddejan
    Adrien Champion
    @AdrienChampion

    https://github.com/dddejan/chctranslate
    Right?

    According to the readme, it seems to be more than a translation as it produces a version that's not equivalent (although equi-satisfiable). Are we fine with this?

    Arie Gurfinkel
    @agurfinkel
    Well, according to the code it's a tiny example that uses Z3 to parse the input and then executes horn-simplify tactic on the input.
    Equi-satisfiable is ok. More important is that we can guarantee the format
    In the worst case, we will include the benchmark not as it was intended. This is better than including a malformed benchmark.
    Note that we will not (now) try to normalize all the benchmarks. But we will only include normalized benchmarks in the competition
    In other words, benchmarks that are hard to normalize simply will not make it into the competition in this round
    Adrien Champion
    @AdrienChampion

    I modified the code from @dddejan (thanks!) so that it formats the benchmarks. I think the options I use currently produce equivalent benchmarks.

    Further testing is needed, but for now I'd like to put it on a repo somewhere. Should I submit a PR to @dddejan 's repo or create one on chc-comp? Or something else?

    Arie Gurfinkel
    @agurfinkel
    create repo on chc-comp

    Can someone take a look at the code checking the clauses or play with it a bit to find stupid mistakes (if any)?

    I ran it on all the benchmarks from https://github.com/chc-comp/hopv and it looks fine so far. I'll run on other bench repos later.

    Arie Gurfinkel
    @agurfinkel
    @AdrienChampion , I've left some comments in the commit on ghub. Unfortunately, ghub does not make it easy to comment on non-pull-requests. The basic structure looks good. I'll look at it some more
    Adrien Champion
    @AdrienChampion
    I (think I) addressed all your comments. Look in test/check_err and test/check_ok which are system the checker should reject and accept respectively.
    Adrien Champion
    @AdrienChampion
    There's one problem left though, z3 prints the tail of the clause as a let binding when it's above a certain length.
    (let ((a!1 (and <tail>))) (=> a!1 <head>))
    I could not find a way to tell it not to do that. @/all Anyone knows whether there's a solution? Or should I just print the clauses "manually"?
    Arie Gurfinkel
    @agurfinkel
    there is a parameter, but it is global for the whole formula, so it will suppress let binders everywhere in the formula
    printing conjuncts separately is more stable
    Adrien Champion
    @AdrienChampion
    Okay, I'll do that then, thanks
    Arie Gurfinkel
    @agurfinkel
    @/all first draft of the format. Please comment: https://chc-comp.github.io/2018/format.html
    Adrien Champion
    @AdrienChampion

    Since we're formatting everything anyway, wouldn't it be simpler to have

    ```

    (Sorry)
    chc_tail    ::=
      | (and (u_predicate var+)* i_formula*)
    (Also note that (u_predicate var*) needs to be changed to (u_predicate var+) everywhere.)
    Adrien Champion
    @AdrienChampion
    It makes rules (and formatting) easier IMHO.
    Sorry to go back to this but looking at the rules I think u_predicate should either always or never be required to be over distinct variables. (I'm fine either way.) I think it would make the rules simpler and less confusing thanks to uniformity.
    Adrien Champion
    @AdrienChampion
    Sorry, for chc_tail I meant
    chc_tail    ::=
      | (and (u_predicate var+)* i_formula)
    Adrien Champion
    @AdrienChampion
    As Arie said a while ago it's not very convenient to discuss this here, I updated the google doc
    https://docs.google.com/document/d/1JL1_3Rnfaa1qyhV-ZiSX1HCPqECtijnEZ8DsCBY_f0M/edit#
    Arie Gurfinkel
    @agurfinkel
    The only significant chance I can spot is that multiple i_formula must be allowed in chc_tail. I've extended the format with that.
    For nullary predicates, there is no agreement right now
    For distinct variables -- the current agreement is that there is no restriction on what the predicates are applied to in the chc_tail, but the head must be applied to distinct variables
    The rationale is that if the head is not applied to distinct variables, this is adds an implicit equality constraint in chc_tail. It is simpler to not have implicit constraints in the tail.
    The same does not apply to the tail because the implicit constraints are entailed by constraints that are already in the tail
    Dejan Jovanović
    @dddejan
    Was that a typo, or does Adrien really want multiple i_formulas in rules? (considering the "Sorry" comment")
    Adrien Champion
    @AdrienChampion

    I actually don't want anything, neither as a participant for my tool or an organizer for the formatting script. Whatever the rules end up being my tool will support them (modulo bug) and I'll modify the script in consequence. I was trying to make the rules simpler.

    Regarding chc_tail my point was simply that we can remove the first two variants and only keep the and one.

    The formatting script produces systems in a fragment a bit more restrictive than the one from the rules, I'll update the google doc shortly if you want to take a look.

    Adrien Champion
    @AdrienChampion
    Done, see the last page of the google doc. I argue these rules are simpler, but again, I have no reason to champion (!) them. They just happen to be how the script works atm.
    Arie Gurfinkel
    @agurfinkel
    @dddejan , I don't think it is a type. I'd like the tail to have things of the form (and (Inv x y) (> x 0) (> y 0)) instead of (and (Inv x y) (and (> x 0) (> y 0)))
    I don't think it makes a difference to any of the tools
    Adrien Champion
    @AdrienChampion
    Is the formatting script expected to support non-SMT-LIB benchmarks such as the following one?
    https://github.com/chc-comp/hcai-bench/blob/master/svcomp/O3/id_i15_o15_false-unreach-call_true-termination.smt2
    z3 does not like it at all...
    Arie Gurfinkel
    @agurfinkel
    yes. that's z3 specific format :)