Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Activity
  • Oct 21 2017 21:21
    @jpitts banned @Musk55
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<leonardoalt> that solc-js branch is a bit of a mess but makes it a lot easier to run Eldarica
<leonardoalt> hopefully it will get into master soon
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<dawit> Awesome, thank you for the detailed information! definitely gonna try this out
<leonardoalt> no worries, lmk if you run into any trouble
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
This message was deleted
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<dawit> I just went through the above steps to use solc with Eldarica and it works!
<dawit> I only had to change the version of solc in package.json to a more recent one
<leonardoalt> awesome!
<leonardoalt> I wonder why, I usually don't have to do that
<leonardoalt> ah you mean so it downloads a newer soljson?
<dawit> It was complaining about the options divModNoSlacks and invariants being unknown. Once I updated from 0.8.4 to 0.8.11 it worked
<leonardoalt> ah right, yes
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> Can the Solidity compiler output information about loop-heads? EVM bytecode doesn't have structured loops and detecting them at the bytecode level I'm pretty sure is impossible in general. So I think I need some hints from the compiler about "this JUMPI instruction is corresponding to a loop head or not".
<ehildenb> Basically, every way I've come up with for detecting loop-heads at the bytecode level, I've been able to come up with counterexamples (where the detection reports that it's a loop head, even if it's not).
<ehildenb> I'm working with limited knowledge, since I don't have the entire execution trace in-hand, I only have the branch points (which is when there is a JUMPI instruction). So it's possible that by storing more information during execution I can make the detection possible, I haven't considered that yet.
<leonardoalt> I guess looking at bytecode level only you'll always have false positives because of implicit things that become loops, like allocating a new array, ABI decoding/encoding, deleting an array, etc
<ehildenb> Yeah exactly, I have some examples of dynamic inputs, etc... that trigger it falsely sometimes.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> For example, if you have a function signature twoAddress(address a1, address a2), the compiler will factor out the code that computes x == (x & ((2 ^ 256) - 1)), and JUMP to it for each argument. This looks a lot like a loop, in fact when you jump to it the only differences each time you jump are some elements on teh wordstack (and the gas consumed), which looks a lot like a loop.
<ehildenb> The other heuristic I've looked at is "make sure that any elements which are on the wordstack and are valid jump destinations don't change between supposed loop iterations"
<ehildenb> But this is easy to write solidity code that tricks this heuristic (though the code itself is pathological)
<leonardoalt> I guess it shouldn't be hard to add info about loop heads. Is there an issue about that already?
<ehildenb> I don't see one.
<ehildenb> Want me to make one?
<leonardoalt> yea that'd be cool
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<leonardoalt> but here there's no jump back, right?
<leonardoalt> even if those annotations exist, another problem is keeping them across optimizations
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> Hmmmm, let me see if the direction we're jumping can be used as a heuristic.
<ehildenb> But my guess is no that it can't be.
<ehildenb> Because in the example above (the twoAddress function), the compiler can choose to put the code for checking the arguments before or after the main code path that we're in.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<leonardoalt> but does it become a loop in the cfg?
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> Yes, with naive loop detection it odes.
<ehildenb> And I'm not sure there is a more clever way to do it.
<ehildenb> Because it basically jumps to the same code twice, and the only difference is that we're looking at a different offset in the <callData>
<ehildenb> But otherwise the states are structurally identical, so it looks like a loop.
<ehildenb> But like I said, I'm only storing the branch points, so it may bethat if I store more information from betwene branch points, I can infer that this isn't a loop.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> Actually, thinking about this more, I think I'll go with a different more user-oriented approach anyway.Basically we can provide hints about what we think are loops, and in cases we're 100% sure about loop-identification then we can do the loop folding, but otherwise we leave it to the user to identify loops for us.
<ehildenb> Would be a lot easier if there was a distinction between branches and loops in EVM bytecode.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<leonardoalt> ah I see
Eth-Gitter-Bridge
@Eth-Gitter-Bridge

<leonardoalt> @ehildenb if I compile this contract with --asm:

contract L {
    function f(uint x) public pure {
        uint y;
        for (y = 0; y < x; ++y) {
        }
        assert(x == y);
    }
}

I see

/* "loop.sol":59:88  for (y = 0; y < x; ++y) {... */
    tag_9:
...
/* "loop.sol":59:88  for (y = 0; y < x; ++y) {... */
      jump(tag_9)
<leonardoalt> I haven't looked into the srcmap, but now I'm wondering if it's not there

<leonardoalt> Also, in

contract L {
    function f(uint[20] memory a) public pure {
        uint s;
        for (uint i = 0; i < 20; ++i) {
            s += a[i];
        }
    }
}

if you compile with --ir you will see two loops, but with in --asm you can see that one is annotated with the for loop, and the other is annotated with utility.yul, denoting that it's from a compiler routine (here it's abi decoding)

Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> Huh, TIL, that will likely help a lot.
<ehildenb> So it looks like compiler already puts in the information.
<ehildenb> Thank you!
<ehildenb> I am going to have a heuristic that will only attempt to fold a loop if no items on the wordstack are valid jump destinations which change between loop iterations, basically. This should be safe in the sense that we won't detect loops which aren't actually loops (loop abstraction results in loss of control flow precision, so we want to do it as little as possible), but also give the user or external tool the option to explicitely specify where the loops are.
<ehildenb> So that we can work on most bytecode anyway, but if you provide hints (like the above, we'll be able to work on your code too).
<ehildenb> I'm also realizing it's important to make it possible to manually identify loops externally because we want this tool to work with any semantics.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> But I think the heuristic of "wordstack contains no items which change between loop iteration 1 and loop iteration 2 where both the pre and post are valid jump destinations" should work most of the time, unless the code does something really silly like store the jump-back destination in the local memory....
<ehildenb> maybe it's better to instead calculate the state delta, and make sure that nowhere in the state do you have a change from one integer to another where both integers are valid jump destinations. Then you know for sure that your jump structure must be the same. Still would rule out automatic detection of loops where every iteration of the loop changes something in the state from one valid jump destination to another, but is not using that part of the state for actual control flow. Kinda a hacky heuristic.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<leonardoalt> yea I think that makes sense