Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Activity
  • Oct 21 2017 21:21
    @jpitts banned @Musk55
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<leonardoalt> even if those annotations exist, another problem is keeping them across optimizations
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> Hmmmm, let me see if the direction we're jumping can be used as a heuristic.
<ehildenb> But my guess is no that it can't be.
<ehildenb> Because in the example above (the twoAddress function), the compiler can choose to put the code for checking the arguments before or after the main code path that we're in.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<leonardoalt> but does it become a loop in the cfg?
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> Yes, with naive loop detection it odes.
<ehildenb> And I'm not sure there is a more clever way to do it.
<ehildenb> Because it basically jumps to the same code twice, and the only difference is that we're looking at a different offset in the <callData>
<ehildenb> But otherwise the states are structurally identical, so it looks like a loop.
<ehildenb> But like I said, I'm only storing the branch points, so it may bethat if I store more information from betwene branch points, I can infer that this isn't a loop.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> Actually, thinking about this more, I think I'll go with a different more user-oriented approach anyway.Basically we can provide hints about what we think are loops, and in cases we're 100% sure about loop-identification then we can do the loop folding, but otherwise we leave it to the user to identify loops for us.
<ehildenb> Would be a lot easier if there was a distinction between branches and loops in EVM bytecode.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<leonardoalt> ah I see
Eth-Gitter-Bridge
@Eth-Gitter-Bridge

<leonardoalt> @ehildenb if I compile this contract with --asm:

contract L {
    function f(uint x) public pure {
        uint y;
        for (y = 0; y < x; ++y) {
        }
        assert(x == y);
    }
}

I see

/* "loop.sol":59:88  for (y = 0; y < x; ++y) {... */
    tag_9:
...
/* "loop.sol":59:88  for (y = 0; y < x; ++y) {... */
      jump(tag_9)
<leonardoalt> I haven't looked into the srcmap, but now I'm wondering if it's not there

<leonardoalt> Also, in

contract L {
    function f(uint[20] memory a) public pure {
        uint s;
        for (uint i = 0; i < 20; ++i) {
            s += a[i];
        }
    }
}

if you compile with --ir you will see two loops, but with in --asm you can see that one is annotated with the for loop, and the other is annotated with utility.yul, denoting that it's from a compiler routine (here it's abi decoding)

Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> Huh, TIL, that will likely help a lot.
<ehildenb> So it looks like compiler already puts in the information.
<ehildenb> Thank you!
<ehildenb> I am going to have a heuristic that will only attempt to fold a loop if no items on the wordstack are valid jump destinations which change between loop iterations, basically. This should be safe in the sense that we won't detect loops which aren't actually loops (loop abstraction results in loss of control flow precision, so we want to do it as little as possible), but also give the user or external tool the option to explicitely specify where the loops are.
<ehildenb> So that we can work on most bytecode anyway, but if you provide hints (like the above, we'll be able to work on your code too).
<ehildenb> I'm also realizing it's important to make it possible to manually identify loops externally because we want this tool to work with any semantics.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> But I think the heuristic of "wordstack contains no items which change between loop iteration 1 and loop iteration 2 where both the pre and post are valid jump destinations" should work most of the time, unless the code does something really silly like store the jump-back destination in the local memory....
<ehildenb> maybe it's better to instead calculate the state delta, and make sure that nowhere in the state do you have a change from one integer to another where both integers are valid jump destinations. Then you know for sure that your jump structure must be the same. Still would rule out automatic detection of loops where every iteration of the loop changes something in the state from one valid jump destination to another, but is not using that part of the state for actual control flow. Kinda a hacky heuristic.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<leonardoalt> yea I think that makes sense
<leonardoalt> why no changes?
<ehildenb> When the compiler factors out a piece of code to be reused (but it's not a loop, it's just re-used in a bunch of places, eg. internal functions or argument processing routines), it stores the location to jump back to on the wordstack.
<ehildenb> So in the twoAddress example above (which actually comes from a transferFrom example), the CFG builder sees "hey, I've seen this state before", because everything is identical except for (i) the argument it's processing (so the offset into th ecalldata that shows up on the wordstack), and (ii) the argument on the wordstack representing the location to jump back to.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> Specifically, here is what the state delta looks like:
https://cdn.discordapp.com/attachments/798403407076917269/933107250984267886/message.txt
<ehildenb> If you look closely, you'll see the only change in pre and post state are on the <gas> cell, and the <wordStack> cell.
<ehildenb> The <gas> cell change is expected.
<ehildenb> Here we're trying to decide "is this a loop or not?" My old heuristic was "<k>, <pc>, and <callData>" cells are the same, so it returned "true, this is a loop", even though its not, it's just two jumps to the same piece of code which process the address arguments.
<ehildenb> So I'm trying to come up with a better heuristic.
<ehildenb> And the only telling thing that this isn't a loop is that the 6th argument on the <wordStack> goes from 306 => 323, and that is the argument that is used as the "jump-back" destination after the argument processing.
<ehildenb> For a loop, that "jump-back" argument, if it's present (depending on how the loop is structured), should always be the same, so there shouldn't be a state delta.
<ehildenb> I'd rather error on the side of assuming things aren't loops. So in this case, I basically decided that if the state-delta contains a rewrite anywhere from one valid jump destination (306) to another different one (323) (and we already have the valid jump destinations computed in the <jumpDests> cell), then do not consider the two states in looping relation.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> So you can trick the tool into thinking things aren't loops by writing pathalogical code (think something that every iteration of a loop wrote a different valid jump destination to storage slot 0 and did nothing with it), but in that case you just lose out on automatic loop detection.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge

<ehildenb> Not sure if the above makes sense, but this new heuristic:

def kevmShouldSubsume(constrainedTerm1, constrainedTerm2):
    if all([getCell(constrainedTerm1, cn) == getCell(constrainedTerm2, cn) for cn in ['PC_CELL', 'K_CELL', 'CALLDATA_CELL', 'PROGRAM_CELL', 'JUMPDESTS_CELL']]):
        jumpDests   = [ i['args'][0] for i in flattenLabel('_Set_', getCell(constrainedTerm1, 'JUMPDESTS_CELL')) ]
        (state1, _) = splitConfigAndConstraints(constrainedTerm1)
        (state2, _) = splitConfigAndConstraints(constrainedTerm2)
        stateDelta  = pushDownRewrites(KRewrite(state1, state2))
        rewrites    = []
        def _collectRewrites(_k):
            if isKRewrite(_k) and isKToken(_k['lhs']) and isKToken(_k['rhs']):
                rewrites.append((_k['lhs'], _k['rhs']))
            return _k
        traverseBottomUp(stateDelta, _collectRewrites)
        return not any([(i != j) and (i in jumpDests) and (j in jumpDests) for (i, j) in rewrites])
    return False

checks that:

  • All of the <pc>, <k>, <callData>, <program>, and <jumpDests> cells are identical (could probably also add <callDepth> to this one)
  • There are no deltas in the state between two integers that are valid jump destinations.

This correctly identifies the loop in this code:

    function sum(uint n) external pure returns (uint s) {
        s = 0;
        uint i = n;
        while (0 < i) {
            s = s + i;
            i = i - 1;
        }
        return s;
    }

while correctly not identifying a loop in this code:

    function twoAddress(address a1, address a2) public pure returns (uint256) {
        if (a1 == a2) {
            return 0;
        }
        return 1;
    }
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<leonardoalt> ah ok yes, I see now
<leonardoalt> how about actual loops that are internally generated, like abi decoding? What does the new heuristic do?
Eth-Gitter-Bridge
@Eth-Gitter-Bridge

<ehildenb> I'm testing it out. In this example:

contract LoopInput {
    uint x = 0;

    function addLoop(uint256[] memory intArr) public {
        uint256 tmpSum = 0;
        uint256 i      = 0;
        for (i = 0; i < intArr.length; i++) {
            tmpSum += intArr[i];
        }
        x = tmpSum;
    }
}

It seems to correctly detect the loop, but loses too much information during loop abstraction and has messy control-flow afterward.

<ehildenb> It also takes several iterations of the loop to abstract it properly. Not sure if the loop is from the actual for loop I write or from abi decoding.
<ehildenb> Haven't looked closely enough.
<ehildenb> So it detects a loop (purple node), and looks like it's beginning to detect some loops later (blue nodes with self-edges).
https://cdn.discordapp.com/attachments/798403407076917269/933357163680788510/Screenshot_from_2022-01-19_06-47-24.png
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<leonardoalt> but there should be two actual loops
<leonardoalt> yours and the decoding one
<ehildenb> Yeah, I think so.
<ehildenb> Just need to wait, it's very slow.
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
Eth-Gitter-Bridge
@Eth-Gitter-Bridge
<ehildenb> Well gosh darn, it worked! It shows the second loop (the actual for-loop I wrote) 3 times because it took two iterations of the first loop (the abi decoding) to find out that it was indeed a loop. So the first copy of the last loop is "if there were no iterations of the first loop", the second one is "if there was one iteration of the first loop", and the last one is "if there were N>1 iterations of the first loop".
https://cdn.discordapp.com/attachments/798403407076917269/933507949320601690/Screenshot_from_2022-01-19_16-45-02.png