These are chat archives for ethersphere/orange-lounge

29th
Sep 2017
Viktor Trón
@zelig
Sep 29 2017 07:15
but does it pass eventually?
@lmars I asked @fjl to advise on the hack commit. I dont think i want to implement the dial peer myself. Could you quickly recap what issue you ran into here? On the face of it, implementing it appears trivial
lash
@nolash
Sep 29 2017 08:20
It always passed so far
So actually, maybe my wording is a bit off. Perhaps it hangs, but the teardown after timeout works, at least.
Viktor Trón
@zelig
Sep 29 2017 08:47
well if there is a timeout then it does not pass :)
lash
@nolash
Sep 29 2017 08:47
I'll narrow my wording further'.
It says "PASS" in the terminal.
But yes, on the surface it seems like the same behavior as earlier, although I think it takes more nodes to get into trouble. I will try to implement a quick pss run later today and see how it goes.
What should be the actual state of the discovery_test.go for it to pass?
Viktor Trón
@zelig
Sep 29 2017 09:03
it is a proper test mate, it is says PASS it means it passed, i.e, starting from a ring, it bootstraps a completely healthy kademlia topology
the definition of health is:
  • all the nearest neighbours according to the test oracle peerpot ARE actually found and connected in the proxbin AND
  • all bins from 0 to proxbin have at least one connection (unless there are actually no such peers according to the oracle peer pot)
lash
@nolash
Sep 29 2017 09:06
Sounds good
Viktor Trón
@zelig
Sep 29 2017 09:07
looking forward to pss tests :fire:
Lewis Marshall
@lmars
Sep 29 2017 10:25
@zelig with respect to adding DialPeer, I spent about an hour on it and realised it would probably take me about another few hours at least, and I didn't have the time
Aron
@homotopycolimit
Sep 29 2017 14:41
Just booked my flights to DevCon3 :D
holisticode
@holisticode
Sep 29 2017 17:37
@lmars thanks for your latest review
I addressed most points, but there are two things left to discuss
the other one is a weird behavior I had with my latest version of the mocker_test.go
If I don't add this line to p2p/simulations/mocker.go:
the following net.Connect() fails because it's saying that node X doesn't exist - as if it wouldn't have time to start up the nodes
holisticode
@holisticode
Sep 29 2017 17:42
this is weird as we have been using this same code in other parts and never had any problems
maybe it's related to the new code in p2p/simulations/mocker_test.go, but I can't see why
holisticode
@holisticode
Sep 29 2017 17:49
basically this test now added a subscription to events in order to address your comment regarding waiting for all upevents instead of letting the code time.Sleep().
Without this subscription, I don't get the problem described above, but if I add it, I need to add the timeout in mocker.goto avoid the problem....pretty weird in my eyes, as it doesn't seem to be related?
I know you are pretty busy, just wanted to progress on the PR addressing your comments and reporting my status
Lewis Marshall
@lmars
Sep 29 2017 19:28
that does look odd, I'll take a closer look later
as for the "why is this in a goroutine?" question, if there is no reason for it to be then take it out of the goroutine
holisticode
@holisticode
Sep 29 2017 19:29
ok
here's the stack trace for the other problem:
=== RUN   TestMsgFilterFailBadParams
--- PASS: TestMsgFilterFailBadParams (0.00s)
=== RUN   TestMocker
--- PASS: TestMocker (0.01s)
=== RUN   TestNetworkSimulation
panic: Could not startup node network for mocker

goroutine 679 [running]:
github.com/ethereum/go-ethereum/p2p/simulations.probabilistic(0xc4202b0300, 0xc420242b40, 0xa)
    /home/fabio/go/src/github.com/ethereum/go-ethereum/p2p/simulations/mocker.go:107 +0x5d1
created by github.com/ethereum/go-ethereum/p2p/simulations.(*Server).StartMocker
    /home/fabio/go/src/github.com/ethereum/go-ethereum/p2p/simulations/http.go:348 +0x1c1
exit status 2
FAIL    github.com/ethereum/go-ethereum/p2p/simulations    0.107s
what I is baffling to me:
It seems TestMockeractually PASS, but it's TestNetworkSimulation which panics?
But the latter doesn't use the mocker???
And adding the mocker makes everything pass....
Lewis Marshall
@lmars
Sep 29 2017 19:32
add the error message to your panic
we should also avoid code that panics
holisticode
@holisticode
Sep 29 2017 19:33
I guess though it's due to this line in http.go:
go mockerFn(s.network, s.mockerStop, nodeCount)
so the panic occurs in the mockerFn, the test completes and thus the panic occurs if it's later
Lewis Marshall
@lmars
Sep 29 2017 19:35
right, so really the mocker should be stopped when the network shuts down
so perhaps mockerStop needs to be on Network and then net.Shutdown closes mockerStop
otherwise, the mocker will be trying to take actions on a network which is no longer running
holisticode
@holisticode
Sep 29 2017 19:37
ok this sounds like right
but before we do that
why is it happening at all
    //time.Sleep(3 * time.Second)
    for i, id := range ids {
        peerID := ids[(i+1)%len(ids)]
        if err := net.Connect(id, peerID); err != nil {
      fmt.Println(err)
            log.Error("Error connecting a node to a peer! %s", err)
            return nil, err
        }
    }
if the Sleep above is commented, the Println puts node 79724ad3e957268f69a9b590a0b8d2dd5a2a30ce01a751c36fda1ca206582fad8ab44f5671af0aaacc70ee988697c9d150c049d372873a01373dc557cf33cecd does not exist
if the Sleep'is activated, there's no problem....
you actually said you'd have a look later, so don't worry
Lewis Marshall
@lmars
Sep 29 2017 19:39
do you print after net.NewNode to check it does exist?
holisticode
@holisticode
Sep 29 2017 19:39
I am just trying to make the problem clearer
I've done debugging yes, and it seems to be there
Lewis Marshall
@lmars
Sep 29 2017 19:41
is perhaps net.Reset being called somewhere? I've got to go, back later
holisticode
@holisticode
Sep 29 2017 19:42
don't worry, thanks a lot
Reset is called, but later
and net.Shutdown() is called in a defer func() {}
holisticode
@holisticode
Sep 29 2017 19:50
I believe I know what it is
the subscription for events only listens for node "up" events
If it got all 10 node up events, it thinks the job is done and continues with the test
which at some points issues Reset()
in the meantime though the mocker is trying to do net.Connet(), which then breaks
so either we leave a time.Sleep()somewhere, to give the mocker some time to complete it's code
or we need the subscription to wait for more events before considering the test complete (as it is a connectoin in ring, we could probably just wait for nodeCount connect events?)
holisticode
@holisticode
Sep 29 2017 20:04
hmmm not sure what's the safest amount here though - as it seems as the network starts to build up, more conn events are generated than the "literal" ones due to net.Connect()
this seems to work but I'd welcome an informed guess on what's safe here:
          if connCount == (nodeCount-1)*2 {
            wg.Done()
                        return
lash
@nolash
Sep 29 2017 20:55
I am getting "already connected" errors in the network sim when loading snapshot, with discovery disabled. That shouldn't happen, right?
Lewis Marshall
@lmars
Sep 29 2017 21:54
@holisticode the mocker should be stopped in net.Shutdown, that way it won't be trying to do anything once the test passes and the network shuts down
@nolash that ticker just keeps checking for healthy states every second
it is to prevent a race where an event is received which should make a node healthy but the node hasn't yet reached that conclusion, so we just keep checking
Lewis Marshall
@lmars
Sep 29 2017 22:00
@holisticode I must say, it seems this background goroutine style mocking is causing bugs and race conditions which would not exist if the mocking was done client side :grin:
holisticode
@holisticode
Sep 29 2017 22:44
might well be, but on the other hand we wouldn't be able to test the same way as now, and miss things (not impossible, just matter of effort, arch, style,...)