    Christopher Davenport
    @ChristopherDavenport
    Was that fork the issue? That looks like a promising lead on what would result in a deadlock.
    Tim Spence
    @TimWSpence
    Sorry @ChristopherDavenport what looks like it would result in deadlock? I’ve got a branch with a rewrite of the pending txn logic which works most of the time but still deadlocks occasionally so there’s obviously something I’m still missing :sob:
    Christopher Davenport
    @ChristopherDavenport
    Do you have any work I could base investigation on? Would love to try to find whatever deadlock we’ve run into.
    Tim Spence
    @TimWSpence
    Thanks Chris! Yeah, I’ve struggled to find time the last week. I would love to fix it too! I’ve got some stuff I’ll push tomorrow morning
    Tim Spence
    @TimWSpence
    Hey @ChristopherDavenport I actually merged some of my changes as they fix some definite bugs around retries and they don’t always lock up when running the Santa Claus problem. I’ve also pushed to a branch debug-santa-claus that you might want to look at. At the moment, all it has is two print statements looping in fibers, but the interesting thing is that they keep printing even when the Santa problem stops, so I don’t think we have thread deadlock. It looks to me like we aren’t correctly identifying/tracking opportunities to retry txns. The print output shows that it gets stuck in a state where there are some pending transactions that never get retried
    Oleg Pyzhcov
    @oleg-py

    It looks to me like we aren’t correctly identifying/tracking opportunities to retry txns

    I had exactly the same issue in my STM, btw, and it was due to two threads acting on retries in parallel, one beginning a retry the very same moment another added one on top.

    • Everything just worked in a single thread setting
    • I ended up using a transaction ID to check "if a last-transaction-written has changed for any of the variables, retry immediately" (though it sounds easier than it was to actually fix, it took me several attempts)
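
A minimal sketch of that transaction-ID scheme (VersionedVar and the other names are illustrative, not the actual API of either STM): each variable records the ID of the last transaction that wrote it, and a transaction about to park for retry first checks whether any variable it read has a newer writer, retrying immediately if so.

```scala
import java.util.concurrent.atomic.AtomicLong

object VersionStamps {
  private val txnIds = new AtomicLong(0L)

  def freshTxnId(): Long = txnIds.incrementAndGet()

  // Each variable remembers the ID of the last transaction that wrote it.
  final class VersionedVar[A](initial: A) {
    @volatile private var value: A         = initial
    @volatile private var lastWriter: Long = -1L

    def read: (A, Long)                = (value, lastWriter)
    def write(a: A, txnId: Long): Unit = { value = a; lastWriter = txnId }
  }

  // Before parking a retrying transaction: if any variable it read has
  // been written since, retry immediately rather than wait for a wake-up
  // that may already have been consumed by a parallel retry.
  def changedSinceRead(readSet: List[(VersionedVar[_], Long)]): Boolean =
    readSet.exists { case (v, seenWriter) => v.read._2 != seenWriter }
}
```
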
    Tim Spence
    @TimWSpence
    Thanks @oleg-py I keep meaning to check your implementation out, it looks really interesting! In theory, all my state is modified inside a global lock so mine shouldn’t have that problem but I could well have missed something!
    Also a big fan of better-monadic-for btw :)
    Oleg Pyzhcov
    @oleg-py
    If I understand your code correctly, you modify the var state in lock, but not the callbacks in AtomicReferences
    You can also probably use STM.synchronized { ... } instead of semaphore for easy scala-js compatibility (synchronized does nothing on SJS, but using Java semaphore causes linking errors)
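
A sketch of that substitution, assuming the lock only guards short critical sections (GlobalLock and withLock are illustrative names, not cats-stm's API):

```scala
// Object-monitor `synchronized` links on both the JVM and Scala.js
// (where it is a no-op, since SJS is single-threaded), while
// java.util.concurrent.Semaphore fails to link on Scala.js.
object GlobalLock {
  // Instead of:
  //   private val lock = new java.util.concurrent.Semaphore(1)
  //   def withLock[A](a: => A): A = { lock.acquire(); try a finally lock.release() }
  def withLock[A](a: => A): A = this.synchronized(a)
}
```
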
    Tim Spence
    @TimWSpence
    @oleg-py when did you look at the code? I believe that should no longer be the case but it is Friday evening so who knows :joy: If it is still true, I would definitely like to know!
    Oleg Pyzhcov
    @oleg-py
    I looked yesterday :D Well, you still rerun pending without a lock (in flatTap), so it can race with adding them (with a lock). Also flatTap can cause problems wrt cancellation
    Tim Spence
    @TimWSpence
    😂 But the set of txns to rerun is collected while the lock is held? Or am I just being a complete idiot?
    Also thanks, what problems can it cause? I thought async was uncancelable anyway?
    Oleg Pyzhcov
    @oleg-py
    Yeah, but the code in flatTap isn't
    Oleg Pyzhcov
    @oleg-py
    Didn't notice the collection - not sure it's enough though, because another transaction can happen just between the collection and execution
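
A sketch of the window being described (hypothetical shape, not the actual cats-stm code): the pending set is snapshotted under the lock but rerun after it is released, so a transaction that registers itself in between is never woken.

```scala
import scala.collection.mutable

// Snapshot under the lock, rerun outside it.
def commitAndRerunPending(pending: mutable.Set[Runnable], lock: AnyRef): Unit = {
  val snapshot = lock.synchronized {
    val s = pending.toList
    pending.clear()
    s
  }
  // <-- a new transaction can add itself to `pending` right here, after
  //     the snapshot but before any rerun, and nobody will ever retry it
  snapshot.foreach(_.run())
}
```
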
    Tim Spence
    @TimWSpence
    Thanks for the tip about synchronized! I tried that and moved the retry logic inside the lock, but that doesn’t fix it
    Which I think makes sense - if txn 1 succeeds and then txn 2 retries before 1 calls rerunPending, 2 shouldn’t be eligible to rerun anyway
    I think :joy:
    Oleg Pyzhcov
    @oleg-py
    Well, you might have a different problem with your implementation :D
    Tim Spence
    @TimWSpence
    :joy: Thanks for the help anyway! I made the change to synchronized anyway - it also simplifies the logic as it’s safe to call while you already have the lock. Really struggling to see where the bug is hiding now...
    Tim Spence
    @TimWSpence
    So I found the issue… TimWSpence/cats-stm#21 :joy: Seems like there might be some work to do on fairness - on my machine at least it seems to eventually only run the reindeer but apart from that it seems stable now
    @ChristopherDavenport in particular :point_up:
    Christopher Davenport
    @ChristopherDavenport
    Fairness is much better than deadlock.
    Tim Spence
    @TimWSpence
    Agreed!! I’m making a release now so that we have a working release out there
    Ross A. Baker
    @rossabaker
    :tada:
    Tim Spence
    @TimWSpence
    a depressingly simple fix in the end
    Christopher Davenport
    @ChristopherDavenport
    Well removing a successful transaction seems… reasonable. :smile:
    Ross A. Baker
    @rossabaker
    $1 to remove the line, $9999 to know which one to remove.
    Christopher Davenport
    @ChristopherDavenport
    Note to self: Figure out how to get Ross’s consulting rates.
    Tim Spence
    @TimWSpence
    😂
    Ross A. Baker
    @rossabaker
    That one is often ascribed to Nikola Tesla visiting Henry Ford. Would have been a real good rate back then.
    Christopher Davenport
    @ChristopherDavenport
    That change looks great. So now for transaction fairness. I think the way the problem is designed, the action happens on the side, so once we always have all the reindeer, that branch wins via actions.reduceLeft(_.orElse(_))
    The print would need to take longer than 0.01 seconds. Going to retry with higher values, as it may be a matter of the concurrency rather than the atomicity.
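
For illustration, a sketch of the left bias being described (simplified signatures, not cats-stm's actual types): orElse tries its right argument only when the left retries, so a branch that can always commit starves everything folded in after it.

```scala
// Simplified, illustrative STM action type.
trait Txn[A] {
  // Run `this`; fall through to `that` only if `this` retries.
  def orElse(that: Txn[A]): Txn[A]
}

// The composition from the Santa Claus example: a left-biased choice.
// If the head action (say, "all reindeer ready") can always commit,
// no action to its right ever runs.
def choose[A](actions: List[Txn[A]]): Txn[A] =
  actions.reduceLeft(_.orElse(_))
```
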
    Christopher Davenport
    @ChristopherDavenport
    So I introduced a CS to shift, and there’s no change to the final behavior.
    Christopher Davenport
    @ChristopherDavenport
    @TimWSpence So how are we doing? I’m still a little worried about our blockage here.
    Tim Spence
    @TimWSpence
    @ChristopherDavenport sorry, haven’t had much time to work on this recently! What did you mean by blockage sorry? I’ve had a bit of a look into retry fairness but haven’t gotten to the bottom of it yet
    PS you can see my attempted fix here: TimWSpence/cats-stm@11aef71 I might actually merge this anyway as it seems like the right thing to do - we will now retry transactions that have been waiting longer first. However it still doesn’t seem to fix the behaviour of the Santa Claus problem
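
A minimal sketch of the ordering that fix aims for (hypothetical types; the real change is in the linked commit): keep pending transactions in FIFO order so the longest waiter retries first.

```scala
import scala.collection.immutable.Queue

// Hypothetical pending-transaction record; the real one lives in cats-stm.
final case class Pending(txnId: Long, rerun: () => Unit)

final class RetryQueue {
  private var queue = Queue.empty[Pending]

  def register(p: Pending): Unit =
    synchronized { queue = queue.enqueue(p) }

  // Drain in arrival order, so the transaction waiting longest retries first.
  def drainOldestFirst(): List[Pending] =
    synchronized {
      val all = queue.toList
      queue = Queue.empty
      all
    }
}
```
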
    Tim Spence
    @TimWSpence
    Oh sorry, I just re-read your previous comment. I’ll try combining my change above with the one you suggested and see if that fixes it
    Christopher Davenport
    @ChristopherDavenport
    The issue is that without some level of fairness, we are in a position where I can’t reliably expect an action that should logically occur to actually happen.
    As in, the elves are ready but their action never happens, and not because they weren’t waiting - they obviously were done.
    Tim Spence
    @TimWSpence
    Yeah, so I believe the change linked above should fix the fairness. But I’m suspicious of this as well: https://github.com/TimWSpence/cats-stm/blob/master/examples/src/main/scala/io/github/timwspence/cats/stm/SantaClausProblem.scala#L74 I’m just trying to think through what that will do, but creating new gates every time looks strange to me?
    And yes, I agree that lack of fairness is a significant problem :joy:
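
For context, a sketch of the "new gates every time" pattern (following the Peyton Jones "Beautiful Concurrency" formulation; the names and Deferred-based gates here are illustrative, not the code at the link): Santa allocates fresh gates each round, so a helper arriving for the next round cannot pass the previous round's gate.

```scala
import scala.concurrent.ExecutionContext
import cats.effect.{ContextShift, IO}
import cats.effect.concurrent.Deferred

object FreshGates {
  implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)

  // A gate a group of helpers must pass through; opened by Santa.
  final case class Gate(capacity: Int, open: Deferred[IO, Unit])

  def newGate(capacity: Int): IO[Gate] =
    Deferred[IO, Unit].map(Gate(capacity, _))

  // Fresh gates every round: helpers hold a reference to *this* round's
  // gates, so next round's helpers can't slip through them early.
  def santaRound(run: (Gate, Gate) => IO[Unit]): IO[Unit] =
    for {
      in  <- newGate(9)
      out <- newGate(9)
      _   <- run(in, out)
    } yield ()
}
```
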
    Tim Spence
    @TimWSpence
    I think it is fine actually, it just wasn’t what I expected at first. Which leaves me back at square one :sob:
    Christopher Davenport
    @ChristopherDavenport
    Yeah. :cry:
    Tim Spence
    @TimWSpence
    Timer[IO].sleep(n.micros) - this shouldn’t block a thread, should it? If I add a 0 to the random int bound then the whole program seems to block sometimes
    Christopher Davenport
    @ChristopherDavenport
    It shouldn’t block a thread, I believe
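
A sketch of why that holds (plain cats-effect 2 APIs; SleepDemo is an illustrative name): Timer[IO].sleep registers a scheduled wake-up and suspends only the fiber, so many concurrent sleeps never park a thread.

```scala
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._
import cats.effect.{ContextShift, IO, Timer}
import cats.implicits._

object SleepDemo {
  implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)
  implicit val timer: Timer[IO]     = IO.timer(ExecutionContext.global)

  // 10,000 fibers sleeping concurrently: no JVM threads are parked, the
  // sleeps are just scheduled callbacks. So if the whole program hangs,
  // the cause is elsewhere (e.g. a lost wake-up, not a starved pool).
  val manySleepers: IO[Unit] =
    (1 to 10000).toList.parTraverse(_ => Timer[IO].sleep(100.micros)).void
}
```
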
    Tim Spence
    @TimWSpence
    Yeah, I didn’t think so. I’ve probably done something else stupid
    Tim Spence
    @TimWSpence
    @ChristopherDavenport good news! After a 7-month sabbatical (to be fair, I was on paternity leave for quite a bit of it), I’ve fixed our issues with this :)
    (Or at least the Santa Claus example now behaves as expected)