Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Repo info
Activity
    Rohan Sircar
    @rohan-sircar
    The scaladex page isn't working, or is it just me? https://index.scala-lang.org/com-lihaoyi/os-lib
    utgheith
    @utgheith
    What guarantees does (should) os.watch give? exactly-once? at-most-once? at-least-once? The documentation hints at "at-least-once" but I observed that the behavior varies by system/environment and there is a test case that fails about 25% of the time because of a missing event.
    Li Haoyi
    @lihaoyi
    at least once
    We use os.watch ridiculously heavily at work (basically reimplementing Dropbox) and it generally works, so not sure what isues you're hitting
    like it can pick up and deliver 10000s of file/folder events per second on both OSX and linux. Once in a while I've noticed OSX giving me events out of order, but that's Apple's fault and not much I can do to paper over
    utgheith
    @utgheith
    One of the os.watch test cases fail about 25% of the time when github actions runs it. I can reproduce the failure locally in Docker with much lower probability. I believe it is caused by event reordering but wanted to confirm the expected semantics before attempting to fix it.
    checkChanges(
              os.remove.all(wd / "folder4"),
              Set(
                os.sub / "folder4",
                os.sub / "folder4" / "nestedA",
                os.sub / "folder4" / "nestedA" / "a.txt",
                os.sub / "folder4" / "nestedB",
                os.sub / "folder4" / "nestedB" / "b.txt"
              )
            )
    Every now and then the delete folder4 event arrives before the delete folder4/nestedA (or B) event. This causes the implementation to cancel the folder/nestedA watch and that update is lost.
    utgheith
    @utgheith
    I don't see how "at-least-once" would work given the possibility of overflow but I'm probably missing something. Just wanted to confirm the expected behavior before digging deeper.
    Li Haoyi
    @lihaoyi
    oh we don't run it inside docker, containerized filesystems may have their own quirks
    we only run our stuff uncontainerized
    the stuff which uses os.watch anyway
    might be fixable, as long as you can figure out what the empirical behavior of the docker filesystem event stream is
    fixable/work-aroundable/paper-overable
    utgheith
    @utgheith
    Makes sense. I don't recall seeing problems when running natively.
    utgheith
    @utgheith

    I take it back. Running something like:

    declare -i i=0; while ./mill -i os.watch.jvm[2.13.4].test; do i=$((i+1)); echo $i; done

    Produced this output:

    [103/103] os.watch.jvm[2.13.4].test.test 
    -------------------------------- Running Tests --------------------------------
    + test.os.watch.WatchTests.singleFolder 5784ms  
    1
    [103/103] os.watch.jvm[2.13.4].test.test 
    -------------------------------- Running Tests --------------------------------
    + test.os.watch.WatchTests.singleFolder 5786ms  
    2
    [103/103] os.watch.jvm[2.13.4].test.test 
    -------------------------------- Running Tests --------------------------------
    + test.os.watch.WatchTests.singleFolder 5778ms  
    3
    [103/103] os.watch.jvm[2.13.4].test.test 
    -------------------------------- Running Tests --------------------------------
    + test.os.watch.WatchTests.singleFolder 5786ms  
    4
    [103/103] os.watch.jvm[2.13.4].test.test
    ...
    + test.os.watch.WatchTests.singleFolder 5783ms  
    50
    [103/103] os.watch.jvm[2.13.4].test.test 
    -------------------------------- Running Tests --------------------------------
    + test.os.watch.WatchTests.singleFolder 5794ms  
    51
    [103/103] os.watch.jvm[2.13.4].test.test 
    -------------------------------- Running Tests --------------------------------
    X test.os.watch.WatchTests.singleFolder 3200ms 
      utest.AssertionError: assert(expectedChangedPaths == changedSubPaths)
      expectedChangedPaths: Set[os.SubPath] = HashSet(folder4/nestedB, folder4/nestedA, folder4, folder4/nestedA/a.txt, folder4/nestedB/b.txt)
      changedSubPaths: scala.collection.mutable.Set[os.SubPath] = HashSet(folder4/nestedA/a.txt, folder4/nestedB, folder4, folder4/nestedB/b.txt)
        utest.asserts.Asserts$.assertImpl(Asserts.scala:30)
        test.os.watch.WatchTests$.checkChanges$1(WatchTests.scala:52)
        test.os.watch.WatchTests$.$anonfun$tests$3(WatchTests.scala:93)
        test.os.TestUtil$.prep(TestUtil.scala:71)
        test.os.watch.WatchTests$.$anonfun$tests$2(WatchTests.scala:7)
    1 targets failed
    os.watch.jvm[2.13.4].test.test test.os.watch.WatchTests test.os.watch.WatchTests.singleFolder
    utgheith
    @utgheith
    Fails less often than when run in Docker but it still fails.
    Li Haoyi
    @lihaoyi
    what OS and filesystem are you on? that's probably the difference
    utgheith
    @utgheith
    I ran a few more experiments. The following configuration fails (as above) about 2% of the time:
    file system: ext4
    uname -a : Linux linux3 5.8.0-43-generic #49~20.04.1-Ubuntu SMP Fri Feb 5 09:57:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
    java -version: openjdk 11.0.10 2021-01-19
    Same configuration with ZFS finished 5000 iterations without failure.
    MacOS (11.2.3), openjdk version "15.0.2" 2021-01-19 also run ~5000 iterations without failure
    I will track this down and find the root cause but I wanted to confirm the expected behavior with you first.
    utgheith
    @utgheith
    It would be great if you took a look at the test case and confirmed that is has the correct expectations and I'll take it from there.
    Li Haoyi
    @lihaoyi
    your expectations are correct
    it should receive events in thhe order of deletion, which is children first before parents
    utgheith
    @utgheith

    Question 1: consistency between MacOS and Linux. Removing the watched directory (an obvious corner-case) behaves differently on MacOS and Linux. What is the expectation? Maybe it should be documented as undefined behavior?

    [Sample program] (https://github.com/utgheith/watch/blob/master/src/main/scala/main.scala)

    MacOS

    event
      data

    Linux

    event
      data/a
      data/b
    Li Haoyi
    @lihaoyi
    yeah possibly, we don't use that functionality at work so not surprising that it's inconsistent
    utgheith
    @utgheith

    Here is a stress test that leads to failure in 3 native environments: MacOS, Ubuntu/ext4, Ubuntu/ZFS. There seem to be 2 separate issues:

    (1) a race condition that causes the watch thread to die with FileNotFoundException.
    (2) events get reordered leading to premature watch cancelation on some subdirectories

    utgheith
    @utgheith
    I see the race condition leading to the first issue and the solution appear to be simple so far.
    The second problem requires a bit more thought. Ideas: (1) delay event processing and order delete events (reduces the probability of failure), (2) generate synthetic delete triggers for missing directories (will generate more events than expected and doesn't handle missing file deletes), (3) Don't use WatchService (not obvious how it would help), ... I'll keep digging.
    utgheith
    @utgheith
    Would it be acceptable for os.watch.watch to produce more events than strictly promised (with low probability)?
    Li Haoyi
    @lihaoyi
    for my use case, sure
    it doesn't really have a rigorous spec lol
    utgheith
    @utgheith
    If that's acceptable then there is a simple fix: the implementation os.watch.watch keeps track of all registered paths and refreshes the list on each event. We can generate a synthetic DELETE event for any missing entries.
    Li Haoyi
    @lihaoyi
    i think it sounds reasonable
    maybe put up a PR describing the problem and solution in more detail?
    utgheith
    @utgheith
    Implications:
    (1) only works for directories but I've empirically noticed that WatchService only drops directory DELETE events (at least in the stress tests I tried)
    (2) might surprise applications that are not prepared for the extra DELETE events
    (3) would violate the stated behavior for moved directories (for those who might interpret it as anything but a lower bound)
    (4) would require changing the tests to use set inclusion instead of equality (change in one function)
    Will do. Either way I think it is critical that we get this issue out of the way because it is blocking unrelated pull requests.
    Li Haoyi
    @lihaoyi
    I think duplicate events is probably fine
    utgheith
    @utgheith
    But it would be more than duplicate events in this case. It might generate "unexpected" events. For example: I interpret the following paragraph as a lower bound but someone that interprets it as strict equality might be confused by seeing DELETE events for sub directories of moved directories:
    Once the call to watch returns, onEvent is guaranteed to receive a an event containing the path for:
    
    - Every file or folder that gets created, deleted, updated or moved within the watched folders
    
    - For copied or moved folders, the path of the new folder as well as every file or folder within it.
    
    - For deleted or moved folders, the root folder which was deleted/moved, but without the paths of every file that was within it at the original location
    Li Haoyi
    @lihaoyi
    entirely unexpected events sounds a bit tough
    utgheith
    @utgheith
    But we're already in violation of the 3rd assertion (using the strict interpretation). DELETE events are being generated for deleted sub-folders but not for moved ones.
    Li Haoyi
    @lihaoyi
    yeah
    utgheith
    @utgheith

    I apologize for all the spam but here is a simple test that shows that the problem is in WatchService. os.watch is likely masking some of those issues in its implementation but not all.

    https://github.com/utgheith/watch3/blob/master/src/main/scala/main.scala

    utgheith
    @utgheith

    It seems that readme.md is using travis-ci badges instead of the github actions ones. Clicking on the build status goes back to a January build (telling me that the switch occurred around this time). Tried to submit a pull request that fixes that but it gets stuck and never runs the tests. This seems to be related to an issue that I don't fully understand but requires special privileges to deal with anyway. Here is a link:

    https://github.community/t/expected-waiting-for-status-to-be-reported/16727

    Lorenzo Gabriele
    @lolgab
    @utgheith The build now has a name: ci that's why the statuses don't match. Can you please rebase the PR on top of current master?
    utgheith
    @utgheith
    done
    Li Haoyi
    @lihaoyi
    @utgheith I would not be surprised by bugs in the underlying WatchService, would be nice if we could paper over it though. Not sure hwo hard that would be
    utgheith
    @utgheith
    How about using intotify directly? Similar to what happens on MacOS. But first I need to convince myself that the issues in WatchService are not cause by inotify :-)