    utgheith
    @utgheith
    I don't see how "at-least-once" would work given the possibility of overflow but I'm probably missing something. Just wanted to confirm the expected behavior before digging deeper.
    Li Haoyi
    @lihaoyi
    oh we don't run it inside docker, containerized filesystems may have their own quirks
    we only run our stuff uncontainerized
    the stuff which uses os.watch anyway
    might be fixable, as long as you can figure out what the empirical behavior of the docker filesystem event stream is
    fixable/work-aroundable/paper-overable
    utgheith
    @utgheith
    Makes sense. I don't recall seeing problems when running natively.
    utgheith
    @utgheith

    I take it back. Running something like:

    declare -i i=0; while ./mill -i os.watch.jvm[2.13.4].test; do i=$((i+1)); echo $i; done

    Produced this output:

    [103/103] os.watch.jvm[2.13.4].test.test 
    -------------------------------- Running Tests --------------------------------
    + test.os.watch.WatchTests.singleFolder 5784ms  
    1
    [103/103] os.watch.jvm[2.13.4].test.test 
    -------------------------------- Running Tests --------------------------------
    + test.os.watch.WatchTests.singleFolder 5786ms  
    2
    [103/103] os.watch.jvm[2.13.4].test.test 
    -------------------------------- Running Tests --------------------------------
    + test.os.watch.WatchTests.singleFolder 5778ms  
    3
    [103/103] os.watch.jvm[2.13.4].test.test 
    -------------------------------- Running Tests --------------------------------
    + test.os.watch.WatchTests.singleFolder 5786ms  
    4
    [103/103] os.watch.jvm[2.13.4].test.test
    ...
    + test.os.watch.WatchTests.singleFolder 5783ms  
    50
    [103/103] os.watch.jvm[2.13.4].test.test 
    -------------------------------- Running Tests --------------------------------
    + test.os.watch.WatchTests.singleFolder 5794ms  
    51
    [103/103] os.watch.jvm[2.13.4].test.test 
    -------------------------------- Running Tests --------------------------------
    X test.os.watch.WatchTests.singleFolder 3200ms 
      utest.AssertionError: assert(expectedChangedPaths == changedSubPaths)
      expectedChangedPaths: Set[os.SubPath] = HashSet(folder4/nestedB, folder4/nestedA, folder4, folder4/nestedA/a.txt, folder4/nestedB/b.txt)
      changedSubPaths: scala.collection.mutable.Set[os.SubPath] = HashSet(folder4/nestedA/a.txt, folder4/nestedB, folder4, folder4/nestedB/b.txt)
        utest.asserts.Asserts$.assertImpl(Asserts.scala:30)
        test.os.watch.WatchTests$.checkChanges$1(WatchTests.scala:52)
        test.os.watch.WatchTests$.$anonfun$tests$3(WatchTests.scala:93)
        test.os.TestUtil$.prep(TestUtil.scala:71)
        test.os.watch.WatchTests$.$anonfun$tests$2(WatchTests.scala:7)
    1 targets failed
    os.watch.jvm[2.13.4].test.test test.os.watch.WatchTests test.os.watch.WatchTests.singleFolder
    utgheith
    @utgheith
    Fails less often than when run in Docker but it still fails.
    Li Haoyi
    @lihaoyi
    what OS and filesystem are you on? that's probably the difference
    utgheith
    @utgheith
    I ran a few more experiments. The following configuration fails (as above) about 2% of the time:
    file system: ext4
    uname -a : Linux linux3 5.8.0-43-generic #49~20.04.1-Ubuntu SMP Fri Feb 5 09:57:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
    java -version: openjdk 11.0.10 2021-01-19
    Same configuration with ZFS finished 5000 iterations without failure.
    MacOS (11.2.3), openjdk version "15.0.2" 2021-01-19 also ran ~5000 iterations without failure
    I will track this down and find the root cause but I wanted to confirm the expected behavior with you first.
    utgheith
    @utgheith
    It would be great if you took a look at the test case and confirmed that it has the correct expectations, and I'll take it from there.
    Li Haoyi
    @lihaoyi
    your expectations are correct
    it should receive events in the order of deletion, which is children first before parents
    utgheith
    @utgheith

    Question 1: consistency between MacOS and Linux. Removing the watched directory (an obvious corner-case) behaves differently on MacOS and Linux. What is the expectation? Maybe it should be documented as undefined behavior?

    [Sample program](https://github.com/utgheith/watch/blob/master/src/main/scala/main.scala)

    MacOS

    event
      data

    Linux

    event
      data/a
      data/b
    Li Haoyi
    @lihaoyi
    yeah possibly, we don't use that functionality at work so not surprising that it's inconsistent
    utgheith
    @utgheith

    Here is a stress test that leads to failure in 3 native environments: MacOS, Ubuntu/ext4, Ubuntu/ZFS. There seem to be 2 separate issues:

    (1) a race condition that causes the watch thread to die with FileNotFoundException.
    (2) events get reordered, leading to premature watch cancellation on some subdirectories

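The first issue is the classic register-while-deleting race: a recursive watch has to register each newly discovered directory, but the directory can vanish between discovery and registration. A minimal sketch of a tolerant registration pass (my own illustration, not os.watch's actual code; the `registerTree` helper is hypothetical):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import static java.nio.file.StandardWatchEventKinds.*;

public class RegisterRace {
    // Recursively register `dir` and its subdirectories, tolerating
    // directories that disappear between discovery and registration.
    static void registerTree(WatchService ws, Path dir) throws IOException {
        try (var dirs = Files.walk(dir)) {
            dirs.filter(Files::isDirectory).forEach(d -> {
                try {
                    d.register(ws, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
                } catch (NoSuchFileException e) {
                    // The directory was deleted after Files.walk saw it --
                    // this is the race; ignoring it keeps the watch thread alive.
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("watch-race");
        Files.createDirectories(root.resolve("a/b"));
        try (WatchService ws = FileSystems.getDefault().newWatchService()) {
            registerTree(ws, root);
            System.out.println("registered ok");
        }
    }
}
```

If the watch thread instead lets the exception propagate, the whole watch silently dies, which matches the symptom described above.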
    utgheith
    @utgheith
    I see the race condition leading to the first issue, and the solution appears to be simple so far.
    The second problem requires a bit more thought. Ideas: (1) delay event processing and order delete events (reduces the probability of failure), (2) generate synthetic delete triggers for missing directories (will generate more events than expected and doesn't handle missed file deletes), (3) don't use WatchService (not obvious how that would help), ... I'll keep digging.
    utgheith
    @utgheith
    Would it be acceptable for os.watch.watch to produce more events than strictly promised (with low probability)?
    Li Haoyi
    @lihaoyi
    for my use case, sure
    it doesn't really have a rigorous spec lol
    utgheith
    @utgheith
    If that's acceptable then there is a simple fix: the implementation of os.watch.watch keeps track of all registered paths and refreshes the list on each event. We can generate a synthetic DELETE event for any missing entries.
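The proposed fix can be sketched as follows (my own illustration of the idea, not the actual os.watch implementation; the `syntheticDeletes` helper and the example paths are hypothetical). On each batch of events, re-check which registered directories still exist and emit a synthetic DELETE for the ones that don't, children before parents to match the documented deletion order:

```java
import java.nio.file.*;
import java.util.*;

public class SyntheticDeletes {
    // Given the set of directories we have registered with the WatchService,
    // return (and forget) the ones that no longer exist on disk, so the
    // caller can emit a synthetic DELETE event for each.
    static List<Path> syntheticDeletes(Set<Path> registered) {
        List<Path> missing = new ArrayList<>();
        for (Path p : registered) {
            if (!Files.exists(p, LinkOption.NOFOLLOW_LINKS)) missing.add(p);
        }
        // Deeper paths first, so children are reported before parents,
        // matching the "children first before parents" deletion order.
        missing.sort(Comparator.comparingInt(Path::getNameCount).reversed());
        registered.removeAll(missing);
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical registered paths that have since been deleted.
        Set<Path> registered = new HashSet<>(List.of(
            Path.of("/definitely/missing/folder4"),
            Path.of("/definitely/missing/folder4/nestedA")));
        for (Path p : syntheticDeletes(registered))
            System.out.println("synthetic DELETE " + p);
    }
}
```

Because the check only sees directories that were registered, this approach can only recover dropped directory deletes, not file deletes, which is implication (1) below.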
    Li Haoyi
    @lihaoyi
    i think it sounds reasonable
    maybe put up a PR describing the problem and solution in more detail?
    utgheith
    @utgheith
    Implications:
    (1) only works for directories but I've empirically noticed that WatchService only drops directory DELETE events (at least in the stress tests I tried)
    (2) might surprise applications that are not prepared for the extra DELETE events
    (3) would violate the stated behavior for moved directories (for those who might interpret it as anything but a lower bound)
    (4) would require changing the tests to use set inclusion instead of equality (change in one function)
    Will do. Either way I think it is critical that we get this issue out of the way because it is blocking unrelated pull requests.
    Li Haoyi
    @lihaoyi
    I think duplicate events is probably fine
    utgheith
    @utgheith
    But it would be more than duplicate events in this case; it might generate "unexpected" events. For example: I interpret the following paragraph as a lower bound, but someone who interprets it as strict equality might be confused by seeing DELETE events for subdirectories of moved directories:
    Once the call to watch returns, onEvent is guaranteed to receive an event containing the path for:
    
    - Every file or folder that gets created, deleted, updated or moved within the watched folders
    
    - For copied or moved folders, the path of the new folder as well as every file or folder within it.
    
    - For deleted or moved folders, the root folder which was deleted/moved, but without the paths of every file that was within it at the original location
    Li Haoyi
    @lihaoyi
    entirely unexpected events sounds a bit tough
    utgheith
    @utgheith
    But we're already in violation of the 3rd assertion (using the strict interpretation). DELETE events are being generated for deleted sub-folders but not for moved ones.
    Li Haoyi
    @lihaoyi
    yeah
    utgheith
    @utgheith

    I apologize for all the spam but here is a simple test that shows that the problem is in WatchService. os.watch is likely masking some of those issues in its implementation but not all.

    https://github.com/utgheith/watch3/blob/master/src/main/scala/main.scala

    utgheith
    @utgheith

    It seems that readme.md is using Travis CI badges instead of the GitHub Actions ones. Clicking on the build status goes to a January build (which tells me the switch occurred around that time). I tried to submit a pull request that fixes this, but it gets stuck and never runs the tests. This seems to be related to an issue that I don't fully understand, but it requires special privileges to deal with anyway. Here is a link:

    https://github.community/t/expected-waiting-for-status-to-be-reported/16727

    Lorenzo Gabriele
    @lolgab
    @utgheith The build now has a name (ci); that's why the statuses don't match. Can you please rebase the PR on top of current master?
    utgheith
    @utgheith
    done
    Li Haoyi
    @lihaoyi
    @utgheith I would not be surprised by bugs in the underlying WatchService, would be nice if we could paper over it though. Not sure how hard that would be
    utgheith
    @utgheith
    How about using inotify directly? Similar to what happens on MacOS. But first I need to convince myself that the issues in WatchService are not caused by inotify :-)
    utgheith
    @utgheith
    com-lihaoyi/os-lib#69 is an attempt to paper over it but it makes me uncomfortable.
    Li Haoyi
    @lihaoyi
    using inotify directly is possible, I never thought of it because I thought WatchService generally worked for the stuff we use it for (as opposed to OS X, where WatchService does not work at all on a large folder due to polling)
    utgheith
    @utgheith
    I will write some stress tests and share the results but I have a few concerns regarding os.watch and I want to know your thoughts on them.
    (1) os.watch is the sort of thing that requires stress testing. Unit testing is still important obviously. Do you have your own stress tests? If not, is there value in developing some and running them as part of CI?
    (2) The API doesn't make any statements regarding OVERFLOW (silently drop events, force the caller to re-examine the world, ...). inotify uses a finite system-wide queue for notifications and there is no guarantee that it will not overflow.
    Li Haoyi
    @lihaoyi
    I don't have any stress tests directly on the library; only for downstream projects.
    I actually don't know what to do about OVERFLOW events
    utgheith
    @utgheith
    In the one project where I used os.watch, I resorted to periodically restarting the application in an orderly manner. I could imagine writing a loop that reinitializes the world when it receives an OVERFLOW event.
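The "reinitialize the world" loop can be sketched like this (my own illustration, assuming a caller-supplied `rescan` callback that rebuilds application state; not part of os.watch's API). WatchService signals dropped events with a `StandardWatchEventKinds.OVERFLOW` event, at which point the only safe response is a full rescan:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.TimeUnit;
import static java.nio.file.StandardWatchEventKinds.*;

public class OverflowLoop {
    // Drain one WatchKey's events. On OVERFLOW the kernel queue dropped
    // events, so we cannot trust incremental updates and must rescan.
    static void drain(WatchKey key, Runnable rescan) {
        for (WatchEvent<?> ev : key.pollEvents()) {
            if (ev.kind() == OVERFLOW) {
                rescan.run();  // events were lost; rebuild state from disk
            } else {
                // normal per-event handling would go here
            }
        }
        key.reset();  // re-arm the key for further events
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("overflow-demo");
        try (WatchService ws = FileSystems.getDefault().newWatchService()) {
            dir.register(ws, ENTRY_CREATE);
            Files.createFile(dir.resolve("x.txt"));
            WatchKey polled = ws.poll(5, TimeUnit.SECONDS);
            if (polled != null) drain(polled, () -> System.out.println("rescan"));
            System.out.println("done");
        }
    }
}
```

The same structure works for a periodic restart: treat a timer tick the same way as an OVERFLOW event and run the rescan unconditionally.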
    utgheith
    @utgheith

    Same test as before, but using inotify (not directly, but by spawning a process running inotifywatch). It works reliably on the same test that was failing before, which strongly hints that using inotify directly will be more reliable. inotify is also explicit about how it handles moves (moved_from, moved_to, move_self, etc.)

    https://github.com/utgheith/watch4/blob/master/src/main/scala/main.scala

    Li Haoyi
    @lihaoyi
    @utgheith I think com-lihaoyi/os-lib#69 sounds reasonable; am I right to say that this change is linux-only and won't affect the fsevents-based OSX version?