    Gili Tzabari
    @cowwoc
    Okay. Suppose normal readings range between -50 and 50. Am I better off assigning missing values some average value (e.g. 0), or am I better off assigning a value that would never occur in real life (e.g. -1000) and hoping that the model will learn to suppress/mask such inputs?
    Adam Pocock
    @Craigacp
    It depends why it's missing.
    jxtps
    @jxtps
    You may want to have a separate piece of input that’s all 1s for where the data is valid, and 0s where it’s not - i.e. an input mask. You’d then need to incorporate that into your network architecture in some sensible way (very task/network specific). If this is actually on the output you can usually provide some weight to the sample so it gets de-weighted compared to the rest of the batch.
    (The point of the input mask being to make it easier for your network to learn to ignore that part of the data)
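    A minimal sketch of that input-mask idea in plain Java (the NaN marker, the zero fill and the array layout are illustrative assumptions, not from this discussion):

        float[] readings = { 12.5f, Float.NaN, -3.2f }; // NaN marks a missing sensor
        float[] values = new float[readings.length];
        float[] mask = new float[readings.length];
        for (int i = 0; i < readings.length; i++) {
            boolean valid = !Float.isNaN(readings[i]);
            values[i] = valid ? readings[i] : 0f; // neutral fill for missing readings
            mask[i] = valid ? 1f : 0f;            // 1 = valid, 0 = missing
        }
        // values and mask are then fed together (e.g. concatenated) as the network input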
    Gili Tzabari
    @cowwoc
    Interesting. I think I will experiment with 3 different options: (1) a separate input mask per sensor, (2) a single input indicating the number of sensors present, (3) an (inline) invalid sensor value implying that masking should take place.
    Jakob Sultan Ericsson
    @jakeri

    Hello again,
    We have gone from TF Java 0.2.0 to 0.3.1 on our Linux hosts, and basically just changed things so that it compiles. We load and unload a bunch of different savedmodels.
    We are now experiencing OutOfMemory errors in bytedeco, as if the memory is not reclaimed. We try to calculate the size of each incoming savedmodel and only keep models up to roughly half of the available memory of the host (we have a Guava cache that gets the approximate weight of each model and only holds models up to half of the available memory).
    We are closing all tensors, savedmodel bundles, etc. (not really changed from 0.2.0).

    -Dorg.bytedeco.javacpp.maxbytes=25G
    -Dorg.bytedeco.javacpp.maxphysicalbytes=25G

    We are about to try 0.4.0-SNAPSHOT and also do some kind of more specific test case.

    Jakob Sultan Ericsson
    @jakeri

    We have continued to try to nail down our problem with memory. We believe it is something strange with the Linux version of 0.3.1 (and 0.4.0-SNAPSHOT). Some memory is not deallocated, and it is quite visible when you run large models.

    I’ve tried to build a “test case” for this. Maven project with a Dockerfile. The Dockerfile will download a somewhat large model from tfhub (efficientnet_b7_classification_1).

    The test is basically:

                // Requires: java.nio.file.Path, org.bytedeco.javacpp.Pointer,
                // org.tensorflow.SavedModelBundle
                for (int q = 0; q < 10; q++) {
                    // Download https://storage.googleapis.com/tfhub-modules/tensorflow/efficientnet/b7/classification/1.tar.gz
                    // and unpack inside src/main/resources
                    Path savedModelSourcePath = Path.of(Thread.currentThread().getContextClassLoader().getResource("efficientnet_b7_classification_1").getFile());
                    try (SavedModelBundle savedModelBundle = SavedModelBundle.load(savedModelSourcePath.toString(), "serve")) {
                        System.out.printf("After loading model, physicalBytes: %d, totalBytes: %d\n", Pointer.physicalBytes(), Pointer.totalBytes());
                    }
                    Pointer.deallocateReferences(); // trying to force deallocation of native memory
                    System.gc();                    // trying to force a GC cycle
                }

    When running on Linux, memory will increase and the test will usually crash after 3-4 iterations (depending on the bytedeco memory flags or host memory), but when running the same test on OSX, the test will usually pass and memory decreases from time to time.
    Do you think this is related to tensorflow/java#304 or something else?
    We are trying to do some more with TensorFlow debug-logging of allocations and deallocations, to find some pointers to where this could be.

    Adam Pocock
    @Craigacp
    You can run the whole thing inside valgrind on Linux, which will show where the allocated memory is lost. There are issues in the C API's model load code where it leaks memory, though when I looked at it, it wasn't enough of a leak to go pop in a few iterations (though I was testing a much smaller model). The response from Google, who maintain the C API, was not promising - tensorflow/tensorflow#48802.
    Gili Tzabari
    @cowwoc
    @Craigacp I suspect tensorflow/tensorflow#48802 might get more traction if someone were to attach a self-contained testcase, ideally in pure C. Maybe it doesn't matter, but as the issue stands now, it is not super easy for a newbie contributor to pick up.
    Samuel Otter
    @samuelotter

    Hi, I'm also investigating this memory leak together with @jakeri.

    I'm not sure I completely grok all the interactions between tensorflow, javacpp and the java wrappers, but it looks like the TF_Session handle isn't destroyed when closing the Session. This can be confirmed by running with the tensorflow VLOG level set to 2, which will log allocations and deallocations. Some things are never deallocated (memory allocated for restore ops, for example).

    If I understand things correctly, Session::delete will be invoked when closing the session, which should release the reference, and ultimately the Pointer should be deallocated, which should trigger the DeleteDeallocator in AbstractTF_Session. This works when calling TF_Session::newSession, since that sets the deallocator on the pointer before returning. But when loading a saved model, the TF_Session object is created internally in native code in TF_LoadSessionFromSavedModel and returned, which means the deallocator is never set on the Pointer, which means the DeleteDeallocator is never invoked.
    When manually calling TF_CloseSession and TF_DeleteSession on the native handles, it leaks much less memory (still a little bit I think, but significantly less).
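    A rough sketch of the manual cleanup described above, using the low-level JavaCPP bindings (class and method names come from TF-Java's internal c_api package; treat the exact signatures as assumptions that may differ between versions):

        // Assumed imports: org.tensorflow.internal.c_api.TF_Session,
        // org.tensorflow.internal.c_api.TF_Status, plus a static import of
        // org.tensorflow.internal.c_api.global.tensorflow.*
        static void closeNativeSession(TF_Session handle) {
            TF_Status status = TF_NewStatus();
            try {
                TF_CloseSession(handle, status);  // release the session's resources
                TF_DeleteSession(handle, status); // free the TF_Session handle itself
            } finally {
                TF_DeleteStatus(status);
            }
        }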

    Samuel Audet
    @saudet
    You're right, that's an issue. TF_Session.loadSessionFromSavedModel() needs to be called instead.
    Adam Pocock
    @Craigacp
    The call to TF_Graph above that also returns a pointer without a deallocator, right? It doesn't look like the session takes ownership of it such that it'll be deleted on session close, so we should probably change that over to TF_Graph.newGraph() too.
    Adam Pocock
    @Craigacp
    Then we'll need to make sure the session and the graph live past the pointer scope.
    Adam Pocock
    @Craigacp
    @samuelotter could you see if this branch fixes your issue - https://github.com/Craigacp/tensorflow-java/tree/saved-model-leak
    Samuel Audet
    @saudet
    Yes, it's also calling close() on the graph, so we have to change that as well.
    Adam Pocock
    @Craigacp
    SavedModelBundle.close closes the graph, shouldn't that be sufficient?
    Samuel Audet
    @saudet
    Looks like it:
    // Destroy an options object.  Graph will be deleted once no more
    // TFSession's are referencing it.
    public static native void TF_DeleteGraph(TF_Graph arg0);
    Not sure if that means it will crash if we try to deallocate it explicitly anyway. The C API is very user unfriendly.
    Adam Pocock
    @Craigacp
    The unit tests all passed with this change, so presumably we'd have hit a double free already if it was going to happen.
    The wording of TF_LoadSessionFromSavedModel suggests that it doesn't care about the graph, so I'd expect that to mean that it's our job to manage it.
    Jakob Sultan Ericsson
    @jakeri
    Hey, @samuelotter and I will do some tests during the day with the branch above. Is it isolated enough to be backported to the 0.3 branch?
    Adam Pocock
    @Craigacp
    Yes we will be able to backport this if it fixes the issue.
    James Piggott
    @JamesPiggott
    Hello, I want to run a TensorFlow model I found with a Java app, but I am having difficulty getting the input just right. Below you can see the result from the layer analysis. I found a few examples for one-dimensional input (mnist), and I got another model working that required integers, but creating a Tensor<TFloat32> with dimensions {batch, height, width, channels} is a difficult task. I would like some help.
    input_image=name: "serving_default_input_image:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: -1
      }
      dim {
        size: -1
      }
      dim {
        size: -1
      }
      dim {
        size: 3
      }
    }
    Karl Lessard
    @karllessard

    @JamesPiggott, I don’t know if you are subscribed to the TensorFlow discussion forum, but it is the new TF platform for starting new discussions, and we’ll slowly migrate the different topics started on this Gitter channel to it.

    That being said, I’ve replied and posted an example of converting BufferedImage instances to a float tensor (TFloat32) in this post; if you can, please take a look and maybe continue the discussion from there?

    (BTW sorry but my original snippet was in Kotlin, let me know if you need help converting it to Java)
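    A minimal Java sketch of that kind of conversion, assuming the TF-Java 0.3.x NdArray API (the [0, 1] scaling is an assumption; the preprocessing a given model expects varies):

        // Assumed imports: java.awt.image.BufferedImage,
        // org.tensorflow.ndarray.Shape, org.tensorflow.types.TFloat32
        static TFloat32 imageToTensor(BufferedImage img) {
            int h = img.getHeight(), w = img.getWidth();
            // {batch, height, width, channels}, matching the signature above
            TFloat32 input = TFloat32.tensorOf(Shape.of(1, h, w, 3));
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    int rgb = img.getRGB(x, y);
                    input.setFloat(((rgb >> 16) & 0xFF) / 255.0f, 0, y, x, 0); // R
                    input.setFloat(((rgb >> 8) & 0xFF) / 255.0f, 0, y, x, 1);  // G
                    input.setFloat((rgb & 0xFF) / 255.0f, 0, y, x, 2);         // B
                }
            }
            return input;
        }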

    Jakob Sultan Ericsson
    @jakeri
    When is 0.4.0 scheduled to be released?
    Karl Lessard
    @karllessard
    There are still a lot of PRs waiting to be merged, so there is no clear plan for it yet. Is there anything in particular you are waiting for?
    Karl Lessard
    @karllessard
    Hey everyone, just to let you know that we have just released a hot-fix (0.3.2) for the saved model bundle memory leak that was reported previously in this thread by @jakeri, please try it out!
    Charles Parker
    @charleslparker
    @karllessard - Just a datapoint for you: I'm working with code that loads and closes saved model bundles a lot and the memory behavior is a lot more respectable with 0.3.2. Thanks for releasing this.
    Karl Lessard
    @karllessard
    :+1: :+1:
    Gili Tzabari
    @cowwoc
    @saudet @karllessard Are you guys familiar with any mature implementations of the Temporal Fusion Transformer? Support seems to be very sketchy both for TensorFlow and PyTorch. I found one experimental implementation for TF 1.x and nothing really for 2.x. I also found 3 implementations for PyTorch, but all 3 had problems (the sample code didn't work for some; others output a straight line, which other users complained about and no one was able to figure out).
    Karl Lessard
    @karllessard
    Sorry @cowwoc , I can’t help you out here, maybe @Craigacp knows?
    Adam Pocock
    @Craigacp
    I've not seen any implementations, though I don't work with time series very often. From a quick read through of the paper it looks like an extremely complicated model.
    Try asking on the TF Forum - https://discuss.tensorflow.org/
    Gili Tzabari
    @cowwoc
    Thanks Adam
    torito
    @torito
    Hi, do you know when Java TensorFlow 2.5.0 will be available? Here it says it is in snapshot, and there isn't any release yet: https://github.com/tensorflow/java/#tensorflow-version-support
    raftaa
    @raftaa
    Hi, is there any example/documentation on how to use GPU resources in Java? I integrated tensorflow-core-api-0.3.2-windows-x86_64-gpu.jar etc. But are there any flags that need to be set? I modified the FasterRcnnInception example to compute images with my own trained model. It needs approx. 4 sec per image, while the Python GPU-based computation with the same model and image needs about 0.2 sec... The Java log output tells me "Adding visible gpu devices: 0" and "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8680 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)".
    raftaa
    @raftaa
    Nevermind. Found the following discussion: tensorflow/java#140
    Adam Pocock
    @Craigacp
    I would expect the CNN-based example to run faster on GPUs than CPUs. Did you compare against TF-Java on CPU? Getting the data prep done correctly and transferring images to the GPU is harder to get right in Java than it is in Python; there are more performance pitfalls.
    Also, are you measuring time after the JVM has warmed up? It has a more noticeable warmup time than Python, as peak performance only happens a few hundred to a thousand calls in, after the JIT compiler has compiled most of the hot methods.
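    A small illustrative timing harness for that warmup effect (runInference is a hypothetical helper wrapping the session call; the iteration counts are arbitrary):

        int warmup = 100, measured = 200;
        for (int i = 0; i < warmup; i++) {
            runInference(image); // discard results; just exercise the hot paths
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            runInference(image);
        }
        System.out.printf("steady-state: %.2f ms/image%n",
                (System.nanoTime() - start) / 1e6 / measured);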
    raftaa
    @raftaa
    Thanks Adam, somehow it's running faster now. Honestly I have no clue which changes led to this result. I guess it has something to do with what you called "warming up the JVM": the first image still takes about 4 sec, but the following images take 200 ms. That's fast enough for now.
    Adam Pocock
    @Craigacp
    There are a few things that warm up. It could be the JVM (the first call won't be JIT-compiled, so the JVM interprets the Java code, which is slower, and the first call may have to do a bunch more memory allocation, etc.), or it could be that we don't ship the right compute-level bits for your GPU, so when it loads TF the CUDA driver has to compile a bunch of things. I think our builds are the same as the Python builds in that respect, but the point at which the CUDA compilation happens might be different.
    Karl Lessard
    @karllessard
    Under the TF hood, XLA does a bunch of JIT compilation as well
    raftaa
    @raftaa
    Hi. Sorry for another stupid beginner question, but I didn't find any information on how to read a *.pbtxt file in Java. Is there any parser that can be used? For now I just want to read my simple "label_map.pbtxt" file with some annotations. I'd write a simple parser of my own for this, but as there are more complex pb files, I'd like to use the generic approach. I found some discussions about this topic; it seems to have been quite an issue in the past. Is that still the case in the newest Java TF version?
    Adam Pocock
    @Craigacp
    What protobufs are stored in the pbtxt file?
    raftaa
    @raftaa
    It's just an annotation for training a custom object detection - similar to https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-label-map
    Adam Pocock
    @Craigacp
    So that's this proto - https://github.com/tensorflow/models/blob/238922e98dd0e8254b5c0921b241a1f5a151782f/research/object_detection/protos/string_int_label_map.proto. You should be able to download that and compile it yourself with protoc to read that file.
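    A minimal sketch of what reading the file could then look like (StringIntLabelMapOuterClass is an assumption based on protoc's default outer-class naming; check the actual generated code):

        // Assumed imports: com.google.protobuf.TextFormat, java.nio.file.Files,
        // java.nio.file.Path, and the protoc-generated classes
        // object_detection.protos.StringIntLabelMapOuterClass.{StringIntLabelMap, StringIntLabelMapItem}
        StringIntLabelMap.Builder builder = StringIntLabelMap.newBuilder();
        TextFormat.merge(Files.readString(Path.of("label_map.pbtxt")), builder);
        for (StringIntLabelMapItem item : builder.getItemList()) {
            System.out.println(item.getId() + " -> " + item.getName());
        }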
    abelehx
    @abelehx
    https://github.com/tensorflow/tensorflow/blob/master/tensorflow/java/src/main/java/org/tensorflow/examples/LabelImage.java
    Is there an example like this that supports TensorFlow version 2.4.1, i.e. TensorFlow Java API version 0.3.1? Where can I find it?
    Ben Roberson
    @ninjabem
    Hello, is there any public documentation on how memory is allocated/deallocated in the new Java lib tensorflow-core vs the old libtensorflow?
    Karl Lessard
    @karllessard
    There is not a lot of official documentation describing this, no, unfortunately. On the other hand, both versions support pretty much the same paradigm, where natively backed resources must be released either explicitly or by using a try-with-resources block. Such resources are pretty much the same in both: tensors, saved model bundles, graphs, sessions and (only in the new version) concrete functions.
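    A minimal sketch of that pattern in TF-Java 0.3.x (the model path and the feed/fetch op names are placeholders):

        // Assumed imports: org.tensorflow.SavedModelBundle, org.tensorflow.Tensor,
        // org.tensorflow.ndarray.Shape, org.tensorflow.types.TFloat32
        try (SavedModelBundle model = SavedModelBundle.load("/path/to/model", "serve");
             TFloat32 input = TFloat32.tensorOf(Shape.of(1, 224, 224, 3))) {
            try (Tensor result = model.session().runner()
                    .feed("serving_default_input", input) // placeholder op name
                    .fetch("StatefulPartitionedCall")     // placeholder op name
                    .run().get(0)) {
                System.out.println(result.shape());
            }
        } // every natively backed resource is released at the end of its try block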