    Adam Pocock
    @Craigacp
    I would expect the CNN-based example to run faster on GPUs than CPUs. Did you compare against TF-Java on CPU? Getting the data prep done correctly and transferring images to the GPU is harder to get right in Java than it is in Python; there are more performance pitfalls.
    Also, are you measuring time after the JVM has warmed up? It has a more noticeable warmup time than Python, as peak performance only arrives a few hundred to a few thousand calls in, after the JIT compiler has compiled most of the hot methods.
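The warmup effect Adam describes is easy to reproduce with plain Java (no TensorFlow involved); this sketch times the same method before and after the JIT has had a chance to compile it. Timings and the exact crossover point are machine-dependent:

```java
// Illustrates JVM warmup: the same method usually gets faster once the
// JIT compiler has compiled it. Pure-JVM sketch; numbers vary by machine.
public class WarmupDemo {
    // A deliberately hot method: sum of squares via sqrt.
    static double hot(int n) {
        double acc = 0;
        for (int i = 0; i < n; i++) {
            acc += Math.sqrt(i) * Math.sqrt(i);
        }
        return acc;
    }

    static long timeCall(int n) {
        long start = System.nanoTime();
        hot(n);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long first = timeCall(1_000_000);   // likely interpreted or lightly compiled
        for (int i = 0; i < 1_000; i++) {   // warm up: let the JIT compile hot()
            hot(1_000_000);
        }
        long warmed = timeCall(1_000_000);  // typically much faster now
        System.out.println("first call:  " + first + " ns");
        System.out.println("warmed call: " + warmed + " ns");
    }
}
```

The same principle applies to benchmarking TF-Java: discard the first several hundred inference calls before measuring.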
    raftaa
    @raftaa
    Thanks Adam, somehow it's running faster now. Honestly, I have no clue which changes led to this result. I guess it has something to do with what you called "warming up the JVM": the first image still takes about 4 sec, but the following images take 200 ms. That's fast enough for now.
    Adam Pocock
    @Craigacp
    There are a few things that warm up. It could be the JVM (the first call won't be compiled, so it interprets the Java code, which is slower, and it may have to do a bunch more memory allocation, etc.), or it could be that we don't ship the right compute-level bits for your GPU, so when it loads TF the CUDA driver has to compile a bunch of things. I think our builds are the same as the Python builds in that respect, but the point at which the CUDA compilation happens might be different.
    Karl Lessard
    @karllessard
    Under the TF hood, XLA does a bunch of JIT compilation as well
    raftaa
    @raftaa
    Hi. Sorry for another stupid beginner question, but I didn't find any information on how to read a *.pbtxt file in Java. Is there a parser that can be used? For now I just want to read my simple "label_map.pbtxt" file with some annotations. I'd write a simple parser on my own for this, but since there are more complex pb files I'd like to use the generic approach. I found some discussions about this topic; it seems to have been quite an issue in the past. Is that still the case in the newest Java TF version?
    Adam Pocock
    @Craigacp
    What protobufs are stored in the pbtxt file?
    raftaa
    @raftaa
    It's just an annotation file for training a custom object detector - similar to https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-label-map
    Adam Pocock
    @Craigacp
    So that's this proto - https://github.com/tensorflow/models/blob/238922e98dd0e8254b5c0921b241a1f5a151782f/research/object_detection/protos/string_int_label_map.proto. You should be able to download that and compile it yourself with protoc to read that file.
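Compiling the proto with protoc is the generic route Adam suggests. For a file as simple as a label map, a minimal hand-rolled parser is also workable; this sketch assumes each `item` block lists `id` before `name`, as in the linked tutorial, and is not a general pbtxt parser:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal parser for simple label_map.pbtxt files. Assumes blocks of the
// form: item { id: 1 name: 'cat' } with id appearing before name.
public class LabelMapParser {
    private static final Pattern ITEM = Pattern.compile(
        "item\\s*\\{[^}]*?id:\\s*(\\d+)[^}]*?name:\\s*['\"]([^'\"]+)['\"][^}]*?\\}",
        Pattern.DOTALL);

    public static Map<Integer, String> parse(String pbtxt) {
        Map<Integer, String> labels = new LinkedHashMap<>();
        Matcher m = ITEM.matcher(pbtxt);
        while (m.find()) {
            labels.put(Integer.parseInt(m.group(1)), m.group(2));
        }
        return labels;
    }

    public static void main(String[] args) {
        String example = "item {\n  id: 1\n  name: 'cat'\n}\nitem {\n  id: 2\n  name: 'dog'\n}\n";
        System.out.println(parse(example)); // {1=cat, 2=dog}
    }
}
```

For anything beyond this simple shape (nested messages, escaped strings), compiling `string_int_label_map.proto` with protoc and using the generated classes is the safer approach.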
    abelehx
    @abelehx
    https://github.com/tensorflow/tensorflow/blob/master/tensorflow/java/src/main/java/org/tensorflow/examples/LabelImage.java
    Is there an example that supports TensorFlow version 2.4.1 (i.e., TensorFlow Java API version 0.3.1)? Where can I find it?
    Ben Roberson
    @ninjabem
    Hello, is there any public documentation on how memory is allocated/deallocated in the new java lib tensorflow-core vs the old libtensorflow?
    Karl Lessard
    @karllessard
    There is not a lot of official documentation describing this, unfortunately. On the other hand, both versions support pretty much the same paradigm: resources that are backed natively must either be explicitly released or closed via a try-with-resources block. Such resources are pretty much the same in both: tensors, saved model bundles, graphs, sessions and (only in the new version) concrete functions.
    Also, in both versions you can partially rely on the garbage collector in eager mode to get rid of small objects, but closing resources, or the whole eager session, when done with them is recommended.
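The paradigm Karl describes relies on the native-backed types implementing AutoCloseable. The shape of the pattern can be shown with a plain-Java stand-in (the `NativeResource` class here is hypothetical, not a TF-Java type):

```java
import java.util.ArrayList;
import java.util.List;

// Schematic of the resource paradigm: native-backed resources implement
// AutoCloseable, so nested try-with-resources releases them
// deterministically, in reverse order of allocation.
public class ResourceParadigm {
    static final List<String> log = new ArrayList<>();

    // Stand-in for a native-backed resource such as a graph, session or tensor.
    static class NativeResource implements AutoCloseable {
        final String name;
        NativeResource(String name) { this.name = name; log.add("allocated " + name); }
        @Override public void close() { log.add("released " + name); }
    }

    public static void main(String[] args) {
        try (NativeResource graph = new NativeResource("graph");
             NativeResource session = new NativeResource("session");
             NativeResource tensor = new NativeResource("tensor")) {
            log.add("run");
        } // closed in reverse order: tensor, session, graph
        System.out.println(log);
    }
}
```

In real TF-Java code the same nesting applies to Graph, Session and Tensor instances: the last-opened resource is released first when the block exits.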
    Ben Roberson
    @ninjabem
    Oh, thanks for the response @karllessard! That's not the answer I was expecting. I have a high throughput application that's been in prod for about 3 years using libtensorflow and I'm testing a migration to the new tensorflow-core. With a minimal change to the new lib, I'm seeing GC times jump 20x. After diving into the gc logs, I'm seeing hundreds of thousands of PhantomReference being discovered and most of the increased GC time being used for the Object Copy phase. The magnitude of the Object Copy phase time seems to be positively correlated with the magnitude of the PhantomReference discovered. I don't have detailed knowledge of tensorflow-core internals, so I'm trying to understand what is going on and how to mitigate the symptoms I'm seeing. Any help would be greatly appreciated. Are there any JVM config tweaks or GC tweaks that might help clearing the PhantomReference more quickly?
    Adam Pocock
    @Craigacp
    What versions of TF-Java and Java are you using, and what GC settings do you currently have?
    Ben Roberson
    @ninjabem
    I'm using Java 11, TF-Java 0.3.2, and my GC settings are -Xms18g -Xmx18g -XX:+UseG1GC -XX:MaxGCPauseMillis=200
    I've tried using a few different Xms/Xmx heap settings to see if it just needed more memory to achieve a steady state but those experiments didn't affect the GC times.
    Adam Pocock
    @Craigacp
    If you're closing the tensors that are returned from your model then the references should be dead and be removed along with the rest of the memory. Do you have a code snippet somewhere we could look at?
    Ben Roberson
    @ninjabem
    Unfortunately I can't copy/paste internal code, but I might be able to build a mock that is similar. Improperly closed (unclosed) tensors were one of my first thoughts as well, but I don't think the symptoms point in that direction. If I weren't closing the tensors, it would present as a memory leak, with the symptom being a monotonically increasing number of PhantomReferences, right? I'm not seeing that pattern. I'm seeing the number of PhantomReferences jump from 200k to 800k over 8 seconds and two young-gen collections. Then I see it drop from 800k to 200k over ~4 minutes and ~60 young-gen collections. So the tensors do appear to be closing and their resources being collected... eventually.
    (BTW, I would be very happy to discover I'm doing something wrong, so I'm definitely open to that possibility)
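The mechanism behind this symptom can be shown in miniature: JavaCPP tracks native allocations with phantom references (Adam mentions Pointer.DeallocatorReference below), and a phantom reference is only enqueued, and its native memory reclaimable, after a GC cycle discovers the referent is unreachable. A pure-JVM sketch, not JavaCPP itself:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Shows that a PhantomReference is enqueued only after the GC discovers
// its referent is dead; until then the reference (and, in JavaCPP's
// case, the associated native memory) lingers.
public class PhantomDemo {
    static boolean demo() throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object payload = new Object();
        PhantomReference<Object> ref = new PhantomReference<>(payload, queue);

        payload = null;                 // drop the only strong reference
        Reference<?> cleared = null;
        // Discovery is asynchronous: nudge the GC and poll the queue.
        for (int i = 0; i < 100 && cleared == null; i++) {
            System.gc();
            cleared = queue.remove(50); // wait up to 50 ms per attempt
        }
        return cleared == ref;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("phantom reference enqueued: " + demo());
    }
}
```

This is why hundreds of thousands of live phantom references drain only across many young-gen collections: each collection discovers and enqueues a batch at a time.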
    Adam Pocock
    @Craigacp
    We do have a check in place to close resources using the GC, but you should definitely close them manually.
    The memory leak wouldn't be visible on the Java side as they are allocated on the native heap.
    Are you using eager mode or a session?
    Samuel Audet
    @saudet
    @ninjabem It sounds like something's not getting deallocated manually, and JavaCPP is forced to call System.gc() to clear enough memory. You can see if that's happening or not from the log, for example, by setting the "org.bytedeco.javacpp.logger.debug" system property to "true". In any case, to get best performance, you'll want to disable all that and deallocate everything manually anyway, see tensorflow/java#313 for some info about that.
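The property Samuel mentions can be set either on the command line or programmatically, as long as it is set before any JavaCPP class loads (a minimal sketch; only the property name comes from Samuel's message):

```java
public class JavaCppDebug {
    public static void main(String[] args) {
        // Equivalent to passing -Dorg.bytedeco.javacpp.logger.debug=true on
        // the command line; must be set before any JavaCPP class is loaded,
        // since the logger reads it at class-initialization time.
        System.setProperty("org.bytedeco.javacpp.logger.debug", "true");
        System.out.println(System.getProperty("org.bytedeco.javacpp.logger.debug"));
    }
}
```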
    Ben Roberson
    @ninjabem
    @Craigacp I'm using a session. I am manually closing all input and output tensors using try-with-resources. I mocked up an example of how things are structured. Input tensors are wrapped in a batch and closed via try-with-resources. The output tensor is also closed via try-with-resources.
    Adam Pocock
    @Craigacp
    Thanks, does this mocked up version exhibit the same PhantomReference behaviour? Also, when you observed the PhantomReferences are they subclasses of org.bytedeco.javacpp.Pointer.DeallocatorReference?
    Ben Roberson
    @ninjabem
    I haven't done any stress testing of this mock yet. I'd have to fill out the feature packing part, mock a full training dataset, and then train a model. That's a bit of work. I was hoping this mock would show that I'm properly closing input and output tensors?
    Ben Roberson
    @ninjabem
    I took a few heap dumps and the #1 leak suspect (according to Eclipse MAT) is org.bytedeco.javacpp.Pointer$NativeDeallocator with several hundred thousand instances.
    @saudet Thanks for the debug log setting; I'll set it and see what I see. I read through tensorflow/java#313 and I think I'm doing the right things: I'm using a long-lived session, and closing all my input/output tensors using try-with-resources. Is there anything else I should be doing that I might have missed in that thread?
    Samuel Audet
    @saudet
    Well, I'm not sure what they are referring to by dbx.close(), but maybe you have something similar in your code?
    abelehx
    @abelehx
    https://github.com/tensorflow/java-models/blob/master/tensorflow-examples-legacy/label_image/src/main/java/LabelImage.java
    is TensorFlow Java for 1.4. I'm looking for an example of LabelImage.java for TensorFlow Java version 0.3.1 or 0.3.2. Where can I find it? Thank you.
    Adam Pocock
    @Craigacp
    We haven't converted that example to 0.3.2, however there are many similar examples for 0.3.2 here https://github.com/tensorflow/java-models/tree/master/tensorflow-examples/src/main/java/org/tensorflow/model/examples.
    Karl Lessard
    @karllessard
    @ninjabem, I'm looking right now at your example. Stupid question, but just to rule out possible issues: I assume the model's targeted operation only returns a single output (which is "fancy/model:0" in your case)? Because all outputs must be explicitly closed even if they are not being used.
    Karl Lessard
    @karllessard

    Also (sorry if it is off topic), but in your predictor you can easily navigate through the list of floats directly from predTensor by casting it to a TFloat32, instead of going through a temporary array:

    try (TFloat32 predTensor = (TFloat32) runner.fetch(this.graphOperation).run().get(0)) {
        predTensor.scalars().forEach(s -> predictions.add(s.getFloat()));
    }

    or something like that

    Karl Lessard
    @karllessard
    But I agree with you that if your model returns a single output, it looks to me like everything is being cleaned up properly. So you are saying that even this small example shows a lot of time spent in GC?
    Ben Roberson
    @ninjabem
    @karllessard Thanks for the feedback! Yes, the model will only output a single tensor, but that tensor is a float array since it is batched. I'll double check the model definition but I'm pretty sure it only returns one. I haven't run this mock through a stress test (yet) as it'll require a bit more work to get it fully running.
    Adam Pocock
    @Craigacp
    I'm surprised there are instances of NativeDeallocator, as all the TensorFlow types don't subclass that as far as I can tell. Presumably those are allocated by JavaCPP internally as part of the byte pointers that pass around strings?
    Karl Lessard
    @karllessard
    I also thought that NdArray can sometimes allocate a great number of objects depending on what you do, but they would not be backed by a NativeDeallocator in that case.
    String tensors have been handled differently since TF 2.4, so it's possible we are observing some side effects of that.
    Karl Lessard
    @karllessard
    also @ninjabem, did you try the latest snapshot (0.4.0-SNAPSHOT)? It includes tensorflow/java@709f570, which is related to string tensor deallocation; I'd be curious to see if that could improve your situation as well.
    Ben Roberson
    @ninjabem
    @karllessard I haven't tried with 0.4.0-SNAPSHOT but I'm happy to give it a go
    Samuel Audet
    @saudet
    @karllessard Normal C functions don't return objects with deallocators, since there's no standard way to deallocate memory from arbitrary pointers.
    The log will tell us what kind of Pointer they are attached to, though, so if @ninjabem can let us know what those are, it could help.
    Ben Roberson
    @ninjabem
    I'm deploying a canary with 0.3.2 + org.bytedeco.javacpp.logger.debug=true right now and will do the same with 0.4.0-SNAPSHOT in a bit.
    Aish
    @aishfenton
    We're calling SavedModelBundle.exporter("/data/").withSession(sess).withSignature(mySig).export() to save a model. It seems to save out fine (according to the CLI tool), but when we load it from the Python side using tf.saved_model.load("/data/"), we get the following error: ValueError: Feeds must be tensors. Any ideas?
    I assume it's probably something silly we're doing
    Adam Pocock
    @Craigacp
    What's the rest of the python stack trace look like?