    Gili Tzabari
    @cowwoc
    @Craigacp https://www.tensorflow.org/guide/autodiff#3_took_gradients_through_an_integer_or_string says "Integers and strings are not differentiable"... Does that mean I can't use int* types at all?
    Or am I misunderstanding something?
    Adam Pocock
    @Craigacp
    It depends on what you're using the int for. For example, MNIST is usually stored as integers in the range 0-255, and you can feed that into the model. Usually the first step in the model is then to convert it into a float and proceed as normal. As you aren't taking gradients of the conversion procedure there, it doesn't matter. It's also useful to feed in other tensors to control model behaviour (e.g. to use an integer step or epoch counter to control the learning rate); again, these usually aren't involved in the gradient updates, so it doesn't matter.
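    (For illustration, a minimal TF-Java sketch of that convert-then-proceed step, assuming the 0.3.x Ops API; all names here are illustrative, not from the chat:)

        import org.tensorflow.Graph;
        import org.tensorflow.Operand;
        import org.tensorflow.op.Ops;
        import org.tensorflow.op.core.Placeholder;
        import org.tensorflow.types.TFloat32;
        import org.tensorflow.types.TUint8;

        try (Graph graph = new Graph()) {
            Ops tf = Ops.create(graph);
            // 0-255 integer pixel input, fed into the graph as-is
            Placeholder<TUint8> pixels = tf.placeholder(TUint8.class);
            // First step in the model: cast to float and normalise; no gradient
            // is ever taken through the integer input itself
            Operand<TFloat32> asFloat = tf.dtypes.cast(pixels, TFloat32.class);
            Operand<TFloat32> scaled = tf.math.div(asFloat, tf.constant(255.0f));
        }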
    Gili Tzabari
    @cowwoc
    The integers in my case represent timestamps as time since epoch.
    Adam Pocock
    @Craigacp
    And you want to use them as features in your model?
    Gili Tzabari
    @cowwoc
    Yes. I'm dealing with behavior that is tied to weather, and weather follows certain patterns as a function of time. I've also got outdoor temperature as an input, but I'm thinking (just a guess) it can't hurt to add in the timestamp.
    I've actually also got a second case of integers... I've got inputs that are enums, so I converted their ordinal value to an int. There I can obviously just cast it to a float. It's the timestamps where things get more complicated.
    Adam Pocock
    @Craigacp
    I wouldn't pass in a timestamp to an ML system as a monotonically increasing integer. It's probably better to split it out into categoricals which represent months, days, possibly the season, along with the hour of the day. If you pass in the timestamp directly then the model has to expend capacity learning the cyclic behaviour and parsing the timestamp.
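    (For illustration, a hedged java.time sketch of that split; which fields to keep is a modelling choice, and the season bucketing is made up:)

        import java.time.Instant;
        import java.time.ZoneOffset;
        import java.time.ZonedDateTime;

        // Break an epoch-seconds timestamp into categorical features
        long epochSeconds = 1623456789L;
        ZonedDateTime t = Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC);
        int month     = t.getMonthValue();           // 1-12
        int dayOfWeek = t.getDayOfWeek().getValue(); // 1 (Monday) - 7 (Sunday)
        int hourOfDay = t.getHour();                 // 0-23
        int season    = (month % 12) / 3;            // 0-3, crude northern-hemisphere buckets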
    Gili Tzabari
    @cowwoc
    Okay. So if later on I want a model that also predicts the timestamp of an event (e.g. it is currently 20 degrees, predict what time we will hit 23 degrees) should the output again contain the timestamp broken down into time categoricals?
    Adam Pocock
    @Craigacp
    Yeah I think that's probably easiest. Otherwise it's hard to parse the signal.
    Plus if the loss is split out into different chunks you can reward the model for predicting the correct hour & day even if it gets the number of minutes wrong.
    Whereas with a timestamp it's harder to design the loss function to do that.
    Gili Tzabari
    @cowwoc
    Hmm, I found an interesting tutorial at https://www.tensorflow.org/tutorials/structured_data/time_series#time ... they break down timestamps into sin/cos components which I would have never thought to do.
    So, what's the point of Tensorflow having integer, boolean, etc. types if only float is really usable? Are they there just to let you convert integers to float on the graph (late binding)? And do you always need to convert to float before feeding the values into an Input node?
    Adam Pocock
    @Craigacp
    Tensorflow is a computation graph and an autodiff system. The autodiff only applies to floats, as gradients are harder to define on non-continuous spaces. But you can use the computational graph on other types just fine. For example, if you're doing object detection, that's going to return a bounding box on an image, which needs to be integers to line up with the pixels, so the natural return type is an integer tensor. Booleans are also useful for controlling graph elements with tf.cond (i.e. if statements).
    You can compute functions of integer tensors without any trouble, but if you want to differentiate those functions to perform gradient descent that's where you hit the issue.
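    (A minimal TF-Java sketch of that point, assuming the 0.3.x API, with illustrative values: a pure integer computation runs through the graph fine because nothing in it is ever differentiated:)

        import org.tensorflow.Graph;
        import org.tensorflow.Session;
        import org.tensorflow.op.Ops;
        import org.tensorflow.op.math.Add;
        import org.tensorflow.types.TInt32;

        try (Graph g = new Graph()) {
            Ops tf = Ops.create(g);
            // Integer-only ops: fine, since no gradients are requested
            Add<TInt32> sum = tf.math.add(tf.constant(new int[] {1, 2, 3}),
                                          tf.constant(new int[] {10, 20, 30}));
            try (Session s = new Session(g);
                 TInt32 result = (TInt32) s.runner().fetch(sum).run().get(0)) {
                System.out.println(result.getInt(0)); // prints 11
            }
        }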
    Gili Tzabari
    @cowwoc
    Don't you have to perform gradient descent on all nodes in the graph for backprop to work?
    I mean, what's the point of having nodes in the graph that autodiff does not run on? When would that be fine?
    Adam Pocock
    @Craigacp
    All nodes between your inputs and outputs.
    Gili Tzabari
    @cowwoc
    Sorry, what? You're saying that you do or do not need all nodes between your inputs and output to be differentiable?
    Adam Pocock
    @Craigacp
    You need all the nodes that connect your outputs to the parameters you want to learn to be differentiable.
    Gili Tzabari
    @cowwoc
    Right. So when would you want to use non-differentiable nodes in Tensorflow? What lives outside the path between the input and output nodes?
    Adam Pocock
    @Craigacp
    You can add nodes which trigger printouts or saving based on specific computation conditions, you can construct the paths that you want to load data from, and you can perform operations on the outputs of your ML model (e.g. in the bounding box example, you might want to colour the boxes based on the probability of correct classification).
    All these things you can add into the computational graph.
    Gili Tzabari
    @cowwoc
    I see. Are there any online examples I could look at that would show this in action?
    Adam Pocock
    @Craigacp
    Also, as I mentioned above, you might have nodes which compute the learning rate, or control the presence or absence of dropout on some layers. These would be integer or boolean inputs and compute some function which might have a boolean output.
    Erm, well we build them under the covers in a bunch of bits of TF-Java (as does Keras in Python). I'm not sure I've seen explicit examples of this functionality in TF 2 examples, but to be honest I've not looked at much TF 2 example code.
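    (As a concrete, hypothetical illustration of the learning-rate point above, sketched against the TF-Java 0.3.x Ops API; the schedule itself is made up:)

        import org.tensorflow.Graph;
        import org.tensorflow.Operand;
        import org.tensorflow.op.Ops;
        import org.tensorflow.op.core.Placeholder;
        import org.tensorflow.types.TFloat32;
        import org.tensorflow.types.TInt32;

        try (Graph g = new Graph()) {
            Ops tf = Ops.create(g);
            // Integer step counter fed in from the training loop; never differentiated
            Placeholder<TInt32> step = tf.placeholder(TInt32.class);
            Operand<TFloat32> stepF = tf.dtypes.cast(step, TFloat32.class);
            // Made-up exponential decay: lr = 0.01 * 0.96^(step / 1000)
            Operand<TFloat32> lr = tf.math.mul(
                    tf.constant(0.01f),
                    tf.math.pow(tf.constant(0.96f), tf.math.div(stepF, tf.constant(1000.0f))));
        }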
    Gili Tzabari
    @cowwoc
    @Craigacp Looking at https://www.tensorflow.org/tutorials/structured_data/time_series#time they calculate both cos and sin of the timestamp. Is there a point to passing both to the model as input? Or do they only use one of them?
    Adam Pocock
    @Craigacp
    It looks like they use both. I agree that computing one from the other is possible and so there is redundant information in there, but it might make it easier for the network to learn. Feature engineering is an art more than a science.
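    (A hedged Java sketch of the tutorial's encoding; the period constants follow the tutorial, everything else is illustrative:)

        // Map a timestamp onto circles so that e.g. 23:59 and 00:01 land close
        // together in feature space
        double SECONDS_PER_DAY  = 24 * 60 * 60;
        double SECONDS_PER_YEAR = 365.2425 * SECONDS_PER_DAY;
        long epochSeconds = 1623456789L;
        double daySin  = Math.sin(epochSeconds * (2 * Math.PI / SECONDS_PER_DAY));
        double dayCos  = Math.cos(epochSeconds * (2 * Math.PI / SECONDS_PER_DAY));
        double yearSin = Math.sin(epochSeconds * (2 * Math.PI / SECONDS_PER_YEAR));
        double yearCos = Math.cos(epochSeconds * (2 * Math.PI / SECONDS_PER_YEAR));
        // Both halves of each (sin, cos) pair are kept: sin alone maps two
        // different times of day to the same value, so the pair is what pins
        // down a unique point on the circle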
    Gili Tzabari
    @cowwoc
    :thumbsup:
    Gili Tzabari
    @cowwoc
    Hi again. How do I deal with optional input data? For example, each training example may contain 2 to 4 input values (e.g. some weather stations have 4 temperature sensors, others 3, and so on). I tried passing math.nan in place of sensor data but this broke training (the loss function returned nan). What should I do in this case? Set the values to zero? Set them to random values? Ideally I want the model to skip them and not use them for training.
    Adam Pocock
    @Craigacp
    There's no one way of doing that with a neural net. Usually you set the missing value to something moderately sensible and have it learn around it, but in general it is model and task specific.
    Gili Tzabari
    @cowwoc
    Okay. Suppose normal readings range between -50 and 50. Am I better off assigning missing values some average value (e.g. 0), or assigning a value that would never occur in real life (e.g. -1000) and hoping that the model will learn to suppress/mask such inputs?
    Adam Pocock
    @Craigacp
    It depends why it's missing.
    jxtps
    @jxtps
    You may want to have a separate piece of input that's all 1s where the data is valid and 0s where it's not - i.e. an input mask. You'd then need to incorporate that into your network architecture in some sensible way (very task/network specific). If this is actually on the output, you can usually assign a weight to the sample so it gets de-weighted compared to the rest of the batch.
    (The point of the input mask being to make it easier for your network to learn to ignore that part of the data)
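    (A hedged plain-Java sketch of such a mask; the sensor count, fill value and names are made up:)

        // Hypothetical: 4 sensor slots; missing readings get a neutral fill value
        // plus a parallel 0/1 mask so the network can learn to ignore them
        final float FILL = 0.0f;
        Float[] readings = {21.5f, 19.0f, null, 23.1f}; // null = sensor absent
        float[] values = new float[readings.length];
        float[] mask   = new float[readings.length];
        for (int i = 0; i < readings.length; i++) {
            values[i] = (readings[i] != null) ? readings[i] : FILL;
            mask[i]   = (readings[i] != null) ? 1.0f : 0.0f;
        }
        // Feed `values` and `mask` as two inputs, or concatenated into one vector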
    Gili Tzabari
    @cowwoc
    Interesting. I think I will experiment with 3 different options: (1) a separate input mask per sensor (2) a single input indicating the number of sensors present (3) an (inline) invalid sensor value implying that masking should take place.
    Jakob Sultan Ericsson
    @jakeri

    Hello again,
    We have gone from TF Java 0.2.0 to 0.3.1 on our Linux hosts,
    and basically just changed enough so that it compiles. We load and unload a bunch of different savedmodels.
    We are now experiencing OutOfMemory errors from bytedeco, as if the memory is not reclaimed. We try to calculate the size of each incoming savedmodel and only keep models up to roughly half of the available memory of the host (we have a Guava cache that gets the approximate weight of each model and only holds models up to half of the available memory).
    We are closing all tensors, savedmodel-bundles etc. (not really changed from 0.2.0).

    -Dorg.bytedeco.javacpp.maxbytes=25G
    -Dorg.bytedeco.javacpp.maxphysicalbytes=25G

    We are about to try 0.4.0-SNAPSHOT and also put together some kind of more specific test case.

    Jakob Sultan Ericsson
    @jakeri

    We have continued to try to nail down our memory problem. We believe it is something strange with the Linux version of 0.3.1 (and 0.4.0-SNAPSHOT). Some memory is not deallocated, and it is quite visible when you run large models.

    I’ve tried to build a “test case” for this: a Maven project with a Dockerfile. The Dockerfile will download a somewhat large model from tfhub (efficientnet_b7_classification_1).

    The test is basically:

                for (int q = 0; q < 10; q++) {
                    //Download https://storage.googleapis.com/tfhub-modules/tensorflow/efficientnet/b7/classification/1.tar.gz 
                    //and unpack inside src/main/resources
                    Path savedModelSourcePath = Path.of(Thread.currentThread().getContextClassLoader().getResource("efficientnet_b7_classification_1").getFile());
                    try (SavedModelBundle savedModelBundle = SavedModelBundle.load(savedModelSourcePath.toString(), "serve")) {
                        System.out.printf("After loading model, physicalBytes: %d, totalBytes: %d\n", Pointer.physicalBytes(), Pointer.totalBytes());
                    }
                    Pointer.deallocateReferences(); //trying to force
                    System.gc(); //trying to force
                }

    When running on Linux, memory keeps increasing and the test usually crashes after 3-4 iterations (depending on the bytedeco memory flags or host memory), but the same test on OSX will usually pass, and memory decreases from time to time.
    Do you think this is related to tensorflow/java#304 or something else?
    We are also trying to enable TensorFlow debug-logging of allocations and deallocations to get some pointers to where this could be.

    Adam Pocock
    @Craigacp
    You can run the whole thing inside valgrind on Linux, which will show where the allocated memory is lost. There are issues in the C API's model load code where it leaks memory, though when I looked at it, it wasn't enough of a leak to go pop in a few iterations (though I was testing a much smaller model). The response from Google, who maintain the C API, was not promising - tensorflow/tensorflow#48802.
    Gili Tzabari
    @cowwoc
    @Craigacp I suspect tensorflow/tensorflow#48802 might get more traction if someone were to attach a self-contained testcase, ideally in pure C. Maybe it doesn't matter, but the way the issue stands now, it is not super easy for a newbie contributor to pick up.
    Samuel Otter
    @samuelotter

    Hi, I'm also investigating this memory leak together with @jakeri.

    I'm not sure I completely grok all the interactions between tensorflow, javacpp and the java wrappers, but it looks like the TF_Session handle isn't destroyed when closing the Session. This can be confirmed by running with the tensorflow VLOG level set to 2, which will log allocations and deallocations. Some things are never deallocated (memory allocated for restore ops, for example).

    If I understand things correctly, Session::delete will be invoked when closing the session, which should release the reference; ultimately the Pointer should be deallocated, which should trigger the DeleteDeallocator in AbstractTF_Session. This works when calling TF_Session::newSession, since that sets the deallocator on the pointer before returning. But when loading a saved model, the TF_Session object is created internally in native code in TF_LoadSessionFromSavedModel and returned, so the deallocator is never set on the Pointer and the DeleteDeallocator is never invoked.
    When manually calling TF_CloseSession and TF_DeleteSession on the native handles, it leaks much less memory (still a little bit, I think, but significantly less).
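    (Roughly what that manual cleanup looks like, sketched against TF-Java's internal JavaCPP bindings; the exact package and helper names are from memory and may be off:)

        import org.tensorflow.internal.c_api.TF_Session;
        import org.tensorflow.internal.c_api.TF_Status;
        import static org.tensorflow.internal.c_api.global.tensorflow.TF_CloseSession;
        import static org.tensorflow.internal.c_api.global.tensorflow.TF_DeleteSession;

        // nativeSession: the raw TF_Session handle that came back from
        // TF_LoadSessionFromSavedModel and never got a deallocator
        TF_Status status = TF_Status.newStatus();
        TF_CloseSession(nativeSession, status);  // stop the session, free device resources
        TF_DeleteSession(nativeSession, status); // destroy the handle itself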

    Samuel Audet
    @saudet
    You're right, that's an issue. TF_Session.loadSessionFromSavedModel() needs to be called instead.
    Adam Pocock
    @Craigacp
    The call above that which creates the TF_Graph also returns a pointer without a deallocator, right? It doesn't look like the session takes ownership of it such that it'll be deleted on session close, so we should probably change that over to TF_Graph.newGraph() too.
    Adam Pocock
    @Craigacp
    Then we'll need to make sure the session and the graph live past the pointer scope.
    Adam Pocock
    @Craigacp
    @samuelotter could you see if this branch fixes your issue - https://github.com/Craigacp/tensorflow-java/tree/saved-model-leak
    Samuel Audet
    @saudet
    Yes, it's also calling close() on the graph, so we have to change that as well.
    Adam Pocock
    @Craigacp
    SavedModelBundle.close closes the graph, shouldn't that be sufficient?
    Samuel Audet
    @saudet
    Looks like it:
    // Destroy an options object.  Graph will be deleted once no more
    // TFSession's are referencing it.
    public static native void TF_DeleteGraph(TF_Graph arg0);
    Not sure if that means it will crash if we try to deallocate it explicitly anyway. The C API is very user unfriendly.