    Adam Pocock
    @Craigacp
    Yep, though if your input tensors are the same size you can reuse them and repeatedly set the values, which might be quicker than allocating a fresh one.
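    (A minimal sketch of the reuse pattern described above, with TFloat32 and Shape from the org.tensorflow.types / org.tensorflow.ndarray packages; the 1x128 shape, the examples list, and the runModel() call are placeholders, not details from this conversation:)

    static void predictAll(List<float[]> examples) {
      try (TFloat32 input = TFloat32.tensorOf(Shape.of(1, 128))) {
        for (float[] example : examples) {
          for (int i = 0; i < example.length; i++) {
            input.setFloat(example[i], 0, i);  // overwrite in place, no new allocation per call
          }
          // runModel(input);  // invoke the session/function with the same tensor each time
        }
      }
    }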
    odinsbane
    @odinsbane
    That does sound better.
    odinsbane
    @odinsbane
    Hmm, the CPU version seems to have a few limitations.
    Adam Pocock
    @Craigacp
    What limitations?
    odinsbane
    @odinsbane
    The conv3d op and channels_first, so far.
    My GPU is remote and behind a VPN, so I am testing with the CPU version.
    Adam Pocock
    @Craigacp
    Does that op throw an exception or it's just very slow?
    odinsbane
    @odinsbane
    I get an exception that it is not supported in CPU mode.
    Including the data_format "channels_first" might cause a warning.
    Adam Pocock
    @Craigacp
    Ok. The channels first mode is mostly to accelerate things on GPUs and to make them fit better with tensor cores/cuDNN.
    Oh wait, I might have got that confused with channels last in pytorch.
    odinsbane
    @odinsbane
    So if I want to re-use the same tensor, it seems I can access the DataBuffer and write to it. Do I need to use offset? E.g. input.asRawTensor().data().asFloats().offset(0).write(float_array);
    Subsequently, when I call function.call(inputs) I get a Map<String, Tensor> back. Do I need to close each tensor after each .call or will they be re-used?
    Karl Lessard
    @karllessard
    You don't need to access the tensor buffer; for example, you can simply wrap your source data inside a new FloatDataBuffer without copying it and write it to the tensor like this:
    tensor.write(DataBuffers.of(floatArray, true, false))
    Karl Lessard
    @karllessard
    For closing the outputs of a function, it depends which version you are using. If you use one of the TF Java releases (<= 0.4.1), then yes, you need to close them all individually. If you use the latest snapshot (0.5.0-SNAPSHOT), the function returns a Result class instead of a Map, which can be closed on its own to close all of the outputs.
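    (A sketch of the 0.4.x pattern described above; TensorFunction, Tensor, TFloat32 and DataBuffers are the org.tensorflow classes, while the "output_0" name is only illustrative. On 0.5.0-SNAPSHOT the call would return a Result that can be closed in one go instead:)

    static float[] runOnce(TensorFunction function, Map<String, Tensor> inputs) {
      Map<String, Tensor> outputs = function.call(inputs);
      try {
        TFloat32 prediction = (TFloat32) outputs.get("output_0");  // illustrative output name
        float[] copy = new float[(int) prediction.size()];
        prediction.read(DataBuffers.of(copy, false, false));       // copy values out before closing
        return copy;
      } finally {
        outputs.values().forEach(Tensor::close);                   // required on releases <= 0.4.1
      }
    }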
    odinsbane
    @odinsbane
    I'm using 0.4.1 and I can't find the tensor.write method.
    Karl Lessard
    @karllessard
    The method is on FloatNdArray, which is extended by TFloat32
    In my example, tensor is an instance of TFloat32
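    (In other words, something like the following, assuming the tensor is known to hold floats:)

    TFloat32 input = (TFloat32) tensor;                    // TFloat32 implements FloatNdArray
    input.write(DataBuffers.of(floatArray, true, false));  // so write() is now available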
    odinsbane
    @odinsbane
    Okay, I think I have TFloat32s, but I am just referring to them as Tensor. Do you think it is better to cast and use the write method?
    Karl Lessard
    @karllessard
    I think so, yes. If your cast is not safe, then you could probably stick to your solution; it should work as well, but I've never tried it.
    odinsbane
    @odinsbane
    Right on, I'm at the point of getting data out now. I just have to sort out my dimension issues. :D
    odinsbane
    @odinsbane
    For a prediction that takes 9 minutes in python, it took about 6 minutes in java.
    Karl Lessard
    @karllessard
    :+1: that's really nice to hear, 33% faster! That's quite a prediction though, isn't this in seconds? :)
    odinsbane
    @odinsbane
    It's a time series of images, 5 3D volumes, going through a large U-Net. I'm glad that it is faster, because with nvidia-smi you can see that the GPU is working a bit harder.
    Karl Lessard
    @karllessard
    that's awesome
    odinsbane
    @odinsbane
    Is there a way to specify the amount of GPU memory to use? My card has 4G, but the log says I've got a device with 1G.
    odinsbane
    @odinsbane
    I specified TF_FORCE_GPU_ALLOW_GROWTH and it works. I can load a larger model.
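    (For reference, the same knobs can presumably also be set from code when loading the model, through the ConfigProto accepted by SavedModelBundle's loader; a sketch, with the proto classes assumed to live under org.tensorflow.proto.framework as in the 0.4.x layout:)

    ConfigProto config = ConfigProto.newBuilder()
        .setGpuOptions(GPUOptions.newBuilder()
            .setAllowGrowth(true)                         // grow GPU allocations on demand
            .setPerProcessGpuMemoryFraction(0.9))         // or cap the fraction of the card to use
        .build();
    SavedModelBundle model = SavedModelBundle.loader("/path/to/saved_model")
        .withTags("serve")
        .withConfigProto(config)
        .load();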
    Karl Lessard
    @karllessard
    :+1:
    tom denton
    @sdenton4
    Hey, I had a few questions about TF+Java... I've got some collaborators who have some audio analysis software written in Java (for desktop not android), with an existing system for plugins (https://ravensoundsoftware.com/). We're interested in running TF SavedModels via a plugin. Basic idea will be to specify some model metadata in a json file (or whatevs), load up the SavedModel, push in some audio, and get back some inference outputs.
    We're interested in allowing users to use CUDA GPUs, which makes it seem like SavedModel >> TFLite
    So I assume we'll just link against TF Java for a particular platform, build the plugin, and publish binaries for a few different platforms on the website.
    Q1) If one were to build a super-minimal SavedModel executor in TF+Java, what's the output binary size?
    tom denton
    @sdenton4
    Q2) Is TF+Java even the right thing to use? We could also wrap the C++ TF APIs as an alternative... But it's not clear to me whether TF+Java is mostly just doing that already.
    Karl Lessard
    @karllessard
    Hi Tom, yes if you look at the tensorflow-core-api, it basically wraps up the TF C API to make it more convenient for Java users. For the size, TF itself is quite big and will take the vast majority of the output binary size. On some platforms, that can reach ~100M.
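    (For a rough sense of what a "super-minimal SavedModel executor" looks like in TF Java, a sketch; the signature key, the "audio" input name, and the 1x16000 sample shape are placeholders rather than details of Raven's models:)

    import org.tensorflow.SavedModelBundle;
    import org.tensorflow.Tensor;
    import org.tensorflow.ndarray.Shape;
    import org.tensorflow.ndarray.buffer.DataBuffers;
    import org.tensorflow.types.TFloat32;
    import java.util.Map;

    public final class MinimalExecutor {
      public static void main(String[] args) {
        float[] audio = new float[16000];                  // one second of dummy audio samples
        try (SavedModelBundle model = SavedModelBundle.load(args[0], "serve");
             TFloat32 input = TFloat32.tensorOf(Shape.of(1, audio.length),
                                                DataBuffers.of(audio, true, false))) {
          Map<String, Tensor> outputs = model.function("serving_default")
                                             .call(Map.of("audio", input));
          try {
            outputs.forEach((name, t) -> System.out.println(name + ": " + t.shape()));
          } finally {
            outputs.values().forEach(Tensor::close);       // close each output (0.4.x API)
          }
        }
      }
    }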
    tom denton
    @sdenton4
    Ok, great. That's very helpful, thanks! I think 100M is still quite reasonable for desktop users, and I'll pass it along.
    Karl Lessard
    @karllessard
    :+1:
    noahisch
    @noahisch

    Hi Tensorflow experts, I had a few questions about improving latency.

    I am using TF+Java to serve a sequential model in Docker, but the latency is a bit higher than I would hope. Right now I am only using CPU, and I am wondering how best to improve the latency.

    The model scores 100 candidates at a time, and right now I am using threads to call the SavedModel, each with a single candidate. The model takes about 2 ms on average to score a candidate.

    1. Would batching the input and sending all 100 candidates at once to the model have better performance (sending a 100x1 feature instead of a 1x1)? Will the model use multiple CPUs if they are available?
    2. Is adding a GPU to the Docker container the most effective way of decreasing latency?
    3. I was reading about TFLite and how it can reduce latency (https://www.tensorflow.org/lite/performance/model_optimization), but it doesn't seem like serving TFLite is supported on the JVM outside of Android. Is this true? If it is, is optimizing the model as in https://www.tensorflow.org/model_optimization the best option to improve performance?

    Also, my features are exclusively doubles, ints, and vectors of ints, which I convert to Tensors like:

    TInt64 createInt64Tensor(List<? extends Number> value) {
        TInt64 tensor = TInt64.tensorOf(Shape.of(1, value.size()));
        IntStream.range(0, value.size()).forEach(i -> tensor.setLong(value.get(i).longValue(), 0, i));
        return tensor;
    }

    and

    TFloat64 createFloat64Tensor(Double value) {
        TFloat64 tensor = TFloat64.tensorOf(Shape.of(1, 1));
        tensor.setDouble(value, 0, 0);
        return tensor;
    }

    Is there a more efficient way?

    Thank you!!

    Adam Pocock
    @Craigacp
    1. Yes, batching will usually improve performance, assuming you have more threads available for TF to use.
    2. Depending on how much computation is going on and what your model is, GPUs will likely speed things up, but you may need larger batches to fully exploit the extra speed.
    3. We don't support TFLite in TF-Java; TFLite uses a completely separate runtime.
    noahisch
    @noahisch

    Hi Adam, thanks for the help! It seems like quantization might also have big latency wins.

    Do you have any advice for Tensor creation?
    If I'm batching then certain features will be cloned across all candidates. For instance, if I have a list [1, 2, 3, 4], what is the most efficient way to transform it into a (100, 4) Int64 Tensor, where each row is [1,2,3,4]?

    Thanks!
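    (One possible way to build such a repeated-row tensor without 400 individual set calls, sketched with the ndarray helpers, e.g. repeatedRows(new long[]{1, 2, 3, 4}, 100); whether it is actually the fastest option isn't something measured here:)

    static TInt64 repeatedRows(long[] row, int batchSize) {
      TInt64 tensor = TInt64.tensorOf(Shape.of(batchSize, row.length));
      LongNdArray rowArray = NdArrays.vectorOf(row);       // wrap the row once, no copy
      for (int i = 0; i < batchSize; i++) {
        tensor.set(rowArray, i);                           // copy the row into row i of the tensor
      }
      return tensor;
    }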

    Karl Lessard
    @karllessard
    @noahisch, which version of TF Java are you using? Some latency regressions have been reported on 0.4.x and are still under investigation.
    Also, looking at the code, it's probably just missing from the snippets, but make sure you close the created tensors or you'll end up with memory leaks.
    noahisch
    @noahisch
    I am using 0.4.1, yes I am closing the tensors.
    Karl Lessard
    @karllessard
    Ok. I'm not saying that is the reason (and yes, batching will speed up your pipeline), but it could be interesting to see if you observe an increase in speed by downgrading to 0.3.3. Here's the issue I'm referring to: tensorflow/java#461
    Adam Pocock
    @Craigacp
    Quantization may also have big wins depending on your hardware. Modern Intel CPUs have specific instructions for 8-bit int dot products which can speed things up.
    Samuel Audet
    @saudet
    @noahisch It's pretty easy to build TF Lite on Linux, Mac, Windows, since Google supports those platforms for their Python builds. I'm making Java builds available as part of the JavaCPP Presets for TensorFlow Lite here: https://github.com/bytedeco/javacpp-presets/tree/master/tensorflow-lite
    noahisch
    @noahisch

    @karllessard I tried downgrading to 0.3.3 and it fails to load my saved model

    org.tensorflow.exceptions.TFInvalidArgumentException: NodeDef mentions attr 'offset' not in

    but that's okay, I will first try batching, then quantization, then tflite if needed.

    One other thing I noticed is that it takes my program about as much time to create the tensors as it does to run the model. I only have 27 input tensors, 25 of which are a single long/double and 2 that are vectors of ints.

    I tried updating my Tensor code as suggested above from:

    TInt64 createInt64Tensor(List<? extends Number> value) {
        TInt64 tensor = TInt64.tensorOf(Shape.of(1, value.size()));
        IntStream.range(0, value.size()).forEach(i -> tensor.setLong(value.get(i).longValue(), 0, i));
        return tensor;
    }

    to

      public static TInt64 createInt64Tensor(List<Long> value) {
        TInt64 tensor = TInt64.tensorOf(Shape.of(1, value.size()));
        tensor.write(DataBuffers.of(value.toArray(new Long[0]), true, false));
        return tensor;
      }

    and it takes about 0.4 ms longer per run. Is this expected? And is there a better way to create the Tensor than IntStreams?
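    (If the List can be replaced by a primitive long[] further upstream, the boxing and per-element calls go away entirely; a sketch of that variant, not something benchmarked here:)

    static TInt64 createInt64Tensor(long[] values) {
      // Wrap the primitive array read-only and without copying; tensorOf then copies it
      // into the native tensor in a single bulk operation.
      return TInt64.tensorOf(Shape.of(1, values.length), DataBuffers.of(values, true, false));
    }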

    austinzh
    @austinzh
    @rnett Just found out that after the tensorflow/java#233 PR, SavedModelBundle.function(key) returns a SessionFunction instead of a ConcreteFunction as before. Would you mind pointing me to a place where I can find the reasoning behind it?
    Karl Lessard
    @karllessard

    If I recall correctly, that was to match the nomenclature used in Python more closely. In Python, a ConcreteFunction can be called from any graph, while a ConcreteFunction in Java originally described a self-contained graph and session that you can invoke using a given signature. So the latter was renamed to SessionFunction, and both now share a common interface called TensorFunction.

    Be aware though that now calling a ConcreteFunction from a signature will run it eagerly, i.e. with additional latency.

    As the tensorflow-java API is now stabilizing, I don’t expect a lot of these changes to happen in the next releases, sorry for the inconvenience.

    austinzh
    @austinzh
    Thanks @karllessard, we are aware of the risk. I just wanted to know the reasoning behind it so we can adapt better.
    Karl Lessard
    @karllessard

    Great. I do think, though, that we should have a way to build a SessionFunction the same way we used to create a ConcreteFunction in the past, i.e. without having to explicitly pass a Session, letting the function allocate one instead.

    A small addition I'm planning to make, unless someone is faster than me :)