I call function.call(inputs) and get a Map<String, Tensor> back. Do I need to close each tensor after each .call, or will they be re-used?
The latest version returns a Result class instead of a Map, which can be closed alone to close all of the outputs.
Each returned tensor is an instance of TFloat32.
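For illustration, the newer call pattern might look like this (just a sketch; the Result.get(String) accessor and the output name "scores" are assumptions to double-check against your version):

// With the newer API, closing the Result closes every output tensor,
// so one try-with-resources block releases all native memory at once.
try (Result outputs = function.call(inputs)) {
  // "scores" is a hypothetical output name from the model's signature.
  TFloat32 scores = (TFloat32) outputs.get("scores").orElseThrow();
  float first = scores.getFloat(0, 0);
}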
Hi TensorFlow experts, I had a few questions about improving latency.
I am using TF+Java to serve a sequential model in Docker, but the latency is a bit higher than I would hope. Right now I am only using the CPU, and I am wondering how best to improve the latency.
The model scores 100 candidates at a time, and right now I am using threads to call the SavedModel, each with a single candidate. The model takes about 2 ms on average to score a candidate.
Also, my features are exclusively doubles, ints, and vectors of ints, which I convert to Tensors like:
// Builds a (1, N) int64 tensor element by element.
TInt64 createInt64Tensor(List<? extends Number> value) {
  TInt64 tensor = TInt64.tensorOf(Shape.of(1, value.size()));
  IntStream.range(0, value.size()).forEach(i -> tensor.setLong(value.get(i).longValue(), 0, i));
  return tensor;
}
and
// Wraps a single double in a (1, 1) tensor.
TFloat64 createFloat64Tensor(Double value) {
  TFloat64 tensor = TFloat64.tensorOf(Shape.of(1, 1));
  tensor.setDouble(value, 0, 0);
  return tensor;
}
Is there a more efficient way?
Thank you!!
Hi Adam, thanks for the help! It seems like quantization might also yield big latency wins.
Do you have any advice for Tensor creation?
If I'm batching, certain features will be cloned across all candidates. For instance, if I have a list [1, 2, 3, 4], what is the most efficient way to transform it into a (100, 4) Int64 Tensor, where each row is [1, 2, 3, 4]?
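The most straightforward approach I can think of is to fill a flat primitive array once and wrap it in a single buffer; a sketch (tileRow is just a name I made up):

// Tile one feature row across all candidates in a single allocation:
// fill a flat long[] row-major, then hand it to the tensor in one copy.
static TInt64 tileRow(long[] row, int numCandidates) {
  long[] flat = new long[numCandidates * row.length];
  for (int r = 0; r < numCandidates; r++) {
    System.arraycopy(row, 0, flat, r * row.length, row.length);
  }
  return TInt64.tensorOf(Shape.of(numCandidates, row.length), DataBuffers.of(flat));
}

So tileRow(new long[] {1, 2, 3, 4}, 100) would give the (100, 4) tensor above. Is there something better, maybe one that avoids the copy entirely?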
Thanks!
@karllessard I tried downgrading to 0.3.3 and it fails to load my SavedModel:
org.tensorflow.exceptions.TFInvalidArgumentException: NodeDef mentions attr 'offset' not in
but that's okay, I will first try batching, then quantization, then TFLite if needed.
One other thing I noticed is that it takes my program about as much time to create the tensors as it does to run the model. I only have 27 input tensors, 25 of which are a single long/double and 2 that are vectors of ints.
I tried updating my Tensor code as suggested above from:
TInt64 createInt64Tensor(List<? extends Number> value) {
  TInt64 tensor = TInt64.tensorOf(Shape.of(1, value.size()));
  IntStream.range(0, value.size()).forEach(i -> tensor.setLong(value.get(i).longValue(), 0, i));
  return tensor;
}
to
public static TInt64 createInt64Tensor(List<Long> value) {
  TInt64 tensor = TInt64.tensorOf(Shape.of(1, value.size()));
  tensor.write(DataBuffers.of(value.toArray(new Long[0]), true, false));
  return tensor;
}
and it takes about 0.4 ms longer per run. Is this expected? And is there a better way to create the Tensor than IntStreams?
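One variant I still want to try is skipping the boxed Long[] and the extra write() pass by unboxing into a primitive array first (a sketch on my side, not benchmarked yet):

// Unbox once into a primitive long[] and wrap it in a single buffer,
// avoiding the Long[] allocation and the separate write() copy.
public static TInt64 createInt64Tensor(List<Long> value) {
  long[] data = new long[value.size()];
  for (int i = 0; i < data.length; i++) {
    data[i] = value.get(i);
  }
  return TInt64.tensorOf(Shape.of(1, data.length), DataBuffers.of(data));
}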
I noticed the API now returns a SessionFunction instead of a ConcreteFunction as before. Would you kindly point me to where I can find the reasoning behind this change?
If I recall correctly, that was to match the nomenclature used in Python more closely. In Python, a ConcreteFunction can be called from any graph, while in Java a ConcreteFunction originally described a self-contained graph and session that you can invoke using a given signature. So the latter was renamed to SessionFunction, and both now share a common interface called TensorFunction.
Be aware though that calling a ConcreteFunction from a signature will now run it eagerly, i.e. with additional latency.
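To stay on the fast path, call the function you get from the SavedModel bundle, which is backed by the bundle's session. A rough sketch (buildInputs(), the model path, and the "serving_default" key are placeholders for your own setup):

// The function returned by the bundle runs inside the bundle's session,
// so calls stay in-graph instead of executing eagerly.
try (SavedModelBundle model = SavedModelBundle.load("/path/to/model", "serve")) {
  TensorFunction serving = model.function("serving_default");
  Map<String, Tensor> inputs = buildInputs(); // your feature tensors
  try (Result outputs = serving.call(inputs)) {
    // read the outputs here, before the Result closes them
  }
}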
As the tensorflow-java API is now stabilizing, I don’t expect a lot of these changes to happen in the next releases; sorry for the inconvenience.
Great. I do think though that we should have a way to build a SessionFunction the same way we used to create a ConcreteFunction in the past, i.e. without having to pass a Session explicitly, letting the function allocate one.
A small addition I'm planning to make, unless someone is faster than me :)
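Roughly, something like this (purely a hypothetical shape for that addition, not a final API):

// Hypothetical convenience factory: build a SessionFunction from a signature
// and its graph, allocating the Session internally. Ownership of the session
// (who closes it) would still need to be worked out.
public static SessionFunction create(Signature signature, Graph graph) {
  return SessionFunction.create(signature, new Session(graph));
}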