NdArray to wrap the image bytes, and then expand it with one extra dimension, like it is done in Python: self.model.predict(np.expand_dims(im, axis=0)).
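For illustration, here is a minimal sketch against the TF Java 0.3.x ndarray API (the 224x224x3 shape and the class name are made up for the example): copy the image into an NdArray that has a leading batch dimension of size 1 and wrap that in a tensor.

import org.tensorflow.ndarray.FloatNdArray;
import org.tensorflow.ndarray.NdArrays;
import org.tensorflow.ndarray.Shape;
import org.tensorflow.types.TFloat32;

public class ExpandDimsSketch {
    public static void main(String[] args) {
        // Pretend this is a decoded 224x224x3 image (values omitted).
        FloatNdArray image = NdArrays.ofFloats(Shape.of(224, 224, 3));

        // Copy it into an array with a leading batch dimension of size 1,
        // i.e. shape [1, 224, 224, 3] -- the equivalent of np.expand_dims(im, axis=0).
        FloatNdArray batched = NdArrays.ofFloats(Shape.of(1, 224, 224, 3));
        batched.set(image, 0);

        try (TFloat32 input = TFloat32.tensorOf(batched)) {
            // feed `input` to your SavedModelBundle / Session.runner() here
            System.out.println(input.shape()); // shape is now [1, 224, 224, 3]
        }
    }
}

Alternatively, the batch dimension can be added inside the graph with tf.expandDims, which avoids the extra copy on the Java side.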
I've run into a weird out-of-memory issue with TFJ 0.3.1. I initialize a couple of models on server startup and process a couple of images. CPU, not GPU. That works. Then the server idles for some time (half an hour?). Then, if I process another image, it sometimes throws java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (57976M) > maxPhysicalBytes (37568M) at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:695).
The server was typically using <10 gigs (as reported by Java, not Pointer.physicalBytes) before this, so it's very weird that it would shoot up to almost 60 gigs with no TF usage!?
This all started happening after I introduced a GPU microservice - the servers used to process one image every couple of seconds with CPU inference and never crashed, but now that they only process an image when the GPU microservice is being too slow (= fallback), they've all of a sudden started crashing!? The GPU microservice itself is holding up fine - it's processing roughly one image per second and not crashing.
By "as reported by Java" I mean Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory().
This can happen if the JVM is running with flags like -XX:+DisableExplicitGC, etc. ZGC also maps the memory multiple times, which is known to inflate the resident set size (https://mail.openjdk.java.net/pipermail/zgc-dev/2018-November/000540.html), throwing off JavaCPP's calculation.
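As a diagnostic sketch (the class name is mine, not from the report): JavaCPP exposes the counters it uses for this check as static methods on org.bytedeco.javacpp.Pointer, so you can log them next to the JVM heap numbers around each inference call to see which side is actually growing. As far as I know, the limits can also be adjusted via the org.bytedeco.javacpp.maxbytes / maxphysicalbytes system properties.

import org.bytedeco.javacpp.Pointer;

public class JavaCppMemoryProbe {
    // Log JVM heap usage next to the counters JavaCPP uses for its OOM check,
    // e.g. before and after each inference call.
    public static void log(String where) {
        Runtime rt = Runtime.getRuntime();
        long heapUsed = rt.totalMemory() - rt.freeMemory();
        System.out.printf("%s: heapUsed=%dM tracked=%dM physical=%dM maxPhysical=%dM%n",
                where,
                heapUsed >> 20,
                Pointer.totalBytes() >> 20,        // native bytes registered with deallocators
                Pointer.physicalBytes() >> 20,     // resident set size as JavaCPP measures it
                Pointer.maxPhysicalBytes() >> 20); // the limit from the error message
        // The limit can be raised or disabled with JVM flags such as
        // -Dorg.bytedeco.javacpp.maxphysicalbytes / -Dorg.bytedeco.javacpp.maxbytes,
        // but that only hides the symptom if native memory really is leaking.
    }
}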
Hello, we run TensorFlow Java 0.2.0 (TF 2.3.1), and we have a model that produces an image as a byte[] output. This works fine when running the model in Python: we can write the bytes to a file. But when trying to do the same thing with TF Java, we get too few bytes out of the output. I think I have managed to boil it down to a unit test with TString.
public void testTFNdArray() throws Exception {
    ClassLoader contextClassLoader = Thread.currentThread().getContextClassLoader();
    byte[] readAllBytes = Files.readAllBytes(Path.of(contextClassLoader.getResource("img_12.png").getFile()));

    // Wrap the PNG bytes in a rank-1 NdArray and copy them into a string tensor
    NdArray<byte[]> vectorOfObjects = NdArrays.vectorOfObjects(readAllBytes);
    Tensor<TString> tensorOfBytes = TString.tensorOfBytes(vectorOfObjects);
    TString data = tensorOfBytes.data();
    byte[] asBytes = data.asBytes().getObject(0);

    System.out.println("Bytes original file: " + readAllBytes.length);
    System.out.println("NdArray byte[] length: " + vectorOfObjects.getObject(0).length);
    System.out.println("Tensor numBytes: " + tensorOfBytes.numBytes());
    System.out.println("TString size: " + data.size());
    System.out.println("Bytes when reading from TString (WRONG): " + asBytes.length);
}
This is the same problem I get when running a real model. How can we get the full byte[] out again?
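For reference, this is roughly the round-trip check we'd expect to pass, translated to Java (the output path is just a placeholder, and it reuses readAllBytes and tensorOfBytes from the test above): write the recovered bytes to disk and compare them with the original file.

byte[] recovered = tensorOfBytes.data().asBytes().getObject(0);
Files.write(Path.of("img_12_roundtrip.png"), recovered);
System.out.println("original=" + readAllBytes.length
        + " recovered=" + recovered.length
        + " identical=" + java.util.Arrays.equals(readAllBytes, recovered));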
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000000133cf0fbb, pid=19618, tid=5891
#
# JRE version: OpenJDK Runtime Environment (11.0.2+9) (build 11.0.2+9)
# Java VM: OpenJDK 64-Bit Server VM (11.0.2+9, mixed mode, tiered, compressed oops, g1 gc, bsd-amd64)
# Problematic frame:
# C [libtensorflow_cc.2.dylib+0x8228fbb] TF_TensorData+0xb
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/jakob/proj/tfjava031/hs_err_pid19618.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#