    Adam Pocock
    @Craigacp
    TFRecords are the preferred input/output format for training data in TF. We can load them in Java, but I don't know if anyone has tried to write them in Java (they are just protobufs).
    Gili Tzabari
    @cowwoc
    Right. There is a TfRecordReader but not corresponding TfRecordWriter.
    Gili Tzabari
    @cowwoc
    Is there a TensorFlow for Python-specific gitter channel I could join to ask questions? Specifically, I'm wondering whether it's possible for me to use the TF 2.x high-level API to create a Sequential() with a mixed input type. Some of the input is a int32. Other parts are float64. Any ideas?
    Jason Zaman
    @perfinion
    specifically Sequential()? it's pretty trivial to have several inputs if you use the functional API, yeah
    Gili Tzabari
    @cowwoc
    Okay, what do I lose if I move over to the functional API? Ease-of-use? Would I lose much of it?
    Jason Zaman
    @perfinion
    lose nothing, the functional API is better than Sequential anyway. for a trivial model with only a couple layers it's really just a difference between whether you put your layers in a list or call them in order. for multi-input or multi-output you can't do that with Sequential
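As a sketch of the difference (assuming TF 2.x; layer sizes here are arbitrary): the same two-layer model written both ways, plus a mixed-type two-input model that Sequential cannot express.

```python
import tensorflow as tf

# Sequential: layers in a list, applied in order.
seq = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Functional: call each layer on the previous output.
inp = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(16, activation="relu")(inp)
out = tf.keras.layers.Dense(1)(x)
func = tf.keras.Model(inputs=inp, outputs=out)

# Mixed-type, multi-input model -- not expressible with Sequential.
int_in = tf.keras.Input(shape=(4,), dtype=tf.int32)
float_in = tf.keras.Input(shape=(4,), dtype=tf.float64)
to_f32 = tf.keras.layers.Lambda(lambda t: tf.cast(t, tf.float32))
merged = tf.keras.layers.Concatenate()([to_f32(int_in), to_f32(float_in)])
multi = tf.keras.Model(inputs=[int_in, float_in],
                       outputs=tf.keras.layers.Dense(1)(merged))
```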
    Gili Tzabari
    @cowwoc
    Okay, thank you.
    Jason Zaman
    @perfinion
    there are some nice examples on the keras docs, it should be pretty simple to change a Sequential model to use the functional API then from there you can add other stuff easily enough
    Gili Tzabari
    @cowwoc
    Okay. I was under the impression that TF 2.x took over all future Keras API development. Are you saying that the latter is still undergoing independent development and I should look at their docs instead of the TensorFlow ones?
    Jason Zaman
    @perfinion
    the docs on keras.io are still relevant
    but yeah tf.keras is where the majority of the dev is done
    Gili Tzabari
    @cowwoc
    Are the docs on keras.io fully up-to-date with the tf.keras API? Or are they older?
    Jason Zaman
    @perfinion
    not sure about all of them, but i have definitely seen plenty of updates recently yeah
    Gili Tzabari
    @cowwoc

    Okay, thank you.

    Another question: My dataset is dynamically generated from a database. All online tutorials I saw focus on pulling in an existing dataset (e.g. MNIST) or from CSV files on disk. Neither seems like a good fit for my case. Should I be "streaming" data from the database to the model for training somehow? Or am I expected to construct a fixed-size tensor and populate it column by column based on the database resultset?

    Jason Zaman
    @perfinion
    oh heh. yeah i hate that part of all the intro docs, for real things your dataset doesn't fit in memory so yeah it absolutely should be streamed in
    use the tf.data.Dataset API and it should read in data, then you .map() to preprocess how your model wants it (e.g. resize the images or whatever)
    SIG-IO has lots of other extensions for pulling data from other places if the builtin things aren't good enough
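A minimal sketch of that pattern (assuming TF 2.x; the small in-memory source below stands in for a real streamed one, such as a database cursor):

```python
import tensorflow as tf

# Stand-in for a streamed source: in practice this could be
# Dataset.from_generator over your database query results.
ds = tf.data.Dataset.from_tensor_slices(tf.range(10))

# .map() runs per-element preprocessing (decoding, resizing, casting, ...).
ds = ds.map(lambda x: tf.cast(x, tf.float32) / 10.0)
ds = ds.batch(4)
```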
    Gili Tzabari
    @cowwoc
    Thank you. That's very helpful. And I assume that if I want to shuffle the streamed dataset I'll have to shuffle on the DB end, right?
    Somehow... I'm not even sure how to do that to be honest :)
    Jason Zaman
    @perfinion
    nah, you just ds = ds.shuffle(1000)
    the tf.data API is really nice
    Gili Tzabari
    @cowwoc
    Doesn't that need to pull down the entire dataset before it can shuffle it?
    Jason Zaman
    @perfinion
    it pulls 1000 items then randomly picks from that, so it isn't exactly shuffling the entire epoch but close enough
    you can just make the window bigger or smaller depending on memory
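The buffer semantics can be sketched in plain Python (an illustration of the idea, not TF's actual implementation): fill a buffer of `buffer_size` items, then repeatedly emit a random element and refill its slot from the stream.

```python
import random

def buffered_shuffle(stream, buffer_size, rng=random):
    """Approximate shuffle over a stream, in the spirit of
    tf.data's Dataset.shuffle(buffer_size)."""
    it = iter(stream)
    buf = []
    for item in it:
        buf.append(item)
        if len(buf) >= buffer_size:
            break
    while buf:
        i = rng.randrange(len(buf))
        item = buf[i]
        try:
            buf[i] = next(it)   # refill the emptied slot from the stream
        except StopIteration:
            buf.pop(i)          # stream exhausted: drain the buffer
        yield item
```

Every element is emitted exactly once; randomness is limited to the buffer window, which is why a bigger buffer gives a shuffle closer to a full-epoch one.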
    Gili Tzabari
    @cowwoc
    Ah. Makes sense.
    Is there a way for me to provide a "generator" that will return elements that go into a dataset? I've got complex query logic already written in Java so I can use jpype to pass a Java Stream over to the python API but then I have no way to convert it to a dataset.
    Jason Zaman
    @perfinion
    the python API has https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_generator , not sure about from java tho
    there might be a more direct way that other people know
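A minimal from_generator sketch (assuming TF 2.4+ for `output_signature`; the row generator below is a hypothetical stand-in for a database cursor):

```python
import tensorflow as tf

# Hypothetical stand-in for rows streamed from a database query.
def row_generator():
    for i in range(5):
        yield {"features": [float(i), float(i) * 2.0], "label": i % 2}

ds = tf.data.Dataset.from_generator(
    row_generator,
    output_signature={
        "features": tf.TensorSpec(shape=(2,), dtype=tf.float32),
        "label": tf.TensorSpec(shape=(), dtype=tf.int32),
    },
)
```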
    Gili Tzabari
    @cowwoc
    That will do. Are there any downsides to this function? I could have sworn I read that it was deprecated or led to bad performance or something, but I can't find it now.
    Jason Zaman
    @perfinion
    that specific thing uses tf.numpy_function which yeah has performance implications, but there are probably better ways
    Gili Tzabari
    @cowwoc
    Hmm, using generator + lambda results in 5x performance drop: tensorflow/tensorflow#37488
    Jason Zaman
    @perfinion
    you could probably pull a batch of them at a time and make a Dataset then .interleave() those datasets together
    or interface directly to tf.data from Java to C++ instead of via python
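A sketch of the batch-and-interleave idea (assuming TF 2.x; the per-chunk dataset below is an illustrative stand-in for a per-batch database query):

```python
import tensorflow as tf

# Each "chunk id" maps to a small dataset; in practice this function
# would pull one batch of rows from the database.
def fetch_chunk(i):
    return tf.data.Dataset.from_tensor_slices(tf.range(i * 4, i * 4 + 4))

chunk_ids = tf.data.Dataset.range(3)
ds = chunk_ids.interleave(
    fetch_chunk,
    cycle_length=2,                      # read two chunks concurrently
    num_parallel_calls=tf.data.AUTOTUNE,
)
```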
    Gili Tzabari
    @cowwoc
    Yeah, https://stackoverflow.com/a/60717965/14731 seems to say the same
    I mean, they say you can just pull a small batch at a time, train, then pull the next batch and so on. If I'm already using the functional API this seems like the way to go.
    Jason Zaman
    @perfinion
    well those are unrelated, Sequential or functional API both end up with a tf.keras.Model object which you just do model.fit() on the same way
    also you can pull from the DB in many threads then interleave into the main dataset which goes to the GPU
    if that ends up not fast enough, look into just packing the data into TFRecords and train off those, uses more disk space but might be worth it
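For the TFRecord route, a minimal Python write-then-read sketch (assuming TF 2.x; the file path and feature name are illustrative):

```python
import os
import tempfile

import tensorflow as tf

path = os.path.join(tempfile.gettempdir(), "example.tfrecord")

# Each record is a serialized tf.train.Example protobuf.
with tf.io.TFRecordWriter(path) as w:
    for i in range(3):
        ex = tf.train.Example(features=tf.train.Features(feature={
            "x": tf.train.Feature(
                float_list=tf.train.FloatList(value=[float(i)])),
        }))
        w.write(ex.SerializeToString())

# Read the records back as a streaming dataset.
ds = tf.data.TFRecordDataset(path).map(
    lambda rec: tf.io.parse_single_example(
        rec, {"x": tf.io.FixedLenFeature([1], tf.float32)}))
```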
    Gili Tzabari
    @cowwoc
    I'll try to start small. I'll pull a bit of data from the DB, stuff it into a tf.TensorArray, train, rinse and repeat. If performance becomes an issue I will revisit it but I don't think this will happen in the near future.
    Jakob Sultan Ericsson
    @jakeri
    Are snapshots of PRs published anywhere? I would like to test tensorflow/java#322 without building it myself. :-)
    Adam Pocock
    @Craigacp
    We don't build snapshots of PRs.
    As that doesn't touch the native code you should be able to build it with the -Pdev flag and it'll pull down the latest snapshot of the native code.
    Karl Lessard
    @karllessard
    Well looks like it’s ready to get merged anyway, so let’s do both :)
    Jakob Sultan Ericsson
    @jakeri
    Even better. :)
    Gili Tzabari
    @cowwoc
    Given tensors a = [1, 2, 3] and b = [4, 5], how do I construct a tensor c which is [1, 2, 3, 4, 5]? I know I can tf.reshape and tf.concat, but is there an easier way to do this in a single step? I'm using the python API in case that makes a difference.
    Jason Zaman
    @perfinion
    tf.stack
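One caveat worth noting (sketch assuming TF 2.x): tf.stack requires equal shapes and adds a new axis, so for 1-D tensors of different lengths, tf.concat along axis 0 is the single-step way to get the flat result.

```python
import tensorflow as tf

a = tf.constant([1, 2, 3])
b = tf.constant([4, 5])

# tf.stack([a, b]) would fail here (shapes (3,) and (2,) differ);
# tf.concat joins them end-to-end in one call.
c = tf.concat([a, b], axis=0)   # -> [1, 2, 3, 4, 5]
```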
    Jakob Sultan Ericsson
    @jakeri

    Both good and bad about the latest 0.4.0-SNAPSHOT

    The good thing: TString does not core dump anymore on 0.4.0-SNAPSHOT.

    The bad thing: results on OSX are random/garbage.

    We have a unit test that loads a savedmodel (CNN classification using MNIST).
    I have run through the model using python and extracted out the resulting tensors. Our unit test loads this model using TF Java and runs the same data through.
    When I use TF Java on Linux our results match nicely (the unit test passes every time), but on OSX the results are way off and random on every run.

    TF Java Linux (from our build server)

    Test best category: 0 (should be 0) categorical: CategoricalPredictionCell [categorical={0=1.0, 1=2.5633107E-22, 2=1.5087728E-8, 3=2.6744433E-16, 4=2.867041E-14, 5=1.9830472E-16, 6=2.6495522E-10, 7=6.265893E-15, 8=7.546605E-10, 9=4.7946207E-9}]

    TF Java OSX (locally)

    Test best category: 2 (should be 0) categorical: CategoricalPredictionCell [categorical={0=0.0, 1=0.0, 2=1.0, 3=0.0, 4=0.0, 5=0.0, 6=0.0, 7=0.0, 8=0.0, 9=0.0}]

    And for reference python output values (without the matching category)

    1.0000000e+00f, 2.5633206e-22f, 1.5087728e-08f, 2.6744229e-16f, 2.8670517e-14f, 1.9830397e-16f, 2.6495622e-10f, 6.2658695e-15f, 7.5466194e-10f, 4.7946207e-09f

    We did not experience this kind of problem when we were running on 0.2.0.
    And we are not using any GPU or MKL extensions.

    Adam Pocock
    @Craigacp
    Could you open another issue and post the model & test there?
    Presumably this isn't using a TString at all?