    Hao
    @zsdonghao
    batch size here means you split the whole dataset into batch_size segments
    check tl.iterate.ptb_iternaxxx
    ruzrobert
    @ruzrobert
    I understand that batch_size is just for parallel computation.
    But I can't understand how we can even predict the next word using just one word (seq_len=1).. Tell me, what am I missing here?
    Hao
    @zsdonghao
    to predict the next word, your sequence length should be 1, any problem?
    see this picture, many to many
    ruzrobert
    @ruzrobert
    I see, and I understand what it is..
    But how does it work, and why are we using that?
    To predict a word in a text, we need to know some previous words, at least 2-3 for example. But how can we do that using one word? Is it the same as predicting the next price using one previous price?
    Or I just don't know what the PTB dataset really is..
    Hao
    @zsdonghao
    you can input multiple words to predict the next word
    i think this example is better for understanding that ~~
    ruzrobert
    @ruzrobert
    All this time I thought that to use any NN model after training, the input data should have the same shape as it did at training time.
    For example, I am feeding prices with a seq_len of 20 (20 prices of history).
    So to make predictions after training with other prices, I must feed exactly 20 prices (seq_len=20).
    Or is that not necessary?
    Hao
    @zsdonghao
    for RNN, training and testing can be different
    ruzrobert
    @ruzrobert
    That is something really new for me :)
    Hao
    @zsdonghao
    because an RNN can pass the cell state on to the next step~
    yeah, that part is quite confusing~
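Hao's point about the cell state can be shown with a toy recurrence (a made-up one-number "state", not the real LSTM equations): feeding a sequence one token at a time while carrying the state ends in exactly the same final state as feeding the whole sequence at once, which is why train-time and test-time seq_len can differ.

```python
# Toy RNN step with hypothetical weights (not the PTB model): the new
# state mixes the current input with the previous state.
def rnn_step(x, h):
    return 0.5 * x + 0.5 * h

def run(seq, h=0.0):
    # process a sequence of any length, starting from state h
    for x in seq:
        h = rnn_step(x, h)
    return h

seq = [1.0, 2.0, 3.0, 4.0]

# train-style: feed the whole sequence at once (seq_len=4)
h_full = run(seq)

# test-style: feed one token at a time (seq_len=1), carrying the state
h = 0.0
for x in seq:
    h = run([x], h)

assert h == h_full  # identical: the carried state encodes the history
```

The assert passes because the state after each step depends only on the inputs seen so far, regardless of how the sequence was chopped into calls.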
    ruzrobert
    @ruzrobert
    Okay, that makes a big difference.
    But why are we testing our PTB model with seq_len=1 then? Wouldn't it be more logical to use 20 again?
    Hao
    @zsdonghao
    for a language dataset, we usually generate the output step by step
    because you need to feed the new output as the input of the next step
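A minimal sketch of that step-by-step generation loop; the `next_word` function here is a made-up stand-in for one trained LSTM step (it is not the real model), but it shows the shape of the loop: each output becomes the next input, so generation naturally runs with seq_len=1.

```python
# Hypothetical stand-in for a trained LSTM step: given the current word id
# and state, return (next word id, new state). Invented for illustration.
def next_word(word, state):
    return (word + state) % 5, word

def generate(start, steps):
    # greedy autoregressive loop: feed each output back in as the next input
    word, state, out = start, 0, [start]
    for _ in range(steps):
        word, state = next_word(word, state)
        out.append(word)
    return out

print(generate(1, 4))
```

The loop structure, not the toy arithmetic, is the point: at test time the model only ever sees one token per step, which is why seq_len=1 is used.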
    ruzrobert
    @ruzrobert
    Now I understand, thank you. And the last question:
    Does the validation process affect the weights/model training? If not, what's the purpose of it then? Why can't we just use the test data for it?
    Hao
    @zsdonghao
    you mean why we have 3 sets? train, val, test?
    ruzrobert
    @ruzrobert
    yes, why don't we use the test set for validation?
    Hao
    @zsdonghao
    val set is for tuning the hyper-parameters
    there are a lot of answers about this online
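A hedged sketch of what tuning with a validation set looks like in practice: train once per candidate setting, compare validation scores, pick the best, and only then touch the test set. The `train_and_validate` function and its numbers are invented purely for illustration.

```python
# Stand-in for "train a model with this learning rate, then evaluate it
# on the validation set". The loss formula is fabricated so the script
# runs without a real dataset.
def train_and_validate(lr):
    fake_val_loss = abs(lr - 0.01) + 0.1
    return fake_val_loss

# candidate hyper-parameters (here: learning rates), chosen by hand
candidates = [0.1, 0.01, 0.001]

# pick the candidate with the lowest validation loss
best_lr = min(candidates, key=train_and_validate)
print(best_lr)
```

Note that no gradient step uses the validation data; it only decides *which* trained model (or setting) you keep, which is why `train_op` is never called during validation.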
    ruzrobert
    @ruzrobert
    So when we are training we are tuning them, and when we are validating we are also tuning them?
    I know, and I have seen those answers.
    But I thought only this line does the updating: train_op = optimizer.apply_gradients(zip(grads, tvars))
    And we are not calling it when validating. That confuses me.
    ruzrobert
    @ruzrobert
    Okay, I checked your tutorials and some TensorFlow tutorials and examples.
    And no, there is no hyper-parameter tuning happening when the validation set is used.. so.. I am confused.
    Hao
    @zsdonghao
    yeah, we need to tune the hyper-parameters manually
    ruzrobert
    @ruzrobert

    @zsdonghao So we just don't use this approach in machine learning?
    http://stackoverflow.com/questions/41903062/tensorflow-how-the-validation-set-improves-the-learning-curve
    All I see there is that hyper-parameter tuning with the validation set is too heavy for ML, and the most popular uses are early stopping and automatic learning-rate adjustment.
    I have seen in theory papers that the validation set is used for hyper-parameter tuning, but I don't see popular use of it in practice.

    Does this mean that if the dataset is large, we do not actually need to do that hyper-parameter tuning with the validation set?
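The early-stopping use of the validation set mentioned above can be sketched like this; the loss curve and the patience value are fabricated for illustration, but the logic (stop when the validation loss has not improved for `patience` epochs) is the standard recipe.

```python
# fabricated per-epoch validation losses: improve, then plateau/worsen
val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64]

patience = 2            # allowed epochs without improvement
best = float("inf")     # best validation loss seen so far
bad_epochs = 0
stopped_at = None

for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, bad_epochs = loss, 0   # improvement: reset the counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # no improvement for too long
            stopped_at = epoch
            break

print(stopped_at, best)
```

Again the validation data never touches the gradients; it only decides *when* to stop training.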

    ruzrobert
    @ruzrobert
    Also, in the same example (tutorial_ptb_lstm.py), as I understand it, we are using a stateless LSTM - we manually reset the states after each batch. Am I right?
    I also thought this was done automatically - I thought states were reset after each batch by default. But of course, how could TF know that we have finished our batch.. Or is this really done automatically, but only when using Estimators (tf.contrib.learn)?
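The stateless-vs-stateful distinction being asked about can be illustrated with a toy one-number recurrence (not the real LSTM equations): "stateless" handling resets the state to zero before every batch, while "stateful" handling carries the final state of one batch into the next - which in raw TF is something you arrange explicitly by re-initializing or re-feeding the state tensor.

```python
# toy recurrence standing in for an LSTM cell (hypothetical weights)
def step(x, h):
    return 0.5 * x + 0.5 * h

def run_batch(batch, h):
    for x in batch:
        h = step(x, h)
    return h

batches = [[1.0, 2.0], [3.0, 4.0]]

# "stateless": reset the state to 0 before every batch
h_stateless = [run_batch(b, 0.0) for b in batches]

# "stateful": carry the last state of one batch into the next
h = 0.0
h_stateful = []
for b in batches:
    h = run_batch(b, h)
    h_stateful.append(h)

print(h_stateless, h_stateful)
```

The two lists differ from the second batch onward, which is exactly the behavioral difference the reset (or lack of it) produces.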
    Pedro Rodriguez
    @EntilZha
    I just checked out TensorLayer and wanted to see if my first impressions were accurate. I started with vanilla TF (+ lots of custom boilerplate), then moved to using Keras more. What I liked most was the simplification/reduced duplication in the model definition, along with the nice utilities around training, but I found debugging more difficult, plus there was another set of abstractions (on top of TF; I don't care about the Theano backend option) to learn in order to implement new layers. It seems like TensorLayer might be a good fit for what I am looking for, since my focus is on research and being able to dig into/define novel models is important. Does that seem accurate, and are there any drawbacks I should know about?
    ruzrobert
    @ruzrobert
    @EntilZha Well, Pedro, as I understand it, it is better to use raw TensorFlow despite its complexity. Raw TensorFlow will be less buggy and more stable than all these libraries.
    Pedro Rodriguez
    @EntilZha
    True, but you trade off productivity (e.g. the .fit() call and callbacks save a lot of time). I am wondering if I am right in thinking that TensorLayer is close enough to bare TF not to pay too much for the abstraction while still getting the nice stuff. Seems like it, but I'm wondering if anyone who has used both TensorLayer and Keras a lot might have a thought
    ruzrobert
    @ruzrobert
    @EntilZha What types of tasks and what types of layers (RNN, CNN, etc.) are you using TF for?
    Pedro Rodriguez
    @EntilZha
    Mostly either RNNs or feedforward networks (averaging words), but increasingly memory-network architectures. The tasks are basically all NLP-related
    Being specific, mostly question answering
    ruzrobert
    @ruzrobert
    I have used TensorLayer once, but I had very big problems with a stateful LSTM in it - my app just hangs, and I don't know why; the code is the same as in the example. So I switched back to raw TensorFlow, and everything works again
    Pedro Rodriguez
    @EntilZha
    Ya, tbh I am most interested in the training utilities
    ruzrobert
    @ruzrobert
    I am not going to become a super expert in it; I just wrapped TensorFlow in my own class, so it looks like simple Keras-like code that lets me control and understand everything.
    Talking specifically about TensorLayer, I liked it - it makes the code simple, but it is not as short as Keras
    khadekirti
    @khadekirti
    I wanted to understand how TensorLayer works. In the RNN Text Generation code, we train the model, updating it through the variables Network, LSTM1, LSTM2. Then, while testing, we use Network_test, LSTM1_test and LSTM2_test. If these two are not connected, what is the point of training?
    Since 'EmbeddingAttentionSeq2seqWrapper' for translation is not working with the latest TF versions, what is the workaround for this?
    Hao
    @zsdonghao
    @khadekirti hi, i am working on it, will release a new example soon
    Nitish Bhardwaj
    @nitish11
    how to run TensorLayer code on a GPU?
    Hao
    @zsdonghao
    install tensorflow-gpu
    ontheway16
    @ontheway16
    I want to train an SR system with my own images. My images contain target objects, so I do not need to enhance the whole image in good quality, just the targeted objects of 5-6 different types. The image backgrounds are mostly very clean, so the HR images mostly contain only the target objects. Should I use the HR images as a whole in training, or only the crops of the target objects found in the HR image set?
    Hao
    @zsdonghao
    hi all, please join Slack, gitter is no longer maintained by the core team now ~
    ontheway16
    @ontheway16
    hi, tried it but failed.
    BokuToTuZenU
    @Hika-Kondo
    Hi all , please join Slack