Hao
@zsdonghao
batch_size here means you split the whole dataset into batch_size segments
check tl.iterate.ptb_iterator
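The splitting Hao describes can be sketched in plain Python. This is only an illustration of the idea (one contiguous corpus segment per batch row, targets shifted by one step), with made-up names rather than TensorLayer's actual API:

```python
# Plain-Python sketch of "separate the whole dataset into batch_size segments"
# for a PTB-style iterator. Names are illustrative, not TensorLayer's API.
def ptb_batches(data, batch_size, num_steps):
    seg_len = len(data) // batch_size
    # one contiguous segment of the corpus per batch row
    segments = [data[i * seg_len:(i + 1) * seg_len] for i in range(batch_size)]
    for start in range(0, seg_len - 1, num_steps):
        end = min(start + num_steps, seg_len - 1)
        x = [seg[start:end] for seg in segments]
        y = [seg[start + 1:end + 1] for seg in segments]  # targets: inputs shifted by one
        yield x, y

batches = list(ptb_batches(list(range(12)), batch_size=2, num_steps=3))
# first batch inputs: [[0, 1, 2], [6, 7, 8]]; targets: [[1, 2, 3], [7, 8, 9]]
```

Each of the batch_size rows walks through its own segment in parallel, which is why batch_size controls parallelism rather than how much history the model sees.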
ruzrobert
@ruzrobert
I understand that batch_size is just for parallel computation.
But I can't understand how we can even predict the next word using only one word (seq_len=1). Tell me, what am I missing here?
Hao
@zsdonghao
to predict the next word, your sequence length should be 1. Any problem?
see this picture: many to many
ruzrobert
@ruzrobert
I see, and I understand what it is.
But how does it work, and why are we using it?
To predict a word in a text, we need to know some previous words, at least 2-3 for example. But how can we do that using one word? Is it the same as predicting the next price from one previous price?
Or I just don't know what the PTB dataset really is.
Hao
@zsdonghao
you can input multiple words to predict the next word
i think this example is better for understanding that ~~
ruzrobert
@ruzrobert
All this time I thought that, to use any NN model after training, the input data should have the same shape as it did at training time.
For example, I am feeding prices with a seq_len of 20 (20 prices of history).
So to predict on other prices after training, I must feed exactly 20 prices (seq_len=20).
Or is that not necessary?
Hao
@zsdonghao
for RNNs, the sequence length at training and testing can be different
ruzrobert
@ruzrobert
That is something really new for me :)
Hao
@zsdonghao
because the RNN can pass the cell state to the next step~
yeah, that part is quite confusing~
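The point about the carried cell state can be shown with a toy numeric "RNN" (not TensorFlow, just arithmetic): feeding one input at a time while passing the state forward ends in exactly the same state as feeding the whole sequence at once.

```python
# Toy recurrent step: the new state mixes the input with the previous state.
def rnn_step(x, h, w_x=0.5, w_h=0.9):
    return w_x * x + w_h * h

def run(xs, h=0.0):
    for x in xs:
        h = rnn_step(x, h)
    return h

seq = [1.0, 2.0, 3.0]
h_all = run(seq)           # "training style": whole sequence at once
h = 0.0
for x in seq:              # "testing style": seq_len=1, state carried forward
    h = run([x], h)
# h == h_all: the state makes the two feeding styles equivalent
```

This is why a model trained with seq_len=20 can still be run one token at a time, as long as you keep feeding the previous state back in.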
ruzrobert
@ruzrobert
Okay, that makes a big difference.
But why are we testing our PTB model with seq_len=1 then? Isn't it more logical to use 20 again?
Hao
@zsdonghao
for language datasets, we usually generate the output step by step,
because you need to feed the new output as the input of the next step
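That generation loop can be sketched as follows, with a dummy stand-in for the trained network; `predict_next` is hypothetical, not a real API, and a real RNN would return a sampled token plus the new LSTM state:

```python
# Dummy "model": a deterministic toy rule standing in for the trained RNN.
def predict_next(token, state):
    return (token + state) % 5, state + 1

def generate(seed_token, n_steps):
    token, state = seed_token, 0
    out = []
    for _ in range(n_steps):
        token, state = predict_next(token, state)  # output becomes the next input
        out.append(token)
    return out

tokens = generate(seed_token=2, n_steps=4)
```

Because each new input is only known after the previous step's output, generation has to run with seq_len=1, one step at a time.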
ruzrobert
@ruzrobert
Now I understand, thank you. And the last question:
Does the validation process affect the weights / model training? If not, what's the purpose of it? Why can't we just use the test data for it?
Hao
@zsdonghao
you mean why we have 3 sets: train, val, test?
ruzrobert
@ruzrobert
yes, why don't we use the test set for validation?
Hao
@zsdonghao
val set is for tuning the hyper-parameters
ruzrobert
@ruzrobert
So when we are training, we are tuning them. And when we are validating, we are also tuning them?
I know, and I have seen these answers.
But I thought only this line does that: train_op = optimizer.apply_gradients(zip(grads, tvars))
But we are not calling it when validating. That confuses me.
ruzrobert
@ruzrobert
Okay, I checked your tutorials and some TensorFlow tutorials and examples.
And no, there is no hyper-parameter tuning anywhere the validation set is used.. so.. I am confused.
Hao
@zsdonghao
yeah, we need to tune the hyper-parameters manually
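The manual tuning Hao means can be sketched like this: train one model per candidate hyper-parameter value, score each on the validation set, keep the best, and only then touch the test set. `train_and_eval` here is a hypothetical stand-in for a real training run:

```python
# Stand-in for training + validation scoring: pretend validation loss is
# minimized near lr = 0.1 (lower is better).
def train_and_eval(lr):
    return abs(lr - 0.1)

candidates = [0.001, 0.01, 0.1, 1.0]
val_losses = {lr: train_and_eval(lr) for lr in candidates}
best_lr = min(val_losses, key=val_losses.get)
# only the winner (best_lr) is evaluated once on the test set
```

No gradient step touches the validation data; it only decides *which* trained model you keep, which is why the test set must stay untouched until the very end.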
ruzrobert
@ruzrobert

@zsdonghao So we just don't use this approach in machine learning?
http://stackoverflow.com/questions/41903062/tensorflow-how-the-validation-set-improves-the-learning-curve
All I see there is that hyper-parameter tuning with the validation set is too heavy for ML, and the most popular uses are early stopping and automatic learning-rate adjustment.
I have seen in theory papers that the validation set is used for hyper-parameter tuning, but I don't see popular use of it in practice.

Does this mean that if the dataset is large, we do not actually need hyper-parameter tuning with a validation set?

ruzrobert
@ruzrobert
Also, in the same example (tutorial_ptb_lstm.py), as I understand it, we are using a stateless LSTM: we manually reset the states after each batch. Am I right?
I also thought this was done automatically, that states reset after each batch by default. But of course, how could TF know that we have finished our batch? Or is it really done automatically, but only when using an Estimator (tf.contrib.learn)?
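The "stateless" pattern ruzrobert describes looks like this in pure Python: the framework does not reset recurrent state for you, so the loop explicitly feeds a fresh zero state at the start of every batch. The cell here is a trivial stand-in, not a real LSTM:

```python
# Trivial accumulator standing in for an LSTM state update.
def cell_step(x, state):
    return x + state

def run_batch(batch, state):
    for x in batch:
        state = cell_step(x, state)
    return state

finals = []
for batch in [[1, 2], [3, 4]]:
    state = 0                      # manual reset before each batch
    finals.append(run_batch(batch, state))
```

Dropping the `state = 0` line would make it stateful: batch 2 would continue from batch 1's final state instead of starting fresh.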
Pedro Rodriguez
@EntilZha
I just checked out TensorLayer and wanted to see if my first impressions were accurate. I started with vanilla TF (+ lots of custom boilerplate), then moved to using Keras more. What I liked most was the simplification/reduced duplication in the model definition, along with the nice things around training, but I found debugging more difficult, plus another set of abstractions (on top of TF; I don't care about the Theano backend option) to learn in order to implement new layers. It seems like TensorLayer might be a good fit for what I am looking for, since my focus is on research, so being able to dig into/define novel models is important. Does that seem accurate, and are there any drawbacks I should know about?
ruzrobert
@ruzrobert
@EntilZha Well, Pedro, as I understand it, it is better to use raw TensorFlow despite its complexity. Raw TensorFlow will be less buggy and more stable than all these libraries
Pedro Rodriguez
@EntilZha
True, but you trade off productivity (e.g. the .fit() call and callbacks save a lot of time). I am wondering if I am right in thinking that TensorLayer is close enough to bare TF that you don't pay too much for the abstraction while still getting the nice stuff. Seems like it, but I'm wondering if anyone who has used both TensorLayer and Keras a lot might have a thought
ruzrobert
@ruzrobert
@EntilZha What types of tasks and what types of layer technologies (RNN, CNN, etc.) are you using TF for?
Pedro Rodriguez
@EntilZha
Mostly either RNNs or feedforward networks (averaging words), but increasingly memory-network architectures. The tasks are basically all NLP-related.
To be specific, mostly question answering
ruzrobert
@ruzrobert
I used TensorLayer once, but then I had very big problems with its stateful LSTM: my app just hangs, and I don't know why; the code is the same as in the example. So I switched back to raw TensorFlow, and it's all OK again
Pedro Rodriguez
@EntilZha
Ya, tbh I am most interested in the training utilities
ruzrobert
@ruzrobert
I am not going to become a super expert in it; I just wrapped TensorFlow in my own class so it looks like simple Keras-like code, which lets me control and understand everything.
Talking specifically about TensorLayer, I liked it, it makes the code simple, but it is not as short as Keras
I wanted to understand how TensorLayer works. In the RNN text-generator code, we train the model and update it using the variables Network, LSTM1, LSTM2. Then, while testing, we use Network_test, LSTM1_test and LSTM2_test. If these two are not connected, what is the point of training?
Since 'EmbeddingAttentionSeq2seqWrapper' for translation is not working with the latest TF versions, what is the workaround for this?
Hao
@zsdonghao
@khadekirti hi, I am working on it and will release a new example soon
Nitish Bhardwaj
@nitish11
how to run TensorLayer code on GPU?
Hao
@zsdonghao
install tensorflow-gpu (e.g. pip install tensorflow-gpu)
ontheway16
@ontheway16
I want to train an SR system with my own images. My images contain target objects, so I do not need to enhance the whole image in good quality, just the targeted objects of 5-6 different types. The image backgrounds are mostly very clean, so mostly there are only target objects in the HR images. Should I use the HR images as a whole in training, or only the crops of the target objects found in the HR image set?
Hao
@zsdonghao
hi all, please join Slack, Gitter is no longer maintained by the core team now ~
ontheway16
@ontheway16
hi, tried it but failed.
HikaruKondo
@bokutotu
Hi all , please join Slack