Hi all, I am trying to run this text summarization code:
SEP = 0 # Padding or separator token
EOS = 1 # End of sentence token
def preprocess(stream):
    for (article, summary) in stream:
        joint = np.array(list(article) + [EOS, SEP] + list(summary) + [EOS])
        mask = [0] * (len(list(article)) + 2) + [1] * (len(list(summary)) + 1) # Accounting for EOS and SEP
        yield joint, joint, np.array(mask)
input_pipeline = trax.data.Serial(
    # Tokenizes
    trax.data.Tokenize(vocab_dir='vocab_dir1/', vocab_file='vocab_pa'),
    # Uses function defined above
    preprocess,
    # Filters out examples longer than 2048
    trax.data.FilterByLength(2048)
)
train_stream = input_pipeline(train_stream_fn())
train_input, train_target, train_mask = next(train_stream)
assert sum((train_input - train_target)**2) == 0
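To check the preprocessing step in isolation, here is a minimal, self-contained sketch that runs it on toy token ids instead of a real tokenized stream. It assumes the mask is meant to be 0 over the article part (including EOS and SEP) and 1 over the summary part (including its EOS); the toy ids are made up for illustration.

```python
import numpy as np

SEP = 0  # Padding or separator token
EOS = 1  # End of sentence token


def preprocess(stream):
    """Joins each (article, summary) pair and builds a 0/1 loss mask."""
    for article, summary in stream:
        article, summary = list(article), list(summary)
        joint = np.array(article + [EOS, SEP] + summary + [EOS])
        # Assumption: 0 over article + EOS + SEP, 1 over summary + EOS.
        mask = [0] * (len(article) + 2) + [1] * (len(summary) + 1)
        yield joint, joint, np.array(mask)


# Hypothetical token ids standing in for a tokenized article and summary.
toy_stream = iter([([5, 6, 7], [8, 9])])
joint, target, mask = next(preprocess(toy_stream))
print(joint)  # [5 6 7 1 0 8 9 1]
print(mask)   # [0 0 0 0 0 1 1 1]
```

The input and target are the same array, which is why the assert on the squared difference should pass; the mask has one entry per token of the joint sequence.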
Hey guys, I'm enjoying learning about Trax and I got stuck on this issue. As I understand it, there is a key difference between Trax and Keras/TensorFlow/PyTorch in the way they handle RNN layers like GRU and LSTM. In Keras, n_units is the number of individual cells we have in the sequence layer, and the dimension of the weight matrix is automatically obtained from the dimension of the input/embedding layer. The number of individual cells (n_units in Keras) needs to be equal to the max_len of all your input sequences.
Now in Trax, as I understand it, the number of cells does not need to be fixed and n_units is actually the dimension of the weight matrix. It looks like the number of cells can actually differ from batch to batch. It needs to be constant within a batch so as not to mess up the gradients, which makes sense.
This difference is quite confusing and I can't find any detailed documentation on it. I have so many questions about this. Why do we need to specify n_units in Trax if it has to be equal to the embedding dimension? What happens with the number of cells? How can it be different from batch to batch? What happens in a many-to-many type problem, where we'll have a different number of outputs for each batch? What happens when we convert the model to Keras, which expects a constant number of cells? Etc. To me, it feels like this is a huge fundamental difference between Trax and more established environments, yet I can't find any documentation on this.
If somebody could shed some light on this, I'd really appreciate it. I'd love to read some thoughts from the authors on why they designed it this way, as well as on the pros and cons of each approach. It's still blowing my mind that you can get away with doing it this way! Thanks!
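One way to see how the sequence length can vary: in a generic recurrent layer, the weights of a single cell are reused at every time step, so their shapes depend on the input width and the hidden size but not on the number of steps. The following is a plain NumPy sketch of a textbook GRU with biases omitted. It is not the Trax or Keras implementation, and init_gru and run_gru are hypothetical names.

```python
import numpy as np


def init_gru(d_in, n_units, rng):
    """Weight shapes depend only on d_in and n_units, never on sequence length."""
    shape = (d_in + n_units, n_units)
    return {name: rng.normal(scale=0.1, size=shape)
            for name in ("update", "reset", "cand")}


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def run_gru(weights, inputs):
    """Applies one shared cell at every time step; inputs is (seq_len, d_in)."""
    n_units = weights["update"].shape[1]
    h = np.zeros(n_units)
    outputs = []
    for x in inputs:  # Loop length is whatever this sequence's length is.
        xh = np.concatenate([x, h])
        z = sigmoid(xh @ weights["update"])  # Update gate.
        r = sigmoid(xh @ weights["reset"])   # Reset gate.
        cand = np.tanh(np.concatenate([x, r * h]) @ weights["cand"])
        h = (1.0 - z) * h + z * cand
        outputs.append(h)
    return np.stack(outputs)


rng = np.random.default_rng(0)
weights = init_gru(d_in=4, n_units=8, rng=rng)
short = run_gru(weights, rng.normal(size=(3, 4)))
long = run_gru(weights, rng.normal(size=(7, 4)))
print(short.shape, long.shape)  # (3, 8) (7, 8)
```

The same three weight matrices handle a 3-step and a 7-step sequence. Within one batch the sequences still need a common length (usually via padding) so they can be stacked into a single array.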
I'm new not only to Trax but also to ML, so I decided to use a complete package for my work.
In my case, two models are instantiated, one in 'eval' mode and one in 'train' mode.
The 'eval' model generates samples, which serve as training data. I need fast_model = tl.Accelerate(eval_model) to sample from the model in each training step.
fast_model.replicate_weights(model_train) is called in my training data generator to synchronize the two separate models.
However, I got stuck on this error:
Invalid argument: CopyToDevice called on deleted or donated buffer.
What is the best way to sample during the training process?
I'm training a Reformer model on a custom dataset; the code is pretty much identical to https://github.com/google/trax/blob/master/trax/examples/NMT_with_Transformers_Reformers_using_Trax.ipynb
When I run the training, I keep getting "nan" for CrossEntropyLoss after 200 steps.
Here's a sample of the log:
Step 2300: Ran 100 train steps in 21.84 secs
Step 2300: train CrossEntropyLoss | nan
Step 2300: eval CrossEntropyLossWithLogSoftmax | nan
Step 2300: eval WeightedCategoryAccuracy | 0.00000000
Step 2400: Ran 100 train steps in 21.36 secs
Step 2400: train CrossEntropyLoss | nan
Step 2400: eval CrossEntropyLossWithLogSoftmax | nan
Step 2400: eval WeightedCategoryAccuracy | 0.00000000
Here's my model:
model = trax.models.Reformer(
    input_vocab_size=33600,
    d_model=512,
    d_ff=2048,
    dropout=0.1,
    n_heads=8,
    n_encoder_layers=6,
    n_decoder_layers=6,
    max_len=2048,
    mode='train')
train_task = training.TrainTask(
    # use the train batch stream as labeled data
    labeled_data=train_batch_stream,
    # use the cross entropy loss with LogSoftmax
    loss_layer=tl.CrossEntropyLoss(),
    # use the Adafactor optimizer with learning rate of 0.001
    optimizer=trax.optimizers.Adafactor(learning_rate=0.001, epsilon1=1e-30),
    # have 500 warmup steps
    lr_schedule=trax.lr.multifactor(constant=1.0, warmup_steps=500),
    # have a checkpoint every 100 steps
    n_steps_per_checkpoint=100,
    # saving a checkpoint every 1000 steps on the output_dir
    n_steps_per_permanent_checkpoint=1000
)
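When chasing a NaN loss, one thing that may be worth checking is the step size the schedule actually produces. The sketch below writes out constant * linear_warmup * rsqrt_decay by hand. It assumes that this is the default factors string of trax.lr.multifactor and that each factor has the form shown; both are assumptions to verify against the installed Trax version, and multifactor_sketch is a hypothetical helper, not a Trax function.

```python
import math


def multifactor_sketch(step, constant=1.0, warmup_steps=500):
    """constant * linear_warmup * rsqrt_decay, written out (assumed formula)."""
    linear_warmup = min(1.0, step / warmup_steps)
    rsqrt_decay = 1.0 / math.sqrt(max(step, warmup_steps))
    return constant * linear_warmup * rsqrt_decay


# Print the value the assumed schedule would give at a few steps.
for step in (1, 250, 500, 2300):
    print(step, multifactor_sketch(step))
```

Under these assumptions the value peaks at 1 / sqrt(500), roughly 0.045, at step 500. If the schedule rather than Adafactor's learning_rate=0.001 is what sets the step size during training, that is far above the 0.001 mentioned in the comment, so comparing the two could be a useful first check.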
python trainer.py --gpu_cluster_chief_ip=10.0.0.5 --gpu_cluster_n_hosts=2 --gpu_cluster_host_id=0 on one host, and the same on the other with
Run this tutorial to develop an NMT model: https://colab.research.google.com/github/OmarAlsaqa/trax/blob/master/trax/examples/NMT_with_Transformers_Reformers_using_Trax.ipynb.
I would like to export the model as a tf object so as to deploy it to tensorflow serving. I followed this tutorial:
However, I got this error: StagingError: Exception encountered when calling layer "as_keras_3" (type AsKeras).
What is the benefit of providing a signature when calling the init() method of a model? Depending on the definition of the model, the signature is usually part of it. For example, when defining an Embedding layer, then surely the input is of shape (batch_size), where the values range from 0 to embed_size. In fact, in the code of Embedding, they simply delete the input signature:
So, what is the point behind it?
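One common reason for an init-time signature is that some layers cannot size their weights from constructor arguments alone. The toy classes below illustrate the idea in plain NumPy; ToyDense and ToyEmbedding are hypothetical and are not the Trax classes.

```python
import numpy as np


class ToyDense:
    """Hypothetical layer: knows n_units up front, learns input width at init."""

    def __init__(self, n_units):
        self.n_units = n_units
        self.weights = None

    def init(self, input_shape, rng):
        # The last input dimension is unknown until a signature is supplied.
        d_in = input_shape[-1]
        self.weights = rng.normal(size=(d_in, self.n_units))
        return self.weights


class ToyEmbedding:
    """Hypothetical layer: both weight dimensions come from the constructor."""

    def __init__(self, vocab_size, d_feature):
        self.vocab_size, self.d_feature = vocab_size, d_feature
        self.weights = None

    def init(self, input_shape, rng):
        del input_shape  # Unused: the weight shape is already fully determined.
        self.weights = rng.normal(size=(self.vocab_size, self.d_feature))
        return self.weights


rng = np.random.default_rng(0)
dense_a = ToyDense(16).init((32, 10), rng)
dense_b = ToyDense(16).init((32, 300), rng)
embed = ToyEmbedding(1000, 64).init((32, 10), rng)
print(dense_a.shape, dense_b.shape, embed.shape)  # (10, 16) (300, 16) (1000, 64)
```

In this sketch the embedding can ignore the signature, but a model that stacks both kinds of layers still needs it once, at the top, so that shapes can be propagated to the layers that do depend on it.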
TransformerEncoder and provide all the necessary information like n_classes, etc., why do I need to further provide input signature when I call
run(1000) to finish a training epoch. Usually, one provides an iterator and doesn't know how many batches there are in the training data, so how do I go about telling Loop to go through the training data X epochs?
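Since run() is given a number of steps rather than epochs, one simple approach is to convert epochs to steps yourself. A minimal sketch, assuming the dataset size and batch size are known and the batch stream repeats indefinitely; steps_for_epochs is a hypothetical helper, not part of Trax.

```python
import math


def steps_for_epochs(n_examples, batch_size, n_epochs):
    """Number of batches needed to pass over n_examples n_epochs times."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch * n_epochs


n_steps = steps_for_epochs(n_examples=10_000, batch_size=32, n_epochs=3)
print(n_steps)  # 939
```

The result can then be passed as the step count, for example run(n_steps). If the dataset size is not known in advance, one pass over a finite, non-repeating version of the stream to count batches would give steps_per_epoch directly.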
# Create a Transformer model.
# Pre-trained model config in gs://trax-ml/models/translation/ende_wmt32k.gin
model = trax.models.Transformer(
    input_vocab_size=33300,
    d_model=512, d_ff=2048,
    n_heads=8, n_encoder_layers=6, n_decoder_layers=6,
    max_len=2048, mode='predict')
# Initialize using pre-trained weights.
model.init_from_file('gs://trax-ml/models/translation/ende_wmt32k.pkl.gz',
                     weights_only=True)
# Tokenize a sentence.
sentence = 'It is nice to learn new things today!'
tokenized = list(trax.data.tokenize(iter([sentence]),  # Operates on streams.
                                    vocab_dir='gs://trax-ml/vocabs/',
                                    vocab_file='ende_32k.subword'))[0]
# Decode from the Transformer.
tokenized = tokenized[None, :]  # Add batch dimension.
tokenized_translation = trax.supervised.decoding.autoregressive_sample(
    model, tokenized, temperature=0.0)  # Higher temperature: more diverse results.
# De-tokenize.
tokenized_translation = tokenized_translation[0][:-1]  # Remove batch and EOS.
translation = trax.data.detokenize(tokenized_translation,
                                   vocab_dir='gs://trax-ml/vocabs/',
                                   vocab_file='ende_32k.subword')
print(translation)
How can I visualize attention weights/scores on my input sequences? I want to see where each attention head is paying attention on my input sequence at each input processed. I'm using a personalized Transformer decoder on my model, containing only the Causal Attention and this layer has a name that I can use, if needed.
In Keras we can do this: https://stackoverflow.com/questions/53867351/how-to-visualize-attention-weights. Is there some related way to do this in Trax?
Has anyone else had trouble with initializing a model from a file? When I start with a file the model isn't callable without a JAX error, and I see the weights have a different type than they would from running init.
model.init(trax.shapes.signature(test))
print(type(model.weights[ ][ ]))
>> <class 'jaxlib.xla_extension.DeviceArray'>
model.init_from_file('r_v6/model.pkl.gz')
print(type(model.weights[ ][ ]))
>> <class 'numpy.ndarray'>