@lukaszkaiser, Colab is running 1.3.4, but the GitHub tags only go up to 1.3.3?

2 replies

I was asking because I can reference tl.LSHSelfAttention(), but not hash_vecs(), which appears to be in the same file on master. I see, however, using this (new to me) feature of Colab that you can call up the source. Nice. Apparently this was refactored and incorporated into hash_values(). Still, it would be nice to be able to search on GitHub (or clone to local). [Also, ReverseHalfResidual seems to be in 'reverse.py' on GitHub, but in reformer.py in 1.3.4?]

@lukaszkaiser I am currently running some experiments with the Reformer and LSH attention. I tried running a single training step to see if things work, and with 1 hash that training step took about 30 seconds. However, after increasing the number of hashes to 2, the model has been running for about 15 minutes now and has still not completed. Is this normal? Or is it more likely I have made some error in my set-up that led to this explosion in time complexity? My parameters are LSH chunk len 64, max_len=2048, d_model=1024, d_ff=4096, 3 layers, ReLU activation, decoder-only model. I am training on a Google Colab TPU.

There does seem to be a similar phenomenon when increasing the hashes to 2 on this colab: https://colab.research.google.com/github/google/trax/blob/master/trax/models/reformer/text_generation.ipynb


5 replies

Hi, a Trax newbie here. In `trax/trax/models/transformer.py`, in the `Transformer` function, I see `tl.Branch([], tl.PaddingMask())` at https://github.com/google/trax/blob/master/trax/models/transformer.py#L374. Is it the same as `tl.Branch(None, tl.PaddingMask())`? Thank you!
Hello, I am trying to understand the difference between Trax and TensorFlow. It would be great to have a small bit in the README about the differences. "Trax is an end-to-end library for deep learning that focuses on clear code and speed." Is this implying TensorFlow is unclear and slow? I don't understand the differences; I'm sure lots of people here could explain why they're using Trax :)

1 reply

@weiddeng: `Branch([], x)` and `Branch(None, x)` are the same, and the same goes for `Parallel`. It's all handled in this line in the code: https://github.com/google/trax/blob/master/trax/layers/combinators.py#L235

(Both `[]` and `None` get transformed into an identity layer, taking 1 input and returning the same, which can be constructed as `Serial()` or `Serial(None)`.)
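That normalization is easy to see in a stand-alone sketch (a hypothetical plain-Python mimic, not the actual Trax source):

```python
# Minimal sketch (hypothetical, not the real Trax code) of how a
# combinator can normalize None / [] sublayers into an identity layer.

def identity(x):
    # Serial() / Serial(None) in Trax behave like this: 1 input in, same out.
    return x

def _normalize(layer):
    # Both None and [] stand for "do nothing" and become the identity.
    if layer is None or layer == []:
        return identity
    return layer

def branch(*layers):
    """Apply every (normalized) sublayer to the same input, like tl.Branch."""
    layers = [_normalize(l) for l in layers]
    def run(x):
        return tuple(l(x) for l in layers)
    return run

# branch([], f) and branch(None, f) produce identical results.
double = lambda x: 2 * x
b1 = branch([], double)
b2 = branch(None, double)
```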
I'm trying to use the BERT model in the Trax library. The BERT model checkpoint with its weights is in a folder ("bert_config.json", "*.ckpt.*", "vocab.txt"). To get the weights, Trax only has the "init_from_file" method, but it needs a gzipped pickle file. Is there a way to init the weights using the TensorFlow "*.ckpt.*" file, or to convert the model to a pickle file to use it in Trax?

Thank you very much, Lukasz. I have a question on datasets. I see C4 and SQuAD data here: https://github.com/google/trax/tree/ebb9aa01b70c02498b29f3f0a31d361f31caa395/trax/data/testdata I also see the pre-trained transformer uses `vocab_dir='gs://trax-ml/vocabs/'`. Do you know where I can access the commonly used NLP datasets from Trax?
@kujaomega: yes, in `bert.py` there is functionality to load from a TF checkpoint; see from this line on: https://github.com/google/trax/blob/master/trax/models/research/bert.py#L146
So you can load from a TF checkpoint and then save the model weights and state in Trax format (as model.pkl.gz).
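The Trax-format file is just a gzipped pickle, so the save step can be sketched like this (the dict layout below is an illustrative assumption, not the exact structure Trax writes):

```python
import gzip
import os
import pickle
import tempfile

# Illustrative weights/state; in practice these would come from the
# checkpoint-initialized Trax model (its weights and state attributes).
weights = {"layer_0/kernel": [[0.1, 0.2], [0.3, 0.4]]}
state = {"step": 0}

# Write a gzipped pickle, the container format that init_from_file reads.
path = os.path.join(tempfile.mkdtemp(), "model.pkl.gz")
with gzip.open(path, "wb") as f:
    pickle.dump({"weights": weights, "state": state}, f)

# Round-trip to confirm the file is readable.
with gzip.open(path, "rb") as f:
    restored = pickle.load(f)
```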

@weiddeng: Trax does not provide datasets on its own, but it has hooks into TFDS, so you can use those with `trax.data.TFDS`. An example of how to create a whole input pipeline for IMDB is here: https://github.com/google/trax/blob/master/trax/data/inputs.py#L23
Or maybe nicer to read here: https://trax-ml.readthedocs.io/en/latest/trax.data.html

We explain how to use it in the intro: https://trax-ml.readthedocs.io/en/latest/notebooks/trax_intro.html#Data

Note two things: (1) it'll work with any TFDS dataset (and TFDS will download it for you), and there are quite a few of them! (2) Or you can make your own from your data in a file, as explained in the second link above.
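The pipeline style used in those links is a composition of generator transforms; here is a self-contained sketch of the same idea (the helpers below mimic the flavor of `trax.data` but are reimplemented here, not imported):

```python
def serial(*transforms):
    """Compose generator transforms left to right, like trax.data.Serial."""
    def pipeline(gen):
        for t in transforms:
            gen = t(gen)
        return gen
    return pipeline

def tokenize(gen):
    # Toy tokenizer: split text on spaces, map words to integer ids.
    for text, label in gen:
        yield [hash(w) % 1000 for w in text.split()], label

def filter_by_length(max_len):
    def run(gen):
        for tokens, label in gen:
            if len(tokens) <= max_len:
                yield tokens, label
    return run

def batch(size):
    def run(gen):
        buf = []
        for example in gen:
            buf.append(example)
            if len(buf) == size:
                yield buf
                buf = []
    return run

pipeline = serial(tokenize, filter_by_length(4), batch(2))
examples = [("a good movie", 1), ("bad", 0),
            ("way too long a review here", 1), ("fine film", 1)]
batches = list(pipeline(iter(examples)))  # the 6-word review is filtered out
```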

Thanks @lukaszkaiser for the response. As I see it, the TF checkpoint is loaded when new_weights is called, but when I initialize the model with `model = BERT(init_checkpoint=BERT_MODEL_PATH + 'bert_model.ckpt')` and `weights = super().new_weights(input_signature)` is called, I'm getting `AttributeError: 'super' object has no attribute 'new_weights'`, as the tl.Serial object isn't calling new_weights.
@gladwig2: LSH hashing doesn't work with buckets if Q != K - check out the Reformer paper for a more detailed description of why (LSH generally works, but if you want to make it fast on TPUs/GPUs then bucketing is really good, and then you want Q=K). So the question was: does Q=K hurt? The answer seems to be: not at all, same performance as the general Transformer.
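For what it's worth, shared-QK attention itself is easy to sketch in NumPy; this is an illustrative reading of the Reformer setup, not Trax code (the paper also prevents tokens from attending to themselves in shared-QK mode, which this sketch omits):

```python
import numpy as np

def shared_qk_attention(x, w_q, w_v):
    """Sketch of shared-QK attention: keys are the unit-normalized queries,
    so a single projection w_q is learned for both Q and K."""
    q = x @ w_q
    k = q / np.linalg.norm(q, axis=-1, keepdims=True)  # K derived from Q
    v = x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
out = shared_qk_attention(x, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
```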

In the Reformer paper you mention that instead of computing softmax(QK^T) all at once, one can also compute individual queries, i.e. softmax(q_i K^T). Does the Self-Attention layer in Trax use this by default? I haven't quite been able to figure it out from the code.
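For illustration, here is a NumPy sketch of that per-query formulation (not the Trax implementation); both functions return the same attention output, but the per-query version never materializes the full n×n score matrix:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_full(Q, K, V):
    # Materializes the full (n x n) score matrix softmax(Q K^T / sqrt(d)).
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def attention_per_query(Q, K, V):
    # One query at a time: softmax(q_i K^T) V. Only an n-sized score
    # vector is alive at any moment, which is the memory saving.
    out = np.empty((Q.shape[0], V.shape[1]))
    for i, q in enumerate(Q):
        out[i] = softmax(q @ K.T / np.sqrt(Q.shape[-1])) @ V
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 4)) for _ in range(3))
```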

And the Q=K is also not used unless I set shared_qk = True in gin, correct?


I have some questions on the theoretical concepts of machine translation/transformers and their implementation in Trax. I hope these questions are not completely stupid and somebody can help me…

When I have a transformer (used for machine translation), it is a sequence-to-sequence model, i.e. it consists of an encoder part and a decoder part. When an input (x, y) is fed into the network, where x corresponds to the encoder part and y corresponds to the decoder part, a vector of probabilities is computed. The highest probability corresponds to the next word, and so (if we search for the correct translation in the simplest way) we can expand y to y’ and work with (x, y’) in the next step. (I hope my understanding is correct so far - if not, please correct me!)

Now I want the following:

Problem 1: Let’s say I already have a pair (x, y). How can I modify my transformer in such a way that a SINGLE probability is computed, namely the probability P(y | x) that y is a "correct" translation of x (in the sense of the transformer as a relative language model, i.e. the probability that y occurs in the target language provided x is the corresponding input in the source language)? Intuitively, this should be done by taking just the component of the output vector at position 1 (provided this is the position for the <EOS> tag).

How can this be achieved in Trax? Intuitively, I should have to expand the transformer model by just one layer, but I do not know how to do that (which layer to use and how to expand).

Problem 2: Is it somehow possible to obtain just the decoder part of a trained transformer as a separate model?

Problem 3: Now I want to compute "absolute" probabilities P(y). So this should be a combination of problems 1 and 2, i.e. I want to have just the decoder part, but extended such that it yields single probabilities. If my intuition is right, I should be able to solve this in Trax once I know how to solve problems 1 and 2 (which I also need to solve in their own right).

2 replies

@SebastianThomas1_gitlab: The Transformer decoder is an auto-regressive model, so there's a catch to P(y | x): there are only P(y_{t+1} | y_t, y_{t-1}, ..., y_0, x). You can add them up and call it P(y | x) (as we do in the loss), but remember that it's always auto-regressive; y_{t+1} depends on all the previous ys. Nothing specific to Trax or any software here - it's how autoregressive models work.
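The "add them up" step (in log space, as a loss would do) can be sketched in NumPy; the logits below are random stand-ins for real decoder outputs, so only the mechanics, not the values, are meaningful:

```python
import numpy as np

def log_prob_of_sequence(logits, targets):
    """Sum of per-step log-probabilities: log P(y | x) = sum_t log P(y_t | y_<t, x).

    logits: (T, vocab) decoder outputs, one row per target position.
    targets: (T,) the observed token ids.
    """
    # Numerically stable log-softmax over the vocabulary axis.
    m = logits.max(axis=-1, keepdims=True)
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))
    # Pick out the log-probability of each observed target token and sum.
    return log_probs[np.arange(len(targets)), targets].sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))  # stand-in decoder outputs (T=4, vocab=10)
score = log_prob_of_sequence(logits, np.array([3, 1, 4, 1]))
```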

You can also get the decoder if you pick the appropriate `model.sublayers` (as someone already said).
@lukaszkaiser Thank you very, very much for your hints!!

I think you have to take the product of the probabilities instead of the sum, right? With sums, one can easily get something greater than 1, but with products one should have

P(y_{t + 1} | x, y_0, …, y_t) \cdot P(y_t | x, y_0, …, y_{t - 1}) \cdot … \cdot P(y_0 | x)

= P(y_{t + 1} | x, y_0, …, y_t) \cdot P(y_0, …, y_t | x)

= P(y_0, …, y_{t + 1} | x),

where the first equality holds by induction. (I have not yet thought about whether the usual probability laws hold for this notation, but I would be surprised if this was not the case.)

Of course, the product of probabilities then translates to the sum of log-probabilities, which might be what you had in mind.
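A quick numeric check of that equivalence, with made-up per-step probabilities:

```python
import math

# Toy per-step conditional probabilities P(y_i | x, y_<i).
step_probs = [0.9, 0.5, 0.8]

product = math.prod(step_probs)                 # P(y | x) as a product
log_sum = sum(math.log(p) for p in step_probs)  # the same thing in log space
# exp(log_sum) recovers the product exactly (up to float rounding).
```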

However, I still have another problem. You told me to apply the transformer to the pair (x, y), since this would contain all probabilities. I tried my best by imitating the source code of `trax.supervised.decoding.autoregressive_sample` resp. `trax.supervised.decoding.autoregressive_sample_stream`, but got some confusing results.

Let’s take the example from the Trax intro (https://trax-ml.readthedocs.io/en/latest/notebooks/trax_intro.html). There `tokenized_translation` has the value `array([ 168, 24, 9358, 2, 352, 367, 2427, 18, 3580, 207])`. I used the following code to imitate `autoregressive_sample`:

```
start_symbol = np.full((1, 1), 0, dtype=np.int32)
model.state = initial_state
current_symbols = start_symbol
while current_symbols != np.array([[1]]):
    logits = model((tokenized, current_symbols))[0]
    sample = trax.layers.logsoftmax_sample(logits[:, -1, :], temperature=0.0)
    print(sample)
    current_symbols = sample[:, None]
```

Then, as expected, the following is printed to the screen:

```
[168]
[24]
[9358]
[2]
[352]
[367]
[2427]
[18]
[3580]
[207]
[1]
```

However, when I apply the model to the whole `tokenized_translation`, I obtain a different result: the code

```
model.state = initial_state
current_symbols = np.concatenate([start_symbol, tokenized_translation[None, :]], axis=1)
logits = model((tokenized, current_symbols))[0]
samples = trax.layers.logsoftmax_sample(logits[:, :, :], temperature=0.0)
print(samples)
```

yields

`[[207 24 207 207 33 207 207 207 207 207 1]]`

(instead of the expected `[[ 168 24 9358 2 352 367 2427 18 3580 207 1]]`).

What am I missing here?

I'm highly interested in contributing to the Trax package, as it really fits my area of interest well.

I have previously contributed to a few Python packages and frameworks, namely signac and signac-flow. If contributions are welcomed, then I'd really like to contribute. Hoping for a positive response, thank you. 😊

Hey guys,

I have a few questions on hyperparameter tuning when developing neural networks with Trax. It would be really great if you shared some of your current approaches or pointed me in the right direction to get information on the matter.

- What is the current approach to hyperparameter tuning? Are there any built-in tools, or can we easily utilize external tools?
- Do we want to see hyperparameter tuning tools, helper functions, etc. within Trax, or is this out of scope for the Trax library? If it is a subject for contribution, what would we want implemented first?
- I saw some mentions of TensorBoard in the Trax source code. How exactly can we integrate and use TensorBoard with Trax?
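Absent a built-in tuner, a plain external search loop works with any training setup; a sketch (the `train_fn` below is a stand-in returning a fake validation loss, not a Trax API):

```python
import itertools

def train_fn(learning_rate, d_model):
    # Stand-in for a real training run that returns a validation loss;
    # this fake loss is minimized at learning_rate=0.01, d_model=512.
    return abs(learning_rate - 0.01) + abs(d_model - 512) / 1024

# Hyperparameter grid: every combination is tried.
grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "d_model": [256, 512],
}

best = None
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    loss = train_fn(**params)
    if best is None or loss < best[0]:
        best = (loss, params)
```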

3 replies

@lukaszkaiser Hi Lukasz! I am a beginner in machine translation. I want to use Trax to complete my first translation task. But I did not find any relevant examples of using Trax for translation on the Internet. Can you please provide an example for me? Thank you very much!

1 reply

@ameyas1 You may try byte-level BPE instead of character-level to deal with Asian languages like Korean. For translation, you can try joint BPE. Both are discussed in the BPE paper: https://arxiv.org/abs/1508.07909. I think you can find code in the Trax library to do this.
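The merge operation at the heart of BPE (from that paper) is small enough to sketch in plain Python; this is an illustrative reimplementation, not code from Trax:

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs over a corpus of symbol sequences."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with its concatenation."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: word -> frequency, words stored as character tuples.
words = {tuple("low"): 5, tuple("lower"): 2, tuple("lowest"): 3}
pair = most_frequent_pair(words)   # e.g. ('l', 'o'), seen 10 times
words = merge_pair(words, pair)
```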

:point_up: September 17, 2020 3:25 PM Hey @SebastianThomas1_gitlab, what was your motivation for this question? Btw, thanks for the detailed description. My understanding is that for (x, y), training and inference are done differently. In training, one of the learning tasks is predicting the next or masked y_i; for this, you feed (x, y), and the y_i predictions can be done in parallel. In inference, you just feed x and generate y. In Problem 1, you can compute p(y | x) as the product of p(y_i | y_{<i}, x). You can use this probability as a ranking function to select the most likely y to match x if you are given a set of y's. In Problems 2 and 3, yes, but what do you want to do? In Problem 3, if you are interested in p(y), then this probably has nothing to do with translation but just with the target language Y. You can build a language model for language Y and compute p(y) as a product of p(y_i | y_{<i}). Theoretically, you can also estimate p(y) by randomly drawing an x and then using p(y | x) for p(y).

@lukaszkaiser @weiddeng I tried using SentencePiece BPE with the English-French dataset from http://www.manythings.org/anki/fra-eng.zip on the seq2seq attention model from the Coursera course, but it gave me very poor results. I think tokenization could be the problem. Now I don't know how to tackle this problem.

Hi, does anyone have a code reference (e.g. a notebook) for BERT pretraining? I am interested in learning the practice. So far the examples I know are relatively simple, like IMDB review sentiment classification. Thanks!

Hello, does anyone know if there is a layer like Conv2D for doing image processing? I could not find anything similar at https://trax-ml.readthedocs.io/en/latest/trax.layers.html.