I stopped a training run midway and changed the corpus (to a slightly bigger, cleaned one), but now I'm having trouble resuming training. The error I get is:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: StringToNumberOp could not correctly convert string: XXXXXXXXXXXX
XXXXX tends to be untokenized text that doesn't even exist in my corpus (tokenized or untokenized), and it changes randomly on each run.
The command I use to train (my YAML file is the same): onmt-main --model_type TransformerBig --config v4_big.yml --auto_config train --with_eval
model_dir: model_v4_big/

data:
  train_features_file: v4_big.cleaned.zh
  train_labels_file: v4_big.cleaned.en
  example_weights: v4_big.cleaned.score
  eval_features_file: v4_big_val_old.tok.zh
  eval_labels_file: v4_big_val_old.tok.en
  source_vocabulary: sp5_zh.opennmttf.txt
  target_vocabulary: sp5_en.opennmttf.txt

train:
  max_step: 5000000
  save_checkpoints_steps: 5000
  batch_size: 4096
  batch_type: tokens
  maximum_features_length: 250
  maximum_labels_length: 250

eval:
  scorers: bleu
  steps: 5000
  export_on_best: bleu
  early_stopping:
    min_improvement: 0.001
    steps: 20
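Since the StringToNumberOp in the stack trace is what parses the example weights, it seems worth sanity-checking the score file after the corpus swap. A minimal sketch (my assumption: the weights file must contain one float per line and stay line-aligned with the training files; file names are taken from the config above):

# Check that every line of the weights file parses as a float and that its
# line count matches the training features file.
def check_weights(features_path, weights_path):
    with open(features_path) as f:
        n_features = sum(1 for _ in f)
    n_weights = 0
    with open(weights_path) as f:
        for i, line in enumerate(f, 1):
            n_weights += 1
            try:
                float(line.strip())
            except ValueError:
                print("line %d is not a number: %r" % (i, line.strip()))
    if n_features != n_weights:
        print("line count mismatch: %d features vs %d weights"
              % (n_features, n_weights))

check_weights("v4_big.cleaned.zh", "v4_big.cleaned.score")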
Hi, we are trying to run OpenNMT-tf models in Triton using the exported SavedModel. It works perfectly on GPU, but on CPU we see the following error message:
InferenceServerException: The CPU implementation of FusedBatchNorm only supports NHWC tensor format for now.
[[{{node transformer_base_1/self_attention_encoder_1/self_attention_encoder_layer_6/transformer_layer_wrapper_30/layer_norm_33/FusedBatchNormV3}}]]
Has anyone run into the same issue and found a solution? Thank you!
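One way to narrow this down is to load the exported SavedModel in plain TensorFlow with the GPU hidden and see whether the FusedBatchNorm error reproduces outside Triton. A minimal sketch (the export path is a placeholder; the tokens/length call assumes the default OpenNMT-tf serving signature):

import tensorflow as tf

# Hide the GPU so everything below runs on CPU, like the failing setup.
tf.config.set_visible_devices([], "GPU")

imported = tf.saved_model.load("export/latest")  # placeholder path
infer = imported.signatures["serving_default"]
print(infer.structured_input_signature)  # inspect the expected inputs

# Dummy request with the usual tokens/length inputs of an OpenNMT-tf export.
outputs = infer(
    tokens=tf.constant([["Hello", "world", "!"]]),
    length=tf.constant([3], dtype=tf.int32))
print(outputs)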
python3 train.py \
-data $data_path/final \
-encoder_type brnn \
-enc_layers 2 \
-decoder_type rnn \
-dec_layers 2 \
-rnn_size 256 \
-global_attention general \
-batch_size 32 \
-word_vec_size 256 \
-bridge \
-copy_attn \
-reuse_copy_attn \
-train_steps 20000 \
-save_checkpoint_steps 10000 \
-save_model $data_path/final-model
Hello, I'm new to OpenNMT-tf and would like to ask a stupid question.
I tried training a Chinese-to-English translation model with around 10,000 lines of training samples and basically followed the procedure in the quick start tutorial (https://opennmt.net/OpenNMT-tf/quickstart.html), except that the vocabulary was the output of a SentencePiece model (trained on the same training samples). Everything seemed to work fine, except that the inference result (after running onmt-main --config data.yml --auto_config infer --features_file src_test.txt) is all <unk>.
Just wondering if this is expected? (Perhaps the training sample size is too small?) Hope anyone can shed some light. Thanks in advance.
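One thing worth checking before blaming the sample size is whether the vocabulary file actually matches the tokenized data: if it doesn't (for example, a raw SentencePiece .vocab with its tab-separated scores is not the one-token-per-line format OpenNMT-tf expects), every token can end up out-of-vocabulary. A minimal coverage check, with placeholder file names:

# Measure how many tokens of the tokenized test file are present in the
# vocabulary file referenced by data.yml.
with open("sp_vocab.txt") as f:
    # split("\t")[0] also copes with a raw SentencePiece .vocab line
    # of the form token<TAB>score.
    vocab = {line.rstrip("\n").split("\t")[0] for line in f}

total = in_vocab = 0
with open("src_test.txt") as f:
    for line in f:
        for token in line.split():
            total += 1
            in_vocab += token in vocab
print("vocabulary coverage: %.1f%%" % (100.0 * in_vocab / total))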
import opennmt
import tensorflow as tf

from opennmt.utils.misc import merge_dict


class MyCustomRnn(opennmt.models.SequenceToSequence):
  """Defines a medium-sized bidirectional LSTM encoder-decoder model."""

  def auto_config(self, num_devices=1):
    config = super(MyCustomRnn, self).auto_config(num_devices=num_devices)
    return merge_dict(config, {
        "params": {
            "optimizer": "AdamOptimizer",
            "learning_rate": 0.0002,
            "param_init": 0.1,
            "clip_gradients": 5.0,
            "beam_width": 11,
        },
        "train": {
            "batch_size": 64,
            "maximum_features_length": 80,
            "maximum_labels_length": 80
        }
    })

  def __init__(self):
    super(MyCustomRnn, self).__init__(
        source_inputter=opennmt.inputters.WordEmbedder(
            vocabulary_file_key="source_words_vocabulary",
            embedding_size=128),
        target_inputter=opennmt.inputters.WordEmbedder(
            vocabulary_file_key="target_words_vocabulary",
            embedding_size=128),
        encoder=opennmt.encoders.BidirectionalRNNEncoder(
            num_layers=1,
            num_units=128,
            reducer=opennmt.layers.ConcatReducer(),
            cell_class=tf.nn.rnn_cell.LSTMCell,
            dropout=0.3,
            residual_connections=False),
        decoder=opennmt.decoders.AttentionalRNNDecoder(
            num_layers=1,
            num_units=128,
            bridge=opennmt.layers.CopyBridge(),
            attention_mechanism_class=tf.contrib.seq2seq.LuongAttention,
            cell_class=tf.nn.rnn_cell.LSTMCell,
            dropout=0.3,
            residual_connections=False))


model = MyCustomRnn
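For context on how a file like this is used: the tf.contrib and tf.nn.rnn_cell references pin this to the OpenNMT-tf v1 API, where (if I recall the v1 conventions correctly) the model file is passed via the --model option, e.g. onmt-main train_and_eval --model my_custom_rnn.py --config config.yml.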
For a Neural Machine Translation (NMT) task, my input data has relational information. I could probably use a Graph Neural Network (GNN) in a Graph2Seq model, but I can't find a good generation model for GNNs.
So I want to use a Transformer model instead. The big problem then is how to embed structural information in a Transformer. Is there any open-source artefact for a relational Transformer that I can use out of the box?
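For what it's worth, one common recipe (in the spirit of relation-aware self-attention, Shaw et al. 2018) is to add a learned bias per relation type to the attention logits. Below is a minimal single-head sketch; it is not taken from any existing library, and all names are illustrative:

import tensorflow as tf

class RelationAwareSelfAttention(tf.keras.layers.Layer):
    """Self-attention with an additive learned bias per relation type."""

    def __init__(self, units, num_relation_types, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.query = tf.keras.layers.Dense(units)
        self.key = tf.keras.layers.Dense(units)
        self.value = tf.keras.layers.Dense(units)
        # One learned scalar bias per relation type.
        self.relation_bias = tf.keras.layers.Embedding(num_relation_types, 1)

    def call(self, inputs, relations):
        # inputs: [batch, length, depth]
        # relations: [batch, length, length] integer relation-type ids
        # describing the edge between token i and token j.
        q = self.query(inputs)
        k = self.key(inputs)
        v = self.value(inputs)
        logits = tf.matmul(q, k, transpose_b=True)
        logits /= tf.math.sqrt(tf.cast(self.units, logits.dtype))
        logits += tf.squeeze(self.relation_bias(relations), -1)
        weights = tf.nn.softmax(logits, axis=-1)
        return tf.matmul(weights, v)

# Toy usage: batch of 2 sequences of 5 tokens, 4 relation types.
layer = RelationAwareSelfAttention(units=64, num_relation_types=4)
x = tf.random.normal([2, 5, 64])
rel = tf.random.uniform([2, 5, 5], maxval=4, dtype=tf.int32)
print(layer(x, rel).shape)  # (2, 5, 64)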
I would like to try out different hyperparameters and compare their performance. There are libraries like Hyperopt for tuning.
I would be interested to know whether there is a programmatic way to run automated hyperparameter tuning. Any ideas welcome.
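Since OpenNMT-tf merges multiple --config files, one low-tech programmatic option is a driver script that writes a small override config per trial and launches onmt-main as a subprocess; a library like Hyperopt could then propose the values instead of random.choice. A rough sketch (my assumptions: base.yml holds the data section and fixed settings, and the search space below is purely illustrative):

import random
import subprocess
import yaml

# Illustrative search space over training params.
search_space = {
    "learning_rate": [0.5, 1.0, 2.0],
    "label_smoothing": [0.0, 0.1],
}

for trial in range(4):
    params = {name: random.choice(values)
              for name, values in search_space.items()}
    override = {"model_dir": "runs/trial_%d" % trial, "params": params}
    config_path = "trial_%d.yml" % trial
    with open(config_path, "w") as f:
        yaml.safe_dump(override, f)
    # Later --config files take priority, so the trial file overrides base.yml.
    subprocess.run(
        ["onmt-main", "--model_type", "Transformer", "--auto_config",
         "--config", "base.yml", config_path, "train", "--with_eval"],
        check=True)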