I can state this is slow after benchmarking against another NMT library, in this case tensor2tensor,
which is blazingly fast. The loss also converges much faster than with OpenNMT.
I’m using similar datasets and the same architecture,
so I'm not sure if there are some parts of OpenNMT
that I haven't configured correctly yet.
How significant is the batch size fix in multi-GPU training? I mean, did you perform any benchmarking for it?
Actually there are still some odd things going on... Still investigating and might need to refine the behavior in a patch version. It's unclear what TensorFlow is doing under the hood.
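For context, in synchronous multi-GPU data parallelism the effective batch size is generally the per-replica batch size times the number of replicas; this is a general rule of thumb, not a description of the specific fix being discussed:

```python
def effective_batch_size(per_replica_batch, num_replicas):
    # With synchronous data parallelism, each replica processes its
    # own batch and gradients are averaged across replicas, so the
    # effective batch size is the sum over all replicas.
    return per_replica_batch * num_replicas

# e.g. a per-GPU batch of 32 on 4 GPUs behaves like a batch of 128
```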
Is everything comparable? How much slower in your benchmark?
Not everything, but what I mean by slow: 1) how fast the loss goes down (in terms of the number of steps needed): after ~50k steps the overall loss is still around 1, while t2t
is around 0.5; 2) the stepping speed (per 100 steps): this one might be 5x slower.
These measurements are oversimplified, but the difference is clearly noticeable.
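To make the second comparison less impressionistic, per-step throughput can be measured the same way in both toolkits with a simple wall-clock timer. A minimal sketch, where `train_step` is a hypothetical stand-in for either library's training-step callable:

```python
import time

def steps_per_second(train_step, num_steps=100):
    """Time `num_steps` calls to a training-step callable and
    return the average number of steps per second."""
    start = time.perf_counter()
    for _ in range(num_steps):
        train_step()
    elapsed = time.perf_counter() - start
    return num_steps / elapsed
```

Running this against both trainers with the same batch size gives a directly comparable steps/sec number.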
Noted
`score` is usually used to score an existing prediction. You could define a custom model that extends the base `SequenceClassifier`, something like:

```python
import tensorflow as tf
import opennmt as onmt

class MyClassifier(onmt.models.SequenceClassifier):

    def __init__(self):
        super(MyClassifier, self).__init__(...)

    def call(self, *args, **kwargs):
        # Run the base classifier, then expose the class probabilities.
        logits, _ = super(MyClassifier, self).call(*args, **kwargs)
        predictions = dict(probs=tf.nn.softmax(logits))
        return logits, predictions

    def print_prediction(self, prediction, params=None, stream=None):
        print(prediction["probs"], file=stream)
```
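The pattern above (delegate to the parent's `call`, then enrich the returned predictions) can be sketched without TensorFlow; `BaseClassifier` and the fixed logits below are hypothetical stand-ins, not the real `SequenceClassifier`:

```python
class BaseClassifier:
    """Hypothetical stand-in for onmt.models.SequenceClassifier."""
    def call(self, features):
        logits = [2.0, 1.0]           # pretend model output
        predictions = {"classes": 0}  # base predictions
        return logits, predictions

class MyClassifier(BaseClassifier):
    def call(self, features):
        # Delegate to the parent, then attach extra outputs.
        logits, _ = super(MyClassifier, self).call(features)
        # Simple normalization as a softmax stand-in for the sketch.
        predictions = {"probs": [l / sum(logits) for l in logits]}
        return logits, predictions
```

The key point is that the subclass calls the parent implementation via `super(...).call(...)`, not `super(...).super(...)`, and only replaces the predictions dictionary.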