    emesha92
    @emesha92

    I can state this is slow after benchmarking against another NMT library, in this case tensor2tensor, which is blazingly fast. The loss also converges much faster compared to onmt.

    I'm using similar datasets and the same architecture.

    Not sure if there are some parts of onmt that I haven't configured yet.

    Guillaume Klein
    @guillaumekln
    Is everything comparable? How much slower in your benchmark?
    Guillaume Klein
    @guillaumekln

    how significant is the fix of the batch size in multi-GPU training? I mean, did you perform some benchmarking for this one?

    Actually there are still some odd things going on... Still investigating and might need to refine the behavior in a patch version. It's unclear what TensorFlow is doing under the hood.

    emesha92
    @emesha92

    Is everything comparable? How much slower in your benchmark?

    Not everything, but my definition of slow is: 1) how fast the loss goes down (in terms of the number of steps needed): the overall loss after ~50k steps is still 1-ish while t2t is 0.5-ish. 2) how fast the stepping is (per 100 steps): this one might be 5x slower.

    These measurements are oversimplified, but that is how it feels.

    Actually there are still some odd things going on... Still investigating and might need to refine the behavior in a patch version. It's unclear what TensorFlow is doing under the hood.

    Noted

    Guillaume Klein
    @guillaumekln
    The training loss is not comparable with Tensor2Tensor. They are subtracting a constant to offset the label smoothing.
    For the time to do 100 steps, I would not expect it to be 5x slower (unless TF2 is 5x slower than TF1 which I did not find to be the case). Is it possible to compare tokens per second?
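    As a rough illustration (a sketch, not an exact reproduction of either implementation): with label smoothing the cross-entropy cannot drop below the entropy of the smoothed target distribution, and subtracting a constant of roughly that size makes the reported loss look much lower for the same model quality. The vocabulary size and smoothing value below are just example values.
    import math

    def min_smoothed_loss(vocab_size, smoothing=0.1):
        # Lowest achievable cross-entropy under label smoothing: the entropy
        # of the smoothed one-hot target distribution.
        confidence = 1.0 - smoothing
        low_confidence = smoothing / (vocab_size - 1)
        return -(confidence * math.log(confidence)
                 + (vocab_size - 1) * low_confidence * math.log(low_confidence))

    print(round(min_smoothed_loss(32000), 2))  # ~1.36 for a 32k vocabulary with 0.1 smoothing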
    emesha92
    @emesha92
    For my training, onmt reports 12622 source words/s and 2625 target words/s
    emesha92
    @emesha92
    I can't find the words per second anywhere in the tensor2tensor logs
    Stergiadis Manos
    @steremma
    Hey guys, trivial question which for some reason I couldn't find the answer to: when computing how many samples were seen at step X, should I multiply X * batch_size or X * effective_batch_size?
    Guillaume Klein
    @guillaumekln
    If effective_batch_size is set, then it is effective_batch_size; otherwise it is batch_size.
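    In other words (a tiny sketch, assuming batch_type counts examples rather than tokens):
    def samples_seen(step, batch_size, effective_batch_size=None):
        # effective_batch_size is reached by accumulating gradients over several
        # smaller batches, so when it is set it is the count that matters per step.
        return step * (effective_batch_size or batch_size)

    # e.g. samples_seen(1000, 64, effective_batch_size=256) == 256000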
    Stergiadis Manos
    @steremma
    thanks!
    Stergiadis Manos
    @steremma
    followup question: I am using the Classifier module and I would love to get the predicted probabilities rather than the class at inference time - yet infer does the latter. Any quick solution or pointer?
    Guillaume Klein
    @guillaumekln
    Do you mean the probabilities over all classes?
    Stergiadis Manos
    @steremma

    It's binary classification, but generally yes. Is score what I am looking for, perhaps?

    Probably not: NotImplementedError: This model does not define a score function

    Guillaume Klein
    @guillaumekln
    score is usually used to score an existing prediction. You could define a custom model that extends the base SequenceClassifier, something like:
    import opennmt as onmt
    import tensorflow as tf

    class MyClassifier(onmt.models.SequenceClassifier):
    
      def __init__(self):
        super(MyClassifier, self).__init__(...)
    
      def call(self, *args, **kwargs):
        # Run the base classifier, then expose the full probability
        # distribution over classes instead of only the predicted class.
        logits, _ = super(MyClassifier, self).call(*args, **kwargs)
        predictions = dict(probs=tf.nn.softmax(logits))
        return logits, predictions
    
      def print_prediction(self, prediction, params=None, stream=None):
        # Called during inference to write one prediction per line.
        print(prediction["probs"], file=stream)
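    To use it, the class would go in a Python model definition file passed to the command line with --model; if I remember correctly the file also needs a model() callable, something like (the file name is just an example):
    # my_classifier.py, used as: onmt-main --model my_classifier.py --config config.yml ...
    def model():
        # Called by the runner to build the model instance defined above.
        return MyClassifier()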
    Stergiadis Manos
    @steremma
    Will do. Would it make an interesting PR? I also made my own catalog version that hardcodes a Transformer encoder for personal use, but it's not based on any paper so that one is probably out of scope.
    Guillaume Klein
    @guillaumekln
    Not sure. Users often have very different requirements. I prefer making the code easily extensible rather than adding more options.
    Stergiadis Manos
    @steremma
    :+1:
    Anna Samiotou
    @annasamt
    [screenshot: error message attached]
    Hello, does "train_alignments" work in OpenNMT-tf v2? When I add it in the config file (under data), I get the error shown in the attached screenshot.
    When I comment it out, training runs fine. Thanks in advance.
    Guillaume Klein
    @guillaumekln
    Hi, most likely there is a mismatch between your alignment file and your training file. Can you check for any error?
    Anna Samiotou
    @annasamt
    I checked but I see no error. This is a test training run with 100 source and 100 target segments (in separate files), transformer model, SentencePiece, mixed precision. As for the alignments, I used the OpenNMTTokenizer in conservative mode to tokenize them, pasted source/target with a " ||| " delimiter, and ran fast_align to produce the Pharaoh format. The alignment file also comprises 100 lines.
    Anna Samiotou
    @annasamt
    Possibly the error lies in the different tokenization methods, i.e. the alignment (by fast_align) was run on the (same) training files but tokenized with OpenNMTTokenizer (conservative), while the training files, for training the model, are sent as raw text to SentencePiece inside OpenNMT-tf. Any ideas on this?
    Guillaume Klein
    @guillaumekln
    The alignment model should be trained on the tokenized training files.
    Anna Samiotou
    @annasamt
    I understand. The thing is that tokenization of the training files happens on the fly inside OpenNMT-tf, which is more straightforward in my case, because when I do it offline: 1) I then have to de-tokenize myself; 2) for protected tokens inside ⦅⦆ I get funny encoding for these special parentheses, as we have previously discussed in http://forum.opennmt.net/t/issue-with-special-character-u-ff5f/2965/5
    Guillaume Klein
    @guillaumekln
    If you want to train with alignments you would need to tokenize offline. Did you look into using the OpenNMT Tokenizer directly? It is fast and has a convenient Python API. There should not be any encoding issues.
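    For example, a rough sketch with the pyonmttok package of building the "src ||| tgt" file for fast_align (file names and tokenization options below are placeholders; the options must match whatever tokenization the model is trained on):
    import pyonmttok

    # Same tokenization as the training data (mode/options are placeholders).
    tokenizer = pyonmttok.Tokenizer("conservative")

    with open("train.src") as src, open("train.tgt") as tgt, \
         open("train.src-tgt", "w") as out:
        for src_line, tgt_line in zip(src, tgt):
            src_tokens, _ = tokenizer.tokenize(src_line.strip())
            tgt_tokens, _ = tokenizer.tokenize(tgt_line.strip())
            out.write(" ".join(src_tokens) + " ||| " + " ".join(tgt_tokens) + "\n")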
    Anna Samiotou
    @annasamt
    Thanks, I will look into it. So to clarify further: 1) to train an NMT model with alignments, for the alignment I have to use the same type of tokenization that I use for the training, but I need to do it offline; 2) for the training, can I still keep the online tokenization method, or do I have to use the (offline) tokenized files? 3) Finally, does the same apply to word embeddings, if I want to train with embeddings as well?
    Guillaume Klein
    @guillaumekln
    1) Yes.
    2) You could still use the online method, but since you already tokenized the files it's probably better to reuse them.
    3) Yes. You want your pretrained embedding vocabulary to cover your training vocabulary.
    Anna Samiotou
    @annasamt
    Many thanks for the support!
    Anna Samiotou
    @annasamt
    One more doubt: suppose I use the offline tokenized files for the training. For inference, could I use the online tokenization (with the same tokenization method, obviously) to avoid tokenizing the source and detokenizing the prediction myself? If yes, how do I specify this (that is, applying tokenization only to inference) in the config.yml file? Could I specify "source_tokenization:" under "infer:"?
    Guillaume Klein
    @guillaumekln
    Mmh good question. Right now if you configure the tokenization in the YAML file it will apply it for all files: training, evaluation, and test.
    You could, however, define the tokenization configuration in a separate file and use it only for inference, using the "Multiple configuration files" approach.
    Anna Samiotou
    @annasamt
    @guillaumekln Mmm, I thought so; and the online tokenization is sooo handy. Anyway, we can't have it all, I suppose. If training with alignments and/or word embeddings improves the system performance I guess it is worth the extra effort.
    @guillaumekln Ah, OK, good point! I will try the multiple config option for inference. Thanks!
    Rahaf Alnajjar
    @MaryaAI
    Hi,
    I used to run OpenNMT-tf on GPU and now I want to run it on CPU. Are there any restrictions on that? Does it work on CPU?
    Guillaume Klein
    @guillaumekln
    Hi, yes it works on CPU. It is just slower.
    Anna Samiotou
    @annasamt
    Hi, to run opennmt-tf inference on CPU only, do we use the model as such or is there a release option as in lua? Or is it enough to specify in the inference command --num_gpus 0?
    Guillaume Klein
    @guillaumekln
    If there is no GPU on the system, it will automatically use the CPU. There is no specific option.
    Anna Samiotou
    @annasamt
    So if the system does have at least one GPU there is no way to use CPU-only for the inference, right?
    Guillaume Klein
    @guillaumekln
    You can set the environment variable CUDA_VISIBLE_DEVICES, e.g. CUDA_VISIBLE_DEVICES= onmt-main ...
    Anna Samiotou
    @annasamt
    OK, I'll try using this env var without a value this time. Thx
    Also, if an OpenNMT-tf v2.x model is trained on GPU, can inference still be run on CPU only?
    Guillaume Klein
    @guillaumekln
    Yes, checkpoints do not include any information regarding the training device.
    Anna Samiotou
    @annasamt
    OK great
    [screenshot: log message attached]
    I have a different question: while training with OpenNMT-tf v2.x, I get the message shown in the attached screenshot. It does not hinder the training, but I wonder about its meaning/consequences. Any ideas?
    I read that this is probably due to the fact that the number of iterations performed on the dataset is greater than the number of batches in the dataset. Is this a problem? Or should I ignore the message?
    Guillaume Klein
    @guillaumekln
    You should ignore this message. It just means that we finished iterating on the dataset.
    Anna Samiotou
    @annasamt
    Thx
    newger
    @newger18
    Hi @guillaumekln, I'm hitting the same problem that @emesha92 mentioned. GPU utilization is very low, most of the time 0%. sample_buffer_size is 500000, there are 8 GPUs, and gradients are accumulated over 2 iterations to reach an effective batch size of 25000. I wonder if the data preparation is too slow?