    Guillaume Klein
    @guillaumekln
    As you can see, there are only wheels for Linux: https://pypi.org/project/pyonmttok/#files
    alrudak
    @alrudak
    What parameter should I use to run an instance on a specific GPU? (0, 1, 2, etc.)
    Jordi Mas
    @jordimas
    Hello
    I'm using SentencePiece as the tokenizer to train an OpenNMT model.
    I would like the model, when asked to translate something in uppercase like "HELLO", to preserve the case in the translation.
    I was expecting this to be a configuration of the Tokenizer, but I have not been able to find it. Any help or hint is appreciated.
    Guillaume Klein
    @guillaumekln
    Hi. You can either add training examples in uppercase or look into the case_markup option from the Tokenizer.
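    (As a rough illustration, here is what the case_markup option does in pyonmttok; the exact marker token names may vary slightly between versions, and the option can also be combined with a SentencePiece subword model.)

    import pyonmttok

    # case_markup lowercases the text and injects marker tokens that record
    # the original casing, so the model can learn to transfer case.
    tokenizer = pyonmttok.Tokenizer("aggressive", joiner_annotate=True, case_markup=True)

    tokens, _ = tokenizer.tokenize("HELLO")
    # tokens is roughly: ["｟mrk_begin_case_region_U｠", "hello", "｟mrk_end_case_region_U｠"]
    print(tokens)

    # Detokenization restores the casing from the markers.
    print(tokenizer.detokenize(tokens))  # -> "HELLO"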
    Jordi Mas
    @jordimas
    Thanks!
    Jordi Mas
    @jordimas
    Hello
    I updated from OpenNMT 2.40 to 2.70 and "replace_unknown_target=True" has stopped working. Now I get the <unk> tag instead of the source tokens for out-of-vocabulary words. Is it possible that a regression was introduced after 2.40? Thanks
    (also fails for me with the latest version 2.13)
    Guillaume Klein
    @guillaumekln
    Hi. I'm not aware of this regression, but it's possible. Can you find the first version between 2.4 and 2.7 where this option stopped working?
    Jordi Mas
    @jordimas
    I will. Give me a few hours, since I will not be able to focus on this until the weekend. Thanks
    Damien Daspit
    @ddaspit
    I am trying to understand exactly how effective_batch_size works. The auto config for a Transformer model is effective_batch_size: 25000 and batch_size: 3072. This means that 9 iterations are required to accumulate the gradients to reach a batch size of 25000 on a single GPU. So does that mean that the actual effective batch size is 3072 * 9 = 27648? If this is true, then I would expect that if I set batch_size to 8192, the actual effective batch size would be 8192 * 4 = 32768. This feels like enough of a difference in effective batch size that it would have an impact on training. Is this accurate?
    Guillaume Klein
    @guillaumekln
    Yes. It simply finds the first multiple of batch_size that is greater than or equal to effective_batch_size. It's true that it can overshoot the requested effective batch size in some cases.
    We typically want to avoid changing the user-provided batch_size, since increasing it could result in OOM and decreasing it would underutilize the compute resources.
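    (To make the arithmetic concrete, a small sketch of the accumulation rule as described above; this is not the actual OpenNMT-tf code.)

    import math

    def accumulation(batch_size, effective_batch_size):
        # Accumulate the first multiple of batch_size that is >= effective_batch_size.
        num_batches = math.ceil(effective_batch_size / batch_size)
        return num_batches, num_batches * batch_size

    print(accumulation(3072, 25000))  # (9, 27648)
    print(accumulation(8192, 25000))  # (4, 32768)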
    Guillaume Klein
    @guillaumekln
    Maybe we can add a flag to allow the system to change the batch size and make the effective batch size more accurate.
    Anna Samiotou
    @annasamt
    Hello. We'd like to experiment with guided alignments for a TransformerBig model. Is the expected benefit of training the model with guided alignment related to efficiency, performance, or quality? Having read https://opennmt.net/OpenNMT-tf/alignments.html#alignments, https://forum.opennmt.net/t/guided-alignment-and-weighted-datasets/4084, and https://forum.opennmt.net/t/opennmt-tf-how-to-use-alignments-and-phares-tables/2209/5, we understand that guided alignment constrains the attention to be an alignment, that it could be used by decoding features to replace unk (though since we use subword encoding this doesn't seem needed for us), and that alignment information can be retrieved for additional postprocessing. However, we do not yet have a clear understanding of the expected benefits. Thanks in advance.
    Guillaume Klein
    @guillaumekln
    Hi. The primary benefit is that one attention head can now be used as alignment. Guided alignment does not improve efficiency (it's actually less efficient) and I don't think there is evidence that it improves quality in any way. If you don't need the model to produce alignments, then you probably don't need to use guided alignment.
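    (For reference, guided alignment in OpenNMT-tf is enabled through the configuration; a minimal sketch using the Python API, where the file names are placeholders and the alignments are expected in Pharaoh format, e.g. as produced by fast_align.)

    import opennmt as onmt

    config = {
        "model_dir": "run/",
        "data": {
            "train_features_file": "src-train.txt",
            "train_labels_file": "tgt-train.txt",
            "train_alignments": "train-alignments.txt",  # Pharaoh-format alignments
            "source_vocabulary": "src.vocab",
            "target_vocabulary": "tgt.vocab",
        },
        "params": {
            "guided_alignment_type": "ce",  # cross-entropy loss on one attention head
            "guided_alignment_weight": 1,
        },
    }

    runner = onmt.Runner(onmt.models.TransformerBig(), config, auto_config=True)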
    Anna Samiotou
    @annasamt
    Thanks. Indeed, we had experimented with guided alignment in some older models without achieving any added value, but we wanted to test it with our current, improved models as well. Regarding the primary benefit you mentioned: since using an attention head as an alignment makes explicit which target words were generated by which source words, couldn't this result in better attention weights and therefore better performance?
    Guillaume Klein
    @guillaumekln
    Maybe it can improve performance but I don't have any numbers to confirm or deny this. My guess is that it does not help, mostly because it only constrains a single attention head out of 96 possible source-target attention heads in the TransformerBig decoder.
    Anna Samiotou
    @annasamt
    Many thanks for your input, Guillaume.
    Gerardo Cervantes
    @gcervantes8
    Hello, the company I work for tasked me with converting some OpenNMT models to TensorFlow Lite. I was able to get the NMTSmallV1, NMTMediumV1, NMTBigV1, and Luong Attention models working, and I recently got approval to release the code. I plan to do a pull request tomorrow with the changes in case this is something you want to include in your codebase.
    Daniel Marín
    @dmar1n
    Hi! When I create joint vocabularies (i.e. the same vocabulary for source and target) and, in the training config, point to the same vocabulary file (as recommended in the documentation for shared embeddings), training throughput drops to half of what I get with separate vocabularies (in terms of words/sec). Even if I increase the batch size, I can't reach the same performance as before. Is this normal? I would expect at least the same words/s, or at least some GPU memory gain, but that doesn't seem to be the case...
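    (For context, the joint-vocabulary setup described here simply points both sides of the data configuration at the same file; a minimal sketch with placeholder file names, to be combined with a model definition that shares embeddings.)

    config = {
        "data": {
            "train_features_file": "src-train.txt",
            "train_labels_file": "tgt-train.txt",
            # The same vocabulary file on both sides allows shared embeddings.
            "source_vocabulary": "joint.vocab",
            "target_vocabulary": "joint.vocab",
        },
    }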
    Guillaume Klein
    @guillaumekln
    Hi. That's strange. Are you using the latest version? How are you running the training? Mixed precision, multi-GPU?
    Daniel Marín
    @dmar1n
    Yes, I use mixed precision with 4 GPUs (the memory is fully used on all GPUs). I'm using TF 2.3 and OpenNMT-tf 2.17.
    I also use the runner with auto_config (train defaults are mostly unchanged)
    Guillaume Klein
    @guillaumekln
    Do you mind opening an issue on the GitHub repository?
    Daniel Marín
    @dmar1n
    Sure, I will try to add further info there. Thanks
    Guillaume Klein
    @guillaumekln
    Thanks
    Daniel Marín
    @dmar1n
    Hi @guillaumekln, I created the issue on GitHub, but I might close it and open a new one. After further tests, it seems the performance drop is related to the combination of TF 2.3 and OpenNMT 2.17 (at least, just downgrading to OpenNMT 2.15 seems to fix the problem). On a different machine, I had tested TF 2.4 with OpenNMT 2.17, and there was even a performance increase of around 15% (as other users had pointed out), but it seems that with TF 2.3 there is an issue.
    Guillaume Klein
    @guillaumekln
    Ok. I'm trying with TensorFlow 2.3 and OpenNMT-tf 2.17 but I can't reproduce the issue. You might need to include more information, such as the training configuration, model definition, and full training logs.
    Daniel Marín
    @dmar1n
    I have updated the GitHub issue. I confirm the problem is not related to the shared embeddings, as I originally thought, but to the upgrade to OpenNMT 2.17 under TensorFlow 2.3.1. With the exact same configuration and OpenNMT 2.16, the performance is restored. Let me know if you need more info.
    alrudak
    @alrudak
    Hello, we are trying to run CTranslate2 on an A100 GPU and get errors:
    result = self._model.translate_batch(batch_tokens)
    RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED
    Does anyone know how to fix it?
    | NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2
    Guillaume Klein
    @guillaumekln
    Hi. How did you convert the model and build the Translator object? Can you post these details in this issue: OpenNMT/CTranslate2#414
    alrudak
    @alrudak

    Before the CUBLAS_STATUS_NOT_SUPPORTED error we got an "Out of memory" error.

    We run 2 models, each about 300 MB. But in nvidia-smi I saw that only 1 GB of the 40 GB was used, and then we get "out of memory".

    To convert the models to CTranslate2, I used this command (to create 8-bit models):

    ct2-opennmt-tf-converter --model_path INPUT_ONMT_MODEL_DIR --model_spec TransformerBig --output_dir OUTPUT_DIR --quantization int8

    Maybe it's because of the "fabric manager" that needs to run with the A100 GPU?

    https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-460-32-03/index.html

    The Translator object is created like:

    model = ctranslate2.Translator(path, device=DEVICE)

    The package version:

    opennmt/ctranslate2:latest-ubuntu18-cuda11.0

    I posted these details to the #414 thread on GitHub, but it is marked as closed. Just want to make sure that someone will look into this more deeply.
    Guillaume Klein
    @guillaumekln
    Thanks for the info. I reopened the issue. We'll continue the discussion there.
    Gerardo Cervantes
    @gcervantes8
    Is it normal for a Transformer model to take much longer to train? I'm seeing 0.06 steps per second for the Transformer vs. 3.15 steps per second for NMTBigV1. I am also running TensorFlow 2.1, so that could be why I'm getting this.
    Guillaume Klein
    @guillaumekln
    What are the reported source and target tokens per second? These are better for comparing performance.
    Gerardo Cervantes
    @gcervantes8
    For NMTBigV1 I get around 2800, for Transformer I get around 1205. I'm getting very similar numbers between source and target words per second
    Gerardo Cervantes
    @gcervantes8
    I'm noticing that when I use mixed precision with the Transformer, the source and target words per second jump from 1205 to around 3150. But the steps per second is slightly lower, at 0.05. I'm a little confused about the difference between words per second and steps per second; I'm still trying to understand it.
    Guillaume Klein
    @guillaumekln
    I think you set a very small batch size for the Transformer and one step accumulates many batches, hence the low number of steps per second.
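    (For a sense of scale: with a batch_size of 64 and the auto_config effective_batch_size of 25000, each step would accumulate ceil(25000 / 64) = 391 batches, which is consistent with steps per second on the order of 0.06.)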
    Gerardo Cervantes
    @gcervantes8
    Interesting. I will try a bigger batch size; the batch size I'm using now is 64. Thank you.
    Gerardo Cervantes
    @gcervantes8
    Running with a batch size of 512 and mixed precision gave me around 6300 words per second, and it also increased the steps per second to 0.22, which is much faster! Is this closer to the speed I should expect when training a Transformer model?
    Guillaume Klein
    @guillaumekln
    Are you using auto_config? If yes, the Transformer batch size is defined in number of tokens, not examples; the default value is 3072, for example. You can also set batch_size=0 and the training will select a batch size for you.
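    (A minimal sketch of the relevant training parameters; the values shown are the auto_config defaults mentioned above.)

    config = {
        "train": {
            "batch_size": 3072,            # in tokens under auto_config
            "batch_type": "tokens",
            "effective_batch_size": 25000,
        },
    }
    # Setting "batch_size": 0 instead asks the trainer to pick a working value.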
    Gerardo Cervantes
    @gcervantes8
    I am using auto_config. That may be why! I'll try with batch size 0 and report the speed.
    Gerardo Cervantes
    @gcervantes8
    I tried a batch size of 0 but got out-of-memory errors. I reduced the size of the vocabularies and tried a batch size of 64 with batch type as examples, and got around 8100 source and target words per second with 0.022 steps per second.
    Guillaume Klein
    @guillaumekln
    I suggest sticking with batch type as tokens for Transformers. Otherwise you also need to override the effective_batch_size parameter that controls how many batches are accumulated.
    If you are getting started with Transformer models, you could just remove all the parameters that you defined (except the data) and use auto_config. See for example the quickstart: https://opennmt.net/OpenNMT-tf/quickstart.html
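    (From the quickstart linked above, a Transformer training run with auto_config boils down to a single command; data.yml holds the data file and vocabulary paths.)

    onmt-main --model_type Transformer --config data.yml --auto_config train --with_eval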
    Gerardo Cervantes
    @gcervantes8
    Thanks! Using auto_config boosted the steps per second to about 0.35.