    Kristine Mae M. Adlaon
    @kadlaon
    Not sure though about the one in the image. I printed the size of my vocab and the embedding shape. What could explain this difference: 24000 vs. 24001?
    Guillaume Klein
    @guillaumekln
    Your vocabulary size is 24000. The embedding contains an additional index for all tokens that are not in your vocabulary (also known as the UNK token). Hence the embedding size is 24001.
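    A quick numeric check of that relationship (a minimal sketch; the vocabulary path is hypothetical): the embedding matrix has exactly one more row than the vocabulary file has lines, and that extra row is the one reserved for <unk>.

    # Minimal sketch: compare the vocabulary size with the expected number of
    # embedding rows. "src-vocab.txt" is a hypothetical vocabulary file path.
    with open("src-vocab.txt", encoding="utf-8") as vocab_file:
        vocab_size = sum(1 for _ in vocab_file)  # e.g. 24000
    embedding_rows = vocab_size + 1              # e.g. 24001, the extra row is <unk>
    print(vocab_size, embedding_rows)
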
    Kristine Mae M. Adlaon
    @kadlaon
    Oh right! Sorry I forgot about the <unk>. Thank you again! :)
    Hung Nguyen
    @hungns135_gitlab
    I would like to treat unknown words, like names, as UNKs so that they are replaced by their source tokens. I did set replace_unknown_target to True. However, the result is not what I expected: all names are replaced by words which are not correct. Am I missing something? I also don't see <UNK> in my vocab or in the prediction. Is that normal? Thank you
    Guillaume Klein
    @guillaumekln
    replace_unknown_target uses the model attention to select the corresponding source token. However, it is well known that Transformer attention usually cannot be used as target-source alignments. You should either constrain the attention to be an alignment or use subword tokenization (like SentencePiece) to avoid UNK. Note that the UNK token does not appear in the vocab but is automatically added when starting the training.
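    For the subword route, a minimal sketch with pyonmttok and SentencePiece ("sp.model" is a hypothetical trained SentencePiece model): unseen words such as names are split into known subword pieces, so no UNK is ever produced.

    import pyonmttok

    # Tokenize with a SentencePiece model: rare words are broken into subword
    # pieces that are all in the vocabulary, so <unk> never appears.
    tokenizer = pyonmttok.Tokenizer("none", sp_model_path="sp.model")
    tokens, _ = tokenizer.tokenize("An unseen name is split into subword pieces")
    print(tokens)
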
    Hung Nguyen
    @hungns135_gitlab
    Thank you.
    xmart-sol
    @xmart-sol
    I am continuing to train a Transformer in 'train' mode. However, it just averages the latest checkpoints and stops there instead of continuing to train. How can I overcome this?
    Guillaume Klein
    @guillaumekln
    You probably need to increase max_step in the training parameters. There should be a warning about this somewhere in the logs. We just improved that for the next version: a more visible error message will be shown, see OpenNMT/OpenNMT-tf@21df1c7
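    For reference, a minimal sketch of the relevant override (values and paths are illustrative, not taken from this thread): max_step is the total number of training steps, so to continue training past an existing checkpoint it must be raised above the step count already reached. This mirrors the train section of the YAML configuration.

    # Illustrative training configuration; only max_step matters here.
    config = {
        "model_dir": "run/",
        "train": {
            "max_step": 500000,  # must exceed the last checkpoint's step to keep training
        },
    }
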
    xmart-sol
    @xmart-sol
    Got it! Thanks
    Memduh Gökırmak
    @MemduhG
    Is pyonmttok still unsupported on mac?
    I've tried both a normal pip install and downloading and pip-installing the wheels available on PyPI.
    Guillaume Klein
    @guillaumekln
    As you can see, there are only wheels for Linux: https://pypi.org/project/pyonmttok/#files
    alrudak
    @alrudak
    What parameter should I use to run an instance on a specific GPU? (0, 1, 2, etc.)
    Jordi Mas
    @jordimas
    Hello
    I'm using SentencePiece as the tokenizer to train an OpenNMT model.
    I would like the model, when asked to translate something in upper case such as "HELLO", to preserve the case in the translation.
    I was expecting this to be a configuration of the Tokenizer, but I have not been able to find it. Any help or hint is appreciated.
    Guillaume Klein
    @guillaumekln
    Hi. You can either add training examples in uppercase or look into the case_markup option from the Tokenizer.
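    A minimal sketch of the case_markup idea with pyonmttok (the example string is arbitrary): the text is lowercased and the casing is carried by special markup tokens, so the model can learn and restore case independently of the words themselves.

    import pyonmttok

    # Tokenize with case markup: tokens are lowercased and casing is encoded as
    # placeholder tokens, which detokenization uses to restore the original case.
    tokenizer = pyonmttok.Tokenizer("aggressive", case_markup=True)
    tokens, _ = tokenizer.tokenize("HELLO world")
    print(tokens)                        # lowercased tokens plus case-markup placeholders
    print(tokenizer.detokenize(tokens))  # back to "HELLO world"
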
    Jordi Mas
    @jordimas
    Thanks!
    Jordi Mas
    @jordimas
    Hello
    I updated from OpenNMT 2.4.0 to 2.7.0 and "replace_unknown_target=True" has stopped working. I now get the <unk> tag instead of the source tokens for out-of-vocabulary words. Is it possible that a regression was introduced after 2.4.0? Thanks
    (it also fails for me with the latest version, 2.13)
    Guillaume Klein
    @guillaumekln
    Hi. I'm not aware of this regression, but it's possible. Can you find the first version between 2.4 and 2.7 where this option stopped working?
    Jordi Mas
    @jordimas
    I will. Give me a few hours, since I will not be able to focus on this until the weekend. Thanks
    Damien Daspit
    @ddaspit
    I am trying to understand exactly how effective_batch_size works. The auto config for a Transformer model is effective_batch_size: 25000 and batch_size: 3072. This means that 9 iterations are required to accumulate the gradients to reach a batch size of 25000 on a single GPU. So does that mean that the actual effective batch size is 3072 * 9 = 27648? If this is true, then I would expect that if I set batch_size to 8192, the actual effective batch size would be 8192 * 4 = 32768. This feels like enough of a difference in effective batch size that it would have an impact on training. Is this accurate?
    Guillaume Klein
    @guillaumekln
    Yes. It simply finds the first multiple of batch_size that is greater than or equal to effective_batch_size. It's true that it can overshoot the requested effective batch size in some cases.
    We typically want to avoid changing the user-provided batch_size, since increasing it would result in OOM and decreasing it would result in under-utilization of compute resources.
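    A minimal sketch of that rounding behaviour, using the numbers from the question above: the number of gradient-accumulation steps is the smallest integer whose product with batch_size reaches effective_batch_size.

    import math

    def accumulation_steps(batch_size, effective_batch_size):
        # Smallest number of accumulation steps that reaches effective_batch_size.
        return math.ceil(effective_batch_size / batch_size)

    for batch_size in (3072, 8192):
        steps = accumulation_steps(batch_size, 25000)
        print(batch_size, steps, batch_size * steps)
    # 3072 -> 9 steps -> 27648 actual effective batch size
    # 8192 -> 4 steps -> 32768 actual effective batch size
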
    Guillaume Klein
    @guillaumekln
    Maybe we can add a flag to allow the system to change the batch size and make the effective batch size more accurate.
    Anna Samiotou
    @annasamt
    Hello. We'd like to experiment with guided alignments for a TransformerBig model. Is the expected beneficial impact of training the model with guided alignment related to efficiency, performance, or quality? Having read https://opennmt.net/OpenNMT-tf/alignments.html#alignments, https://forum.opennmt.net/t/guided-alignment-and-weighted-datasets/4084 and https://forum.opennmt.net/t/opennmt-tf-how-to-use-alignments-and-phares-tables/2209/5, we understand that guided alignment constrains the attention to be an alignment, that it could be used by decoding features for replacing unk (however, since we use subword encoding, it doesn't seem needed for this), and that alignment information can be retrieved for additional postprocessing. However, we do not yet have a clear understanding of the expected benefits. Thanks in advance.
    Guillaume Klein
    @guillaumekln
    Hi. The primary benefit is that one attention head can now be used as alignment. Guided alignment does not improve efficiency (it's actually less efficient) and I don't think there is evidence that it improves quality in any way. If you don't need the model to produce alignments, then you probably don't need to use guided alignment.
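    For anyone setting this up, a minimal sketch of a guided-alignment configuration along the lines of the alignments documentation linked above (file names are hypothetical; the alignment file contains one line of source-target alignment pairs per training example):

    # Guided-alignment settings, shown as config dict sections (same keys as the YAML).
    config = {
        "data": {
            "train_alignments": "train.align",  # hypothetical Pharaoh-format alignment file
        },
        "params": {
            "guided_alignment_type": "ce",      # cross-entropy alignment loss
            "guided_alignment_weight": 1,
        },
    }
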
    Anna Samiotou
    @annasamt
    Thanks. Indeed, we had experimented with guided alignment in some older models, without achieving any added value. But we wanted to test it with our current, improved models as well. Regarding the primary benefit you mentioned: I understand that by using an attention head as alignment, and since explicit alignment may help in determining which target words were generated by which source words, this could result in better attention weights and therefore better performance, no?
    Guillaume Klein
    @guillaumekln
    Maybe it can improve performance but I don't have any numbers to confirm or deny this. My guess is that it does not help, mostly because it only constrains a single attention head out of 96 possible source-target attention heads in the TransformerBig decoder.
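    Presumably the 96 comes from the TransformerBig decoder layout, i.e. 6 decoder layers with 16 attention heads each:

    # TransformerBig decoder: 6 layers x 16 source-target attention heads.
    decoder_layers = 6
    heads_per_layer = 16
    print(decoder_layers * heads_per_layer)  # 96
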
    Anna Samiotou
    @annasamt
    Many thanks for your input, Guillaume.
    Gerardo Cervantes
    @gcervantes8
    Hello, the company I work for tasked me with converting some OpenNMT models to TensorFlow Lite. I was able to get the NMTSmallV1, NMTMediumV1, NMTBigV1, and Luong attention models working, and I recently got approval to release the code. I plan to open a pull request tomorrow with the changes, in case this is something you want to include in your codebase.
    Daniel Marín
    @dmar1n
    Hi! When I create joint vocabularies (i.e. the same vocabulary for source and target) and, in the training config, point to the same vocabulary file (as recommended in the documentation for shared embeddings), the training performance drops to half of what I reach with separate vocabularies (in terms of words/sec). Even if I increase the batch size, I don't manage to get the same performance as before. Is this normal? I would expect at least the same words/sec, or at least some GPU memory savings, but that does not seem to be the case...
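    For context, a minimal sketch of what "pointing to the same vocabulary file" means in the data configuration (file names are hypothetical):

    # Joint vocabulary: source and target reference the same file.
    config = {
        "data": {
            "source_vocabulary": "joint-vocab.txt",
            "target_vocabulary": "joint-vocab.txt",
        },
    }
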
    Guillaume Klein
    @guillaumekln
    Hi. That's strange. Are you using the latest version? How are you running the training? Mixed precision, multi-GPU?
    Daniel Marín
    @dmar1n
    Yes, I use mixed precision with 4 GPUs (memory is fully used on all GPUs). I'm using TF 2.3 and OpenNMT-tf 2.17.
    I also use the runner with auto_config (the train defaults are mostly unchanged).
    Guillaume Klein
    @guillaumekln
    Do you mind opening an issue on the GitHub repository?
    Daniel Marín
    @dmar1n
    Sure, I will try to add further info there. Thanks
    Guillaume Klein
    @guillaumekln
    Thanks
    Daniel Marín
    @dmar1n
    Hi @guillaumekln, I created the issue on GitHub, but I might close it and open a new one. After further tests, it seems the performance drop is related to the combination of TF 2.3 and OpenNMT 2.17 (at least, just downgrading to OpenNMT 2.15 seems to fix the problem). On a different machine, I had tested TF 2.4 with OpenNMT 2.17, and there was even a performance increase of around 15% (as other users had pointed out), but it seems that with TF 2.3 there is an issue.
    Guillaume Klein
    @guillaumekln
    Ok. I'm trying with TensorFlow 2.3 and OpenNMT-tf 2.17 but I can't reproduce the issue. You might need to include more information, such as the training configuration, model definition, and full training logs.
    Daniel Marín
    @dmar1n
    I have updated the GitHub issue. I confirm the problem is not related to the shared embeddings, as I originally thought, but to the upgrade to OpenNMT 2.17 under TensorFlow 2.3.1. With the exact same configuration and OpenNMT 2.16, the performance is restored. Let me know if you need more info.
    alrudak
    @alrudak
    Hello, we are trying to run CTranslate2 on an A100 GPU and get this error:
    result = self._model.translate_batch(batch_tokens)
    RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED
    Does anyone know how to fix it?
    | NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2
    Guillaume Klein
    @guillaumekln
    Hi. How did you convert the model and build the Translator object? Can you post these details in this issue: OpenNMT/CTranslate2#414
    alrudak
    @alrudak

    Before CUBLAS_NOT_SUPPORTED we got an "Out of memory" error.

    We run 2 models, about 300 MB each. But in nvidia-smi I saw that only 1 GB of the 40 GB was used, and then we get "out of memory".

    To convert the models to CTranslate2 I used this command (to create 8-bit models):

    ct2-opennmt-tf-converter --model_path INPUT_ONMT_MODEL_DIR --model_spec TransformerBig --output_dir OUTPUT_DIR --quantization int8

    Maybe it's because of the "fabric manager" that needs to run with the A100 GPU?

    https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-460-32-03/index.html

    The Translator object is created like this:

    model = ctranslate2.Translator(path, device=DEVICE)

    The Docker image version:

    opennmt/ctranslate2:latest-ubuntu18-cuda11.0

    I posted these details in the #414 thread on GitHub, but it is marked as closed. I just want to make sure that someone will look into it more deeply.
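    For completeness, a minimal sketch of loading a converted model on one specific GPU with CTranslate2 (the path, device index, and example tokens are hypothetical, and the result structure may differ across CTranslate2 versions); the device_index argument is also one way to pin an instance to GPU 0, 1, 2, etc., as asked earlier.

    import ctranslate2

    # Load the converted model on GPU 0 and translate one pre-tokenized sentence.
    translator = ctranslate2.Translator("OUTPUT_DIR", device="cuda", device_index=0)
    results = translator.translate_batch([["▁Hello", "▁world"]])
    print(results[0])
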
    Guillaume Klein
    @guillaumekln
    Thanks for the info. I reopened the issue. We'll continue the discussion there.