    Guillaume Klein
    @guillaumekln
    sampling_topk is not about speed but to produce random outputs. Is that what you want to do? If not, you should not set this parameter.
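For reference, a minimal sketch of where sampling_topk sits in a run configuration, shown here as a Python dict of the kind accepted by opennmt.Runner (paths and values are placeholders):

```python
# Placeholder configuration fragment: sampling_topk is a decoding parameter
# under "params". It makes the decoder sample from the k most likely tokens
# at each step, which produces random (non-deterministic) outputs.
config = {
    "model_dir": "run/",  # placeholder
    "params": {
        "sampling_topk": 10,
        "sampling_temperature": 1.0,
    },
}
```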
    yutongli
    @yutongli
    thanks! @guillaumekln
    Kristine Mae M. Adlaon
    @kadlaon
    Hi. How can I extract the embeddings of my source and target data after training the model?
    Guillaume Klein
    @guillaumekln
    Kristine Mae M. Adlaon
    @kadlaon
    [attached image: image.png]
    Thank you @guillaumekln ! Got it.
    Not sure though about the one in the image: I printed my vocab size and the embedding shape, and they differ (24000 vs. 24001). What could explain the difference?
    Guillaume Klein
    @guillaumekln
    Your vocabulary size is 24000. The embedding contains an additional index for all tokens that are not in your vocabulary (also known as the UNK token), hence the embedding size is 24001.
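As an illustration of the extra UNK row (not from the thread; the checkpoint path is a placeholder and the variable names vary by version), one way to inspect embedding shapes in a trained OpenNMT-tf checkpoint:

```python
import tensorflow as tf

ckpt = "run/ckpt-5000"  # hypothetical checkpoint path
reader = tf.train.load_checkpoint(ckpt)
for name, shape in tf.train.list_variables(ckpt):
    if "embedding" in name:
        # Expect vocab_size + 1 rows, e.g. (24001, 512) for a 24000-token
        # vocabulary, since one extra row is reserved for <unk> tokens.
        print(name, shape)
        weights = reader.get_tensor(name)  # NumPy array of the embedding matrix
```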
    Kristine Mae M. Adlaon
    @kadlaon
    Oh right! Sorry I forgot about the <unk>. Thank you again! :)
    Hung Nguyen
    @hungns135_gitlab
    I would like to treat unknown words such as names as UNKs so that they are replaced by their source tokens. I set replace_unknown_target to True. However, the result is not what I expected: all names are replaced by other words that are not correct. Am I missing something? I also don't see <UNK> in my vocab or in the predictions. Is that normal? Thank you
    Guillaume Klein
    @guillaumekln
    replace_unknown_target uses the model attention to select the corresponding source token. However, it is well known that Transformer attention usually cannot be used as target-source alignments. You should either constrain the attention to be an alignment or use subword tokenization (like SentencePiece) to avoid UNK. Note that the UNK token does not appear in the vocab but is automatically added when training starts.
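As a hedged sketch of the subword route mentioned above (file names, vocabulary size, and sample text are placeholders), training and applying a SentencePiece model with the sentencepiece Python package so that rare words are split into pieces instead of mapping to UNK:

```python
import sentencepiece as spm

# Train a subword model on the raw training text (placeholder paths/values).
spm.SentencePieceTrainer.train(
    input="train.txt", model_prefix="sp", vocab_size=24000, model_type="bpe"
)

# Tokenize with the trained model: rare words such as names are decomposed
# into known subword pieces instead of becoming a single <unk> token.
sp = spm.SentencePieceProcessor(model_file="sp.model")
print(sp.encode("OpenNMT handles rare names gracefully", out_type=str))
```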
    Hung Nguyen
    @hungns135_gitlab
    Thank you.
    xmart-sol
    @xmart-sol
    I am continuing to train a Transformer in 'train' mode. However, it just averages the latest checkpoints and stops there instead of continuing to train. How can I overcome this?
    Guillaume Klein
    @guillaumekln
    You probably need to increase max_step in the training parameters. There should be a warning about this somewhere in the logs. We just improved that for the next version: a more visible error message will be shown, see OpenNMT/OpenNMT-tf@21df1c7
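For reference, a minimal sketch (values and paths are placeholders) of raising max_step in the train section of the run configuration, shown as a Python dict of the kind accepted by opennmt.Runner:

```python
config = {
    "model_dir": "run/",  # placeholder: directory holding the checkpoints
    "train": {
        # Training stops once this global step is reached, so to continue
        # training an existing model, raise it beyond the step recorded in
        # the latest checkpoint.
        "max_step": 500000,
    },
}
```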
    xmart-sol
    @xmart-sol
    Got it! Thanks
    Memduh Gökırmak
    @MemduhG
    Is pyonmttok still unsupported on macOS?
    I've tried both a normal pip install and downloading and pip-installing the wheels available on PyPI
    Guillaume Klein
    @guillaumekln
    As you can see, there are only wheels for Linux: https://pypi.org/project/pyonmttok/#files
    alrudak
    @alrudak
    What parameter should I use to run an instance on a specific GPU? (0, 1, 2, etc.)
    Jordi Mas
    @jordimas
    Hello
    I'm using SentencePiece as the tokenizer to train an OpenNMT model.
    I would like the model to preserve the case when I ask it to translate something in uppercase such as "HELLO".
    I was expecting this to be a configuration option of the Tokenizer, but I have not been able to find it. Any help or hint is appreciated.
    Guillaume Klein
    @guillaumekln
    Hi. You can either add training examples in uppercase or look into the case_markup option from the Tokenizer.
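A small sketch of the case_markup option with the pyonmttok Tokenizer (the tokenization mode and sample text are illustrative): the text is lowercased and special case-marker tokens are injected, so the model works on a case-insensitive vocabulary while the original casing can be restored at detokenization.

```python
import pyonmttok

# case_markup lowercases tokens and inserts special case-marker tokens.
tokenizer = pyonmttok.Tokenizer("aggressive", case_markup=True)

tokens, _ = tokenizer.tokenize("HELLO World")
print(tokens)                        # lowercased tokens plus case markers
print(tokenizer.detokenize(tokens))  # casing restored from the markers
```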
    Jordi Mas
    @jordimas
    Thanks!
    Jordi Mas
    @jordimas
    Hello
    I updated from OpenNMT 2.40 to 2.70 and "replace_unknown_target=True" has stopped working. Now I get an <unk> tag instead of the source tokens for out-of-vocabulary words. Is it possible that a regression was introduced after 2.40? Thanks
    (it also fails for me with the latest version, 2.13)
    Guillaume Klein
    @guillaumekln
    Hi. I'm not aware of this regression, but it's possible. Can you find the first version between 2.4 and 2.7 in which this option stopped working?
    Jordi Mas
    @jordimas
    I will. Give me a few hours since I will not be able to focus on this until the weekend. Thanks
    Damien Daspit
    @ddaspit
    I am trying to understand exactly how effective_batch_size works. The auto config for a Transformer model is effective_batch_size: 25000 and batch_size: 3072. This means that 9 iterations are required to accumulate the gradients to reach a batch size of 25000 on a single GPU. So does that mean that the actual effective batch size is 3072 * 9 = 27648? If this is true, then I would expect that if I set batch_size to 8192, the actual effective batch size would be 8192 * 4 = 32768. This feels like enough of a difference in effective batch size that it would have an impact on training. Is this accurate?
    Guillaume Klein
    @guillaumekln
    Yes. It simply finds the first multiple of batch_size that is greater than or equal to effective_batch_size. It's true that it can overshoot the requested effective batch size in some cases.
    We typically want to avoid changing the user-provided batch_size, since increasing it could result in OOM and decreasing it would underutilize compute resources.
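To make the arithmetic concrete, a tiny sketch mirroring the description above (not the library's actual code): the number of accumulation steps is the smallest integer whose product with batch_size reaches effective_batch_size.

```python
import math

def accumulation(batch_size, effective_batch_size):
    # Smallest number of accumulated iterations whose total reaches the
    # requested effective batch size.
    steps = math.ceil(effective_batch_size / batch_size)
    return steps, steps * batch_size

print(accumulation(3072, 25000))  # (9, 27648)
print(accumulation(8192, 25000))  # (4, 32768)
```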
    Guillaume Klein
    @guillaumekln
    Maybe we can add a flag to allow the system to change the batch size and make the effective batch size more accurate.
    Anna Samiotou
    @annasamt
    Hello. We'd like to experiment with guided alignments for the TransformerBig model. Is the expected beneficial impact of training the model with guided alignment related to efficiency, performance, or quality? Having read https://opennmt.net/OpenNMT-tf/alignments.html#alignments, https://forum.opennmt.net/t/guided-alignment-and-weighted-datasets/4084 and https://forum.opennmt.net/t/opennmt-tf-how-to-use-alignments-and-phares-tables/2209/5, we understand that guided alignment constrains the attention to be an alignment, that it can be used by decoding features for replacing UNKs (though since we use subword encoding this doesn't seem needed), and that alignment information can be retrieved for additional postprocessing. However, we do not yet have a clear understanding of the expected benefits. Thanks in advance.
    Guillaume Klein
    @guillaumekln
    Hi. The primary benefit is that one attention head can now be used as alignment. Guided alignment does not improve efficiency (it's actually less efficient) and I don't think there is evidence that it improves quality in any way. If you don't need the model to produce alignments, then you probably don't need to use guided alignment.
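For completeness, a hedged sketch of enabling guided alignment, shown as a Python dict of the kind accepted by opennmt.Runner; the file path is a placeholder and the parameter names follow the OpenNMT-tf alignments documentation linked above.

```python
config = {
    "data": {
        # Pharaoh-format alignment file (e.g. produced by fast_align),
        # one line of src-tgt index pairs per training example.
        "train_alignments": "alignments.txt",  # placeholder path
    },
    "params": {
        "guided_alignment_type": "ce",  # cross-entropy alignment loss
        "guided_alignment_weight": 1,
    },
}
```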
    Anna Samiotou
    @annasamt
    Thanks. Indeed we had experimented with guided alignment in some older models, without achieving any added value. But we wanted to test it with our current, improved models as well. Regarding the primary benefit you mentioned: I understand that by using an attention head as an alignment, and since an explicit alignment may help in determining which target words were generated by which source words, this could result in better attention weights and therefore better performance, no?
    Guillaume Klein
    @guillaumekln
    Maybe it can improve performance but I don't have any numbers to confirm or deny this. My guess is that it does not help, mostly because it only constrains a single attention head out of 96 possible source-target attention heads in the TransformerBig decoder.
    Anna Samiotou
    @annasamt
    Many thanks for your input, Guillaume.
    Gerardo Cervantes
    @gcervantes8
    Hello, the company I work for tasked me with converting some OpenNMT models to TensorFlow Lite. I was able to get the NMTSmallV1, NMTMediumV1, NMTBigV1, and Luong Attention models working, and I recently got approval to release the code. I plan to do a pull request tomorrow with the changes in case this is something you want to include in your codebase.