    Guillaume Klein
    @guillaumekln
    There is no tutorial for a web interface. This is not in the scope of the project.
    yutongli
    @yutongli
    Hi @guillaumekln I have trained a good transformer model, but I am seeing some lengthy inference outputs. For example, the source query is 10 characters long, but the output generated by the model can be 200 characters, with some portions just repeating. Is there a way to control the length of the output generated by the transformer model?
    (BTW, I had sequence_length set to 80 during training.)
    Guillaume Klein
    @guillaumekln
    Hi. Look for maximum_decoding_length in the parameters: https://opennmt.net/OpenNMT-tf/configuration.html. Maybe length_penalty can also help.
    However, when this issue happens it usually means the input was unexpected for the model and the training data may lack this type of example.
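    For reference, a minimal sketch of how these inference parameters could look in the OpenNMT-tf YAML configuration; the values are illustrative only and the placement follows the configuration page linked above:
    ```yaml
    params:
      maximum_decoding_length: 80   # hard cap on the number of generated tokens (illustrative value)
      length_penalty: 0.6           # length normalization applied during beam search (illustrative value)
    ```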
    yutongli
    @yutongli
    Hi @guillaumekln , thanks so much for your response, it's very helpful!! Just a couple of quick follow-up questions: 1. I think maximum_decoding_length and length_penalty are parameters affecting both training and inference, am I correct? 2. Does the 'length' here refer to the number of characters, the number of subwords, or the number of words?
    Guillaume Klein
    @guillaumekln
    1. They are inference-only parameters. 2. The length is the number of tokens, as defined by your tokenization.
    yutongli
    @yutongli
    I see, thanks so much!
    @guillaumekln for another parameter 'maximum_features_length', does the length here also refer to the number of tokens, as defined by the tokenization?
    Guillaume Klein
    @guillaumekln
    Yes, all length parameters are defined in terms of number of tokens.
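    As a complement, a sketch of the training-side length filters in the same YAML configuration; the values are illustrative, and the lengths are counted in tokens after tokenization:
    ```yaml
    train:
      maximum_features_length: 80   # training examples with longer source sequences are filtered out
      maximum_labels_length: 80     # same filter applied to the target side
    ```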
    yutongli
    @yutongli
    Thank you @guillaumekln !!
    yutongli
    @yutongli
    Hi @guillaumekln I've trained a Transformer model using onmt and have also run the infer command to do the inference; all went well. Now I have a requirement to bring the model into a production environment that supports TensorFlow, so I exported the model into the TF SavedModel format (https://opennmt.net/OpenNMT-tf/serving.html). However, for the inference part, some beam or decoder related parameters may affect the inference latency, such as beam_width, n_best, and maximum_decoding_length. How can I incorporate these onmt parameters, given an exported TF SavedModel package, in the TensorFlow production environment? onmt inference refers to a config file where we specify those parameters; would a similar config file be needed for doing inference with a SavedModel on a machine where only TensorFlow is supported?
    Thanks in advance!
    Guillaume Klein
    @guillaumekln
    Hi. The SavedModel format is a frozen computation graph so you can't change these values. Usually you tune these parameters using the infer command, before running the export.
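    In other words, these decoding settings have to be fixed in the YAML configuration before running the export, roughly as in this sketch (values illustrative; placement per the configuration documentation, to the best of my understanding):
    ```yaml
    params:
      beam_width: 4                 # tuned with the infer command, then frozen into the exported SavedModel
      maximum_decoding_length: 80
    infer:
      n_best: 1                     # number of hypotheses returned per input
    ```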
    yutongli
    @yutongli
    Thanks a lot!
    yutongli
    @yutongli
    Hi @guillaumekln May I ask a simple question to confirm with you? For the Transformer model, is beam_width=1 actually the greedy search mode (meaning no beam search)?
    Guillaume Klein
    @guillaumekln
    Hi. That is correct.
    yutongli
    @yutongli
    One additional question @guillaumekln :) I read the comment for the parameter 'sampling_topk': # (optional) Sample predictions from the top K most likely tokens (requires beam_width to 1). If 0, sample from the full output distribution (default: 1).
    My question is: with greedy search mode, how should I set "sampling_topk" to maximize the speed?
    It should just be the default value 1, right?
    yutongli
    @yutongli
    Can mixed precision be used when exporting an ONMT model to a TF SavedModel? Or is it only for training?
    yutongli
    @yutongli
    I searched mixed precision a little bit; it seems to me that it's only for training.
    Guillaume Klein
    @guillaumekln
    sampling_topk is not about speed but about producing random outputs. Is that what you want to do? If not, you should not set this parameter.
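    For clarity, a minimal configuration sketch for plain greedy decoding (fast and deterministic), based on the answers above; the single value shown is the only change needed:
    ```yaml
    params:
      beam_width: 1   # beam_width = 1 is greedy search, i.e. no beam search
      # sampling_topk is left unset to keep the default deterministic behavior
    ```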
    yutongli
    @yutongli
    thanks! @guillaumekln
    Kristine Mae M. Adlaon
    @kadlaon
    Hi. How can I extract the embeddings of my source and target data after training the model?
    Guillaume Klein
    @guillaumekln
    Kristine Mae M. Adlaon
    @kadlaon
    [image: printout of the vocabulary size and the embedding shape]
    Thank you @guillaumekln ! Got it.
    Not sure though about the one in the image: I printed the size of my vocab and the embedding shape, and they differ, 24000 vs. 24001. What could explain this difference?
    Guillaume Klein
    @guillaumekln
    Your vocabulary size is 24000. The embedding contains an additional index for all tokens that are not in your vocabulary (also known as the UNK token). Hence the embedding size is 24001.
    Kristine Mae M. Adlaon
    @kadlaon
    Oh right! Sorry I forgot about the <unk>. Thank you again! :)
    Hung Nguyen
    @hungns135_gitlab
    I would like to treat unknown words such as names as UNKs so that they are replaced by their source tokens. I did set replace_unknown_target to True. However, the result is not what I expected: all names are replaced by words that are not correct. Am I missing something? I also don't see <UNK> in my vocab or in the predictions. Is that normal? Thank you
    Guillaume Klein
    @guillaumekln
    replace_unknown_target uses the model attention to select the corresponding source token. However, it is well known that Transformer attention usually cannot be used as target-to-source alignments. You should either constrain the attention to be an alignment or use subword tokenization (like SentencePiece) to avoid UNK. Note that the UNK token does not appear in the vocab but is automatically added when starting the training.
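    For reference, a rough sketch of the guided-alignment route mentioned above; the section and parameter names follow the OpenNMT-tf documentation as I understand it, and the alignment file path and values are placeholders:
    ```yaml
    data:
      train_alignments: train.align   # placeholder: token-level alignments for the training corpus
    params:
      guided_alignment_type: ce       # train an attention head toward the provided alignments
      guided_alignment_weight: 1
    ```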
    Hung Nguyen
    @hungns135_gitlab
    Thank you.
    xmart-sol
    @xmart-sol
    I am continuing to train a Transformer in 'train' mode. However, it keeps averaging the latest checkpoints and stopping there instead of continuing to train. How can I overcome this?
    Guillaume Klein
    @guillaumekln
    You probably need to increase max_step in the training parameters. There should be a warning about this somewhere in the logs. We just improved that for the next version: a more visible error message will be shown, see OpenNMT/OpenNMT-tf@21df1c7
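    For reference, a one-line sketch of the fix in the training section of the YAML configuration; the value is illustrative and simply needs to exceed the step count already reached:
    ```yaml
    train:
      max_step: 500000   # raise beyond the current checkpoint step so training resumes instead of stopping
    ```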
    xmart-sol
    @xmart-sol
    Got it! Thanks
    Memduh Gökırmak
    @MemduhG
    Is pyonmttok still unsupported on Mac?
    I've tried both a normal pip install and downloading and pip installing the wheels available on PyPI.
    Guillaume Klein
    @guillaumekln
    As you can see, there are only wheels for Linux: https://pypi.org/project/pyonmttok/#files
    alrudak
    @alrudak
    What parameter should I use to run an instance on a specific GPU? (0, 1, 2, etc.)
    Jordi Mas
    @jordimas
    Hello
    I'm using SentencePiece as the tokenizer to train an OpenNMT model.
    I would like the model to preserve the case in the translation when I ask it to translate something in upper case, like "HELLO".
    I was expecting this to be a configuration option of the Tokenizer, but I have not been able to find it. Any help or hint is appreciated.
    Guillaume Klein
    @guillaumekln
    Hi. You can either add training examples in uppercase or look into the case_markup option from the Tokenizer.
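    A sketch of what the case_markup option could look like in the tokenization configuration referenced from the OpenNMT-tf data block; file names are placeholders and option placement follows the Tokenizer documentation as I understand it:
    ```yaml
    type: OpenNMTTokenizer
    params:
      mode: none                # let SentencePiece handle the segmentation
      sp_model_path: sp.model   # placeholder: path to the trained SentencePiece model
      case_markup: true         # lowercase the text and inject case markers the model learns to restore
    ```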
    Jordi Mas
    @jordimas
    Thanks!
    Jordi Mas
    @jordimas
    Hello
    I updated from OpenNMT 2.40 to 2.70 and "replace_unknown_target=True" has stopped working. Now I get the <unk> tag instead of the source tokens for out-of-vocabulary words. Is it possible that a regression was introduced after 2.40? Thanks
    (It also fails for me with the latest version, 2.13.)
    Guillaume Klein
    @guillaumekln
    Hi. I'm not aware of this regression, but it's possible. Can you find the first version between 2.4 and 2.7 where this option stopped working?
    Jordi Mas
    @jordimas
    I will. Give me some hours since I will not be able to focus on this until the weekend. Thanks
    Damien Daspit
    @ddaspit
    I am trying to understand exactly how effective_batch_size works. The auto config for a Transformer model is effective_batch_size: 25000 and batch_size: 3072. This means that 9 iterations are required to accumulate the gradients to reach a batch size of 25000 on a single GPU. So does that mean that the actual effective batch size is 3072 * 9 = 27648? If this is true, then I would expect that if I set batch_size to 8192, the actual effective batch size would be 8192 * 4 = 32768. This feels like enough of a difference in effective batch size that it would have an impact on training. Is this accurate?
    Guillaume Klein
    @guillaumekln
    Yes. It simply finds the first multiple of batch_size that is greater than or equal to effective_batch_size. It's true that it can overshoot the requested effective batch size in some cases.
    We typically want to avoid changing the user-provided batch_size, since increasing it could result in OOM and decreasing it would result in underutilization of compute resources.
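    To make the arithmetic from this exchange concrete, the rule is: accumulation steps = the smallest integer n such that n * batch_size >= effective_batch_size. With the auto config values quoted above:
    ```yaml
    train:
      batch_size: 3072              # per-step batch size (auto config value quoted above)
      effective_batch_size: 25000   # requested effective batch size
      # accumulation steps = ceil(25000 / 3072) = 9  -> actual effective batch = 9 * 3072 = 27648
      # with batch_size: 8192       -> ceil(25000 / 8192) = 4, actual = 4 * 8192 = 32768
    ```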