    NL
    @nslatysheva
    Hey @guillaumekln, I'm interested in understanding translation errors made by trained models, specifically by (1) looking at attention weights from transformer heads and (2) finding training examples with similar hidden state vectors (I can compute similarity myself, just need to know how to access the raw numbers at different parts of the network). Any advice? :)
    Guillaume Klein
    @guillaumekln
    You probably need to dive into the model code and place print statements where you need them. Just remember that the model is executed in graph mode, so you need to use the TensorFlow print function: https://www.tensorflow.org/api_docs/python/tf/print
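    For example, a minimal sketch (illustrative only, not OpenNMT-tf code) of using tf.print inside a graph-mode function:
        import tensorflow as tf

        @tf.function  # executed in graph mode, like the OpenNMT-tf model code
        def attention_weights(query, keys):
            scores = tf.matmul(query, keys, transpose_b=True)
            weights = tf.nn.softmax(scores, axis=-1)
            # The built-in print only runs at tracing time; tf.print runs inside the graph.
            tf.print("attention weights:", weights, summarize=-1)
            return weights

        attention_weights(tf.random.normal([1, 4, 64]), tf.random.normal([1, 4, 64]))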
    NL
    @nslatysheva
    thanks, will dive in :) just curious, does there exist any overview/presentation/tutorial as an intro to the code structure?
    yutongli
    @yutongli
    @guillaumekln Thanks for getting back to me. I monitored the CPU and memory usage for the inference job for some time: the CPU is around 150% and memory is about 10%. Does this mean the job is running on the GPU? How high would the CPU usage be if the job were running on the CPU?
    Guillaume Klein
    @guillaumekln
    You can use nvidia-smi to see processes running on the GPU. If it was running on the CPU, I think TensorFlow would be using all CPU cores by default.
    @nslatysheva There is no such tutorial, but the code is not that big.
    yutongli
    @yutongli
    @guillaumekln Thanks very much! After some research, I managed to make the inference job run only on CPUs by controlling GPU visibility via the NVIDIA CUDA environment variable. (Now the CPU usage shows ~2700%, rather than 150% previously, and the GPU usage remains at 0% per monitoring.) However, the inference output (predictions) does not seem to be dumped gradually and incrementally. It seems that the job keeps working hard behind the scenes, holding output in memory for a very long time, without dumping at a regular pace. (Per my observation, the regular dumping happens in the last 2 hours before the job completes, out of a total duration of about 30 hours.) Can we specify any parameters to control the dumping during inference? If so, would that speed up the overall processing?
    Guillaume Klein
    @guillaumekln
    You can control this behavior but disabling it will actually make the overall decoding slower. See the parameter infer > length_bucket_width in https://opennmt.net/OpenNMT-tf/configuration.html. It is set to 5 with auto_config but you can disable it with 0.
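    For reference, a minimal sketch of the relevant part of the run configuration, shown here as the equivalent Python dict (the YAML file uses the same keys; the value is only an example):
        config = {
            "infer": {
                # auto_config sets this to 5 (inputs are bucketed by length for efficiency,
                # so predictions are written out in bursts); 0 disables bucketing and keeps
                # the original order, at the cost of slower overall decoding.
                "length_bucket_width": 0,
            },
        }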
    Sirogha
    @Sirogha
    Hello. I am trying to train an en-ru model with SentencePiece.
    When I finished building the vocabulary in BPE mode, I could not find the letter Z in it. It's strange, because this letter appears more than 2 million times in my source data. How can that be?
    Memduh Gökırmak
    @MemduhG
    I'm getting this error when I try to run onmt-build-vocab:
    AttributeError: module 'tensorflow_core._api.v2.random' has no attribute 'Generator'
    Guillaume Klein
    @guillaumekln
    @MemduhG What TensorFlow version do you have installed?
    @Sirogha How did you look for the letter in the vocabulary?
    yutongli
    @yutongli
    @guillaumekln I have trained a Transformer model using OpenNMT-tf and want to serve it in production for real-time inference. I am considering https://github.com/OpenNMT/CTranslate2; is Intel MKL the minimum requirement for building CTranslate2? If so, and we end up not being able to have CTranslate2 in the production environment, could you please advise on alternatives? All I want is to bring the trained Transformer model into the production environment, so anything that can improve real-time inference will be highly helpful! Thanks
    yutongli
    @yutongli
    btw, the production environment is C++
    Guillaume Klein
    @guillaumekln
    Yes, CTranslate2 only requires Intel MKL for CPU translation. It seems to be exactly what you need.
    yutongli
    @yutongli
    Thanks!
    Soumya Chennabasavaraj
    @soumyacbr
    I have trained a Transformer model and am now doing inference, but the inference gets stuck after translating some 20 sentences. What could be the problem? Has anyone faced this? Plus, it does not even throw any error; it just gets stuck after translating the 20th sentence.
    Guillaume Klein
    @guillaumekln
    You should probably just let it run. The test file is reordered internally to increase efficiency.
    Soumya Chennabasavaraj
    @soumyacbr
    Yes, I left it running and it did finish eventually. Thanks
    Anna Samiotou
    @annasamt_twitter
    Hello, does OpenNMT-tf support protected sequences/placeholders, i.e. ⦅URL:http://www.opennmt.net⦆, as described in https://opennmt.net/OpenNMT/tools/tokenization/#special-characters? Provided that SP/BPE or unigram is deployed through the OpenNMT tokenizer. Thanks in advance
    Guillaume Klein
    @guillaumekln
    Hi, you would need to remove the value part (:http://www.opennmt.net in this example) before calling OpenNMT-tf. The remaining part ⦅URL⦆ will be treated like any other token during training/inference.
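    For example, a minimal sketch (the regex and helper name are illustrative, not part of OpenNMT) that strips the value part before the text is passed to OpenNMT-tf:
        import re

        def strip_placeholder_values(text):
            # "⦅URL:http://www.opennmt.net⦆" -> "⦅URL⦆"; keep the removed value aside
            # if you need to restore it after translation.
            return re.sub(r"⦅([^:⦆]+):[^⦆]*⦆", r"⦅\1⦆", text)

        print(strip_placeholder_values("Visit ⦅URL:http://www.opennmt.net⦆ today."))
        # Visit ⦅URL⦆ today.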
    Anna Samiotou
    @annasamt_twitter
    OK, thanks.
    Yunès
    @jbyunes
    Hi, I tried to tag (add word features to) both source and target languages with OpenNMT but didn't succeed. PyTorch just crashed at the translation step (training was OK), and the TensorFlow docs say that "target_inputter – A opennmt.inputters.Inputter to process the target data. Currently, only the opennmt.inputters.WordEmbedder is supported.", which means (to me) that we can't tag the target. How could I tag both source and target with OpenNMT? Is this possible? Can someone help me?
    Guillaume Klein
    @guillaumekln
    Hi, tags are not supported on the target side in OpenNMT-tf. In OpenNMT-py, there is an open pull request for this feature: OpenNMT/OpenNMT-py#1710
    Yunès
    @jbyunes
    Thanks @guillaumekln. In OpenNMT-py, what puzzled me is that the training was perfect (at least no error), but at the translation step the crash seems to be due to a non-uniform word added by the system at the end of the sentence (it looks like an end-of-sentence word that is, of course, not featured).
    Yunès
    @jbyunes
    @guillaumekln Is it nonsense to imagine tagging by adding a suffix to the original words, e.g. "go_VERB"? This would increase the vocabulary size, but does it make sense, or might some internals disturb the process?
    Guillaume Klein
    @guillaumekln
    As you pointed out, the biggest issue is the vocabulary, but if it fits, the model will manage to make sense of your inputs. Alternatively, if your sequences are not too long, you could simply merge the 2 streams, e.g.: John NNP goes VRB
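    For example, a minimal sketch of both options (helper names are illustrative):
        def merge_streams(tokens, tags):
            # ["John", "goes"], ["NNP", "VRB"] -> "John NNP goes VRB"
            return " ".join(t + " " + g for t, g in zip(tokens, tags))

        def suffix_tags(tokens, tags):
            # ["go"], ["VERB"] -> "go_VERB" (grows the vocabulary, as noted above)
            return " ".join(t + "_" + g for t, g in zip(tokens, tags))

        print(merge_streams(["John", "goes"], ["NNP", "VRB"]))  # John NNP goes VRB
        print(suffix_tags(["go"], ["VERB"]))                    # go_VERB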
    Yunès
    @jbyunes
    @guillaumekln Why not? I will try something like that. Interesting.
    alrudak
    @alrudak
    Is it possible to run one language pair (Docker container) on the first GPU and another language pair on the second GPU separately if the server has 2 GPUs?
    Guillaume Klein
    @guillaumekln
    Sure. Just run 2 separate training processes, each running on a different GPU. You can restrict the GPU visible to each process with this approach: https://opennmt.net/OpenNMT-tf/faq.html#how-can-i-restrict-the-tensorflow-runtime-to-specific-gpu
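    For example, a minimal sketch of the environment-variable approach (the variable must be set before TensorFlow is imported):
        import os

        # The first training process sees only GPU 0; launch the second process with "1".
        # An empty string would hide all GPUs and force CPU execution.
        os.environ["CUDA_VISIBLE_DEVICES"] = "0"

        import tensorflow as tf  # imported after the variable is set
        print(tf.config.list_physical_devices("GPU"))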
    alrudak
    @alrudak
    Thanks!
    Sirogha
    @Sirogha
    Hello. I am trying to use CTranslate2. Can you help me:
    translator.translate_batch([["▁H", "ello", "▁world", "!"]]) --- is this function (translate_batch) asynchronous, or do I need a parallel algorithm to use it?
    Guillaume Klein
    @guillaumekln
    Hi. This function is synchronous. To run multiple translations in parallel, you can set inter_threads to a larger value and call translate_batch from multiple Python threads.
    If you have more questions regarding CTranslate2, please post in https://gitter.im/OpenNMT/CTranslate2 or in the forum
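    For example, a minimal sketch (the model path and thread counts are placeholders, and the exact structure of the results depends on the CTranslate2 version):
        import concurrent.futures
        import ctranslate2

        # inter_threads > 1 allows several translate_batch calls to run concurrently.
        translator = ctranslate2.Translator("ende_ctranslate2/", device="cpu", inter_threads=2)

        batches = [
            [["▁H", "ello", "▁world", "!"]],
            [["▁H", "ow", "▁are", "▁you", "?"]],
        ]

        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
            results = list(executor.map(translator.translate_batch, batches))

        print(results)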
    Sirogha
    @Sirogha
    Sorry. Thanks!
    rrifaldiu
    @rrifaldiu
    Hello, I want to ask some questions about embeddings. How is the embedding from the WordEmbedder class trained? I looked at the source code but still don't understand. Also, is it possible to export the embedding from the trained WordEmbedder class into the pretrained word embedding format? Thanks in advance
    Edit: As additional information, I trained my model using the Transformer class
    Rahaf Alnajjar
    @MaryaAI
    Hi, where can I find simple source code for a web interface to serve an OpenNMT-tf model?
    Guillaume Klein
    @guillaumekln
    I'm not sure I understand. What are you looking for exactly?
    Rahaf Alnajjar
    @MaryaAI
    I want to deploy my model in a web interface (Are there any tutorials?)
    Guillaume Klein
    @guillaumekln
    There is no tutorial for a web interface. This is not in the scope of the project.
    yutongli
    @yutongli
    Hi @guillaumekln I have trained a good Transformer model, but I see some lengthy inference outputs. For example, the source query is 10 characters long, but the output generated by the model can be 200 characters, with some portions just repeating. Is there a way to control the length of the output generated by the Transformer model?
    (BTW, I had the sequence_length set to 80 during training.)
    Guillaume Klein
    @guillaumekln
    Hi. Look for maximum_decoding_length in the parameters: https://opennmt.net/OpenNMT-tf/configuration.html. Maybe length_penalty can also help.
    However, when this issue happens it usually means the input was unexpected for the model and the training data may lack this type of example.
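    For reference, a minimal sketch of where these would go in the run configuration, shown as the equivalent Python dict (the YAML file uses the same keys; the values are only examples):
        config = {
            "params": {
                "maximum_decoding_length": 80,  # upper bound on the number of generated tokens
                "length_penalty": 0.6,          # length penalty applied during beam search
            },
        }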
    yutongli
    @yutongli
    Hi @guillaumekln, thanks so much for your response, it's very helpful!! Just a couple of quick follow-up questions: 1. I think maximum_decoding_length and length_penalty are parameters affecting both training and inference, am I correct? 2. Does the 'length' here refer to the number of characters, the number of subwords, or the number of words?
    Guillaume Klein
    @guillaumekln
    1. They are inference only parameters. 2. The length is the number of tokens, as defined by your tokenization.
    yutongli
    @yutongli
    I see, thanks so much!
    @guillaumekln for another parameter 'maximum_features_length', does the length here also refer to the number of tokens, as defined by the tokenization?
    Guillaume Klein
    @guillaumekln
    Yes, all length parameters are defined in terms of number of tokens.
    yutongli
    @yutongli
    Thank you @guillaumekln !!
    yutongli
    @yutongli
    Hi @guillaumekln I've trained a Transformer model using OpenNMT-tf and have also run the infer command to do inference; all went well. Now I have a requirement to bring the model into a production environment that supports TensorFlow, so I exported the model to the TF SavedModel format (https://opennmt.net/OpenNMT-tf/serving.html). However, for the inference part, some beam or decoder related parameters may affect the inference latency, such as beam_width, n_best, and maximum_decoding_length. How can I incorporate these OpenNMT parameters, given an exported TF SavedModel package, in the TensorFlow production environment? OpenNMT inference refers to a config file where we specify those parameters; would a similar config file be needed for doing inference with a SavedModel on a machine where only TensorFlow is supported?