    Guillaume Klein
    @guillaumekln
    Is it the complete error log?
    Michael A. Martin
    @mmartin9684-sil
    Yes, it is.
    The only update in the Pipfile was the OpenNMT-tf package. The same error occurs with the most recent release (2.9.1), as well as with the prior 2.8.1 release.
    Guillaume Klein
    @guillaumekln
    The package mecab-python3 is a new dependency of sacrebleu. I see that they don't publish packages for Windows, which is likely why you are seeing an error. I will look into pinning sacrebleu to a previous version. In the meantime, you could try to manually install mecab-python3.
    Michael A. Martin
    @mmartin9684-sil
    Thank you for this feedback. It seems that sacrebleu 1.4.4 doesn't have the dependency on mecab-python3, so using that older release works around this issue. Many thanks!
    arunnambiar27
    @arunnambiar27
    Can anyone help with how to create this Python program, OpenNMT-tf/third_party/learn_bpe.py? I am new to OpenNMT and trying out the default module.
    Also, how can I give my own data to translate?
    Guillaume Klein
    @guillaumekln
    learn_bpe can be found here: https://github.com/rsennrich/subword-nmt/
    To use your own data, look at the quickstart and replace the downloaded files with your own: https://opennmt.net/OpenNMT-tf/quickstart.html
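    A minimal sketch of what that substitution could look like through the Python API (opennmt.Runner), with hypothetical file names; the same keys go under data: in the quickstart's YAML configuration:
    ```python
    import opennmt

    # Hypothetical paths: point these at your own corpus and vocabularies
    # instead of the files downloaded in the quickstart.
    config = {
        "model_dir": "run/",
        "data": {
            "train_features_file": "my-src-train.txt",
            "train_labels_file": "my-tgt-train.txt",
            "eval_features_file": "my-src-val.txt",
            "eval_labels_file": "my-tgt-val.txt",
            "source_vocabulary": "my-src-vocab.txt",
            "target_vocabulary": "my-tgt-vocab.txt",
        },
    }

    runner = opennmt.Runner(opennmt.models.TransformerBase(), config, auto_config=True)
    runner.train(with_eval=True)
    ```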
    arunnambiar27
    @arunnambiar27
    Thank you @guillaumekln. Can you specify which files have to be replaced?
    Also, how do I use the pre-trained English-German model provided in OpenNMT-tf?
    arunnambiar27
    @arunnambiar27
    How do I stop the training after some checkpoint?
    Guillaume Klein
    @guillaumekln
    Looks like you have lots of questions. I suggest that you open a topic on the forum so that it is easier to answer.
    yutongli
    @yutongli
    Hi @guillaumekln, could I ask a quick question about inference? I've trained a good Transformer model, and previously I ran inference (about 23 million data points; batch size 64) against the model; the job went smoothly, though it took maybe 8-10 hours. Now I am running inference on a larger data set (150 million data points), and the job got killed after running roughly 26 hours. Since inference just does sequential processing, why would a larger data set cause a crash after running for a longer time? Anything specific that I should pay attention to for inference?
    Guillaume Klein
    @guillaumekln
    The inference does not filter the data. So the first thing to check is whether you have very long sentences in your data, which can cause out-of-memory issues.
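    For example, a quick way to spot suspiciously long lines in a plain-text input file (a sketch, assuming whitespace tokenization and a hypothetical file name):
    ```python
    # Print the ten longest lines by token count; extreme outliers are the
    # usual suspects for out-of-memory errors at inference time.
    with open("input.txt", encoding="utf-8") as f:
        lengths = sorted(
            ((len(line.split()), lineno) for lineno, line in enumerate(f, 1)),
            reverse=True,
        )
    print(lengths[:10])
    ```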
    yutongli
    @yutongli
    @guillaumekln Thanks for your feedback. Actually I normalized/filtered the data before inference, so the number of characters in each data point is between 3 and 70, inclusive. I checked the CPU and memory usage and found that for the inference job, the CPU stays at 95-110% usage and memory is always around 50%. Only a single inference job runs on the node (the node has a 32-core CPU, 8 GPUs, and 128 GB of memory). Any clue?
    It seems that the inference is actually handled by the CPU only. Are we able to run inference with GPUs, like I did for the Transformer model training with all 8 GPUs on the node?
    yutongli
    @yutongli
    BTW, I also noticed that you mentioned in a closed topic that 'You could instead split your file and run separate inference processes to leverage multiple GPUs.' I split the large data set into 6 pieces, each containing about 25M entries, but starting the 2nd inference job on the same node threw an exception and failed. Is there anything I missed that I should specify to leverage multiple GPUs for inference?
    Renjith Sasidharan
    @renjithsasidharan
    Hello @guillaumekln, I wanted to ask about the effectiveness of the Transformer model on a small dataset (200K). I have been training a small Transformer model (1 layer, 512 dim, 4 heads). I am trying to extract the amount and date from OCR text from receipts, so the source sentences are very long (~500 words) and the target sentences are just one word. I have run the training for 100K iterations, but the loss seems very high (~1.5). Should I keep running it for longer? Is a Transformer model as effective as an RNN on a small dataset like mine?
    Guillaume Klein
    @guillaumekln
    @yutongli Most likely the inference is running on the GPU; otherwise you would see higher CPU usage. If you want to run multiple inference jobs, you should restrict the GPU visibility for each process with CUDA_VISIBLE_DEVICES (you'll find more info on Google). As for the original issue (the killed job), is the memory usage increasing?
    @renjithsasidharan Hi. I see that you posted the same question on the forum. Let's continue the discussion there.
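    To illustrate the CUDA_VISIBLE_DEVICES approach above, here is a hedged sketch of a launcher that runs one onmt-main inference process per GPU on pre-split input files (the config name and shard names are placeholders):
    ```python
    import os
    import subprocess

    # Hypothetical pre-split inputs, one shard per GPU.
    shards = ["input.00.txt", "input.01.txt", "input.02.txt"]

    procs = []
    for gpu, shard in enumerate(shards):
        # Each process only sees its own GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(subprocess.Popen(
            ["onmt-main", "--config", "config.yml", "--auto_config",
             "infer", "--features_file", shard,
             "--predictions_file", shard + ".out"],
            env=env,
        ))

    for p in procs:
        p.wait()
    ```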
    NL
    @nslatysheva
    Hey @guillaumekln, I'm interested in understanding translation errors made by trained models, specifically by (1) looking at attention weights from transformer heads and (2) finding training examples with similar hidden state vectors (I can compute similarity myself, just need to know how to access the raw numbers at different parts of the network). Any advice? :)
    Guillaume Klein
    @guillaumekln
    You probably need to dive into the model code and place print statements where you need them. Just remember that the model is executed in graph mode, so you need to use the TensorFlow print function: https://www.tensorflow.org/api_docs/python/tf/print
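    For instance (a toy sketch with a hypothetical function name, not the actual model code):
    ```python
    import tensorflow as tf

    @tf.function  # runs in graph mode, like the model code
    def inspect_attention(scores):
        weights = tf.nn.softmax(scores, axis=-1)
        # A regular print() would only fire once at tracing time;
        # tf.print executes every time the graph runs.
        tf.print("attention weights:", weights, summarize=10)
        return weights

    inspect_attention(tf.random.normal([2, 4, 4]))
    ```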
    NL
    @nslatysheva
    Thanks, will dive in :) Just curious, is there any overview/presentation/tutorial as an intro to the code structure?
    yutongli
    @yutongli
    @guillaumekln Thanks for getting back to me. I monitored the CPU and memory usage for the inference job for some time; the CPU is around 150% and memory is about 10%. Does this mean the job is running on the GPU? How high would the CPU usage be if the job were running on the CPU?
    Guillaume Klein
    @guillaumekln
    You can use nvidia-smi to see processes running on the GPU. If it was running on the CPU, I think TensorFlow would be using all CPU cores by default.
    @nslatysheva There is no such tutorial, but the code is not that big.
    yutongli
    @yutongli
    @guillaumekln Thanks very much! After some research, I managed to make the inference job run only on CPUs by controlling the GPU visibility via the NVIDIA CUDA environment variable. (Now the CPU usage shows ~2700%, rather than 150% previously, and the GPU usage remains at 0% per monitoring.) However, the inference output (predictions) does not seem to be dumped gradually and incrementally. It seems that the job keeps working hard behind the scenes, holding output in memory for a very long time, without dumping at a regular pace. (Per my observation, the regular dumping happens in the last 2 hours before the job completes, given that the entire job takes about 30 hours.) Can we specify any parameters to control the dumping during inference? If so, would that speed up the entire process?
    Guillaume Klein
    @guillaumekln
    You can control this behavior, but disabling it will actually make the overall decoding slower. See the parameter infer > length_bucket_width in https://opennmt.net/OpenNMT-tf/configuration.html. It is set to 5 with auto_config, but you can disable it with 0.
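    In YAML that is the infer > length_bucket_width entry; a hedged sketch of the same override through the Python API (paths are placeholders, and the model directory must contain a trained checkpoint):
    ```python
    import opennmt

    config = {
        "model_dir": "run/",
        "data": {
            "source_vocabulary": "src-vocab.txt",
            "target_vocabulary": "tgt-vocab.txt",
        },
        "infer": {
            # auto_config sets 5: inputs are bucketed by length and reordered,
            # which is faster but delays writing. 0 disables bucketing so
            # predictions are written in input order as they are produced.
            "length_bucket_width": 0,
        },
    }

    runner = opennmt.Runner(opennmt.models.TransformerBase(), config, auto_config=True)
    runner.infer("input.txt", predictions_file="output.txt")
    ```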
    Sirogha
    @Sirogha
    Hello. I'm trying to train an en-ru model with SentencePiece.
    When I finished building the vocab in BPE mode, I didn't find the letter Z. It's strange, because this letter appears more than 2 million times in my source. How can that be?
    Memduh Gökırmak
    @MemduhG
    I'm getting this error when I try to run onmt-build-vocab:
    AttributeError: module 'tensorflow_core._api.v2.random' has no attribute 'Generator'
    Guillaume Klein
    @guillaumekln
    @MemduhG What TensorFlow version do you have installed?
    @Sirogha How did you look for the letter in the vocabulary?
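    One way to check (a sketch, assuming a plain-text vocabulary with one token per line): note that BPE stores subword units, so the letter may only occur inside longer merges rather than as a standalone entry.
    ```python
    # Count vocabulary entries containing the letter "Z"; with BPE the letter
    # may appear only inside longer subword units, not as its own token.
    with open("vocab.txt", encoding="utf-8") as f:
        hits = [tok for tok in (line.rstrip("\n") for line in f) if "Z" in tok]
    print(len(hits), hits[:20])
    ```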
    yutongli
    @yutongli
    @guillaumekln I have trained a Transformer model using OpenNMT-tf and want to serve it in production for real-time inference. I am considering https://github.com/OpenNMT/CTranslate2; is Intel MKL the minimum requirement for building CTranslate2? If so, and we end up not being able to use CTranslate2 in the production environment, could you please advise on alternatives? All I want is to bring the trained Transformer model into production, so anything that improves real-time inference will be very helpful! Thanks
    yutongli
    @yutongli
    BTW, the production environment is C++.
    Guillaume Klein
    @guillaumekln
    Yes, CTranslate2 only requires Intel MKL for CPU translation. It seems to be exactly what you need.
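    A minimal sketch of serving such a model with CTranslate2's Python API, assuming the checkpoint has already been converted with CTranslate2's OpenNMT-tf converter (the model directory and tokens are placeholders; the C++ API mirrors this):
    ```python
    import ctranslate2

    # Hypothetical path to a model converted with the OpenNMT-tf converter.
    translator = ctranslate2.Translator("ende_ctranslate2/")

    # Input is pre-tokenized, here with illustrative SentencePiece pieces.
    results = translator.translate_batch([["▁Hello", "▁world"]])
    # Older releases return dicts; newer ones return result objects.
    print(results[0][0]["tokens"])
    ```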
    yutongli
    @yutongli
    Thanks!
    Soumya Chennabasavaraj
    @soumyacbr
    I have trained a Transformer model and now I'm doing inference, but the inference gets stuck after translating some 20 sentences. What could be the problem? Has anyone faced this? Plus, it does not even throw an error; it just gets stuck after translating the 20th sentence.
    Guillaume Klein
    @guillaumekln
    You should probably just let it run. The test file is reordered internally to increase efficiency.
    Soumya Chennabasavaraj
    @soumyacbr
    Yes, I left it to run and it finally completed. Thanks
    Anna Samiotou
    @annasamt_twitter
    Hello, does OpenNMT-tf support protected sequences/placeholders, i.e. ⦅URL:http://www.opennmt.net⦆, as described in https://opennmt.net/OpenNMT/tools/tokenization/#special-characters? Provided that SP/BPE or unigram is deployed through the OpenNMT tokenizer. Thanks in advance
    Guillaume Klein
    @guillaumekln
    Hi, you would need to remove the value part (:http://www.opennmt.net in this example) before calling OpenNMT-tf. The remaining part ⦅URL⦆ will be treated as any other tokens during training/inference.
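    A small sketch of that preprocessing step (the regex and function name are illustrative):
    ```python
    import re

    # Replace ⦅NAME:value⦆ with ⦅NAME⦆ so OpenNMT-tf only sees the protected token.
    _PLACEHOLDER = re.compile(r"⦅(\w+):[^⦆]*⦆")

    def strip_placeholder_values(line):
        return _PLACEHOLDER.sub(lambda m: "⦅{}⦆".format(m.group(1)), line)

    print(strip_placeholder_values("Visit ⦅URL:http://www.opennmt.net⦆ today"))
    # -> Visit ⦅URL⦆ today
    ```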
    Anna Samiotou
    @annasamt_twitter
    OK, thanks.
    Yunès
    @jbyunes
    Hi, I tried to tag (with features) both the source and target languages with OpenNMT but didn't succeed. The PyTorch version just crashed at the translation step (training was OK), and the TensorFlow docs say that "target_inputter – A opennmt.inputters.Inputter to process the target data. Currently, only the opennmt.inputters.WordEmbedder is supported.", which means (to me) that we can't tag the target. How could I tag both source and target with OpenNMT? Is this possible? Can someone help me?
    Guillaume Klein
    @guillaumekln
    Hi, tags are not supported on the target side in OpenNMT-tf. In OpenNMT-py, there is an open pull request for this feature: OpenNMT/OpenNMT-py#1710
    Yunès
    @jbyunes
    Thanks @guillaumekln. In OpenNMT-py, what puzzled me is that the training was perfect (at least no error), but at the translation step the crash seems to be due to a word the system adds at the end of the sentence (it looks like an end-of-sentence token, which is of course not featured).
    Yunès
    @jbyunes
    @guillaumekln Is it nonsense to imagine tagging by adding a suffix to the original words, e.g. "go_VERB"? This would increase the vocab size, but does it make sense, or might some internals disturb the process?
    Guillaume Klein
    @guillaumekln
    As you pointed out, the biggest issue is the vocabulary, but if it fits, the model will manage to make sense of your inputs. Alternatively, if your sequences are not too long, you could simply merge the 2 streams, e.g.: John NNP goes VRB
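    A toy sketch of that merging idea (the function name is illustrative):
    ```python
    # Merge a token stream and a tag stream into one source sequence,
    # e.g. ["John", "goes"] + ["NNP", "VRB"] -> "John NNP goes VRB".
    def interleave(tokens, tags):
        if len(tokens) != len(tags):
            raise ValueError("one tag per token is required")
        merged = []
        for token, tag in zip(tokens, tags):
            merged.extend((token, tag))
        return " ".join(merged)

    print(interleave(["John", "goes"], ["NNP", "VRB"]))
    ```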
    Yunès
    @jbyunes
    @guillaumekln Why not? I will try something like that. Interesting.
    alrudak
    @alrudak
    Is it possible to run one language pair (Docker container) on the first GPU and another language pair on the second GPU, if the server has 2 GPUs?
    Guillaume Klein
    @guillaumekln
    Sure. Just run 2 separate training processes, each on a different GPU. You can restrict the GPUs visible to each process with this approach: https://opennmt.net/OpenNMT-tf/faq.html#how-can-i-restrict-the-tensorflow-runtime-to-specific-gpu
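    For example (a sketch: in each container or process, set the variable before TensorFlow initializes, e.g. via the container's environment):
    ```python
    import os

    # First process/container sees only GPU 0; use "1" in the second one.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    ```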
    alrudak
    @alrudak
    Thanks!