    Anna Samiotou
    @annasamt
    @guillaumekln Ah, OK, good point! I will try the multiple config option for inference. Thanks!
    Rahaf Alnajjar
    @MaryaAI
    Hi,
    I used to run OpenNMT-tf on GPU and now I want to run it on CPU. Are there any restrictions on that? Does it work on CPU?
    Guillaume Klein
    @guillaumekln
    Hi, yes it works on CPU. It is just slower.
    Anna Samiotou
    @annasamt
    Hi, to run OpenNMT-tf inference on CPU only, do we use the model as-is, or is there a release option as in the Lua version? Or is it enough to specify --num_gpus 0 in the inference command?
    Guillaume Klein
    @guillaumekln
    If there is no GPU on the system, it will automatically use the CPU. There is no specific option.
    Anna Samiotou
    @annasamt
    So if the system does have at least one GPU there is no way to use CPU-only for the inference, right?
    Guillaume Klein
    @guillaumekln
    You can set the environment variable CUDA_VISIBLE_DEVICES, e.g. CUDA_VISIBLE_DEVICES= onmt-main ...
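    A minimal sketch of that environment-variable trick (the onmt-main arguments in the comment are illustrative):

    ```shell
    # Setting CUDA_VISIBLE_DEVICES to an empty value for a single command hides
    # every GPU from that process, so TensorFlow falls back to the CPU, e.g.:
    #   CUDA_VISIBLE_DEVICES= onmt-main infer --config data.yml ...
    # The mechanism itself, shown with a plain Python child process:
    CUDA_VISIBLE_DEVICES= python3 -c 'import os; print(repr(os.environ.get("CUDA_VISIBLE_DEVICES")))'   # prints ''
    ```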
    Anna Samiotou
    @annasamt
    OK, I'll try using this env var without a value this time. Thx
    Also, if an OpenNMT-tf v2.x model is trained on GPU, can inference still be run on CPU only?
    Guillaume Klein
    @guillaumekln
    Yes, checkpoints do not include any information regarding the training device.
    Anna Samiotou
    @annasamt
    OK great
    [screenshot of the log message]
    I have a different question: while training with OpenNMT-tf v2.x, I get a message in the log that does not hinder the training, but I wonder about its meaning/consequences. Any idea?
    I read that this is probably because the number of iterations performed on the dataset is greater than the number of batches in the dataset. Is this a problem, or should I ignore the message?
    Guillaume Klein
    @guillaumekln
    You should ignore this message. It just means that we finished iterating on the dataset.
    Anna Samiotou
    @annasamt
    Thx
    newger
    @newger18
    Hi @guillaumekln, I have the same problem that @emesha92 mentioned. GPU utilization is very low, most of the time 0%. sample_buffer_size is 500000, the GPU count is 8, and gradients are accumulated over 2 iterations to reach an effective batch size of 25000. I wonder if the data preparation is too slow?
    Rahaf Alnajjar
    @MaryaAI
    @guillaumekln thanks
    Rahaf Alnajjar
    @MaryaAI
    I have another question: I want to save the translation output to a txt file, but the documentation says "The predictions will be printed on the standard output," and on Google Colab the output is printed as cell output. Any suggestion for that issue?
    Guillaume Klein
    @guillaumekln
    @MaryaAI See the command line option --predictions_file. Alternatively you can also redirect the standard output to a file with >.
    @newger18 This sounds odd. Are you training a Transformer model? What is the reported "words per second"?
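    The two options above, sketched with hypothetical file names (src.txt, out.txt); echo stands in for onmt-main so the redirection itself is runnable:

    ```shell
    # 1) Built-in option: write predictions directly to a file.
    #    onmt-main infer --config data.yml --features_file src.txt --predictions_file out.txt
    # 2) Shell redirection: anything printed on standard output lands in the file.
    #    onmt-main infer --config data.yml --features_file src.txt > out.txt
    # The redirection mechanism, demonstrated with echo in place of onmt-main:
    echo "a translated sentence" > out.txt
    cat out.txt   # prints: a translated sentence
    ```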
    Rahaf Alnajjar
    @MaryaAI
    Thanks so much! @guillaumekln
    Last question: if I want to change the configuration, can I put it all in data.yml? Like this:
    "!onmt-main --model_type Transformer --config data.yml --auto_config train --with_eval"
    In this way, will it take the auto config or the configs in my data.yml file?
    Guillaume Klein
    @guillaumekln
    It will take both, but the values defined in the user YAML configuration take priority over the automatic configuration.
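    For instance, a user data.yml like the sketch below (file names hypothetical) is merged with the automatic Transformer configuration; the explicitly set train values win, and everything else keeps its auto-config default:

    ```yaml
    model_dir: run/
    data:
      train_features_file: src-train.txt
      train_labels_file: tgt-train.txt
      source_vocabulary: src-vocab.txt
      target_vocabulary: tgt-vocab.txt
    train:
      batch_size: 4096        # overrides the auto-config batch size
      save_checkpoints_steps: 1000
    ```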
    Rahaf Alnajjar
    @MaryaAI
    Thank you for your guidance
    sergei-from-vironit
    @sergei-from-vironit
    Hello, I'm trying to train a TransformerBig model and get this warning: WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 5008 vs previous value: 5008. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
    What could it be?
    Guillaume Klein
    @guillaumekln
    This is just a warning caused by gradient accumulation (TensorFlow thinks we forgot to increase the training step). I suggest upgrading to V2.
    Milad Nozari
    @mnvoh
    Hello guys. Has anyone successfully used a trained OpenNMT model in Go?
    Milad Nozari
    @mnvoh
    OK, so I resolved most of my issues and even got output, but here's the thing: apparently TensorFlow's Go bindings don't implement GatherTree, which means I have to set beam_width to 1 when exporting my model. The problem is that the output with beam_width set to 1 is utter nonsense compared to beam_width set to, say, 5.
    Any suggestions/tips/resources would be appreciated. Thanks
    Guillaume Klein
    @guillaumekln
    Yeah that's a limitation. I think the TensorFlow team is working on enabling dynamic loading of custom ops (such as GatherTree). In the meantime, you should get usable output with greedy search by training longer/on more data.
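    For reference, beam size is a decoding parameter in the YAML configuration, so an export intended for the Go bindings could pin it to greedy search; a sketch, assuming the usual params section:

    ```yaml
    params:
      beam_width: 1   # greedy search: avoids the GatherTree op in the exported graph
    ```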
    Milad Nozari
    @mnvoh
    @guillaumekln Thanks for the answer, man. Actually I'm only at 10k steps of training, so it's a relief to hear that training longer makes it better, because I was considering implementing GatherTree myself, which is not only beyond my expertise; I was also afraid it wasn't the only operation missing from the Go bindings.
    sergei-from-vironit
    @sergei-from-vironit
    Hello. I trained a model with OpenNMT-tf 1.24.0 and tried to serve it with tensorflow/serving:2.1.0-rc1-gpu, but I get this kind of error. What could it be?
    2020-01-08 09:57:52.021820: E external/org_tensorflow/tensorflow/core/grappler/optimizers/meta_optimizer.cc:561] function_optimizer failed: Not found: Op type not registered 'Addons>GatherTree' in binary running on model-serve-dev-fast-658d8d96b9-vqmdt. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
    2020-01-08 09:57:52.132840: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at partitioned_function_ops.cc:113 : Not found: Op type not registered 'Addons>GatherTree' in binary running on model-serve-dev-fast-658d8d96b9-vqmdt. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed
    Guillaume Klein
    @guillaumekln
    Hi. So you exported your model using OpenNMT-tf 2.x? If yes, you should use the custom serving image opennmt/tensorflow-serving:2.0.0-gpu, which includes the additional ops. See here for more info: https://github.com/OpenNMT/OpenNMT-tf/tree/master/examples/serving/tensorflow_serving#custom-tensorflow-serving-image
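    A typical invocation of that custom image, sketched with hypothetical paths and model name (the flags mirror standard TensorFlow Serving usage; a GPU-enabled Docker runtime is assumed):

    ```shell
    # Serve an exported OpenNMT-tf 2.x model with the custom image that
    # bundles the Addons>GatherTree op. Paths and model name are examples.
    docker run --rm --gpus all -p 8500:8500 -p 8501:8501 \
      -v "$PWD/export:/models/ende" \
      opennmt/tensorflow-serving:2.0.0-gpu \
      --port=8500 --rest_api_port=8501 \
      --model_name=ende --model_base_path=/models/ende
    ```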
    sergei-from-vironit
    @sergei-from-vironit
    Hm,
    but I did not set beam_width.
    Why should I create a custom serving image?
    Guillaume Klein
    @guillaumekln
    The StackOverflow answer provides 2 solutions. Which one did you choose?
    sergei-from-vironit
    @sergei-from-vironit
    I chose option 2.
    Now I've tried your tensorflow/serving image and it works.
    Thanks!
    emesha92
    @emesha92

    Hi,. So you exported your model using OpenNMT-tf 2.x? If yes, you should use a custom serving image opennmt/tensorflow-serving:2.0.0-gpu which includes additional op. See here for more info: https://github.com/OpenNMT/OpenNMT-tf/tree/master/examples/serving/tensorflow_serving#custom-tensorflow-serving-image

    Is OpenNMT's TF Serving image an optimized build or not?

    Guillaume Klein
    @guillaumekln
    It uses the same build flags as the tensorflow/serving images.
    emesha92
    @emesha92
    ok noted, thanks @guillaumekln
    sergei-from-vironit
    @sergei-from-vironit
    Hello all. Does anybody know why my training task is stuck?
    [screenshot of the training log]
    It is stuck on this line: "If using Keras pass *_constraint arguments to layers."
    Guillaume Klein
    @guillaumekln
    Hi! The training is running on CPU. The latest OpenNMT-tf version uses TensorFlow 2.1 which requires CUDA 10.1.
    sergei-from-vironit
    @sergei-from-vironit
    And when I try to start a training task on 2 video cards (RTX 2080), my PC reboots as soon as OpenNMT-tf starts the "init data" phase.