    Guillaume Klein
    @guillaumekln
    @newger18 This sounds odd. Are you training a Transformer model? What is the reported "words per second"?
    Rahaf Alnajjar
    @MaryaAI
    Thanks so much! @guillaumekln
    One last question: if I want to change the configuration, can I put everything in data.yml? Like this:
    "onmt-main --model_type Transformer --config data.yml --auto_config train --with_eval"
    Will it take the auto config or the configs in my data.yml file?
    Guillaume Klein
    @guillaumekln
    It will take both, but values defined in the user YAML configuration take priority over the automatic configuration.
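    For illustration, a data.yml along these lines should work (the file names are placeholders for your own data; the override values are examples, not recommendations):

    ```yaml
    # Hypothetical data.yml: data paths plus a few user overrides.
    # Anything set here takes priority over the --auto_config defaults.
    data:
      train_features_file: src-train.txt
      train_labels_file: tgt-train.txt
      eval_features_file: src-val.txt
      eval_labels_file: tgt-val.txt

    params:
      learning_rate: 2.0   # user override, wins over auto_config

    train:
      batch_size: 3072     # user override, wins over auto_config
    ```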
    Rahaf Alnajjar
    @MaryaAI
    Thank you for your guidance
    sergei-from-vironit
    @sergei-from-vironit
    Hello, I'm trying to train a TransformerBig model and get this warning: WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 5008 vs previous value: 5008. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
    What could it be?
    Guillaume Klein
    @guillaumekln
    This is just a warning caused by gradient accumulation (TensorFlow thinks we forgot to increase the training step). I suggest upgrading to V2.
    Milad Nozari
    @mnvoh
    Hello guys. Has anyone successfully used a trained OpenNMT model in Go?
    Milad Nozari
    @mnvoh
    OK, so I resolved most of my issues and even got output, but here's the thing: apparently TF's Go bindings don't support/implement GatherTree, which means I have to set beam_width to 1 when exporting my model. The problem is that the result with beam_width set to 1 is utter nonsense compared to when beam_width is set to, say, 5.
    Any suggestions/tips/resources would be appreciated. Thanks
    Guillaume Klein
    @guillaumekln
    Yeah that's a limitation. I think the TensorFlow team is working on enabling dynamic loading of custom ops (such as GatherTree). In the meantime, you should get usable output with greedy search by training longer/on more data.
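    For reference, forcing greedy decoding in the exported graph can be done from the user configuration; a minimal fragment, assuming the standard params section of an OpenNMT-tf YAML file:

    ```yaml
    # Illustrative fragment: decode greedily so the exported graph
    # never needs the GatherTree op (which the Go bindings lack).
    params:
      beam_width: 1
    ```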
    Milad Nozari
    @mnvoh
    @guillaumekln Thanks for the answer, man. Actually I'm at 10k steps of training, but hearing that training longer makes it better was a relief, because I was considering implementing GatherTree myself, which is not only beyond my expertise; I was also afraid it wasn't the only operation missing from the Go bindings.
    sergei-from-vironit
    @sergei-from-vironit
    Hello. I trained a model with OpenNMT-tf 1.24.0 and am trying to serve it with tensorflow/serving:2.1.0-rc1-gpu, but I get the error below. What could it be?
    2020-01-08 09:57:52.021820: E external/org_tensorflow/tensorflow/core/grappler/optimizers/meta_optimizer.cc:561] function_optimizer failed: Not found: Op type not registered 'Addons>GatherTree' in binary running on model-serve-dev-fast-658d8d96b9-vqmdt. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
    2020-01-08 09:57:52.132840: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at partitioned_function_ops.cc:113 : Not found: Op type not registered 'Addons>GatherTree' in binary running on model-serve-dev-fast-658d8d96b9-vqmdt. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed
    Guillaume Klein
    @guillaumekln
    Hi. So you exported your model using OpenNMT-tf 2.x? If yes, you should use the custom serving image opennmt/tensorflow-serving:2.0.0-gpu, which includes the additional op. See here for more info: https://github.com/OpenNMT/OpenNMT-tf/tree/master/examples/serving/tensorflow_serving#custom-tensorflow-serving-image
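    An invocation of that image might look roughly like this (model path, name, and ports are placeholders for your setup; the linked README has the canonical command):

    ```shell
    # Run the custom image that bundles the Addons>GatherTree op.
    # Adjust the volume mount and model name to your exported model.
    docker run -t --rm --gpus all -p 9000:9000 \
      -v $PWD/export:/models/ende \
      opennmt/tensorflow-serving:2.0.0-gpu \
      --port=9000 --model_base_path=/models/ende --model_name=ende
    ```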
    sergei-from-vironit
    @sergei-from-vironit
    hm
    but I did not set beam_width.
    Why should I use a custom serving API?
    Guillaume Klein
    @guillaumekln
    The StackOverflow answer provides 2 solutions. Which one did you choose?
    sergei-from-vironit
    @sergei-from-vironit
    I chose solution 2.
    Now I'm trying your tensorflow/serving image and it works.
    Thanks!
    emesha92
    @emesha92

    Hi. So you exported your model using OpenNMT-tf 2.x? If yes, you should use the custom serving image opennmt/tensorflow-serving:2.0.0-gpu, which includes the additional op. See here for more info: https://github.com/OpenNMT/OpenNMT-tf/tree/master/examples/serving/tensorflow_serving#custom-tensorflow-serving-image

    Is OpenNMT's TF Serving image built with the optimized flags or not?

    Guillaume Klein
    @guillaumekln
    It uses the same build flags as the tensorflow/serving images.
    emesha92
    @emesha92
    ok noted, thanks @guillaumekln
    sergei-from-vironit
    @sergei-from-vironit
    Hello all. Does anybody know why my training task is stuck?
    ...it is stuck on this line: "If using Keras pass *_constraint arguments to layers."
    Guillaume Klein
    @guillaumekln
    Hi! The training is running on CPU. The latest OpenNMT-tf version uses TensorFlow 2.1 which requires CUDA 10.1.
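    As a quick reference, the TF-release-to-CUDA-toolkit pairing can be captured in a tiny lookup; a sketch with values taken from TensorFlow's tested build configurations as I recall them (verify against the official table before relying on it):

    ```python
    # Minimal lookup of the CUDA toolkit version each TensorFlow 2.x
    # release was built against (assumption: values from TF's tested
    # build configurations; check the official compatibility table).
    TF_CUDA = {
        "2.0": "10.0",
        "2.1": "10.1",
        "2.2": "10.1",
        "2.3": "10.1",
        "2.4": "11.0",
    }

    def required_cuda(tf_version: str) -> str:
        # Match on the major.minor prefix of the version string.
        major_minor = ".".join(tf_version.split(".")[:2])
        return TF_CUDA[major_minor]

    print(required_cuda("2.1.0"))  # -> 10.1
    ```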
    sergei-from-vironit
    @sergei-from-vironit
    And I tried to start a training task on 2 video cards (RTX 2080), but when OpenNMT-tf starts the "init data" phase, my PC reboots.
    hm
    maybe you are right.
    But I start the training task with this command: "CUDA_VISIBLE_DEVICES=0 onmt-main --model_type TransformerBig --config config.yml --auto_config train --num_gpus 1"
    OK, I see. It doesn't load the CUDA library. Thanks ^)
    sergei-from-vironit
    @sergei-from-vironit
    So strange. My PC reboots during the checkpoint saving phase on 2 video cards. Any ideas?
    Guillaume Klein
    @guillaumekln
    I remember this could happen when the power supply is not powerful enough.
    sergei-from-vironit
    @sergei-from-vironit
    Can I limit the GPU load?
    sergei-from-vironit
    @sergei-from-vironit
    I ran 2 tasks on the cards (1 task per card), and there is no restart.
    Maybe the problem is with running tasks in parallel?
    Guillaume Klein
    @guillaumekln
    Apparently you can limit the maximum power used by the cards with nvidia-smi -pl (I never used that, so I can't advise more).
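    For what it's worth, an untested sketch of that command (the 200 W value is just an example; stay within the limits your card reports):

    ```shell
    # Cap each card's power draw in watts; -i selects the GPU index.
    # Check the supported range first with: nvidia-smi -q -d POWER
    sudo nvidia-smi -i 0 -pl 200
    sudo nvidia-smi -i 1 -pl 200
    ```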
    sergei-from-vironit
    @sergei-from-vironit
    Ok. Thanks, I'll try it
    sergei-from-vironit
    @sergei-from-vironit
    Hello. I'm trying to run two tests: one with a 1M-sentence dataset and one with a 2M-sentence dataset. Do I have to change the configuration? (maybe more steps if I use the larger dataset)
    Guillaume Klein
    @guillaumekln
    Hi, if you want to compare the 2 results you probably want to keep the same configuration and the same number of training steps.
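    Concretely, pinning the step count in the YAML keeps the comparison fair; a sketch, assuming the OpenNMT-tf 2.x key names (the values are examples, not recommendations):

    ```yaml
    # Same training budget for both runs, regardless of dataset size:
    train:
      max_step: 500000   # fixed step count reused for both datasets
      batch_size: 3072
    ```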
    sergei-from-vironit
    @sergei-from-vironit
    I did this test, and the results are very different.
    Maybe one dataset is worse than the other.
    But maybe we need to change some parameters )
    Guillaume Klein
    @guillaumekln
    Better results with more data? That looks expected.