    sergei-from-vironit
    @sergei-from-vironit
    Now I tried your tensorflow/serving-api and it works fine.
    Thanks!
    emesha92
    @emesha92

    Hi. So you exported your model using OpenNMT-tf 2.x? If yes, you should use the custom serving image opennmt/tensorflow-serving:2.0.0-gpu which includes an additional op. See here for more info: https://github.com/OpenNMT/OpenNMT-tf/tree/master/examples/serving/tensorflow_serving#custom-tensorflow-serving-image
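
    For reference, a minimal way to start that image, assuming it follows the standard tensorflow/serving container conventions (the export path and model name below are placeholders; the GPU image also needs the NVIDIA container toolkit installed):

        docker run --rm --gpus all -p 8501:8501 \
          -v $PWD/export:/models/mymodel \
          -e MODEL_NAME=mymodel \
          opennmt/tensorflow-serving:2.0.0-gpu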

    Is OpenNMT's TensorFlow Serving image built using an optimized build or not?

    Guillaume Klein
    @guillaumekln
    It uses the same build flags as the tensorflow/serving images.
    emesha92
    @emesha92
    ok noted, thanks @guillaumekln
    sergei-from-vironit
    @sergei-from-vironit
    Hello all. Does anybody know why the training task is stuck?
    image.png
    ...it is stuck at this line: "If using Keras pass *_constraint arguments to layers."
    Guillaume Klein
    @guillaumekln
    Hi! The training is running on CPU. The latest OpenNMT-tf version uses TensorFlow 2.1 which requires CUDA 10.1.
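    A quick way to check whether TensorFlow actually sees the GPUs (just a sanity check; an empty list means it fell back to CPU):

        python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"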
    sergei-from-vironit
    @sergei-from-vironit
    And I tried to start a training task on 2 video cards (RTX 2080), but when OpenNMT-tf starts the "init data" phase, my PC reboots.
    image.png
    Hm, maybe you are right.
    But I started the training task with this command:

        CUDA_VISIBLE_DEVICES=0 onmt-main --model_type TransformerBig --config config.yml --auto_config train --num_gpus 1

    Ok, I see. It doesn't load the CUDA library. Thanks ^)
    sergei-from-vironit
    @sergei-from-vironit
    image.png
    So strange. My PC reboots during the checkpoint saving phase when training on 2 video cards. Any ideas?
    Guillaume Klein
    @guillaumekln
    I remember this could happen when the power supply is not powerful enough.
    sergei-from-vironit
    @sergei-from-vironit
    Can I limit the GPU load?
    sergei-from-vironit
    @sergei-from-vironit
    I ran 2 separate tasks on the cards (1 task per card), and there was no restart.
    image.png
    Maybe there is some problem with parallel tasks?
    Guillaume Klein
    @guillaumekln
    Apparently you can try limiting the maximum power used by the cards with nvidia-smi -pl (I never used that, so I can't advise more).
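    For example (the 200 W value is only an illustration; check the supported range for your cards first):

        nvidia-smi -q -d POWER        # show current, default and max power limits
        sudo nvidia-smi -i 0 -pl 200  # cap GPU 0 at 200 W
        sudo nvidia-smi -i 1 -pl 200  # cap GPU 1 at 200 W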
    sergei-from-vironit
    @sergei-from-vironit
    Ok, thanks, I'll try it.
    sergei-from-vironit
    @sergei-from-vironit
    Hello. I am trying to do two tests: one with a 1M dataset and one with a 2M dataset. Do I have to change the configuration? (Maybe more steps if I use the larger dataset?)
    Guillaume Klein
    @guillaumekln
    Hi, if you want to compare the 2 results you probably want to keep the same configuration and the same number of training steps.
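    A minimal sketch of the relevant part of config.yml, assuming OpenNMT-tf 2.x options (the file names and step count are placeholders; with --auto_config a default max_step is applied unless you override it):

        model_dir: run_1M
        data:
          source_vocabulary: src-vocab.txt
          target_vocabulary: tgt-vocab.txt
          train_features_file: src-train.1M.txt
          train_labels_file: tgt-train.1M.txt
        train:
          max_step: 500000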
    sergei-from-vironit
    @sergei-from-vironit
    I did this test, and the results are very different.
    Maybe one dataset is worse than the other.
    But maybe we need to change some parameters )
    Guillaume Klein
    @guillaumekln
    Better result with more data? This looks expected.
    sergei-from-vironit
    @sergei-from-vironit
    No, a worse result with the larger dataset.
    Anna Samiotou
    @annasamt
    image.png
    Hello, I have trained OpenNMT-tf models with version 2.1.1 in a Linux/Ubuntu 18.04 GPU setup. I have now installed the latest version 2.6.0 on a new VM which is CPU-only. In principle, is it possible to run inference on the above-mentioned GPU-trained models from this CPU-only machine? Btw, when running inference I do get a couple of error messages regarding missing libraries (please see screenshot), but I am not sure whether I should ignore them if I only use CPU.
    Guillaume Klein
    @guillaumekln
    From the logs, it seems you are loading a checkpoint trained with version 1.x and not 2.1.1. Is that expected?
    (Yes, you should ignore warnings about missing NVIDIA libraries when running on CPU)
    Anna Samiotou
    @annasamt
    No, the models were trained with v2.x for sure.
    Guillaume Klein
    @guillaumekln
    What is the content of the model_dir directory that you defined in the configuration?
    Anna Samiotou
    @annasamt
    image.png
    Is this what you mean, or the contents of data.yml?
    Guillaume Klein
    @guillaumekln
    Yeah that is what I wanted to see. Do you also have a file named checkpoint? If so, what is its content?
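    (That file is plain text; it typically looks something like this, with the step number matching your run:

        model_checkpoint_path: "ckpt-27000"
        all_model_checkpoint_paths: "ckpt-27000"
    )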
    Anna Samiotou
    @annasamt
    image.png
    Yes, I do. But this one refers to the averaged checkpoint at 27K steps (i.e. the 20K entry was overwritten by this one). But I decided to use the 20K checkpoint for the inference - is this OK?
    Guillaume Klein
    @guillaumekln
    That could be an issue, as we try to load the last checkpoint by default. However, this does not match the error log. Could you try using the --checkpoint_path command line option and point it to the 20k checkpoint?
    Anna Samiotou
    @annasamt
    I think that this is exactly what I did - please look at the command in "script-NMT-OpenNMT-tf.sh":

        onmt-main --config $1/data.yml --auto_config --model_type Transformer --checkpoint_path $1/model.ckpt-20000 infer --features_file $2/s.txt --predictions_file $2/t.txt
    Anna Samiotou
    @annasamt
    Is OpenNMT-tf v2.6.0 backward compatible with v2.2.1? Perhaps I should install the latter in the VM instead? Perhaps the failure regarding the TensorSliceReader constructor will then disappear?
    Anna Samiotou
    @annasamt
    Well, in https://github.com/OpenNMT/OpenNMT-tf I see that "The project is production-oriented and comes with backward compatibility guarantees.", so please ignore my previous message.
    Guillaume Klein
    @guillaumekln
    You need to set ckpt-20000, not model.ckpt-20000 on the command line.
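    i.e. adapting the command from the script above:

        onmt-main --config $1/data.yml --auto_config --model_type Transformer \
          --checkpoint_path $1/ckpt-20000 \
          infer --features_file $2/s.txt --predictions_file $2/t.txt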
    Anna Samiotou
    @annasamt
    Seems to work now! Thanks for the tip! So did this change with 2.x or with the latest 2.6.0?
    Guillaume Klein
    @guillaumekln
    The model prefix changed when moving from 1.x to 2.x.
    Anna Samiotou
    @annasamt
    OK, it was a left-over from the script I used with 1.x and I did not notice. Thanks a lot!