    Guillaume Klein
    @guillaumekln
    Yeah that is what I wanted to see. Do you also have a file named checkpoint? If so, what is its content?
    Anna Samiotou
    @annasamt
    (screenshot: contents of the checkpoint file)
    Yes, I do. But this one refers to the averaged checkpoint of 27K steps (i.e. the reference to the 20K checkpoint was overwritten by this one). But I decided to use the 20K checkpoint for inference - is this OK?
    Guillaume Klein
    @guillaumekln
    That could be an issue as we try to load the last checkpoint by default. However, this does not match the error log. Could you try using the --checkpoint_path command line option and point to the 20k checkpoint?
    Anna Samiotou
    @annasamt
    I think that this is exactly what I did - please look at the command in "script-NMT-OpenNMT-tf.sh": onmt-main --config $1/data.yml --auto_config --model_type Transformer --checkpoint_path $1/model.ckpt-20000 infer --features_file $2/s.txt --predictions_file $2/t.txt
    Anna Samiotou
    @annasamt
    Is OpenNMT-tf v2.6.0 backward compatible with v2.2.1? Perhaps I should install the latter in the VM instead? Perhaps the failure regarding the TensorSliceReader constructor will then disappear?
    Anna Samiotou
    @annasamt
    Well, in https://github.com/OpenNMT/OpenNMT-tf I see that "The project is production-oriented and comes with backward compatibility guarantees", so please ignore my previous message.
    Guillaume Klein
    @guillaumekln
    You need to set ckpt-20000, not model.ckpt-20000 on the command line.
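    For reference, the corrected command from the script above would then read (with $1 and $2 as in Anna's script):

        onmt-main --config $1/data.yml --auto_config --model_type Transformer --checkpoint_path $1/ckpt-20000 infer --features_file $2/s.txt --predictions_file $2/t.txt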
    Anna Samiotou
    @annasamt
    Seems to work now! Thanks for the tip! So did this change with 2.x or with the latest 2.6.0?
    Guillaume Klein
    @guillaumekln
    The model prefix changed when moving from 1.x to 2.x.
    Anna Samiotou
    @annasamt
    OK, it was a "left-over" from the script I used with 1.x and I did not notice. Thank you very much!
    Anna Samiotou
    @annasamt
    Hello again, for adding more training data sets to an existing model (checkpoint) and training it further, not necessarily for fine-tuning but rather to add more parallel examples to the supervised learning, is the best practice to add the new training data sets to the config file (e.g. data.yml), to run update-vocab (merge), or both?
    Also, once a model is trained and evaluated, and has met expectations, is it safe to add the validation and test data sets that were excluded from the original training?
    Guillaume Klein
    @guillaumekln
    Hi. If you are using BPE or SentencePiece, update-vocab is rarely necessary. So if you have new data, you could simply add it to the existing training file and continue the training.
    For the second question, do you mean to use the test set as training data?
    Anna Samiotou
    @annasamt
    1) I use SentencePiece, and I see your point. 2) So concatenate the new data to the existing src/tgt train files instead of adding the new files in the config. 3) Yes, when testing of the final model is done and the results are satisfactory, to use the test set as training data. I guess not the validation set, since it is needed when retraining and should not be included in the training data.
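    A minimal sketch of that concatenate-and-continue workflow (hypothetical file names; assumes train.src/train.tgt are the training files declared in data.yml):

        cat new-data.src >> train.src
        cat new-data.tgt >> train.tgt
        onmt-main --config data.yml --auto_config --model_type Transformer train

    Relaunching training like this resumes from the latest checkpoint in the model directory by default, as noted earlier in the thread.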
    Emmanuel Ohana
    @eohana
    Hello guys,
    I'm currently working on a wrapper for the OpenNMT Tokenizer in order to use it as TensorFlow ops.
    I wanted to know:
    a) is there any interest for it in the community? It's still in an early phase, so it would be awesome if I could get some feedback on it.
    b) could I publish the code and distribute it as a pip package? Since it embeds a custom build of the onmt tokenizer, I wanted to ask before doing anything.
    Thanks!
    Guillaume Klein
    @guillaumekln

    Hi Emmanuel,

    a) I think there is some interest. We had similar plans in the past, but this proved to be a bit complex to build and maintain over time. Other initiatives such as https://github.com/tensorflow/text could be an alternative for some use cases.
    b) Sure. The Tokenizer is MIT licensed so you can do anything you want as long as you credit the original project.

    Please keep me updated and let me know if there are any changes in the Tokenizer that would make your work easier.

    Emmanuel Ohana
    @eohana
    Awesome! Thanks for your reply Guillaume.
    I just created the repo (https://github.com/eohana/tensorflow-onmttok-ops). Wheels are not yet published (I'll try to set up the CI pipeline asap), but for those who want to check it out, there's a Dockerfile provided for building the wheels.
    To answer you about the Tokenizer, I think that providing support for building in a Bazel environment could be a nice addition, as it could facilitate the ops integration when building TF Serving. I'll try to investigate myself, but I don't think it's a priority for now.
    Feel free to give me your feedback, as for now I still consider it a POC.
    Thanks for your help!
    Guillaume Klein
    @guillaumekln
    This looks great, nice work! I'm happy to see that the new registration mechanism in OpenNMT-tf is useful.
    yutongli
    @yutongli
    Hi Guillaume, I am training a Transformer model (default config) with about 48 million training examples. I am trying to figure out a few things that I could not understand, could you please help?
    1. Some discussions above mentioned 'Accumulate gradients of x iterations to reach effective batch size of 25000'; in my case it was 2 iterations. How do we interpret this? Is one iteration one step?
    2. Every 4600 steps, I see the message 'tensorflow/core/kernels/data/shuffle_dataset_op.cc:143] Filling up shuffle buffer (this may take a while): 2689207 of 48195269'. The shuffle usually takes 2-3 minutes, not a big deal in terms of time, but I am wondering: can we control the frequency of shuffling? How was 4600 steps calculated in my case?
    3. My original data set was 50M examples; I split it into 48M for training, 1M for dev, and 1M for test. Evaluating 1M examples seems to take a long time. Would you recommend using a smaller size for dev and test, for instance 100K or 50K, or does that seem too little compared with the amount of training examples?
    Guillaume Klein
    @guillaumekln
    Hi,
    1. Maybe iterations is a bit misleading here, but in "Accumulate gradients of x iterations to reach effective batch size of 25000": x iterations = x batches = 1 training step. Does that make sense?
    2. The parameter sample_buffer_size can be used to configure the shuffle buffer size. Small values mean faster filling but worse shuffling of the training data, while large values mean slower filling but improved shuffling. The default buffer size is the dataset size.
    3. You don't need more than 5k sentences in your dev and test sets.
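    In OpenNMT-tf, sample_buffer_size is set under the train section of the YAML configuration (e.g. data.yml). As a rough sketch of the underlying mechanism (plain tf.data, not OpenNMT-tf internals; the file name and buffer value are hypothetical), this is the shuffle whose "Filling up shuffle buffer" log appears during training:

        import tensorflow as tf

        # The training pipeline shuffles with a bounded buffer: elements are read
        # into the buffer, then drawn from it at random as batches are built.
        dataset = tf.data.TextLineDataset("train.src")  # hypothetical training file
        dataset = dataset.shuffle(buffer_size=500000)   # smaller buffer: faster fill, weaker shuffle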
    yutongli
    @yutongli
    Sure, thanks so much for your response! For #1, the default effective size was 25000 and batch_size was 3072 tokens, so if it says 'Accumulate gradients of 2 iterations to reach effective batch size of 25000', does that mean 2 batches = 25000 tokens, and then 1 batch = 12500 tokens, regardless of the batch_size of 3072 in the config?
    For #2, every 4600 steps it logs 'filling up shuffle buffer'; does that mean 4600 steps is one epoch of data in my case?
    For #3, thanks so much. I will use a smaller size for dev and test to speed up the training.
    Guillaume Klein
    @guillaumekln
    1. If you are training with N GPUs, you are also processing N batches in parallel. So we want: 3072 * N * x >= 25000, where x is the number of accumulated iterations.
    2. Not necessarily an epoch but a point where the TensorFlow runtime determines that it should fill the buffer with more data.
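    To make the arithmetic concrete, here is the smallest x satisfying 3072 * N * x >= 25000, i.e. x = ceil(25000 / (3072 * N)):

        import math

        effective_batch_size = 25000  # target effective batch size, in tokens
        batch_size = 3072             # tokens per batch, per replica
        for num_replicas in (1, 2, 4, 8):
            x = math.ceil(effective_batch_size / (batch_size * num_replicas))
            print(num_replicas, x)
        # 1 GPU -> x = 9, 2 GPUs -> x = 5, 4 GPUs -> x = 3, 8 GPUs -> x = 2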
    yutongli
    @yutongli
    Thanks so much for the help!
    Rahaf Alnajjar
    @MaryaAI
    Hi, in the documentation: "By default, OpenNMT-tf expects and generates tokenized text. The users are thus responsible to tokenize the input and detokenize the output with the tool of their choice."
    Can you please give me an example of how to prepare my tokenized data and how to use it with the Transformer model?
    Guillaume Klein
    @guillaumekln
    Hi! You can find more information on the default file format here: https://opennmt.net/OpenNMT-tf/data.html#text
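    As a minimal sketch of that workflow, using the OpenNMT Tokenizer's Python wrapper (pyonmttok) as one option; any tokenizer that produces space-separated tokens works:

        import pyonmttok

        tokenizer = pyonmttok.Tokenizer("aggressive", joiner_annotate=True)
        tokens, _ = tokenizer.tokenize("Hello, world!")
        print(" ".join(tokens))              # space-separated tokens: one such line per sentence in the data files
        print(tokenizer.detokenize(tokens))  # the reverse step, applied to the model output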
    Rahaf Alnajjar
    @MaryaAI
    Thanks
    VishalKakkar
    @VishalKakkar
    Hi @guillaumekln, I trained two Transformer models, one using OpenNMT and one by writing the code in TensorFlow. I made sure that all the configurations of both models are the same. I am getting higher performance with the TensorFlow one. Now I want to convert that model to CTranslate2. But as per my understanding, CTranslate2 only supports OpenNMT models as of now. So I have 2 questions:
    1) Can I load my TensorFlow model weights in OpenNMT and then use CTranslate2? 2) Is there a direct way of loading a non-OpenNMT TensorFlow model in CTranslate2?
    Guillaume Klein
    @guillaumekln
    You should have a look at how models are converted in CTranslate2. The conversion process is about filling a model specification with trained weights. You could write your own converter that extends the base Converter class.
    But how do you explain that one performs better than the other if they are using the same configuration?
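    A very rough sketch of the custom converter idea mentioned above (class and attribute names are assumptions to check against the converters bundled with your CTranslate2 version, not its confirmed API):

        import ctranslate2.converters
        import ctranslate2.specs

        class MyTensorFlowConverter(ctranslate2.converters.Converter):
            """Hypothetical converter filling a Transformer spec with external weights."""

            def __init__(self, variables):
                self._variables = variables  # e.g. a name -> NumPy array mapping of trained weights

            def _load(self):
                # Build the model specification and assign every weight it expects:
                # embeddings, attention projections, feed-forward layers, layer norms, ...
                spec = ctranslate2.specs.TransformerSpec(num_layers=6, num_heads=8)
                # spec.encoder.embeddings.weight = self._variables["encoder/embeddings"]  # and so on
                return spec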
    VishalKakkar
    @VishalKakkar
    Hi @guillaumekln, I am debugging the performance diff. I am using this code https://www.tensorflow.org/tutorials/text/transformer in my implementation. I can see 2 major differences between this code and the OpenNMT code: 1) the OpenNMT code has an extra normalization layer after the encoder and decoder layers; 2) I think there is also a difference in how the two implementations use the positional embeddings.
    Guillaume Klein
    @guillaumekln
    Ok. So it is not the same model and you can't convert it to CTranslate2 unfortunately.
    VishalKakkar
    @VishalKakkar
    What if I add the normalisation in my code and load my model weights into the OpenNMT model, and then use CTranslate2 to convert? Will that work?
    Guillaume Klein
    @guillaumekln
    But I don't see how the TensorFlow tutorial could be better than OpenNMT outside of toy examples. It is missing beam search and multi GPU/gradient accumulation.
    VishalKakkar
    @VishalKakkar
    @guillaumekln yes you are right, I am debugging the diff.
    VishalKakkar
    @VishalKakkar
    Hi @guillaumekln, can we train a language model using OpenNMT, and can we get scores from a trained language model in CTranslate2?
    Guillaume Klein
    @guillaumekln
    VishalKakkar
    @VishalKakkar
    Hi @guillaumekln, is it possible to get alignment scores without training the model with alignments if I am using only one decoder layer in the Transformer?
    Guillaume Klein
    @guillaumekln
    You could get the attention values, but they will not represent alignments as you would expect.
    VishalKakkar
    @VishalKakkar
    @guillaumekln are you saying this because of the multiple heads, or is there another reason?
    Guillaume Klein
    @guillaumekln
    Yes because there are multiple heads and no single head is expected to reflect alignments.
    Anna Samiotou
    @annasamt
    Hello, for OpenNMT-tf 2.x, are all versions compatible with each other (backward & forward)? For example, I have trained with v2.2.1 on the test environment, but in the production environment (used for translations) I've installed v2.6.0. I also plan to update the test env. with the latest version, v2.8.0. Thanks in advance.
    Guillaume Klein
    @guillaumekln
    Yes. Versions are backward compatible in the same major release (e.g. 2.x). Forward compatibility, however, is not guaranteed.
    Anna Samiotou
    @annasamt
    Many thanks, Guillaume, it's backward compatibility I'm interested in.
    Anna Samiotou
    @annasamt
    Hello, I train a shared SentencePiece model (spm_train) and then run onmt-build-vocab for each language. I then train the OpenNMT-tf v2.x model. I notice that sometimes funny words are generated in the output, the result of gluing subword chunks that do not make a valid word in the target language. spm_encode can be run with the --vocabulary parameter (and also --vocabulary_threshold) to only produce symbols that exist in the vocabulary, possibly with some minimum frequency. Is this also possible at inference, in order to avoid generating non-existing target words? And how can I introduce it? Thanks