`replace_unknown_target` uses the model attention to select the corresponding source token. However, it is well known that Transformer attention usually cannot be used as target-source alignments. You should either constrain the attention to be an alignment or use subword tokenization (like SentencePiece) to avoid UNK tokens. Note that the UNK token does not appear in the vocabulary but is automatically added when training starts.
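For reference, this is a minimal sketch of enabling the option in an OpenNMT-tf YAML run configuration (assuming the standard `params` section; adjust to your setup):

```yaml
params:
  # Replace generated <unk> tokens with the source token
  # that received the highest attention weight.
  replace_unknown_target: true
```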
`max_step` in the training parameters. There should be a warning about this somewhere in the logs. We just improved this for the next version: a more visible error message will be shown, see OpenNMT/OpenNMT-tf@21df1c7
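For context, `max_step` lives under the `train` section of the YAML configuration; a minimal sketch (the step value here is only illustrative):

```yaml
train:
  # Training stops once this global step is reached. If the checkpoint
  # you resume from is already at or past this step, no training runs.
  max_step: 500000
```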
`case_markup` option from the Tokenizer.
`effective_batch_size` works. The auto config for a Transformer model is `batch_size: 3072`. This means that 9 iterations are required to accumulate the gradients to reach a batch size of 25000 on a single GPU. So does that mean that the actual effective batch size is `3072 * 9 = 27648`? If this is true, then I would expect that if I set the batch size to 8192, the actual effective batch size would be `8192 * 4 = 32768`. This feels like enough of a difference in effective batch size that it would have an impact on training. Is this accurate?
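The arithmetic in the question can be sketched as follows (a sketch assuming gradient accumulation rounds up so the accumulated batch covers the target effective size; the helper names are hypothetical, not OpenNMT-tf API):

```python
import math

def accumulation_steps(effective_batch_size, batch_size, num_gpus=1):
    # Number of gradient-accumulation iterations per optimizer update,
    # rounded up so the accumulated batch reaches the target size.
    return math.ceil(effective_batch_size / (batch_size * num_gpus))

def actual_effective_batch_size(effective_batch_size, batch_size, num_gpus=1):
    # The batch size the optimizer effectively sees after accumulation.
    steps = accumulation_steps(effective_batch_size, batch_size, num_gpus)
    return steps * batch_size * num_gpus

print(accumulation_steps(25000, 3072))            # 9
print(actual_effective_batch_size(25000, 3072))   # 27648
print(accumulation_steps(25000, 8192))            # 4
print(actual_effective_batch_size(25000, 8192))   # 32768
```

Because of the ceiling, the actual effective batch size can overshoot the configured target by up to one per-iteration batch.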
`batch_size`, since increasing it would result in OOM and decreasing it would result in under-utilization of compute resources.
`Translator` object? Can you post these details in this issue: OpenNMT/CTranslate2#414
Before the CUBLAS_NOT_SUPPORTED error we got an "Out of memory" error.
We run 2 models, each about 300 MB. But in nvidia-smi I saw that only 1 GB of the 40 GB is used, and then we get "out of memory".
To convert the models to CTranslate2 I used this command (to create 8-bit models):
ct2-opennmt-tf-converter --model_path INPUT_ONMT_MODEL_DIR --model_spec TransformerBig --output_dir OUTPUT_DIR --quantization int8
Maybe it's because of the "fabric manager" that needs to run with the A100 GPU?
The Translator object is created like this:
model = ctranslate2.Translator(path, device=DEVICE)
The package version: