case_markup option from the Tokenizer.
effective_batch_size works. The auto config for a Transformer model is batch_size: 3072. This means that 9 iterations are required to accumulate the gradients to reach a batch size of 25000 on a single GPU. So does that mean that the actual effective batch size is 3072 * 9 = 27648? If this is true, then I would expect that if I set batch_size: 8192, the actual effective batch size would be 8192 * 4 = 32768. This seems like enough of a difference in effective batch size to have an impact on training. Is this accurate?
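As a quick sanity check, here is that arithmetic in a minimal Python sketch (the variable names are illustrative, not OpenNMT-tf internals):

import math

effective_batch_size = 25000  # target tokens per optimizer update
batch_size = 3072             # tokens per forward/backward pass (auto config)

accum_steps = math.ceil(effective_batch_size / batch_size)  # 9
print(accum_steps * batch_size)                             # 27648

accum_steps = math.ceil(effective_batch_size / 8192)        # 4
print(accum_steps * 8192)                                   # 32768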
batch_size, since increasing it would result in OOM and decreasing it would result in underutilization of compute resources.
Translator object? Can you post these details in this issue: OpenNMT/CTranslate2#414
Before the CUBLAS_NOT_SUPPORTED error, we got an "Out of memory" error.
We run 2 models, each about 300 MB. But in nvidia-smi I saw that only 1 GB of the 40 GB is used, and then we get "out of memory".
To convert the models to CTranslate2, I used this command (to create 8-bit models):
ct2-opennmt-tf-converter --model_path INPUT_ONMT_MODEL_DIR --model_spec TransformerBig --output_dir OUTPUT_DIR --quantization int8
Maybe it's because of the "fabric manager" that needs to run with an A100 GPU?
The Translator object is created like this:
model = ctranslate2.Translator(path, device=DEVICE)
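In case it helps with reproducing, here is a slightly fuller sketch of the construction. device_index and compute_type are standumented ctranslate2.Translator parameters; the path and values below are assumptions for illustration, not the poster's actual setup:

import ctranslate2

# Illustrative values; OUTPUT_DIR is the converted int8 model directory.
model = ctranslate2.Translator(
    "OUTPUT_DIR",
    device="cuda",
    device_index=0,        # which GPU to load this model on
    compute_type="int8",   # match the quantization chosen at conversion time
)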
The package version: