Translator object? Can you post these details in this issue: OpenNMT/CTranslate2#414
Before the CUBLAS_NOT_SUPPORTED error we got an "out of memory" error.
We run 2 models, each around 300 MB. But nvidia-smi showed that only 1 GB of the 40 GB was in use when we got the "out of memory" error.
To convert the models to CTranslate2 I used this command (to create 8-bit models):
ct2-opennmt-tf-converter --model_path INPUT_ONMT_MODEL_DIR --model_spec TransformerBig --output_dir OUTPUT_DIR --quantization int8
Maybe it's because of the "fabric manager" that needs to run with A100 GPUs?
The Translator object is created like this:
model = ctranslate2.Translator(path, device=DEVICE)
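For reference, a minimal sketch of how the runtime precision can be made explicit when loading the model (the `model_dir` path is a placeholder; `compute_type` is the parameter that selects between "int8" and "float" execution mentioned below):

```python
import ctranslate2

# Sketch only: "model_dir" stands in for the directory produced by
# ct2-opennmt-tf-converter. compute_type selects the runtime precision;
# "int8" matches a model converted with --quantization int8.
translator = ctranslate2.Translator(
    "model_dir",
    device="cuda",
    device_index=0,
    compute_type="int8",
)
```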
The package version:
We ran tests with the latest CTranslate2 (2.0) release and found that translation speed on a GeForce RTX 2080 is 25% faster than on a 3090 (single GPU in each case). We loaded 14 language models (around 4.7 GB in memory) on both GPUs.
How can that be?
We tested "int8" models with both "int8" and "float" compute types, and with beam_size 1 and 2. Same results: the 2080 is always faster than the 3090.
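For what it's worth, a minimal timing sketch in plain Python (`translate_fn` is a stand-in for something like `translator.translate_batch`) that excludes warm-up calls, since the first calls on each GPU include one-time CUDA initialization that can skew a 2080-vs-3090 comparison:

```python
import time

def time_translations(translate_fn, batches, warmup=1):
    """Run translate_fn on each batch; return wall-clock seconds for the timed runs.

    translate_fn is a placeholder for the actual translation call
    (e.g. translator.translate_batch). The first `warmup` batches are
    executed but not timed, so one-time CUDA setup cost is excluded.
    """
    for batch in batches[:warmup]:
        translate_fn(batch)
    start = time.perf_counter()
    for batch in batches[warmup:]:
        translate_fn(batch)
    return time.perf_counter() - start
```

Comparing the per-batch numbers from this on both machines (same batches, same beam_size) would help rule out measurement artifacts.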
2080: Driver Version: 460.32.03 CUDA Version: 11.2
3090: Driver Version: 460.73.01 CUDA Version: 11.2
Running in a Docker container.