Note:
Some features require Python 3.5 or later (e.g., distributed multi-GPU, entmax); we currently only support PyTorch 1.4.
I can see that PyTorch can find the GPU:
>>> torch.cuda.is_available()
True
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x7f8c0a3cec50>
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name(0)
'NVIDIA Tesla K80'
However, OpenNMT is not finding the GPU resource. What is going on here?
I am using OpenNMT from https://github.com/OpenNMT/OpenNMT-py/tree/1.2.0
Traceback (most recent call last):
  File "../../OpenNMT-py/train.py", line 6, in <module>
    main()
  File "/root/work/context/huggingface-models/OpenNMT-py/onmt/bin/train.py", line 197, in main
    train(opt)
  File "/root/work/context/huggingface-models/OpenNMT-py/onmt/bin/train.py", line 91, in train
    p.join()
  File "/root/miniconda3/envs/open-nmt-env/lib/python3.7/multiprocessing/process.py", line 140, in join
    res = self._popen.wait(timeout)
  File "/root/miniconda3/envs/open-nmt-env/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/root/miniconda3/envs/open-nmt-env/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "/root/work/context/huggingface-models/OpenNMT-py/onmt/bin/train.py", line 181, in signal_handler
    raise Exception(msg)
Exception:

-- Tracebacks above this line can probably be ignored --

Traceback (most recent call last):
  File "/root/work/context/huggingface-models/OpenNMT-py/onmt/bin/train.py", line 135, in run
    gpu_rank = onmt.utils.distributed.multi_init(opt, device_id)
  File "/root/work/context/huggingface-models/OpenNMT-py/onmt/utils/distributed.py", line 27, in multi_init
    world_size=dist_world_size, rank=opt.gpu_ranks[device_id])
  File "/root/miniconda3/envs/open-nmt-env/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 510, in init_process_group
    timeout=timeout))
  File "/root/miniconda3/envs/open-nmt-env/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 603, in _new_process_group_helper
    timeout)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
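The last line is the actual failure: the NCCL process group is initialized inside the spawned training process, and that process cannot see a GPU even though the interactive interpreter can. Things worth checking are whether CUDA_VISIBLE_DEVICES is set in the environment the training command runs in, and whether -world_size and -gpu_ranks match the single available GPU. Below is a small diagnostic sketch using plain PyTorch APIs only (it is not OpenNMT code) that repeats the CUDA checks from inside a spawned child process, similar to how OpenNMT-py launches its workers:

# Diagnostic sketch (plain PyTorch, not OpenNMT-py): verify that a spawned
# child process can still see the GPU and that the NCCL backend is available.
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def check_gpu(rank):
    # Expect True / 1 / True here; False or 0 reproduces the NCCL error above.
    print("rank:", rank,
          "| cuda available:", torch.cuda.is_available(),
          "| device count:", torch.cuda.device_count(),
          "| nccl available:", dist.is_nccl_available())


if __name__ == "__main__":
    # OpenNMT-py starts one process per GPU; spawn a single worker to mimic that.
    mp.spawn(check_gpu, nprocs=1)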
https://huggingface.co/blog/constrained-beam-search
I wonder if this could be implemented in OpenNMT/CTranslate?
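For reference, the feature in that post is exposed in the Hugging Face transformers library through the force_words_ids argument of generate(). A rough sketch along the lines of the blog post (the t5-base checkpoint and the forced word "Sie" are just the blog's example, not anything specific to OpenNMT/CTranslate):

# Rough sketch of constrained beam search as shown in the linked blog post,
# using Hugging Face transformers; model and forced word are the blog's example.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

input_ids = tokenizer("translate English to German: How old are you?",
                      return_tensors="pt").input_ids
# Token ids of words that must appear somewhere in the generated output.
force_words_ids = tokenizer(["Sie"], add_special_tokens=False).input_ids

outputs = model.generate(
    input_ids,
    force_words_ids=force_words_ids,
    num_beams=5,                 # constrained generation requires beam search
    remove_invalid_values=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))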
For a Neural Machine Translation (NMT) task, my input data has relational information. I could probably use a Graph Neural Network (GNN) with a Graph2Seq model, but I can't find a good generative model for GNNs.
So I want to use a Transformer model instead. But then the big problem is how to embed structural information in a Transformer. Is there any open-source Relational Transformer implementation that I can use out of the box?
OpenNMT only implements the vanilla Transformer model, so there is no such thing as a Relational Transformer that you can use directly. Using copy with BPE may not be a good choice either: BPE is already able to generate any combination of tokens you want, and I don't see any reason you would need both in your model, since in most situations the copy mechanism does not work very well and fails to copy the target tokens you actually want.
But the copy mechanism can help to handle unknown tokens, can't it?
BPE is also able to handle unknown tokens: after BPE tokenization every word is a combination of sub-words, so BPE does not run into unseen tokens in almost any situation. As I said, the copy mechanism may not give you what you want; it is a relatively outdated technique compared to current seq2seq pre-trained models like GPT-2 or the various Transformer variants, so I don't think you need both in your case. Incorporating both would be a lot of work and may not give good performance.
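As a small illustration of that point, the sketch below encodes a word that is unlikely to exist as a whole in the vocabulary; with a BPE tokenizer it comes out as known sub-word pieces rather than an <unk>. (The gpt2 tokenizer is used only because it is a readily available public BPE vocabulary, not because it is what an OpenNMT pipeline would use.)

# Illustration only: an out-of-vocabulary word is split into known BPE pieces,
# so no <unk> token is produced. The "gpt2" tokenizer is just a convenient
# public BPE vocabulary, not the one you would train for an NMT system.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(tokenizer.tokenize("hyperparameterization"))
# e.g. ['hyper', 'parameter', 'ization']; the exact split depends on the learned merges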
@PosoSAgapo What would happen if I just used an existing framework and did BPE while also enabling the copy mechanism?
I can't say what would happen, but I don't think combining them is a good option. The truth is that SOTA models are all based on pre-trained models or various Transformer variants, which somewhat reflects the fact that the copy mechanism is not a good solution for unseen tokens. I have also used the copy mechanism many times with OpenNMT, and in my experience it does not work very well.