https://huggingface.co/blog/constrained-beam-search
I wonder if this can be implemented in OpenNMT / CTranslate2?
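For reference, a minimal sketch of the feature the linked blog post describes, using the Hugging Face transformers API (constrained beam search via force_words_ids). The model name and forced phrase are just placeholder examples:

```python
# Minimal sketch of constrained beam search in Hugging Face transformers,
# as described in the linked blog post. Model and forced phrase are
# illustrative placeholders.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: How old are you?", return_tensors="pt")

# Force this phrase to appear somewhere in the generated output.
force_words_ids = [tokenizer("Sie", add_special_tokens=False).input_ids]

outputs = model.generate(
    **inputs,
    force_words_ids=force_words_ids,  # constraint: these token ids must appear
    num_beams=5,                      # constrained generation requires beam search
    max_new_tokens=32,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```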
For a Neural Machine Translation (NMT) task, my input data has relational information. I could probably use a Graph Neural Network (GNN) with a Graph2Seq model, but I can't find a good generation model for GNNs.
So I want to use a Transformer model. The big problem then is: how can I embed structural information in a Transformer? Is there any open-source artefact for a Relational Transformer that I can use out of the box?
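There is no off-the-shelf Relational Transformer in OpenNMT (see the reply below), but one common approach is to bias self-attention with learned per-relation terms, in the spirit of relation-aware self-attention (Shaw et al., 2018). The sketch below is only an illustration of that idea, not a ready-made implementation:

```python
# Sketch of one way to inject relational information into Transformer attention:
# add a learned bias per relation type to the attention logits
# (in the spirit of relation-aware self-attention, Shaw et al. 2018).
# Illustrative only; not an off-the-shelf Relational Transformer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationAwareSelfAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int, num_relations: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One learned scalar bias per (relation type, head).
        self.relation_bias = nn.Embedding(num_relations, num_heads)

    def forward(self, x: torch.Tensor, relations: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); relations: (batch, seq, seq) integer relation ids
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5   # (b, h, n, n)
        bias = self.relation_bias(relations).permute(0, 3, 1, 2)  # (b, h, n, n)
        attn = F.softmax(scores + bias, dim=-1)

        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out(out)


# Usage: relation ids come from your graph, e.g. 0 = no edge, 1 = parent, ...
# layer = RelationAwareSelfAttention(d_model=512, num_heads=8, num_relations=4)
# y = layer(torch.randn(2, 10, 512), torch.randint(0, 4, (2, 10, 10)))
```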
OpenNMT only implements the vanilla Transformer model, so there is no such thing as a Relational Transformer that you can use directly. Using the copy mechanism together with BPE may not be a good choice either: BPE can already generate any combination of tokens you want, and in most situations the copy mechanism does not work very well and fails to copy the target tokens you want, so I don't see any reason you need both in your model.
But the copy mechanism can help handle unknown tokens, can't it?
BPE also handles unknown tokens: after BPE tokenization, every word is a combination of sub-words, so BPE avoids the unseen-token problem in almost every situation. As I said, the copy mechanism may not give you what you want; it is a relatively outdated technique compared with current seq2seq pre-trained models such as GPT-2 or Transformer variants, so I don't think you need both in your case. Incorporating both would be a lot of work and may not give good performance.
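A small demonstration of the point above: a BPE-style subword tokenizer decomposes an out-of-vocabulary word into known sub-word pieces, so no <unk> token is needed. GPT-2's byte-level BPE is used here only as a convenient example; in an OpenNMT pipeline you would typically train SentencePiece or subword-nmt models on your own corpus:

```python
# Show that a BPE-style subword tokenizer has no unknown-token problem:
# a rare or unseen word is split into in-vocabulary sub-word pieces.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # byte-level BPE, example only

for word in ["translation", "xylotomous", "Kraftfahrzeug"]:
    pieces = tokenizer.tokenize(word)
    print(word, "->", pieces)
# Even the rare words map to a sequence of known sub-words, so no <unk> appears.
```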
@PosoSAgapo what would happen if I just used an existing framework with BPE and also enabled the copy mechanism?
I can't say exactly what would happen, but I don't think combining them is a good option. The truth is that SOTA models are all based on pre-trained models or various Transformer variants, which suggests that the copy mechanism is not a good solution for unseen tokens. I have also used the copy mechanism many times with OpenNMT, and in my experience it does not work very well.