isinstance(nlp, spacy.lang). Unfortunately, when I do that now, I get an error like TypeError: isinstance() arg 2 must be a type or tuple of types.
isinstance(mod, spacy.lang.en.English) returns True, but ideally I could do this test without reference to a specific language.
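For what it's worth, the TypeError above comes from the fact that spacy.lang is a package (a module object), not a class, so isinstance() cannot accept it as its second argument. A minimal sketch of a language-independent check, assuming the object you want to test is a loaded pipeline and relying on the fact that the language classes such as English subclass spacy.language.Language. The helper name is_spacy_pipeline is made up for illustration:

```python
def is_spacy_pipeline(obj):
    """Return True if obj is a spaCy pipeline object of any language."""
    try:
        # Language is the shared base class that English and the other
        # language-specific classes inherit from.
        from spacy.language import Language
    except ImportError:
        # spaCy is not installed, so obj cannot be a spaCy pipeline.
        return False
    return isinstance(obj, Language)
```

With this, is_spacy_pipeline(nlp) should hold for any loaded pipeline regardless of language, while plain strings or other objects give False.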
I have built a spaCy pipeline for binary text classification. The pipeline works fine for models that are available through the spaCy library. In order to compare my existing results to other models (https://github.com/google-research/bert#pre-trained-models) I used the convert_bert_original_tf_checkpoint_to_pytorch.py script (https://github.com/huggingface/transformers/blob/master/src/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py) to convert existing checkpoints to PyTorch models. After that I wanted to "load those PyTorch models from a path" (https://github.com/explosion/spacy-transformers) into my pipeline.
I am able to load those PyTorch models into my pipeline successfully, but when I start training with the same training data, I get the following error message:
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1812 # remove once script supports set_grad_enabled
1813 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1814 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
I honestly do not understand this error or how to solve it. After researching the problem, I tried adjusting my config file in various ways, without success. My only lead at the moment is that if I reduce my input size to below 200 words, it works fine. I would like to compare those models on the same input data (within BERT's 512-token limit) without truncation.
Does anyone have an idea or a clue how I could fix this problem? Any ideas would be highly appreciated!
Thanks a lot in advance!
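As a possible starting point for debugging: torch.embedding raises "index out of range in self" when an index is not smaller than the number of rows in the embedding table. For a BERT model, two candidate causes would be a token id that is not covered by the vocab_size in the converted model's config, or a sequence with more wordpieces than max_position_embeddings (words are split into wordpieces and special tokens are added, so the wordpiece count can exceed the word count). Below is a minimal sketch of a check in plain Python. The function name check_embedding_indices is hypothetical, and the values 30522 and 512 are only example settings that you would replace with the ones from your own config:

```python
def check_embedding_indices(token_ids, vocab_size, max_positions=512):
    """List reasons why an embedding lookup for one sequence could fail.

    token_ids: wordpiece ids of a single input sequence.
    vocab_size / max_positions: values taken from the model config
    (assumed here to be "vocab_size" and "max_position_embeddings").
    """
    problems = []
    # Token ids must index a row of the word embedding table.
    bad_ids = [i for i in token_ids if i < 0 or i >= vocab_size]
    if bad_ids:
        problems.append(
            f"{len(bad_ids)} token id(s) outside [0, {vocab_size})"
        )
    # Each position must index a row of the position embedding table.
    if len(token_ids) > max_positions:
        problems.append(
            f"sequence length {len(token_ids)} exceeds {max_positions} positions"
        )
    return problems


# Example values only: a short sequence that fits a 30522-entry vocabulary.
print(check_embedding_indices([101, 7592, 102], vocab_size=30522))  # []
```

Running this over the wordpiece ids of the longer training texts would at least show whether either limit is being exceeded before the batch reaches the model.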
spacy train command to train a custom NER model. For my use case, entity-based evaluation is not relevant; I'd prefer to do token-based evaluation. Is there an easy way to use a custom Scorer with the command line, or do I need to write a script for it? Many thanks!
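To make clear what I mean by token-based evaluation, here is a minimal sketch of the metric in a standalone script. It does not use spaCy's Scorer and says nothing about what the command line supports. It assumes the gold and predicted entity labels are already available as one label per token (for example collected from token.ent_type_), with "O" standing for tokens outside any entity. The function name token_level_prf is made up for illustration:

```python
def token_level_prf(gold_tags, pred_tags, outside="O"):
    """Compute token-level precision, recall and F1 for entity labels.

    A token counts as a true positive when it is predicted as an entity
    token and its label matches the gold label exactly.
    """
    if len(gold_tags) != len(pred_tags):
        raise ValueError("gold and predicted sequences must have the same length")
    tp = fp = fn = 0
    for gold, pred in zip(gold_tags, pred_tags):
        if pred != outside and pred == gold:
            tp += 1
        else:
            if pred != outside:
                fp += 1  # predicted an entity label that is wrong
            if gold != outside:
                fn += 1  # missed or mislabeled a gold entity token
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = ["O", "PER", "PER", "O", "LOC"]
pred = ["O", "PER", "O", "LOC", "LOC"]
precision, recall, f1 = token_level_prf(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F={f1:.2f}")
```

In this sketch a token with the wrong entity label counts as both a false positive and a false negative, which is one common convention but not the only one.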