http://conceptnet5.media.mit.edu/
Is this a known issue?
Hi, a newbie here. I tried to run "snakemake data/vectors/mini.h5" and received this error:
File "/home/bancherd/.local/lib/python3.8/site-packages/wordfreq/tokens.py", line 264, in tokenize
    tokens = _mecab_tokenize(text, language.language)
File "/home/bancherd/.local/lib/python3.8/site-packages/wordfreq/mecab.py", line 40, in mecab_tokenize
    MECAB_ANALYZERS[lang] = make_mecab_analyzer(lang)
File "/home/bancherd/.local/lib/python3.8/site-packages/wordfreq/mecab.py", line 20, in make_mecab_analyzer
ModuleNotFoundError: No module named 'ipadic'
[Sun Aug 22 17:01:09 2021]
Error in rule miniaturize:
cn5-vectors miniaturize data/vectors/numberbatch-biased.h5 data/vectors/w2v-google-news.h5 data/vectors/mini.h5
(exited with non-zero exit code)
I tried to look for "ipadic", without success. Can anyone suggest solutions? Thank you!
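Not an official fix, but a quick way to check which optional tokenizer modules are importable in the same Python environment Snakemake uses. The module names "ipadic" and "MeCab" come from the traceback above; everything else here is just illustration:

```python
import importlib.util

def missing_modules(mods):
    """Return the names in mods that the current interpreter cannot import."""
    return [m for m in mods if importlib.util.find_spec(m) is None]

# wordfreq's Japanese tokenizer needs both of these installed
# in the same environment that runs snakemake.
print(missing_modules(["ipadic", "MeCab"]))
```

If either name is printed, it is missing from that environment, which would explain the ModuleNotFoundError.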
In spite of the warning on PyPI, I went ahead, installed "ipadic", and reran the script. I got the following (different) error:
Building prefix dict from /home/bancherd/.local/lib/python3.8/site-packages/wordfreq/data/jieba_zh.txt ...
Dumping model to file cache /tmp/jieba.u600b79f75cbc9b33aa477293be70c0e2.cache
Loading model cost 0.057 seconds.
Prefix dict has been built successfully.
/usr/bin/bash: line 1: 37532 Killed cn5-vectors miniaturize data/vectors/numberbatch-biased.h5 data/vectors/w2v-google-news.h5 data/vectors/mini.h5
[Sun Aug 22 20:16:41 2021]
Error in rule miniaturize:
Hi all, I'm also building ConceptNet5 for the first time on a machine running Ubuntu 20.04 with 32 GB of RAM. I was able to run ./build.sh without any obvious errors that I saw in the output, but pytest is returning failed and skipped tests. Specifically:
test_languages.py fails (316); the error message indicates it is unable to find the language_data module (traced to line 809 in .../langcodes/__init__.py).
test_json_ld.py fails as well, with a KeyError on line 82 (which is: "quiz = ld[api('/c/en/quiz')]") and line 161 ("rel = ld[vocab('rel')]").
Do these errors indicate that the installation was not successful and I should re-install? Or, have others encountered the same issues and have solutions? I did check the documentation and Googled the errors, but did not find any relevant troubleshooting solutions. Any suggestions would be appreciated.
Hi, I want to build a ConceptNet node, but when I run build.sh I get this error:
Error in rule convert_opensubtitles_ft:
    jobid: 0
    output: data/vectors/fasttext-opensubtitles.h5
RuleException:
CalledProcessError in line 663 of /home/zb/Desktop/conceptnet/Snakefile:
Command 'set -euo pipefail; CONCEPTNET_DATA=data cn5-vectors convert_fasttext -n 2000000 data/raw/vectors/ft-opensubtitles.vec.gz data/vectors/fasttext-opensubtitles.h5' returned non-zero exit status 137.
  File "/home/zb/Desktop/conceptnet/Snakefile", line 663, in __rule_convert_opensubtitles_ft
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Exiting because a job execution failed. Look above for error message
[Sat Sep 11 19:11:39 2021]
Finished job 206.
371 of 472 steps (79%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/zb/Desktop/conceptnet/.snakemake/log/2021-09-11T182112.270477.snakemake.log
What could be the reason for this?
How can I proceed with the installation instead of starting over?
300 GB of free disk space
At least 30 GB of available RAM
The time and bandwidth to download 24 GB of raw data
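A quick, unofficial way to check the first two requirements above from Python before starting the build (the 300 GB and 30 GB thresholds are the ones quoted; the RAM check assumes a Linux /proc/meminfo):

```python
import shutil

# Free disk space where the build will write its data (300 GB required).
free_gib = shutil.disk_usage(".").free / (1024 ** 3)
print(f"free disk: {free_gib:.1f} GiB")

# Available RAM (30 GB required); /proc/meminfo is Linux-specific.
try:
    with open("/proc/meminfo") as f:
        mem = {line.split(":")[0]: line.split()[1] for line in f}
    print(f"available RAM: {int(mem['MemAvailable']) / 1024 ** 2:.1f} GiB")
except (FileNotFoundError, KeyError):
    print("no /proc/meminfo MemAvailable (not Linux)")
```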
I started build.sh, and at "464 of 472 steps (98%) done" I get an error.
/usr/bin/bash: line 1: 22394 Killed cn5-vectors intersect data/vectors/crawl-300d-2M-retrofit.h5 data/vectors/w2v-google-news-retrofit.h5 data/vectors/glove12-840B-retrofit.h5 data/vectors/fasttext-opensubtitles-retrofit.h5 data/vectors/numberbatch-retrofitted.h5 data/vectors/intersection-projection.h5
[Mon Sep 13 03:26:07 2021]
Error in rule merge_intersect:
    jobid: 177
    output: data/vectors/numberbatch-retrofitted.h5, data/vectors/intersection-projection.h5
    shell:
        cn5-vectors intersect data/vectors/crawl-300d-2M-retrofit.h5 data/vectors/w2v-google-news-retrofit.h5 data/vectors/glove12-840B-retrofit.h5 data/vectors/fasttext-opensubtitles-retrofit.h5 data/vectors/numberbatch-retrofitted.h5 data/vectors/intersection-projection.h5
        (exited with non-zero exit code)
Removing temporary output file data/psql/edges_gin.csv.
[Mon Sep 13 03:27:40 2021]
Finished job 3.
464 of 472 steps (98%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/zb/Desktop/conceptnet5/.snakemake/log/2021-09-12T221420.189372.snakemake.log
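For what it's worth, "Killed" together with exit status 137 usually means the kernel's OOM killer stopped the process, since shells report a fatal signal N as exit status 128 + N. A quick decode, purely as illustration:

```python
import signal

# Shells report death by signal N as exit status 128 + N,
# so 137 corresponds to SIGKILL (9) -- typically the Linux OOM killer.
status = 137
if status > 128:
    print(signal.Signals(status - 128).name)  # SIGKILL
```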
Hi all, I'm getting measures of semantic similarity between two English words from ConceptNet for a project I'm working on. The values I get from the ConceptNet API (using the relatedness query, e.g., https://api.conceptnet.io/relatedness?node1=/c/en/invaluable&node2=/c/en/unvaluable) differ from those I get from the raw Numberbatch embeddings (with no further tuning), loaded in word2vec format via gensim.models.KeyedVectors (per the example here: https://www.kaggle.com/danofer/poetry2vec-word-embeddings) and compared with the .wv.similarity method. Here's an example of the code I'm using:
from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format('numberbatch-en-19.08.txt.gz', binary=False, unicode_errors='ignore', limit=800000)
model.wv.similarity("invaluable", "unvaluable")
In some cases the difference is clearly just a matter of rounding (the API queries round to maybe 3 decimal places); in others it's more substantial. For the example pair "invaluable" and "unvaluable", ConceptNet's API gives me a relatedness of 0.455, and
model.wv.similarity returns 0.251. I would get all of the comparisons from just one source and call it a day, but unfortunately it seems some of the words I'm comparing are not accessible via the API (or, at least, I haven't discovered a way to access them: for example, words or phrases that contain apostrophes).
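On the apostrophe point, this is just a guess, but ConceptNet term URIs replace spaces with underscores, and characters like apostrophes may simply need percent-encoding in the request URL. A sketch of building such a URI (the term "o'clock" is only an example):

```python
from urllib.parse import quote

# ConceptNet term URIs use underscores for spaces; characters like
# apostrophes must be percent-encoded when building the request URL.
term = "o'clock"
uri = "/c/en/" + quote(term.replace(" ", "_"), safe="")
print(uri)  # /c/en/o%27clock
```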
Are the underlying data different between these two sources? Are the functions not equivalent? I'm new to word2vec so perhaps I'm using that wrong. Any advice would be appreciated. Thanks!
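To at least rule out the gensim side: its similarity() is plain cosine similarity, which is easy to verify by hand. A minimal sketch with toy 3-d vectors (purely illustrative; real Numberbatch vectors are 300-dimensional):

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|), which is what
    gensim's similarity() computes for two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy vectors: dot = 1, each norm = sqrt(2), so cosine = 0.5.
print(round(cosine([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]), 3))  # 0.5
```

If a hand-computed cosine on the loaded vectors matches model.wv.similarity but not the API, the remaining gap would point to the two sources serving different model builds rather than different similarity functions.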