microSoftware
@microSoftware
Couldn't I build this dependency with ConceptNet? The concept of "this", for example, is the same in every language. So if I create a dependency tree of the first 2000 concepts, then with ConceptNet I could translate any concept into any language. I'm probably missing something in my reasoning, but I don't know what.
Robyn Speer
@rspeer
I'm a little fuzzy on the idea but I can kind of see it. As in, start with a core of key concepts (maybe the Swadesh lists), and then organize other concepts in a directed graph that points toward that core.
and then for each concept outside the core, you have a list of which related concepts are closer to the core than it. If that does what you want it sounds pretty cool.
Well, hmm. Without some filtering of what kinds of links you want to follow, I suspect that almost all concepts would end up at distance 1 or 2. Many concepts have lots of edges coming out of them, so it's likely that at least one of those edges would point to a word in the core, and if not, then almost certainly at least one of them would point to another concept that's adjacent to the core that way.
Robyn Speer
@rspeer
The thing about what you described is, I think you're looking to not do this over all the edges in the graph, but over some subset of edges that indicate the essential knowledge you need to understand a word. And for a word to be at distance 1 from the core, you'd want not just one of its edges, but all of its essential edges, to point to the core.
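A minimal sketch of the distance idea being discussed, assuming a toy edge list rather than ConceptNet's real data format; it illustrates why unfiltered breadth-first distance from a core set collapses so quickly:

```python
from collections import deque

def distances_from_core(edges, core):
    """Breadth-first search outward from a set of core concepts,
    returning each reachable concept's hop distance from the core."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    dist = {c: 0 for c in core}
    queue = deque(core)
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

# Hypothetical toy data, not real ConceptNet edges.
edges = [("this", "that"), ("that", "bicycle")]
print(distances_from_core(edges, core={"this"}))
# -> {'this': 0, 'that': 1, 'bicycle': 2}
```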
microSoftware
@microSoftware
I'm not sure I get everything you said about the distances and edges. I'll try with the first 300 words in English by copying the words from learnthesewordsfirst.com and see how it goes
amirouche
@amirouche
I asked a while back how ConceptNet extracts relations from Wikipedia / Wiktionary; the answer was https://github.com/LuminosoInsight/wikiparsec
Would it be possible to parse the HTML rendering instead of the wikimarkup?
I mean, did someone try to do that?
(by the way, I am trying to load the Wikidata dump into my database; it is MUCH MORE difficult than ConceptNet. I mean, it is all hardware work, but it takes a lot of time; the current estimate is 2 months of import process remaining)
Janna Siberia
@JannaSiberia_twitter

Hello,
I would like to train the Numberbatch embeddings and have some questions.
How do I use conceptnet5.vectors?
Is there a main function that starts the training?
Do I have to build a database with the ConceptNet graph to be able to do retrofitting, or is it possible to work only with CSV files of assertions?

Thank you very much!

amirouche
@amirouche
@microSoftware thanks for the link about the multi-layer dictionary, it is very interesting.
Robyn Speer
@rspeer
@amirouche Parsing the HTML instead of the wikimarkup is certainly a possibility -- it means you at least get to use something Official for turning the wikimarkup into HTML, either Parsoid or just scraping the page -- but it seems to me like it leaves you with as big of a parsing problem as you started with
and actually I suspect that some semantics are lost. Which templates are invoked tells you something about the intent of the markup. A word could be in italics for many different reasons, a link could mean many different things, etc.
Robyn Speer
@rspeer
And I know there's some Parsoid feature that tries to preserve that information about intent but it seems poorly documented and mostly designed around the use case of the MediaWiki Visual Editor
Filip Ilievski
@filievski
@rspeer it seems like there is data in the CN web page that is missing from the dump, such as the links between lemmas and their POS-tagged versions (/c/en/bike to /c/en/bike/n). Is that observation correct? If so, is there a way to get the version of CN that powers the web page?
Robyn Speer
@rspeer
@filievski I don't think the web page does anything that explicitly represents the link between /c/en/bike and /c/en/bike/n -- correct me if I've overlooked something. It's just a syntactic operation on these URIs. The page for /c/en/bike/n knows to provide that link to /c/en/bike in the header, and the database is designed to include results for /c/en/bike/n when you look up /c/en/bike.
The data that the Web page is built from is API queries like this: http://api.conceptnet.io/c/en/bike?grouped=true
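A small sketch of both operations described above: the grouped API query the web page is built from (the URL is the one given in the message), and the purely syntactic derivation of the unfiltered term from a sense-tagged URI:

```python
import requests

# The same grouped API query that the web page is built from.
resp = requests.get("http://api.conceptnet.io/c/en/bike",
                    params={"grouped": "true"})
data = resp.json()

# Going from /c/en/bike/n to /c/en/bike is just string manipulation:
# keep only the /c/<language>/<term> part of the URI.
def strip_sense(uri):
    return "/".join(uri.split("/")[:4])

print(strip_sense("/c/en/bike/n"))  # -> /c/en/bike
```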
Filip Ilievski
@filievski
@rspeer, correct, it is a trivial operation, though for, say, graph traversal, it makes a difference. Thanks for the confirmation.
Akash Tyagi
@Akashtyagi08
I was trying out the ConceptNet pre-trained embeddings by loading them into gensim. The results look a bit different; for most_similar("dog") the top results are words like "undog" and "nondog".
Has anyone tried this? Am I doing something wrong?
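A minimal sketch of the setup being described, assuming the English-only Numberbatch text release (the filename is a placeholder for whichever file you downloaded); Numberbatch's text releases use the word2vec text format, so gensim can load them directly:

```python
from gensim.models import KeyedVectors

# Placeholder filename; use the file from the conceptnet-numberbatch releases.
vectors = KeyedVectors.load_word2vec_format("numberbatch-en.txt.gz",
                                            binary=False)

# Nearest neighbors by cosine similarity in the embedding space.
print(vectors.most_similar("dog", topn=10))
```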
JohannaOm
@JohannaOm
Does anyone know in what range assertion weights are defined?
JohannaOm
@JohannaOm
I have one more question, about evaluation. The evaluation documentation says that the database is required to calculate embeddings for out-of-vocabulary terms from neighbouring terms in the graph. But I could not find a function that accesses the database in this case; I could only find the function that uses the prefix tree to find similar words and calculate an embedding vector for an out-of-vocabulary term. In which case do I need the database? Which evaluation function needs access to the database?
Robyn Speer
@rspeer
@JohannaOm Aha, that documentation is probably out of date
There used to be an out-of-vocabulary step that would try looking up adjacent terms in the graph, but now we have the "propagate" step that creates vectors for these terms and makes them part of the vocabulary
which makes the out-of-vocab step a lot simpler
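The prefix-tree fallback mentioned earlier might look roughly like this; an illustrative sketch only, not the actual conceptnet5 code:

```python
import numpy as np

def oov_vector(term, vocab_vectors, min_prefix=3):
    """Illustrative fallback: shorten the term until some vocabulary
    entries share the prefix, then average their vectors."""
    for end in range(len(term), min_prefix - 1, -1):
        matches = [vec for word, vec in vocab_vectors.items()
                   if word.startswith(term[:end])]
        if matches:
            return np.mean(matches, axis=0)
    return None  # no usable prefix found
```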
@Akashtyagi08 that's odd. I don't know what gensim is doing differently
we show "nondog" as only 0.7 related to "dog"
luke12321
@luke12321
Dear friends, I want to study the conceptnet5 code. Please give me some advice. Thanks
Akash Tyagi
@Akashtyagi08
@rspeer Could you please share your email address? I will create a Colab notebook with the outputs. I wasn't able to find any good tutorial for this.
JohannaOm
@JohannaOm
Hello,
I tried to reproduce the results of the Numberbatch paper and used the evaluation function.
On the word similarity task with the MEN data set I got 0.878, compared to 0.866 in the paper.
If the results are better with the new ConceptNet version, is there a paper with up-to-date evaluation results?
And should I use ConceptNet 5.5 to be able to reproduce the 0.866 result?
What makes ConceptNet well suited for embeddings compared to other knowledge graphs?
Unfortunately I could not find in the paper what makes it so good for this task.
Totole75
@Totole75
Hi,
In the FAQ you explain that PostgreSQL worked better than any graph-oriented database you tried for the ConceptNet dataset. Why is PostgreSQL better than Neo4j? Is it because of Neo4j's license, or is there a "computer science" reason?
PA
@PA15380441_twitter
Hello everybody, is there a way to keep only some kinds of concepts, such as common words, or companies, or people? I couldn't find clear categories for each node... Cheers
JohannaOm
@JohannaOm
Hello,
I have a question about the retrofitting parameters. Does the orig_vec_weight variable define the alpha parameter of the retrofitting? How did you estimate the value of alpha? Was it a random choice or did you make a parameter search? Currently orig_vec_weight is set to 0.15.
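For reference, this is the standard retrofitting objective from Faruqui et al. (2015), in which each alpha weights how strongly a vector is pulled back toward its original value; whether orig_vec_weight corresponds exactly to this alpha in conceptnet5's sharded variant is what the question is asking:

```latex
\Psi(Q) = \sum_{i=1}^{n} \Big[ \alpha_i \,\lVert q_i - \hat{q}_i \rVert^2
        + \sum_{j\,:\,(i,j) \in E} \beta_{ij}\, \lVert q_i - q_j \rVert^2 \Big]
```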
Robyn Speer
@rspeer
Catching up here, sorry
@JohannaOm I'm looking at the code for that, and I honestly don't know. That parameter was added by someone who once worked for me, Robert Beaudoin, and the commit comment for it was "Shard-oblivious retrofitting."
I remember the issue being that we'd had to break up the matrix into shards by column to keep the retrofitting process in memory, and then he noticed that this was causing distortions when we normalized these smaller rows as if they were the full rows, so this new retrofitting code was his fix for that. But I don't know where the parameter came from.
@Totole75 We tried Neo4J around 2012, so perhaps the situation has changed, but it seemed to me that Neo4J may be a good way to access data that's already in Neo4J or gradually added to it, but its ability to bulk-import data was lacking and required extending the database itself with custom code.
Robyn Speer
@rspeer
And importing data is something we want to do often, repeatedly, not just once, because we want a reproducible build process, not a mutable database that got this way by happenstance.
JohannaOm
@JohannaOm
Hello,
in the ConceptNet 5.5 paper you write:
"To avoid manually overfitting by designing our semantic space around a particular evaluation, we experimented using smaller development sets, holding out some test data until it was time to include results in this paper..."
Why did you split the datasets into test and development subsets?
What did you manipulate?
Which parameter did you learn with the development subset?
Robyn Speer
@rspeer
@JohannaOm Regarding "Why did you split the datasets into test and development subsets?" -- so we could experiment with the code without overfitting to the test set. What we manipulated was the code.
It wasn't an automatic parameter search or anything, we were trying different details of how to implement it and seeing if the validation results improved.
Robyn Speer
@rspeer
ConceptNet 5.8 is released! Release notes: http://blog.conceptnet.io/posts/2020/conceptnet-58/
willradice22
@willradice22
Hello! I just recently started looking into and using ConceptNet, and wanted to ask if everything in it is strictly factual. For example, in the research I'm a part of, I'd like to have "cat" be associated with "having 9 lives". I realize this is a myth and I don't see that ConceptNet can make this association, but I wanted to check. Also, does anyone know of a similar database that could make this association?
Robyn Speer
@rspeer
Hello! "Strictly factual" is definitely not something that you can expect from ConceptNet
willradice22
@willradice22
Ok, thank you!
Robyn Speer
@rspeer
I don't think you can actually get "strictly factual" from something that uses natural language, but of course there are different standards you can have about what sources to pull from
willradice22
@willradice22
ok thank you for the help
Hoyeon Hwang
@ghkdghdus777_gitlab
Hello,
I recently started using ConceptNet, and I have a question about the "/relatedness" API.
The API returns a relatedness value, and I'm wondering about that value.
How is the relatedness value calculated?
Robyn Speer
@rspeer
The relatedness value is the cosine similarity of the word embeddings in ConceptNet Numberbatch: https://github.com/commonsense/conceptnet-numberbatch
the API only runs the "minified" version, so it has lower precision and fewer supported languages than the full Numberbatch that you can download
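A short sketch of both ways to get that number: querying the public /relatedness endpoint (the node1/node2 parameter names follow the ConceptNet API docs; treat the "value" response key as an assumption), and computing cosine similarity directly from downloaded Numberbatch vectors:

```python
import numpy as np
import requests

# Ask the public API for the relatedness of two concepts.
resp = requests.get("http://api.conceptnet.io/relatedness",
                    params={"node1": "/c/en/cat", "node2": "/c/en/dog"})
print(resp.json()["value"])

# With the full downloaded Numberbatch, the score is just cosine similarity:
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```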