晴也多云
@zhangyi0903_twitter
Hi all~ I'm new here. I want to produce embeddings based on my own data, which is in a format similar to ConceptNet's. I find that there are only some functions in the conceptnet5/vectors directory. Are there instructions on how to call these functions to regenerate the embeddings?
the process of building the data is controlled by Snakemake, with rules in the file named Snakefile. ./build.sh will run that process, including building embeddings. By default it'll only build mini.h5, the filtered and quantized version we serve on the Web, but you can then run snakemake data/vectors/numberbatch.h5 to build the full version
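A minimal sketch of loading the result once that build finishes, assuming the pandas HDF5 layout that conceptnet5.vectors.formats.load_hdf expects (the frame stored under the key 'mat'):

import pandas as pd

# Rows are labeled by term URI (e.g. /c/en/example); columns are the
# embedding dimensions.
frame = pd.read_hdf('data/vectors/numberbatch.h5', 'mat')
print(frame.loc['/c/en/example'][:5])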
Robyn Speer
@rspeer
@itsmeblair I'm wondering if this alternative implementation of ConceptNet, "conceptnet-lite" by Anna Rogers, could do what you want: https://github.com/ldtoolkit/conceptnet-lite
it involves a SQLite DB that can be queried locally
Robyn Speer
@rspeer
ah that was what rominf was working on here!
@rominf I've started trying out conceptnet-lite. The ability to access ConceptNet with just a SQLite database is excellent. So far I'm not a fan of the API being just a thin wrapper on Peewee, for a few reasons.
We shouldn't have to describe exactly how the database is normalized just to be able to look things up in it. Like I have to spell out label = Label.get(text='example', language=Language.get(name='en')). I'd prefer something like label = Label.get(text='example', language='en').
or, because lots of outputs are URIs, I'd like to say concept = Concept.get(uri='/c/en/example'), but that gives a baffling DB error.
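A hypothetical convenience wrapper illustrating the lookup style described above; get_label is not part of conceptnet-lite's API, and this assumes Label and Language are importable from conceptnet_lite and that the database connection has already been set up:

from conceptnet_lite import Label, Language

def get_label(text, language_code):
    # Resolve the plain language code to a Language row here, so the
    # caller never needs to know how the tables are normalized.
    return Label.get(text=text, language=Language.get(name=language_code))

label = get_label('example', 'en')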
Robyn Speer
@rspeer
Interacting with these objects that come from the DB also leaves something to be desired:
>>> label
<Label: 22441>

>>> label.concepts
<peewee.ModelSelect at 0x7f9a5c072eb8>

>>> list(label.concepts)
[<Concept: 29674>,
 <Concept: 4168725>,
 <Concept: 4168727>,
 <Concept: 4168729>,
 <Concept: 19047468>,
 <Concept: 23347195>,
 <Concept: 26527584>,
 <Concept: 26527585>]
I'm wondering if you could support a way to query edges that's more like AssertionFinder.query() in conceptnet5.db.query, where you supply a dictionary of criteria to match, whose values are URIs.
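For comparison, a sketch of that query style against conceptnet5 itself; the criteria dict with URI values matches the description above, while the specific keys ('node', 'rel') and the limit argument are illustrative:

from conceptnet5.db.query import AssertionFinder

finder = AssertionFinder()
# Criteria values are URIs; unspecified fields match anything.
edges = finder.query({'node': '/c/en/example', 'rel': '/r/IsA'}, limit=5)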
Roman Inflianskas
@rominf
@rspeer Thanks for testing the library. Yes, I'm the primary author. I know about all the limitations and agree with you. I've wanted to improve the things you've described, but I've only worked on this library for a short time (a few weekends). The library is already usable but needs polishing, and we probably announced it a bit prematurely.
Anna Rogers
@annargrs
@rspeer Thanks again for testing! We're actually doing this as part of a larger project, and we thought we'd release this part because it might be useful for other ConceptNet users. I agree, it's better not to have to know anything about the database, but this is all volunteer work, so the time we have is limited. @rominf, which queries could quickly be made to look a bit less like Peewee?
Roman Inflianskas
@rominf
Ok, I've filed three issues that are trivial to implement. I'll solve them on Saturday. Other things are a bit harder to achieve, so I'm not sure about them.
Roman Inflianskas
@rominf
PRs are welcome :)
amirouche
@amirouche
Does this kind of concept, /c/en/last/v/wikt/en_2, have documentation somewhere?
amirouche
@amirouche
@rspeer the website is down.
amirouche
@amirouche
it is back.
Paul Dilyard
@pdilyard
Hello, I'm considering using conceptnet as part of a closed-source company project. I've read the FAQ about attribution, but I just want to make sure I'm 100% clear on the proper way to handle this. Are there any other commercial tools that use conceptnet and feature examples of proper attribution?
Robyn Speer
@rspeer
I don't know one off the top of my head. I can think of examples where software licenses like Apache require attribution, and the way that commercial product developers mostly deal with it is to put a "credits" link on some menu somewhere that includes all the attributions.
For example, the curl license requires attribution, and I think I've seen its author proudly post photos of like a car stereo display that's showing the attribution.
Paul Dilyard
@pdilyard
Ok. Also, if they expand on conceptnet using their own internal data, are they required to release those changes? (unclear on the "share-alike" requirement)
(e.g. extending the graph with private customer data)
Robyn Speer
@rspeer
So, if the graph is an input to something like a machine learning process, you don't need to release the output of that thing. But if you're distributing a graph that includes ConceptNet outside of your company, it does need to follow the same license.
Paul Dilyard
@pdilyard
Gotcha. It wouldn't be distributed directly, it would just be used to supplement user-facing features
Robyn Speer
@rspeer
okay. Just make sure not to redistribute ConceptNet as part of the process
Paul Dilyard
@pdilyard
Right - ok that clarifies it. Thanks for the help!
Robyn Speer
@rspeer
Like I'm a little nervous about "it wouldn't be distributed directly" because "indirectly" is often still a problem.
Paul Dilyard
@pdilyard
We're thinking a lot more "average consumer user"-facing. Like, a user enters a search term and we have a search tool powered by conceptnet + other customer data. The users would theoretically never know we were using conceptnet apart from the attribution text. They see search results, not a download of the graph. What are your thoughts on that?
Robyn Speer
@rspeer
That sounds fine.
Paul Dilyard
@pdilyard
Great. Is this the best place to ask if we have any additional questions? We definitely don't want to be in the business of misusing others' work
Robyn Speer
@rspeer
Yep, this works
Paul Dilyard
@pdilyard
:+1:
Yuri
@yurivish
Hi! Is there a way to use the ConceptNet API to query for related terms and get back not only the term labels but their vectors too (i.e. a more powerful version of http://api.conceptnet.io/related/c/en/tea_kettle), or alternatively an API endpoint to look up the vectors for a bunch of terms at once?
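For reference, a sketch of what the /related endpoint linked above returns, assuming the usual response shape of a 'related' list of term URIs with weights; note that no vectors are included, which is the gap the question is about:

import requests

resp = requests.get('http://api.conceptnet.io/related/c/en/tea_kettle')
for item in resp.json()['related']:
    # Each entry is a term URI plus a similarity weight; no vector data.
    print(item['@id'], item['weight'])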
amirouche
@amirouche
I looked around everywhere in the source (and even made a tiny PR), but I cannot find the code that parses Wiktionary Parsoid or HTML and generates relations.
Was the code that parses Wiktionary published somewhere?
Robyn Speer
@rspeer
we start from the raw Wikitext, not from Parsoid. I can see why Parsoid would solve many (not all) of the complexities of parsing Wikitext. Is there a public API of Parsoid to experiment with, or do I have to set up their code?
I knew I didn't want to work with the "official" parser (MediaWiki itself) because it produces HTML that loses most of the semantic structure of the page. But I can see how Parsoid preserves the semantics.
Robyn Speer
@rspeer
on the other hand, "separation between logic and presentation" is still listed as a future stretch goal, and the lack of such a separation is the big problem I have with dealing with Wikitext-parsed-into-HTML. Maybe they preserve enough information to cope with that, I don't know.
amirouche
@amirouche
Thanks a lot for the reply. I will study that code.
There is a REST API that can return Parsoid data: https://en.wiktionary.org/api/rest_v1/
it is also available at https://en.wikipedia.org/api/
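A quick sketch of fetching Parsoid output through that REST API, assuming the standard /page/html/{title} route that serves Parsoid-rendered HTML:

import requests

# Fetch Parsoid-rendered HTML for one Wiktionary entry.
html = requests.get('https://en.wiktionary.org/api/rest_v1/page/html/test').text
print(html[:300])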
Robyn Speer
@rspeer
ooh yuck. If I wanted to benefit from Parsoid, it looks like I'd have to modify Parsoid-the-software, because Parsoid-the-API returns an incomprehensible unstable format such as https://en.wiktionary.org/api/rest_v1/page/data-parsoid/test/54189349/23131650-db79-11e9-8016-bb4d063c1417
so I feel okay about reinventing the wheel in Haskell
amirouche
@amirouche
I am certain parsing wiki markup is the Right Thing IF / WHEN one has a wikimarkup parser :)
because wikimarkup is the source language, it has more information.
Robyn Speer
@rspeer
yeah
Robyn Speer
@rspeer
it's just that wikitext is a horrible language to parse. It's context-sensitive, its syntax isn't even a tree, and it's Turing-complete in one intentional way and a different accidental way. If there were a general-purpose way to offload that, I would have been excited for it to be not my problem anymore, and since you mentioned Parsoid, I looked into it.