    Daniele Nicolodi
    @dnicolodi

    Hello, to make a long story short, I have some data that I think could be nicely modeled as an RDF graph, and for which SPARQL could be an effective query language. I don't know much about RDF, SPARQL, or RDFLib, so I started with a tiny example to familiarize myself with the concepts:

    import rdflib
    from rdflib import FOAF
    
    graph = rdflib.Graph()
    
    for n in range(0, 1000):
        node = rdflib.term.BNode()
        graph.add((node, FOAF.age, rdflib.term.Literal(n)))
        graph.add((node, FOAF.name, rdflib.term.Literal(f'Name{n}')))
    
    rows = graph.query("""SELECT
                            ?name ?age
                          WHERE {                 
                            ?x :age ?age . FILTER(?age > 998) 
                            ?x :name ?name . }
                          ORDER BY ?age""",
                       initNs={'': FOAF})
    
    for name, age in rows:
        print(name, age)

    This simple test runs in about two seconds on my laptop. Unfortunately, that kind of performance would not be sufficient for my real data. Am I doing something wrong, or is the processing in RDFLib the bottleneck? Is there a way to speed this up?

    Thank you!

    Thomas Tanon
    @Tpt
    Hi Daniele! RDFLib is written in pure Python. It is designed to be feature complete, easy to use, and easy to extend, but it is not heavily optimized for performance.
    Anyway, your code takes 0.95s on the first run on my laptop and only 0.2s if I rerun it in a REPL with the imports already loaded. If I drop the printing and replace range(0, 1000) with range(0, 100000), it takes 18s.
    Tory Clasen
    @tclasen
    @dnicolodi, without profiling the code I can only make a guess: if you build your triples as an iterable, such as a list comprehension, you can do a bulk insert into the graph using addN: https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#rdflib.graph.Graph.addN
    Iwan Aucamp
    @aucampia
    Hi, not sure if this is the best forum to ask, but are you open to a PR that adds linting?
    Ashley Sommer
    @ashleysommer
    Hi @aucampia
    What kind of linting are you suggesting? We already require PEP8 compliant pull-requests, and we also strongly suggest all contributors use black on their code before creating a PR (though we don't enforce that).
    We've had the discussion about linting many times in the past. The crux of the matter is that we want to keep the barrier to contribution as low as possible for users of RDFLib, and the kinds of people who use RDFLib are not necessarily software engineers. We want researchers, academics, scientists, semantic experts, ontology experts, etc. to be able to contribute to RDFLib. I don't like to stereotype or put people into categories, but in my experience a lot of experts do not want to jump through hoops to make their code compliant.
    Iwan Aucamp
    @aucampia
    @ashleysommer thanks for the reply, those are valid points. I was thinking of PEP8, specifically autopep8. I guess I read right past it, but when I ran autopep8 on the codebase there were many warnings, so I assumed it was not being used. It makes sense to apply it only to pull requests, though.
    Iwan Aucamp
    @aucampia
    so I am looking at this, to add tests for transitive_objects and transitive_subjects: https://github.com/RDFLib/rdflib/blob/master/examples/transitive.py
    nvm, I get it I think
    Iwan Aucamp
    @aucampia
    are there some standard graphs that you run tests on?
    Iwan Aucamp
    @aucampia
    I made a PR here: RDFLib/rdflib#1307 - suggestions for tests welcome
    Iwan Aucamp
    @aucampia
    This issue can be closed: RDFLib/rdflib#1279
    Any suggestions on tests here would be appreciated: RDFLib/rdflib#1291
    Iwan Aucamp
    @aucampia
    Are there any plans to make further 5.x releases?
    And if not can I look at it?
    white_gecko
    @white_gecko:matrix.org
    As I see it, we have merged some breaking changes already so we will head for 6.x
    Iwan Aucamp
    @aucampia
    Well we could branch 5.0 and cherry pick some
    but I guess if there is no big pressure it is not needed; if 6.x is not too far on the horizon, it makes sense
    Iwan Aucamp
    @aucampia
    This should be closed also: RDFLib/rdflib#1291
    Iwan Aucamp
    @aucampia
    What is the policy on type annotations, are they allowed? Encouraged/Discouraged?
    Iwan Aucamp
    @aucampia
    Asked it here also: RDFLib/rdflib#1311
    Have you considered enabling the GitLab Community features?
    So there is a place to ask questions like that without making issues
    Iwan Aucamp
    @aucampia
    why is there so little funding for RDF :/ - really need more people on RDFLib
    If I were in control of universities I would get people to make PRs for things like this instead of dumb coding assignments
    Iwan Aucamp
    @aucampia
    @dnicolodi if you want performance (and better maintenance) try using rdf4j or jena - they are both fast and get more maintenance, though you will need to run on JVM
    white_gecko
    @white_gecko:matrix.org
    @aucampia: you are right, there is a lot to do in the rdflib. We try our best in maintaining it and are happy about contributions to improve it.
    Also, there is some activity on improving the performance of rdflib
    Iwan Aucamp
    @aucampia
    recommendation of rdf4j and jena is not meant to be disparaging of rdflib, I also use rdflib mostly because most of the time I don't want to struggle with JVM and JVM does not have pip, pipx, etc - just don't want people to not use RDF because of a performance concern of rdflib
    rdflib is awesome for what it is
    but if I were to build something production grade that needs good performance I would use Jena or RDF4J
    I think the best hope is to find more commercial applications for RDF
    The more commercial use the more funding and more contributions
    But if universities were better actors in the ecosystem it would help. If they directed resources to maintaining existing projects instead of starting yet another research project that will be abandoned, it would be very beneficial
    Iwan Aucamp
    @aucampia
    I am going to submit a fix for tox this weekend, and then also submit changes to add mypy to CI pipeline (and eventually to tox) - hope it is well received. I will also try and make CONTRIBUTING.md similar to this: https://github.com/pallets/click/blob/main/CONTRIBUTING.rst
    better to have that in Repo IMO
    white_gecko
    @white_gecko:matrix.org
    @aucampia: If you are planning to invest more time in making big changes or something similar, we can also try to schedule a call with @ashleysommer and Nicholas to see how we can best organize this.
    Iwan Aucamp
    @aucampia
    I am open to it. I want to help where I can, but I don't really have dedicated time for it. For me it is easier to navigate the code base with type annotations. Besides this, I am just looking at small issues to try to get a better understanding of the code base. I will look at the backlog for 6.0.0 where I can. Some of it looks quite complicated; the RDF 1.1 test suite looks like a decent thing to try to do and not that complicated, but I have not quite built up the courage to start on that. The actual bug I will look at next is RDFLib/rdflib#1228
    Iwan Aucamp
    @aucampia
    what is best is to just make sure milestone backlog is current and prioritized
    But I think that is the case, or have no reason to think it is not the case
    Iwan Aucamp
    @aucampia
    I don't think this is still current: RDFLib/rdflib#556
    white_gecko
    @white_gecko:matrix.org
    Did you test this?
    Iwan Aucamp
    @aucampia
    no but it does not inherit from unicode anymore, though the original problem may still be present yes
    Iwan Aucamp
    @aucampia
    does @nicholascar come here?
    If there are any PRs you want me to review I can have a look, I keep an eye out for small and easy ones where I have some knowledge about the implementation but the more complex ones I won't necessarily review unless someone makes an explicit request
    remi.chateauneu
    @remi.chateauneu:matrix.org
    Test
    Iwan Aucamp
    @aucampia
    Hi Remi
    bridge works well
    Thanks for review BTW