Tomasz Pluskiewicz
@tpluscode
someQuads is indeed just an array. and toStream just wraps them in Readable and pushes those quads. this is how all RDF/JS sinks I've used work. I cannot just serializer.import(array). the param has to be a stream
hence I hoped I would use the graphy writer uniformly
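For illustration, a minimal sketch of such a toStream helper as described above (user code, not part of graphy):

import { Readable } from 'stream'

// wraps an array of RDF/JS quads in an object-mode Readable and pushes each quad
function toStream (quads) {
    const stream = new Readable({ objectMode: true, read () {} })
    for (const quad of quads) {
        stream.push(quad)
    }
    stream.push(null) // signal end of stream
    return stream
}

// on newer Node versions, Readable.from(quads) achieves the same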
Blake Regalia
@blake-regalia
@tpluscode i am noticing that .import(stream) returns the passed argument stream rather than this. So you could change your code to:
import writer from '@graphy/content.ttl.write'

// toStream: the helper described above that wraps an array of RDF/JS quads in a Readable
const quadStream = toStream([ ...someQuads ])
const turtleWriter = writer();
turtleWriter.import(quadStream)

let turtle = ''
turtleWriter.on('data', chunk => { turtle += chunk })
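To know when the string is complete (assuming the writer ends its readable side once the imported quad stream finishes), the snippet can be extended with the standard stream events:

turtleWriter.on('end', () => {
    // turtle now holds the complete serialization
    console.log(turtle)
})
turtleWriter.on('error', err => { console.error(err) })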
Tomasz Pluskiewicz
@tpluscode
ah, so that's what's happening. makes sense. thanks, I will give it a go
Tomasz Pluskiewicz
@tpluscode
does the input value have to be in the concise hash form to produce fully compact turtle? (bnode abbreviations, merging objects, etc)
Blake Regalia
@blake-regalia
no, but there are several Turtle features that cannot be used when given RDFJS objects, such as collections, comments, anonymous blank nodes, nested blank node property lists, etc. Concise hash will let you do any of those
For your use-case i would suggest using the scriber @graphy/content.ttl.scribe instead of the writer. it will pretty-print RDFJS quads quite well
Blake Regalia
@blake-regalia
just as an example, creating some rdfjs objects by using the spread operator on factory.c3:
import factory from '@graphy/core.data.factory';
import ttl_scribe from '@graphy/content.ttl.scribe';

const PREFIXES = {
    dbr: 'http://dbpedia.org/resource/',
    dbp: 'http://dbpedia.org/property/',
    dbo: 'http://dbpedia.org/ontology/',
    dt: 'http://dbpedia.org/datatype/',
    dct: 'http://purl.org/dc/terms/',
    'umbel-rc': 'http://umbel.org/umbel/rc/',
    xsd: 'http://www.w3.org/2001/XMLSchema#',
    rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
};

const rdfjs_objs = [...factory.c3({
    'dbr:Banana': {
        a: ['Plant', 'EukaryoticCell', 'BiologicalLivingObject']
            .map(s => `umbel-rc:${s}`),
        'dct:subject': ['Edible_fruits', 'Staple_foods', 'Bananas']
            .map(s => `dbr:Category:${s}`),
        'dbo:wikiPageID': 38940,
        'dbp:carbs': '^dt:gram"22.84',
    },
}, PREFIXES)];

const scriber = ttl_scribe({
    prefixes: PREFIXES,
});
scriber.import(toStream(rdfjs_objs));
scriber.pipe(process.stdout);
prints this:
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix dbp: <http://dbpedia.org/property/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dt: <http://dbpedia.org/datatype/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix umbel-rc: <http://umbel.org/umbel/rc/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

dbr:Banana rdf:type umbel-rc:Plant, umbel-rc:EukaryoticCell, umbel-rc:BiologicalLivingObject ;
    dct:subject dbr:Category:Edible_fruits, dbr:Category:Staple_foods, dbr:Category:Bananas ;
    dbo:wikiPageID "38940"^^xsd:integer ;
    dbp:carbs "22.84"^^dt:gram .
Tomasz Pluskiewicz
@tpluscode
finally gave that a go @blake-regalia. the confusing part is scribe vs write. from the docs I assumed that scribe is the "simpler" serializer and would not do smart formatting, whereas the docs describe write as "stylized RDF serialization"
two followup questions for scribe:
  1. is it possible to set base URI?
  2. is it possible to have blank nodes shortened to [ ]?
Blake Regalia
@blake-regalia
@tpluscode yea, the writer is meant for feature-rich pretty-printing and the scriber was added later to complement it. for historical reasons the writer will not condense pred/object pairs in turtle when given RDFJS quads, but that will change in the next major release
re 1: base URIs are not used by any of the serializers since this has never seemed like a needed/wanted feature; but i would like to push the writers to support the full range of language features, including optional use of long literals and so on, so i'll make a note to add base URI support
re 2: anonymous blank nodes are supported via concise hash objects, which are currently graphy-specific. you can pass these in programmatically. this is a unique feature that no other libs offer, since these types of blank nodes are 'ephemeral' in a stream and can never be referenced again once they are serialized
Tomasz Pluskiewicz
@tpluscode
hm, curious about the ephemeral blanks. the docs only show exactly []. does it ever serialize their properties as [ :foo :bar ]?
Blake Regalia
@blake-regalia
it does not serialize blank node property lists, altho that would be cool -- seems doable with concise hash objects
Tomasz Pluskiewicz
@tpluscode
I guess the only missing piece is getting from a quad stream to the concise hash
maybe graphy could provide a function which turns the entire dataset/stream into a concise hash?
Blake Regalia
@blake-regalia
yeah actually it will do that thru https://graphy.link/memory.dataset.fast
which is basically an in-mem store. load quads into it, then pipe to any serializer. it will sort quads, which ends up pretty-printing them in trig/turtle, but as of now it won't perform any blank node magic
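A rough sketch of that flow, assuming the dataset instance can sit in a pipe chain between a quad source and a serializer as described above (toStream is the helper from earlier):

import dataset from '@graphy/memory.dataset.fast'
import ttl_scribe from '@graphy/content.ttl.scribe'

toStream([ ...someQuads ])                       // RDF/JS quad source
    .pipe(dataset())                             // in-memory store; sorts the quads
    .pipe(ttl_scribe({ prefixes: PREFIXES }))    // serialize the sorted quads as Turtle
    .pipe(process.stdout)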
Adrian Gschwend
@ktk
@blake-regalia do any of your RDF parsers give hints on what the error was on parsing?
I could not find that so far and it's really hard for newbies to figure out what's wrong that way
Thomas Bergwinkl
@bergos
hint would be row + column of the string that causes the problem i guess
Blake Regalia
@blake-regalia
on a parsing error it prints the name of the production it expected and a substring of the input text that led to the error
this is one of the things i've been wanting to improve tho, and the unreleased v5 adds line/column tracking to make error reporting better
Adrian Gschwend
@ktk
@blake-regalia sounds great, timeline?
Thomas Bergwinkl
@bergos
@blake-regalia what would line/column tracking be? it would also be nice to have line/column as an additional property on every term, maybe just optional for performance/compatibility reasons. that would help to generate better error messages one level higher, for content validation. if you consider adding that feature, I think it would be worth syncing with other parser developers to use the same structure, e.g. use just a single property origin/source for an object. line and column should be straightforward, but maybe we can also define DOM references for RDFa or any browser-related stuff. if we could agree on something, it could be used by SHACL and similar libraries.
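A purely hypothetical sketch of what such a shared structure could look like on a parsed term (not an existing RDF/JS or graphy interface, just the idea above made concrete):

const term = {
    termType: 'NamedNode',
    value: 'http://example.org/Banana',
    // hypothetical, optional source mapping attached by the parser
    origin: {
        source: 'input.ttl', // document the term was parsed from
        line: 42,            // line in the source text
        column: 17           // column where the term starts
    }
}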
Blake Regalia
@blake-regalia
Very interesting idea @bergos. I definitely see value in standardizing a source mapping interface in the rdfjs space, especially for some langs that transpile to turtle such as shexc or sparql generate - something similar to https://github.com/mozilla/source-map
@ktk alpha release for parsers targeting end of Feb
Adrian Gschwend
@ktk
@blake-regalia excellent, looking forward!
Adrian Gschwend
@ktk
@blake-regalia two more questions for the turtle serializer: any plans for nested bnodes with [...] and collections with (…)?
Blake Regalia
@blake-regalia
@ktk have you seen this issue?
blake-regalia/graphy.js#39
Adrian Gschwend
@ktk
@blake-regalia I have not, thanks for the pointer!
Blake Regalia
@blake-regalia
in short, figuring out if the bnodes and collections are safe to serialize as such given an rdfjs dataset is a bit tedious, but if you use c3/c4 objects to serialize it will output as expected
Adrian Gschwend
@ktk
yep I saw the linked example, nice
and lists are in there too
Blake Regalia
@blake-regalia
yep 👍
Adrian Gschwend
@ktk
is there a way to get from a dataset to c3/c4, or is the idea that this is done outside of an rdfjs dataset structure?
Blake Regalia
@blake-regalia
the idea behind c3/c4 is to give the developer a structure they can transform data into (e.g., flat => rdf) before passing it to the serializer
if the requirement is to come from an rdfjs dataset, then there is not much point in turning it into c3/c4
computing the bnode closure is part of the normalization algo, which might get you halfway there, but deducing whether a set of statements qualifies for turtle collection serialization would be quite hairy
Adrian Gschwend
@ktk
@blake-regalia right, I see what you mean. in our use-case we manipulate something that we add to revision control in git
and there are good reasons why we do that, so turtle is a good fit
this is something that will not become super big so it's not unreadable
so we load rdf, manipulate the structures in JS, and serialize back to git
Blake Regalia
@blake-regalia
i see. interesting use-case. i've actually thought about a parser that creates a single c4 object in memory which would preserve the entire syntactic structure of a turtle doc including bnodes and collections but haven't prioritized its development at all. that would probably satisfy what you're after
Adrian Gschwend
@ktk
@blake-regalia that would be super useful