Would it be worth defining which subset - if a subset at all - of EventEmitter is used by https://rdf.js.org/stream-spec/ ?
I was looking at bundle sizes and noticed that most bundles produced by Webpack 4.x tend to pull in the https://www.npmjs.com/package/events package as a replacement for the native one. However, that's significantly heavier than other options such as https://www.npmjs.com/package/@protobufjs/eventemitter, which only implements a subset of it. We only seem to use the .on() method after all, so it'd be nice to refer to a subset, just like we do for streams.
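To make the idea concrete, here is a rough sketch of what "a subset of EventEmitter" could look like if consumers only ever call .on(). The names `MinimalEmitter` and `TinyEmitter` are made up for illustration and do not come from the stream spec or from any of the packages mentioned above; `emit()` is included only so that a producer can drive the example.

```typescript
// Hypothetical names: neither the interface nor the class is defined by the RDF/JS stream spec.
type Listener = (...args: unknown[]) => void;

// The consumer-facing subset: only on() is required.
interface MinimalEmitter {
  on(event: string, listener: Listener): this;
}

class TinyEmitter implements MinimalEmitter {
  private listeners = new Map<string, Listener[]>();

  // Register a listener and return the emitter so calls can be chained.
  on(event: string, listener: Listener): this {
    const list = this.listeners.get(event) ?? [];
    list.push(listener);
    this.listeners.set(event, list);
    return this;
  }

  // Producer-side helper; returns true if at least one listener was called.
  emit(event: string, ...args: unknown[]): boolean {
    const list = this.listeners.get(event);
    if (list === undefined || list.length === 0) return false;
    // Iterate over a copy so listeners may register others while we loop.
    for (const listener of list.slice()) listener(...args);
    return true;
  }
}

// Usage: a consumer that only depends on MinimalEmitter.
function collect(source: MinimalEmitter, into: unknown[]): void {
  source.on("data", (item) => { into.push(item); });
}
```

A consumer typed against `MinimalEmitter` would accept the native EventEmitter, the events package, or a much smaller implementation, which is the bundle-size point made above.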
readable-stream and/or native stream. the web experience with RDF/JS is sadly suboptimal
quadstore and what to prioritize next.
Comunica probably deserve to have a gitter of their own not to add too much noise in here but what about smaller projects like quadstore? Keeping the conversation limited to a few rooms would probably help the community grow tighter.
rdfjs/public in quadstore's README as a couple of users have asked about a dedicated gitter room.
every library that generates terms/quads should accept a factory argument to share a single factory
If I understand what you are saying, this implies that using more than one factory instance across a single RDF/JS project is not supported by the spec and may lead to data corruption, yes?
Store interface in a manner that allows the store to be re-used across process restarts (on the backend) or page reloads (on the frontend). For example, the JSON-LD parser https://github.com/rubensworks/jsonld-streaming-parser.js produces colliding blank nodes when used to parse completely different documents across multiple restarts. I don't think this is an issue with the parser itself, though, as its behavior is coherent with the semantics of blank nodes (and so is the behavior of every other library I've tested, this is just an example).
quadstore to address this, I think that store implementations should be responsible for handling potential blank node collisions.
@bergos - thank you for brainstorming with me!
the right way of handling blank nodes would be a map that translates the blank node ids from the document to blank node instances generated from the given factory
Indeed, this is what I'm doing within quadstore ATM. I maintain persistent "scope" instances, each a collection of blank node mappings that can be re-used across write operations. If a scope is provided to a write operation, quadstore translates blank nodes according to the existing mappings and adds a new mapping for each previously-unencountered blank node. Scope mappings are persisted incrementally (and atomically with newly-written quads) as they are added to each scope and each scope can be dropped from the store when not needed anymore. This does not require any specific factory or generator, it's a feature of the store itself. However, this is not (yet) available in the methods coming from the RDF/JS Store interface as I'd rather keep those aligned with the spec.
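As a rough illustration of the mapping idea discussed above, here is a minimal in-memory sketch. It is not quadstore's actual API: `BlankNodeScope`, `dump()`, `load()` and `makeCountingFactory()` are hypothetical names, the two interfaces are cut-down stand-ins for the RDF/JS BlankNode and DataFactory shapes, and persistence is reduced to a plain array of label pairs. The toy factory restarts its counter on every run, so a real store would additionally need labels that stay unique across restarts.

```typescript
// Cut-down stand-ins for the RDF/JS BlankNode and DataFactory shapes.
interface BlankNode {
  termType: "BlankNode";
  value: string;
}

interface BlankNodeFactory {
  blankNode(value?: string): BlankNode;
}

// Hypothetical class: one scope holds the mappings for one logical document.
class BlankNodeScope {
  private mappings = new Map<string, BlankNode>();
  private factory: BlankNodeFactory;

  constructor(factory: BlankNodeFactory) {
    this.factory = factory;
  }

  // Reuse the existing mapping, or mint a fresh node for an unseen label.
  translate(node: BlankNode): BlankNode {
    let mapped = this.mappings.get(node.value);
    if (mapped === undefined) {
      mapped = this.factory.blankNode();
      this.mappings.set(node.value, mapped);
    }
    return mapped;
  }

  // Export the mappings as [documentLabel, storeLabel] pairs for persistence.
  dump(): Array<[string, string]> {
    return Array.from(this.mappings.entries()).map(
      ([label, node]): [string, string] => [label, node.value],
    );
  }

  // Rebuild a scope from previously dumped pairs.
  static load(factory: BlankNodeFactory, entries: Array<[string, string]>): BlankNodeScope {
    const scope = new BlankNodeScope(factory);
    for (const [label, value] of entries) {
      scope.mappings.set(label, factory.blankNode(value));
    }
    return scope;
  }
}

// Toy factory for the example only; labels are NOT unique across restarts.
function makeCountingFactory(prefix: string): BlankNodeFactory {
  let counter = 0;
  return {
    blankNode(value?: string): BlankNode {
      return { termType: "BlankNode", value: value ?? `${prefix}${counter++}` };
    },
  };
}
```

With this shape, two unrelated documents that both contain a node labelled `b0` would each go through their own scope and end up with distinct nodes in the store, while repeated writes through the same scope keep resolving `b0` to the same node.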
maybe i would like to use a specific factory for a good reason. if i can't hand over the factory to your store i have to do the mapping.
that's something we wanted to avoid when we started working on the RDF/JS specs.
Yeah, I understand. This is partially the reason behind my current approach - users are completely free to pass whatever factory they want to quadstore without any performance penalty even when using scopes.
Thinking about your comments, I guess my point is two-fold:
a) I think it would be best for the RDF/JS spec to explicitly mention the issue of blank node collisions to save users and developers that are not familiar with RDF a few likely headaches (one can spend their whole career without ever hearing about existential variables);
b) I think the RDF/JS spec should settle on a shared strategy to prevent blank node collisions, and this strategy should be compatible with persistent stores.
rdf-canonicalize over the next few days and see how much of a performance impact it has. From a brief look at the code I expect it to have a significant impact. Any idea on how I might incrementally store and re-hydrate those blank node mappings with it?