peeja
@peeja:matrix.org
[m]
Sure, that sounds great! Thanks for pinning it down!
I forgot there were separate dateTime and dateTimeStamp. The difference is timezone handling, right?
George Svarovsky
@gsvarovsky
In my understanding, yes. I just read a comment in Comunica that it doesn't make any difference to SPARQL (though obviously it does matter to comunica in some way).
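For reference, the XSD distinction really is just whether the timezone offset is required. A tiny illustration (literal values made up):

```ts
// xsd:dateTime – the timezone offset is optional
const dateTime = '"2020-11-22T15:34:00"^^xsd:dateTime';            // valid with or without an offset
// xsd:dateTimeStamp – derived from xsd:dateTime, with the offset required
const dateTimeStamp = '"2020-11-22T15:34:00Z"^^xsd:dateTimeStamp'; // must end in Z or ±hh:mm
```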
peeja
@peeja:matrix.org
[m]

We use an undefined completed property to mean uncompleted, and we provide support for @not...@exists.

I have a marginal preference for not relying on a closed-world assumption here, only because I haven't had to yet. I think I'd rather affirmatively assert that the task is incomplete, since I know that that's a true fact.

George Svarovsky
@gsvarovsky
Ah! Nice. It just occurred to me that completed: false would be sweet. But overloading the property's type might feel a bit icky.
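A quick sketch of the modelling options being discussed (the subject shapes, task IDs and name property are made up for illustration):

```ts
// Open-world: absence of `completed` just means "not stated to be complete"
const openWorld = { '@id': 'task1', name: 'Water the plants' };

// Affirmative assertion (peeja's preference): incompleteness is stated as a fact
const asserted = { '@id': 'task1', name: 'Water the plants', completed: false };

// ...and flipped on completion
const done = { '@id': 'task1', completed: true };
```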
peeja
@peeja:matrix.org
[m]
Hmm. I'm getting a lot of "DatasetEngine Cannot connect to remotes due to Error: Send timeout exceeded." when I bring up my second clone, and I think it's just because I have so much data. Do you have a sense of how much is reasonable to be shuffling in a single domain?
George Svarovsky
@gsvarovsky
If it fits in your environment but is timing out that's a bug 🧐– the snapshot handshake (the timed part) should be fast even if the snapshot takes a while to get delivered. Can you share your dataset, and the persistence & remotes you're using so I can test it out?
peeja
@peeja:matrix.org
[m]
Right now the dataset has a bunch of personal info, but I'll see if I can repro it with dummy data.
Looking a bit deeper, it looks like the issue only happens when the web client is doing reads. If I have it connect but not read, it's still slow to connect, but it works. The (admittedly, likely intense) reads seem to be bogging things down enough to cause a timeout. But I'm still surprised to see that appear in the server output. Either there's some synchrony in the browser clone where there shouldn't be, blocking communication with the server clone, or it just hits the CPU so hard it grinds he async connection to a halt. :/
peeja
@peeja:matrix.org
[m]
Hmm, never mind, I think it's just flaky. Looks like I'm still getting it now with all the reading commented out.
George Svarovsky
@gsvarovsky
You have a server clone and a browser clone? Cool. Is the "second" clone another browser clone?
What remotes are you using?
peeja
@peeja:matrix.org
[m]
I've just been running with a Node clone and a browser clone, using Socket.io
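For context, a minimal sketch of that kind of setup with m-ld-js, an in-memory backend and the Socket.io remotes (import paths, package names and config values are assumptions; they have shifted between preview releases):

```ts
import { clone, uuid } from '@m-ld/m-ld';
import { IoRemotes } from '@m-ld/m-ld/ext/socket.io'; // path is an assumption
import { MemoryLevel } from 'memory-level';           // in-memory backend, standing in for memdown

const meld = await clone(new MemoryLevel(), IoRemotes, {
  '@id': uuid(),                        // unique identity for this clone
  '@domain': 'todos.example.org',       // made-up domain name
  genesis: false,                       // this clone joins an existing domain
  io: { uri: 'http://localhost:3000' }  // the Socket.io server hosted alongside the Node clone
});
```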
George Svarovsky
@gsvarovsky
Part of the handshake involves the snapshot source waiting for an ack from the client. That's probably what's timing out.
peeja
@peeja:matrix.org
[m]
The Node clone is watching the OmniFocus database and reading in updates (which works well, since each update comes in as a new XML file)
One thing that's odd to me: it looked like the errors I was getting in the Node clone and the browser clone were pretty much the same
So, I'd refresh the page, it would start to connect the new browser clone, and both my server-side log and the browser console would say there was a "Send timeout exceeded"
George Svarovsky
@gsvarovsky
memdown on the client?
peeja
@peeja:matrix.org
[m]
Yep
Actually, memdown on both of them, for now
George Svarovsky
@gsvarovsky
sorry I'm just about to join a call, I'll be back later...
peeja
@peeja:matrix.org
[m]
Sure thing, thanks for the help!
George Svarovsky
@gsvarovsky
In general you probably are bouncing off the limitations of the developer preview. It can be pretty CPU-heavy because of the layering – there's optimisation that we need to do, as the API stabilises. @jacoscaz is also doing some great work looking at query performance in quadstore which should help a lot. In the meantime I'd love to look at your setup more closely in case there's anything I can do.
Maybe we could put together a test repository that simulates the main moving parts of your project, with some made-up data?
Jacopo Scazzosi
@jacoscaz
Hi all! @peeja:matrix.org as George mentioned, we're currently paving the way for very significant performance work when it comes to how SPARQL queries are handled by quadstore and comunica, which m-ld uses internally. Other than that, which applies to almost all use cases, leveldb backends do have their own peculiarities which often compound the aforementioned inefficiencies. Having a test repository would definitely help pinpoint the issue.
Jacopo Scazzosi
@jacoscaz

The (admittedly, likely intense) reads seem to be bogging things down enough to cause a timeout.

What kind of query are you using for reads?

peeja
@peeja:matrix.org
[m]
It's not so bad now; before it was relying more on @filters over broader @graph patterns, and I think that was more intense. But I'm also using a hook which re-reads on every update to get a live view. I don't think that's the problem here, as the updates aren't actually frequent and the error I'm talking about here happens during the initial snapshot load. I had been wondering if I was having lots of async reads stacking on top of each other, but I don't think that's going on. (Or… can that even happen? I'm not clear on how serialized the underlying data access is.)
peeja
@peeja:matrix.org
[m]
Also, it looks like doing a @describe was too heavy, so now I'm just getting the name property. The theory is that the query would be matched to whatever some React component needed to display.
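To illustrate the difference being described, here are the two query shapes side by side (a sketch only; the Task type and name property are made up, and the promise-style results assume a later preview release):

```ts
// Heavier: describe every matching subject in full
const tasks = await meld.read({
  '@describe': '?task',
  '@where': { '@id': '?task', '@type': 'Task' }
});

// Lighter: select only the property the component actually renders
const names = await meld.read({
  '@select': '?name',
  '@where': { '@id': '?task', name: '?name' }
});
```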
George Svarovsky
@gsvarovsky
If you read while the snapshot is downloading, those reads will indeed queue up. Are you waiting for the clone to report that it's not outdated (https://js.m-ld.org/#initialisation)?
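That wait looks something like this in m-ld-js (a minimal sketch):

```ts
// Wait until the clone reports it has caught up with the domain
await meld.status.becomes({ outdated: false });
// ...only now start issuing reads
```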
George Svarovsky
@gsvarovsky
I'm now thinking I may have introduced a bug with a refactor of how I wrap query results, for the snapshot. I'll take a look in the morning...
peeja
@peeja:matrix.org
[m]
Oh, I'm not! I probably should, but I'm also only reading once and then again in response to updates through follow(). It seems like follow() isn't triggered during the snapshot load; is that correct?
George Svarovsky
@gsvarovsky
That is correct. For a new clone (with memdown it's always new on a page refresh), follow won't report anything until after the snapshot.
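So the read-once-then-follow pattern, with the wait in front, might look like this (a sketch; the exact follow() signature varied across preview releases, this assumes the Observable form, and the @describe query is illustrative):

```ts
await meld.status.becomes({ outdated: false }); // snapshot applied, safe to read

// Initial read for the live view
const view = await meld.read({ '@describe': '?id', '@where': { '@id': '?id' } });

// React to later updates; they won't start arriving until after the snapshot
meld.follow().subscribe(update => {
  // update['@insert'] / update['@delete'] describe the change: re-query or patch the view here
});
```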
Michiel de Jong
@michielbdejong
"m-ld can also work with a fully peer-to-peer messaging system, to realise complete architecture decentralisation in next-generation internet apps." https://m-ld.org/doc/#messaging -> in that case, would messages also be forwarded? And if so, how do nodes know when not to forward a message, to avoid a storm?
It's something Scuttlebutt uses Merkle trees for, I think (to know which messages to forward to a given neighbor and which not)
George Svarovsky
@gsvarovsky

Hi @michielbdejong ! Yes, it's an interesting question. For clarity, m-ld does not mind how messages get from one clone to another. It has the simple requirement of FIFO delivery for any pair of clones, but after that, any transport or protocol will probably do. So, it's up to the application architecture.

There is a list of possible alternate messaging providers here, with decentralised options: m-ld/m-ld-js#72

There is also a theoretical possibility in m-ld's own protocol that a split-brain scenario could generate a 'storm' of messages when the split heals, regardless of the messaging provider used. This is something we want to look into with tests at scale.
Michiel de Jong
@michielbdejong
So in a domain with 5 participants, each operation leads to 4 messages being sent? Would it be possible to optimise this if the network has for instance a ring architecture? Then 1 message could just be forwarded until it comes around full-circle, instead of sending loads of messages with identical message bodies over the same links
George Svarovsky
@gsvarovsky

I think at least 4 messages always need to be sent (because the payload has to arrive at every node), but you can change who sends each one. With a mesh, the operating node sends four messages. With a hub-and-spoke, the operating node sends one message and the hub (broker) sends four. With a gossip ring, everyone sends one message except the last node in the ring (ideally).

But each option also has different effects on resources and on reliability. A mesh requires a connection from every peer to every other one; gossip usually requires redundancy in case anyone is down; and of course hub-and-spoke needs a broker. Lots to consider.

Something like Scuttlebutt (I'm not an expert) which inherently persists messages, might be overkill for m-ld, but if it's already in the architecture for your app it might be appropriate.
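A back-of-the-envelope version of those counts for the 5-clone example (illustration only; none of these topologies is a built-in m-ld option):

```ts
const clones = 5;
// Mesh: the operating clone delivers to every other clone itself
const mesh = clones - 1;          // 4 sends, all by the origin
// Hub-and-spoke: origin → broker, then broker → every other clone
const hub = 1 + (clones - 1);     // 5 sends in total, but only 1 by the origin
// Gossip ring: each clone forwards once; the last one (ideally) stops
const ring = clones - 1;          // 4 sends, one per forwarding clone
console.log({ mesh, hub, ring });
```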

Michiel de Jong
@michielbdejong
Right! Thanks.
Rather than Merkle trees to find the knowledge gaps, you could just request messages from either the source or from any other node with a ?since= parameter
Michiel de Jong
@michielbdejong
Pulling messages through a sparsely and intermittently connected network would probably be much simpler than pushing them through
George Svarovsky
@gsvarovsky
Yes! That's more-or-less what happens when m-ld itself detects (or expects) that it's missed something. You could extend that to also happen under normal operation, so you're effectively just polling for recent messages, like RSS. You would need to use the logical clock for the since value though, and this would put load on whoever was replying.
Well, pulling requires you to know who to pull from, which means you need addressing...
Michiel de Jong
@michielbdejong
If that's the problem then you can push a "do you need anything?" message, and the query could be in the reply to that
This message could actually serve both ways, so it would be more "what do you have?"
And that could then also include discovery of newly added nodes
Would be fun to write such a messaging layer for m-ld. I wonder if something like that already exists; I'll do some research
George Svarovsky
@gsvarovsky
Interesting! To reiterate, the messaging layer is very open, so you should be able to do anything. There are two levels of support already: the basic interface, and a Pub-Sub base class which simplifies things for messaging systems of that kind. (e.g. see m-ld/m-ld-spec#6)
Michiel de Jong
@michielbdejong
Thanks!