Aditya Mukhopadhyay
@adityamukho
Explicit commit is, in some sense, the opposite. It would be used in a scenario where a document (vertex/edge) was created/updated/deleted outside the knowledge of RecallGraph. Calling this method makes RG tally the specified node paths against its event log, determine where the event log is lagging behind, and add the appropriate events to make it "catch up".
But it is not a way to allow the event log to "lead" the actual object graph. I don't think there will be a use case for that.
Gary Gendel
@ggendel
I'm just looking for a way to seed RecallGraph with existing data. The alternative is to create each node and edge from scratch, which becomes a bit of a pain: I'd have to build a map of old to new nodes in order to remap the edges. This would be done only to set the initial condition when RecallGraph is installed on an existing database. A "create in place" would let me do this without having to deal with dependencies and mappings.
Aditya Mukhopadhyay
@adityamukho
Ok I think we're both talking about the same scenario. Consider the following:
  1. You have a database with existing data on which you want to install RG to enable temporality from this point onwards.
  2. Just after installation, RG's event log is empty so it is unaware of existing DB records. There is no recorded history so to speak.
  3. In other words, there are documents in the database whose corresponding event log entries are absent. This means the event log is lagging behind the actual database entries (the object graph of user-entered data).
  4. Running an explicit commit makes RG scan the database (path = '/') and update its event log with entries to represent the existing object graph (CREATE events in this case).
  5. After this operation, RG's event log is in sync with the object graph.
From here onwards, you can use RG for further writes. If it goes out of sync again (due to external writes), you can run explicit commits on the affected path again to close the gap.
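Roughly, the catch-up call would then look something like this (a minimal sketch; the route '/event/commit' and the body shape are illustrative placeholders, not the final API):

    // Minimal sketch, assuming RG is mounted at /recallgraph; the route
    // and payload below are illustrative placeholders, not the documented API.
    const request = require('@arangodb/request');

    const res = request.put('http://localhost:8529/_db/_system/recallgraph/event/commit', {
      json: true,
      body: { path: '/' }  // scan everything, backfill CREATE events
    });
    // After this call, the event log is in sync with the object graph.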
Gary Gendel
@ggendel
That's exactly what I need. For the time being, I did a "read table->truncate table->create objects in RG" procedure until it's ready.
Gary Gendel
@ggendel
I've added functionality in my Foxx app to purge an object's history when I "obliterate" an object in my database. The "obliterate" operation is only enabled during regression testing and is used to selectively clean things up (restricted to the test objects) before starting each test. It took a couple of iterations to figure out the right approach to get all the right items in the collections and nothing more, but this was the last piece I needed before I could put it on Jenkins for nightly regression testing of my code branch.
Aditya Mukhopadhyay
@adityamukho
Awesome! If you want, I can take a look at your queries to make sure you're catching everything.
Gary Gendel
@ggendel
Cool. The process is relatively straightforward.
  • I get the skeleton vertex and edge_hubs connected to the object (v,e,p IN 2..2) and delete the skeleton edges and vertex. This gives me a list of the objects that will be obliterated.
  • I collect all of the events associated with those objects.
  • From this list, I delete all associated snapshot information (event_snapshot_links, events, commands, snapshot_links, and snapshots).
  • From this list, I delete the associated events.
  • Finally, I delete any associated commands.
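In code terms, the first two steps look roughly like this (a condensed sketch; skelVertexId and objectIds are placeholders, and the actual generated queries appear in the debug output below):

    // Condensed sketch of steps 1 and 2; collection names are taken from
    // the debug output further down in this thread.
    const db = require('@arangodb').db;

    // Step 1: the 2..2 skeleton traversal -> hubs, spokes and vertices to remove.
    const skeleton = db._query(`
      WITH pmhist_skeleton_edge_hubs, pmhist_skeleton_vertices
      FOR v, e, p IN 2..2 ANY @start pmhist_skeleton_edge_spokes
        RETURN { edges: p.edges[*]._id, vertices: p.vertices[*]._id }
    `, { start: skelVertexId }).toArray();

    // Step 2: collect the events recorded against the obliterated objects.
    const eventIds = db._query(`
      FOR ev IN pmhist_events
        FILTER ev.meta.id IN @ids
        RETURN ev._id
    `, { ids: objectIds }).toArray();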
Gary Gendel
@ggendel
If I delete all objects, what remains is only the root origin document. My testing shows that I only obliterate the information for the specific objects and leave everything else untouched. The list of relevant snapshot information is found by looking at the event_snapshot_links _to field to match the hub and vertex objects and returning the _from field.
Gary Gendel
@ggendel
Here is a run with debug output, deleting an object with all of its snapshots:
  • Obliterating current database
    SUCCESS: Obliterated everything!
  • creating path to delete
    pmsite/20989796
    pmproject/20989808
    pmlibspec/20989826
    pmvariant/20989838
    pmlibtype/20989857
    pmlibrary/20989884
    pmlibrary/20989909
  • Adding snapshots
    pmlibrary/20989909
    pmlibrary/20989909
    pmlibrary/20989909
    pmlibrary/20989909
    pmlibrary/20989909
    2020-04-29T13:21:19.771Z: WITH pmhist_skeleton_edge_hubs,pmhist_skeleton_edge_spokes,pmhist_skeleton_vertices,pmhist_snapshot_links FOR v,e,p IN 2..2 ANY "pmhist_skeleton_vertices/pmlibrary.20989909" pmhist_skeleton_edge_spokes RETURN {edges:p.edges[*]._id,vertices:[p.vertices[0]._id,p.vertices[1]._id]}
    2020-04-29T13:21:19.771Z: FOR o IN pmhist_skeleton_vertices FILTER o._id IN ["pmhist_skeleton_vertices/pmlibrary.20989909"] REMOVE {_key:o._key} IN pmhist_skeleton_vertices
    2020-04-29T13:21:19.772Z: FOR o IN pmhist_skeleton_edge_spokes FILTER o._id IN ["pmhist_skeleton_edge_spokes/20989917","pmhist_skeleton_edge_spokes/20989916","pmhist_skeleton_edge_spokes/20989923","pmhist_skeleton_edge_spokes/20989922"] REMOVE {_key:o._key} IN pmhist_skeleton_edge_spokes
    2020-04-29T13:21:19.772Z: FOR o IN pmhist_skeleton_edge_hubs FILTER o._id IN ["pmhist_skeleton_edge_hubs/pm_type.20989913","pmhist_skeleton_edge_hubs/pm_child.20989919"] REMOVE {_key:o._key} IN pmhist_skeleton_edge_hubs
    2020-04-29T13:21:19.772Z: FOR v in pmhist_events FILTER v.meta.id IN ["pmlibrary/20989909","pm_type/20989913","pm_child/20989919"] RETURN v
    2020-04-29T13:21:19.773Z: FOR o IN pmhist_event_snapshot_links FILTER o._to IN ["pmhist_snapshots/origin-20989641","pmhist_snapshots/origin-20989706","pmhist_snapshots/origin-20989721","pmhist_snapshots/20989943"] REMOVE o IN pmhist_event_snapshot_links RETURN o._from
    2020-04-29T13:21:19.773Z: FOR o IN pmhist_events FILTER o._id IN ["pmhist_events/20989945","pmhist_events/origin-20989641","pmhist_events/origin-20989706","pmhist_events/origin-20989721"] REMOVE o IN pmhist_events
    2020-04-29T13:21:19.773Z: FOR o IN pmhist_commands FILTER o._to IN ["pmhist_events/20989945","pmhist_events/origin-20989641","pmhist_events/origin-20989706","pmhist_events/origin-20989721"] REMOVE o IN pmhist_commands
    2020-04-29T13:21:19.773Z: FOR o in pmhist_snapshot_links FILTER o._to IN ["pmhist_snapshots/origin-20989641","pmhist_snapshots/origin-20989706","pmhist_snapshots/origin-20989721","pmhist_snapshots/20989943"] REMOVE o IN pmhist_snapshot_links
    2020-04-29T13:21:19.774Z: FOR o in pmhist_snapshots FILTER o._id IN ["pmhist_snapshots/origin-20989641","pmhist_snapshots/origin-20989706","pmhist_snapshots/origin-20989721","pmhist_snapshots/20989943"] REMOVE o IN pmhist_snapshots
    2020-04-29T13:21:19.774Z: FOR o in pmhist_events FILTER o._id IN ["pmhist_events/20989910","pmhist_events/20989914","pmhist_events/20989920","pmhist_events/20989928","pmhist_events/20989933","pmhist_events/20989938","pmhist_events/20989945","pmhist_events/20989951"] REMOVE o IN pmhist_events
    2020-04-29T13:21:19.774Z: FOR o in pmhist_commands FILTER o._to IN ["pmhist_events/20989910","pmhist_events/20989914","pmhist_events/20989920","pmhist_events/20989928","pmhist_events/20989933","pmhist_events/20989938","pmhist_events/20989945","pmhist_events/20989951"] REMOVE o IN pmhist_commands
Aditya Mukhopadhyay
@adityamukho
Great! All 8 collections seem to be cleared of the relevant entries.
You don't need to remove the collection-specific origin events. They can be left intact. In fact, if there are any events belonging to a collection that you want to leave untouched, these origins must remain intact. Same goes for snapshot origins. Like collection-specific event origins, they are safe to remove ONLY when ALL documents for that collection are being purged.
Best to leave all origin events and origin snapshots untouched in all cases. They are not document-specific anyway.
Gary Gendel
@ggendel
From what I saw, they are event specific. I only delete them when the relevant event/command is removed. I originally left them, but then they were not reused.
Aditya Mukhopadhyay
@adityamukho
Ok, here is how the event tree is structured:
  1. At the top level there is a single root event from which everything is reachable as a descendant. This is called the super origin.
  2. Under this, the 2nd level consists of per-collection origins. All events for all documents of a collection are reachable from the origin event for that collection. These are the ones that I said should be left intact.
  3. From the 3rd level onwards is where document-specific events are recorded, i.e. the creates, updates and deletes. These are the ones that should be deleted during a purge.
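Pictorially (collection and document names borrowed from your debug output above, purely for illustration):

    super origin                           (level 1: single root)
    ├── origin: pmlibrary                  (level 2: leave intact)
    │   ├── CREATE pmlibrary/20989909      (level 3+: purge these)
    │   └── UPDATE pmlibrary/20989909
    └── origin: pm_child
        └── CREATE pm_child/20989919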
Gary Gendel
@ggendel
Thanks. I'll validate that I don't remove collection origins.
Gary Gendel
@ggendel
I'm pretty happy with the result. It removes the collection origin only when the last event for that collection is removed. I have some tests to write and a couple of ArangoDB integration items to work through before I release to the Applications Engineers and Marketing to check it out.
Aditya Mukhopadhyay
@adityamukho
Oh in that case, it is safe to remove. Excited to see the pace at which things are moving ahead at your end!
Aditya Mukhopadhyay
@adityamukho

@/all The new documentation website is live!

Check it out at https://docs.recallgraph.tech/ and do leave your feedback!

Gary Gendel
@ggendel
Great start. How can we help garner more attention to your project?
Aditya Mukhopadhyay
@adityamukho
@ggendel Thank you! I'd be delighted to see more users onboarded. I'll start on the main website soon enough; when that is ready, there could be a section for testimonials, case studies, etc. In the meantime, any channel through which the word can be spread would be greatly beneficial, including LinkedIn, Twitter, and, not to be underestimated, good old word of mouth.
I'm also reaching out to potential individual, academic and enterprise users, particularly those who are using or are likely to use graph databases in their research/analytics/business operations. But this effort could use some up-scaling, and without a doubt the word of a happy enterprise user carries a lot of weight - far more than a developer marketing his own creations ;)
Also, more sections will be added to the doc website over time - guided tutorials with examples, contribution guides, etc.
Always open to feedback
Aditya Mukhopadhyay
@adityamukho

A brand new guide section has been added to the docs, along with minor improvements to other sections:

https://docs.recallgraph.tech/working-with-recallgraph/guide

whyDoesThisWork
@whyDoesThisWork
Hi @adityamukho ! Sorry about the delay in joining Gitter.
Aditya Mukhopadhyay
@adityamukho
Absolutely no apologies required! Welcome to the RecallGraph community, @whyDoesThisWork !
Aditya Mukhopadhyay
@adityamukho

PSA: The project homepage on GitHub now has a shiny new Sponsor button.
Enabled transfer methods:

  1. PayPal (active)
  2. GitHub Sponsors (coming soon)

https://github.com/RecallGraph/RecallGraph

Gary Gendel
@ggendel
Good to know. I'm sure I can convince management to contribute once we have officially deployed RecallGraph.
Aditya Mukhopadhyay
@adityamukho
Thank you!
bb-kodexa
@bb-kodexa
Great work, Aditya! RecallGraph is a real testament to how ArangoDB can be extended and built upon. We are working on our first ArangoDB project and near the top of our list is support for versioning, so we are really looking forward to RecallGraph reaching v1.0.
Aditya Mukhopadhyay
@adityamukho
Thank you @bb-kodexa ! I've tried my best to keep it generic and relevant. You may be interested to know that the API for v1.0 is >95% stable, and the code in the dev-raw branch can already be used as a base for your project.
I've finished all the features on the 1.0 roadmap; what remains is to write all the test cases and ensure it is thoroughly tested and adequately performant.
Aditya Mukhopadhyay
@adityamukho

RecallGraph v1.0 has been released.
https://github.com/RecallGraph/RecallGraph/releases/tag/v1.0.0

Supported ArangoDB versions: 3.5, 3.6

Highlights

  1. Providers for all API endpoints to let other dependent Foxx services invoke RecallGraph's service methods directly, using ArangoDB's service linking mechanism (see the sketch after the changelog link below).
  2. Explicit Commits to sync event log with writes that occurred outside of RecallGraph's API methods.
  3. Purge endpoint to remove all history for nodes at a specified path.
  4. Restore endpoint to undelete nodes that were deleted through RecallGraph's API.
  5. Paths are returned in traverse calls, with support for path filters.
  6. k Shortest Paths - Custom-weighted, point-in-time, shortest paths between endpoints.

Changelog: https://docs.recallgraph.tech/working-with-recallgraph/changelog#1-0-0
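Regarding highlight 1: consuming the providers from a dependent Foxx service goes through the standard Foxx dependency mechanism. A minimal sketch (the alias 'recallgraph' and the manifest values are examples, not prescribed names):

    // In the dependent service's manifest.json (sketch):
    //   "dependencies": {
    //     "recallgraph": { "name": "RecallGraph", "version": "^1.0.0" }
    //   }

    // The linked service's exported providers then become available on the
    // standard Foxx dependencies object, so its service methods can be
    // invoked directly, without going over HTTP:
    const recallgraph = module.context.dependencies.recallgraph;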

Gary Gendel
@ggendel
Super news. Congrats.
Aditya Mukhopadhyay
@adityamukho

Here's the HackerNews entry: https://news.ycombinator.com/item?id=23455516

Folks, please show your love with upvotes, shares, and also star the GitHub repository if you haven't already done so. If this project crosses 100 stars, I can then apply to sponsorship programs at OpenCollective and CodeFund. TIA!

Gary Gendel
@ggendel
I'm trying to use the traverse exported function directly, but I must have something wrong. I get the following error:
ValidationError: child "edges" fails because [child "uniqueVertices" fails because ["uniqueVertices" must be one of [inbound, outbound, any]]]
I've set uniqueVertices to "path" in the passed object for the edges parameter. That seems to be the right place to pass it.
I'm also not getting the right information back using the web interface. It seems to be ignoring the vFilter but I'd like to get the traverse function call working first.
Aditya Mukhopadhyay
@adityamukho
Ok, I think I might have mixed up the validation messages and/or the validation code between input fields. A minor bug either way, though I'm not yet sure why the tests didn't catch it.
I have a presentation on RG to give on the evening of the 19th, and I'm a little tied up preparing the material. I hope it's ok if I look into this issue on the 20th.
Gary Gendel
@ggendel
There is no urgency. I've still got a lot of other things on my plate. Good luck with your presentation.
Gary Gendel
@ggendel
Something else for you to investigate when you have some time: a performance degradation that seems to be related to the size of the RG tables. My test was to take a graph with a couple million each of objects and edges and "commit" them, so there would be only a single create command for each. Then I created a small independent graph (tens of vertices and edges) via RG and finally requested a traversal of it. It took many seconds to return. This is not high priority for me, but it means I can't use RG calls for current-time traversals; I do those directly on my tables instead.