kapilpipaliya
@kapilpipaliya
oh thanks.
Gary Gendel
@ggendel
@adityamukho This may be a silly question, but where is the "path" field described? I'm trying to figure out what to put there.
Aditya Mukhopadhyay
@adityamukho

@adityamukho This may be a silly question, but where is the "path" field described? I'm trying to figure out what to put there.

@ggendel Please have a look at https://github.com/RecallGraph/RecallGraph/wiki/Terminology#path

Aditya Mukhopadhyay
@adityamukho

Can I write RecallGraph-like queries myself, so I can make AQL queries work?

@kapilpipaliya You can theoretically write AQL queries directly on RG collections once you're familiar with their document schema and relations, but I would advise against it, since that structure is likely to undergo significant alterations during these early days of feature and performance work. The API, on the other hand, will remain relatively stable and already supports quite a few powerful filtering capabilities. The upcoming version will extend this further by integrating filter expressions into almost every read endpoint.

This is not yet reflected in the documentation, but if you're interested in trying out the absolute bleeding edge, try working with the dev-raw branch. Let me know if you need help with your queries.
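Off the top of my head, a raw query could look something like the sketch below (untested, and the internal schema may change; collection names follow your mount point, and the document id here is just a placeholder):

    FOR ev IN recallgraph_events
      /* events reference the source document through meta.id;
         'vertices/12345' is a made-up id for illustration */
      FILTER ev.meta.id == 'vertices/12345'
      RETURN ev

But as I said, the HTTP endpoints give you most of this filtering without tying you to the internals.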

@adityamukho Thanks. Performance is always a hot spot in my work. As long as you keep it backwards compatible, I'm thrilled. For our regression testing, I can truncate your tables at the start of each test, so I'm not too concerned there. I am concerned about resources in an active database that runs for years, so we may want a way to (at least) archive ancient data. I'm not concerned about someone trashing the tables, since users have to go through another application layer to get to the database, so we can lock this feature down if absolutely necessary.

@ggendel Understood. I will take up archiving as a roadmap item, once the internal event log structure has been stabilized (should be by the next 2 releases).

Gary Gendel
@ggendel
@adityamukho I'm using CivicGraph "stable" because I had trouble with the development version. I'm still having problems with paths: everything I try gives me an internal error:
Gary Gendel
@ggendel
http://localhost:8529/_db/gdpxl/pmhist/history/show?path=%2Fn%2Fpmconfig%2F17081815
{
  "error": true,
  "errorNum": 404,
  "errorMessage": "Internal Server Error",
  "code": 404
}
Gary Gendel
@ggendel
BTW, event/log with that path seems to work.
Gary Gendel
@ggendel
I got RecallGraph-development to work and it exhibits the same problem: event commands work but history commands don't.
Aditya Mukhopadhyay
@adityamukho
@ggendel do you have access to the server logs associated with the failed request?
Also, can you please share the result of the query fired at /event/log with the same parameters? You can DM me the results.
kapilpipaliya
@kapilpipaliya
@adityamukho Thanks.
Gary Gendel
@ggendel
@adityamukho What happens if RecallGraph is installed on an already populated database? Is there a mechanism to set this as a starting point (capture the state of all objects as if they were just created)? Is there something I can put in my other app's "setup" script to identify this and "snapshot" all objects, perhaps doing a RecallGraph "update" with no changes?
Aditya Mukhopadhyay
@adityamukho
RecallGraph expects the first recorded event to be a create. It might reject updates otherwise (I haven't verified the behaviour for pre-existing nodes). What you're describing would eventually be handled by the 'explicit commits' feature I have mentioned in the roadmap. How much of a priority is this for you? I'm currently working on enabling APM traces in the application, following which I was planning to focus on the 'valid time' dimension. But I can re-prioritize (once the tracing instrumentation is finished).
Maybe you can try a blank update as an experiment, but it would have to be carefully validated to check that state rebuilds using the show API still work.
Aditya Mukhopadhyay
@adityamukho
But ultimately traversals will not work, since skeleton graph nodes are created only by create events
Gary Gendel
@ggendel
@adityamukho As this will be new functionality for us, customers using the existing product will already have their databases populated. It just means that, before we can roll out RecallGraph, we will need some way of adding the existing nodes and edges. Deployment would be several months out since it will have to go through a rigorous test and QA cycle. Keep in mind that, if you change the RecallGraph schema, we need documentation on how to do the schema upgrade (unless you do it automatically, like we do in our Foxx setup script for our own schema changes).
Aditya Mukhopadhyay
@adityamukho
Now that there is going to be a live production deployment, I will take care not to change the schema unless it is absolutely unavoidable. In case it does change, there will be an automated migration script to port all data over to the new schema, requiring no manual intervention.
On the matter of populating the existing database, I think the cleanest approach would be to start with a fresh database and run a batch job to import all the existing data from the old database to the new one. That way, every existing record would have its corresponding entry in the event log and skeleton graph. Even so, this would only have to be a stopgap measure until the 'explicit commits' feature is built (which I will now prioritize over 'valid time').
Gary Gendel
@ggendel
I can live with that. Thanks.
Gary Gendel
@ggendel
Is "explicit commits" sort of a "create in place" feature where it would treat it like a create but not actually create the object?
Aditya Mukhopadhyay
@adityamukho
Explicit commits is in some sense the opposite. It would be used in a scenario where a document (vertex/edge) was created/updated/deleted outside of the knowledge of RecallGraph. Calling this method would make RG tally the specified node paths against its event log, determine where the event log is lagging behind and add appropriate events to make it "catch up".
But it is not a way to allow the event log to "lead" the actual object graph. I don't think there will be a use case for that.
Gary Gendel
@ggendel
I'm just looking for a way to seed RecallGraph with existing data. The alternative is to create each node and edge from scratch, which becomes a bit of a pain because I have to build a map of old-to-new nodes in order to map the edges. This would be done only to set the initial condition when RecallGraph is installed on an existing database. A "create in place" would allow me to do this without having to deal with dependencies and mappings.
Aditya Mukhopadhyay
@adityamukho
Ok I think we're both talking about the same scenario. Consider the following:
  1. You have a database with existing data on which you want to install RG to enable temporality from this point onwards.
  2. Just after installation, RG's event log is empty so it is unaware of existing DB records. There is no recorded history so to speak.
  3. In other words, there are documents in the database whose corresponding event log entries are absent. This means the event log is lagging behind the actual database entries (the object graph of user-entered data)
  4. Running an explicit commit makes RG scan the database (path = '/') and update its event log with entries to represent the existing object graph (CREATE events in this case)
  5. After this operation, RG's event log is in sync with the object graph.
From here onwards, you can use RG for further writes. If it goes out of sync again (due to external writes), you can run explicit commits on the affected path again to close the gap.
Gary Gendel
@ggendel
That's exactly what I need. For the time being, I did a "read table->truncate table->create objects in RG" procedure until it's ready.
Gary Gendel
@ggendel
I've added functionality in my Foxx app to purge the object's history when I "obliterate" an object in my database. The "obliterate" operation is only enabled during regression testing and is used to selectively clean things up (restricted to the test objects) before starting each test. It took a couple of iterations to figure out the right approach to get all the right items in the collections and nothing more, but this was the last piece I needed before I could put it on Jenkins for nightly regression testing of my code branch.
Aditya Mukhopadhyay
@adityamukho
Awesome! If you want, I can take a look at your queries to make sure you're catching everything.
Gary Gendel
@ggendel
Cool. The process is relatively straightforward.
  • I get the skeleton vertex and edge_hubs connected to the object (v,e,p IN 2..2) and delete the skeleton edges and vertex. This gives me a list of the objects that will be obliterated.
  • I collect all of the events associated with those objects.
  • From this list, I delete all associated snapshot information (event_snapshot_links, events, commands, snapshot_links, and snapshots).
  • From this list, I delete the associated events.
  • Finally, I delete any associated commands.
Gary Gendel
@ggendel
If I delete all objects, what remains is only the root origin document. My testing shows that I only obliterate the information for the specific objects and leave everything else untouched. The list of relevant snapshot information is found by looking at the event_snapshot_links _to field to match the hub and vertex objects and returning the _from field.
Gary Gendel
@ggendel
Here is a run with debug output, deleting an object with all of its snapshots:
  • Obliterating current database
    SUCCESS: Obliterated everything!
  • creating path to delete
    pmsite/20989796
    pmproject/20989808
    pmlibspec/20989826
    pmvariant/20989838
    pmlibtype/20989857
    pmlibrary/20989884
    pmlibrary/20989909
  • Adding snapshots
    pmlibrary/20989909
    pmlibrary/20989909
    pmlibrary/20989909
    pmlibrary/20989909
    pmlibrary/20989909
    2020-04-29T13:21:19.771Z: WITH pmhist_skeleton_edge_hubs,pmhist_skeleton_edge_spokes,pmhist_skeleton_vertices,pmhist_snapshot_links FOR v,e,p IN 2..2 ANY "pmhist_skeleton_vertices/pmlibrary.20989909" pmhist_skeleton_edge_spokes RETURN {edges:p.edges[*]._id,vertices:[p.vertices[0]._id,p.vertices[1]._id]}
    2020-04-29T13:21:19.771Z: FOR o IN pmhist_skeleton_vertices FILTER o._id IN ["pmhist_skeleton_vertices/pmlibrary.20989909"] REMOVE {_key:o._key} IN pmhist_skeleton_vertices
    2020-04-29T13:21:19.772Z: FOR o IN pmhist_skeleton_edge_spokes FILTER o._id IN ["pmhist_skeleton_edge_spokes/20989917","pmhist_skeleton_edge_spokes/20989916","pmhist_skeleton_edge_spokes/20989923","pmhist_skeleton_edge_spokes/20989922"] REMOVE {_key:o._key} IN pmhist_skeleton_edge_spokes
    2020-04-29T13:21:19.772Z: FOR o IN pmhist_skeleton_edge_hubs FILTER o._id IN ["pmhist_skeleton_edge_hubs/pm_type.20989913","pmhist_skeleton_edge_hubs/pm_child.20989919"] REMOVE {_key:o._key} IN pmhist_skeleton_edge_hubs
    2020-04-29T13:21:19.772Z: FOR v in pmhist_events FILTER v.meta.id IN ["pmlibrary/20989909","pm_type/20989913","pm_child/20989919"] RETURN v
    2020-04-29T13:21:19.773Z: FOR o IN pmhist_event_snapshot_links FILTER o._to IN ["pmhist_snapshots/origin-20989641","pmhist_snapshots/origin-20989706","pmhist_snapshots/origin-20989721","pmhist_snapshots/20989943"] REMOVE o IN pmhist_event_snapshot_links RETURN o._from
    2020-04-29T13:21:19.773Z: FOR o IN pmhist_events FILTER o._id IN ["pmhist_events/20989945","pmhist_events/origin-20989641","pmhist_events/origin-20989706","pmhist_events/origin-20989721"] REMOVE o IN pmhist_events
    2020-04-29T13:21:19.773Z: FOR o IN pmhist_commands FILTER o._to IN ["pmhist_events/20989945","pmhist_events/origin-20989641","pmhist_events/origin-20989706","pmhist_events/origin-20989721"] REMOVE o IN pmhist_commands
    2020-04-29T13:21:19.773Z: FOR o in pmhist_snapshot_links FILTER o._to IN ["pmhist_snapshots/origin-20989641","pmhist_snapshots/origin-20989706","pmhist_snapshots/origin-20989721","pmhist_snapshots/20989943"] REMOVE o IN pmhist_snapshot_links
    2020-04-29T13:21:19.774Z: FOR o in pmhist_snapshots FILTER o._id IN ["pmhist_snapshots/origin-20989641","pmhist_snapshots/origin-20989706","pmhist_snapshots/origin-20989721","pmhist_snapshots/20989943"] REMOVE o IN pmhist_snapshots
    2020-04-29T13:21:19.774Z: FOR o in pmhist_events FILTER o._id IN ["pmhist_events/20989910","pmhist_events/20989914","pmhist_events/20989920","pmhist_events/20989928","pmhist_events/20989933","pmhist_events/20989938","pmhist_events/20989945","pmhist_events/20989951"] REMOVE o IN pmhist_events
    2020-04-29T13:21:19.774Z: FOR o in pmhist_commands FILTER o._to IN ["pmhist_events/20989910","pmhist_events/20989914","pmhist_events/20989920","pmhist_events/20989928","pmhist_events/20989933","pmhist_events/20989938","pmhist_events/20989945","pmhist_events/20989951"] REMOVE o IN pmhist_commands
Aditya Mukhopadhyay
@adityamukho
Great! All 8 collections seem to be cleared of the relevant entries.
You don't need to remove the collection-specific origin events. They can be left intact. In fact, if there are any events belonging to a collection that you want to leave untouched, these origins must remain intact. Same goes for snapshot origins. Like collection-specific event origins, they are safe to remove ONLY when ALL documents for that collection are being purged.
Best to leave all origin events and origin snapshots untouched in all cases. They are not document-specific anyway.
Gary Gendel
@ggendel
From what I saw, they are event specific. I only delete them when the relevant event/command is removed. I originally left them, but then they were not reused.
Aditya Mukhopadhyay
@adityamukho
Ok, here is how the event tree is structured:
  1. At the top level there is a single root event from which everything is reachable as a descendant. This is called the super origin.
  2. Under this, the 2nd level consists of per-collection origins. All events for all documents of a collection are reachable from the origin event for that collection. These are the ones that I said should be left intact.
  3. From the 3rd level onwards is where document-specific events are recorded, i.e. the creates, updates and deletes. These are the ones that should be deleted during a purge.
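If you want to eyeball this tree, here is a rough (untested) sketch. It assumes your pmhist mount prefix, and that each child event hangs off its parent via a pmhist_commands edge, so the super origin is the one event with no inbound command edge:

    WITH pmhist_events, pmhist_commands
    FOR root IN pmhist_events
      LET inbound = (FOR c IN pmhist_commands FILTER c._to == root._id LIMIT 1 RETURN 1)
      FILTER LENGTH(inbound) == 0   // no inbound command edge => the super origin
      // depth 1 = per-collection origins (level 2 above), depth 2+ = document-specific events
      FOR v, e, p IN 1..3 OUTBOUND root pmhist_commands
        RETURN { depth: LENGTH(p.edges), event: v._key, subject: v.meta.id }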
Gary Gendel
@ggendel
Thanks. I'll validate that I don't remove collection origins.
Gary Gendel
@ggendel
I'm pretty happy with the result. It removes the collection origin only when the last event for that collection is removed. I have some tests to write and a couple of ArangoDB integration items to work through before I release it to the Applications Engineers and Marketing to check out.
Aditya Mukhopadhyay
@adityamukho
Oh in that case, it is safe to remove. Excited to see the pace at which things are moving ahead at your end!
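For good measure, here is a quick (untested) check you could run before dropping a collection origin, with the same caveats as before about the pmhist prefix and the assumed parent-to-child direction of the commands edges:

    FOR origin IN pmhist_events
      FILTER origin._key == 'origin-20989641'   // example key taken from your debug output
      LET children = (FOR c IN pmhist_commands FILTER c._from == origin._id LIMIT 1 RETURN 1)
      RETURN { origin: origin._id, safeToRemove: LENGTH(children) == 0 }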
Aditya Mukhopadhyay
@adityamukho

@/all The new documentation website is live!

Check it out at https://docs.recallgraph.tech/ and do leave your feedback!

Gary Gendel
@ggendel
Great start. How can we help garner more attention to your project?
Aditya Mukhopadhyay
@adityamukho
@ggendel Thank you! I'd be delighted to see more users onboarded. I'll start on the main website soon enough. When that is ready, there could be a section for testimonials, case studies, etc. In the meantime, any channel through which the word can be spread would be greatly beneficial, including LinkedIn, Twitter and, not least, good old word of mouth.
I'm also reaching out to potential individual, academic and enterprise users, particularly those who are using or are likely to use graph databases in their research/analytics/business operations. But this effort could use some scaling up, and without a doubt the word of a happy enterprise user carries a lot of weight - far more than a developer marketing his own creations ;)