These are chat archives for RBMHTechnology/eventuate

21st
Aug 2016
Volker Stampa
@volkerstampa
Aug 21 2016 10:17

@Tvaroh I thought a little bit about your application. I still think it is hard to come up with a concrete proposal based on what you have written, but I will try to give a few hints that might be worth thinking about.

As far as I can tell, you are basically struggling with two things: how to use a Neo4j-backed write model, and how to automatically resolve conflicts in case of concurrent and conflicting updates.

First of all, eventuate currently does not really support persistent write models. So if you still want to use Neo4j as your write model and eventuate to distribute the changes to your Neo4j in the form of events, you could perform your command validation and database updates outside of eventuate and persist eventuate events based on the applied changes. You typically want to do this in a transactionally safe manner, i.e. write corresponding events for each successful transaction (even if your application crashes right after a successful commit), but do not write events for failed transactions. If Neo4j allows access to its transaction logs, it might be possible to read the changes from there and persist corresponding events along with an event that indicates which transaction has been processed successfully, so the application knows where to continue after a restart. (This is basically the CDC Martin mentioned above.)
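To make the checkpoint idea a bit more concrete, here is a minimal sketch in plain Python. It does not use the eventuate or Neo4j APIs: `EventLog`, `tail_transactions`, the `TxProcessed` marker event and the shape of the transaction log entries are all made up for illustration, and it assumes that a batch of events can be written atomically.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Hypothetical append-only event log (stand-in, not the eventuate API)."""
    events: list = field(default_factory=list)

    def append_batch(self, batch):
        # Assumption: a batch is written atomically (all or nothing).
        self.events.extend(batch)

    def last_processed_tx(self):
        # Recover the checkpoint by looking for the latest marker event.
        for event in reversed(self.events):
            if event["type"] == "TxProcessed":
                return event["tx_id"]
        return 0


def tail_transactions(tx_log, event_log):
    """Turn committed transactions after the checkpoint into events."""
    checkpoint = event_log.last_processed_tx()
    for tx in tx_log:
        if tx["tx_id"] <= checkpoint or not tx["committed"]:
            continue  # already processed, or failed: write no events
        batch = [{"type": "Changed", "tx_id": tx["tx_id"], "change": change}
                 for change in tx["changes"]]
        # The marker travels in the same batch as the changes it covers.
        batch.append({"type": "TxProcessed", "tx_id": tx["tx_id"]})
        event_log.append_batch(batch)
```

Because the marker is written together with the change events, running `tail_transactions` again after a crash or restart skips everything up to the last marker instead of writing duplicates.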

An alternative is to use eventuate for initial command processing, but then your write model must be in-memory. In that case you might want to rethink what is actually needed to validate your commands. Maybe you do not need the entire graph for this, but just some aspects of it. Maybe you can partition your data and use aggregateIds in your EventsourcedActors, for example using user-ids as aggregateIds. If your application allows this, you could validate your commands based on this in-memory write model and use an EventsourcedWriter to populate your Neo4j as a query database.
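A rough sketch of that split, again in plain Python rather than eventuate's actual actor API: `UserAggregate`, `QueryProjector`, `process` and the `AddNode` command are hypothetical names, and the validation rule (no duplicate node names per user) is only an example of "some aspect of the graph" that might be enough to validate a command.

```python
from collections import defaultdict


class UserAggregate:
    """In-memory write model for one user, i.e. one aggregate id."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.node_names = set()  # only the state needed for validation

    def handle(self, command):
        """Validate a command and return the resulting event."""
        if command["type"] != "AddNode":
            raise ValueError("unknown command")
        if command["name"] in self.node_names:
            raise ValueError("node already exists")
        return {"type": "NodeAdded", "user_id": self.user_id,
                "name": command["name"]}

    def apply(self, event):
        """Update the in-memory state from a persisted event."""
        if event["type"] == "NodeAdded":
            self.node_names.add(event["name"])


class QueryProjector:
    """Stand-in for the writer that populates the query database."""

    def __init__(self):
        self.nodes_by_user = defaultdict(list)

    def project(self, event):
        if event["type"] == "NodeAdded":
            self.nodes_by_user[event["user_id"]].append(event["name"])


def process(aggregates, projector, user_id, command):
    """Route a command to its aggregate, then project the event."""
    aggregate = aggregates.setdefault(user_id, UserAggregate(user_id))
    event = aggregate.handle(command)  # raises if validation fails
    aggregate.apply(event)
    projector.project(event)
    return event
```

The point of the sketch is that each aggregate only keeps what it needs to say yes or no to a command, while the full graph lives in the query side.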

(From my point of view) one of the core features of eventuate is to detect concurrent updates and to allow resolving conflicts between such updates based on application-defined (i.e. high-level) events (as opposed to using generic (low-level) CRUD events on a table-row basis, for example). So it is up to the application to define and persist events in a form that allows conflicts to be resolved in the way that makes the most sense for the application. You basically have to build your application-specific CRDT. As far as I understand, your central data structure is a tree, so maybe it makes sense to read section 3.4 about Graphs in the paper "A comprehensive study of Convergent and Commutative Replicated Data Types" and build upon its ideas.
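As a starting point only, here is a small state-based graph sketch in Python, loosely following the idea of keeping one two-phase set (added / removed) for vertices and one for edges. It is written for illustration, is not taken verbatim from the paper and has nothing tree-specific in it; the class name `TwoPhaseGraph` and the rule that a removed vertex hides its edges are assumptions of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPhaseGraph:
    """Graph replica built from two-phase sets; removed items never return."""
    vertices_added: set = field(default_factory=set)
    vertices_removed: set = field(default_factory=set)
    edges_added: set = field(default_factory=set)
    edges_removed: set = field(default_factory=set)

    def has_vertex(self, v):
        return v in self.vertices_added and v not in self.vertices_removed

    def has_edge(self, u, v):
        # An edge is only visible while both of its endpoints are visible.
        return (self.has_vertex(u) and self.has_vertex(v)
                and (u, v) in self.edges_added
                and (u, v) not in self.edges_removed)

    def add_vertex(self, v):
        self.vertices_added.add(v)

    def remove_vertex(self, v):
        if not self.has_vertex(v):
            raise ValueError("unknown vertex")
        self.vertices_removed.add(v)

    def add_edge(self, u, v):
        if not (self.has_vertex(u) and self.has_vertex(v)):
            raise ValueError("both endpoints must exist")
        self.edges_added.add((u, v))

    def remove_edge(self, u, v):
        if not self.has_edge(u, v):
            raise ValueError("unknown edge")
        self.edges_removed.add((u, v))

    def merge(self, other):
        """Set union per component: commutative, associative, idempotent."""
        return TwoPhaseGraph(
            self.vertices_added | other.vertices_added,
            self.vertices_removed | other.vertices_removed,
            self.edges_added | other.edges_added,
            self.edges_removed | other.edges_removed,
        )
```

With this rule, if one replica adds an edge to a vertex while another replica concurrently removes that vertex, both replicas end up without the edge after merging, regardless of merge order. Whether "remove wins" is the right choice is exactly the kind of application-level decision meant above.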

Just my 2c. I hope this helps, but maybe someone else has better ideas. I assume in any case it is going to be hard to solve all the details through discussions in this forum :-)

Martin Grotzke
@magro
Aug 21 2016 14:15

@Tvaroh @volkerstampa Great thoughts! I also thought a bit about the application, but couldn't write my thoughts due to bad connectivity. Now here they are:

  • as I understand it the system shall remain available/writable if nodes get partitioned
  • if nodes get reconnected later, conflicts have to be detected and resolved - there need to be conflict resolution rules that lead to convergence between nodes (related to what Volker wrote regarding CRDTs)
    Events need to carry all information needed to perform db changes. Not sure whether there might be situations where events diverge from the db state so much that db changes cannot be applied from events during conflict resolution.
  • that means that every write could be invalidated later (depending on the conflict resolution / convergence rules some writes in fact might be considered "stable")
  • this behavior should be known by users, and a write should be confirmed accordingly (in terms of "your request is being processed")
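To illustrate the detection and convergence part of the list above, here is a minimal vector clock sketch in Python. It is not eventuate's implementation: `compare` and `resolve` are hypothetical helpers, clocks are plain dicts from node id to counter, and the tie-break by node id is just one example of a deterministic rule that every node can apply on its own.

```python
def compare(a, b):
    """Relate two vector clocks given as {node_id: counter} dicts."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither write has seen the other: a conflict


def resolve(x, y):
    """Pick the surviving version; x and y are dicts with 'clock' and 'node'."""
    order = compare(x["clock"], y["clock"])
    if order in ("before", "equal"):
        return y
    if order == "after":
        return x
    # Conflict: example rule only, the lower node id wins on every node.
    return min(x, y, key=lambda version: version["node"])
```

Since the rule does not depend on the order in which a node sees the two versions, reconnected nodes converge on the same winner. The losing write is the one that gets "invalidated later", which is why it should only be confirmed as "being processed".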

After all, to me it seems rather complicated to get this right/correct and robust. Perhaps it's worth evaluating an open source / free graph database that supports replication, e.g. OrientDB (multi-master, quorum-based strong consistency).