These are chat archives for RBMHTechnology/eventuate

20th
Feb 2018
Volker Stampa
@volkerstampa
Feb 20 2018 07:53

To get a better understanding of the optimization, I quickly skimmed the paper "Making Operation-based CRDTs Operation-based". As far as I understand, the optimizations regarding causal stability would affect implementations in Eventuate like the versionedEntries in ORSet and could lead to more efficient storage. Is that correct? I actually wonder whether it makes sense to start with an implementation that does not introduce new messages between the locations, even if that comes with a potentially significant lag. To me this also seems to be the approach proposed in the paper (in section 4, after the definition of "Causal Stability").

> IMHO it is: stabilization is a property of the vector clocks used by the EventLog and it should be in the [core] module; CRDTs just use it for optimization.

I agree that causal stability is a property of the distributed log. However, I wonder whether some sort of application-specific configuration is required anyway, since in Eventuate the logs do not have a priori knowledge of the existing locations, which seems to be required to determine causal stability. I guess atm Eventuate could just determine causal stability for all locations it has already seen. What are your thoughts on this?
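
For reference, a minimal Python sketch of that last idea. A vector time is causally stable once every known location has reported a vector time that dominates it, so the stable point is the pointwise minimum over the latest vector times seen from each location. The names `stable_vector_time` and `known` are hypothetical and not part of Eventuate's API. A location that has never been seen does not enter the minimum at all, which is the a priori knowledge gap described above.

```python
from typing import Dict

VectorTime = Dict[str, int]  # location id -> logical time


def stable_vector_time(known: Dict[str, VectorTime]) -> VectorTime:
    """Pointwise minimum over the latest vector time observed from each known location.

    `known` is a hypothetical structure: location id -> last vector time seen from it.
    """
    if not known:
        return {}
    # Collect every process id mentioned in any observed vector time.
    processes = set()
    for vt in known.values():
        processes.update(vt)
    # A missing entry counts as 0, i.e. nothing seen from that process yet.
    return {p: min(vt.get(p, 0) for vt in known.values()) for p in sorted(processes)}


# Hypothetical observations at one location.
rtm = {
    "A": {"A": 3, "B": 2, "C": 1},
    "B": {"A": 2, "B": 4, "C": 1},
    "C": {"A": 1, "B": 2, "C": 5},
}
stable = stable_vector_time(rtm)

# If C had never been seen, its (smaller) view would be ignored and the
# result would be too optimistic with respect to C.
stable_without_c = stable_vector_time({"A": rtm["A"], "B": rtm["B"]})
```

With all three locations known, the stable entry for A is 1 because C has only seen A's first event; without C's row the same entry becomes 2.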

Gabriel Giussi
@gabrielgiussi
Feb 20 2018 16:49

> implementations in eventuate like the versionedEntries in ORSet and could lead to more efficient storage, is that correct?

I've implemented the AWSet (aka ORSet) as a pure op-based CRDT. This kind of CRDT uses the stable VectorTime (aka TCStable) to prune the POLog (basically the set of operations), so yes, it leads to more efficient in-memory storage (and with snapshots this should also lead to more efficient persistent storage).
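
A minimal Python sketch of this pruning, following the AWSet rules from the paper rather than the actual Eventuate implementation. The class and method names (`AWSet`, `stabilize`, `leq`) are hypothetical, and causal delivery of operations is assumed. Adds live in the POLog with their vector time; once an add is at or below the stable time, its timestamp is dropped and only the value is kept.

```python
from typing import Dict, Hashable, List, Set, Tuple

VectorTime = Dict[str, int]


def leq(a: VectorTime, b: VectorTime) -> bool:
    """True if a <= b pointwise, i.e. a causally precedes or equals b."""
    return all(t <= b.get(p, 0) for p, t in a.items())


class AWSet:
    """Add-wins set as a pure op-based CRDT (sketch, assumes causal delivery)."""

    def __init__(self) -> None:
        self.polog: List[Tuple[VectorTime, Hashable]] = []  # timestamped adds
        self.stable: Set[Hashable] = set()  # adds whose timestamp was pruned

    def _drop_preceding(self, value: Hashable, ts: VectorTime) -> None:
        # Stable adds precede every new operation, so they are always dropped.
        self.stable.discard(value)
        self.polog = [(t, v) for t, v in self.polog
                      if not (v == value and leq(t, ts))]

    def add(self, value: Hashable, ts: VectorTime) -> None:
        self._drop_preceding(value, ts)  # older adds of the value are redundant
        self.polog.append((ts, value))

    def remove(self, value: Hashable, ts: VectorTime) -> None:
        # Only adds that causally precede the remove go away; concurrent adds win.
        self._drop_preceding(value, ts)

    def stabilize(self, tc_stable: VectorTime) -> None:
        """Move adds at or below the stable vector time out of the POLog."""
        remaining = []
        for t, v in self.polog:
            if leq(t, tc_stable):
                self.stable.add(v)
            else:
                remaining.append((t, v))
        self.polog = remaining

    def value(self) -> Set[Hashable]:
        return self.stable | {v for _, v in self.polog}


s = AWSet()
s.add("x", {"A": 1})
s.add("x", {"A": 2})                 # supersedes the first add
s.remove("x", {"A": 1, "B": 1})      # concurrent with the add at A:2, so the add wins
s.add("y", {"A": 2, "B": 2})
s.stabilize({"A": 2, "B": 1})        # the add of "x" is stable, the add of "y" is not
```

After `stabilize`, the POLog holds only the entry for "y", while "x" is stored without a timestamp.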

> I actually wonder if it makes sense to start with an implementation that does not introduce new messages between the locations even if that comes with a potentially significant lag

I agree, as long as we document this lag behavior. So should I go back to using ReplicationRead and ReplicationWrites to update the RTM? Or would you prefer a less intrusive implementation?

> However I wonder if some sort of application specific configuration is anyways required as in Eventuate the logs do not have a priori knowledge about the existing locations which seems to be required to determine causal stability

This is correct. Stabilization requires knowing all the partitions of an EventLog. I've already implemented and tested a solution for this: it requires a couple of messages from the Connector while it is creating the corresponding Replicators, and after these initial messages it uses replicated events to resolve the set of partitions.
My current implementation uses the eventuate.endpoint.connections setting (this setting is always required, isn't it?).
The only "issue" with my current implementation concerns changes in the cluster, e.g. an added or removed connection endpoint, but is it safe to do this in Eventuate? In that case I guess I could check the ApplicationId and ask the user to change it whenever an endpoint is added or removed (too restrictive?).