These are chat archives for RBMHTechnology/eventuate

6th Feb 2017
Mehdi Massoudi
@mehdimas
Feb 06 2017 14:14
Disaster Recovery.png

@krasserm: See comments below.

> > I want to use ephemeral storage for all application instances
>
> Can you please elaborate? Do you want to delete the LevelDB files and then recover them by replicating from the hub location?

Yes, the LevelDB files would be removed when a location is terminated, for example when a Docker container is terminated and a new container is created from an image.

> > If all locations are terminated ...
>
> Is this just a normal application shutdown or something else?

Most likely a normal (expected) shutdown. It isn't clear how to recover the hub locations when the entire system is shut down and then restarted. The hubs connect to both the DR location and the spokes. Since all directly connected locations must be available during recovery, the hubs can't recover from the spokes because the spokes haven't recovered yet.

> > I'm struggling with the recovery of hub locations ...
>
> Shouldn't the hub location recover from its Cassandra storage backend as it writes its log to Cassandra for durability?

Yes, the disaster recovery location can recover from Cassandra. The disaster recovery location should then replicate events to additional "hubs" in other regions before events are replicated to spoke locations.

See the diagram above.

Mehdi Massoudi
@mehdimas
Feb 06 2017 14:21
I think this relates to dynamically adding new connections to a replication endpoint. I need to be able to add the spoke connections to the hub replication endpoints after the hubs are recovered. Or I need to be able to recover from a subset of connected locations.
Martin Krasser
@krasserm
Feb 06 2017 16:06

@mehdimas thanks for clarifying. In the current Eventuate version, disaster recovery is only possible if the location to be recovered is connected to "healthy" locations only. Eventuate considers a disaster a rare event, affecting only a single location, and doesn't have special support for DR of whole location networks (as in your case).
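
To illustrate the constraint, here is a small toy model in Python. It is only a sketch, not Eventuate code: the topology, the location names, and the helpers `recoverable` and `recover_all` are hypothetical. The only rule it encodes is the one stated above, that a lost location can be recovered only if every directly connected location is healthy. It also assumes the DR location comes back on its own from its Cassandra log.

```python
# Toy model of the recovery precondition; this is NOT the Eventuate API.
# Hypothetical hub-and-spoke topology: DR location <-> hubs <-> spokes.
CONNECTIONS = {
    "dr": {"hub-a", "hub-b"},
    "hub-a": {"dr", "spoke-1", "spoke-2"},
    "hub-b": {"dr", "spoke-3"},
    "spoke-1": {"hub-a"},
    "spoke-2": {"hub-a"},
    "spoke-3": {"hub-b"},
}


def recoverable(location, healthy):
    """A lost location can recover only if all direct neighbours are healthy."""
    return CONNECTIONS[location] <= healthy


def recover_all(healthy):
    """Recover whatever can be recovered until no progress; return the healthy set."""
    healthy = set(healthy)
    progress = True
    while progress:
        progress = False
        for location in sorted(CONNECTIONS):
            if location not in healthy and recoverable(location, healthy):
                healthy.add(location)
                progress = True
    return healthy


# One lost location, everything else healthy: recovery succeeds.
single_loss = recover_all(set(CONNECTIONS) - {"hub-a"})

# Whole network restarted with ephemeral storage; only the DR location is
# assumed to be back. Hubs wait for spokes and spokes wait for hubs.
full_restart = recover_all({"dr"})

print(single_loss == set(CONNECTIONS))  # every location is healthy again
print(sorted(full_restart))             # nothing beyond the DR location recovers
```

In this model the full restart gets stuck for exactly the reason described in the question: each hub needs its spokes, and each spoke needs its hub.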

Why don't you write the LevelDB files to a volume so that they survive container restarts? This would make data loss a rather rare event. DR for whole location networks requires further changes to Eventuate. Please create a ticket if you'd like to have that in one of the next versions. As always, PRs are welcome :smile:

Adding new locations must be distinguished from recovering lost locations. A new location requires a new location id which adds an additional entry to the vector clocks. Frequently adding new locations by replacing others doesn't scale. Only disaster recovery allows for location id reuse. BTW, dynamic location addition will be supported in Eventuate 0.9.
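
The scaling concern can be sketched with a toy vector clock in Python. This is a simplified model, not Eventuate's vector time implementation: the clock is a plain dict from location id to event counter, and the ids and the `tick` helper are hypothetical. It compares replacing a container ten times under a fresh location id with recovering it ten times under the same id.

```python
# Toy vector clock: a dict from location id to a local event counter.
# Simplified model for illustration; NOT Eventuate's implementation.

def tick(clock, location_id):
    """Return a copy of `clock` with the counter for `location_id` incremented."""
    updated = dict(clock)
    updated[location_id] = updated.get(location_id, 0) + 1
    return updated


# Replace a container ten times, each time under a fresh (hypothetical) id.
fresh_ids = {}
for generation in range(10):
    fresh_ids = tick(fresh_ids, f"spoke-1-gen{generation}")

# Recover the same location ten times while reusing its id.
reused_id = {}
for _ in range(10):
    reused_id = tick(reused_id, "spoke-1")

print(len(fresh_ids), len(reused_id))  # prints "10 1"
```

Entries for retired ids are never removed in this model, so every replacement under a new id makes the clock one entry larger, while id reuse keeps it at a single entry.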