These are chat archives for RBMHTechnology/eventuate

2nd Jan 2016
Alexander Semenov
@Tvaroh
Jan 02 2016 10:38 UTC
@krasserm if I use a Cassandra cluster as a storage backend for multiple servers running Eventuate, then I probably do not want the eventuate keyspace to be used by multiple servers, right? I.e. I need a different eventuate keyspace for each server?
Martin Krasser
@krasserm
Jan 02 2016 11:56 UTC
What are these servers running? Replicas of the same application? Different applications that collaborate via events? Independent applications? What are their availability requirements? How do these servers map to Eventuate locations?
Alexander Semenov
@Tvaroh
Jan 02 2016 11:57 UTC
yes, the servers are just replicas, kind of an HA setup
if one is not available, a request is routed to another one
within a single dc
Alexander Semenov
@Tvaroh
Jan 02 2016 12:04 UTC
I have a set of servers app-1, app-2, app-3 running the same app and serving websocket connections. On each server's machine I have a local Cassandra running. Those Cassandra servers form a cluster. As I understand it, each app server should have its own Cassandra-backed event log, i.e. no Cassandra clustering setup is required. But at the same time I want to use the same Cassandra instances as a cluster to store some data, e.g. session ids.
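(A minimal sketch of how such a setup could map onto Eventuate, assuming the standard ReplicationEndpoint / CassandraEventLog API; hostnames, ports and the object name App1 are illustrative. Each app server becomes its own location with a Cassandra-backed log and replicates events with the other two:)

import akka.actor.ActorSystem
import com.rbmhtechnology.eventuate.{ReplicationConnection, ReplicationEndpoint}
import com.rbmhtechnology.eventuate.log.cassandra.CassandraEventLog

object App1 extends App {
  implicit val system: ActorSystem =
    ActorSystem(ReplicationConnection.DefaultRemoteSystemName)

  // One Eventuate location per app server, each with its own Cassandra-backed
  // event log, replicating events with the other two servers.
  val endpoint = new ReplicationEndpoint(
    id = "app-1",
    logNames = Set("L"),
    logFactory = logId => CassandraEventLog.props(logId),
    connections = Set(
      ReplicationConnection("app-2", 2552),
      ReplicationConnection("app-3", 2552)))

  endpoint.activate()

  // Event-sourced actors of this server read from / write to the local log.
  val eventLog = endpoint.logs("L")
}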
Alexander Semenov
@Tvaroh
Jan 02 2016 12:12 UTC

Currently, with

CREATE KEYSPACE eventuate WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;

I'm getting the same keyspace used on each app server. So, I probably need to give it a different name for each server, like eventuate-1, eventuate-2, and so on.

I'm just asking whether I understand this correctly. :)
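(A minimal sketch of the per-server keyspace idea, assuming the eventuate.log.cassandra.keyspace key from Eventuate's reference.conf; the APP_SERVER_ID environment variable is hypothetical. Note that Cassandra keyspace names only allow alphanumerics and underscores, so eventuate_1 rather than eventuate-1:)

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object KeyspacePerServer extends App {
  // Hypothetical env var selecting the server id (1, 2, 3, ...).
  val serverId = sys.env.getOrElse("APP_SERVER_ID", "1")

  // Config key assumed from Eventuate's reference.conf; verify against your version.
  val config = ConfigFactory.parseString(
    s"eventuate.log.cassandra.keyspace = eventuate_$serverId")
    .withFallback(ConfigFactory.load())

  implicit val system = ActorSystem("location", config)
}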
Martin Krasser
@krasserm
Jan 02 2016 12:32 UTC
You're sharing a Cassandra cluster across Eventuate locations then. Hence you limit the availability of your servers to that of the Cassandra cluster (Eventuate makes quorum reads/writes to that cluster). This is not the intent of locations. Locations are availability zones and need a storage backend whose availability is independent of that of other locations. Furthermore, you'd replicate twice to the same cluster, first on the Cassandra level and second on the Eventuate level, which doesn't make sense. Usually, within a DC you can use a master-slave setup for your servers. This also gives you HA. Alternatively, give each server its own independent storage backend. I'll demonstrate the latter with LevelDB in an Akka Cluster (supporting dynamic replication network changes) once Eventuate 0.5 is out.
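(A minimal sketch of the "independent storage backend per server" alternative, assuming the LeveldbEventLog API; same illustrative topology as in the earlier sketch:)

import akka.actor.ActorSystem
import com.rbmhtechnology.eventuate.{ReplicationConnection, ReplicationEndpoint}
import com.rbmhtechnology.eventuate.log.leveldb.LeveldbEventLog

object App1WithLeveldb extends App {
  implicit val system = ActorSystem(ReplicationConnection.DefaultRemoteSystemName)

  // Same replication topology as before, but each server writes to its own
  // local LevelDB directory, so one location's availability does not depend
  // on a storage backend shared with the other locations.
  val endpoint = new ReplicationEndpoint(
    id = "app-1",
    logNames = Set("L"),
    logFactory = logId => LeveldbEventLog.props(logId),
    connections = Set(
      ReplicationConnection("app-2", 2552),
      ReplicationConnection("app-3", 2552)))

  endpoint.activate()
}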
Alexander Semenov
@Tvaroh
Jan 02 2016 12:36 UTC
wow thanks, can you elaborate on how a master-slave setup could solve this?
looks like I can't use the existing Cassandra cluster for the locations story, right?
Martin Krasser
@krasserm
Jan 02 2016 12:53 UTC
You could use it if you don't care about the limited availability of your locations. In that case, use a separate keyspace for each location, but you just generate more replication overhead. With a master-slave setup, all servers share a Cassandra backend and you direct all writes to the application master. If the master goes down, a slave recovers from the logs in Cassandra and becomes the new master. Use whatever leader-election utility you want to set up a master-slave cluster.
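(One possible way to wire up such a master-slave cluster, sketched with Akka Cluster Singleton as the leader-election utility; MasterWriter is a hypothetical application actor through which all writes go:)

import akka.actor.{Actor, ActorSystem, PoisonPill, Props}
import akka.cluster.singleton.{
  ClusterSingletonManager, ClusterSingletonManagerSettings,
  ClusterSingletonProxy, ClusterSingletonProxySettings}

// Hypothetical application actor that performs all event-log writes.
class MasterWriter extends Actor {
  def receive = {
    case command => // validate and write to the shared Cassandra-backed log
  }
}

object MasterSlaveSetup extends App {
  val system = ActorSystem("app")

  // Exactly one MasterWriter runs in the cluster; if its node goes down,
  // another node starts the singleton, recovers state from the logs in
  // Cassandra and becomes the new master.
  system.actorOf(
    ClusterSingletonManager.props(
      singletonProps = Props[MasterWriter],
      terminationMessage = PoisonPill,
      settings = ClusterSingletonManagerSettings(system)),
    name = "master")

  // Every server directs its writes to the current master via this proxy.
  val master = system.actorOf(
    ClusterSingletonProxy.props(
      singletonManagerPath = "/user/master",
      settings = ClusterSingletonProxySettings(system)),
    name = "masterProxy")
}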
Martin Krasser
@krasserm
Jan 02 2016 13:00 UTC
Access to storage backends within a location requires strong consistency. Writes to different locations give you causal consistency (favoring availability). Sharing a storage backend across locations would therefore subvert the availability of locations.
(Typing on my phone ...)
Alexander Semenov
@Tvaroh
Jan 02 2016 13:02 UTC
yep, I don't want that replication overhead. Replication on the logical level (events at locations) is enough.
thanks for the suggestions, I will also consider using LevelDB as a storage backend for events
Martin Krasser
@krasserm
Jan 02 2016 16:46 UTC
@Tvaroh another alternative for avoiding the replication overhead in the scenario you described is to use 'replication_factor': '1' when creating the keyspace. In this case, events are not replicated in Cassandra and replication takes place only on the Eventuate level (between locations). This is comparable to writing events to a local LevelDB, which isn't replicated either. However, when using Cassandra with 'replication_factor': '1', event logs are still partitioned across Cassandra nodes (= scalability with data volume), which is not the case when you use LevelDB. So, if you don't need stronger durability guarantees (i.e. replication factor > 1), you can try using the setup you initially described (with a different keyspace name for each application server). Even if a Cassandra node crashes without being able to recover the events stored on it, you can still recover most of them from other Eventuate locations using disaster recovery, optionally in combination with a Cassandra backup. In this case, you'd recover all events that have already been successfully replicated to other Eventuate locations (and lose only those that are inside the replication latency window, which may be as small as a few milliseconds, depending on the setup details).
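(A minimal sketch of this combination, assuming the eventuate.log.cassandra.replication-factor setting from Eventuate's reference.conf and ReplicationEndpoint.recover() as the disaster-recovery entry point; names and ports are illustrative as before:)

import akka.actor.ActorSystem
import com.rbmhtechnology.eventuate.{ReplicationConnection, ReplicationEndpoint}
import com.rbmhtechnology.eventuate.log.cassandra.CassandraEventLog
import com.typesafe.config.ConfigFactory

object App1Rf1 extends App {
  // Distinct keyspace per server and RF = 1: events are not replicated inside
  // Cassandra; replication happens only on the Eventuate level, between
  // locations (config keys assumed from Eventuate's reference.conf).
  val config = ConfigFactory.parseString(
    """
      |eventuate.log.cassandra.keyspace = eventuate_1
      |eventuate.log.cassandra.replication-factor = 1
    """.stripMargin).withFallback(ConfigFactory.load())

  implicit val system = ActorSystem(ReplicationConnection.DefaultRemoteSystemName, config)

  val endpoint = new ReplicationEndpoint(
    id = "app-1",
    logNames = Set("L"),
    logFactory = logId => CassandraEventLog.props(logId),
    connections = Set(
      ReplicationConnection("app-2", 2552),
      ReplicationConnection("app-3", 2552)))

  // Normal startup would call endpoint.activate(). After losing a Cassandra
  // node's data, recover already replicated events from the other locations
  // instead (returns a Future that completes when recovery is done).
  endpoint.recover()
}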
Martin Krasser
@krasserm
Jan 02 2016 16:59 UTC
@Tvaroh In my previous answers, I assumed that the only reason for using a Cassandra storage backend is to achieve strong(er) durability guarantees (using a replication factor > 1 in combination with QUORUM writes, for example), which is why I actually developed the driver for it. If you don't require strong durability, your setup can still make sense, as you can scale with data volume by adding Cassandra nodes. Hope that helps.
Alexander Semenov
@Tvaroh
Jan 02 2016 17:32 UTC
Thank you. I suspect that with a replication factor of 1 the data is still not guaranteed to be on a single node though...
Martin Krasser
@krasserm
Jan 02 2016 18:31 UTC
Why not? It should be.
I mean, a single event will only be on a single node.
Martin Krasser
@krasserm
Jan 02 2016 18:37 UTC
... per location, of course