These are chat archives for dgraph-io/dgraph

10th
Dec 2015
Manish R Jain
@manishrjain
Dec 10 2015 03:56
Hey @pires, let me know if you wish to discuss how gossip protocol based discovery would work.
I haven't dug too deep into how other open-source software does it, so your thoughts would be useful.
Paulo Pires
@pires
Dec 10 2015 09:43
@manishrjain it can either be an internal protocol or we can leverage something like serf.
Give me a few hours and I'll open the issue with more details.
Manish R Jain
@manishrjain
Dec 10 2015 10:57
sure.
Paulo Pires
@pires
Dec 10 2015 19:17
@manishrjain dgraph-io/dgraph#12
am inclined to adopt serf (as a library, ofc) since it provides not just discovery but cluster management as well: reactive updates, cluster messages, the kind of stuff that will definitely prove useful when dealing with the consensus stuff.
Manish R Jain
@manishrjain
Dec 10 2015 21:10
Nice, will read up about serf. I’d also like you to look at two more things: 1. CockroachDB, and see what they’re using for gossip: https://smazumder05.gitbooks.io/design-and-architecture-of-cockroachdb/content/architecture/node_allocation_via_gossip.html ; and 2. CoreOS’s etcd discovery system.
I think the biggest challenge here is for a new node to figure out that one machine which should always be up. Elasticsearch also does this, so it might be worth figuring out how they do it.
^ @pires
Paulo Pires
@pires
Dec 10 2015 21:15
@manishrjain i know etcd and am using it for a lot of stuff. discovery depends on a public registry, discovery.etcd.io.
serf is used by nomad and consul, both by Hashicorp.
one of its huge advantages @manishrjain is the network tomography feature.
Manish R Jain
@manishrjain
Dec 10 2015 21:20
yeah
Paulo Pires
@pires
Dec 10 2015 21:20
nodes learn over time which nodes are closer to themselves and this can improve latency. it would be huge if you think about sharding data.
never read anything related to graph sharding though
Manish R Jain
@manishrjain
Dec 10 2015 21:21
Agree about the sharding part. Also, CockroachDB does the same thing, per the link above.
Would we have to run something equivalent of etcd discovery for a new node to determine who to ping?
(I’m going to be a bit on and off, have a meeting in half an hour)
Paulo Pires
@pires
Dec 10 2015 21:23
@manishrjain one thing against the cockroach gossip is that it’s implemented for cockroach purposes, it seems. so it’s not designed to be agnostic for others to use. for example, it forces you to use their RPC stuff.
i don’t think adding etcd or any other k/v to the mix will help discovery.
in a cloud-environment it’s very hard to do zero-configuration unless some registry service is provided.
people nowadays use Kubernetes, Consul, etcd, etc. it should be up to them how to advertise the initial node.
multicast is forbidden in all the cloud providers i’m aware of.
Manish R Jain
@manishrjain
Dec 10 2015 21:32
“i don’t think adding etcd or any other k/v to the mix will help discovery.”
For sure. That’s not what I meant, actually.
I’m not clear about how a new node will find its way to other existing nodes, without having something running which has to be up all the time.
etcd discovery (not etcd the kv store itself) is that constant point of reference for etcd clusters. So, my question is, how do we run our protocol without having to run a discovery-like service?
“in a cloud-environment it’s very hard to do zero-configuration unless some registry service is provided.”
So, my suspicions were right. Some service has to be running, which has to be kept up.
Elasticsearch seems to do without it, based on their tutorials etc. Are they doing multicasting then? How do they do cluster detection on AWS?
@pires ^
Paulo Pires
@pires
Dec 10 2015 21:39
I do contribute to Elasticsearch discovery plug-ins, namely GCE and Kubernetes. it’s not magical.
multicast discovery feels like magic, but cloud providers forbid multicast traffic.
so again, you always need some sort of registry to get (at least initial) discovery information.
Manish R Jain
@manishrjain
Dec 10 2015 21:40
Got it.
Paulo Pires
@pires
Dec 10 2015 21:40
the power of gossip protocols is that you only need one address.
it’s P2P so eventually you are connected to the entire cluster or the max number of nodes you can connect to.
Manish R Jain
@manishrjain
Dec 10 2015 21:41
And how does ES do it when, say, it runs outside of GCE/Kubernetes?
Paulo Pires
@pires
Dec 10 2015 21:42
the default is multicast.
Manish R Jain
@manishrjain
Dec 10 2015 21:42
I see.
So, basically, we’ll need all these different mechanisms eventually.
Paulo Pires
@pires
Dec 10 2015 21:43
OrientDB defaults to multicast as well, because it relies on Hazelcast (a datagrid) for discovery, clustering and messaging/orchestration.
Manish R Jain
@manishrjain
Dec 10 2015 21:44
Alright, sounds good! I’ll summarize the conversation on the issue.
Might have a few more questions as I read about serf.
Thanks for your explanation!
Paulo Pires
@pires
Dec 10 2015 21:46
@manishrjain if i had to choose, i’d choose gossip.
multicast is only usable locally, and when you’re doing it locally you’re not really clustering.
Manish R Jain
@manishrjain
Dec 10 2015 21:49
Gossip is my choice as well. But to find your peers the first time, you’ll have to use multicast, right?
Paulo Pires
@pires
Dec 10 2015 21:49
you just need one peer.
multicast won’t work in cloud environments.
anyway, would be interesting to provide gossip and allow for custom discovery mechanisms.
Manish R Jain
@manishrjain
Dec 10 2015 21:51
Yeah, that one peer — if it has to be kept up and running all the time, then it’s a service by definition.
Paulo Pires
@pires
Dec 10 2015 21:51
plug-ins in go are still tricky, but it’s possible
no, it doesn’t need to be up & running all the time
Manish R Jain
@manishrjain
Dec 10 2015 21:51
Or, maybe you mean, every time a new node is brought up, we manually give it the address of one running peer.
Paulo Pires
@pires
Dec 10 2015 21:51
that node may disappear in the future, because all nodes will know about the cluster topology.
yes, you just need to provide an existing node IP to a new node and you’re done
that’s usually IT stuff
Manish R Jain
@manishrjain
Dec 10 2015 21:52
I see, that would work!
Paulo Pires
@pires
Dec 10 2015 21:52
that’s how you cluster Cassandra, for instance
@manishrjain again, we could implement gossip and allow for discovery plugins to be added.
Manish R Jain
@manishrjain
Dec 10 2015 21:52
Cassandra is interesting. They also use a common cluster name.
Paulo Pires
@pires
Dec 10 2015 21:53
gossip will be used for cluster management. well serf gives you more than gossip, but it’s gossip based.
cluster names are interesting, but take a look at OrientDB’s notion of a cluster.
Manish R Jain
@manishrjain
Dec 10 2015 21:53
Will do.
anyway, let’s see what others may add to this discussion
Manish R Jain
@manishrjain
Dec 10 2015 21:55
Alright. Will read up on these, and get back to you later today. Think I have a good understanding of what you have in mind.
Paulo Pires
@pires
Dec 10 2015 22:00
@manishrjain awesome. i have little idea about how to do graph databases, but i have some experience with networking/discovery/clusters. hope i can help :)
Manish R Jain
@manishrjain
Dec 10 2015 22:00
Yes, I’m counting on you for discovery :-).
Paulo Pires
@pires
Dec 10 2015 22:22
:+1: