These are chat archives for atomix/atomix

17th
May 2016
Kevin Daly
@kedaly
May 17 2016 00:12
Nope that did not work (increasing timeout)
Jordan Halterman
@kuujo
May 17 2016 01:11
@kedaly I don't know why I didn't mention this, but if you set a breakpoint and it breaks then sessions will absolutely expire. Time doesn't stop when breakpoints happen, and session timeouts are roughly based on wall clock time. I usually use the debugger to log expressions rather than breaking.
There's not really any way to get around that since servers are independent of clients
I suppose I'm just too used to having to deal with that issue in debugging now
Ditto when debugging a server, setting a breakpoint on the leader will cause a leader change
You can probably set the session expiration incredibly high to get around it
How high did you set it?
Jordan Halterman
@kuujo
May 17 2016 01:17
Set it for an hour or something :-P there's no harm in that when debugging... Unless you're testing failover in DistributedGroup or that locks are released when a node crashes
...or if you're adding/removing nodes from the cluster a lot (clients only learn of new servers on keep-alives, and a longer session timeout requires fewer keep-alives)
Jordan Halterman
@kuujo
May 17 2016 03:19
I know I failed you :-(
Kevin Daly
@kedaly
May 17 2016 10:27
@kuujo I'm not sure how you set the timeout.. WithHeartBeatInterval? And as to failing me.. No problem :)
Jordan Halterman
@kuujo
May 17 2016 17:13
@kedaly if you're referring to the session timeout, it's withSessionTimeout
The session timeout is what dictates the amount of time a client has to send a keep-alive to the cluster. If a client doesn't send a keep-alive within a session timeout of its last keep-alive, its session can be expired
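The expiry rule described above can be sketched as a small, self-contained model. This is only an illustration under stated assumptions, not Atomix's implementation: `FakeClock`, `SessionRegistry`, and the timeout values are hypothetical names and numbers chosen for the example. It shows why a client paused at a breakpoint loses its session (the server's clock keeps running) and why a very long timeout avoids that while debugging.

```python
class FakeClock:
    """Manually advanced clock standing in for the server's wall clock (hypothetical)."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class SessionRegistry:
    """Hypothetical server-side bookkeeping of the last keep-alive per client."""

    def __init__(self, clock, session_timeout):
        self.clock = clock
        self.session_timeout = session_timeout
        self.last_keep_alive = {}

    def keep_alive(self, client_id):
        # Record the server-side time at which the keep-alive arrived.
        self.last_keep_alive[client_id] = self.clock.now

    def is_expired(self, client_id):
        # A session can be expired once more than one session timeout has
        # elapsed since the client's last keep-alive.
        elapsed = self.clock.now - self.last_keep_alive[client_id]
        return elapsed > self.session_timeout


def survives_pause(session_timeout, pause_seconds):
    """Return True if a session outlives a client pause of the given length."""
    clock = FakeClock()
    registry = SessionRegistry(clock, session_timeout)
    registry.keep_alive("client-1")
    # The client stops at a breakpoint; the server's clock keeps advancing.
    clock.advance(pause_seconds)
    return not registry.is_expired("client-1")


if __name__ == "__main__":
    # Assumed values: a 5 second timeout versus a 1 hour timeout, 60 second pause.
    print(survives_pause(session_timeout=5.0, pause_seconds=60.0))
    print(survives_pause(session_timeout=3600.0, pause_seconds=60.0))
```

The clock is injected so the behaviour can be checked without real waiting. A real server would read its own wall clock, which is why pausing the client in a debugger does not stop the timeout from running.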
Kevin Daly
@kedaly
May 17 2016 17:50
I was the first person to post in the Google Group. How new is this project? Is there anything I can do to help? Seems quite useful
Roman Pearah
@neverfox
May 17 2016 17:51
I think the gitter is the primary forum at this point.
Jordan Halterman
@kuujo
May 17 2016 17:51
this project has been in development for a few years… we had another Google Group in the past that we replaced with the Atomix one. But since we started chat people have heavily preferred this
look at the chat history there’s a ton of conversation :-) I like it better on chat anyways
but in terms of maturity it’s still young IMO and you should share your experience!
Kevin Daly
@kedaly
May 17 2016 17:52
Ok just don't want to overwhelm you guys with questions.. Although I don't have many.. This thing seems to be much better than Zookeeper for my uses
We're building a distributed app server on top of Vertx, Apache Ignite and Atomix.. Atomix is being used for configuration and discovery.
Kevin Daly
@kedaly
May 17 2016 17:58
Actually without disrespecting Zookeeper, this just simply seems better :)
Roman Pearah
@neverfox
May 17 2016 17:59
That system sounds awesome. Lots of best-of-breed stuff there.
And I don't even know what it does! lol
Kevin Daly
@kedaly
May 17 2016 18:00
Well it's for data scientists to write services to do Machine Learning without worrying about where data comes from..
They just simply write a "Servlet" and talk to an API and it all just magically will do the right things.
It's being used to replace a Hadoop Cluster.
Jordan Halterman
@kuujo
May 17 2016 18:03
That's awesome! I understand. It was designed to be better than ZooKeeper for many uses so that's good :-) ZooKeeper is obviously a massively popular project that's incredibly stable and proven, but its wide use became a problem at my company: it has such a poorly designed API, yet you end up needing to maintain a ZooKeeper cluster anyway because so many projects depend on it. Consensus has never been really accessible from a programmatic perspective. Atomix/Copycat is intended to make it accessible.
Kevin Daly
@kedaly
May 17 2016 18:04
If you guys know the Netflix stack, think something like that.. Discovery, Service Proxies, Microservices, Distributed REST APIs. We're just getting started on it.. Build your services and processes, point them at the "Master Cluster" based on Atomix and everything will find itself and connect..
Ironically we're embedding ZK in our project and using Atomix to configure and deploy it bc we will be using Kafka eventually as other parts of our legacy are Kafka dependent.. The point will be though that we want to make Zookeeper completely invisible to the developers. I've considered re-writing parts of Kafka so that we won't need Zookeeper, but for now it's a little ambitious.
Roman Pearah
@neverfox
May 17 2016 18:21
Yeah, kafka is our main reason for having zookeeper right now.
As for the clustering stuff, we get a lot of that functionality with Kubernetes
Jordan Halterman
@kuujo
May 17 2016 18:22
yeah Kafka was our reason for needing ZooKeeper too
I also work at a data science company
Roman Pearah
@neverfox
May 17 2016 18:24
Would be nice to see Kafka embrace an interface that abstracts away the zookeeper dependency.
Jordan Halterman
@kuujo
May 17 2016 18:24
I’ve been talking to them about it… they are definitely going that way, but they said it likely won’t be for another year or so
they have to rewrite the controller first
Kevin Daly
@kedaly
May 17 2016 18:25
I'm not sold on containers, we might use Kubernetes and are targeting some sort of docker based solution.. But for the big data stuff it's been problematic.. If you've ever used Cloudera, we're looking at that idea where you define a node role in a cluster and it does what it does.. If that makes sense.
Roman Pearah
@neverfox
May 17 2016 18:25
that's good to know
it does
but the advantages of containers have been many for us, especially with CI/CD
but it can be tricky for distributed systems
zookeeper in docker isn't trivial
but once it works it works
Kevin Daly
@kedaly
May 17 2016 18:30
everyone is trying to stuff everything into containers.. it's kind of the latest fashion.. but the control plane for the services is outside looking in, where our design is inside looking out.. That being said, we are probably going to use containers.. but the networking sucks badly.. it's a problem we will have to overcome. Plus our Ignite servers use and need all the memory on a node.. So if you have 64GB one instance of Ignite will use all of that! So to run the database in containers makes little sense, as well as controlling replica policies... One team (not here) I worked with launched replicas in containers, unfortunately all of one shard launched on the same physical node, so when it crashed and burned, they were SOL.
Roman Pearah
@neverfox
May 17 2016 18:44
What do you mean by "outside looking in" vs "inside looking out"?
Jordan Halterman
@kuujo
May 17 2016 18:49
I have loved my limited experience with containers, but yeah the networking is complicated
Kevin Daly
@kedaly
May 17 2016 19:06
I mean that right now orchestration is from the outside of the cluster looking in.. Monitoring tools and automation tools such as Chef and Puppet control the deployment and configuration in a lot of systems.. Wouldn't it be nice if a cluster was "aware" of its environment and could scale up and down automatically and configure itself automatically?
Of course within limits..
Nobody in my company likes when I say this, but Chef and Puppet are treatments for a disease, not cures, they are for deploying things that are not designed for the cloud.. or badly designed for the cloud.. A "cloud aware" system should handle everything except seeding itself..
Roman Pearah
@neverfox
May 17 2016 19:45
I see
I think something like that could be possible in a system like Kubernetes. In fact, we launched a Cassandra cluster that essentially formed the ring by using the Kube API from inside of the containers.
You could scale up and down without needing to configure anything further about the existing nodes.
I'm more familiar with Ansible, but I know what you mean.
Kevin Daly
@kedaly
May 17 2016 19:48
I don't want to sound anti-container or anti automation, just in my case I want to do something different.. and @neverfox yes, Kubernetes might be part of that. We're early.. But part of the design is that it might be multi-cloud, and maybe can't use Kubernetes..
Roman Pearah
@neverfox
May 17 2016 19:48
Yeah, it's still in its infancy.
I really appreciate your ideas though and I don't take it that way.
It helps me to relate something concrete I've touched to what you're suggesting.
@kedaly I seem to be hearing a lot of this "leaving Hadoop" talk. Is that a real trend? Has it served its purpose in bootstrapping the "big data" trend and now people are moving on to better tech?
Kevin Daly
@kedaly
May 17 2016 19:52
Ok now I'm going to be a little more controversial :) In the puppet/chef world, it's a very "IT" view of the world and works in a very "IT" way.. So the IT Directors and Devops guys have the control of the solution and the Engineers just participate.. In my case I'm not sure if that is the right approach.. I'm trying to see if we can make the solution more in the control of Engineering if that makes sense. I'm trying to make the application development painless for the engineers..
Roman Pearah
@neverfox
May 17 2016 19:52
That makes a lot of sense (being an engineer).
We don't have DevOps people and have to play both roles, so we tend to find solutions that make our lives easier.
Our ideal is that a git push and an environment flag should get things to the QA people, rinse and repeat.
Kevin Daly
@kedaly
May 17 2016 19:55
@neverfox I think there are solid use cases for Hadoop, but for most companies, they just jumped on it without understanding it well.. Sort of "we need a big data project".. It's a very "job" oriented, batch way of thinking.. As the business needs more real time data, you run into limitations.. Streaming and Lambda architectures are becoming a better solution.. Or in our case we are going to do it all in memory (or at least NVMe) using Ignite.
Roman Pearah
@neverfox
May 17 2016 19:56
I just caught wind of Ignite myself and was very intrigued.
Kevin Daly
@kedaly
May 17 2016 19:56
It seems well thought out.
By the way, God help me if our devops guys read this Gitter thread :)
Roman Pearah
@neverfox
May 17 2016 19:57
It's one of those things, though, that when you see it, you're all "oh crap" at the sheer prospect of absorbing the concepts.
So much tech, so little time.
I think Gitter detects and auto-bans DevOps ;)
Kevin Daly
@kedaly
May 17 2016 19:59
I hope.. I just find it interesting that they always seem to have more control than the Engineering group who are actually designing the solution..
Roman Pearah
@neverfox
May 17 2016 19:59
We can't have that.
Kevin Daly
@kedaly
May 17 2016 20:00
I understand that they have to run it and cost it.. but I always have a spreadsheet attached with my designs that specifically spells out EXACTLY what it will cost.. and all of the SLAs that they can "buy" with more money or less money.
Roman Pearah
@neverfox
May 17 2016 20:00
That's smart.
Jordan Halterman
@kuujo
May 17 2016 20:14
We also recently entirely abandoned Hadoop
For Spark that is
I was sort of against that decision. My experience with Spark has not been that great. In my experience, it's conceptually awesome, but in practice on real big data it still struggles a lot where Hadoop doesn't, and those are the problems I'm solving now
But we teach data science courses that involve Spark, so I suppose we can't not do Spark. So instead, we're solving some of Spark's problems to make it more accessible to data scientists who are not engineers. In theory, it's already supposed to be accessible, but in practice it's more complicated than that
But Hadoop will be abandoned as more of these types of systems become stable
Kevin Daly
@kedaly
May 17 2016 21:52
You can use Ignite as a DS for Spark streaming