These are chat archives for atomix/atomix

6th
Mar 2016
Richard Pijnenburg
@electrical
Mar 06 2016 00:01
I'm off to grab some sleep. @kuujo, feel free to ping me here or in the repo about your idea.
Jordan Halterman
@kuujo
Mar 06 2016 08:42
I was at a UFC party at work but I’m back now :-P
err… was at lunch and then a UFC party
Richard Pijnenburg
@electrical
Mar 06 2016 08:44
Haha fair enough :) these things need to happen too :)
Jordan Halterman
@kuujo
Mar 06 2016 08:47
:beers:
Richard Pijnenburg
@electrical
Mar 06 2016 08:48
parties are always good :-)
Had drinks all 4 days at my new job, so it started well too :p
Jordan Halterman
@kuujo
Mar 06 2016 08:49
New office has an 18 foot screen. They say it’s for the DataScience school but I think it’s really for sporting events :-P
haha
Richard Pijnenburg
@electrical
Mar 06 2016 08:49
Hahahah so true yeah :D
Jordan Halterman
@kuujo
Mar 06 2016 08:50
Anyways… I will elaborate on what I was talking about soon. Gotta finish hacking on some stuff
Richard Pijnenburg
@electrical
Mar 06 2016 08:50
No worries bud :-) Looking forward to your idea
I’ve been trying to come up with some ideas for the project, but my lack of Java knowledge makes it hard to form an idea of how some components should work :-(
Jordan Halterman
@kuujo
Mar 06 2016 08:55
well… I think we have to define some goals here…
from what I know of Logstash (which I haven’t used in a few years), the general goal is shipping logs from point A to point B. My question is, what problem are we trying to solve here? Are we trying to increase the throughput of log shipping by scaling it across multiple nodes? Should we assume we’re shipping logs to ES specifically?
Richard Pijnenburg
@electrical
Mar 06 2016 08:57
I basically want to take Logstash from a single-node setup with very little persistence to a multi-node setup that reaches these goals:
Jordan Halterman
@kuujo
Mar 06 2016 08:58
gotcha… so here’s my question
Richard Pijnenburg
@electrical
Mar 06 2016 08:58
  1. Have more throughput by adding nodes
  2. Simplify management
  3. More dynamic setup
Jordan Halterman
@kuujo
Mar 06 2016 09:01
We were looking at DistributedTaskQueue to provide persistence/fault tolerance for data queued to go to the output. However, the thing is it’s not necessarily true that any persistence is needed. Typically (at least the way we’ve used LS) data is already persisted on the input (is that even true?). So, if the goal is to increase throughput then a persisted input can act as persistence, and output can be distributed among multiple nodes. What this would look like is a DistributedGroup on the input side that distributes writes to members of the group:
DistributedGroup group = atomix.getGroup("foo").get();
PartitionGroup partitions = group.partition(3);
partitions.partition("foo").connection().send("foo", "bar");
ugh
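The idea of a group on the input side that spreads writes across members can be illustrated without Atomix itself. This is a minimal sketch under stated assumptions: `HashPartitioner` and the member names are hypothetical and are not the Atomix `DistributedGroup`/`PartitionGroup` API. It only shows how a key can be hashed to a fixed partition and how partitions can be assigned to members.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for group partitioning; NOT the Atomix API.
// Keys hash to one of a fixed number of partitions, and partitions
// are assigned to members round-robin.
class HashPartitioner {
    private final List<String> members;
    private final int partitions;

    HashPartitioner(List<String> members, int partitions) {
        if (members.isEmpty() || partitions <= 0) {
            throw new IllegalArgumentException("need members and a positive partition count");
        }
        this.members = new ArrayList<>(members);
        this.partitions = partitions;
    }

    // Map a key to a partition index in [0, partitions).
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Map a partition index to the member that owns it.
    String memberFor(int partition) {
        return members.get(partition % members.size());
    }

    // The member that should receive a write for this key.
    String route(String key) {
        return memberFor(partitionFor(key));
    }
}
```

Because the partition is derived from the key, every write for the same key goes to the same member, which keeps per-key ordering as long as group membership doesn't change.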
Richard Pijnenburg
@electrical
Mar 06 2016 09:02
With LS nothing is persisted :-( That’s one of the big problems. You have to rely on an external system.
Jordan Halterman
@kuujo
Mar 06 2016 09:02
anyways, if data is persisted in the input, then even though a connection is a technically unreliable transport, retries can make it reliable if the output is idempotent
so, we can’t assume an input is persisted
alright… there’s still a solution for this :-)
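The point about retries is that an unreliable send behaves like a reliable one when the sender retries until it gets an acknowledgement and the output ignores duplicates. A minimal sketch, assuming every event carries a unique ID; `Transport`, `RetryingSender` and `IdempotentOutput` are hypothetical names, and the lossy network is simulated.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical transport: returns true only if an acknowledgement came back.
// A false result may mean the event was lost OR that only the ack was lost.
interface Transport {
    boolean deliver(String id, String payload);
}

// Hypothetical output that applies each event at most once, keyed by its ID.
class IdempotentOutput {
    private final Set<String> seen = new HashSet<>();
    private final List<String> written = new ArrayList<>();

    // Always acknowledges; duplicates are acknowledged but not written again.
    boolean write(String id, String payload) {
        if (seen.add(id)) {
            written.add(payload);
        }
        return true;
    }

    List<String> written() {
        return written;
    }
}

// Hypothetical sender that retries until acknowledged (at-least-once delivery).
class RetryingSender {
    private final Transport transport;
    private final int maxAttempts;

    RetryingSender(Transport transport, int maxAttempts) {
        this.transport = transport;
        this.maxAttempts = maxAttempts;
    }

    boolean send(String id, String payload) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (transport.deliver(id, payload)) {
                return true;
            }
        }
        return false;
    }
}
```

At-least-once delivery plus an idempotent output gives effectively-once results. In a real output the set of seen IDs would need to be bounded or persisted.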
Richard Pijnenburg
@electrical
Mar 06 2016 09:03
Even if you read from a queue like Redis or RabbitMQ, once it’s been read by Logstash you have a chance of losing it.
Jordan Halterman
@kuujo
Mar 06 2016 09:03
yeah
gotcha
Richard Pijnenburg
@electrical
Mar 06 2016 09:03
But another problem I’m also trying to solve is ease of management.
Jordan Halterman
@kuujo
Mar 06 2016 09:03
so, there’s still a solution for this that’s faster than DistributedTaskQueue was and still fault tolerant
Richard Pijnenburg
@electrical
Mar 06 2016 09:04
For example, if you have 10 nodes with config management and a plugin you only want to run once, like the IRC input, you have to add special statements to the config to make that happen.
I want those kinds of things to be handled automatically
by using the distributed lock
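"Run this plugin on only one node" maps naturally onto a lock: every node tries to acquire it, and only the holder starts the plugin. A minimal sketch; `ClusterLock` is a hypothetical interface standing in for a distributed lock, and `InMemoryLock` exists only so the example runs in a single JVM. It provides none of the fault tolerance a real distributed lock would.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a cluster-wide lock; not a real Atomix interface.
interface ClusterLock {
    boolean tryLock(String nodeId);
    void unlock(String nodeId);
}

// In-memory implementation for demonstration only (single JVM, no failure detection).
class InMemoryLock implements ClusterLock {
    private final AtomicReference<String> holder = new AtomicReference<>();

    public boolean tryLock(String nodeId) {
        return holder.compareAndSet(null, nodeId);
    }

    public void unlock(String nodeId) {
        holder.compareAndSet(nodeId, null); // only the holder can release
    }
}

// Every node runs one of these; only the lock holder runs the singleton input.
class SingletonPluginRunner {
    private final String nodeId;
    private final ClusterLock lock;
    private boolean running;

    SingletonPluginRunner(String nodeId, ClusterLock lock) {
        this.nodeId = nodeId;
        this.lock = lock;
    }

    // Returns true if this node is (now) running the plugin.
    boolean tryStart() {
        if (!running && lock.tryLock(nodeId)) {
            running = true; // a real runner would start the plugin here
        }
        return running;
    }

    void stop() {
        if (running) {
            running = false;
            lock.unlock(nodeId);
        }
    }
}
```

Here `stop()` stands in for the holder going away. A real distributed lock also has to release the lock when the holder crashes, typically through session expiry, so that a waiting node can take over.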
Jordan Halterman
@kuujo
Mar 06 2016 09:04
yeah
makes sense that’s awesome
Richard Pijnenburg
@electrical
Mar 06 2016 09:05
And in Raft I can store the last state, so that if the node that was running it dies or whatever, the plugin can be launched somewhere else with a known state.
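Storing a plugin's last state so it can restart elsewhere amounts to checkpointing. A minimal sketch, assuming the plugin's progress can be described by a single offset; `CheckpointStore` is a hypothetical interface, and the map-backed version only stands in for a Raft-replicated value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical checkpoint store; in the proposed design it would be replicated via Raft.
interface CheckpointStore {
    void save(String pluginId, long offset);
    long load(String pluginId); // 0 if no checkpoint exists yet
}

// Local map-backed store, for demonstration only.
class MapCheckpointStore implements CheckpointStore {
    private final Map<String, Long> checkpoints = new HashMap<>();

    public void save(String pluginId, long offset) {
        checkpoints.put(pluginId, offset);
    }

    public long load(String pluginId) {
        return checkpoints.getOrDefault(pluginId, 0L);
    }
}

// A plugin that processes numbered events and checkpoints after each one.
class ResumablePlugin {
    private final String id;
    private final CheckpointStore store;
    private long offset;

    ResumablePlugin(String id, CheckpointStore store) {
        this.id = id;
        this.store = store;
        this.offset = store.load(id); // resume from the last saved state
    }

    long offset() {
        return offset;
    }

    void process(int events) {
        for (int i = 0; i < events; i++) {
            offset++;               // "handle" one event
            store.save(id, offset); // persist progress so another node can resume
        }
    }
}
```

Saving the checkpoint after handling an event gives at-least-once behaviour: if a node dies between handling and saving, the new owner re-processes that one event.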
Jordan Halterman
@kuujo
Mar 06 2016 09:05
yeah totally
Richard Pijnenburg
@electrical
Mar 06 2016 09:05
I also want end-to-end persistence,
so that once an input gets the data, the original message is persisted somewhere. That way, if something goes wrong in the pipeline, we still have the original message
and can replay it
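Keeping the original message so it can be replayed is essentially a write-ahead log at the input: append the raw message first, then hand it to the pipeline, and replay from a known offset if something downstream fails. A minimal in-memory sketch; `InputLog` is a hypothetical name, and a real version would have to write to disk and replicate the log.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical append-only log of raw input messages (in memory for illustration).
class InputLog {
    private final List<String> entries = new ArrayList<>();

    // Persist the original message before it enters the pipeline; returns its offset.
    synchronized long append(String rawMessage) {
        entries.add(rawMessage);
        return entries.size() - 1;
    }

    // Everything from the given offset onward, for replaying after a failure.
    synchronized List<String> replayFrom(long offset) {
        if (offset < 0 || offset > entries.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        List<String> copy = new ArrayList<>(entries.subList((int) offset, entries.size()));
        return Collections.unmodifiableList(copy);
    }
}
```

The input only needs to remember the offset of the last message the output confirmed; anything after it can be replayed.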
Jordan Halterman
@kuujo
Mar 06 2016 09:07
Indeed… so here’s another question: does the buffer need to be larger than memory? Probably so, right?
Richard Pijnenburg
@electrical
Mar 06 2016 09:09
Possibly, yeah. For example, if the output is blocked we still want to keep accepting events.
That’s important for sources that just ‘fire and forget’,
like switches / firewalls
and things that have no real state
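A buffer that can grow past available memory, so inputs keep accepting events while the output is blocked, can be sketched as a file-backed FIFO queue. This is a minimal, single-threaded sketch using only `java.io.RandomAccessFile`. `DiskQueue` is a hypothetical name, and the sketch never reclaims disk space and does not survive a restart, because the read and write positions live only in memory.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical file-backed FIFO: only two file positions are kept in memory,
// so the backlog is bounded by disk space rather than heap size.
class DiskQueue implements AutoCloseable {
    private final RandomAccessFile file;
    private long readPos = 0;  // offset of the next event to read
    private long writePos = 0; // offset where the next event is appended

    DiskQueue(File path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.file.setLength(0); // start empty; this sketch does not recover old contents
    }

    // Append one event as length-prefixed modified UTF-8 (at most 65535 encoded bytes).
    void offer(String event) throws IOException {
        file.seek(writePos);
        file.writeUTF(event);
        writePos = file.getFilePointer();
    }

    // Remove and return the oldest event, or null if the queue is empty.
    String poll() throws IOException {
        if (readPos >= writePos) {
            return null;
        }
        file.seek(readPos);
        String event = file.readUTF();
        readPos = file.getFilePointer();
        return event;
    }

    boolean isEmpty() {
        return readPos >= writePos;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Inputs call `offer` regardless of whether the output is keeping up, and the output drains with `poll` when it can. A production version would roll over to new segment files and persist the read position.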
Jordan Halterman
@kuujo
Mar 06 2016 09:09
yeah
Richard Pijnenburg
@electrical
Mar 06 2016 09:11
I looked into reusing the existing Logstash plugins, which are written in Ruby, but it seems to be hard to call Ruby code from Java
and the available ways don’t seem very performant either
so we’ll most likely have to write plugins in Java
Jordan Halterman
@kuujo
Mar 06 2016 09:15
cool
not cool but sounds good
:-P
Richard Pijnenburg
@electrical
Mar 06 2016 09:15
Preferably in a way that lets others build plugins and easily get them into the system
so most likely a plugin script that can download jar files
most likely from maven :-)
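For a plugin script that downloads jars from Maven, the download location follows the standard Maven repository layout: the groupId's dots become directories, followed by the artifactId, the version, and `artifactId-version.jar`. A minimal sketch that builds such a URL for Maven Central; the plugin coordinates in the test are hypothetical, and the actual download and class loading are left out.

```java
// Builds a Maven Central download URL from "groupId:artifactId:version"
// using the standard layout: group/path/artifactId/version/artifactId-version.jar
final class MavenCoordinates {
    private static final String CENTRAL = "https://repo1.maven.org/maven2/";

    private final String groupId;
    private final String artifactId;
    private final String version;

    MavenCoordinates(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    // Parses coordinates of the form "groupId:artifactId:version".
    static MavenCoordinates parse(String coords) {
        String[] parts = coords.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected groupId:artifactId:version, got " + coords);
        }
        return new MavenCoordinates(parts[0], parts[1], parts[2]);
    }

    String jarUrl() {
        return CENTRAL + groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }
}
```

Once downloaded, a jar could be loaded with `URLClassLoader` and its plugins discovered through `java.util.ServiceLoader`, which is one common plugin pattern in Java.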
Jordan Halterman
@kuujo
Mar 06 2016 09:43
@electrical can you help me with something?
Richard Pijnenburg
@electrical
Mar 06 2016 09:43
of course
Jordan Halterman
@kuujo
Mar 06 2016 09:44
I’m trying to follow this documentation, but I’m relatively a VM newb: https://github.com/hortonworks-gallery/ambari-zeppelin-service#setup-pre-requisites
find the IP address of the VM and add an entry into your machine’s hosts file… not sure how to do that, figured you might know
getting sidetracked with real work stuff :-)
Richard Pijnenburg
@electrical
Mar 06 2016 09:45
hehe
Jordan Halterman
@kuujo
Mar 06 2016 09:46
I googled around but not much luck
Richard Pijnenburg
@electrical
Mar 06 2016 09:46
ah. that’s just to have a name so you can connect to it instead of an ip
not super important
Jordan Halterman
@kuujo
Mar 06 2016 09:47
gotcha
well… I guess I’m having trouble getting the IP of the VM in general then
Richard Pijnenburg
@electrical
Mar 06 2016 09:52
using virtualbox ?
Jordan Halterman
@kuujo
Mar 06 2016 09:52
ahh nvm got it
hmm maybe not
yeah virtualbox
VBoxManage guestproperty get "Hortonworks Sandbox with HDP 2.4" "/VirtualBox/GuestInfo/Net/0/V4/IP"
Value: 10.0.2.15
but doesn’t seem like I can ssh into that IP
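`10.0.2.15` is the address VirtualBox gives a guest on its default NAT network (10.0.2.0/24), and the host cannot reach that address directly, which would explain the failed ssh. The usual fixes are a port-forwarding rule, e.g. `VBoxManage controlvm "<vm name>" natpf1 "ssh,tcp,,2222,,22"` followed by `ssh -p 2222 <user>@127.0.0.1`, or adding a host-only adapter (VirtualBox's default host-only network is 192.168.56.0/24). The sandbox image may already forward a port for ssh, so check the VM's network settings first. As a small illustration, here is a check for whether a guest address falls inside a given range; `Cidr` is a hypothetical helper name.

```java
// Checks whether an IPv4 address lies inside a CIDR block, e.g. VirtualBox's
// default NAT network 10.0.2.0/24. Hypothetical helper for illustration.
final class Cidr {
    private Cidr() {}

    // Packs a dotted-quad IPv4 address into a 32-bit int.
    static int toInt(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
        }
        int value = 0;
        for (String octet : octets) {
            int b = Integer.parseInt(octet);
            if (b < 0 || b > 255) {
                throw new IllegalArgumentException("bad octet: " + octet);
            }
            value = (value << 8) | b;
        }
        return value;
    }

    static boolean contains(String cidr, String ipv4) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected a.b.c.d/n: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("bad prefix: " + prefix);
        }
        // /0 matches everything; Java masks shift distances to 5 bits, so -1 << 32 would be -1.
        int mask = (prefix == 0) ? 0 : -1 << (32 - prefix);
        return (toInt(parts[0]) & mask) == (toInt(ipv4) & mask);
    }
}
```

If the guest address is inside 10.0.2.0/24, ssh from the host needs a forwarded port or a second, host-reachable adapter.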
Richard Pijnenburg
@electrical
Mar 06 2016 09:56
Hmm, what interfaces do you have active on your machine? Is one of those related to that IP?
Jordan Halterman
@kuujo
Mar 06 2016 09:59
doesn’t look like it
Richard Pijnenburg
@electrical
Mar 06 2016 09:59
Hmm. Usually VirtualBox should give an IP that is on your machine
ever had issues with virtualbox before?
Jordan Halterman
@kuujo
Mar 06 2016 10:02
nope
Richard Pijnenburg
@electrical
Mar 06 2016 10:03
hmm weird stuff
Jordan Halterman
@kuujo
Mar 06 2016 10:03
was that the right command to find the IP?
Richard Pijnenburg
@electrical
Mar 06 2016 10:03
Usually it should give an IP based on the VirtualBox interfaces
yeah. that should be good
Jordan Halterman
@kuujo
Mar 06 2016 10:04
bah oh well
I’m sure it’s user error :-P
Richard Pijnenburg
@electrical
Mar 06 2016 10:05
haha
Richard Pijnenburg
@electrical
Mar 06 2016 11:05
Trying to understand the config factory thing that @jhalterman linked. Getting a headache from it :-(