These are chat archives for atomix/atomix

30th
Mar 2016
Jordan Halterman
@kuujo
Mar 30 2016 00:27
Well... I started implementing Iterator-per-follower and realized it's going to have little to no effect in practice because of existing optimizations. The index is optimized for O(1) lookups of entries in uncompacted segments. Because all the entries are present, a lookup is just a matter of multiplying the offset by 8 and reading the entry's position from memory. So, Iterator-per-follower would only be beneficial to followers that are far enough behind the leader to be receiving entries from compacted segments, and those don't impact write latency anyways. There's still some general overhead to the binary search, but the case of catching up a server is likely rare enough not to have to worry about it now.
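A minimal sketch of the O(1) lookup described above, assuming a hypothetical index that stores one 8-byte file position per entry in a buffer. The class and method names are made up for illustration and are not Copycat's actual index API.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not Copycat's actual index implementation.
final class DenseOffsetIndex {
  private static final int ENTRY_SIZE = Long.BYTES; // 8 bytes per stored position
  private final ByteBuffer positions;
  private int size;

  DenseOffsetIndex(int maxEntries) {
    this.positions = ByteBuffer.allocate(maxEntries * ENTRY_SIZE);
  }

  // Records the file position of the next entry; offsets are assigned 0, 1, 2, ...
  void append(long position) {
    positions.putLong(size * ENTRY_SIZE, position);
    size++;
  }

  // O(1): with no gaps between offsets, the slot for an offset is simply offset * 8.
  long position(int offset) {
    if (offset < 0 || offset >= size) {
      throw new IndexOutOfBoundsException("offset " + offset);
    }
    return positions.getLong(offset * ENTRY_SIZE);
  }
}
```

Once compaction removes entries, the offsets are no longer dense and the slot can't be computed directly, which is where the binary search mentioned above comes in.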
Richard Pijnenburg
@electrical
Mar 30 2016 07:30
Morning
Jordan Halterman
@kuujo
Mar 30 2016 07:30
hey
Richard Pijnenburg
@electrical
Mar 30 2016 07:31
How are you?
Jordan Halterman
@kuujo
Mar 30 2016 07:31
good good
Richard Pijnenburg
@electrical
Mar 30 2016 07:33
Still trying to wrap my head around the Executor service stuff. I can get it to work and push entries into a LinkedList ( not sure if that’s wise to use as a queue ? ) but for some reason the action gets done in the main thread and not in a new one :-(
will have to start with a simpler example and build from there i think
Jordan Halterman
@kuujo
Mar 30 2016 07:34
not sure what you mean
what’s the purpose of using a queue?
Richard Pijnenburg
@electrical
Mar 30 2016 07:34
I need something so i can pass events on from inputs to filters to outputs.
I’m using the runnable way to run something and then push that data into the ‘queue’
Jordan Halterman
@kuujo
Mar 30 2016 07:36
Executor… Executors are wrappers around queues, so using queues externally is redundant. Every time you call executor.execute(…), the callback is placed on a queue that’s read by the executor’s thread. So, you can pass events from inputs to filters to outputs through executors.
SomeEvent event = ...;
executor.execute(() -> {
  someFilter.filter(event);
});
executor.execute(runnable) says “place this callback on a queue and execute it ASAP in the proper thread”
this is why there are virtually no queues anywhere in Copycat… it’s all done internally by Executors
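A small self-contained sketch of the point above, using only the standard java.util.concurrent API. The class name ExecutorQueueDemo and the helper method are made up for illustration. It records which thread runs each callback, to show that execute() only enqueues the work and the executor's own thread picks it up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ExecutorQueueDemo {
  // Submits one callback per event and returns the names of the threads that ran them.
  static List<String> handlerThreads(List<String> events) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    List<String> threads = new CopyOnWriteArrayList<>();
    for (String event : events) {
      // execute() only enqueues the callback; it runs later on the executor's thread.
      executor.execute(() -> threads.add(Thread.currentThread().getName()));
    }
    executor.shutdown(); // stop accepting work, but drain what is already queued
    try {
      executor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return threads;
  }

  public static void main(String[] args) {
    System.out.println("caller:   " + Thread.currentThread().getName());
    System.out.println("handlers: " + handlerThreads(Arrays.asList("a", "b", "c")));
  }
}
```

Calling runnable.run() directly, instead of handing the Runnable to an executor, runs it on the caller's thread, which is one common way for work to end up on the main thread.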
Richard Pijnenburg
@electrical
Mar 30 2016 07:38
Yeah okay. but i mean the result of the thing that’s running. so for example i got an input running in a thread which runs continuously. that needs to push out the data it gets to somewhere
another Executor
Running a loop inside an Executor is redundant and makes the Executor framework useless. It’s effectively an event loop. There’s nothing that can be done in a loop that can’t be done without a loop and with executors
if you’re doing looping and pulling from queues might as well just create your own Executor by just starting a Thread
Richard Pijnenburg
@electrical
Mar 30 2016 07:41
for filters it makes sense to use the executor i think since that will do the run every time on every event. but inputs and outputs are a bit different i think. for example an output can do bulk sending. so it collects multiple events and then sends it instead of acting on a single event
Jordan Halterman
@kuujo
Mar 30 2016 07:41
executor.execute(() -> {
  Object result = something();
  nextExecutor.execute(() -> {
    Object nextResult = somethingElse(result);
    lastExecutor.execute(() -> {
      doIt(nextResult);
    });
  });
});
It just requires recursion to make it nice
well, it’s not like an input’s state has to last as long as a single callback. It still makes sense for a bunch of execute calls to result in one eventual output
just think of execute as put on a queue and pull off a queue in a loop in another thread, because that’s all it is. If you’re repeating that pattern then that’s a bad sign
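A rough sketch of how a bulk output could still be driven by execute() calls, as discussed above. Everything here (BatchingOutput, the batch-size policy, the in-memory `sent` list standing in for a real bulk send) is an assumption for illustration. The buffer is only ever touched on the output's single executor thread, so it needs no locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical bulk output; the names and the batching policy are illustrative only.
final class BatchingOutput {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final List<String> buffer = new ArrayList<>(); // touched only on the executor thread
  private final List<List<String>> sent = new CopyOnWriteArrayList<>(); // stand-in for a bulk send
  private final int batchSize;

  BatchingOutput(int batchSize) {
    this.batchSize = batchSize;
  }

  // Callable from any thread; the event is queued and handled on the output's thread.
  void accept(String event) {
    executor.execute(() -> {
      buffer.add(event);
      if (buffer.size() >= batchSize) {
        flush();
      }
    });
  }

  // Sends whatever is buffered as one batch; must run on the executor thread.
  private void flush() {
    if (!buffer.isEmpty()) {
      sent.add(new ArrayList<>(buffer));
      buffer.clear();
    }
  }

  // Flushes any partial batch, stops the executor, and returns every batch sent.
  List<List<String>> close() {
    executor.execute(this::flush);
    executor.shutdown();
    try {
      executor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return sent;
  }
}
```

Many accept() calls result in one eventual send per batch, and the only queue involved is the one inside the executor.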
Richard Pijnenburg
@electrical
Mar 30 2016 07:44
and what if i have an IRC input for example. that should run continuously and push events into something for processing right? Don’t see that as an executor ?
Jordan Halterman
@kuujo
Mar 30 2016 07:44
the place that it’s pushing events should be an executor
it’s still perfectly fine to do a loop with executors… it’s just recursion, and the benefit is it’s an event loop, so many logical areas of code can share the same thread
it’s just a bit complicated to program event loops
Richard Pijnenburg
@electrical
Mar 30 2016 07:47
huh? i lost you there. What i understand from executors is that they are short lived ‘jobs’ right?
Jordan Halterman
@kuujo
Mar 30 2016 07:49
public void doSomething(Executor executor) {
  executor.execute(() -> {
    doSomethingElse();
    doSomething(executor);
  });
}
This code amounts to an endless loop, but it’s done in a way that allows other events to be executed in the same thread without impacting the loop. This is how event loops sort of mimic multiple threads with a single thread. Some other code could call executor.execute(this::doSomethingElseAgain) on the same Executor and it’s not blocked by the loop
public void loop(Executor executor) {
  executor.execute(() -> loop(executor));
}
for (;;) {
  ...
}
Both of these blocks of code are functionally equivalent, but the Executor version just has a queue in its loop
lots of recursion in asynchronous programming
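A bounded version of the recursive loop above, as a sketch. RecursiveLoopDemo and its log are made up for illustration, and the loop is capped so the example terminates. It shows another task on the same single-threaded executor running between loop iterations instead of waiting for the loop to end.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class RecursiveLoopDemo {
  // Runs a re-submitting "loop" plus one unrelated task and returns the order they ran in.
  static List<String> run(int iterations) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    List<String> log = new CopyOnWriteArrayList<>();
    CountDownLatch done = new CountDownLatch(2); // one for the loop, one for the other task
    // Both are enqueued from the executor's own thread so the interleaving is
    // deterministic for this demo.
    executor.execute(() -> {
      loop(executor, log, 0, iterations, done); // enqueues iteration 0
      executor.execute(() -> {                  // queued right behind iteration 0
        log.add("other");
        done.countDown();
      });
    });
    try {
      done.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    executor.shutdown();
    return log;
  }

  private static void loop(ExecutorService executor, List<String> log,
                           int i, int limit, CountDownLatch done) {
    executor.execute(() -> {
      if (i == limit) {
        done.countDown();
        return;
      }
      log.add("loop-" + i);
      // Re-enqueue the next iteration instead of blocking the thread in a for loop.
      loop(executor, log, i + 1, limit, done);
    });
  }
}
```

Each iteration goes to the back of the executor's queue, so anything already waiting there gets its turn first. A plain for (;;) on that thread would never give it one.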
Richard Pijnenburg
@electrical
Mar 30 2016 07:54
okay.. how do i handle things that should run continuously and never end unless i stop the application. for example the IRC input should always be online. that happens at initialization ( init method ) then i have a ‘run’ method that actually gets the data. you mean that the run method should run in an executor?
back in 2
Jordan Halterman
@kuujo
Mar 30 2016 07:54
probably not
but its output should be execute()d
to put its output on a queue for the filter(s)
but that really just depends on how the IRC input is implemented
Jordan Halterman
@kuujo
Mar 30 2016 08:03
I know I’m being extra confusing :-)
Richard Pijnenburg
@electrical
Mar 30 2016 08:04
I’m back.
uhm. yeah, very confusing :-)
so for example ( taking the IRC input as an example ). It has the init method to setup the connection. that should only die if i shutdown. the run method does something with the connection and gets the data and pushes it into something so the filter threads can consume it
and everything has to be abstracted in some way so i can build it up all dynamically based on a config
Jordan Halterman
@kuujo
Mar 30 2016 08:10
right… so when the input has some data, it outputs it through some interface which calls executor.execute(() -> doSomethingWithTheDataNextFilterPleaseAndThankYou(data)). Good design would dictate the input has access to some object like an InputCollector, and for each record it calls collector.emit(data). The implementation of InputCollector has an Executor and understands where the data is going (to a filter).
public class TheInputCollector implements InputCollector {
  @Override
  public void emit(Object data) {
    executor.execute(() -> filter.filter(data));
  }
}
the Executor then acts as a queue between the input and the filter
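The snippet above leaves out where `executor` and `filter` come from. A self-contained sketch might look like the following, assuming they are simply passed in through the constructor. The interfaces, the ExecutorInputCollector and CollectorDemo classes, and the in-memory "filter" are hypothetical; only the names InputCollector, emit and filter come from the conversation.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical interfaces for illustration.
interface InputCollector {
  void emit(Object data);
}

interface Filter {
  void filter(Object data);
}

final class ExecutorInputCollector implements InputCollector {
  private final Executor executor; // the queue and thread sitting in front of the filter
  private final Filter filter;

  ExecutorInputCollector(Executor executor, Filter filter) {
    this.executor = executor;
    this.filter = filter;
  }

  @Override
  public void emit(Object data) {
    // Returns immediately; the filter runs later on the executor's thread.
    executor.execute(() -> filter.filter(data));
  }
}

final class CollectorDemo {
  // Emits every record through the collector and returns what the filter saw.
  static List<Object> run(List<?> records) {
    ExecutorService filterExecutor = Executors.newSingleThreadExecutor();
    List<Object> seen = new CopyOnWriteArrayList<>();
    InputCollector collector = new ExecutorInputCollector(filterExecutor, seen::add);
    for (Object record : records) {
      collector.emit(record); // what an input would call for each record it reads
    }
    filterExecutor.shutdown();
    try {
      filterExecutor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return seen;
  }
}
```

The input only ever sees the InputCollector interface, so it can run in its own thread without knowing which executor or filter sits behind emit().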
Richard Pijnenburg
@electrical
Mar 30 2016 08:15
Okay. that makes sense so far. the input needs to have access to something indeed that can process the data further.
so the IRC input calls the emit method to send the data to
Jordan Halterman
@kuujo
Mar 30 2016 08:16
yeah
the input should probably just run in its own Thread
Richard Pijnenburg
@electrical
Mar 30 2016 08:17
but still trying to understand the abstraction of it. because the input should have no direct knowledge of anything that happens behind him. the pipeline manager part only should
that controls how the pipeline is built up
like IRC input, Grok filter, Date filter, Elasticsearch output
the pipeline manager then has to initialize all the plugins and starts their threads.
Jordan Halterman
@kuujo
Mar 30 2016 08:20
yeah… well there are a bunch of patterns. The OutputCollector (that’s what it should be named) pattern is taken from Storm which is designed in a similar way. Storm’s “bolts” never know where their output is going, they’re just given an OutputCollector which is configured with that knowledge, and bolts emit output to the collector which in turn sends it to the right place. Just because the input has an OutputCollector doesn’t mean the input has to know where that output collector sends data. The OutputCollector should be configured by the pipeline manager
Richard Pijnenburg
@electrical
Mar 30 2016 08:27
okay.. the output collector is something that sits between each plugin i assume?
Jordan Halterman
@kuujo
Mar 30 2016 08:27
yeah
Richard Pijnenburg
@electrical
Mar 30 2016 08:27
so for each plugin i need to create a collector which then sends it to the next plugin
Jordan Halterman
@kuujo
Mar 30 2016 08:27
right
Basically, you’re injecting a sort of connection between plugins, so the plugins don’t need to know where they’re sending data, something external tells them where by giving them a properly configured object. That seems the most obvious way to go about it
ugh I gotta go to sleep
Richard Pijnenburg
@electrical
Mar 30 2016 08:33
collector = something;
myplugin = new IRCInput(collector);
myplugin.run(); // This should be run in a separate thread. perhaps a threadpool ?

Filtercollector = something;
myfilter = new GrokFilter(Filtercollector);
myfilter.filter();
Jordan Halterman
@kuujo
Mar 30 2016 08:33
indeed
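One way the wiring in that pseudocode could be fleshed out, as a sketch only. OutputCollector is the name used in the chat; Plugin, PipelineManager, PipelineDemo and the two toy filters are assumptions made up here, as is the choice of one single-threaded executor per plugin. The manager builds the collectors back to front, so each plugin is only handed the collector it emits to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

interface OutputCollector {
  void emit(String event);
}

// Hypothetical plugin shape: handle one event and emit results downstream.
interface Plugin {
  void process(String event, OutputCollector out);
}

final class PipelineManager {
  private final List<ExecutorService> executors = new ArrayList<>(); // last stage first

  // Returns the collector to hand to the input; events flow through plugins into sink.
  OutputCollector build(List<Plugin> plugins, OutputCollector sink) {
    OutputCollector next = sink;
    for (int i = plugins.size() - 1; i >= 0; i--) {
      Plugin plugin = plugins.get(i);
      OutputCollector downstream = next;
      ExecutorService executor = Executors.newSingleThreadExecutor();
      executors.add(executor);
      // The executor is the queue between the previous stage and this plugin.
      next = event -> executor.execute(() -> plugin.process(event, downstream));
    }
    return next;
  }

  // Drains stages front to back so no stage emits into an executor that is already stopped.
  void shutdown() {
    for (int i = executors.size() - 1; i >= 0; i--) {
      ExecutorService executor = executors.get(i);
      executor.shutdown();
      try {
        executor.awaitTermination(5, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }
}

final class PipelineDemo {
  static List<String> run(List<String> lines) {
    List<String> delivered = new CopyOnWriteArrayList<>(); // stands in for a real output
    Plugin trim = (event, out) -> out.emit(event.trim());
    Plugin upper = (event, out) -> out.emit(event.toUpperCase(Locale.ROOT));
    PipelineManager manager = new PipelineManager();
    OutputCollector inputCollector = manager.build(Arrays.asList(trim, upper), delivered::add);
    for (String line : lines) {
      inputCollector.emit(line); // what an input plugin would do for each line it reads
    }
    manager.shutdown();
    return delivered;
  }
}
```

Because the filters receive their collector per call, the same filter code can act on one event at a time while the manager alone decides what comes next in the pipeline.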
Richard Pijnenburg
@electrical
Mar 30 2016 08:33
or put the collector part in the run/filter methods
might be a bit easier
but the filter function for example. that has to act on a per event basis
so that might work a bit differently
?
anyway. can talk further your tomorrow
oh yeah. the collector instance of the input has to be passed on to the filter parts
so it knows how to get the event i guess
I’m really trying to understand it :-)
Jordan Halterman
@kuujo
Mar 30 2016 09:19
I can make a couple thorough examples when I'm not on my way to bed :-) It's quick to throw together an example like this
Richard Pijnenburg
@electrical
Mar 30 2016 09:19
that would be great! and no worries. sleep is important. i didn’t sleep enough last night and paying for it now.
Madan Jampani
@madjam
Mar 30 2016 17:15
@kuujo regarding the release, are you planning for one final rc before the official release? Might be useful to have one just to run some quick integration tests. I did pull your latest pipelining changes. In my set up I did not see any significant difference in performance. There was no degradation in throughput, which is good. I'd expect that feature to have a positive impact when RTT between nodes is high. I'll do a quick test run with your latest multi-threaded appenders today.
I couldn't find anything noteworthy in my attempts to break rc5. Which could only mean one thing: I didn't try hard enough :wink2: In any case I am very happy with the stability and performance (even without the pipelining and multi-threading changes)
Jordan Halterman
@kuujo
Mar 30 2016 17:20
@madjam I only saw a performance difference after merging and including atomix/catalyst#44. It looks like I did not include the bump to 1.0.6-SNAPSHOT for the Catalyst dependency in that PR :-( so that may be why you didn’t see any difference either
It turned out that the NettyTransport was preventing some concurrency because of the way it was written. I originally did not see any performance improvement using a thread-per-follower in the leader, but after that change I saw around a 15% gain
we can do another RC sure
I was going to exclude the threading change from the release just in case, but we can include it and do one more RC. I’m feeling pretty happy too, but more breaking would be nice
Madan Jampani
@madjam
Mar 30 2016 17:24
Absolutely. That is what I'll be trying next couple of days :)
Jordan Halterman
@kuujo
Mar 30 2016 17:24
great
Madan Jampani
@madjam
Mar 30 2016 17:25
So what is the tentative plan? If nothing breaking shows up, release on Friday or Monday?
Jordan Halterman
@kuujo
Mar 30 2016 17:25
yeah that seems good… maybe Monday
Madan Jampani
@madjam
Mar 30 2016 17:25
Sounds good.
Jordan Halterman
@kuujo
Mar 30 2016 17:25
gives me a weekend to make my own attempt too
Madan Jampani
@madjam
Mar 30 2016 17:27
If you are planning to cut an RC today, I'll use that.
Jordan Halterman
@kuujo
Mar 30 2016 17:28
awesome I’ll do that
Jordan Halterman
@kuujo
Mar 30 2016 18:06
k I’m releasing Catalyst/Copycat without atomix/copycat#206 which I closed after some more testing
Madan Jampani
@madjam
Mar 30 2016 18:07
Sounds good.
Jordan Halterman
@kuujo
Mar 30 2016 18:49
k both are released
Madan Jampani
@madjam
Mar 30 2016 19:25
@kuujo: Is an Atomix RC coming as well?
Jordan Halterman
@kuujo
Mar 30 2016 20:49
I’m finishing up a PR right now that I’d like to get in. I’ll probably be done with it this evening
Richard Pijnenburg
@electrical
Mar 30 2016 22:28
Pfff finally home.
Jordan Halterman
@kuujo
Mar 30 2016 23:04
did you get lost? :-)