These are chat archives for atomix/atomix

4th
Jan 2016
Jordan Halterman
@kuujo
Jan 04 2016 00:00
oh yeah duh
so the second was “inconsistent”
because it was already received
Richard Pijnenburg
@electrical
Jan 04 2016 00:01
ahh okay. yeah it didn't print it out but only logged it
but it never printed out the 9999 others
sorry 999
hmm. every time i run the client the task id number does increment.
should the code i used for consuming run in a loop?
or should it poll to check whether there are items in the queue?
something that was mentioned in atomix/atomix#60
Jordan Halterman
@kuujo
Jan 04 2016 00:07
yeah… the DistributedTaskQueue does something like long polling internally… let me try this
Richard Pijnenburg
@electrical
Jan 04 2016 00:07
okay :-)
Jordan Halterman
@kuujo
Jan 04 2016 00:21

Here’s the test I’m using to process 10000 tasks, submitting 100 concurrently and recursively:

  public void testSubmitRecursive() throws Throwable {
    createServers(3);

    // One resource instance consumes and the other submits; both use the async view.
    DistributedTaskQueue<String> worker = createResource().async();
    DistributedTaskQueue<String> queue = createResource().async();

    AtomicInteger counter = new AtomicInteger();
    worker.consumer(task -> {
      if (counter.incrementAndGet() % 1000 == 0) {
        System.out.println("1000 messages received");
      }
      if (counter.get() == 10000) {
        resume();
      }
    });
    submitTasks(queue);
    await(10000);
  }

  // Kick off 100 concurrent submissions; each one resubmits itself when it completes,
  // so 100 tasks stay in flight for the duration of the test.
  private void submitTasks(DistributedTaskQueue<String> queue) {
    for (int i = 0; i < 100; i++) {
      submitTask(queue);
    }
  }

  private void submitTask(DistributedTaskQueue<String> queue) {
    queue.submit(UUID.randomUUID().toString()).whenComplete((result, error) -> submitTask(queue));
  }

  private void submitTasks(DistributedTaskQueue<String> queue) {
    for (int i = 0; i < 100; i++) {
      submitTask(queue);
    }
  }

  private void submitTask(DistributedTaskQueue<String> queue) {
    queue.submit(UUID.randomUUID().toString()).whenComplete((result, error) -> submitTask(queue));
  }

But this is way too slow for practical use in high-throughput systems. As it's designed now, it's only really useful for sparse, critical tasks. The problem is that even though we're getting 100 concurrent writes to the cluster, a consumer will only ever process one task at a time. It could be pretty easily improved to grab a batch from the cluster, though.

Anyways, that test is working properly. The worker.consumer(…) is getting all the queued tasks, just slowly (like hundreds/sec)
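FWIW, a rough way to get more producer throughput on the client side is to keep a batch of submits in flight and resubmit when the whole batch completes… a sketch reusing the submit(…) call from the test above (the batch size is arbitrary, and this only helps the submission side, not the one-at-a-time consumer):

  // Sketch: keep a fixed-size batch of submits in flight; when the whole batch has
  // completed, kick off the next one. Only speeds up the producer side.
  private void submitBatch(DistributedTaskQueue<String> queue, int batchSize) {
    CompletableFuture<?>[] futures = new CompletableFuture<?>[batchSize];
    for (int i = 0; i < batchSize; i++) {
      futures[i] = queue.submit(UUID.randomUUID().toString());
    }
    CompletableFuture.allOf(futures).whenComplete((result, error) -> submitBatch(queue, batchSize));
  }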
Jordan Halterman
@kuujo
Jan 04 2016 00:28
batching would be way faster, and the session event system actually has the ability to batch into a single message over TCP, so it would be even faster than sending a bunch of requests to the client
hmm I think I found the case in which that IndexOutOfBoundsException can legitimately occur
awesome
hmmm
This is where thinking about the Raft algorithm starts making my head hurt
Jordan Halterman
@kuujo
Jan 04 2016 00:33
  • Leader A sends entries 10-13 to follower B with a commitIndex of 9
  • A leadership change occurs and leader C is elected
  • Leader C commits its own entries 10-13 via follower A (overwriting follower A’s entries) and then sends an empty AppendRequest to follower B with a commitIndex of 13
  • Leader C then sends the committed entries 10-13 to follower B who notices that they don’t match since follower B still has the entries from leader A, so it truncates its log to index 9, but the log throws an exception since those entries were committed
crazy shit
easy fix though
servers just can’t increase commitIndex past the end of their log
or past the end of the entries received in an AppendRequest
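in code terms the fix is roughly a clamp when a follower handles an AppendRequest… a sketch with illustrative names (not the actual Copycat fields/methods):

  // Illustrative sketch only: when handling an AppendRequest, a follower must not
  // advance its commit index past the last entry it has actually appended, even if
  // the leader advertises a higher commitIndex.
  long nextCommitIndex(long advertisedCommitIndex, long lastLocalIndex, long currentCommitIndex) {
    long clamped = Math.min(advertisedCommitIndex, lastLocalIndex);
    return Math.max(currentCommitIndex, clamped); // never move the commit index backwards
  }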
Richard Pijnenburg
@electrical
Jan 04 2016 00:35
Hahaha. I'm so happy I don't have to think about it lol
hundreds/sec wouldn't be fast enough for logstash clustering
Jordan Halterman
@kuujo
Jan 04 2016 00:37
haha no way
way too slow for me in general
Richard Pijnenburg
@electrical
Jan 04 2016 00:37
yeah
Jordan Halterman
@kuujo
Jan 04 2016 00:38
batching should be done
Richard Pijnenburg
@electrical
Jan 04 2016 00:38
yeah indeed
and i'd rather not use an external system for queuing.
like redis or rabbitmq
Richard Pijnenburg
@electrical
Jan 04 2016 00:43
potentially could work though. but would be nice to use a single system
Jordan Halterman
@kuujo
Jan 04 2016 00:43
yeah totally
Richard Pijnenburg
@electrical
Jan 04 2016 00:45
but there are other things i also still need to look at. Like publishing an arbitrary config to a node for example.
logstash config i mean
the queue implementation is an arbitrary solution that can be replaced with anything if i abstract it enough
Jordan Halterman
@kuujo
Jan 04 2016 00:46
good point
Richard Pijnenburg
@electrical
Jan 04 2016 00:47
as long as i have a publish and consume function in each driver it's fine.
locks, grouping, config distribution are other important things
Jordan Halterman
@kuujo
Jan 04 2016 00:48
that’s what this should do best :-)
Richard Pijnenburg
@electrical
Jan 04 2016 00:48
;-)
the locks i want to use for inputs where you only want a single instance running in the cluster
so one doesn't have to care where it's running, as long as it's somewhere
oh. and for those things, sometimes it might be required to keep track of a state. i could save that in raft as well right?
distributed hash value
or at least something that allows me to track a random string or something
Jordan Halterman
@kuujo
Jan 04 2016 00:52
yeah totally
Richard Pijnenburg
@electrical
Jan 04 2016 00:52
so that if it gets moved to another node, it knows where node A left off
Jordan Halterman
@kuujo
Jan 04 2016 00:52
DistributedValue or a map
yeah
that’s awesome
Richard Pijnenburg
@electrical
Jan 04 2016 00:53
and the only thing the current LS team thinks about is distributing the config using elasticsearch. hahaha
http://atomix.io/atomix/user-manual/collections/#distributedmap that would be the way to save a random value ?
Richard Pijnenburg
@electrical
Jan 04 2016 00:59
maybe stupid question, but what's the difference between a map and a set? map = hash, set = array?
Jordan Halterman
@kuujo
Jan 04 2016 01:01
A map has keys, and a set is just values. In a set, values are unique. If you add foo and foo to a set, only one foo will exist. So, basically set values behave the same as map keys.
Richard Pijnenburg
@electrical
Jan 04 2016 01:01
ahhh, okay. i see
Jordan Halterman
@kuujo
Jan 04 2016 01:01
in a map keys are unique, in a set values are unique; I guess that's the more concise description
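same semantics as the plain java.util collections, which the distributed versions mirror:

    Map<String, String> map = new HashMap<>();
    map.put("foo", "bar");
    map.put("foo", "baz");    // same key: the value is replaced, the key stays unique
    System.out.println(map);  // {foo=baz}

    Set<String> set = new HashSet<>();
    set.add("foo");
    set.add("foo");           // duplicate value: ignored, values stay unique
    System.out.println(set);  // [foo]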
Richard Pijnenburg
@electrical
Jan 04 2016 01:02
completely clear :-)
i potentially could use the map for saving the config per LS node
just have to figure out how to tell the node which config is meant for it
btw, if i assign node X to group 'filters' is there a way to access that from the code? for example if i bring up a new node, can i detect that and push a config to it ?
oh yeah. 'Listening for membership changes'
Richard Pijnenburg
@electrical
Jan 04 2016 01:07
so that if i have for example a single filter node. and add a new one. i want the master process to be able to see a new node joined the filter group and push a config to it
Jordan Halterman
@kuujo
Jan 04 2016 01:07
oh yeah that’s it
Richard Pijnenburg
@electrical
Jan 04 2016 01:09
hmm.. i could use a random UUID as an identifier when the node registers and joins a group or groups. then the master process can compile the config for it, save it into a map, and the node will discover that, read it, and start up the pipeline(s)
uhg. so many things to think about :-)
Jordan Halterman
@kuujo
Jan 04 2016 01:13
I think the DistributedMembershipGroup assigns a unique ID (a long) to each member that can be seen across the cluster
the member id() is just the index in the Raft log for the entry that added the member
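roughly what that looks like (a sketch from memory of the user manual, so the exact method names may differ; atomix and configMap here are assumed to already exist):

    // Sketch only; exact method names/signatures may differ.
    DistributedMembershipGroup group = atomix.create("filters", DistributedMembershipGroup.class).get();

    // A filter node registers itself with the group.
    group.join().thenAccept(member -> System.out.println("Joined as member " + member.id()));

    // The master process watches for new filter nodes and pushes a config to each one,
    // e.g. by writing it into a map keyed by the member id.
    group.onJoin(member -> {
      System.out.println("New filter node: " + member.id());
      // configMap.put(String.valueOf(member.id()), compiledConfigFor(member));  // hypothetical helper
    });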
Richard Pijnenburg
@electrical
Jan 04 2016 01:14
ahh okay
so i'll defo need to use that then
hmm.. wonder how i'm going to do the following.. not sure if this is even useful.. what if i have x separate pipelines, does it make sense to be able to assign a pipeline to have x number of nodes? and let the cluster decide which nodes run which pipelines ?
Jordan Halterman
@kuujo
Jan 04 2016 01:18
I think so
Richard Pijnenburg
@electrical
Jan 04 2016 01:18
for example for pipeline 1 i want 2 filter nodes
Jordan Halterman
@kuujo
Jan 04 2016 01:18
yeah
seems like it
Richard Pijnenburg
@electrical
Jan 04 2016 01:18
for pipeline 2 i want 1 filter node.
Jordan Halterman
@kuujo
Jan 04 2016 01:18
hmm
Richard Pijnenburg
@electrical
Jan 04 2016 01:19
the only weird thing that may happen in that case is if i have 2 filter nodes in total
that means 1 node will have 2 pipelines running on it
Jordan Halterman
@kuujo
Jan 04 2016 01:19
yeah
Richard Pijnenburg
@electrical
Jan 04 2016 01:20
it would allow for dynamic scaling :D
Jordan Halterman
@kuujo
Jan 04 2016 01:20
I guess you either just deal with the imbalance or make the number of filter nodes a “hint”
Richard Pijnenburg
@electrical
Jan 04 2016 01:21
how do you mean make it a hint ?
that it should run on at least 1 node ?
sort of a minimum number of nodes
or in a way that if i added a 3rd node, it would move one of the pipelines to the empty machine
Jordan Halterman
@kuujo
Jan 04 2016 01:24
yeah
I guess a hint would be a minimum
Richard Pijnenburg
@electrical
Jan 04 2016 01:25
eventually the idea is to spread the load across the cluster anyway
also the idea is to limit filter X to a node.. for example a really heavy filter to run on a single machine
using all the cores
or at least 90% of it :-)
node.input: true
node.filter: true
node.output: true

pipelines:
 - pipeline-a
 - pipeline-b

filters:
  - grok
  - date
something like that possibly
as a node config
the node.* control which parts of the pipelines are allowed to run
oh and also node.leader: true and node.processor: true
not sure about the processor name though :p
Richard Pijnenburg
@electrical
Jan 04 2016 01:33
oh, and filter.grok.threads: 50% sets the number of threads to 50% of the cores available on the machine, and splits that between both pipelines.
Jordan Halterman
@kuujo
Jan 04 2016 01:33
that’s awesome
Richard Pijnenburg
@electrical
Jan 04 2016 01:33
filter.grok.pipeline-a.threads: 20% and filter.grok.pipeline-b.threads: 30%
just some ideas im playing with
or a fixed number instead of %
the only issue with the current design of the LS filter part is that you can set an overall number of threads.. for each thread it sets up a single instance of the whole filter part so that it doesn't have to put it back in a queue
and i want to avoid having too many queues everywhere
because those will need to be monitored
Jordan Halterman
@kuujo
Jan 04 2016 01:37
yeah totally
sounds fun :-)
Richard Pijnenburg
@electrical
Jan 04 2016 01:37
so maybe i should keep it to filter.threads: x
oh. and i also have another problem to think about.. there are filters that don't thread well.. like multiline crap
Jordan Halterman
@kuujo
Jan 04 2016 01:38
oh yeah
that makes it complicated
Richard Pijnenburg
@electrical
Jan 04 2016 01:38
at least it can't be threaded for the same source.. but it can for different sources.
so i will need something of a stream ID
so that all the data for the same stream gets sent to the same filter instance.
for others it can be on a spread basis
those single threaded filters should have a 'stream_identity' already anyway
Jordan Halterman
@kuujo
Jan 04 2016 01:43
madness :-)
Richard Pijnenburg
@electrical
Jan 04 2016 01:43
but how do i ensure the right stream ID gets to the right threaded instance of that filter.... ARGH!
and i can't do a thread per stream ID because it could potentially be quite a lot.
or drop the multiline filter :-)
there is the multiline codec anyway
Jordan Halterman
@kuujo
Jan 04 2016 01:46
For really complex things, electing a leader simplifies a lot because it allows you to control the cluster in a single process while still maintaining fault tolerance. That’s about the best it gets, but you still have to make sure that if the leader does crash and a new one is elected, it can pick up where it left off, i.e. see the state that the previous leader saw
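a rough sketch of that pattern with the leader election resource (method names are approximate, and atomix is assumed to already be connected):

    // Approximate sketch of the elect-a-coordinator pattern; method names may not match exactly.
    DistributedLeaderElection election = atomix.create("election", DistributedLeaderElection.class).get();

    election.onElection(epoch -> {
      System.out.println("Elected leader!");
      // Only this process drives cluster-wide decisions (assigning pipelines, pushing
      // configs, ...). On election it should first re-read the shared state (maps, groups)
      // so it can pick up where the previous leader left off.
    });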
Richard Pijnenburg
@electrical
Jan 04 2016 01:47
hehe yeah indeed
Richard Pijnenburg
@electrical
Jan 04 2016 01:53
hmm. not sure if setting threads makes sense in the node config. should be in the general pipeline config i think.
if i say this pipeline has an input with 10 threads. and i got 2 nodes that handle that pipeline it can spread it across the 2 nodes :-)
Jordan Halterman
@kuujo
Jan 04 2016 01:55
yeah I think you’re right
less configuration related to any specific node is better
Richard Pijnenburg
@electrical
Jan 04 2016 01:55
yeah indeed
can i let the client bind to a certain IP if i have multiple IPs on a system?
or doesn't that make any sense?
Jordan Halterman
@kuujo
Jan 04 2016 01:58
hmm
you can do that
but it’s harder when you account for fault tolerance, since a specific IP might not be available
Richard Pijnenburg
@electrical
Jan 04 2016 01:58
hmm yeah indeed
Jordan Halterman
@kuujo
Jan 04 2016 01:58
Atomix abstracts that away
Richard Pijnenburg
@electrical
Jan 04 2016 01:58
okay
Jordan Halterman
@kuujo
Jan 04 2016 01:59
but Copycat has a ServerSelectionStrategy that allows you to prioritize servers
Richard Pijnenburg
@electrical
Jan 04 2016 01:59
just thinking about different settings atm :-)
Jordan Halterman
@kuujo
Jan 04 2016 01:59
Atomix just connects to the first one that is playing nicely
Richard Pijnenburg
@electrical
Jan 04 2016 01:59
already have a setting for the raft leaders. raft.hosts:
Richard Pijnenburg
@electrical
Jan 04 2016 02:05
also doing some paths for the logs and cluster state data.
Jordan Halterman
@kuujo
Jan 04 2016 02:06
I’m busy fighting with Java’s generics
Richard Pijnenburg
@electrical
Jan 04 2016 02:06
hehe okay
Jordan Halterman
@kuujo
Jan 04 2016 02:06
type inference gets a little complicated
trying to add configurations for resources
Richard Pijnenburg
@electrical
Jan 04 2016 02:06
ahh okay
hmm. cluster wide config things.. like what queue driver to use.. makes no sense to have that in each node config right? or am i going nuts ?
but then again, have to set it somewhere
Jordan Halterman
@kuujo
Jan 04 2016 02:08
hmm
yeah that part is sometimes hard about distributed systems
Richard Pijnenburg
@electrical
Jan 04 2016 02:08
and managing a file with automation tooling is easier than a REST API call for example.
Jordan Halterman
@kuujo
Jan 04 2016 02:08
I think it depends on whether it’s okay for configurations for different nodes to diverge
if it’s not, users can shoot themselves in the foot
Richard Pijnenburg
@electrical
Jan 04 2016 02:09
yeah indeed.
i think it does make sense to split out node specific settings and cluster wide settings.
for example in this case its about the queuing system..which one to use.. internal, redis or rabbitmq for example
all the nodes should use the same thing
Richard Pijnenburg
@electrical
Jan 04 2016 02:15
but that would mean i need to build a whole REST API as well :-)
I'll keep it to reading a file for now
easier to start with i guess
btw, would it be fine if i run the client or master in its own thread? or will that mess things up? want to make sure i can name them and control it
at least outside of the main thread
Jordan Halterman
@kuujo
Jan 04 2016 02:18
yeah you can do that
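e.g. something as simple as this (plain Java, nothing Atomix-specific); as the copycat-client-* thread names in your logs show, the client does its real work on its own internal threads anyway, so this wrapper thread just owns setup/teardown:

    // Run the cluster client's lifecycle on a dedicated, named thread so it can be
    // identified and managed alongside the plugin threads.
    Thread clusterThread = new Thread(() -> {
      // client.open().join();            // connect to the cluster (sketch)
      // ... create resources, register consumers, etc. ...
    }, "atomix-client");
    clusterThread.start();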
Richard Pijnenburg
@electrical
Jan 04 2016 02:21
because i already need to manage the threads of the different plugins anyway
anyway... 2.23 am for me.. think it's bed time :p
Jordan Halterman
@kuujo
Jan 04 2016 02:24
haha indeed
see you later
Richard Pijnenburg
@electrical
Jan 04 2016 02:24
later man. and thank you very much for the help so far :-)
Jordan Halterman
@kuujo
Jan 04 2016 07:01
Thank you for the help so far… getting some awesome work done! Gonna push a release candidate this week after cleaning up a few APIs
Richard Pijnenburg
@electrical
Jan 04 2016 11:33
Not sure if i'm expecting the wrong thing with the lock part.. what i expected is that if a node (client) that has the lock dies, another node can take it over.
but that doesn't seem to happen
Richard Pijnenburg
@electrical
Jan 04 2016 11:41
oh, and something you might want to look into as well is the connection-loss stuff. https://gist.github.com/electrical/c1947bf19e8a5914e586
only happens when doing a GetResource command
perhaps another race condition or threading issue in the client ?
Richard Pijnenburg
@electrical
Jan 04 2016 13:12
Hmm.. only seeing one PublishRequest and then does nothing anymore
I have modified the client so i can specify a role: a producer (which creates 10 events) or a consumer that just reads them and prints them out
14:13:31.023 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Sending CommandRequest[session=1814, sequence=2, command=InstanceCommand[resource=1816, command=ResourceCommand[command=io.atomix.messaging.state.TaskQueueCommands$Submit@2ab38b4f, consistency=LINEARIZABLE], consistency=LINEARIZABLE]]
14:13:31.025 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Sending CommandRequest[session=1814, sequence=3, command=InstanceCommand[resource=1816, command=ResourceCommand[command=io.atomix.messaging.state.TaskQueueCommands$Submit@944b14e, consistency=LINEARIZABLE], consistency=LINEARIZABLE]]
14:13:31.026 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Sending CommandRequest[session=1814, sequence=4, command=InstanceCommand[resource=1816, command=ResourceCommand[command=io.atomix.messaging.state.TaskQueueCommands$Submit@62b53978, consistency=LINEARIZABLE], consistency=LINEARIZABLE]]
14:13:31.027 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Sending CommandRequest[session=1814, sequence=5, command=InstanceCommand[resource=1816, command=ResourceCommand[command=io.atomix.messaging.state.TaskQueueCommands$Submit@47fcff21, consistency=LINEARIZABLE], consistency=LINEARIZABLE]]
14:13:31.029 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Sending CommandRequest[session=1814, sequence=6, command=InstanceCommand[resource=1816, command=ResourceCommand[command=io.atomix.messaging.state.TaskQueueCommands$Submit@97faf15, consistency=LINEARIZABLE], consistency=LINEARIZABLE]]
14:13:31.030 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Sending CommandRequest[session=1814, sequence=7, command=InstanceCommand[resource=1816, command=ResourceCommand[command=io.atomix.messaging.state.TaskQueueCommands$Submit@251db1b5, consistency=LINEARIZABLE], consistency=LINEARIZABLE]]
14:13:31.031 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Sending CommandRequest[session=1814, sequence=8, command=InstanceCommand[resource=1816, command=ResourceCommand[command=io.atomix.messaging.state.TaskQueueCommands$Submit@6e283297, consistency=LINEARIZABLE], consistency=LINEARIZABLE]]
14:13:31.038 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Sending CommandRequest[session=1814, sequence=9, command=InstanceCommand[resource=1816, command=ResourceCommand[command=io.atomix.messaging.state.TaskQueueCommands$Submit@4e8dcd84, consistency=LINEARIZABLE], consistency=LINEARIZABLE]]
14:13:31.040 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Sending CommandRequest[session=1814, sequence=10, command=InstanceCommand[resource=1816, command=ResourceCommand[command=io.atomix.messaging.state.TaskQueueCommands$Submit@3afed447, consistency=LINEARIZABLE], consistency=LINEARIZABLE]]
14:13:31.042 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Sending CommandRequest[session=1814, sequence=11, command=InstanceCommand[resource=1816, command=ResourceCommand[command=io.atomix.messaging.state.TaskQueueCommands$Submit@72a64ab4, consistency=LINEARIZABLE], consistency=LINEARIZABLE]]
14:13:31.056 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Received CommandResponse[status=OK, index=1818, result=null]
14:13:31.070 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Received CommandResponse[status=OK, index=1819, result=null]
14:13:31.071 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Received CommandResponse[status=OK, index=1820, result=null]
14:13:31.072 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Received CommandResponse[status=OK, index=1821, result=null]
14:13:31.072 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Received CommandResponse[status=OK, index=1822, result=null]
14:13:31.073 [copycat-client-2] DEBUG i.a.c.client.session.ClientSession - 1814 - Received CommandResponse[status=OK, index=1823, result=null]
the fact they all have result=null is confusing
Richard Pijnenburg
@electrical
Jan 04 2016 15:58
is there a way btw to see how many events are stored in the task queue? :-) thinking ahead for metrics
Richard Pijnenburg
@electrical
Jan 04 2016 17:32
@kuujo just did an other test. interesting stuff. https://gist.github.com/electrical/1308dd46ad92e71aa8e8
3rd try makes it fail completely
only processes a single event on startup
and now crap about Failed to connect to the cluster
while the 3 masters are just fine
What happens when a client dies or closes gracefully is its session on the server is closed, then the LockState should grant the lock to the next in the queue...
Richard Pijnenburg
@electrical
Jan 04 2016 18:37
hmm, it didn't in my case.
Jordan Halterman
@kuujo
Jan 04 2016 18:37
Ditto leader elections
They both use essentially the same logic. Does the leader election example switch leaders when a leader is killed? It does for me, but I think there could still be a bug in pushing messages from the server to clients.
Richard Pijnenburg
@electrical
Jan 04 2016 18:38
i started client 1 with a lock.. then client 2 with the same lock.. client 1 reported it got the lock. client 2 failed. when i kill client 1, i thought client 2 would pick it up but it didn't.
i can run an other test if needed
Jordan Halterman
@kuujo
Jan 04 2016 18:40
I'll set up a quick test actually
Richard Pijnenburg
@electrical
Jan 04 2016 18:40
okay
and the distributed task queue seems to be misbehaving as well
but the leaders are still stable :D
Jordan Halterman
@kuujo
Jan 04 2016 18:41
Has to be a bug in session events. I fixed a bug there the other day but event-related tests are still failing so probably still something missing there.
Richard Pijnenburg
@electrical
Jan 04 2016 18:41
okay
Jordan Halterman
@kuujo
Jan 04 2016 19:12
@electrical Not sure how interested you are in this, but here's what should be happening: Sending messages from servers to clients in a fault-tolerant manner is a little complex, but not too much so since we use Raft to do most of the work. The way it works is session "event" messages are associated with the index of the command that resulted in the message being published. The indexes are used to ensure messages sent by a server are received by the client in full and in order. That's the simple part. The hard part comes with fault tolerance. A client is only ever connected to one server at a time. Thus, to send an event to a client, it must be sent by the server to which the client is connected. For fault tolerance, all servers effectively "publish" each event and hold them in memory until acknowledged by the client, but only the server to which the client is connected actually sends a message. If the client switches servers for some reason, the server to which it reconnects will have the events that haven't been received by the client. Even though they're held in memory, event messages are persistent in that they're constructed from commands applied to a state machine, and those commands will be replayed after a crash and recovery.
What the lock and leader election state machines do is publish event messages in reaction to a session holding the lock or election being closed. My point in telling you this is so you can also follow logs if you want. What we should be seeing is when the lock holder is killed, eventually its session is expired. The leader will commit an UnregisterEntry, and when that entry is applied to one of the servers' state machines, we should see it send a PublishRequest to the next client waiting for the lock.
K gonna set up a test
@electrical Does the leader election test work as expected for you? I mean, if you kill a node that prints "Elected leader!" do you see another one get elected?
Richard Pijnenburg
@electrical
Jan 04 2016 19:15
sorry, took me a while to read and understand it :-)
yeah. the leader election stuff in the example works
Jordan Halterman
@kuujo
Jan 04 2016 19:15
I did see an oddity in the client yesterday where it kept having to reconnect to the cluster, but it seemed to have gone away after I fixed the Netty thing. Is that what you're also seeing with the client?
Richard Pijnenburg
@electrical
Jan 04 2016 19:15
i can kill the current leader and another one becomes the leader.
Jordan Halterman
@kuujo
Jan 04 2016 19:15
Damn
Hmm alright I'll just set up a lock example and see what happens
May just have to run the Copycat tests a bunch to try to get a failure with debug logs
Richard Pijnenburg
@electrical
Jan 04 2016 19:17
    DistributedLock lock = client.create("lock", DistributedLock.class).get();

    lock.tryLock().thenAccept(locked -> {
      if (locked) {
        System.out.println("Lock acquired!");
      } else {
        System.out.println("Lock failed!");
      }
    });
that's the example i have now
Jordan Halterman
@kuujo
Jan 04 2016 19:18
Ahh
tryLock is probably returning an exception immediately
Richard Pijnenburg
@electrical
Jan 04 2016 19:18
ahh okay
Jordan Halterman
@kuujo
Jan 04 2016 19:18
lock() will block
If another process already has the lock
Java's Lock docs might have better explanations, but it's essentially the same
Richard Pijnenburg
@electrical
Jan 04 2016 19:19
i see.
Jordan Halterman
@kuujo
Jan 04 2016 19:20
If you use whenComplete((result, error) ...) you should see a non-null error if you call tryLock and another process already has it
If you call lock it should just complete when the other process crashes or releases it
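so roughly, building on your snippet (exact return types may differ slightly):

    DistributedLock lock = client.create("lock", DistributedLock.class).get();

    // tryLock() completes right away; check the error/result to see whether it was acquired.
    lock.tryLock().whenComplete((locked, error) -> {
      if (error == null && Boolean.TRUE.equals(locked)) {
        System.out.println("Lock acquired!");
      } else {
        System.out.println("Lock not acquired (held elsewhere, or the call failed)");
      }
    });

    // lock() doesn't complete until the lock is actually granted, e.g. when the current
    // holder releases it or its session expires after a crash.
    lock.lock().thenRun(() -> System.out.println("Lock acquired after waiting!"));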
Richard Pijnenburg
@electrical
Jan 04 2016 19:21
Yeah you are right
:-)
works now
Jordan Halterman
@kuujo
Jan 04 2016 19:24
I still need to try out DistributedWorkQueue which hung too right? Maybe I'll set that one up instead. I still think there's something not obvious being missed there
Richard Pijnenburg
@electrical
Jan 04 2016 19:24
yeah. that one does weird things
Jordan Halterman
@kuujo
Jan 04 2016 19:27
Alright good
Richard Pijnenburg
@electrical
Jan 04 2016 19:31
    DistributedTaskQueue queue = client.create("queue1", DistributedTaskQueue.class).get();

    if (role.equals("producer")) {
      System.out.println("Starting producer");
      // Push 10 tasks onto the queue.
      for (int i = 0; i < 10; i++) {
        queue.submit("Task ID: " + i);
      }
    }

    if (role.equals("consumer")) {
      System.out.println("Starting consumer");
      // Print every task as it arrives.
      queue.consumer(task -> {
        System.out.println("Task: " + task);
      });
    }
that's what i have as an example
with one of the args i set the role to producer or consumer
Richard Pijnenburg
@electrical
Jan 04 2016 19:40
clean state
started the consumer first and then the producer
Jordan Halterman
@kuujo
Jan 04 2016 19:40
Awesome
Richard Pijnenburg
@electrical
Jan 04 2016 19:41
I'm afraid to push more events through it since it fails with 10 :p
Richard Pijnenburg
@electrical
Jan 04 2016 19:47
leaders are still on Term 1 btw :D
so mazing!
amazing even
lol
Jordan Halterman
@kuujo
Jan 04 2016 19:57
I have an example made… momentito
Richard Pijnenburg
@electrical
Jan 04 2016 19:57
okay.
Jordan Halterman
@kuujo
Jan 04 2016 19:57
of course it’s still the leader! ;-)
Richard Pijnenburg
@electrical
Jan 04 2016 19:58
hehe :p
it is yeah after some fixes ;-) lol
Jordan Halterman
@kuujo
Jan 04 2016 20:07
hmm
example works great
I’ll push it
@electrical I pushed the example. What I did was started three servers using the standalone-server example. Then, started one task-queue-consumer client and one task-queue-producer client and watched the messages roll in.
Richard Pijnenburg
@electrical
Jan 04 2016 20:09
Hmm okay
but this example effectively blocks pushing items on the queue… it’s possible the concurrency is what could be causing issues
I’ll set up another test
@electrical are you producing and consuming on a client or replica?
that could also be something to look at
Richard Pijnenburg
@electrical
Jan 04 2016 20:11
I use a client
Jordan Halterman
@kuujo
Jan 04 2016 20:12
hmmmmm
actually I might have just gotten something
I killed the producer and restarted it and the client got one message and then stopped
this might be an issue with the logic it uses to restart consuming
Richard Pijnenburg
@electrical
Jan 04 2016 20:13
That's what I got as well yeah
Jordan Halterman
@kuujo
Jan 04 2016 20:13
this is good
basically, once the client runs out of messages it waits for a message from the server, and once it gets a new message it should start acking messages, but that’s probably what’s not working
Richard Pijnenburg
@electrical
Jan 04 2016 20:14
Think if you keep the producer running and restart the consumer you get the same.
Jordan Halterman
@kuujo
Jan 04 2016 20:14
also it’s still really slow :-/ that can be fixed for definite but only to an extent
I can probably fix it based on this info… let’s see...
Jordan Halterman
@kuujo
Jan 04 2016 20:21
Oh wait actually of course the example is slow, it's using sync mode which means a client has to consume and acknowledge it before the next message is sent
So that's a bad test
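For reference, the consumer in the test I pasted earlier used the async() view of the resource, which is presumably what you'd want for throughput here… switching the consumer example over would look roughly like this (sync()/async() as discussed in this conversation; treat the exact defaults as uncertain):

    DistributedTaskQueue queue = client.create("queue1", DistributedTaskQueue.class).get();

    // sync(): each task has to be received and acked by the consumer before the next
    // one is sent, which is what made the example slow.
    // async(): tasks are pushed without waiting on a per-task acknowledgement.
    queue.async().consumer(task -> System.out.println("Task: " + task));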
Richard Pijnenburg
@electrical
Jan 04 2016 20:21
Haha okay
Jordan Halterman
@kuujo
Jan 04 2016 20:23
I see the problem
Ugh
Finding bugs in resources is by far the best case scenario
Richard Pijnenburg
@electrical
Jan 04 2016 20:27
Why is that ?
Jordan Halterman
@kuujo
Jan 04 2016 20:29
Resources are incredibly simple
Compared to Copycat and Atomix
Richard Pijnenburg
@electrical
Jan 04 2016 20:30
hehe okay
Jordan Halterman
@kuujo
Jan 04 2016 20:35
definitely something in the state machine
I was… and forgot
Richard Pijnenburg
@electrical
Jan 04 2016 20:44
haha :p
i can send a PR to modify it if you want
Jordan Halterman
@kuujo
Jan 04 2016 20:51
sure!
Jordan Halterman
@kuujo
Jan 04 2016 20:58
Hmm probably gonna take some time to figure out this one. It's definitely an issue with the state machine, and the reason I say that is because it persists after you kill a producer or consumer or both. I'm sure it's something dumb, but there also seems to be a cluster state in which the client has trouble keeping its connection to a server. Every command fails, and every keep-alive fails and then succeeds after it resets the leader. But it's hard to reproduce that state
Richard Pijnenburg
@electrical
Jan 04 2016 21:17
yeah indeed
its very weird stuff
i thought i was going nuts
because every time i restart the producer it got worse
ugh. just had a roadmap meeting about the stuff i do for work. most useless meeting ever
lasted about 5 minutes, mainly me saying what i'm gonna work on further
Richard Pijnenburg
@electrical
Jan 04 2016 21:52
i can't see anything wrong in that code.. but then again, i barely understand any of it anyway :p
Jordan Halterman
@kuujo
Jan 04 2016 21:54
Haha yeah I don't either but should be something there I think. Just going through the normal debugging steps.
Richard Pijnenburg
@electrical
Jan 04 2016 21:54
hehe okay
almost done with the PR for the examples to use atomix-all
done
Richard Pijnenburg
@electrical
Jan 04 2016 21:59
I'll also add a new example, one with the lock.
Jordan Halterman
@kuujo
Jan 04 2016 22:01
looks like I got it
nice
it was something dumb
Richard Pijnenburg
@electrical
Jan 04 2016 22:02
ohw? :-)
Jordan Halterman
@kuujo
Jan 04 2016 22:02
at least the example is working for me when I kill both sides now anyways
it was just from an exception thrown when it tried to publish to a closed session
Richard Pijnenburg
@electrical
Jan 04 2016 22:03
ahh okay
and that exception caused havoc ?
Jordan Halterman
@kuujo
Jan 04 2016 22:05
yeah
Still acting weird with the async() mode though. It works for me, but it seems to be batching them too much: I see 100 messages and then it stops for a while. For some reason sync() was going faster, which makes no sense.
Richard Pijnenburg
@electrical
Jan 04 2016 22:06
lol. that's weird indeed
Jordan Halterman
@kuujo
Jan 04 2016 22:07
sync() adds on to the protocol
odd
Richard Pijnenburg
@electrical
Jan 04 2016 22:21
hmm, one more PR and i'll hit #100 lol
Btw, did you want me to submit a PR for the netty upgrade as well?
Jordan Halterman
@kuujo
Jan 04 2016 22:42
Yeah sure! The more the merrier :-)
Richard Pijnenburg
@electrical
Jan 04 2016 22:42
lol
Jordan Halterman
@kuujo
Jan 04 2016 22:43
I'd tell you to upgrade other dependencies but Netty's the only dependency outside of test dependencies which don't really matter
Richard Pijnenburg
@electrical
Jan 04 2016 22:43
which other dependencies ?
Jordan Halterman
@kuujo
Jan 04 2016 22:44
There are none
Richard Pijnenburg
@electrical
Jan 04 2016 22:44
oh wait. had to read it again :p lol
Jordan Halterman
@kuujo
Jan 04 2016 22:44
Only other dependencies are junit and mockito and testing libraries like that
Richard Pijnenburg
@electrical
Jan 04 2016 22:44
i need more beer :-)
Jordan Halterman
@kuujo
Jan 04 2016 22:45
Clearly.
Haha
On a related note, I'm creating a version of the Atomix class that includes all dependencies. That's the last task on the todo list (it includes resource configurations) and I'll finish it up tonight.
Richard Pijnenburg
@electrical
Jan 04 2016 22:47
oh very nice :-)
Jordan Halterman
@kuujo
Jan 04 2016 22:47
DistributedMap<String, String> map = atomix.createMap("foo");
Richard Pijnenburg
@electrical
Jan 04 2016 22:49
maybe stupid question.. what does the <String, String> do ?
Jordan Halterman
@kuujo
Jan 04 2016 22:52
Those are generic arguments. In this case, they define the type of the key and value in the map respectively. A Set might have Set<String> which indicates it's a set of strings. It's just additional type information and compile time type checking. It's generally considered bad practice not to use them.
Really, the fact that you can leave them out at all is just for backwards compatibility
Richard Pijnenburg
@electrical
Jan 04 2016 22:53
ahhh okay
automatic validation
Jordan Halterman
@kuujo
Jan 04 2016 22:53
Yeah. Sometimes they'll be used to specify the type of some return value, other times the type of some argument.
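a tiny plain-Java illustration of both cases:

    import java.util.Arrays;
    import java.util.List;

    public class GenericsExample {
      // The type parameter <T> ties the argument type to the return type: whatever
      // element type the list has is what first() returns, checked at compile time.
      static <T> T first(List<T> items) {
        return items.get(0);
      }

      public static void main(String[] args) {
        List<String> names = Arrays.asList("foo", "bar");
        String s = first(names);        // no cast needed
        // Integer i = first(names);    // would not compile: wrong type
        System.out.println(s);
      }
    }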
Richard Pijnenburg
@electrical
Jan 04 2016 22:54
ah i see. good to know :-)
Richard Pijnenburg
@electrical
Jan 04 2016 23:12
hmm. maybe stupid question.. did you think about an option for tls/ssl transport?
since netty supports it anyway it should be fairly straightforward to support
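from what i can tell it's roughly this at the Netty level (standard Netty API, nothing Atomix-specific yet; the Netty transport would need to wire an SslHandler into its channel pipeline, probably in its channel initializer):

    import io.netty.channel.Channel;
    import io.netty.handler.ssl.SslContext;
    import io.netty.handler.ssl.SslContextBuilder;
    import java.io.File;

    public class TlsSketch {
      // Server side: build an SslContext from a cert/key pair and put its handler
      // first in each accepted channel's pipeline.
      static void configureServer(Channel channel, File cert, File key) throws Exception {
        SslContext ssl = SslContextBuilder.forServer(cert, key).build();
        channel.pipeline().addFirst("ssl", ssl.newHandler(channel.alloc()));
      }

      // Client side: in practice you'd configure an explicit trust store instead of defaults.
      static void configureClient(Channel channel) throws Exception {
        SslContext ssl = SslContextBuilder.forClient().build();
        channel.pipeline().addFirst("ssl", ssl.newHandler(channel.alloc()));
      }
    }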
Richard Pijnenburg
@electrical
Jan 04 2016 23:24
so, just throwing some ideas out there in issues @kuujo.