These are chat archives for atomix/atomix

9th
Feb 2017
Jordan Halterman
@kuujo
Feb 09 2017 01:22
well… I’m making progress at least
haha
maybe
Jon Hall
@jhall11
Feb 09 2017 01:22
;p
Jordan Halterman
@kuujo
Feb 09 2017 01:26
just get this when I try to run them: bash: lein: command not found
Jon Hall
@jhall11
Feb 09 2017 01:28
isn’t that installation part of the Dockerfile?
Jordan Halterman
@kuujo
Feb 09 2017 01:28
yeah, which is weird

lein test atomix-jepsen.dvalue-test
INFO [2017-02-09 01:23:24,052] jepsen node n2 - jepsen.os.debian :n2 setting up debian
INFO [2017-02-09 01:23:24,052] jepsen node n5 - jepsen.os.debian :n5 setting up debian
INFO [2017-02-09 01:23:24,052] jepsen node n3 - jepsen.os.debian :n3 setting up debian
INFO [2017-02-09 01:23:24,052] jepsen node n4 - jepsen.os.debian :n4 setting up debian
INFO [2017-02-09 01:23:24,052] jepsen node n1 - jepsen.os.debian :n1 setting up debian
INFO [2017-02-09 01:23:24,815] jepsen node n1 - atomix-jepsen.core :n1 stopping atomix
INFO [2017-02-09 01:23:24,815] jepsen node n5 - atomix-jepsen.core :n5 stopping atomix
INFO [2017-02-09 01:23:24,815] jepsen node n3 - atomix-jepsen.core :n3 stopping atomix
INFO [2017-02-09 01:23:24,815] jepsen node n4 - atomix-jepsen.core :n4 stopping atomix
INFO [2017-02-09 01:23:24,815] jepsen node n2 - atomix-jepsen.core :n2 stopping atomix
INFO [2017-02-09 01:23:25,731] jepsen node n1 - atomix-jepsen.core :n1 fetching atomix-jepsen
INFO [2017-02-09 01:23:25,835] jepsen node n5 - atomix-jepsen.core :n5 fetching atomix-jepsen
INFO [2017-02-09 01:23:25,835] jepsen node n4 - atomix-jepsen.core :n4 fetching atomix-jepsen
INFO [2017-02-09 01:23:25,835] jepsen node n3 - atomix-jepsen.core :n3 fetching atomix-jepsen
INFO [2017-02-09 01:23:25,945] jepsen node n2 - atomix-jepsen.core :n2 fetching atomix-jepsen
INFO [2017-02-09 01:23:26,556] jepsen node n1 - atomix-jepsen.core :n1 building atomix-jepsen replica
INFO [2017-02-09 01:23:26,650] jepsen node n5 - atomix-jepsen.core :n5 building atomix-jepsen replica
INFO [2017-02-09 01:23:26,775] jepsen node n1 - atomix-jepsen.core :n1 stopping atomix
INFO [2017-02-09 01:23:26,775] jepsen node n2 - atomix-jepsen.core :n2 stopping atomix
INFO [2017-02-09 01:23:26,775] jepsen node n3 - atomix-jepsen.core :n3 stopping atomix
INFO [2017-02-09 01:23:26,775] jepsen node n4 - atomix-jepsen.core :n4 stopping atomix
INFO [2017-02-09 01:23:26,776] jepsen node n5 - atomix-jepsen.core :n5 stopping atomix
INFO [2017-02-09 01:23:26,851] jepsen node n4 - atomix-jepsen.core :n4 building atomix-jepsen replica
INFO [2017-02-09 01:23:26,856] jepsen node n2 - atomix-jepsen.core :n2 building atomix-jepsen replica
WARN [2017-02-09 01:23:26,970] jepsen node n4 - jepsen.control Encountered error with conn [:control :n4]; reopening
WARN [2017-02-09 01:23:26,970] jepsen node n2 - jepsen.control Encountered error with conn [:control :n2]; reopening
INFO [2017-02-09 01:23:27,159] jepsen node n3 - atomix-jepsen.core :n3 building atomix-jepsen replica
java.util.concurrent.ExecutionException: java.lang.RuntimeException: sudo -S -u root bash -c "cd /opt/atomix-jepsen; lein clean" returned non-zero exit status 127 on n1. STDOUT:


STDERR:
bash: lein: command not found
haha
makes no sense
rebuilding the jepsen image though
err.. trying your PR
Jon Hall
@jhall11
Feb 09 2017 01:30
is it trying to run lein on the onos docker images?
or lein isn’t in root’s path maybe
hopefully my pr works, at least in terms of building and getting to run the test. I know it will fail when creating the copycat cluster when running the test
I tested building all the containers from scratch
Jordan Halterman
@kuujo
Feb 09 2017 01:38
hopefully that will work
Jordan Halterman
@kuujo
Feb 09 2017 01:57
ugh same thing
Jon Hall
@jhall11
Feb 09 2017 01:58
try DEV=true lein test ...
Jordan Halterman
@kuujo
Feb 09 2017 01:58
ahh yes
Jon Hall
@jhall11
Feb 09 2017 01:59
Iirc it will use the node images from dockerhub if you don't
Jordan Halterman
@kuujo
Feb 09 2017 01:59
I think that’s what I was missing
yep
thanks!
totally forgot about that
java.lang.ClassCastException: clojure.lang.LazySeq cannot be cast to io.atomix.AtomixReplica
 at trinity.core$bootstrap_async_BANG_.invokeStatic (core.clj:82)
    trinity.core$bootstrap_async_BANG_.invoke (core.clj:82)
    trinity.core$bootstrap.invokeStatic (core.clj:99)
    trinity.core$bootstrap.invoke (core.clj:93)
    atomix_jepsen.dvalue.CasRegisterClient.setup_BANG_ (dvalue.clj:36)
    jepsen.core$run_case_BANG_$fn__5958.invoke (core.clj:282)
    clojure.lang.AFn.applyToHelper (AFn.java:154)
    clojure.lang.AFn.applyTo (AFn.java:144)
    clojure.core$apply.invokeStatic (core.clj:646)
    clojure.core$apply.invoke (core.clj:641)
    jepsen.util$fcatch$wrapper__3244.doInvoke (util.clj:27)
    clojure.lang.RestFn.invoke (RestFn.java:408)
    jepsen.util$real_pmap$launcher__3249$fn__3250.invoke (util.clj:47)
    clojure.core$binding_conveyor_fn$fn__4676.invoke (core.clj:1938)
    clojure.lang.AFn.call (AFn.java:18)
    java.util.concurrent.FutureTask.run (FutureTask.java:266)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
    java.lang.Thread.run (Thread.java:745)
it’s something :-)
Jon Hall
@jhall11
Feb 09 2017 02:01
yep
interesting, I don’t think I saw this, but I didn’t specify which tests to run, so it might be I just didn’t get to this
Jordan Halterman
@kuujo
Feb 09 2017 02:02
yeah
Jon Hall
@jhall11
Feb 09 2017 02:04
I don’t know why I thought it was a good idea to upgrade the os in my vm 15 minutes before I wanted to leave work
Jordan Halterman
@kuujo
Feb 09 2017 02:04
haha
no idea what that error means :-P
Jon Hall
@jhall11
Feb 09 2017 02:13
my guess is this call needs to first call replica on the node set
Jordan Halterman
@kuujo
Feb 09 2017 02:13
it’s all gibberish to me
haha
Jon Hall
@jhall11
Feb 09 2017 02:14
it’s getting slightly more readable after going through some tutorials
Jordan Halterman
@kuujo
Feb 09 2017 02:15
oh nice
Jon Hall
@jhall11
Feb 09 2017 02:15
I figured a good place to learn is from the author of jepsen
Jordan Halterman
@kuujo
Feb 09 2017 02:15
indeed
this seems pretty fun
Jonathan Halterman
@jhalterman
Feb 09 2017 05:40
oh, missed the atomix-jepsen discussions here
i'll jump in friday afternoon and see how she's running
slammed until then
@jhall11 If you're interested to learn how the jepsen tests themselves work, Kyle has a nice series of writeups that cover how to write a jepsen test suite (which didn't used to exist) :) click through at the bottom of each page to browse the series.
Jon Hall
@jhall11
Feb 09 2017 05:43
hmm ok, I’ll check that out
Jonathan Halterman
@jhalterman
Feb 09 2017 05:43
that might make the atomix jepsen tests easier to follow
Shalakha Sidmul
@shalakhansidmul
Feb 09 2017 07:51
Hello
I am a newbie here. I have read the documentation for Atomix from http://atomix.io/atomix/docs.
I'd like to know where Atomix has been used so far. It would also be very helpful to have a diagram of its architecture, as an example of how replication is managed, how network partitions are handled, etc.
Jordan Halterman
@kuujo
Feb 09 2017 08:01

Copycat and Atomix are used in various projects. I don't really keep track of them all :-P but ONOS is the one we're talking about above.

I'm not sure you've read all the documentation :-) Atomix is built on an implementation of the Raft consensus protocol, Copycat. See the Raft website for an interactive demonstration of how Raft works: https://raft.github.io

It's a CP system. The cluster elects a leader and replicates to followers. Atomix resources are state machines. Copycat implements all of the Raft protocol and extends it to support concurrent operations, larger clusters, and fault tolerant cluster-client notifications for more efficient locking and what not.

There is extensive documentation of the algorithms inside Copycat on the website (under the architecture section):
http://atomix.io/copycat/docs/

How a network partition is handled depends on where it is. If the leader's on the majority side of a partition, nothing special happens. If it's not, a new leader on the majority side will be elected and the old leader will detect the partition and step down.
Most importantly, Atomix is designed for consistency over performance. It's specifically designed for efficient and safe distributed locking, leader election, and management of mission critical state. It can be scaled by just sharding the cluster, but that won't become a built in feature until the next version.
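
The partition rule described above comes down to a majority count. A minimal sketch, assuming a fixed cluster size; `QuorumSketch`, `quorumSize` and `hasMajority` are hypothetical names for illustration and are not part of Atomix or Copycat:

```java
/** Hypothetical helper for illustration only; not an Atomix or Copycat class. */
final class QuorumSketch {
    private QuorumSketch() {}

    /** Smallest number of members that forms a majority of the cluster. */
    static int quorumSize(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    /**
     * True if one side of a partition, holding `reachable` members
     * (counting the local node), can still elect or keep a leader.
     */
    static boolean hasMajority(int clusterSize, int reachable) {
        return reachable >= quorumSize(clusterSize);
    }

    public static void main(String[] args) {
        int clusterSize = 5;
        // A partition splits the five nodes into a side of 3 and a side of 2.
        System.out.println("side of 3 has majority: " + hasMajority(clusterSize, 3));
        System.out.println("side of 2 has majority: " + hasMajority(clusterSize, 2));
    }
}
```

In a 3/2 split of a five-node cluster, only the side of three can commit writes. A leader stranded on the side of two cannot reach a quorum, which is why it steps down.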
Shalakha Sidmul
@shalakhansidmul
Feb 09 2017 08:54
Thank you for the reply :)
I have read the documentation for Atomix, really :) I have a web application which is used for migration and integration of data. I am looking into ways of making it highly available and also improving its performance in a cluster environment. I was comparing Hazelcast and Atomix for distributed data management. I needed an example for a clear understanding, maybe something like a component diagram of a real-world application which has used Atomix in its distributed system.
Roman Pearah
@neverfox
Feb 09 2017 14:59
@shalakhansidmul "highly available and also improving its performance" In CAP theorem terms, it sounds like you might be looking for something that emphasizes the A, so Atomix might not be appropriate. I'm not sure enough about what you mean by distributed data management though to recommend anything specific.
Jordan Halterman
@kuujo
Feb 09 2017 16:51

Right. There will always be limitations of Atomix/Raft when compared to Hazelcast simply because they're two different consistency models. In general, Hazelcast allows for sections of the data to be updated concurrently. Additionally, Hazelcast is purely in memory while Atomix writes data to disk and aims to guarantee the integrity of the data under any failure. It's technically feasible to do both of these things in Atomix. ONOS uses a sharded Atomix cluster with 2PC for cross-shard operations. Atomix also supports an in-memory transaction log. Sharding is great but is not necessarily easy to do right. I would recommend the in-memory transaction log only for testing. It's not a very efficient way to build a totally in-memory database.

So, as I said Atomix is really designed for managing clusters and concurrency within them, not storing general state. Hazelcast has locks that are decidedly unsafe. Atomix solves that problem. It does group membership and leader election in a manner that's very safe and effective. Trying to scale Atomix by sharding the cluster only really makes sense if you rely on that strong consistency model but still have a use case that requires the added scalability as ONOS did. If you just need scalability and don't need to use it for coordination in a cluster, there are certainly better avenues to take.

Jon Hall
@jhall11
Feb 09 2017 18:16
ok, so I’ve got a debugger attached to the leader of a 3 node ONOS cluster that has hit this bug
Jon Hall
@jhall11
Feb 09 2017 18:22
It is going into the isOpen branch on line 727
Jordan Halterman
@kuujo
Feb 09 2017 18:29
I wonder if it gets to context.getStateMachine().apply(index)
Jon Hall
@jhall11
Feb 09 2017 18:29
Breakpoint reached at io.atomix.copycat.server.state.LeaderState.lambda$register$16(LeaderState.java:727)
Breakpoint reached at io.atomix.copycat.server.state.LeaderState.lambda$register$16(LeaderState.java:729)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.apply(ServerStateMachine.java:292)
Breakpoint reached at io.atomix.copycat.server.state.LeaderState.lambda$register$16(LeaderState.java:727)
Breakpoint reached at io.atomix.copycat.server.state.LeaderState.lambda$register$16(LeaderState.java:729)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.apply(ServerStateMachine.java:292)
...
i have breakpoints on 299, 301 and 304 of ServerStateMachine
I also have a breakpoint on 730 in LeaderState.java, but that doesn’t hit
Jordan Halterman
@kuujo
Feb 09 2017 18:35
any of 299, 301 and 304 of ServerStateMachine do?
299 must not right?
that would be LOGGER.debug("{} - Applying {}", state.getCluster().member().address(), entry); which is what’s not being seen in the log
is ServerStateMachine.java:292 blocking or something?
hmm
Jon Hall
@jhall11
Feb 09 2017 18:37
I think so
Jordan Halterman
@kuujo
Feb 09 2017 18:40
hmmm
Jon Hall
@jhall11
Feb 09 2017 19:02
It looks like apply is going into applyAll
Jordan Halterman
@kuujo
Feb 09 2017 19:03
wonder what is causing that not to exit
an insanely high lastIndex?
it doesn’t seem like the server is blocked though
so seems odd
maybe an exception in there?
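
On the "maybe an exception in there?" guess: one general way a JVM task can fail without a visible trace is when it runs through `ExecutorService.submit`, which stores the exception in the returned `Future` instead of printing it. The sketch below shows only that standard-library behaviour. `SwallowedExceptionSketch` and `outcomeOf` are hypothetical names, and whether Copycat's threading behaves this way in this case is an assumption, not something the chat confirms:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration; not Copycat code. */
final class SwallowedExceptionSketch {
    private SwallowedExceptionSketch() {}

    /** Runs the task on a single-thread executor and reports how it ended. */
    static String outcomeOf(Runnable task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = executor.submit(task);
            // Without this get(), a failure inside the task is never surfaced.
            future.get();
            return "completed normally";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        Runnable failing = () -> {
            throw new IndexOutOfBoundsException("bad index");
        };
        System.out.println(outcomeOf(failing));
    }
}
```

If nothing ever calls `get()` on the future or attaches a failure handler, the task simply stops partway and no log line appears.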
Jon Hall
@jhall11
Feb 09 2017 19:06
fwiw:
Breakpoint reached at io.atomix.copycat.server.state.LeaderState.lambda$register$16(LeaderState.java:727)
Breakpoint reached at io.atomix.copycat.server.state.LeaderState.lambda$register$16(LeaderState.java:729)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.apply(ServerStateMachine.java:292)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:270)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:270)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:272)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:274)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.lambda$applyAll$1(ServerStateMachine.java:272)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:270)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:272)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:274)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.lambda$applyAll$1(ServerStateMachine.java:272)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:270)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:272)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:274)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.lambda$applyAll$1(ServerStateMachine.java:272)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:270)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:272)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:274)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.lambda$applyAll$1(ServerStateMachine.java:272)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:270)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:272)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:274)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.lambda$applyAll$1(ServerStateMachine.java:272)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:270)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:272)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:274)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.lambda$applyAll$1(ServerStateMachine.java:272)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:263)
Breakpoint reached at io.atomix.copycat.server.state.LeaderState.lambda$register$16(LeaderState.java:727)
Breakpoint reached at io.atomix.copycat.server.state.ServerStateMachine.applyAll(ServerStateMachine.java:268)
Breakpoint reached at io.atomix.copycat.server.state.LeaderState.lambda$register$16(LeaderState.java:729)
Jon Hall
@jhall11
Feb 09 2017 19:53
So I couldn’t get intellij to breakpoint on this lambda but i was able to log on the apply and print entry. I’m only seeing keepAlive entries printed, even after apply is called in register()
Jordan Halterman
@kuujo
Feb 09 2017 21:03
Hmm
Jon Hall
@jhall11
Feb 09 2017 22:16
when I try to print log.get(i) on the line where entry is set, I get: Unable to evaluate the expression "log.get(i)" : Method threw 'java.lang.IndexOutOfBoundsException' exception.
Jordan Halterman
@kuujo
Feb 09 2017 22:26
Awesome! So that's the problem then
It's going to be a hard one to track down though :-(
Hmm
Jordan Halterman
@kuujo
Feb 09 2017 22:33
I've been out but just got back and I'll dig in the code in a few. But typically these IndexOutOfBoundsExceptions have been difficult to reproduce in unit tests, which is what we need. It's difficult to deduce the state that caused them. But it may be possible to back port the Copycat 2.0 log which is very likely more stable and fixes this problem since it replaces indexes with scanning. That's probably not practical but I'll look at it and changing to that log may just have to be expedited in the interest of more stability.
Jon Hall
@jhall11
Feb 09 2017 22:35
hmm ok, I’ll keep playing around and see if I can find anything
Jordan Halterman
@kuujo
Feb 09 2017 22:53
Obviously back porting the log is a last resort but I think it should be fairly easy to do. But actually, I think I may know a way I can reproduce this.
@jhall11 can you see where the exception is thrown? There are a few possibilities
Jon Hall
@jhall11
Feb 09 2017 22:56
I can try. The debugger logging isn’t as robust as I’d like. Do you have specific places you want me to look?
Jordan Halterman
@kuujo
Feb 09 2017 22:57
Yeah let me link them...
what you should probably do is set a breakpoint here and see what’s calling it. That’s most likely where it’s thrown
lemme see if there’s any other possibility
I think that should be where it’s thrown, and the question is just where the Assert.index call is
Jon Hall
@jhall11
Feb 09 2017 23:46
So I’m trying to get it without suspending the thread, but here is what I’ve got so far:
Breakpoint reached at io.atomix.catalyst.util.Assert.index(Assert.java:33)
expr: false msg: invalid log index: 1
Exception 'java.lang.IndexOutOfBoundsException' occurred in thread 'copycat-server-/172.17.0.2:9876-partition-2' at io.atomix.catalyst.util.Assert.index(Assert.java:45)
hmmm
so I’m going to assume this is what’s happening in the logs from the PR, and maybe I can deduce something from it
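
For readers following along, here is a toy sketch of how a bounds check on a log read can produce a message like "invalid log index: 1". `ToyLog` and `checkIndex` are hypothetical and are not Copycat's `Log` or Catalyst's `Assert`. The scenario shown, a log whose first retained index is above the requested one, is only one assumed way to trigger such a check. The actual cause had not been found at this point in the chat:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory log for illustration; not Copycat's Log implementation. */
final class ToyLog {
    private final long firstIndex;                 // index of the first entry still held
    private final List<String> entries = new ArrayList<>();

    ToyLog(long firstIndex) {
        this.firstIndex = firstIndex;
    }

    /** Appends an entry and returns the index it was written at. */
    long append(String entry) {
        entries.add(entry);
        return lastIndex();
    }

    long lastIndex() {
        return firstIndex + entries.size() - 1;
    }

    /** Throws IndexOutOfBoundsException with a formatted message when the check fails. */
    static void checkIndex(boolean valid, String format, Object... args) {
        if (!valid) {
            throw new IndexOutOfBoundsException(String.format(format, args));
        }
    }

    /** Reads an entry, rejecting any index outside [firstIndex, lastIndex]. */
    String get(long index) {
        checkIndex(index >= firstIndex && index <= lastIndex(), "invalid log index: %d", index);
        return entries.get((int) (index - firstIndex));
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog(5);   // assumption: entries 1..4 are no longer held
        log.append("a");
        log.append("b");
        System.out.println(log.get(5));
        try {
            log.get(1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A loop that applies entries one index at a time would stop at the first such read. That would be consistent with later entries never being applied, though the chat does not establish that this is what happened.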