These are chat archives for atomix/atomix

5th
Feb 2017
Jon Hall
@jhall11
Feb 05 2017 21:16
I’ve had to restart mine several times in the last week or so, not sure if they just released a buggy version or what
Jon Hall
@jhall11
Feb 05 2017 21:39
So I’m taking another look at atomix-jepsen, trying to get it to work with the latest snapshots. I’m wondering what your setup was for running the tests? I’m running the outer container on Mac OS, and things seem to be different. Not sure how much of that is because the base container images are outdated and how much is down to some invalid assumptions
Jordan Halterman
@kuujo
Feb 05 2017 21:40
I was just thinking about it again last night. Would love to work to try to get them going well again… let me refresh my memory...
I did essentially the same thing - running the container on OS X
just haven’t done it in a while so maybe I’ll give it a shot
…at the risk of Docker for Mac destroying my CPU again...
Jon Hall
@jhall11
Feb 05 2017 21:42
ok, I’ve been pushing my changes to my forks, mostly minor changes to get everything working with the latest versions of atomix and jepsen
Roman Pearah
@neverfox
Feb 05 2017 21:42
Docker for Mac destroyed your CPU??
Jon Hall
@jhall11
Feb 05 2017 21:42
oh no, that sounds pretty terrible
Jordan Halterman
@kuujo
Feb 05 2017 21:44
yeah
Roman Pearah
@neverfox
Feb 05 2017 21:44
yikes
how did it do that?
Jordan Halterman
@kuujo
Feb 05 2017 21:44
it’s terrible, I’ve just been too lazy to revert to boot2docker
it’s a known bug
Roman Pearah
@neverfox
Feb 05 2017 21:44
omg
Jordan Halterman
@kuujo
Feb 05 2017 21:44
docker/for-mac#82
Roman Pearah
@neverfox
Feb 05 2017 21:44
I made the switch a while ago but didn't realize there was this risk
happens after I close my laptop and come back to it… my laptop loses power when plugged in until I kill that process
Roman Pearah
@neverfox
Feb 05 2017 21:45
damn
Jordan Halterman
@kuujo
Feb 05 2017 21:45
I just updated it again so maybe it’s finally fixed
Roman Pearah
@neverfox
Feb 05 2017 21:45
I haven't seen anything like that.
luckily
Jordan Halterman
@kuujo
Feb 05 2017 21:47
probably user error :-P
haha
hopefully they fixed it though
Roman Pearah
@neverfox
Feb 05 2017 21:47
who knows, could be a special combination of circumstances
perfect storm
I'm always working my CPU hard for all kinds of reasons tho
surprised I haven't killed it
Jordan Halterman
@kuujo
Feb 05 2017 21:55
lol
Jordan Halterman
@kuujo
Feb 05 2017 22:12
hmm… just have to figure out why it’s trying to pull the Copycat/Atomix snapshots
I updated the Trinity dependencies
but I’m no Clojure expert
Jon Hall
@jhall11
Feb 05 2017 22:17
I think that’s baked into the docker images? I can’t remember what I did Friday night to fix that
I’m working on building the container images for the nodes
Jordan Halterman
@kuujo
Feb 05 2017 22:18
hmm
Jon Hall
@jhall11
Feb 05 2017 22:18
Oh, and it looks like the tests are actually running :)
Jordan Halterman
@kuujo
Feb 05 2017 22:18
oh awesome
Jon Hall
@jhall11
Feb 05 2017 22:19
I’m not sure if the Clojure changes I made in the test are correct, I didn’t really spend any time looking at what it was trying to do when I updated to the newer API
Jordan Halterman
@kuujo
Feb 05 2017 22:22
so, when they’re done you should get a /store folder that contains the test results. They’ll have a history.txt that shows all the commands Jepsen submitted and partitions/crashes it caused, and a results.edn that shows whether the history was valid (linearizable). Also there should be some latency graphs and things like that in there, unless I added those to my tests locally and never committed them...
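A minimal sketch of how the results described above could be summarized after a run. It assumes each run leaves a results.edn somewhere under the store folder and that validity is recorded with `:valid?` keys whose values are `true` or `false`; the function names and the regex are illustrative, not part of Jepsen or atomix-jepsen.

```python
import re
from pathlib import Path

# Assumption: validity is recorded as `:valid? true` / `:valid? false`
# (possibly nested, one per checker) inside each results.edn file.
VALID_RE = re.compile(r":valid\?\s+([^\s,}]+)")


def verdict_for(text):
    """Return False if any checker reported false, True if all reported
    true, and None if no verdict (or an inconclusive one) was found."""
    values = VALID_RE.findall(text)
    if not values:
        return None
    if "false" in values:
        return False
    if all(value == "true" for value in values):
        return True
    return None


def summarize_store(store_dir):
    """Map every results.edn under store_dir to its verdict."""
    root = Path(store_dir)
    summary = {}
    for results in sorted(root.rglob("results.edn")):
        text = results.read_text(encoding="utf-8")
        summary[results.relative_to(root).as_posix()] = verdict_for(text)
    return summary
```

Calling `summarize_store("store")` from the project directory would then give one entry per run. This is a text scan rather than a real EDN parse, so the matching history.txt is still the place to look when a run comes back as `False` or `None`.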
Jon Hall
@jhall11
Feb 05 2017 22:23
do you remember how long the tests usually take?
Jordan Halterman
@kuujo
Feb 05 2017 22:25
Probably like 5 minutes for one test, but I usually run them individually since some of the tests are really incomplete. Really the dvalue ones are the only ones that work IIRC:
lein test :only atomix-jepsen.dvalue-test/bridge
You can test things like bridge, random-halves, etc. (https://github.com/atomix/atomix-jepsen/blob/master/test/atomix_jepsen/dvalue_test.clj#L11-L53)
actually I think the only ones that have really been done are those baseline tests: https://github.com/atomix/atomix-jepsen/blob/master/test/atomix_jepsen/dvalue_test.clj#L11-L24
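Since the tests are run one at a time, a small helper can build the `lein test :only` invocation for each test name. This is only a sketch: the namespace and the two test names are the ones mentioned in the chat, while the helper names are made up for illustration.

```python
import shlex

# Namespace taken from the command quoted in the chat.
NAMESPACE = "atomix-jepsen.dvalue-test"


def lein_test_command(test_name, namespace=NAMESPACE):
    """Build the argument list for running a single test by name."""
    return ["lein", "test", ":only", f"{namespace}/{test_name}"]


def format_commands(test_names):
    """Return one shell-ready command line per test name."""
    return [shlex.join(lein_test_command(name)) for name in test_names]


if __name__ == "__main__":
    for line in format_commands(["bridge", "random-halves"]):
        print(line)
```

Each argument list could also be passed to `subprocess.run(cmd, check=True)` from the atomix-jepsen checkout to run the tests back to back.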
and what really needs to be done and working are the configuration change tests especially, in addition to testing things like locks that use the session events framework
also need to add a way to force nodes to compact their logs and check results after compacting logs and crashing/restarting nodes
lots of things that could be tested with Jepsen haven’t been yet, and testing them would probably go a long way towards stability
the dvalue tests for basic linearizability under network partitions are certainly the most important, but the edge cases are with configuration changes and log compaction and events and things like that
configuration changes and log compaction shouldn’t be difficult to add into tests, but events may be since they have a different consistency model
and thus can result in different valid histories than are supported by the linearizability checker
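To make concrete what a linearizability check on a single value looks for, here is a brute-force sketch. It is not the algorithm Jepsen's checker uses, and it assumes every operation completed (no crashed or indeterminate operations); the `Op` type and function name are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Op(NamedTuple):
    invoke: float          # time the client sent the request
    complete: float        # time the client received the response
    kind: str              # "read" or "write"
    value: Optional[int]   # value written, or value the read returned


def is_linearizable(history: Sequence[Op], initial: Optional[int] = None) -> bool:
    """Search for an ordering of all operations that respects real time
    and in which every read returns the most recently written value."""

    def search(remaining: frozenset, state: Optional[int]) -> bool:
        if not remaining:
            return True
        # An operation may be ordered next only if no other pending
        # operation completed before it was invoked.
        earliest_complete = min(history[i].complete for i in remaining)
        for i in remaining:
            op = history[i]
            if op.invoke > earliest_complete:
                continue
            if op.kind == "write":
                if search(remaining - {i}, op.value):
                    return True
            elif op.value == state:
                if search(remaining - {i}, state):
                    return True
        return False

    return search(frozenset(range(len(history))), initial)
```

For example, a read that returns the initial value while a write is still in flight is accepted, but the same read issued after the write has completed is rejected. A check of this shape only accepts histories that fit one real-time-respecting order, which is why histories produced under a weaker consistency model, such as the events mentioned above, may be valid and still fail it.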
Jon Hall
@jhall11
Feb 05 2017 22:32
https://github.com/atomix/atomix-jepsen/compare/master...jhall11:master#diff-5268f8b6081de1fb845dcb3a2996b63eR38
I don’t think connect is the right call here. Do we want to call bootstrap once with the nodeset then connect with the other nodes?
Jordan Halterman
@kuujo
Feb 05 2017 22:35
That's a client, right? Clients only have connect. But the Trinity code may need to be updated. We may have added bootstrap/join to servers/replicas after that was created
Can either use bootstrap with the full cluster membership or bootstrap one node and join all the others
I'll be out for the Super Bowl for a few hours :-)
Jon Hall
@jhall11
Feb 05 2017 22:37
yeah, I’m just killing time before it starts