These are chat archives for atomix/atomix

3rd
May 2018
Johno Crawford
@johnou
May 03 2018 00:37
i think we can merge atomix/atomix#503 in now?
Jordan Halterman
@kuujo
May 03 2018 00:42
I will fix the election issue now
Johno Crawford
@johnou
May 03 2018 00:43
committed the math fix
election issue with the client?
Jordan Halterman
@kuujo
May 03 2018 00:45
yeah
Jordan Halterman
@kuujo
May 03 2018 00:57
@johnou how do you push to someone else’s pull request?
I can’t seem to figure that out
the PR was from a fork
Johno Crawford
@johnou
May 03 2018 00:57
yes
he gave permission for us to modify it
it's an option when you open a pr
Jordan Halterman
@kuujo
May 03 2018 00:57
ahh gotcha
Johno Crawford
@johnou
May 03 2018 00:57
allow maintainers to edit
Jordan Halterman
@kuujo
May 03 2018 00:58
oh interesting
I never noticed that
sweet I can modify the other one then
Johno Crawford
@johnou
May 03 2018 00:58
i just modified it directly from github, but i'm pretty sure (certain) you can also check out their fork and branch, then modify that + push
this is a red flag for me, would have never considered using a cf in thread local
but I guess the atomix thread model might make it safeish
as long as it's the correct thread in all code paths
Johno Crawford
@johnou
May 03 2018 01:04
you killed my commits on the local-member-names branch, was that intentional?
Jordan Halterman
@kuujo
May 03 2018 01:05
I did?
Johno Crawford
@johnou
May 03 2018 01:05
c055307046faecec551c42ecf31825746f320178
yeah let me grab it
if you didn't like the changes that's ok, just wanted to check
Jordan Halterman
@kuujo
May 03 2018 01:07
bah damnit sorry not intentional
haha
I thought that was something else
I knew that would happen
Johno Crawford
@johnou
May 03 2018 01:08
all good
atomix/atomix#535
Jordan Halterman
@kuujo
May 03 2018 01:11
I think I want to remove the generic type from AtomixCluster actually
it can just return Void from start()
Johno Crawford
@johnou
May 03 2018 01:11
yeah I think that would be nicer
Johno Crawford
@johnou
May 03 2018 01:18
omg how is it 3am already
good night
Jordan Halterman
@kuujo
May 03 2018 01:18
adios
Johno Crawford
@johnou
May 03 2018 07:21
Unknown partition group type: multi-primary (through reference chain: io.atomix.core.AtomixConfig["partition-groups"]
io.atomix.core.config.jackson.JacksonConfigProviderTest.testYaml
Jordan Halterman
@kuujo
May 03 2018 07:21
changed to primary-backup
Johno Crawford
@johnou
May 03 2018 07:21
Ah you renamed it, right?
Huibai Huang
@baymaxhuang
May 03 2018 08:13
@kuujo I see the Atomix homepage has been updated. Although it is not fully finished, I looked through its contents and did not find the documents on the Raft state machine and the interaction between Raft clients and Raft servers, which could be found in the original CopyCat documentation. I think they are very helpful for understanding the implementation details of Raft in Atomix. I hope these documents about the implementation details of the Raft protocol can be preserved on the new homepage.
Jordan Halterman
@kuujo
May 03 2018 08:13
They’re out of date
The website is for Atomix 2.1, and the Raft implementation in Atomix was extensively refactored many months ago. The new documentation will have more up-to-date information
Huibai Huang
@baymaxhuang
May 03 2018 08:20
Yeah, could you leave a link to the original page? I think it could be helpful for many old users.
The format of the old website is a lot different than the new website. It will be non-trivial to make it available on the website. I have to spend my time on the documentation for supported versions before worrying about the unsupported versions.
Huibai Huang
@baymaxhuang
May 03 2018 08:34
I hope the new website could be finished soon. Thanks.
Jordan Halterman
@kuujo
May 03 2018 08:34
It’s a lot of documentation
Some of the sections are done. The others will be done over the next few weeks
Jordan Halterman
@kuujo
May 03 2018 08:54
@baymaxhuang what specific documentation do you desire? The Raft implementation is a broad topic, and there’s now a lot more on top of it.
finally think I got this PartitionGroupMembershipService working
Johno Crawford
@johnou
May 03 2018 08:59
woo!
nice work
iteration 100?
Jordan Halterman
@kuujo
May 03 2018 08:59
indeed
a better programmer would have been on iteration 50 :-P
At first I read that as irritation 100, and I also thought that was correct. :100:
Huibai Huang
@baymaxhuang
May 03 2018 09:01
For example, in Atomix, how are linearizable writes and sequential reads implemented? How is the Raft log saved and compacted?
Jordan Halterman
@kuujo
May 03 2018 09:02
The Raft implementation is just an implementation detail now. It will eventually be documented, but it’s probably last on the list. The system is far simpler and more efficient using the high level APIs.
it is still usable by itself though
ditto the primary-backup protocol
Johno Crawford
@johnou
May 03 2018 09:04
only thing really bothering me is the timeouts with phi
Jordan Halterman
@kuujo
May 03 2018 09:04
damnit I think I’m getting sick
what do you mean?
Johno Crawford
@johnou
May 03 2018 09:04
would you be against an option to disable that and only have the max timeout?
Jordan Halterman
@kuujo
May 03 2018 09:04
you mean messaging timeouts?
Johno Crawford
@johnou
May 03 2018 09:04
heh you and me both, think I picked up a cold from my kid
yeah
Jordan Halterman
@kuujo
May 03 2018 09:05
They can’t be disabled globally. The reason they exist is because the Raft implementation relies on them. It can’t wait 5 or 10 seconds for timeouts.
the failure threshold is very low right now
Johno Crawford
@johnou
May 03 2018 09:06
seemed to throw quite a few false positives
especially before that hack to ignore responses below 100ms
Jordan Halterman
@kuujo
May 03 2018 09:13
It’s a tunable algorithm. Just tune the PHI_FAILURE_THRESHOLD until it resolves your issues, then commit the change.
I think I’m actually going to change the data structure a little bit though
Jordan Halterman
@kuujo
May 03 2018 09:20
a phi value of 5 is really aggressive
Huibai Huang
@baymaxhuang
May 03 2018 09:22
I find that in the newest Atomix, the phi failure detector no longer seems to be used for leader election, although the 2.0.0-raft-beta1 version did use it for leader election. I am curious what happened to it?
https://github.com/atomix/atomix/blob/c6ec9c793d2fe005cc64f038f509f496589fea4c/protocols/raft/src/main/java/io/atomix/protocols/raft/roles/FollowerRole.java#L98
Johno Crawford
@johnou
May 03 2018 09:23
iirc it was moved to another component
Jordan Halterman
@kuujo
May 03 2018 09:24
the phi failure detector is in ClusterMembershipService, and the ClusterMembershipService is still used for Raft leader election
@johnou atomix/atomix@f3c694b
Johno Crawford
@johnou
May 03 2018 09:29
that looks clever, need to give it a run through some tests
Jordan Halterman
@kuujo
May 03 2018 09:30
actually need to make these things configurable too
basically, it just tracks the max response time for every minute in the last 10 minutes and bases the failure detection on that history
also doesn’t do dynamic timeouts until the window is completely populated
could also change the algorithm used to calculate the timeout - we used to use percentiles and/or max * some factor
Johno Crawford
@johnou
May 03 2018 09:32
wouldn't updating phi once a minute greatly increase the time of a suspect
or am I missing something
Jordan Halterman
@kuujo
May 03 2018 09:33
the window is updated once per minute
with the largest response time from the last minute
the phi value is based on the window
so the response timeout will be based on the largest response times from the last 10 minutes, thus increasing it and making false positives less likely
as opposed to just basing it on the last 100 responses, which is a crazy tiny sample for messaging
a small fraction of a second can completely skew timeouts
Johno Crawford
@johnou
May 03 2018 09:35
i'm still not super convinced, need to give it some thought
didn't sleep much :D
Jordan Halterman
@kuujo
May 03 2018 09:36
This would be the same with a simpler algorithm: say the timeout is calculated as the largest response time from the last 10 minutes * 2 as opposed to the largest response time from the last 100 responses
boolean isTimedOut(long elapsedTime) {
  return elapsedTime > samples.getMax() * 2;
}
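The windowed variant described above (the largest response time per minute over the last 10 minutes, with no dynamic timeout until the window is populated) might look roughly like the following. This is only a sketch under assumptions: `MaxResponseWindow`, its parameters, and the fallback to a fixed maximum timeout are hypothetical names and choices, not the code in the referenced commit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch, not the Atomix class: per-interval maxima over a fixed window. */
final class MaxResponseWindow {
  private final int windowSize;        // number of intervals kept, e.g. 10
  private final long intervalMillis;   // length of one interval, e.g. 60_000
  private final Deque<Long> maxima = new ArrayDeque<>();
  private long intervalStart;
  private long currentMax;

  MaxResponseWindow(int windowSize, long intervalMillis, long nowMillis) {
    this.windowSize = windowSize;
    this.intervalMillis = intervalMillis;
    this.intervalStart = nowMillis;
  }

  /** Records one reply time in the interval that contains nowMillis. */
  synchronized void addReplyTime(long replyMillis, long nowMillis) {
    rotate(nowMillis);
    currentMax = Math.max(currentMax, replyMillis);
  }

  /** Largest reply time seen in the window, including the interval in progress. */
  synchronized long getMax(long nowMillis) {
    rotate(nowMillis);
    long max = currentMax;
    for (long m : maxima) {
      max = Math.max(max, m);
    }
    return max;
  }

  /** Uses max * 2 once the window is full; until then only maxTimeoutMillis applies. */
  synchronized boolean isTimedOut(long elapsedMillis, long nowMillis, long maxTimeoutMillis) {
    long max = getMax(nowMillis);
    if (maxima.size() < windowSize || max == 0) {
      return elapsedMillis > maxTimeoutMillis;
    }
    return elapsedMillis > Math.min(maxTimeoutMillis, max * 2);
  }

  // Closes every interval that has fully elapsed, dropping the oldest when the window is full.
  private void rotate(long nowMillis) {
    while (nowMillis - intervalStart >= intervalMillis) {
      maxima.addLast(currentMax);
      if (maxima.size() > windowSize) {
        maxima.removeFirst();
      }
      currentMax = 0;
      intervalStart += intervalMillis;
    }
  }
}
```

Passing the clock in as `nowMillis` is a choice made here to keep the sketch deterministic; a real implementation would more likely read the system clock.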
Johno Crawford
@johnou
May 03 2018 09:39
i just keep thinking of our app where we have multiple services, one handling non blocking io messaging, in some cases it's able to use loopback and response times are sub ms
and another which uses blocking io ops which can range from 10ms to 400ms or even a little higher depending on what it is
eg. search over two fulltext indexed fields is quite expensive
match against or w/e it is called
Jordan Halterman
@kuujo
May 03 2018 09:41
The phi algorithm is particularly designed to take into account large fluctuations in times. The larger the fluctuations, the larger the buffer it will give. A response time that’s consistently 1ms or consistently 1000ms will get a smaller buffer because it doesn’t fluctuate.
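To illustrate the point about fluctuations, here is a minimal phi-accrual sketch. It assumes response times are roughly normally distributed and uses a logistic approximation of the normal CDF; `PhiSketch` and its parameters are hypothetical, and the detector in Atomix may compute phi differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical phi-accrual sketch based on a normal-distribution approximation. */
final class PhiSketch {
  private final Deque<Double> samples = new ArrayDeque<>();
  private final int maxSamples;
  private final double minStdDevMillis;  // floor so constant samples do not divide by zero

  PhiSketch(int maxSamples, double minStdDevMillis) {
    this.maxSamples = maxSamples;
    this.minStdDevMillis = minStdDevMillis;
  }

  /** Adds a response time, evicting the oldest sample when the history is full. */
  void record(double responseMillis) {
    if (samples.size() == maxSamples) {
      samples.removeFirst();
    }
    samples.addLast(responseMillis);
  }

  double mean() {
    double sum = 0;
    for (double s : samples) {
      sum += s;
    }
    return samples.isEmpty() ? 0 : sum / samples.size();
  }

  /** Population standard deviation of the recorded samples. */
  double stdDev() {
    if (samples.isEmpty()) {
      return 0;
    }
    double mean = mean();
    double squares = 0;
    for (double s : samples) {
      squares += (s - mean) * (s - mean);
    }
    return Math.sqrt(squares / samples.size());
  }

  /** phi = -log10(probability that a response takes longer than elapsedMillis). */
  double phi(double elapsedMillis) {
    if (samples.isEmpty()) {
      return 0;
    }
    double mean = mean();
    double sd = Math.max(stdDev(), minStdDevMillis);
    double y = (elapsedMillis - mean) / sd;
    // Logistic approximation of the normal CDF tail.
    double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
    return elapsedMillis > mean
        ? -Math.log10(e / (1.0 + e))
        : -Math.log10(1.0 - 1.0 / (1.0 + e));
  }
}
```

With the same mean, a history that alternates between 10ms and 190ms yields a much lower phi for a 200ms wait than a history of constant 100ms replies, so the jittery peer gets the larger buffer before crossing a given threshold.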
Johno Crawford
@johnou
May 03 2018 09:43
yeah but then you might get trapped with scheduled tasks
well maybe that's too specific to my use case
where we might run an import from redshift to the gamedb for dwh of targeted offers
Jordan Halterman
@kuujo
May 03 2018 09:44
That would be a reason to provide a timeout
Johno Crawford
@johnou
May 03 2018 09:44
yeah which was a recent addition right
so if a timeout is provided, phi ignored / not updated?
if yes, I can sleep at night
;)
Jordan Halterman
@kuujo
May 03 2018 09:45
it’s ignored, but the response time is still recorded IIRC
Johno Crawford
@johnou
May 03 2018 09:45
hum
Jordan Halterman
@kuujo
May 03 2018 09:45
for use in the dynamic timeout calculation
I guess an argument could be made for not recording it
Johno Crawford
@johnou
May 03 2018 09:47
yeah seems a bit dangerous for modifying the dynamic timeout calculation for a heavy job with a timeout of say, 1 minute
(we have a lot of data that needs to be migrated / verified across from staging to prod when releasing new gfx in the game)
Jordan Halterman
@kuujo
May 03 2018 09:49
done
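The behavior agreed on here (an explicit timeout bypasses the dynamic calculation, and replies to such requests are not recorded) could be sketched as follows. `ReplyHistory`, the "timeout <= 0 means none supplied" convention, and the `max * 2` rule are assumptions for illustration, not the actual change.

```java
/** Hypothetical sketch: replies to requests with a caller-supplied timeout are not sampled. */
final class ReplyHistory {
  private long maxReplyMillis;
  private int sampleCount;

  /** explicitTimeoutMillis <= 0 means the caller did not supply a timeout. */
  synchronized void onReply(long replyMillis, long explicitTimeoutMillis) {
    if (explicitTimeoutMillis > 0) {
      return; // a slow batch job must not inflate the dynamic timeout
    }
    maxReplyMillis = Math.max(maxReplyMillis, replyMillis);
    sampleCount++;
  }

  /** An explicit timeout wins; otherwise derive one from the recorded history. */
  synchronized long timeoutFor(long explicitTimeoutMillis, long defaultTimeoutMillis) {
    if (explicitTimeoutMillis > 0) {
      return explicitTimeoutMillis;
    }
    if (sampleCount == 0 || maxReplyMillis == 0) {
      return defaultTimeoutMillis;
    }
    return Math.min(defaultTimeoutMillis, maxReplyMillis * 2);
  }

  synchronized long maxReplyMillis() {
    return maxReplyMillis;
  }
}
```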
Johno Crawford
@johnou
May 03 2018 09:52
lgtm
Jordan Halterman
@kuujo
May 03 2018 09:58
one more little change… I’ll submit a PR
Johno Crawford
@johnou
May 03 2018 09:58
hah I already created it on your branch
but it won't let me review it now
Jordan Halterman
@kuujo
May 03 2018 09:58
haha
lame
just added a minimum sample size so the timeout isn’t also skewed by very infrequent messages
Johno Crawford
@johnou
May 03 2018 10:03
why not use CAS inside addReplyTime instead of sync?
ah wait
double checked locking on lastUpdate
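For readers unfamiliar with the pattern mentioned here, a double-checked lock around the rotation timestamp might look like the sketch below. `RotatingMax` and its fields are hypothetical; only the names `addReplyTime` and `lastUpdate` come from the conversation, and the real method may be structured differently.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: CAS for the hot path, a lock only when the interval rolls over. */
final class RotatingMax {
  private final long intervalMillis;
  private final AtomicLong currentMax = new AtomicLong();
  private volatile long lastUpdate;
  private volatile long previousMax;

  RotatingMax(long intervalMillis, long nowMillis) {
    this.intervalMillis = intervalMillis;
    this.lastUpdate = nowMillis;
  }

  void addReplyTime(long replyMillis, long nowMillis) {
    if (nowMillis - lastUpdate >= intervalMillis) {      // first check, without the lock
      synchronized (this) {
        if (nowMillis - lastUpdate >= intervalMillis) {  // second check, under the lock
          previousMax = currentMax.getAndSet(0);
          lastUpdate = nowMillis;
        }
      }
    }
    // Lock-free update of the running maximum.
    currentMax.accumulateAndGet(replyMillis, Math::max);
  }

  long currentMax() {
    return currentMax.get();
  }

  long previousMax() {
    return previousMax;
  }
}
```

A reply that races with a rotation may land in either interval, which seems acceptable for a statistic like this but is a trade-off of the sketch.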
Jordan Halterman
@kuujo
May 03 2018 10:04
alright that’s all my PRs for today
gotta go to sleep and wake up probably sick
ugh
Johno Crawford
@johnou
May 03 2018 10:04
tickle in the throat?
Jordan Halterman
@kuujo
May 03 2018 10:04
yeah
Johno Crawford
@johnou
May 03 2018 10:05
:(
catch you later, good night!
Jordan Halterman
@kuujo
May 03 2018 10:05
It’s my son. That guy is sick like every other week
I usually avoid them but it had to catch up to me some time
Johno Crawford
@johnou
May 03 2018 10:06
ahahah I know the feeling, worst part is when the wife gets it from the kid, then passes it to you, then you pass it back to the kid or the kid gets it again from kindergarten
Jordan Halterman
@kuujo
May 03 2018 10:07
haha
Jordan Halterman
@kuujo
May 03 2018 10:23

@baymaxhuang FYI even though there probably won’t be Raft documentation for a while, I am currently working on the documentation for custom primitives which are the new version of what used to be done in Copycat and Atomix 1. Primitives in Atomix 2 are just way cooler 😎 Copycat state machines are just replicated on a single Raft cluster. Atomix primitives/state machines can be partitioned (by Atomix) across multiple persistent Raft clusters/partitions or in-memory primary-backup clusters/partitions. This is what I mean when I say Atomix 2 is more efficient. For the same amount of work you get a much more flexible and scalable result that can run on multiple different replication protocols. This is the reason there’s no urgency in finishing the Raft docs.

That said, the Raft documentation from Copycat is probably the only documentation I’ll keep and update since it was so extensive. I don’t want to rewrite that crap again. The biggest changes in the Atomix Raft implementation are:
• All state machines are snapshotted. There’s no more incremental compaction algorithm. Incremental compaction was efficient but complicated for state machines to implement, and snapshots can be generalized to both the Raft and primary-backup protocols now used in Atomix.
• Atomix also multiplexes multiple logical Raft sessions on a single Raft client. This allows primitives to be isolated from one another and helps improve performance a lot, especially in multi-threaded environments.
• The log and serialization were also replaced for efficiency.

Aside from that, the vast majority of work on the Raft implementation has gone into stability. There have been massive improvements in stability over the Copycat implementation just from our work in lab/field trials.

I can never resist a rant
Now off to bed
BTW the 2.0 branch is the Raft implementation that we currently use in production. We’re in the process of moving to 2.1 now
It’s on like 2.0.20 now or something
Jordan Halterman
@kuujo
May 03 2018 10:40
BTW you can always remove the generic from AtomixCluster ;-)
Johno Crawford
@johnou
May 03 2018 10:49
i'll do that in that pending pr
goto bed :P
Johno Crawford
@johnou
May 03 2018 12:46
i'm not so sure i like the void return type for start
it feels somewhat clunky
maybe we can use a common super class instead and have both classes implement Managed with the correct type
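One way to read this suggestion is sketched below: a non-generic base class holds the shared startup logic, and each concrete class implements a `Managed`-style interface with its own type. All names here (`SketchManaged`, `ClusterBase`, `SketchCluster`, `SketchAtomix`) are hypothetical, and this may not match how the real classes relate to each other.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for a Managed-style lifecycle interface. */
interface SketchManaged<T> {
  CompletableFuture<T> start();
}

/** Shared, non-generic base: no type parameter and no Void-returning start(). */
abstract class ClusterBase {
  private volatile boolean running;

  protected CompletableFuture<Void> startServices() {
    running = true;
    return CompletableFuture.completedFuture(null);
  }

  boolean isRunning() {
    return running;
  }
}

final class SketchCluster extends ClusterBase implements SketchManaged<SketchCluster> {
  @Override
  public CompletableFuture<SketchCluster> start() {
    return startServices().thenApply(v -> this);
  }
}

final class SketchAtomix extends ClusterBase implements SketchManaged<SketchAtomix> {
  @Override
  public CompletableFuture<SketchAtomix> start() {
    return startServices().thenApply(v -> this);
  }
}
```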
Jon Hall
@jhall11
May 03 2018 18:25
Might be a bit late, but in regards to pulling someone’s pr: https://help.github.com/articles/checking-out-pull-requests-locally/ I think if the submitter ticks the box to allow maintainers to modify it, maintainers of the upstream now have push rights to the forked repo’s branch
imperatorx
@imperatorx
May 03 2018 19:34
I see that the custom primitive types get instantiated now by calling class.newInstance(). Is there a way to add non-serializable config information to the primitive type (e.g. a Spring ApplicationContext) that could be passed in the constructor before? Now we have to pass an instance of the custom primitive type to the atomix builder but all that happens is that the class of the instance gets registered, and another instance is created at io.atomix.primitive.PrimitiveTypeRegistry.<init>(PrimitiveTypeRegistry.java:44)
Johno Crawford
@johnou
May 03 2018 21:19
@imperatorx use an application context holder?
Jordan Halterman
@kuujo
May 03 2018 23:06
working on the tests now
making good improvements on the primary-backup protocol
Jordan Halterman
@kuujo
May 03 2018 23:24
@imperatorx there’s no way to add that type of parameter to primitive types because they have to be loadable on remote nodes. Primitive types are actually just class names that are loaded via ServiceLoader. It’s just a metadata class, and that’s necessary to ensure they can be loaded by standalone nodes (agents) too. The place to set up resources would be either in the primitive builders or services. Any objects that can’t be serialized in the primitive type would generally be a bad idea, because the primitive type is used to create objects on other nodes. They should have empty constructors and use only serializable configurations to construct primitive objects.
PrimitiveType is basically just a named factory object, and enforcing that model ensures the types can work inside the agent as well.
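The "named factory with an empty constructor" model can be sketched as below. This is an illustration under assumptions: `SketchPrimitiveType`, `CounterType`, and `SketchTypeRegistry` are hypothetical, and plain reflection on a class stands in for the ServiceLoader lookup mentioned above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a primitive type: a name plus a factory method. */
interface SketchPrimitiveType {
  String name();

  /** Builds the service from serializable configuration only. */
  Object newService(Map<String, String> config);
}

/** No state and a no-arg constructor, so any node can instantiate it from the class alone. */
final class CounterType implements SketchPrimitiveType {
  public CounterType() {
  }

  @Override
  public String name() {
    return "counter";
  }

  @Override
  public Object newService(Map<String, String> config) {
    return new AtomicLong(Long.parseLong(config.getOrDefault("initial", "0")));
  }
}

/** Registers types by class, which is why constructor arguments cannot be carried along. */
final class SketchTypeRegistry {
  private final Map<String, SketchPrimitiveType> types = new ConcurrentHashMap<>();

  void register(Class<? extends SketchPrimitiveType> typeClass) {
    try {
      SketchPrimitiveType type = typeClass.getDeclaredConstructor().newInstance();
      types.put(type.name(), type);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Type needs an accessible no-arg constructor", e);
    }
  }

  SketchPrimitiveType get(String name) {
    return types.get(name);
  }
}
```

Anything the type needs beyond its class name has to travel as serializable configuration, which is what the `config` map represents in this sketch.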