These are chat archives for atomix/atomix

25th
Mar 2018
Johno Crawford
@johnou
Mar 25 2018 08:46
@kuujo to really get max performance we'd need to change the way encoding works to avoid corrupt data
Jordan Halterman
@kuujo
Mar 25 2018 08:46
how so?
Jordan Halterman
@kuujo
Mar 25 2018 08:55
awesome work!
Johno Crawford
@johnou
Mar 25 2018 09:39
well what's really killing the throughput is the allocation and gc of byte arrays
but if we start re-using the underlying bytearrays we need to make sure they are either written to the fs or network before recycling them
and that cannot really be done with an api like this
public <T> byte[] encode(T object) {
  return namespace.serialize(object);
}
Johno Crawford
@johnou
Mar 25 2018 09:45
in the meantime I guess we can just recreate the ByteArrayOutputStream and think about a larger refactor later if it comes to it
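A minimal sketch of the constraint described above, assuming a hypothetical `ScopedEncoder` (not part of Atomix): once the underlying bytes are reused, the encoder can no longer return a `byte[]` the caller keeps. One option is to hand the bytes to a callback that must finish the file or network write before it returns. `String.getBytes` stands in for real serialization here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical encoder that reuses one scratch buffer; not the Atomix API. */
final class ScopedEncoder {
  // Fixed-size scratch space for this sketch; larger values would overflow it.
  private final ByteBuffer scratch = ByteBuffer.allocate(1024);

  /**
   * Encodes the value into the shared buffer and passes a read-only view to
   * the writer. The view is only valid until the writer returns, so the writer
   * must complete its write (or copy the bytes) before then.
   */
  synchronized void encode(String value, Consumer<ByteBuffer> writer) {
    scratch.clear();
    // Stand-in for serialization; a real encoder would write fields directly.
    scratch.put(value.getBytes(StandardCharsets.UTF_8));
    scratch.flip();
    writer.accept(scratch.asReadOnlyBuffer());
  }
}
```

The trade-off is that every caller has to respect the callback scope, which is the kind of API change a `byte[] encode(T)` signature cannot express.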
Jordan Halterman
@kuujo
Mar 25 2018 09:52
Agree
Jordan Halterman
@kuujo
Mar 25 2018 10:12
I removed object pooling and reference counting from Atomix 1.x because it can cause challenging bugs. It was removed to focus on stability. But I’ve been thinking about adding it back now. It’s actually needed in many more places than just the messaging/serialization code.
I’d prefer to use ByteBuffers, or wrap them in Buffer (what used to be done for reference counting), rather than reuse byte[]s. But switching to ByteBuffer or Buffer would break a lot of APIs because of the use of generics.
Jordan Halterman
@kuujo
Mar 25 2018 10:18
I think we could actually get away with changing those APIs though.
Johno Crawford
@johnou
Mar 25 2018 10:19
by reference counting are you referring to the Netty ByteBuf?
Jordan Halterman
@kuujo
Mar 25 2018 10:32
No. Atomix 1 did reference counting on all kinds of things - ByteBuf, state machine operations, log entries, requests/responses
Also used them for incremental log compaction
But just managing the raw bytes with reference counting would be the biggest win now
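A rough sketch of reference counting on raw bytes, assuming hypothetical `BufferPool` and `PooledBuffer` classes (this is not the Atomix 1.x `Buffer` implementation). Each holder calls `retain()`, and the buffer only returns to the pool when the last holder calls `release()`.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical pool of reference-counted buffers; names are illustrative. */
final class BufferPool {
  private final ConcurrentLinkedQueue<PooledBuffer> free = new ConcurrentLinkedQueue<>();
  private final int capacity;

  BufferPool(int capacity) {
    this.capacity = capacity;
  }

  /** Returns a recycled buffer if one is free, else allocates. Count starts at 1. */
  PooledBuffer acquire() {
    PooledBuffer buffer = free.poll();
    if (buffer == null) {
      buffer = new PooledBuffer(this, ByteBuffer.allocate(capacity));
    }
    buffer.refs.set(1);
    buffer.bytes.clear();
    return buffer;
  }

  /** Number of buffers currently waiting in the pool. */
  int freeCount() {
    return free.size();
  }

  static final class PooledBuffer {
    private final BufferPool pool;
    private final AtomicInteger refs = new AtomicInteger();
    final ByteBuffer bytes;

    PooledBuffer(BufferPool pool, ByteBuffer bytes) {
      this.pool = pool;
      this.bytes = bytes;
    }

    /** Adds a reference for another holder, e.g. a log writer and a sender. */
    PooledBuffer retain() {
      int current;
      do {
        current = refs.get();
        if (current <= 0) {
          throw new IllegalStateException("buffer already released");
        }
      } while (!refs.compareAndSet(current, current + 1));
      return this;
    }

    /** Drops one reference; the last release recycles the buffer. */
    void release() {
      int remaining = refs.decrementAndGet();
      if (remaining == 0) {
        pool.free.offer(this);
      } else if (remaining < 0) {
        throw new IllegalStateException("released too many times");
      }
    }
  }
}
```

A missed `release()` leaks the buffer and an extra one recycles it while still in use, which is the class of bug that motivated removing this from Atomix 1.x.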
Jordan Halterman
@kuujo
Mar 25 2018 10:38
I have to take the next couple days off to prepare for a conference talk but then I’ll be back at it. Maybe I’ll take a shot at this
I actually got a little carried away implementing partition member groups tonight. I’ll finish that up too
Just needs some tests
Jordan Halterman
@kuujo
Mar 25 2018 10:44

Basically, you can assign a zone/rack/host to each Node and then a MemberGroupStrategy to a partition group:

PrimaryBackupPartitionGroup.builder("foo")
  .withMemberGroupStrategy(MemberGroupStrategy.RACK_AWARE)
  ...

...or a custom MemberGroupProvider. Then, the primary/backups for any primitive created in the group will be stored on separate group members for each partition. I had to write a new state machine to handle group-aware primary election. I’ll use this to work on the primary-backup protocol more too
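To illustrate the idea of placing the primary and backups on separate member groups, here is a small sketch. `RackAwarePlacement`, `Member` and `place` are hypothetical names for illustration only and do not reflect how Atomix implements `MemberGroupStrategy`; the sketch assumes each member carries a rack label and picks at most one member per rack.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical rack-aware replica placement; not the Atomix implementation. */
final class RackAwarePlacement {
  /** A cluster member tagged with the rack it runs on. */
  static final class Member {
    final String id;
    final String rack;

    Member(String id, String rack) {
      this.id = id;
      this.rack = rack;
    }
  }

  /**
   * Picks up to {@code replicas} members for a partition (primary first),
   * taking at most one member from each rack.
   */
  static List<Member> place(List<Member> members, int partition, int replicas) {
    // Group members by rack, keeping insertion order so the result is deterministic.
    Map<String, List<Member>> groups = new LinkedHashMap<>();
    for (Member member : members) {
      groups.computeIfAbsent(member.rack, r -> new ArrayList<>()).add(member);
    }
    List<List<Member>> racks = new ArrayList<>(groups.values());
    List<Member> chosen = new ArrayList<>();
    // Rotate the starting rack by partition so primaries spread across racks.
    for (int i = 0; i < racks.size() && chosen.size() < replicas; i++) {
      List<Member> rack = racks.get(Math.floorMod(partition + i, racks.size()));
      chosen.add(rack.get(Math.floorMod(partition, rack.size())));
    }
    return chosen;
  }
}
```

If there are fewer racks than requested replicas, this sketch returns fewer members instead of doubling up on a rack.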

I also realized there don’t actually need to be any limitations on bootstrapping the Atomix cluster. There’s no reason it can’t be started with CORE nodes or just DATA nodes. Forming a cluster with only DATA nodes would basically be Hazelcast. Just need to make sure it can work without any Raft partition groups.
All the state machines work on Raft or the primary-backup protocol, so there should be nothing preventing the cluster from running with all consensus, all primary-backup, or a mixture of both.
Some of them are just unwise on the primary-backup protocol :-P
Johno Crawford
@johnou
Mar 25 2018 10:54
oh i was wrong
it is safe
/**
 * Creates a newly allocated byte array. Its size is the current
 * size of this output stream and the valid contents of the buffer
 * have been copied into it.
 *
 * @return  the current contents of this output stream, as a byte array.
 * @see     java.io.ByteArrayOutputStream#size()
 */
public synchronized byte toByteArray()[] {
    return Arrays.copyOf(buf, count);
}
wow ok
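A short check of the point above: because `toByteArray()` copies, a `ByteArrayOutputStream` can be `reset()` and reused without corrupting arrays handed out earlier. `ReuseStreamDemo` is just a name for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Demonstrates that arrays from toByteArray() survive reuse of the stream. */
final class ReuseStreamDemo {
  static byte[][] encodeTwice() {
    ByteArrayOutputStream out = new ByteArrayOutputStream(64);

    byte[] hello = "hello".getBytes(StandardCharsets.UTF_8);
    out.write(hello, 0, hello.length);
    byte[] first = out.toByteArray(); // independent copy of the valid bytes

    out.reset(); // rewinds the count to zero but keeps the internal array

    byte[] world = "world".getBytes(StandardCharsets.UTF_8);
    out.write(world, 0, world.length);
    byte[] second = out.toByteArray();

    // 'first' is unaffected by the second write into the same stream.
    return new byte[][] {first, second};
  }
}
```

The copy is what makes reuse safe, but it also means one allocation per message still remains.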
Johno Crawford
@johnou
Mar 25 2018 14:05
So for a simple data grid with messaging one could just use data nodes?
Johno Crawford
@johnou
Mar 25 2018 17:26
@jhalterman I think you might be interested in this atomix/atomix#458
Jordan Halterman
@kuujo
Mar 25 2018 19:52
Yep
In response to the data grid question
Holy crap that is awesome!
BTW @jhalterman is my brother and I’m @kuujo :-P
I’ll take a look at it after I knock out some slides and demos
Johno Crawford
@johnou
Mar 25 2018 20:13
@kuujo I know :P he showed interest in the performance comparison though
Jordan Halterman
@kuujo
Mar 25 2018 20:14
:+1:
Johno Crawford
@johnou
Mar 25 2018 20:16
so if the cluster just consisted of data nodes
what features would be lacking without the core nodes
locks and anything else that really requires a leader?
Jordan Halterman
@kuujo
Mar 25 2018 20:18
no features would be lacking, they just wouldn’t be safe
all primitives work on either protocol because there’s an abstraction over them
locks and leader elections wouldn’t be safe
and because partitions essentially use leader elections, they’d risk split brain
but the same goes for Hazelcast
which is why I say then you’re just basically getting Hazelcast with different primitives
I think it’s a pretty interesting model though - the ability to switch between a fast, in-memory protocol and consensus is invaluable. People ask for this all the time in ONOS.
Johno Crawford
@johnou
Mar 25 2018 20:24
yeah allows you to swap between those few modes which I cannot remember off the top of my head
that's cool
Jordan Halterman
@kuujo
Mar 25 2018 20:27

You could e.g. simplify deployment in development environments using just DATA nodes and change the configuration to include CORE nodes in production. This is actually what we might do in ONOS at least initially.

Mostly we need it to use a mixture of protocols: an external CORE cluster for consensus, and embedded DATA nodes for local access of data that doesn’t require a lot of coordination.

Johno Crawford
@johnou
Mar 25 2018 20:29
yeah i think we'd go with something similar, perhaps core for the proxy (peer) nodes and data for player (blocking io tasks) and space (non blocking io tasks) nodes
oh I can actually openly talk about that now
we finally released the game under our company name
Jordan Halterman
@kuujo
Mar 25 2018 20:34
That’s awesome!
keen to see how the metrics look when we make the swap
maybe ByteArrayOutputStreams with byte arrays over size of x wouldn't be added to the queue pool?
i'm concerned about raft operations serialising large payloads then increasing the memory footprint for the rest of the application lifetime
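A sketch of the size cap suggested above, assuming hypothetical `StreamPool` and `PooledStream` classes (not the code in the pull request). Streams whose internal array has grown past a threshold are dropped on release instead of being returned to the queue, so one large payload does not pin memory for the rest of the application lifetime.

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical pool that refuses to keep oversized streams. */
final class StreamPool {
  /** Exposes the internal array length, which the JDK class keeps protected. */
  static final class PooledStream extends ByteArrayOutputStream {
    PooledStream(int size) {
      super(size);
    }

    synchronized int capacity() {
      return buf.length;
    }
  }

  private final ConcurrentLinkedQueue<PooledStream> queue = new ConcurrentLinkedQueue<>();
  private final int initialSize;
  private final int maxPooledSize;

  StreamPool(int initialSize, int maxPooledSize) {
    this.initialSize = initialSize;
    this.maxPooledSize = maxPooledSize;
  }

  /** Takes a pooled stream if available, otherwise allocates a new one. */
  PooledStream acquire() {
    PooledStream stream = queue.poll();
    return stream != null ? stream : new PooledStream(initialSize);
  }

  /** Returns true if the stream was kept, false if it was left for the GC. */
  boolean release(PooledStream stream) {
    if (stream.capacity() > maxPooledSize) {
      return false; // grew too large; do not let it inflate the pool
    }
    stream.reset();
    return queue.offer(stream);
  }
}
```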
Johno Crawford
@johnou
Mar 25 2018 22:34
updated, trying to keep it as simple as possible
Jordan Halterman
@kuujo
Mar 25 2018 22:34
that’s a good point
that was actually one of my concerns with byte[]s being resized too
Johno Crawford
@johnou
Mar 25 2018 22:35
might make sense declaring another const for the max payload
io.atomix.utils.serializer.KryoNamespace#MAX_BUFFER_SIZE is probably too big
Jordan Halterman
@kuujo
Mar 25 2018 22:36
yeah actually I think there really should be a max of 1MB for any message and then maybe a smaller max size for pooled buffers
larger data needs to be chunked
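A minimal sketch of chunking a payload that exceeds the message limit. `Chunker` is a hypothetical helper, and the 1MB constant only mirrors the figure mentioned above; it is not an existing Atomix setting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that splits a payload into bounded chunks. */
final class Chunker {
  /** The 1MB per-message limit floated in the discussion (an assumption). */
  static final int MAX_MESSAGE_SIZE = 1024 * 1024;

  /** Splits the payload into consecutive chunks of at most maxChunkSize bytes. */
  static List<byte[]> chunk(byte[] payload, int maxChunkSize) {
    if (maxChunkSize <= 0) {
      throw new IllegalArgumentException("maxChunkSize must be positive");
    }
    List<byte[]> chunks = new ArrayList<>();
    int offset = 0;
    while (offset < payload.length) {
      int length = Math.min(maxChunkSize, payload.length - offset);
      chunks.add(Arrays.copyOfRange(payload, offset, offset + length));
      offset += length;
    }
    return chunks;
  }
}
```

A real protocol would also need a sequence number and a final-chunk marker on each message so the receiver can reassemble them; that part is left out here.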
Johno Crawford
@johnou
Mar 25 2018 22:37
yeah makes sense
Johno Crawford
@johnou
Mar 25 2018 22:43
changed it to 768kb
Johno Crawford
@johnou
Mar 25 2018 22:52
@kuujo hate to say it but now that you are saying a cluster would be possible with data only nodes maybe the old name you had makes more sense than core now
maybe
Jordan Halterman
@kuujo
Mar 25 2018 22:53
Yep I thought of that too 🤔
There may be some other options: PERSISTENT/EPHEMERAL
Doesn’t really cover the purpose of them though
COORDINATION
Johno Crawford
@johnou
Mar 25 2018 22:58
PERSISTENT_DATA
EPHEMERAL_DATA
CLIENT
mh
Jordan Halterman
@kuujo
Mar 25 2018 23:11
wow man… I was editing my slides and accidentally pressed ⌘q instead of ⌘a and….. closed my browser
I could have sworn I configured it to ask me before closing
I HAD SO MUCH STUFF OPEN
Johno Crawford
@johnou
Mar 25 2018 23:13
that's the worst