These are chat archives for atomix/atomix

30th Jan 2017
Min Zhou
@coderplay
Jan 30 2017 08:20
Can copycat serve multiple writing clients which concurrently modify the same state machine in the cluster?
Jordan Halterman
@kuujo
Jan 30 2017 08:26

Absolutely. There are a lot of guarantees in place that apply to those types of situations, though. Ultimately, when multiple clients are writing to the cluster, the individual commands will be interleaved between clients in the order in which they're received by the leader. When they're applied to the state machine, the commands will be sequenced on a single thread/Executor.
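To make that concrete, here is a minimal, hypothetical sketch of what that single-threaded sequencing buys a state machine, written against Copycat 1.x's StateMachine API; the IncrementCommand type is made up for illustration:

  import io.atomix.copycat.Command;
  import io.atomix.copycat.server.Commit;
  import io.atomix.copycat.server.StateMachine;

  // Because committed commands are applied on the single state machine
  // thread, in log order, this mutable counter needs no synchronization
  // even when many clients submit increments concurrently.
  public class CounterStateMachine extends StateMachine {
    public static class IncrementCommand implements Command<Long> {}

    private long value;

    public long increment(Commit<IncrementCommand> commit) {
      try {
        return ++value; // always invoked from the state machine thread
      } finally {
        commit.close(); // release the commit once it has been applied
      }
    }
  }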

But it's totally feasible to build a distributed lock to support something like transactions as is done in Atomix, e.g. submit a lock command from a client, associate the lock with the client's session in the state machine, and only allow state changes from that client until the lock is released. If the session expires then release the lock. Once the lock is released, send a message to any other clients awaiting the lock to allow them to make changes.
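Roughly, that lock pattern could look like the following sketch of a Copycat 1.x state machine; the LockCommand/UnlockCommand types and the "locked" event name are hypothetical, and the exact SessionListener/ServerSession signatures are my reading of the 1.x API:

  import java.util.ArrayDeque;
  import java.util.Queue;

  import io.atomix.copycat.Command;
  import io.atomix.copycat.server.Commit;
  import io.atomix.copycat.server.StateMachine;
  import io.atomix.copycat.server.session.ServerSession;
  import io.atomix.copycat.server.session.SessionListener;

  public class LockStateMachine extends StateMachine implements SessionListener {
    public static class LockCommand implements Command<Void> {}
    public static class UnlockCommand implements Command<Void> {}

    private ServerSession holder;
    private final Queue<Commit<LockCommand>> waiters = new ArrayDeque<>();

    public void lock(Commit<LockCommand> commit) {
      if (holder == null) {
        holder = commit.session();
        holder.publish("locked", true); // tell the client it owns the lock
        commit.close();
      } else {
        waiters.add(commit);            // park the request until the lock frees up
      }
    }

    public void unlock(Commit<UnlockCommand> commit) {
      try {
        if (commit.session().equals(holder)) {
          holder = null;
          grantNext();
        }
      } finally {
        commit.close();
      }
    }

    // If the holding client's session expires, release the lock.
    @Override
    public void expire(ServerSession session) {
      if (session.equals(holder)) {
        holder = null;
        grantNext();
      }
    }

    private void grantNext() {
      Commit<LockCommand> next = waiters.poll();
      if (next != null) {
        holder = next.session();
        holder.publish("locked", true); // notify the next waiting client
        next.close();
      }
    }

    @Override public void register(ServerSession session) {}
    @Override public void unregister(ServerSession session) {}
    @Override public void close(ServerSession session) {}
  }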

Min Zhou
@coderplay
Jan 30 2017 08:37
@kuujo Thanks for the quick reply. The reason I asked that question is that I got a little bit confused by copycat's threading model. If there are concurrent clients, there should be multiple threads calling LeaderState#applyCommand, which in turn means multiple threads calling context.getLog().append(entry). But getLog().append doesn't have a lock on it. I'm not sure how correctness is guaranteed.
Jordan Halterman
@kuujo
Jan 30 2017 08:58
gotcha… I can help explain...
Jordan Halterman
@kuujo
Jan 30 2017 09:14

So, Copycat internally uses a really well structured threading model. It relies on ThreadContext and in particular SingleThreadContext objects (which essentially wrap Executors to create an event loop) to dictate the thread on which certain parts of the code are executed. Part of the reason this is confusing inside the server is because it actually assumes (incorrectly) that this is handled by the Transport/Server. So, in the NettyServer, when a handler is registered on a NettyConnection, the connection stores the ThreadContext for the thread that registered the handler, and uses the context to put all requests on that thread. So, the request handling inside the server is effectively single threaded. That is, all requests are handled on the same thread, and you'll see checkThread() calls throughout the server code verifying that's the case. But it's also asynchronous/non-blocking, which is why it can support multiple clients. A request is handled and written to the log and then typically a future is created by the LeaderAppender and the thread "exits" and continues on to handle the next request. This may be cheaper than a thread per connection with locks, particularly since writes have to be sequenced to the log anyway. Once a command is committed, another thread applies it to the state machine asynchronously, and once the state machine returns a result the thread that originally handled the request will send the response.
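Here is a tiny sketch of the event loop primitive involved, assuming Catalyst's SingleThreadContext with a constructor taking a name format and a Serializer; note the package has moved between Catalyst versions, so the imports below are an assumption:

  import io.atomix.catalyst.concurrent.SingleThreadContext;
  import io.atomix.catalyst.concurrent.ThreadContext;
  import io.atomix.catalyst.serializer.Serializer;

  public class EventLoopExample {
    public static void main(String[] args) {
      // A SingleThreadContext wraps a single-threaded Executor: every task
      // submitted to it runs on the same thread, one after another.
      ThreadContext context = new SingleThreadContext("copycat-example-%d", new Serializer());

      context.execute(() ->
          System.out.println("task 1 on " + Thread.currentThread().getName()));
      context.execute(() ->
          System.out.println("task 2, sequenced after task 1, same thread"));

      context.close(); // lets already-queued tasks drain, then stops the thread
    }
  }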

A major part of the Copycat 2.0 refactoring is modifying that thread model. The code will still be asynchronous, but Copycat 2.0 adds a read-write lock and support for multiple concurrent readers in the log. So, a request will be handled by the server, written to the log, and then will be handed off to other threads for replication - a thread per follower. That work is still in progress, so it's not clear whether requests will still be handled on a single multiplexed thread or several threads, but it still won't be a thread per connection.
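The read-write-locked log idea is generic enough to sketch with plain JDK primitives; this is an illustration of the concept, not the actual Copycat 2.0 log code:

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.locks.ReadWriteLock;
  import java.util.concurrent.locks.ReentrantReadWriteLock;

  // A log that allows concurrent readers: the writer appends under the
  // write lock, while one replication thread per follower reads entries
  // under the read lock without blocking the other readers.
  public class ConcurrentlyReadableLog {
    private final List<byte[]> entries = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public long append(byte[] entry) {
      lock.writeLock().lock();
      try {
        entries.add(entry);
        return entries.size() - 1; // index of the appended entry
      } finally {
        lock.writeLock().unlock();
      }
    }

    public byte[] get(long index) {
      lock.readLock().lock(); // many follower threads may read concurrently
      try {
        return entries.get((int) index);
      } finally {
        lock.readLock().unlock();
      }
    }
  }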

Really, the rationale behind the single thread/event loop model was reducing complexity to get the implementation correct, which it seems to be now. So, the second phase - Copycat 2.0 - is replacing the single event loop thread model with multiple event loops.
Min Zhou
@coderplay
Jan 30 2017 09:21
@kuujo Are you saying the event loop for the netty server is a single-threaded event loop? Let's say we create a copycat server with withClientTransport(new NettyTransport()).withServerTransport(new NettyTransport()) - does that break the rule? Since the default thread count is Runtime.getRuntime().availableProcessors()
Min Zhou
@coderplay
Jan 30 2017 09:26
I know netty4's thread model well. But copycat's is different. The confusing point for me is CatalystThread#setContext. If all event loop threads share the same SingleThreadContext, then everything makes sense. But I don't know where copycat calls CatalystThread#setContext for each event loop thread.
Min Zhou
@coderplay
Jan 30 2017 09:43

I think I understand the implementation now. The magic is in NettyHandler::channelActive.

  @Override
  public void channelActive(ChannelHandlerContext context) throws Exception {
    ...
    // Runs the connect listener on the server-initialized ThreadContext's
    // executor (this.context, not the ChannelHandlerContext parameter) and
    // blocks the Netty event loop thread until it completes.
    CompletableFuture.runAsync(() -> listener.accept(connection), this.context.executor()).join();
  }

You replace the event loop thread with a thread initialized by CopycatServer. That's a weird design.

Jordan Halterman
@kuujo
Jan 30 2017 09:50

Indeed. It is weird and is therefore gone in Copycat 2.0. The transport can call callbacks on whatever thread it wants. All the threading is handled only inside the server.

This threading model actually comes from Vert.x, to which I was previously a long-time contributor. In Vert.x, the calling thread’s context is stored and future callbacks are called using that context, which takes a lot of responsibility for threading off the user’s plate. But the problem with doing that in this case is just that it has made custom Transports more difficult to create. So, now the server is responsible for calling its own internal ThreadContext (https://github.com/atomix/copycat/blob/2.0/server/src/main/java/io/atomix/copycat/server/state/ServerContext.java#L511-L551) which may or may not be replaced with locks as refactoring continues, and Transports can do whatever they want.
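The captured-context pattern itself is simple to sketch. This is a hypothetical class, not the actual NettyConnection code, and it assumes Catalyst's ThreadContext.currentContextOrThrow() static accessor:

  import java.util.function.Consumer;

  import io.atomix.catalyst.concurrent.ThreadContext;

  // Vert.x-style pattern: capture the ThreadContext of the thread that
  // registers a handler, then invoke the handler on that same context
  // regardless of which transport thread delivered the message.
  public class ContextCapturingConnection<T> {
    private Consumer<T> handler;
    private ThreadContext handlerContext;

    // Called by user code from a ThreadContext thread.
    public void onMessage(Consumer<T> handler) {
      this.handler = handler;
      this.handlerContext = ThreadContext.currentContextOrThrow();
    }

    // Called by the transport from any thread, e.g. a Netty event loop.
    void dispatch(T message) {
      handlerContext.execute(() -> handler.accept(message));
    }
  }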

What I was going to link are these two blocks of code:

First, NettyConnection stores the current ThreadContext when a handler is added:
https://github.com/atomix/catalyst/blob/master/netty/src/main/java/io/atomix/catalyst/transport/netty/NettyConnection.java#L361-L366

Then, to call the handler it uses the ThreadContext:
https://github.com/atomix/catalyst/blob/master/netty/src/main/java/io/atomix/catalyst/transport/netty/NettyConnection.java#L95-L100

So, yeah. What the code you copied is doing is basically calling an onConnect-type callback on the thread that started the server. Then, requests are handled on the thread that registers the respective request handler.
Jordan Halterman
@kuujo
Jan 30 2017 09:56
Copycat 2.0 has a similar but totally rewritten messaging API that’s specific to its protocol and doesn’t have any threading requirements:
https://github.com/atomix/copycat/tree/2.0/protocol/src/main/java/io/atomix/copycat/protocol