These are chat archives for atomix/atomix
Hi again. On closing a replica I get this in the logs (repeatedly):
2016-03-21 13:29:49.845 [Thread-0] INFO eais.atomix - FSA component AtomixComponent - Calling final state
2016-03-21 13:29:49.845 [async-dispatch-16] INFO eais.atomix - Node 10.10.3.11:8010 left group metrics
2016-03-21 13:29:50.060 [copycat-server-server311.engagor.com/10.10.3.11:8020] INFO i.a.catalyst.transport.NettyClient - Connecting to server301.engagor.com/10.10.3.1:8020
2016-03-21 13:29:50.068 [copycat-server-server311.engagor.com/10.10.3.11:8020] WARN io.netty.channel.AbstractChannel - Force-closing a channel whose registration task was not accepted by an event loop: [id: 0x243183a7]
java.util.concurrent.RejectedExecutionException: event executor terminated
    at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:715) ~[eais-standalone.jar:na]
    at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:300) ~[eais-standalone.jar:na]
    at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:691) ~[eais-standalone.jar:na]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:421) ~[eais-standalone.jar:na]
    at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:60) [eais-standalone.jar:na]
    at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:48) [eais-standalone.jar:na]
    at io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:64) [eais-standalone.jar:na]
    at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:315) [eais-standalone.jar:na]
    at io.netty.bootstrap.Bootstrap.doConnect(Bootstrap.java:134) [eais-standalone.jar:na]
    at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:116) [eais-standalone.jar:na]
    at io.atomix.catalyst.transport.NettyClient.connect(NettyClient.java:93) [eais-standalone.jar:na]
    at io.atomix.copycat.server.state.ConnectionManager.createConnection(ConnectionManager.java:69) [eais-standalone.jar:na]
    at io.atomix.copycat.server.state.ConnectionManager.getConnection(ConnectionManager.java:47) [eais-standalone.jar:na]
    at io.atomix.copycat.server.state.AbstractAppender.sendAppendRequest(AbstractAppender.java:242) [eais-standalone.jar:na]
    at io.atomix.copycat.server.state.LeaderAppender.sendAppendRequest(LeaderAppender.java:356) [eais-standalone.jar:na]
    at io.atomix.copycat.server.state.LeaderAppender.appendEntries(LeaderAppender.java:206) [eais-standalone.jar:na]
    at io.atomix.copycat.server.state.LeaderAppender.appendEntries(LeaderAppender.java:110) [eais-standalone.jar:na]
    at io.atomix.copycat.server.state.LeaderState.appendMembers(LeaderState.java:143) [eais-standalone.jar:na]
    at io.atomix.catalyst.util.concurrent.Runnables.lambda$logFailure$17(Runnables.java:20) ~[eais-standalone.jar:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_74]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[na:1.8.0_74]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_74]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[na:1.8.0_74]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_74]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_74]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_74]
2016-03-21 13:29:50.069 [copycat-server-server311.engagor.com/10.10.3.11:8020] ERROR i.n.u.c.D.rejectedExecution - Failed to submit a listener notification task. Event loop shut down?
Any ideas? (This is Atomix rc2)
I need to handle a large number of data streams, possibly thousands. Streams consist of records. Here are the requirements:
Each stream should get written to at least three boxes for redundancy.
The order in which the records are written must be the same on each box, and if the master box for a stream goes down, then there needs to be a smooth transition to one of the replicas.
I need to be able to control the placement of the streams on the boxes because they're not all the same. Some streams will be very active, generating a lot of data at a high rate, and some will be very quiet. We'll need to be able to move them around.
Records need to get written in a transaction-safe manner. We can't lose records. This is a system that can afford some inefficiency and lower performance in exchange for safety. A two-phase commit would be ok.
I have to be able to implement some kind of automated healing and repair so that if a box goes down, we can create and populate a replacement so we still have three copies of the data for each stream.
This sounds like it's tailor-made for Kafka, but Kafka doesn't do automatic healing, and it's a very heavy system with tons of dependencies. Also, it uses Zookeeper for metadata management, but not for the data streams themselves, so it's not fully transaction-safe.
Would it be possible to build such a system on top of Atomix?
Sharding can be done today by sharing a Transport and starting multiple CopycatServers. This is fairly straightforward to do; it's purely the resource creation and deletion in Atomix that is the reason for not having implemented it (having to account for failures while e.g. deleting a resource in all shards). But since sharding through the Raft algorithm has not yet been implemented...
DistributedGroup provides PartitionGroups, and what this gives you is the abstraction necessary to build something like Kafka.
PartitionGroups are designed exactly like Kafka's partitions. You create a DistributedGroup and partition it into a PartitionGroup. Each partition can consist of multiple members, and within each
GroupPartition a unique leader is elected. Consistent hashing is used to hash partitions to nodes. When a node crashes, its partitions will be reassigned, and a GroupPartitionMigration is provided to the user to move data as necessary.
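To make the "consistent hashing is used to hash partitions to nodes" part concrete, here is a minimal, generic consistent-hash ring in Java. It is a sketch of the technique, not the actual Atomix implementation; the class and method names are illustrative.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Generic sketch: map partition IDs to nodes via a consistent-hash ring.
// Each node is placed on the ring several times ("virtual nodes") so that
// removing a crashed node only reassigns the partitions it owned.
public class PartitionRing {
  private final SortedMap<Integer, String> ring = new TreeMap<>();
  private final int virtualNodes;

  public PartitionRing(int virtualNodes) {
    this.virtualNodes = virtualNodes;
  }

  public void addNode(String node) {
    for (int i = 0; i < virtualNodes; i++) {
      ring.put(hash(node + "#" + i), node);
    }
  }

  public void removeNode(String node) {
    for (int i = 0; i < virtualNodes; i++) {
      ring.remove(hash(node + "#" + i));
    }
  }

  // A partition is owned by the first node at or after its hash on the ring,
  // wrapping around to the start if necessary.
  public String nodeFor(int partitionId) {
    if (ring.isEmpty()) throw new IllegalStateException("no nodes");
    SortedMap<Integer, String> tail = ring.tailMap(hash("partition-" + partitionId));
    return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
  }

  // FNV-1a, masked non-negative, for a stable hash across JVMs.
  private static int hash(String key) {
    int h = 0x811c9dc5;
    for (int i = 0; i < key.length(); i++) {
      h ^= key.charAt(i);
      h *= 0x01000193;
    }
    return h & 0x7fffffff;
  }
}
```

When a node is removed, only the partitions that hashed to it move; the rest stay where they are, which is what makes migrations like GroupPartitionMigration tractable.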
DistributedGroup provides abstractions for direct messaging between members over the Transport or persistent messaging through the Raft algorithm. The obvious problem with this is that it's much more low-level: it provides the ability to manage a cluster and partition data, but doesn't actually do the partitioning. That is left up to the user. But the reason DistributedGroup has been designed that way is that future resources will be built on top of it. Rather than implementing sharding at the Raft level, resources will use a
PartitionGroup to provide the option of e.g. a partitioned DistributedMap. But the problem with needing stronger guarantees than Kafka is that PartitionGroup doesn't quite get you all the way there. It's fairly easy to implement what Kafka does on top of it, but for stronger guarantees you may end up having to duplicate a lot of what Raft does in terms of ensuring order and things like that...
Alternatively, you could create multiple resources, e.g. queue1, queue2, queue3, each of which is mapped to a different instance of the Raft algorithm. This would be much easier to implement in Atomix, and the complexities of managing resources across shards would be left to the user and perhaps implemented in some later version.
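The multiple-resources approach can be sketched as client-side routing: hash each stream's name to one of N independent Raft clusters and open the corresponding resource there. This is a generic sketch under assumed names (the queue1..queueN naming and the ShardRouter class are illustrative, not an Atomix API).

```java
// Sketch: deterministically route a stream to one of N independent
// Raft clusters ("shards") by hashing the stream name.
public class ShardRouter {
  private final int shardCount;

  public ShardRouter(int shardCount) {
    this.shardCount = shardCount;
  }

  // Every client computes the same shard for the same stream.
  public int shardFor(String streamName) {
    return Math.floorMod(streamName.hashCode(), shardCount);
  }

  // The resource name to open on that shard, e.g. "queue2".
  public String resourceFor(String streamName) {
    return "queue" + (shardFor(streamName) + 1);
  }
}
```

Because the routing is a pure function of the name, no coordination is needed to agree on placement; the trade-off is that rebalancing streams means changing the routing function everywhere at once.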
The problem with sharing a Transport across shards is overhead: with 100 shards, 100 heartbeats would be sent each heartbeat interval. Similarly, client sessions have to be managed for each shard. I don't think any of the current sharded implementations do sessions, and this is where you start to run into the issues with coordinating across shards and start needing something like 2PC again. If a client is communicating with n shards, then it needs to register n sessions and send n keep-alives. This would require a bit more work, perhaps to create a ShardedServer that can split a single keep-alive from the client into one per shard.
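The keep-alive splitting idea can be sketched as a simple fan-out: the client sends one logical keep-alive, and it is forwarded to each shard's session, completing only when every shard has acknowledged. The ShardedKeepAlive class, the session-ID type, and the sendKeepAlive function below are hypothetical, not part of Atomix or Copycat.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Sketch of the ShardedServer idea: split a single client keep-alive
// into one keep-alive per shard session.
public class ShardedKeepAlive {
  public static CompletableFuture<Void> keepAlive(
      List<Long> sessionIds,
      Function<Long, CompletableFuture<Void>> sendKeepAlive) {
    // Fan out one keep-alive per shard session.
    CompletableFuture<?>[] futures = sessionIds.stream()
        .map(sendKeepAlive)
        .toArray(CompletableFuture[]::new);
    // The client's single keep-alive succeeds only once every shard acked.
    return CompletableFuture.allOf(futures);
  }
}
```

Note this only amortizes the client-side traffic; each shard still has to track its own session state, so it does not solve the cross-shard coordination problem that motivates 2PC.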