These are chat archives for atomix/atomix

7th
Aug 2018
Junbo Ruan
@aruanruan
Aug 07 2018 00:09
@kuujo why does the Raft timer accuracy on server nodes depend on the client session timeout?
for the timer, i found this code:
/**
 * Raft protocol configuration.
 */
public class MultiRaftProtocolConfig extends PrimitiveProtocolConfig<MultiRaftProtocolConfig> {
  private String group;
  private Partitioner<String> partitioner = Partitioner.MURMUR3;
  private Duration minTimeout = Duration.ofMillis(250);
  private Duration maxTimeout = Duration.ofSeconds(30);
  private ReadConsistency readConsistency = ReadConsistency.SEQUENTIAL;
  private CommunicationStrategy communicationStrategy = CommunicationStrategy.LEADER;
  private Recovery recoveryStrategy = Recovery.RECOVER;
  private int maxRetries = 0;
  private Duration retryDelay = Duration.ofMillis(100);
maxTimeout defaults to 30 seconds
Junbo Ruan
@aruanruan
Aug 07 2018 00:15
when i set it to 5 seconds, the behavior is very close to my settings
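A hedged sketch of lowering the timeout through the MultiRaftProtocol builder, assuming the builder setters mirror the config fields shown in the paste above (the method names here are assumptions, not verified against this exact release):

```java
import java.time.Duration;
import io.atomix.protocols.raft.MultiRaftProtocol;
import io.atomix.protocols.raft.ReadConsistency;

// Sketch: shrink the session timeout bounds so server-side timers
// track the client session more closely. Builder method names are
// assumed to mirror the config fields above.
MultiRaftProtocol protocol = MultiRaftProtocol.builder()
    .withMinTimeout(Duration.ofMillis(250))
    .withMaxTimeout(Duration.ofSeconds(5)) // default is 30s
    .withReadConsistency(ReadConsistency.SEQUENTIAL)
    .build();
```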
Sungoesdown
@Sungoesdown87_twitter
Aug 07 2018 01:35
Morning. I was wondering... is it possible to set up volatile storage? I am primarily using a ConsistentMap. Thanks!
Junbo Ruan
@aruanruan
Aug 07 2018 12:29
i upgraded to version '-rc7'. when i killed one node, the cluster was not stable; the leader kept changing between nodes. just now, a fatal error occurred:
20:25:22.750 [atomix-bootstrap-heartbeat-receiver] INFO  i.a.c.d.BootstrapDiscoveryProvider - Joined
20:25:22.751 [atomix-bootstrap-heartbeat-receiver] INFO  i.a.c.i.DefaultClusterMembershipService - node-3 - Member activated: Member{id=node-3, address=10.93.79.103:5000, properties={}}
20:25:22.808 [atomix-bootstrap-heartbeat-receiver] INFO  i.a.c.i.DefaultClusterMembershipService - Started
20:25:22.808 [atomix-cluster-0] INFO  i.a.c.m.i.DefaultClusterCommunicationService - Started
20:25:22.811 [atomix-cluster-0] INFO  i.a.c.m.i.DefaultClusterEventService - Started
20:25:22.822 [atomix-0] INFO  i.a.p.p.i.DefaultPartitionGroupMembershipService - Started
20:25:22.844 [atomix-0] INFO  i.a.p.p.i.HashBasedPrimaryElectionService - Started
20:25:22.895 [atomix-0] INFO  i.a.p.r.p.impl.RaftPartitionServer - Starting server for partition PartitionId{id=1, group=system}
Exception in thread "main" java.util.concurrent.CompletionException: java.lang.IllegalArgumentException: limit cannot be negative
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:991)
        at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
        at io.atomix.primitive.partition.impl.DefaultPartitionService.lambda$start$2(DefaultPartitionService.java:150)
        at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
        at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
        at io.atomix.primitive.partition.impl.DefaultPartitionService.start(DefaultPartitionService.java:138)
        at io.atomix.core.Atomix.lambda$startServices$3(Atomix.java:733)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: limit cannot be negative
        at io.atomix.storage.buffer.AbstractBuffer.limit(AbstractBuffer.java:256)
        at io.atomix.storage.journal.JournalSegmentReader.readNext(JournalSegmentReader.java:152)
        at io.atomix.storage.journal.JournalSegmentReader.next(JournalSegmentReader.java:113)
        at io.atomix.storage.journal.SegmentedJournalReader.next(SegmentedJournalReader.java:163)
        at io.atomix.storage.journal.SegmentedJournalReader.initialize(SegmentedJournalReader.java:44)
        at io.atomix.storage.journal.SegmentedJournalReader.<init>(SegmentedJournalReader.java:33)
        at io.atomix.storage.journal.SegmentedJournal.openReader(SegmentedJournal.java:591)
        at io.atomix.protocols.raft.storage.log.RaftLog.openReader(RaftLog.java:70)
        at io.atomix.protocols.raft.cluster.impl.RaftMemberContext.resetState(RaftMemberContext.java:77)
        at io.atomix.protocols.raft.cluster.impl.RaftClusterContext.<init>(RaftClusterContext.java:95)
        at io.atomix.protocols.raft.impl.RaftContext.<init>(RaftContext.java:166)
        at io.atomix.protocols.raft.impl.DefaultRaftServer$Builder.build(DefaultRaftServer.java:246)
        at io.atomix.protocols.raft.impl.DefaultRaftServer$Builder.build(DefaultRaftServer.java:222)
maybe the log data is damaged
buko
@buko
Aug 07 2018 12:32
what is the difference between WorkQueue and DistributedQueue?
Junbo Ruan
@aruanruan
Aug 07 2018 12:38
an item in a WorkQueue is a task
buko
@buko
Aug 07 2018 12:41
@aruanruan so the DistributedQueue isn't like that? Items put on a DistributedQueue are available to all cluster members even if one member removes an item from the queue?
buko
@buko
Aug 07 2018 12:51
Doesn't atomix support transactional messaging in a cluster? Is it possible for a cluster leader to send a message to its followers and block until the followers ack the message?
suimi
@suimi
Aug 07 2018 12:54
how can i set a Spring resource on a custom primitive type?
is there any solution?
Junbo Ruan
@aruanruan
Aug 07 2018 14:17
@suimi i think atomix does not support this style
Jordan Halterman
@kuujo
Aug 07 2018 18:14
hmm I’ve never seen the negative limit exception
it’s a corrupted log entry
Johno Crawford
@johnou
Aug 07 2018 18:15
maybe jvm was not shut down gracefully
Jordan Halterman
@kuujo
Aug 07 2018 18:18
that should be fine
can’t stop a node from being unplugged
the Raft log uses checksums and ignores corrupted entries, so I’ve never seen this
hmm
actually maybe this condition if (length == 0) { should just be if (length <= 0) {
Jordan Halterman
@kuujo
Aug 07 2018 18:24
I think the problem is probably that the length bytes were corrupted
JournalSegmentWriter seems to handle this correctly but not the reader
Johno Crawford
@johnou
Aug 07 2018 18:25
  buffer.skip(cachedEntry.size() + Bytes.INTEGER + Bytes.INTEGER);
unless it skipped into the buffer too far or too little
Jordan Halterman
@kuujo
Aug 07 2018 18:26
that would be a very frequent bug if that were the case
also since the reader is being initialized the cache shouldn’t even have any entries in it yet
Johno Crawford
@johnou
Aug 07 2018 19:09
anyway
i don't agree <= 0 is the way to go
once the stream is out of sync
it's out of sync
Jordan Halterman
@kuujo
Aug 07 2018 19:09
the problem is not something being out of sync
it’s the bytes in the file being incomplete
because of a node crash
Johno Crawford
@johnou
Aug 07 2018 19:09
attempting to read another four bytes... oh?
Jordan Halterman
@kuujo
Aug 07 2018 19:10
yeah
Johno Crawford
@johnou
Aug 07 2018 19:11
oh right because it's a segment
well can it be covered somehow?
Jordan Halterman
@kuujo
Aug 07 2018 19:13
Not sure what you mean. Entries are appended to the log. When the log is re-read, the first empty or invalid entry (according to either the length or the checksum) is considered the end of the log. We stop iterating and start writing at that point. Raft guarantees if an entry has been committed then it won’t be corrupted, and there’s nothing wrong with ignoring and removing missing/corrupted entries.
The only way to guard against corrupted data is by handling it in this way. We have no real control over what the hard drive is writing when it crashes. There’s always a chance for some weird bytes to be in the log, and this is how they’re detected and discarded.
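The end-of-log detection described above can be sketched with plain java.nio and java.util.zip. The `[length | checksum | payload]` layout, the `MAX_ENTRY_SIZE` bound, and the class name are illustrative, not Atomix's actual on-disk format:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class LogScan {
  static final int MAX_ENTRY_SIZE = 1024 * 1024; // illustrative bound

  // Scans entries laid out as [int length | int crc32 | payload bytes]
  // and returns the number of valid entries before the first empty,
  // truncated, or corrupted one -- which is treated as the end of the log.
  static int countValidEntries(ByteBuffer buffer) {
    int count = 0;
    while (true) {
      buffer.mark();
      if (buffer.remaining() < Integer.BYTES * 2) {
        buffer.reset();
        return count; // not even a full header left
      }
      int length = buffer.getInt();
      int checksum = buffer.getInt();
      if (length <= 0 || length > MAX_ENTRY_SIZE || length > buffer.remaining()) {
        buffer.reset();
        return count; // empty or implausible length -> end of log
      }
      byte[] payload = new byte[length];
      buffer.get(payload);
      CRC32 crc = new CRC32();
      crc.update(payload, 0, length);
      if ((int) crc.getValue() != checksum) {
        buffer.reset();
        return count; // checksum mismatch -> corrupted tail, stop here
      }
      count++;
    }
  }
}
```

Writing would then resume at the buffer position left by the final reset, overwriting the corrupted tail.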
Johno Crawford
@johnou
Aug 07 2018 19:41
Sure, just making sure that by handling it this way it can continue to read from the correct index
Like, after reading a corrupt size and ignoring it, the buffer index is off, which means the next read would fail too
Jordan Halterman
@kuujo
Aug 07 2018 19:43
the buffer index is not off
      // If the buffer length is zero then return.
      if (length <= 0) {
        buffer.reset();
        nextEntry = null;
        return;
      }
Johno Crawford
@johnou
Aug 07 2018 19:43
:ok_hand:
Jordan Halterman
@kuujo
Aug 07 2018 19:43
buffer.reset() resets it back to the correct position to read the length again
Johno Crawford
@johnou
Aug 07 2018 20:09
Not IMO
Looks broken
Reset goes back to where mark was called
What I think is possibly broken is that we don't check if the bytes are available to read
So it should check if it can read an int, if it cannot then it resets and returns
Otherwise it might be possible that an exception is thrown when reading and reset is not called
Oh wait, Java API... is BufferUnderflowException the only possible exception we would need to handle?
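The mark/reset pattern being discussed can be sketched with a plain java.nio.ByteBuffer; `tryReadLength` is a hypothetical helper, not the actual Atomix method:

```java
import java.nio.ByteBuffer;

public class LengthGuard {
  // Attempts to read a 4-byte length prefix. Checking remaining() first
  // means a truncated buffer can never throw BufferUnderflowException;
  // on any invalid read, the position is reset to the mark and -1 returned.
  static int tryReadLength(ByteBuffer buffer) {
    buffer.mark(); // remember where the length prefix starts
    if (buffer.remaining() < Integer.BYTES) {
      buffer.reset();
      return -1; // not enough bytes left for a length prefix
    }
    int length = buffer.getInt();
    if (length <= 0) {
      buffer.reset();
      return -1; // empty or corrupted length -> treat as end of log
    }
    return length;
  }
}
```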
Jordan Halterman
@kuujo
Aug 07 2018 20:22
Right buffer.mark() is called before the length is read, and BufferUnderflowException is thrown if the bytes can’t be read, and the buffer is reset to mark() if the entry is invalid. This code skipping invalid/corrupted entries has been used for a long time, just hasn’t run into a case where the length is invalid. That might be something about the Windows implementation.
Johno Crawford
@johnou
Aug 07 2018 20:38
Yeah just navigated through the storage byte buffer implementation
protected int checkRead(int offset, int length) {
  checkOffset(offset);
  if (limit == -1) {
    if (offset + length > capacity) {
      if (capacity < maxCapacity) {
        if (this.offset + offset + length <= bytes.size()) {
          capacity = bytes.size() - this.offset;
        } else {
          capacity(calculateCapacity(offset + length));
        }
      } else {
        throw new BufferUnderflowException();
      }
    }
  } else {
    if (offset + length > limit)
      throw new BufferUnderflowException();
  }
  return offset(offset);
}
Jordan Halterman
@kuujo
Aug 07 2018 23:03
We released rc8, but more optimizations are coming for ONOS, so we will be doing an rc9 this evening as well
#776 is a good one
Junbo Ruan
@aruanruan
Aug 07 2018 23:37
morning~
Jordan Halterman
@kuujo
Aug 07 2018 23:38
Evening 😏
Junbo Ruan
@aruanruan
Aug 07 2018 23:39
when i kill the java process abnormally, very odd things happen, with different bugs
Jordan Halterman
@kuujo
Aug 07 2018 23:40
which other ones?
saw the IllegalArgumentException
maybe it returned a big number?
Jordan Halterman
@kuujo
Aug 07 2018 23:41
that’s fine
although I guess we should limit it
to avoid the memory consumption
Junbo Ruan
@aruanruan
Aug 07 2018 23:42
02:04:46.839 [raft-server-consensus-partition-1] WARN  i.a.p.raft.roles.CandidateRole - RaftServer{consensus-partition-1}{role=CANDIDATE} - io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connection refused: 10.93.85.253/10.93.85.253:5000
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.BootstrapMethodError: call site initialization exception
02:06:39.656 [raft-server-consensus-partition-1] WARN  i.a.p.raft.roles.CandidateRole - RaftServer{consensus-partition-1}{role=CANDIDATE} - io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connection refused: 10.93.85.253/10.93.85.253:5000
        at java.lang.invoke.CallSite.makeSite(CallSite.java:341)
        at java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:307)
        at java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:297)
        at io.atomix.cluster.AtomixCluster.stopServices(AtomixCluster.java:304)
        at io.atomix.core.Atomix.lambda$stopServices$9(Atomix.java:758)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
 ... 9 more
Caused by: java.lang.OutOfMemoryError: Java heap space
another node reported an out-of-memory error
Jordan Halterman
@kuujo
Aug 07 2018 23:42
probably from the very large size
Junbo Ruan
@aruanruan
Aug 07 2018 23:42
i think so
Jordan Halterman
@kuujo
Aug 07 2018 23:43
I think there’s probably something about Windows that allows the sizes to be partially written that we don’t see in Linux
Junbo Ruan
@aruanruan
Aug 07 2018 23:44
i tested it in linux
centos 7.2 64bit
Jordan Halterman
@kuujo
Aug 07 2018 23:45
same exceptions?
Junbo Ruan
@aruanruan
Aug 07 2018 23:45
yes
Jordan Halterman
@kuujo
Aug 07 2018 23:45
how are you killing the nodes?
Junbo Ruan
@aruanruan
Aug 07 2018 23:46
cat /proc/version
Linux version 3.10.0-514.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Nov 22 16:42:41 UTC 2016
ps -ef |grep AtomixAgent | grep -v grep |awk '{print $2}'| xargs kill -9
Jordan Halterman
@kuujo
Aug 07 2018 23:46
strange
so not the OS
Maybe the disk
Junbo Ruan
@aruanruan
Aug 07 2018 23:48
how do i check the disk?
Jordan Halterman
@kuujo
Aug 07 2018 23:49
I’ve killed thousands of nodes with kill -9 and poweroff and have never seen this, so you’re going to have to test some fixes on your end 🙏
It’s fine to set a maximum size limit on Raft log entries. There really should be one anyway; it’s just never been fully implemented.
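A write-side bound like the one suggested could look roughly like this; `maxEntrySize`, the default value, and the exception type are all illustrative:

```java
public class EntrySizeCheck {
  static final int DEFAULT_MAX_ENTRY_SIZE = 1024 * 1024; // 1 MiB, illustrative

  // Rejects oversized entries before they reach the log, so a corrupted
  // length read back later can never trigger a huge allocation.
  static void checkEntrySize(int length, int maxEntrySize) {
    if (length <= 0 || length > maxEntrySize) {
      throw new IllegalArgumentException(
          "entry length " + length + " outside (0, " + maxEntrySize + "]");
    }
  }
}
```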
Junbo Ruan
@aruanruan
Aug 07 2018 23:52
ok, i will do it now
Jordan Halterman
@kuujo
Aug 07 2018 23:53
actually I often kill hundreds of nodes in a single test, so maybe I’ve killed tens of thousands of nodes? A hundred thousand? Would be an interesting statistic 🤔
thanks!
Johno Crawford
@johnou
Aug 07 2018 23:53
Does memory.clear().limit(length) alloc?
Jordan Halterman
@kuujo
Aug 07 2018 23:53
it’s a good find
I think it may be allocating a byte[] somewhere
Johno Crawford
@johnou
Aug 07 2018 23:54
Yeah plenty of places like that
memory is being resized
Johno Crawford
@johnou
Aug 07 2018 23:54
Even in the protocol
Jordan Halterman
@kuujo
Aug 07 2018 23:54
to fit the length
that will allocate a new byte[] for the memory
really should limit the message size in the Netty encoder/decoder too
there’s been a TODO note in the decoder for… forever
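Capping the message size in the Netty pipeline, as suggested, could use the stock LengthFieldBasedFrameDecoder; the 1 MiB cap, the frame layout, and the pipeline placement here are assumptions about how Atomix's wiring might look, not its actual code:

```java
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.LengthFieldBasedFrameDecoder;

// Sketch: reject frames whose length prefix exceeds a fixed cap before any
// payload buffer is allocated. Netty throws TooLongFrameException instead
// of attempting a huge allocation for a corrupted or hostile length.
public class BoundedFrameInitializer extends ChannelInitializer<SocketChannel> {
  static final int MAX_FRAME_SIZE = 1024 * 1024; // 1 MiB cap, illustrative

  @Override
  protected void initChannel(SocketChannel ch) {
    ch.pipeline().addLast(new LengthFieldBasedFrameDecoder(
        MAX_FRAME_SIZE, // maxFrameLength: anything larger is rejected
        0,              // lengthFieldOffset: length prefix at frame start
        4));            // lengthFieldLength: 4-byte int prefix
  }
}
```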