These are chat archives for atomix/atomix

14th
Mar 2017
Jordan Halterman
@kuujo
Mar 14 2017 00:23
Copycat shouldn't let a join() call block indefinitely, but the correct way to guard against that is to call get() with a timeout instead.
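A minimal sketch of the get()-with-timeout pattern, using only java.util.concurrent: CompletableFuture.get(long, TimeUnit) throws TimeoutException instead of waiting forever the way join() can. The class and method names are hypothetical, and the never-completing future only stands in for a Copycat/Atomix call.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWait {

    /**
     * Waits at most timeoutMillis for the future instead of blocking forever.
     * Returns Optional.empty() on timeout (and also if the future completed
     * with a null value). Failures still surface as ExecutionException.
     */
    public static <T> Optional<T> waitAtMost(CompletableFuture<T> future, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a call such as atomix.bootstrap(cluster) that never completes.
        CompletableFuture<String> neverDone = new CompletableFuture<>();
        System.out.println(waitAtMost(neverDone, 100)); // Optional.empty (after ~100 ms)

        CompletableFuture<String> done = CompletableFuture.completedFuture("ok");
        System.out.println(waitAtMost(done, 100));      // Optional[ok]
    }
}
```

What to do on timeout (retry, log, shut down) is left to the caller; the point is only that the wait is bounded.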
Jordan Halterman
@kuujo
Mar 14 2017 07:11

@redvasily regarding one node appearing to be blocked while the other can continue to submit incrementAndGet: does that blocking appear to be indefinite? I’d be curious to see the client’s logs. Both clients will not necessarily be able to make progress immediately once both nodes are back up. Each has to independently find the new leader and potentially register a new session before continuing. I’ll try to reproduce it myself.

The ApplicationException is really odd. That typically indicates that an exception occurred inside the state machine itself.

arshad khan
@arshadwkhan_twitter
Mar 14 2017 07:51
@kuujo I implemented a custom resource, but it only works when there is a single node in the cluster. With two nodes, bootstrapping hangs. I removed all references to the custom resource I had created, but now the cluster doesn't even start and I see the following exception. Any idea what it could be?
2017-03-14 13:15:41 DEBUG ClusterState:441 - /10.191.206.183:5001 - Sending server identification to /10.191.206.183:5000
2017-03-14 13:15:41 DEBUG MetaStore:96 - Store vote 0
2017-03-14 13:15:41 DEBUG FollowerState:59 - /10.191.206.183:5001 - Sent ConfigureResponse[status=OK, error=null]
2017-03-14 13:15:41 DEBUG FollowerState:51 - /10.191.206.183:5001 - Received AppendRequest[term=1, leader=180348306, logIndex=0, logTerm=0, entries=2, commitIndex=0, globalIndex=0]
2017-03-14 13:15:41 DEBUG SegmentManager:346 - Created segment: Segment[id=1, version=1, index=0, length=0]
2017-03-14 13:15:41 ERROR SingleThreadContext:23 - An uncaught exception occurred
java.lang.IndexOutOfBoundsException: inconsistent index: 1
at io.atomix.catalyst.util.Assert.index(Assert.java:45)
at io.atomix.copycat.server.storage.Segment.append(Segment.java:287)
at io.atomix.copycat.server.storage.Log.append(Log.java:294)
at io.atomix.copycat.server.state.ActiveState.appendEntries(ActiveState.java:107)
at io.atomix.copycat.server.state.PassiveState.checkGlobalIndex(PassiveState.java:150)
at io.atomix.copycat.server.state.PassiveState.handleAppend(PassiveState.java:124)
at io.atomix.copycat.server.state.ActiveState.append(ActiveState.java:47)
at io.atomix.copycat.server.state.FollowerState.append(FollowerState.java:191)
at io.atomix.copycat.server.state.ServerContext.lambda$connectServer$18(ServerContext.java:548)
at io.atomix.catalyst.transport.netty.NettyConnection.handleRequest(NettyConnection.java:113)
at io.atomix.catalyst.transport.netty.NettyConnection.lambda$handleRequest$2(NettyConnection.java:97)
at io.atomix.catalyst.concurrent.Runnables.lambda$logFailure$2(Runnables.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-03-14 13:15:41 DEBUG ClusterState:481 - /10.191.206.183:5001 - Cancelling j
Jordan Halterman
@kuujo
Mar 14 2017 07:52
What version are you using? How are you bootstrapping the cluster? Have you tried deleting the storage directory?
looks like a really odd error that I’d love to reproduce
I can’t imagine what could cause literally the first entry in the log to be inconsistent
hmm
especially on that line
arshad khan
@arshadwkhan_twitter
Mar 14 2017 07:59
    defaultLibraries 'io.atomix.copycat:copycat-server:1.2.3'
    defaultLibraries 'io.atomix.copycat:copycat-client:1.2.3'
    defaultLibraries 'io.atomix:atomix-variables:1.0.3'
    defaultLibraries 'io.atomix:atomix:1.0.3'
    defaultLibraries 'io.atomix:atomix-resource:1.0.3'
    defaultLibraries 'io.atomix.catalyst:catalyst-netty:1.1.2'
Jordan Halterman
@kuujo
Mar 14 2017 07:59
how are you bootstrapping the servers?
arshad khan
@arshadwkhan_twitter
Mar 14 2017 08:02
I am using in-memory storage:

    String host = "10.191.206.183";
    // Both replicas are given the same full cluster membership.
    List<Address> cluster = Arrays.asList(
            new Address(host, 5000),
            new Address(host, 5001)
    );

    // This replica's own address.
    Address address = new Address(host, 5000);

    atomix = AtomixReplica.builder(address)
            .withTransport(new NettyTransport())
            .withStorage(new Storage(StorageLevel.MEMORY))
            .build();

    atomix.bootstrap(cluster).join();
The second instance also calls bootstrap, on a different port:

    String host = "10.191.206.183";
    List<Address> cluster = Arrays.asList(
            new Address(host, 5000),
            new Address(host, 5001)
    );

    // Only the local address differs from the first instance.
    Address address = new Address(host, 5001);

    atomix = AtomixReplica.builder(address)
            .withTransport(new NettyTransport())
            .withStorage(new Storage(StorageLevel.MEMORY))
            .build();

    atomix.bootstrap(cluster).join();
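Why a two-node setup is fragile: Copycat implements Raft, where electing a leader and committing entries both require a majority of the voting members. The arithmetic, as a small Java sketch (QuorumMath is a hypothetical helper, not an Atomix class):

```java
public class QuorumMath {

    /** Votes needed for a Raft majority in a cluster of `members` voting members. */
    public static int quorum(int members) {
        return members / 2 + 1;
    }

    /** How many members can be down while the rest still form a majority. */
    public static int toleratedFailures(int members) {
        return members - quorum(members);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " members: quorum=" + quorum(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

With two members the quorum is 2, so each replica's bootstrap(cluster).join() likely cannot finish until the other replica is up, and losing either node stalls the cluster. Three members tolerate one failure.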
Jordan Halterman
@kuujo
Mar 14 2017 08:04
definitely right
arshad khan
@arshadwkhan_twitter
Mar 14 2017 08:05
This was working flawlessly, but I suspect (though I'm not sure) it broke after I introduced the custom resource.
arshad khan
@arshadwkhan_twitter
Mar 14 2017 08:25
It started working after I removed copycat.meta and copycat-1-1.log.
Jordan Halterman
@kuujo
Mar 14 2017 08:26
Probably just an old configuration on disk
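Deleting the stale storage files is what fixed it here. For a throwaway development cluster, that cleanup can be done before bootstrapping with a short java.nio.file sketch. The class name and the default directory "replica-5000-data" are hypothetical: point it at whatever directory your replica's storage actually writes to, and never run it against a cluster whose data you need. When two replicas run on the same host, giving each its own storage directory also keeps their files apart.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class ResetStorage {

    /**
     * Recursively deletes a local storage directory so a development replica
     * starts without any persisted metadata or log segments.
     * Dev/test only: this erases the replica's state.
     */
    public static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to clean up
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the real storage directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "replica-5000-data");
        deleteRecursively(dir);
        System.out.println("deleted: " + !Files.exists(dir));
    }
}
```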
arshad khan
@arshadwkhan_twitter
Mar 14 2017 17:04
Hi @kuujo, I have implemented a custom resource but am facing some issues. It works for a few calls where I incrementAndGet the resource, but after some time I start getting this exception:
2017-03-14 22:03:14 INFO NettyClient:66 - Connecting to /192.168.100.5:5001
2017-03-14 22:03:14 INFO NettyClient:98 - Connected to /192.168.100.5:5001
2017-03-14 22:03:14 WARN LeaderAppender:315 - /192.168.100.5:5000 - AppendRequest to /192.168.100.5:5001 failed. Reason: failed to instantiate reference: must provide a single argument constructor
AD EXCEPTION:null
=====>decrement ref count : 0
^^^^^SIZE OF ACTIVES :2 ACTIVEQ:2
**RESOURCE SLIDING CONSTRUTOR:
java.lang.InterruptedException
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:40)
Jordan Halterman
@kuujo
Mar 14 2017 19:36
Some serializable object doesn't have a default constructor
Probably a state machine command
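Following the diagnosis above that some serializable object (probably a state machine command) lacks a default constructor, a quick reflective audit can flag such classes, for example in a unit test. ConstructorCheck, BadIncrement and GoodIncrement are hypothetical names, not Atomix or Catalyst types. Whether the serializer also needs the constructor to be public depends on how it instantiates objects, so making it public is the safe choice.

```java
public class ConstructorCheck {

    /** True if the class declares a no-argument constructor (of any visibility). */
    public static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Hypothetical command with only a one-argument constructor: a deserializer
    // that creates instances reflectively has no way to construct it.
    public static class BadIncrement {
        private final long delta;
        public BadIncrement(long delta) { this.delta = delta; }
    }

    // The same command with an added no-arg constructor for deserialization.
    public static class GoodIncrement {
        private long delta;
        public GoodIncrement() { }
        public GoodIncrement(long delta) { this.delta = delta; }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(BadIncrement.class));  // false
        System.out.println(hasNoArgConstructor(GoodIncrement.class)); // true
    }
}
```

The nested classes are static on purpose: a non-static inner class's constructor implicitly takes the enclosing instance, so it never has a true no-arg constructor.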
Jordan Halterman
@kuujo
Mar 14 2017 20:20
@jhall11 new and improved split_logs.py; usage: split_logs.py some/karaf.log (or pass a directory to split every log file in it)
import sys, os, re
from os.path import isfile, dirname, basename

INDENT = 25
SEP = ' | '

def find_pattern(f, pattern):
    values = set()
    prog = re.compile(pattern)
    for line in f:
        match = prog.search(line)
        if match:
            values.add(match.group(1))
    f.seek(0)
    return values

def split_line(line):
    # Karaf log lines look like: timestamp | level | thread | class | bundle | message
    # maxsplit=5 keeps any '|' characters inside the message itself intact.
    split = line.split('|', 5)
    if len(split) > 5:
        ts = split[0].strip()
        cls = split[3].strip()
        log = split[5].strip(' /')
        return ts.ljust(INDENT) + SEP + cls.ljust(INDENT) + SEP + log
    return ''

def split_log(infile, finder, filemaker, matchmaker):
    # Write the lines belonging to each distinct partition/session to its own file.
    for match in finder(infile):
        prog = matchmaker(match)
        with open(filemaker(match), 'w') as outfile:
            for line in infile:
                if prog.search(line):
                    outfile.write(split_line(line))
        infile.seek(0)

def split_partitions(filepath, f):
    split_log(
        f,
        lambda f: find_pattern(f, '9876-partition-([0-9]+)'),
        lambda partition: filepath + '.partition-' + partition,
        # \b keeps 'partition-1' from also matching 'partition-10'
        lambda partition: re.compile('partition-' + partition + r'\b'))

def split_sessions(filepath, f):
    split_log(
        f,
        lambda f: find_pattern(f, 'session=([0-9]+)'),
        lambda session: filepath + '.session-' + session,
        # \b keeps session 12 from matching inside a longer number such as 123
        lambda session: re.compile(r'\b' + session + r'\b'))

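def split_file(dirpath, filename):
    # os.path.join copes with an empty dirpath (e.g. a bare 'karaf.log' argument)
    filepath = os.path.join(dirpath, filename)
    with open(filepath, 'r') as f:
        split_partitions(filepath, f)
        split_sessions(filepath, f)

def split_files(path):
    for (dirpath, dirnames, filenames) in os.walk(path):
        for filename in filenames:
            # Skip the .partition-N / .session-N files this script writes
            generated = '.partition-' in filename or '.session-' in filename
            if '.log' in filename and not generated:
                split_file(dirpath, filename)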

if __name__ == '__main__':
    path = sys.argv[1]
    if isfile(path):
        split_file(dirname(path), basename(path))
    else:
        split_files(path)
Jon Hall
@jhall11
Mar 14 2017 20:23
it might be useful to check this in somewhere :)
Jordan Halterman
@kuujo
Mar 14 2017 20:25
yep… but where? I dunno yet. Where does one put random things like this?
Jon Hall
@jhall11
Mar 14 2017 20:31
well, the patterns only really work with ONOS, so maybe onos/tools/dev/bin/?
Jordan Halterman
@kuujo
Mar 14 2017 20:31
yeah needs to be in ONOS… sweet
Vasily Sulatskov
@redvasily
Mar 14 2017 22:04
@kuujo Thanks for your responses. I guess .get() with a timeout works well enough for me, provided there aren't some sort of dangling references deep inside Atomix. As for the weird problems I'm encountering, I'll prepare a small demo that reproduces them (in a day or two).
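On the dangling-references concern: with a plain CompletableFuture, a get() that times out leaves the future pending, so callbacks attached to it stay registered until something completes it. A minimal sketch, assuming the caller wants to abandon the result entirely, cancels the future after the timeout. getOrCancel is a hypothetical helper. Note that cancel() only completes that future object: it does not abort a request already sent to the cluster, and whether Atomix/Copycat releases its own internal state for that operation is not something this sketch can show.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CancelOnTimeout {

    /**
     * Waits up to timeoutMillis; on timeout, cancels the future so that its
     * callbacks run with a CancellationException instead of staying pending,
     * then rethrows the TimeoutException to the caller.
     */
    public static <T> T getOrCancel(CompletableFuture<T> future, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(false); // completes the future exceptionally; later complete() calls are ignored
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<Long> pending = new CompletableFuture<>();
        pending.whenComplete((value, err) ->
                System.out.println("callback ran, cancelled=" + (err instanceof CancellationException)));
        try {
            getOrCancel(pending, 100);
        } catch (TimeoutException e) {
            System.out.println("timed out; isCancelled=" + pending.isCancelled());
        }
    }
}
```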
Jordan Halterman
@kuujo
Mar 14 2017 23:43
:clap: