These are chat archives for atomix/atomix

12th
Apr 2018
Jordan Halterman
@kuujo
Apr 12 2018 03:10
Actually, they’re connection pool improvements. There’s a memory leak in the connection pool that happens during network partitions
I’ll move some of the new ONOS code over whenever I manage to figure out this bug I’ve been chasing all day.
Jordan Halterman
@kuujo
Apr 12 2018 04:13
FINALLY
Tests should be getting closer to stable now. Will keep hacking on them until they get there, but we should try to get my big ass PR merged tomorrow so we can be done with big ass PRs
Jordan Halterman
@kuujo
Apr 12 2018 05:17

Seems back to average instability now. Will have to figure out the document tree tests separately.

This week or next week I’ll do the last beta release. Pretty soon will be release candidates as we test Atomix 2 in ONOS.

Johno Crawford
@johnou
Apr 12 2018 08:47
:clap:
Johno Crawford
@johnou
Apr 12 2018 15:02
@kuujo so close.. [ERROR] RaftConsistentTreeMapTest.treeMapFunctionsTest » Completion java.lang.Assertio...
i was eyeing that queue with the fixed size sigh
Jordan Halterman
@kuujo
Apr 12 2018 15:02
yeah I’ve been trying to figure that one out
it’s a really weird bug
the state machine is actually returning the correct result but when the client gets it it’s incorrect
it seems like some sort of pollution across tests
Johno Crawford
@johnou
Apr 12 2018 15:03
the fixed size queue isn't a hack for that test is it
Jordan Halterman
@kuujo
Apr 12 2018 15:03
never happens when I just run the method
Johno Crawford
@johnou
Apr 12 2018 15:03
ah right, race condition then
Jordan Halterman
@kuujo
Apr 12 2018 15:06
may just have to create a new cluster for each test method rather than reusing the same cluster across test methods
that seems to fix it
or not...
Johno Crawford
@johnou
Apr 12 2018 15:07
sounds like sweeping a bug under the rug
Jordan Halterman
@kuujo
Apr 12 2018 15:08
I don’t think it’s an Atomix bug… it’s a test bug. State from previous test runs is materializing.

this line returns a null result:
https://github.com/atomix/atomix/blob/master/core/src/main/java/io/atomix/core/map/impl/ConsistentMapService.java#L421

That somehow becomes Versioned{value=bar, version=4, creationTime=2018-04-12 11:09:35,325} by the time it gets to the client ¯_(ツ)_/¯

Johno Crawford
@johnou
Apr 12 2018 15:13
well it has to change somewhere
or is the object id the same
Jordan Halterman
@kuujo
Apr 12 2018 15:13
and it only happens when multiple tests are run, which indicates it’s pollution from a prior test
Johno Crawford
@johnou
Apr 12 2018 15:13
i'm going to regret saying this but what if you disable the kryo pooling
Jordan Halterman
@kuujo
Apr 12 2018 15:13
almost always on the second run
Jordan Halterman
@kuujo
Apr 12 2018 15:21
strangely, that made it always happen on the first run
fixed it
Johno Crawford
@johnou
Apr 12 2018 15:21
what was it?
Jordan Halterman
@kuujo
Apr 12 2018 15:22
I have no idea why it only happened on the second run without the Kryo pooling change, and always on the first run with it.
has nothing to do with Kryo
Johno Crawford
@johnou
Apr 12 2018 15:23
storage is still serialised with kryo
iirc
Jordan Halterman
@kuujo
Apr 12 2018 15:24
the problem was pollFirstEntry and pollLastEntry being treated as queries that could be handled by a single node, so the client polls an entry on one node and then switches servers and that entry hasn’t been removed. No idea why that would consistently happen in this pattern
oh well fixed
Johno Crawford
@johnou
Apr 12 2018 15:25
So how's command differ from query?
Jordan Halterman
@kuujo
Apr 12 2018 15:33
Queries presumably don’t modify the state of the RSM, so they don’t need to go through the Raft log and don’t need to be replicated. But if they do modify the state of the RSM then it will result in inconsistencies like this one. Good find actually
No idea why this test doesn’t fail in ONOS
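
A minimal sketch of the failure mode described above, using a toy model rather than Atomix code. `ToyCluster`, `pollFirstAsQuery`, and `pollFirstAsCommand` are hypothetical names; the loop over replicas only stands in for replication through the Raft log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model: every replica holds its own copy of the tree map state.
final class ToyCluster {
    private final List<TreeMap<String, String>> replicas = new ArrayList<>();

    ToyCluster(int size) {
        for (int i = 0; i < size; i++) {
            TreeMap<String, String> state = new TreeMap<>();
            state.put("a", "foo");
            state.put("b", "bar");
            replicas.add(state);
        }
    }

    // Query path: served by one replica only, nothing is replicated.
    // Because poll removes the entry, only this replica's state changes.
    Map.Entry<String, String> pollFirstAsQuery(int replica) {
        return replicas.get(replica).pollFirstEntry();
    }

    // Command path: the same operation is applied on every replica,
    // standing in for an entry that went through the replicated log.
    Map.Entry<String, String> pollFirstAsCommand() {
        Map.Entry<String, String> result = null;
        for (TreeMap<String, String> state : replicas) {
            result = state.pollFirstEntry();
        }
        return result;
    }

    // Returns the smallest key on the given replica, or null if it is empty.
    String firstKey(int replica) {
        TreeMap<String, String> state = replicas.get(replica);
        return state.isEmpty() ? null : state.firstKey();
    }
}

final class PollDemo {
    public static void main(String[] args) {
        ToyCluster broken = new ToyCluster(3);
        broken.pollFirstAsQuery(0);
        // prints "query:   replica0=b replica1=a"
        System.out.println("query:   replica0=" + broken.firstKey(0)
                + " replica1=" + broken.firstKey(1));

        ToyCluster fixed = new ToyCluster(3);
        fixed.pollFirstAsCommand();
        // prints "command: replica0=b replica1=b"
        System.out.println("command: replica0=" + fixed.firstKey(0)
                + " replica1=" + fixed.firstKey(1));
    }
}
```

In the query case a client that switches from replica 0 to replica 1 still sees the entry it already polled, which matches the symptom in the test.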
Johno Crawford
@johnou
Apr 12 2018 15:35
Any other queries that need swapping to command?
Only other nit I had was the naming of the json type key
Jordan Halterman
@kuujo
Apr 12 2018 15:38

Ugly API then:

POST v1/primitives/foo
{
  "_type": "consistent-map",
  "null-values": true
}

primitives:
  foo:
    _type: consistent-map
    null-values: true

Will add a special character to all configurations

Johno Crawford
@johnou
Apr 12 2018 15:38
Hmm
So you don't think it would overlap with anything?
Maybe not
Jordan Halterman
@kuujo
Apr 12 2018 15:44
If someone adds a type field to a custom configuration, but maybe that should just be validated
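
One way such a validation could look, as a rough sketch using plain reflection rather than Jackson. `ReservedFieldCheck`, `GoodConfig`, and `BadConfig` are hypothetical names, and the reserved key is passed in as a parameter rather than assumed.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports fields whose name collides with a reserved key.
final class ReservedFieldCheck {
    private ReservedFieldCheck() {
    }

    // Walks the class and its superclasses and collects "Class.field"
    // for every declared field named exactly like the reserved key.
    static List<String> conflicts(Class<?> configClass, String reservedName) {
        List<String> found = new ArrayList<>();
        for (Class<?> c = configClass; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (field.getName().equals(reservedName)) {
                    found.add(c.getSimpleName() + "." + field.getName());
                }
            }
        }
        return found;
    }
}

// Hypothetical custom configurations used only for illustration.
class GoodConfig {
    boolean nullValues;
}

class BadConfig {
    String type;
    boolean nullValues;
}

final class ReservedFieldDemo {
    public static void main(String[] args) {
        // prints "[]"
        System.out.println(ReservedFieldCheck.conflicts(GoodConfig.class, "type"));
        // prints "[BadConfig.type]"
        System.out.println(ReservedFieldCheck.conflicts(BadConfig.class, "type"));
    }
}
```

A check like this could run when a custom configuration is registered, so the collision fails fast instead of surfacing as a confusing deserialization bug.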
Johno Crawford
@johnou
Apr 12 2018 15:48
Yeah reserved / restricted field
Great work
Jordan Halterman
@kuujo
Apr 12 2018 15:52
just need to figure out how to get the fields for a class from Jackson
Johno Crawford
@johnou
Apr 12 2018 15:53
Oh now I get it
Because poll removes the entry
Derp
Jordan Halterman
@kuujo
Apr 12 2018 16:04
yep
Hmm… actually the configurations already have a type property so maybe should just leave it. If someone adds a type property to a custom configuration they’ll just have a bug on their hands, but it should be pretty obvious since all of the polymorphic configurations require a type field already
one of the tests passed :-)
I think the other stalled
Johno Crawford
@johnou
Apr 12 2018 16:09
Did you restart it
Started just now
Johno Crawford
@johnou
Apr 12 2018 16:15
Maybe build node wasn't available
Jordan Halterman
@kuujo
Apr 12 2018 17:08
ugh
The log length has exceeded the limit of 4 MB (this usually means that the test suite is raising the same exception over and over).
bah there’s some trace logs in there
Johno Crawford
@johnou
Apr 12 2018 17:49
Boooo
Johno Crawford
@johnou
Apr 12 2018 17:58
Did you also fix that backup issue?
Happened on tree test notifications iirc
Johno Crawford
@johnou
Apr 12 2018 18:20
@kuujo holy shit it's green :)
Jordan Halterman
@kuujo
Apr 12 2018 18:56
:clap:
lol
yeah I fixed that too
Jordan Halterman
@kuujo
Apr 12 2018 19:12

Merged!

Time to go home!

We’ll meet back here in the morning and rewrite it ;-)

Johno Crawford
@johnou
Apr 12 2018 19:12
ahahahha
if my colleague doesn't call me again at 330..