These are chat archives for atomix/atomix

27th
Jul 2018
toni
@digevil
Jul 27 2018 02:42
@kuujo thanks, I am still working on it, so far so good. btw, another question: if I want to register a new class into the tree, I need to use code like:
.atomicDocumentTreeBuilder(MODEL)
        .withSerializer(Serializer.using(Namespace.DEFAULT.builder()
                .register(Model.class)
                .build()))
        .build()
if the Model class has some members and those members are user-defined types, I need to add them one by one, correct?
it would be nice to have a reflection-based version of the register method that does all of this automatically (you'd just need to register the top-level class)
toni
@digevil
Jul 27 2018 06:59
@kuujo sorry, I just found that the registerSubTypes() method actually does the job...
Jordan Halterman
@kuujo
Jul 27 2018 08:19
yeah… actually some PRs that were just merged today add additional methods to handle this like withNodeType(SomeType.class) for AtomicDocumentTree or withKeyType/withValueType for maps etc
Also, a global configuration allows user types to optionally be serialized without registration
@digevil TBH the reflection code that handles that registration is pretty lazy and probably has issues. It really should try to register generic type arguments too. I’d be interested to see your implementation if it’s more thorough
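For reference, a rough sketch of how those new methods might look in use. Only the method names come from this discussion; the builder chains, the "models" names, and the Model class are assumptions:
```
// Sketch only: assumes "atomix" is a started Atomix instance, and that the
// merged PRs expose withNodeType/withKeyType/withValueType roughly like this.
AtomicDocumentTree<Model> tree = atomix.<Model>atomicDocumentTreeBuilder("models")
    .withNodeType(Model.class)
    .build();

AtomicMap<String, Model> map = atomix.<String, Model>atomicMapBuilder("models-by-id")
    .withKeyType(String.class)
    .withValueType(Model.class)
    .build();
```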
Jordan Halterman
@kuujo
Jul 27 2018 08:24
TBH registering types in that way has potential to cause problems for upgrades. But so does the current serialization API more generally
toni
@digevil
Jul 27 2018 08:25
if withNodeType(SomeType.class) and withKeyType/withValueType are going to be provided, then it's better to follow those
Jordan Halterman
@kuujo
Jul 27 2018 08:26
it’s already in master and will be released probably tomorrow
along with the path changes
toni
@digevil
Jul 27 2018 08:26
ice cool
Jordan Halterman
@kuujo
Jul 27 2018 08:26
I just need to update ONOS in the morning
Jordan Halterman
@kuujo
Jul 27 2018 08:33
ugh, my push to the website apparently didn't work
toni
@digevil
Jul 27 2018 08:47
our use case is something like this (a rough sketch of the primitives involved follows the list):
  1. we will build a document tree in Raft partitions which represents a NAS file tree; it's organized by the data team, who divide the PMML files into groups and the groups into applications
  2. we scan the fs periodically and, on any change, apply it to the atomic tree via a leader chosen by leader election
  3. the models provide a real-time service, so rolling a model looks pretty much like rolling code; we designed a logical stage in memory, represented by another Atomix tree with the same structure as the file system tree, except that under the group level there are prod and test nodes, and under the prod node there are primary and secondary nodes; requests mostly go to primary, but once primary is rolling or anything goes wrong, secondary takes over
  4. the roll action changes the env tree; a Hystrix command factory listens to env tree events and, on any change, switches the model it will call
  5. we plan to use the Atomix semaphore to control the concurrency of file system I/O
  6. we plan to use the cyclic barrier to make sure a specific version of a model is properly loaded on all nodes
  7. all this effort is because we are building a universal model evaluation service; many applications visit us for different model evaluations, so we need to keep every model group consistent
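A rough sketch of how those steps might map onto the primitives named above. Every name here ("fs-tree", "fs-scanner", FileInfo, reloadModel) is hypothetical, and the semaphore/cyclic-barrier builder methods are assumed from Atomix 3.x naming conventions:
```
// Assumes "atomix" is a started Atomix instance.
// Steps 1-2: the file tree, written only by the elected leader.
AtomicDocumentTree<FileInfo> fsTree = atomix.<FileInfo>atomicDocumentTreeBuilder("fs-tree")
    .withProtocol(MultiRaftProtocol.builder().build())
    .build();
LeaderElection<MemberId> election = atomix.<MemberId>leaderElectionBuilder("fs-scanner")
    .build();

// Step 4: react to env-tree changes, e.g. by switching the served model.
fsTree.addListener(event -> reloadModel(event.path()));

// Steps 5-6: bound concurrent filesystem I/O, then rendezvous once a model
// version has loaded on all nodes.
DistributedSemaphore ioPermits = atomix.semaphoreBuilder("fs-io").build();
DistributedCyclicBarrier loadBarrier = atomix.cyclicBarrierBuilder("model-load").build();
```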
Jordan Halterman
@kuujo
Jul 27 2018 08:49
sounds awesome!
love to see you get use out of that many different abstractions
toni
@digevil
Jul 27 2018 08:50
thanks, working on it
wow, you updated the documentation, cool
Jordan Halterman
@kuujo
Jul 27 2018 08:52
The cyclic barrier is a new primitive and should really be marked @Beta. If it makes sense to change something on it we can.
the final release will be next month which is when we have the next ONOS release, so squeezing minor changes in before then
toni
@digevil
Jul 27 2018 08:54
ok
wanghhao
@wanghhao
Jul 27 2018 09:50
@kuujo, I use atomix-3.0.0-rc5: I start three agents with the REST server commented out, and the CPU is at 100%. My CPU is an i7-7700HQ, 2.8GHz
Johno Crawford
@johnou
Jul 27 2018 09:51
@wanghhao can you use jmc or another profiler to see what's burning the cpu?
wanghhao
@wanghhao
Jul 27 2018 09:52

cluster.discovery {
  type: bootstrap
  nodes.1 {
    id: member1
    address: "localhost:5001"
  }
  nodes.2 {
    id: member2
    address: "localhost:5002"
  }
  nodes.3 {
    id: member3
    address: "localhost:5003"
  }
}

profiles.1 {
  type: consensus
  partitions: 3
  members: [member1, member2, member3]
}

profiles.2 {
  type: data-grid
  partitions: 32
}

-m member1 -a localhost:5001 -p 6001
Jordan Halterman
@kuujo
Jul 27 2018 09:52
Simple configuration. Strange. Maybe http://github.com/atomix/atomix-test will come in handy here. Going to try to reproduce it
What did you do with the REST API?
Yeah the best solution is to just use a profiler
And share the results
I have to take a nap so I’ll have to check it out with that info in the morning
wanghhao
@wanghhao
Jul 27 2018 09:55
I didn't start the REST server
toni
@digevil
Jul 27 2018 09:56
@kuujo it's pretty late, better sleep now
wanghhao
@wanghhao
Jul 27 2018 11:16
image.png
image.png
the thread netty-messaging-event-nio-client-9 is the most active

atomix-0 Waiting CPU usage on sample: 0ms
sun.misc.Unsafe.park(boolean, long) Unsafe.java (native)
java.util.concurrent.locks.LockSupport.park(Object) LockSupport.java:175
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() AbstractQueuedSynchronizer.java:2039
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() ScheduledThreadPoolExecutor.java:1081
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() ScheduledThreadPoolExecutor.java:809
java.util.concurrent.ThreadPoolExecutor.getTask() ThreadPoolExecutor.java:1074
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1134
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:624
java.lang.Thread.run() Thread.java:748

atomix-bootstrap-heartbeat-receiver Waiting CPU usage on sample: 0ms
sun.misc.Unsafe.park(boolean, long) Unsafe.java (native)
java.util.concurrent.locks.LockSupport.park(Object) LockSupport.java:175
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() AbstractQueuedSynchronizer.java:2039
java.util.concurrent.LinkedBlockingQueue.take() LinkedBlockingQueue.java:442
java.util.concurrent.ThreadPoolExecutor.getTask() ThreadPoolExecutor.java:1074
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1134
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:624
java.lang.Thread.run() Thread.java:748

atomix-bootstrap-heartbeat-sender Waiting CPU usage on sample: 0ms
sun.misc.Unsafe.park(boolean, long) Unsafe.java (native)
java.util.concurrent.locks.LockSupport.parkNanos(Object, long) LockSupport.java:215
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) AbstractQueuedSynchronizer.java:2078
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() ScheduledThreadPoolExecutor.java:1093
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() ScheduledThreadPoolExecutor.java:809
java.util.concurrent.ThreadPoolExecutor.getTask() ThreadPoolExecutor.java:1074
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1134
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:624
java.lang.Thread.run() Thread.java:748

atomix-cluster-0 Waiting CPU usage on sample: 0ms
sun.misc.Unsafe.park(boolean, long) Unsafe.java (native)
java.util.concurrent.locks.LockSupport.park(Object) LockSupport.java:175
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() AbstractQueuedSynchronizer.java:2039
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() ScheduledThreadPoolExecutor.java:1081
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() ScheduledThreadPoolExecutor.java:809
java.util.concurrent.ThreadPoolExecutor.getTask() ThreadPoolExecutor.java:1074
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1134
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:624
java.lang.Thread.run() Thread.java:748

atomix-cluster-event-executor-0 Waiting CPU usage on sample: 0ms
sun.misc.Unsafe.park(boolean, long) Unsafe.java (native)
java.util.concurrent.locks.LockSupport.parkNanos(Object, long) LockSupport.java:215
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) AbstractQueuedSynchronizer.java:2078
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() ScheduledThreadPoolExecutor.java:1093
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() ScheduledThreadPoolExecutor.java:809
java.util.concurrent.ThreadPoolExecutor.getTask() ThreadPoolExecutor.java:1074
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1134
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:624
java.lang.Thread.run() Thread.java:748

Johno Crawford
@johnou
Jul 27 2018 11:21
you also say 100% CPU; what does the heap look like?
could be full GC
wanghhao
@wanghhao
Jul 27 2018 11:22
image.png
image.png
only 177M used
@johnou, can you see the pics I uploaded?
wanghhao
@wanghhao
Jul 27 2018 11:31
image.png
Johno Crawford
@johnou
Jul 27 2018 11:45
Yep, but I'm not seeing much that could cause it
Could I trouble you for a flight recording?
wanghhao
@wanghhao
Jul 27 2018 11:45

profiles.1 {
  type: consensus
  partitions: 3
  members: [member1, member2, member3]
}

profiles.2 {
  type: data-grid
  partitions: 32
}

Johno Crawford
@johnou
Jul 27 2018 11:45
With jmc
wanghhao
@wanghhao
Jul 27 2018 11:46
too many partitions?
if I change all partitions to 1, the CPU is below 10%
Johno Crawford
@johnou
Jul 27 2018 11:58
i think a flight recording would still be interesting
wanghhao
@wanghhao
Jul 27 2018 11:59
sorry, I'm not a native speaker; what's a flight recording?
Johno Crawford
@johnou
Jul 27 2018 11:59
is the Java home bin dir on your PATH?
you could open the Java Mission Control application with the command jmc
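For reference, a flight recording can also be captured from the command line on an Oracle JDK 8 with jcmd (both commands are standard JDK diagnostics; the pid, duration, and file name here are just examples):

jcmd <pid> VM.unlock_commercial_features
jcmd <pid> JFR.start duration=60s filename=atomix.jfr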
wanghhao
@wanghhao
Jul 27 2018 12:13
image.png
ajay185
@ajay185
Jul 27 2018 14:29
Hi...
Can anyone help me with the documentation for Atomix version 1.0.8?
Jordan Halterman
@kuujo
Jul 27 2018 17:09
@ajay185 you have to use the Wayback Machine; it's not currently hosted anywhere. Or you can just check out older commits from http://github.com/atomix/atomix.github.io and run the Jekyll server. I should probably tag the old docs
I also don’t see what’s causing it
Johno Crawford
@johnou
Jul 27 2018 17:12
@wanghhao yep, that's it; you need to send the .jfr file
Jordan Halterman
@kuujo
Jul 27 2018 17:14

@wanghhao if you're running all these nodes on the same machine (it looks like you are) then you need to set separate data directories for each node. I usually run them in containers with a volume at a path named with the node name. You can do this in HOCON by using an environment variable or system property for the node name.
```
cluster.member-id: ${atomix.memberid}

partition-groups.raft.data-directory: .data/${atomix.memberid}
```

Ugh I messed that up
```
cluster.member-id: ${atomix.memberid}

partition-groups.raft.data-directory: .data/${atomix.memberid}
```
Can’t edit on the Gitter iOS client. Stupid
I suspect all these nodes are reading/writing the same data directory and it’s causing them to go haywire
Need to add these warnings to the docs
That configuration isn’t even correct 🤷‍♂️
```
cluster.node.id: ${atomix.memberid}

partition-groups.raft.data-directory: .data/${atomix.memberid}
```
There we go
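The member-specific value then gets supplied per node at startup, e.g. as a JVM system property alongside the agent flags shown earlier. This assumes the config loader resolves system properties, as suggested above; exactly where the -D flag goes depends on how the agent is launched:

-Datomix.memberid=member1 -m member1 -a localhost:5001 -p 6001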
Jordan Halterman
@kuujo
Jul 27 2018 17:20
In atomix-test I found using variables in configuration files to be a really convenient way of sharing a single configuration across nodes:
https://github.com/atomix/atomix-test/blob/master/config/consensus.conf
The atomix-test framework applies the configuration file to all the nodes by name
with create_cluster('consensus', nodes=3) as cluster:
  cluster.node(1).kill()
cow12331
@cow12331
Jul 27 2018 21:35
I have a question about performance. I have a ConsistentMap<String, Long> across about 10 hosts, and there could be thousands of keys. I am wondering whether I should use a map or just create AtomicValues directly. In addition, if some keys or values are no longer required, is there any way to remove them automatically based on time?
Jordan Halterman
@kuujo
Jul 27 2018 21:47

@cow12331 AtomicMap supports TTLs:

map.put("foo", "bar", Duration.ofSeconds(10))

However, the TTL is from the write time, not the last read time (which would be extremely costly to do consistently)

There's some benefit to creating multiple AtomicValues rather than a map, but there's a cost at that scale. The benefit is that each value is a separate session with consistency guarantees distinct from other values. That allows for concurrency across all the values, whereas AtomicMap provides consistency guarantees for all keys in the map. For example, if a client changes key "foo" before key "bar", all other instances of the map will see key "foo" change before key "bar". Whereas separate values named "foo" and "bar" don't have the same consistency across them. In the case of values, there's only sequential consistency within the value, meaning if some client sets the value to "bar" and then "baz", all other instances of the value will see those changes in that order.

But there’s also some overhead to creating many instances of a primitive. Each instance has a logical session when using Multi-Raft or multi-primary. Each of those protocols uses the session to associate operations with a specific primitive instance on a specific node. But there’s overhead to maintaining sessions. In Raft, the overhead is a periodic keep-alive to each partition. Atomix does a lot to minimize the overhead of multiple sessions, batching keep-alives for many primitives into a single request and a single write, and limiting session failure detectors for all sessions on a single node to a single heartbeat from each server, but there’s still a lot of independent coordination that has to happen for each primitive session.

So, at that scale I’d suggest it’s better to use an AtomicMap, and you still have a unit of scalability: the partition.

If you have a suggestion to make for the AtomicMap interface it’s welcome. One idea is to allow clients to acquire a reference to a key and remove the key when all references have been lost (sessions are closed or expired). I could see that type of mechanism being really useful.

It’s also really easy to implement
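To make the TTL point concrete, a minimal sketch (assumes "map" is an already-built AtomicMap; the keys and values are arbitrary):
```
// The TTL runs from write time, as noted above: the entry expires roughly
// ten seconds after the put, no matter how often it is read.
map.put("foo", "bar", Duration.ofSeconds(10));

// Purely hypothetical shape for the key-reference idea floated above;
// no such method exists today:
// Ref<String> ref = map.acquire("foo"); // key removed once all refs are closed
```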
cow12331
@cow12331
Jul 27 2018 22:15
@kuujo Thanks. My use case is to build a distributed in-memory rate limiter, so I want objects that have been idle for a long time to be removed automatically. If Atomix had some advanced class like Guava's Cache, it would be very useful. It seems any key changed in the map will result in the whole map syncing.
Jordan Halterman
@kuujo
Jul 27 2018 22:15
what do you mean by “the whole map sync”?
Well, it depends on which primitives and which protocols you use. When you change a key in a map configured with the MultiRaftProtocol, the cost is a write to a Raft partition. That means writing the change to disk and replicating it to followers on some subset of the nodes in the cluster. When you change a key in a map configured with the MultiPrimaryProtocol, it’s replicated in memory in the same manner. When you use a DistributedMap configured with the AntiEntropyProtocol, it’s a local write and changes are replicated asynchronously.
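A rough sketch of those three configurations (the protocol class names are as described for Atomix 3.x; the map names are placeholders and the builder details are simplified):
```
// Multi-Raft: writes are persisted and replicated through Raft partitions.
AtomicMap<String, Long> raftMap = atomix.<String, Long>atomicMapBuilder("rates-raft")
    .withProtocol(MultiRaftProtocol.builder().build())
    .build();

// Multi-primary: replicated in memory from a primary to its backups.
AtomicMap<String, Long> primaryBackupMap = atomix.<String, Long>atomicMapBuilder("rates-pb")
    .withProtocol(MultiPrimaryProtocol.builder().build())
    .build();

// Anti-entropy (gossip): local writes, replicated asynchronously.
DistributedMap<String, Long> gossipMap = atomix.<String, Long>mapBuilder("rates-gossip")
    .withProtocol(AntiEntropyProtocol.builder().build())
    .build();
```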
cow12331
@cow12331
Jul 27 2018 22:17
I don't know how a map is stored in Atomix. I mean, if a key in the map changes, will the whole map sync, or just the key update across the cluster?
Johno Crawford
@johnou
Jul 27 2018 22:18
just the change
Jordan Halterman
@kuujo
Jul 27 2018 22:18
right
for Raft and multi-primary, the state of the map is represented as an append-only commit log, so it's actually the change to the key that is replicated
for anti-entropy, the single key itself is replicated
nothing ever replicates an entire primitive unless catching up a new node
cow12331
@cow12331
Jul 27 2018 22:19
ok
Jordan Halterman
@kuujo
Jul 27 2018 22:30

To be clear, strong consistency for all operations on a single primitive does not require replicating the entire primitive. The strong consistency is gained by sharing a session across all the keys in the map, not replication. The session handles ordering of operations and events and allows the primitive to switch Raft or primary-backup nodes without seeing state go back in time. For example, when you do map.put("foo", ...) and map.put("bar", ...), the session applies a sequence number to the put foo and put bar operations. When a Raft leader receives the operations, it ensures those operations are applied to the replicated log in order and to the state machines in order. Similarly, when a Raft state machine publishes a sequence of events (e.g. AtomicMapEvent.Type.UPDATE, AtomicMapEvent.Type.REMOVE, etc), the session is responsible for ensuring listeners added via map.addListener(...) are called in the order in which those events occurred in the replicated state machine.

There is a linearizable aspect to the Raft consistency model, too, where it guarantees that once a change has been committed it is immediately visible to all other instances of the primitive. But that guarantee is implemented more or less by electing a leader and simply forcing writes and reads on the map to go to the same place.

So, the reason strong consistency is lost across primitives is just because they use separate sessions and thus order operations/events separately
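In code, that per-session ordering guarantee reads roughly like this (a sketch; assumes "map" is an AtomicMap<String, Long> as in the earlier examples):
```
// Both puts are sequenced by the same session, so every listener on every
// instance of the map sees the "foo" update before the "bar" update.
map.addListener(event -> System.out.println(event.type() + " " + event.key()));
map.put("foo", 1L);
map.put("bar", 2L);
```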