Carlo
@entangled90
It doesn't sound normal for two processes in the same rack
Martin Thompson
@mjpt777
@entangled90 NAKs can happen even over loopback when buffer sizes are not correct given sending rate or congestion.
Carlo
@entangled90
ok thanks
What's the best way to investigate missing "messages" in a stream? I tried the LossReport but the timestamps don't match
Martin Thompson
@mjpt777
@entangled90 Messages will not go missing due to loss. Loss gets recovered. Without spending time understanding your application it is difficult to guess what can be wrong.
Mostly, when people say messages have gone missing and we have investigated, it was a bug in their app.
Are you registering image unavailable handlers to see if you have connections dropping out and reconnecting?
Carlo
@entangled90
yes, but I don't explicitly reconnect
Martin Thompson
@mjpt777
Aeron will reconnect if the publication and subscription are still active unless rejoin=false is set on the URI for the subscription.
Carlo
@entangled90
ok, so in fact it reconnects normally
I don't understand one thing about the LossReport:
I have lines where LAST_OBSERVATION = 11:02:19 & FIRST_OBSERVATION = 13:19:01
Martin Thompson
@mjpt777
Not only should exclusive publications not be shared across threads, subscriptions should never be shared across threads.
Carlo
@entangled90
shouldn't first_observation be before last_observation?
Martin Thompson
@mjpt777
Which media driver?
Carlo
@entangled90

Not only should exclusive publications not be shared across threads, subscriptions should never be shared across threads.

Now I'm using ConcurrentPublication and a single thread with a Subscription

C Media driver
Martin Thompson
@mjpt777
There is a bug in the recording of loss observations in the C media driver. It is writing the last and first observations the wrong way round. I'll fix it.
Carlo
@entangled90
ah ok, thanks
btw, is it normal that the loss report is not empty?
Martin Thompson
@mjpt777
It depends on congestion and how you size your buffers. A little loss is common.
Carlo
@entangled90
ok... Is there a way to check if NAKs sent by one media driver are received by the other?
I suppose that the output of AeronStat is enough: NAKs received & Retransmits sent should mean it's receiving NAKs from the other media driver
Martin Thompson
@mjpt777
Correct.
Carlo
@entangled90
I still think there is something strange happening. Yesterday I restarted all the services in the afternoon and no loss was reported for the remaining time. Today I logged in and I see that the application is not receiving some messages. It started at 07:42 on machine 1 (machine 2 reports no losses). The LossReport is filled with rows starting from 07:42, around 280 rows for that stream id.
The publisher publishes around 3k msgs/s of about 700 bytes each
Carlo
@entangled90
is there a "golden rule" for sizing buffers in the media driver given a throughput?
Martin Thompson
@mjpt777
@entangled90 You need to read up on Bandwidth Delay Product and queuing theory.
@entangled90 If you need more detailed help we can provide consulting.
Carlo
@entangled90
ok I'll take a look
@entangled90 If you need more detailed help we can provide consulting.
atm we are still assessing aeron, I'll let you know in the future
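As a rough illustration of the sizing arithmetic mentioned above, the following sketch computes the publisher's data rate from the figures quoted in the conversation (about 3k msgs/s of about 700 bytes) and a bandwidth-delay product (link bandwidth multiplied by round-trip time). The link speed and round-trip time are hypothetical values chosen only for illustration, and the function names are not part of Aeron. It is a starting point for reasoning about buffer sizes, not a sizing rule.

```python
def stream_rate_bytes_per_sec(msgs_per_sec: int, msg_size_bytes: int) -> int:
    """Payload rate of a stream, ignoring protocol headers."""
    return msgs_per_sec * msg_size_bytes


def bandwidth_delay_product_bytes(bandwidth_bits_per_sec: int, rtt_micros: int) -> int:
    """Bytes that can be in flight on a link: bandwidth * round-trip time."""
    return bandwidth_bits_per_sec * rtt_micros // (8 * 1_000_000)


# Figures from the conversation: ~3k msgs/s of ~700 bytes each.
rate = stream_rate_bytes_per_sec(3_000, 700)

# ASSUMPTION: a 10 Gbit/s link with a 100 microsecond round trip (hypothetical values).
bdp = bandwidth_delay_product_bytes(10_000_000_000, 100)

print(f"stream rate: {rate} bytes/s ({rate * 8 / 1e6:.1f} Mbit/s)")
print(f"bandwidth-delay product: {bdp} bytes")
```

Buffers smaller than the bandwidth-delay product cannot keep the link full, and bursts above the average rate need additional headroom, which is where queuing theory comes in.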
Ghost
@ghost~5fab2f45d73408ce4ff3c0e0
Hello, I had a quick question about how Aeron Cluster nodes behave. This is not a support question; I am just trying to understand how the nodes interact and what impact network latency would have. My question is: if you have nodes in different regions, where one region has much more latency than another, is it expected that the slower region would slow down all nodes? Are there communications, possibly in how Raft communicates with each node, that would have that type of effect? Thanks for any help in understanding the performance characteristics of the cluster.
Martin Thompson
@mjpt777
@pieceofchum We only answer cluster questions to those on a support contract.
Ghost
@ghost~5fab2f45d73408ce4ff3c0e0
ok np
William
@ilove7412369_twitter

I think there is a bug in Aeron IPC.
I run Aeron Archive on the channel, say IPC with stream id 1001.

I publish things in order,
but the subscriber reads message 100057 before 100055, out of order, as shown in my log.

The archive, however, is in the correct order.
The Aeron version I use is 1.28.2.

I wonder if this is a known issue or not.

Martin Thompson
@mjpt777
@ilove7412369_twitter We are not aware of such an issue. We have tests that assert the correct order. You can use the LogInspector on the log buffers to see contents. Are you certain the logic of your usage is correct? Subscriptions are not concurrent so each thread requires its own instance.
William
@ilove7412369_twitter
Thanks, I will investigate further.
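One way to investigate a report like this is to check the sequence numbers the subscriber actually observed. The sketch below assumes the application embeds a monotonically increasing sequence number in each message; that is an assumption about the application, not an Aeron feature, and the function name and sample trace are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_order_anomalies(sequences: Iterable[int]) -> List[Tuple[str, int, int]]:
    """Return (kind, expected, received) for each sequence number that is not next in line."""
    anomalies = []
    highest = None  # highest sequence number seen so far
    for seq in sequences:
        if highest is None or seq == highest + 1:
            highest = seq
        elif seq > highest + 1:
            # Jumped ahead: one or more sequence numbers were skipped.
            anomalies.append(("gap", highest + 1, seq))
            highest = seq
        else:
            # At or below the highest seen: arrived late or is a duplicate.
            anomalies.append(("late_or_duplicate", highest + 1, seq))
    return anomalies


# Hypothetical trace resembling the report above: 100057 arrives before 100055.
observed = [100054, 100057, 100055, 100056, 100058]
for kind, expected, received in find_order_anomalies(observed):
    print(f"{kind}: expected {expected}, received {received}")
```

If each polling thread runs such a check on its own Subscription instance and the anomalies disappear, that points at shared use of a subscription rather than at the transport.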
Judd Gaddie
@juddgaddie

I have been looking at the design of the TimerService in Aeron Cluster and it places strict requirements on the ClusteredService to always call scheduleTimer in the same sequence after a replay or snapshot and subsequent replay.
The constraint of a ClusteredService to not do anything completely non-deterministic i.e. new Random() etc - feels reasonable to me.

However, it also requires the ClusteredService to record and restore all of its state in a snapshot in order to call scheduleTimer() reliably. Strictly speaking, the snapshot of a ClusteredService should store its entire state, but sometimes a user may store only a subset of that state. The ClusteredService may still behave correctly, yet the sequence of “schedule” and “cancel” calls to the TimerService may no longer be deterministic, so an attempt to schedule after loading the snapshot may turn into a no-op.

I did see that the javadoc on Cluster hints at this.

Martin Thompson
@mjpt777
@juddgaddie The timers are recorded in the snapshot managed by the consensus module. The service does not need to worry about that.
Judd Gaddie
@juddgaddie
It needs to worry about it a little bit as the state of the ClusteredService needs to be in sync with the state of the timers in the consensus module (at least wrt the sequence of calls to schedule and cancel and correlationIds used). Otherwise because of expiredTimerCountByCorrelationIdMap in the consensus module some scheduled timers may be ignored.
Martin Thompson
@mjpt777
@juddgaddie You can take this up as a support issue as this is not the place to discuss. I'm not sure you are using things correctly.
Judd Gaddie
@juddgaddie
Fair enough, thanks
feyyaz91
@feyyaz91
Hello there, is there a way to use ReplayMerge when the live publication is on a different media driver to the archived recording? At the moment, as far as I am aware, the recording and publication must share a media driver to merge successfully.
William
@ilove7412369_twitter
Is it illegal to modify the buffer passed to the onFragment callback?
Martin Thompson
@mjpt777
@ilove7412369_twitter The buffer should be considered readonly.
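A common way to respect a read-only buffer is to copy the fragment into memory you own before changing anything. The sketch below shows that pattern in Python as a stand-in; it does not use Aeron's API, and the function name, parameter names, and sample data are illustrative assumptions.

```python
def handle_fragment(buffer: memoryview, offset: int, length: int) -> bytearray:
    """Copy the fragment out of the shared buffer, then modify only the copy."""
    private_copy = bytearray(buffer[offset:offset + length])  # bytearray() copies the bytes
    if private_copy:
        private_copy[0] ^= 0xFF  # example mutation, applied to the copy only
    return private_copy


# A read-only view stands in for the buffer handed to the callback (hypothetical data).
shared = memoryview(b"headerPAYLOAD")
result = handle_fragment(shared, 6, 7)

print(shared.readonly)    # the shared view cannot be written to
print(shared.tobytes())   # the original bytes are untouched
```

The copy costs an allocation per fragment, so where the data only needs to be read it is cheaper to decode it in place and skip the copy.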
Ivan Zemlyanskiy
@QIvan
hi! Happy New year, guys!
If you recall, I asked some time ago about an issue with restarting the archive and ports: with the Java media driver everything worked fine, but with the C media driver it didn't (you said that is because the C media driver is much faster than the Java one, and that is the cause of the problem).
Could you help clarify this scenario: I create an Aeron instance (with Aeron.connect()) and add a subscription for an endpoint, let's say 10.1.1.1:1234. I work with it for some time, then I close the subscription, close the Aeron instance, and repeat everything once again. My question: what should I do, if anything, before I start everything over?
thank you in advance
Todd L. Montgomery
@tmontgomery
@QIvan you will want to wait until the counter associated with the channel has been removed.
Ivan Zemlyanskiy
@QIvan
but if I closed the Aeron instance, where can I get the CountersReader from?
I mean, I tried to do it with aeron.countersReader(), but I got a segFault =)
Martin Thompson
@mjpt777
@QIvan Same way AeronStat does it.