Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Repo info
Activity
  • 11:24
    wojciech-adaptive updated the wiki
  • Nov 29 13:08
    mjpt777 commented #1093
  • Nov 29 12:43
    tipame commented #1093
  • Nov 29 10:12
    mjpt777 commented #1093
  • Nov 29 10:12
    mjpt777 closed #1093
  • Nov 29 10:12
    mjpt777 labeled #1093
  • Nov 29 08:05
    tipame opened #1093
  • Nov 28 14:55

    mjpt777 on master

    [Java] Upgrade to Mockito 3.6.2… (compare)

  • Nov 28 14:35

    mjpt777 on master

    [Java] Upgrade to Mockito 3.6.2… (compare)

  • Nov 28 12:49

    mjpt777 on master

    [Java] LogPublisher should clea… (compare)

  • Nov 27 18:56

    mjpt777 on master

    [Java] Disconnect LogAdapter wh… (compare)

  • Nov 26 17:03

    vyazelenko on master

    [Java] Fix a race in re-schedul… (compare)

  • Nov 26 16:54

    mjpt777 on master

    [C] Warnings and type fixes. (compare)

  • Nov 26 16:52

    mjpt777 on master

    [C] Warnings and type fixes. (compare)

  • Nov 26 16:38

    mjpt777 on master

    [C] Warnings and type fixes. (compare)

  • Nov 26 15:59

    mjpt777 on master

    [C] Warnings and type fixes. (compare)

  • Nov 26 15:40

    mjpt777 on master

    [C] Warnigns and type fixes. (compare)

  • Nov 26 15:11

    mjpt777 on master

    [C/Java] Align event logging be… (compare)

  • Nov 26 15:11
    mjpt777 closed #1091
  • Nov 26 15:11
    mjpt777 closed #1072
Todd L. Montgomery
@tmontgomery
feyyaz91
@feyyaz91
Thank you, will take a look
VadimMolodyh
@VadimMolodyh
Hi,
I'm looking for a low-latency messaging system supporting strict ordering of messages with mutliple writters and readers for the same queue (i.e. with resolved distributed consensus issue). I see that Kafka provides it but has quite high latency (~15ms for p95 end-to-end latency). It looks like Aeron Cluster provides this features as well, but I do not see actual end-to-end benchmarks.
Plus, I could not find C++ API, is it in plans?
Thanks, Vadim.
Martin Thompson
@mjpt777
@VadimMolodyh Aeron Cluster can beat Kafka by a few orders of magnitude on latency. At present there are Java and C# APIs for Cluster. We would be interested in doing a C++ API if someone was willing to sponsor the development.
VadimMolodyh
@VadimMolodyh
Got it. Thanks!
thangnc2707
@thangnc2707

@mjpt777 I use .zsh (on Macbook pro) running ./gradlew. I run test case io.aeron.agent.DriverLoggingAgentTest, it failed. Detail log

java.util.concurrent.TimeoutException: logAll() timed out after 10 seconds
        at org.junit.jupiter.engine.extension.TimeoutInvocation.createTimeoutException(TimeoutInvocation.java:70)
        at org.junit.jupiter.engine.extension.TimeoutInvocation.proceed(TimeoutInvocation.java:59)
        at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
        at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
        at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
        at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
        at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
        at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
        at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
        at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
        at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
        at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
        at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
        at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206)
        at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131)
        at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
        at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)

when I change cmd from zsh to bash. Test case pass.

Michael Barker
@mikeb01
I suspect this is more of an issue with the gradlew script than Aeron. If you change the first line of the script to #!/usr/bin/env bash does it work?
thangnc2707
@thangnc2707
yes, I run the first line "chsh -s /bin/bash".
Dmitry Vyazelenko
@vyazelenko
@thangnc2707 I'm running zsh on MacOS as well but don't have a problem you described.
Since you are getting a timeout from the test maybe your system was busy when the tests were executed? For instance I found out recently that accountsd process can use lots of CPU (I've seen 700%) and when this happens I get problems with multiple projects whose tests are time-based.
thangnc2707
@thangnc2707
@vyazelenko thank you. Maybe my system was busy. Now I run pass script with zsh.
thangnc2707
@thangnc2707
I confused when run /usr/bin/bash so test pass.
feyyaz91
@feyyaz91
Hi, i have a aeron setup where server A instantiates a recorded publication and sends packets to server B. At the same time server C instantiates a replication and receives the replicated recording from A. When i stop/start C, i would like it to ReplayMerge by replaying the replicated recording from C then merge in the publication from A. Is this possible or is my setup incorrect? I am currently seeing an error of “session-id:xxx must reference a network publication” when the ReplayMerge is started. Let me know if anything is unclear.
cheers, feyyaz
*correction to the above. Trying to stop/start Server B
tedvash
@tedvash
Hi, is there any interplay between archive replays and using an ExclusivePublication? If I test a replay of a concurrent publication I can verify all data, but if I switch to exclusive the replay never starts. All from one process. Does the replay re-use my Publication instance by any chance?
tedvash
@tedvash
The exception message says "The requested replay start position 0 must be less than highest recorded position 0". But this does NOT happen with ConcurrentPublication.
Martin Thompson
@mjpt777
@tedvash Can you provide a test case as a PR or Gist?
@feyyaz91 Can you provide a test case as a PR or Gist?
feyyaz91
@feyyaz91

@mjpt777 Apologies, a bit difficult to provide the test case but can provide a few more details: Publication is on channel aeron:udp?tags=1,100001|control=hostnameA:40456|control-mode=dynamic. Subscription on host B is specified as aeron:udp?endpoint=hostnameB:49101|control=hostnameA:40456
Replication is started on hostC using same using liveDestination same as subscription defined above and connecting to the source archive on hostA using srcControlChannel=aeron:udp?endpoint=hostnameA:40461.

The replication is started successfully and i can see a replicate recording on hostC.
When i start hostB, the first thing it does is try to replay the recording from hostC and then merge into the live stream from hostA. The replaySubscriberChannel is defined as aeron:udp?endpoint=hostnameB:40491|session-id=tag:100001 and the liveDestination same as subscription above for the ReplayMerge. However when the replaymerge starts it throws “must reference a network publication” when trying to look up the network publication by sessionid tag 100001.

When i moved the replicate from hostC to hostA, i was able to replayMerge perfectly from hostB.

It looks as though the media driver on hostC is unable to locate the publication
Jussi Virtanen
@jvirtanen

The media driver contains two hard-coded constants regarding network activity:

  • PENDING_SETUPS_TIMEOUT_NS (1 s): dictates how often a receiver sends out setup-eliciting Status Messages to the sender, periodic in Multi-Destination-Cast (MDC)
  • PUBLICATION_SETUP_TIMEOUT_NS (100 ms): dictates how often a sender sends out Setup Frames to receivers, periodic in MDC

Is there any background on how these specific numbers were selected or why they cannot be altered (unlike e.g. Receiver Status Message Timeout)?

Martin Thompson
@mjpt777
@feyyaz91 It seems like you are using quite an advanced setup. Without a test case we cannot help unless you are on a support contract.
@feyyaz91 As a simple bit of feedback. Tag referencing publications and subscriptions only works on the same driver.
Akhil Nagpal
@akhil_n_twitter
Hello - I am new to Aeron. We have a following use-case . Aeron media driver starts as a standalone app as User1 of Group1. Media driver client User 2 of Group 1 connects to this standalone app. The Aeron directory has user/group - read/write/execute permissions. We are getting few issues while accessing Aeron directories files. Caused by: java.nio.file.AccessDeniedException: at $AERON-DIR/cnc.dat (aeron = Aeron.connect(context)) . However if i give others as r/w access , I do not see these issues
Martin Thompson
@mjpt777
@jvirtanen This settings are based on approximate round trip times and rate limiting. We have the difficult balance to strike between making everything configurable setting some reasonable defaults that cover typical usage to keep the surface API manageable. By working with are customers on support contracts we ensure their needs are meet. Others without support have to hope they within a reasonable window. Competitive products to Aeron cost a lot more than what we change in support so it is great value.
Jussi Virtanen
@jvirtanen
@mjpt777 Alright, I understand. Thank you for the answer; we'll have a look at the support options.
Martin Thompson
@mjpt777
@akhil_n_twitter You need to ensure you are setting the correct umask to enable group permissions. Aeron clients need read and write access to the files.
Akhil Nagpal
@akhil_n_twitter
@mjpt777 - thank you very much.
rinkigoyal
@rinkigoyal
I am trying to use aeron archive functionality for reproducing a state for debugging purpose.If your subscriber is subscribing to two streams -X & Y, then process few -10 messages from X && then same from Y (if available), and publish new messages on new stream Z.
My requirement is to be able to reproduce a state of stream Z by replaying exact sequence of events from streamx & y.
i.e. for an instant t, state has been derived by consuming 200 messages from X && 100 messages from Y,
we can not reproduce exact state because exact sequence of message is not available if using default aeron archive solution.
What is the best way to handle these kind of scenarios?
Ethan
@ethanf
@rinkigoyal I'm not an authority here, but I can point you in the right direction. Messages are sequenced per channel/streamid/sessionid. So if X/Y are on different streams, then there is no sequencing between them. The only sequencing happens in your application as you poll the two subscriptions. I suggest re-publishing the messages from X, Y into a new stream XY, and have your application subscribe to that stream and have the archive record it so you can replay the exact sequence of messages. You could also have your application re-publish the messages in the order it sees as more of an audit that gets recorded.
Todd L. Montgomery
@tmontgomery
Should also note that a sequenced stream like that is what Cluster does. The leader acts as the re-publisher for the clients. And all members see the same sequence.
tedvash
@tedvash
Hi, in terms of features, is there something in Aeron resembling the lbm late joiner feature?
Michael Barker
@mikeb01
@tedvash The closest would be to use Aeron Archive and MergeReplay. I would have a look at those to see if they meet your needs. https://github.com/real-logic/aeron/wiki/Aeron-Archive
Ivan Zemlyanskiy
@QIvan

hi! (please don't treat me as an employer of my company only, this question is more about my own curiosity)
if I've got an archive and I need a connection to it, I can come up with code like this one:

        try (MediaDriver md = MediaDriver.launch();
             Archive archive = Archive.launch();
             AeronArchive aeronArchive =
                     AeronArchive.connect(new AeronArchive.Context().aeron(archive.context().aeron())))
        {

        }

but some times it throws this exception

    io.aeron.exceptions.TimeoutException: ERROR - Archive connect timeout: correlationId=12 step=4
        at io.aeron.archive.client.AeronArchive$AsyncConnect.checkDeadline(AeronArchive.java:3056)
        at io.aeron.archive.client.AeronArchive$AsyncConnect.poll(AeronArchive.java:2942)
        at io.aeron.archive.client.AeronArchive.connect(AeronArchive.java:214)

looks like that code smells, but I can't see why, could you help me to figure it out?
thanks!

Martin Thompson
@mjpt777
@QIvan That code looks fine. The machine might be starved or resource so does not complete in time.
Michael Barker
@mikeb01
@QIvan The problem seems to be due to reusing the Aeron client that was constructed internally by the archive as a client for for the AeronArchive. If you create a new Aeron client or use the client field already defined in the test then it works without error.
Martin Thompson
@mjpt777
@mikeb01 good spot. The Aeron client used in the Archive is single threaded with no lock and using the invoker.
Michael Barker
@mikeb01
Best not to reuse the Archive's Aeron client if the Archive 'owns' the client instance.
Locally for me it gets stuck trying to add the subscription because nothing is invoking the client's conductor to service the admin requests.
Ivan Zemlyanskiy
@QIvan
@mikeb01 thanks for the explanation!
but I'm still not quite sure how it works. I run Archive in DEDICATED mode, there should be a thread in the Archive (ArchiveConductor) which owns an Aeron instance and keeps calling invoke() method, right?
why if I create a subscription with that Aeron instance why it ends up badly? thanks

Best not to reuse the Archive's Aeron client if the Archive 'owns' the client instance.

worth adding to the Archive javadoc, what do you think?

Michael Barker
@mikeb01
If you don't supply an Aeron client to the original Archive, it constructs one internally. The internally constructed client runs in invoker mode, so when connecting the AeronArchive client call addSubscription which blocks.
If you want to share an Aeron client between the Archive and AeronArchive I would construct it externally so that you know which threading mode it is using and pass it into both.
Yes, we should document that.
I'm also looking at whether there is some up front checking we can do, e.g. in AeronArchive check to see if the supplied client is running in invoker mode and throw immediately.
Ivan Zemlyanskiy
@QIvan

so when connecting the AeronArchive client call addSubscription which blocks.

sorry, maybe I'm dumb today. But ok, we block the current thread where we call addSubscription, but there is another one, the Archive conductor, which calls invoke for the Aeron instance

am I missing something?
Ivan Zemlyanskiy
@QIvan
Michael Barker
@mikeb01
supplied.
It's that the internally constructed Aeron client has it's lock mode set to NoOp, so it can only be safely access by one thread. So sharing it will result in undefined behaviour. https://github.com/real-logic/aeron/blob/master/aeron-archive/src/main/java/io/aeron/archive/Archive.java#L848