Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Repo info
Activity
  • 11:24
    wojciech-adaptive updated the wiki
  • Nov 29 13:08
    mjpt777 commented #1093
  • Nov 29 12:43
    tipame commented #1093
  • Nov 29 10:12
    mjpt777 commented #1093
  • Nov 29 10:12
    mjpt777 closed #1093
  • Nov 29 10:12
    mjpt777 labeled #1093
  • Nov 29 08:05
    tipame opened #1093
  • Nov 28 14:55

    mjpt777 on master

    [Java] Upgrade to Mockito 3.6.2… (compare)

  • Nov 28 14:35

    mjpt777 on master

    [Java] Upgrade to Mockito 3.6.2… (compare)

  • Nov 28 12:49

    mjpt777 on master

    [Java] LogPublisher should clea… (compare)

  • Nov 27 18:56

    mjpt777 on master

    [Java] Disconnect LogAdapter wh… (compare)

  • Nov 26 17:03

    vyazelenko on master

    [Java] Fix a race in re-schedul… (compare)

  • Nov 26 16:54

    mjpt777 on master

    [C] Warnings and type fixes. (compare)

  • Nov 26 16:52

    mjpt777 on master

    [C] Warnings and type fixes. (compare)

  • Nov 26 16:38

    mjpt777 on master

    [C] Warnings and type fixes. (compare)

  • Nov 26 15:59

    mjpt777 on master

    [C] Warnings and type fixes. (compare)

  • Nov 26 15:40

    mjpt777 on master

    [C] Warnigns and type fixes. (compare)

  • Nov 26 15:11

    mjpt777 on master

    [C/Java] Align event logging be… (compare)

  • Nov 26 15:11
    mjpt777 closed #1091
  • Nov 26 15:11
    mjpt777 closed #1072
Ivan Zemlyanskiy
@QIvan

hi! I was exploring your benchmarks and I wonder what do you think about running benchmarks with docker containers? I found it very useful and very convenient for everybody else to reproduce the results on their machines, but I guess you might have concerns about affecting the results.
My experience tells it barely noticeable:

thanks

Martin Thompson
@mjpt777
@QIvan Are you using loopback in your benchmarks? Try around machines and see if there is a difference.
Ivan Zemlyanskiy
@QIvan
do you mean network=host or just localhost?
with network=host should be no difference from network point of view, the program should use core directly for sending frames
Ivan Zemlyanskiy
@QIvan
at least it how I understand it now
Martin Thompson
@mjpt777
@QIvan I mean what endpoint are using using in the Aeron channel?
If source and destination are on the same machine then loopback will be used on Linux regardless.
Ivan Zemlyanskiy
@QIvan
yes, it's localhost by default
Martin Thompson
@mjpt777
Not a real test of Docker then.
You need to run on different machines.
Ivan Zemlyanskiy
@QIvan
got it, I'll try to find ones
thanks for the pointer
Martin Thompson
@mjpt777
You're welcome ;-)
feyyaz91
@feyyaz91
Hi, i have a recorded publication on host A. I would like to replicate this to a destination archive on host B. When attempting to call replicate() on host B to begin the replication, how can i obtain the srcRecordingId from from host A that i require? Its not clear to me how i would get this. Any pointers appreciated. Cheers, Feyyaz
Todd L. Montgomery
@tmontgomery
feyyaz91
@feyyaz91
Thank you, will take a look
VadimMolodyh
@VadimMolodyh
Hi,
I'm looking for a low-latency messaging system supporting strict ordering of messages with mutliple writters and readers for the same queue (i.e. with resolved distributed consensus issue). I see that Kafka provides it but has quite high latency (~15ms for p95 end-to-end latency). It looks like Aeron Cluster provides this features as well, but I do not see actual end-to-end benchmarks.
Plus, I could not find C++ API, is it in plans?
Thanks, Vadim.
Martin Thompson
@mjpt777
@VadimMolodyh Aeron Cluster can beat Kafka by a few orders of magnitude on latency. At present there are Java and C# APIs for Cluster. We would be interested in doing a C++ API if someone was willing to sponsor the development.
VadimMolodyh
@VadimMolodyh
Got it. Thanks!
thangnc2707
@thangnc2707

@mjpt777 I use .zsh (on Macbook pro) running ./gradlew. I run test case io.aeron.agent.DriverLoggingAgentTest, it failed. Detail log

java.util.concurrent.TimeoutException: logAll() timed out after 10 seconds
        at org.junit.jupiter.engine.extension.TimeoutInvocation.createTimeoutException(TimeoutInvocation.java:70)
        at org.junit.jupiter.engine.extension.TimeoutInvocation.proceed(TimeoutInvocation.java:59)
        at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
        at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
        at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
        at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
        at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
        at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
        at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
        at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
        at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
        at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
        at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
        at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206)
        at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131)
        at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
        at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)

when I change cmd from zsh to bash. Test case pass.

Michael Barker
@mikeb01
I suspect this is more of an issue with the gradlew script than Aeron. If you change the first line of the script to #!/usr/bin/env bash does it work?
thangnc2707
@thangnc2707
yes, I run the first line "chsh -s /bin/bash".
Dmitry Vyazelenko
@vyazelenko
@thangnc2707 I'm running zsh on MacOS as well but don't have a problem you described.
Since you are getting a timeout from the test maybe your system was busy when the tests were executed? For instance I found out recently that accountsd process can use lots of CPU (I've seen 700%) and when this happens I get problems with multiple projects whose tests are time-based.
thangnc2707
@thangnc2707
@vyazelenko thank you. Maybe my system was busy. Now I run pass script with zsh.
thangnc2707
@thangnc2707
I confused when run /usr/bin/bash so test pass.
feyyaz91
@feyyaz91
Hi, i have a aeron setup where server A instantiates a recorded publication and sends packets to server B. At the same time server C instantiates a replication and receives the replicated recording from A. When i stop/start C, i would like it to ReplayMerge by replaying the replicated recording from C then merge in the publication from A. Is this possible or is my setup incorrect? I am currently seeing an error of “session-id:xxx must reference a network publication” when the ReplayMerge is started. Let me know if anything is unclear.
cheers, feyyaz
*correction to the above. Trying to stop/start Server B
tedvash
@tedvash
Hi, is there any interplay between archive replays and using an ExclusivePublication? If I test a replay of a concurrent publication I can verify all data, but if I switch to exclusive the replay never starts. All from one process. Does the replay re-use my Publication instance by any chance?
tedvash
@tedvash
The exception message says "The requested replay start position 0 must be less than highest recorded position 0". But this does NOT happen with ConcurrentPublication.
Martin Thompson
@mjpt777
@tedvash Can you provide a test case as a PR or Gist?
@feyyaz91 Can you provide a test case as a PR or Gist?
feyyaz91
@feyyaz91

@mjpt777 Apologies, a bit difficult to provide the test case but can provide a few more details: Publication is on channel aeron:udp?tags=1,100001|control=hostnameA:40456|control-mode=dynamic. Subscription on host B is specified as aeron:udp?endpoint=hostnameB:49101|control=hostnameA:40456
Replication is started on hostC using same using liveDestination same as subscription defined above and connecting to the source archive on hostA using srcControlChannel=aeron:udp?endpoint=hostnameA:40461.

The replication is started successfully and i can see a replicate recording on hostC.
When i start hostB, the first thing it does is try to replay the recording from hostC and then merge into the live stream from hostA. The replaySubscriberChannel is defined as aeron:udp?endpoint=hostnameB:40491|session-id=tag:100001 and the liveDestination same as subscription above for the ReplayMerge. However when the replaymerge starts it throws “must reference a network publication” when trying to look up the network publication by sessionid tag 100001.

When i moved the replicate from hostC to hostA, i was able to replayMerge perfectly from hostB.

It looks as though the media driver on hostC is unable to locate the publication
Jussi Virtanen
@jvirtanen

The media driver contains two hard-coded constants regarding network activity:

  • PENDING_SETUPS_TIMEOUT_NS (1 s): dictates how often a receiver sends out setup-eliciting Status Messages to the sender, periodic in Multi-Destination-Cast (MDC)
  • PUBLICATION_SETUP_TIMEOUT_NS (100 ms): dictates how often a sender sends out Setup Frames to receivers, periodic in MDC

Is there any background on how these specific numbers were selected or why they cannot be altered (unlike e.g. Receiver Status Message Timeout)?

Martin Thompson
@mjpt777
@feyyaz91 It seems like you are using quite an advanced setup. Without a test case we cannot help unless you are on a support contract.
@feyyaz91 As a simple bit of feedback. Tag referencing publications and subscriptions only works on the same driver.
Akhil Nagpal
@akhil_n_twitter
Hello - I am new to Aeron. We have a following use-case . Aeron media driver starts as a standalone app as User1 of Group1. Media driver client User 2 of Group 1 connects to this standalone app. The Aeron directory has user/group - read/write/execute permissions. We are getting few issues while accessing Aeron directories files. Caused by: java.nio.file.AccessDeniedException: at $AERON-DIR/cnc.dat (aeron = Aeron.connect(context)) . However if i give others as r/w access , I do not see these issues
Martin Thompson
@mjpt777
@jvirtanen This settings are based on approximate round trip times and rate limiting. We have the difficult balance to strike between making everything configurable setting some reasonable defaults that cover typical usage to keep the surface API manageable. By working with are customers on support contracts we ensure their needs are meet. Others without support have to hope they within a reasonable window. Competitive products to Aeron cost a lot more than what we change in support so it is great value.
Jussi Virtanen
@jvirtanen
@mjpt777 Alright, I understand. Thank you for the answer; we'll have a look at the support options.
Martin Thompson
@mjpt777
@akhil_n_twitter You need to ensure you are setting the correct umask to enable group permissions. Aeron clients need read and write access to the files.
Akhil Nagpal
@akhil_n_twitter
@mjpt777 - thank you very much.
rinkigoyal
@rinkigoyal
I am trying to use aeron archive functionality for reproducing a state for debugging purpose.If your subscriber is subscribing to two streams -X & Y, then process few -10 messages from X && then same from Y (if available), and publish new messages on new stream Z.
My requirement is to be able to reproduce a state of stream Z by replaying exact sequence of events from streamx & y.
i.e. for an instant t, state has been derived by consuming 200 messages from X && 100 messages from Y,
we can not reproduce exact state because exact sequence of message is not available if using default aeron archive solution.
What is the best way to handle these kind of scenarios?
Ethan
@ethanf
@rinkigoyal I'm not an authority here, but I can point you in the right direction. Messages are sequenced per channel/streamid/sessionid. So if X/Y are on different streams, then there is no sequencing between them. The only sequencing happens in your application as you poll the two subscriptions. I suggest re-publishing the messages from X, Y into a new stream XY, and have your application subscribe to that stream and have the archive record it so you can replay the exact sequence of messages. You could also have your application re-publish the messages in the order it sees as more of an audit that gets recorded.
Todd L. Montgomery
@tmontgomery
Should also note that a sequenced stream like that is what Cluster does. The leader acts as the re-publisher for the clients. And all members see the same sequence.
tedvash
@tedvash
Hi, in terms of features, is there something in Aeron resembling the lbm late joiner feature?
Michael Barker
@mikeb01
@tedvash The closest would be to use Aeron Archive and MergeReplay. I would have a look at those to see if they meet your needs. https://github.com/real-logic/aeron/wiki/Aeron-Archive
Ivan Zemlyanskiy
@QIvan

hi! (please don't treat me as an employer of my company only, this question is more about my own curiosity)
if I've got an archive and I need a connection to it, I can come up with code like this one:

        try (MediaDriver md = MediaDriver.launch();
             Archive archive = Archive.launch();
             AeronArchive aeronArchive =
                     AeronArchive.connect(new AeronArchive.Context().aeron(archive.context().aeron())))
        {

        }

but some times it throws this exception

    io.aeron.exceptions.TimeoutException: ERROR - Archive connect timeout: correlationId=12 step=4
        at io.aeron.archive.client.AeronArchive$AsyncConnect.checkDeadline(AeronArchive.java:3056)
        at io.aeron.archive.client.AeronArchive$AsyncConnect.poll(AeronArchive.java:2942)
        at io.aeron.archive.client.AeronArchive.connect(AeronArchive.java:214)

looks like that code smells, but I can't see why, could you help me to figure it out?
thanks!

Martin Thompson
@mjpt777
@QIvan That code looks fine. The machine might be starved or resource so does not complete in time.
Michael Barker
@mikeb01
@QIvan The problem seems to be due to reusing the Aeron client that was constructed internally by the archive as a client for for the AeronArchive. If you create a new Aeron client or use the client field already defined in the test then it works without error.