Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Repo info
Activity
  • Jan 15 20:06

    mjpt777 on master

    [Java] Upgrade to SBE 1.21.0 (compare)

  • Jan 14 18:00
    tmontgomery commented #1124
  • Jan 14 17:27
    mjpt777 commented #1124
  • Jan 14 17:26
    mjpt777 commented #1124
  • Jan 14 17:21
    vyazelenko commented #1124
  • Jan 14 16:30
    tmontgomery commented #1124
  • Jan 14 16:19
    danny-pav commented #1121
  • Jan 14 15:25

    vyazelenko on master

    [Doc] Update the list of Java v… (compare)

  • Jan 14 11:42

    mjpt777 on master

    Warnings clean up. (compare)

  • Jan 14 11:10
    vyazelenko opened #1124
  • Jan 14 11:03

    mjpt777 on master

    [Java] Warnings and typos clean… (compare)

  • Jan 14 10:40
    sshanks-kx opened #1123
  • Jan 14 10:23

    mjpt777 on master

    [Java] Fix warnings. (compare)

  • Jan 14 10:10

    mjpt777 on master

    [C++] Fix formatting and warnin… (compare)

  • Jan 14 09:52

    mjpt777 on master

    [C] Implementation for #1115, a… (compare)

  • Jan 14 09:52
    mjpt777 closed #1118
  • Jan 14 07:58

    mjpt777 on master

    [Java] Increase client session … (compare)

  • Jan 14 07:56

    mjpt777 on master

    [Java] Stop sending second fram… (compare)

  • Jan 14 07:56
    mjpt777 closed #1122
  • Jan 14 06:35
    mikeb01 commented #1121
feyyaz91
@feyyaz91

@mjpt777 Apologies, a bit difficult to provide the test case but can provide a few more details: Publication is on channel aeron:udp?tags=1,100001|control=hostnameA:40456|control-mode=dynamic. Subscription on host B is specified as aeron:udp?endpoint=hostnameB:49101|control=hostnameA:40456
Replication is started on hostC using same using liveDestination same as subscription defined above and connecting to the source archive on hostA using srcControlChannel=aeron:udp?endpoint=hostnameA:40461.

The replication is started successfully and i can see a replicate recording on hostC.
When i start hostB, the first thing it does is try to replay the recording from hostC and then merge into the live stream from hostA. The replaySubscriberChannel is defined as aeron:udp?endpoint=hostnameB:40491|session-id=tag:100001 and the liveDestination same as subscription above for the ReplayMerge. However when the replaymerge starts it throws “must reference a network publication” when trying to look up the network publication by sessionid tag 100001.

When i moved the replicate from hostC to hostA, i was able to replayMerge perfectly from hostB.

It looks as though the media driver on hostC is unable to locate the publication
Jussi Virtanen
@jvirtanen

The media driver contains two hard-coded constants regarding network activity:

  • PENDING_SETUPS_TIMEOUT_NS (1 s): dictates how often a receiver sends out setup-eliciting Status Messages to the sender, periodic in Multi-Destination-Cast (MDC)
  • PUBLICATION_SETUP_TIMEOUT_NS (100 ms): dictates how often a sender sends out Setup Frames to receivers, periodic in MDC

Is there any background on how these specific numbers were selected or why they cannot be altered (unlike e.g. Receiver Status Message Timeout)?

Martin Thompson
@mjpt777
@feyyaz91 It seems like you are using quite an advanced setup. Without a test case we cannot help unless you are on a support contract.
@feyyaz91 As a simple bit of feedback. Tag referencing publications and subscriptions only works on the same driver.
Akhil Nagpal
@akhil_n_twitter
Hello - I am new to Aeron. We have a following use-case . Aeron media driver starts as a standalone app as User1 of Group1. Media driver client User 2 of Group 1 connects to this standalone app. The Aeron directory has user/group - read/write/execute permissions. We are getting few issues while accessing Aeron directories files. Caused by: java.nio.file.AccessDeniedException: at $AERON-DIR/cnc.dat (aeron = Aeron.connect(context)) . However if i give others as r/w access , I do not see these issues
Martin Thompson
@mjpt777
@jvirtanen This settings are based on approximate round trip times and rate limiting. We have the difficult balance to strike between making everything configurable setting some reasonable defaults that cover typical usage to keep the surface API manageable. By working with are customers on support contracts we ensure their needs are meet. Others without support have to hope they within a reasonable window. Competitive products to Aeron cost a lot more than what we change in support so it is great value.
Jussi Virtanen
@jvirtanen
@mjpt777 Alright, I understand. Thank you for the answer; we'll have a look at the support options.
Martin Thompson
@mjpt777
@akhil_n_twitter You need to ensure you are setting the correct umask to enable group permissions. Aeron clients need read and write access to the files.
Akhil Nagpal
@akhil_n_twitter
@mjpt777 - thank you very much.
rinkigoyal
@rinkigoyal
I am trying to use aeron archive functionality for reproducing a state for debugging purpose.If your subscriber is subscribing to two streams -X & Y, then process few -10 messages from X && then same from Y (if available), and publish new messages on new stream Z.
My requirement is to be able to reproduce a state of stream Z by replaying exact sequence of events from streamx & y.
i.e. for an instant t, state has been derived by consuming 200 messages from X && 100 messages from Y,
we can not reproduce exact state because exact sequence of message is not available if using default aeron archive solution.
What is the best way to handle these kind of scenarios?
Ethan
@ethanf
@rinkigoyal I'm not an authority here, but I can point you in the right direction. Messages are sequenced per channel/streamid/sessionid. So if X/Y are on different streams, then there is no sequencing between them. The only sequencing happens in your application as you poll the two subscriptions. I suggest re-publishing the messages from X, Y into a new stream XY, and have your application subscribe to that stream and have the archive record it so you can replay the exact sequence of messages. You could also have your application re-publish the messages in the order it sees as more of an audit that gets recorded.
Todd L. Montgomery
@tmontgomery
Should also note that a sequenced stream like that is what Cluster does. The leader acts as the re-publisher for the clients. And all members see the same sequence.
tedvash
@tedvash
Hi, in terms of features, is there something in Aeron resembling the lbm late joiner feature?
Michael Barker
@mikeb01
@tedvash The closest would be to use Aeron Archive and MergeReplay. I would have a look at those to see if they meet your needs. https://github.com/real-logic/aeron/wiki/Aeron-Archive
Ivan Zemlyanskiy
@QIvan

hi! (please don't treat me as an employer of my company only, this question is more about my own curiosity)
if I've got an archive and I need a connection to it, I can come up with code like this one:

        try (MediaDriver md = MediaDriver.launch();
             Archive archive = Archive.launch();
             AeronArchive aeronArchive =
                     AeronArchive.connect(new AeronArchive.Context().aeron(archive.context().aeron())))
        {

        }

but some times it throws this exception

    io.aeron.exceptions.TimeoutException: ERROR - Archive connect timeout: correlationId=12 step=4
        at io.aeron.archive.client.AeronArchive$AsyncConnect.checkDeadline(AeronArchive.java:3056)
        at io.aeron.archive.client.AeronArchive$AsyncConnect.poll(AeronArchive.java:2942)
        at io.aeron.archive.client.AeronArchive.connect(AeronArchive.java:214)

looks like that code smells, but I can't see why, could you help me to figure it out?
thanks!

Martin Thompson
@mjpt777
@QIvan That code looks fine. The machine might be starved or resource so does not complete in time.
Michael Barker
@mikeb01
@QIvan The problem seems to be due to reusing the Aeron client that was constructed internally by the archive as a client for for the AeronArchive. If you create a new Aeron client or use the client field already defined in the test then it works without error.
Martin Thompson
@mjpt777
@mikeb01 good spot. The Aeron client used in the Archive is single threaded with no lock and using the invoker.
Michael Barker
@mikeb01
Best not to reuse the Archive's Aeron client if the Archive 'owns' the client instance.
Locally for me it gets stuck trying to add the subscription because nothing is invoking the client's conductor to service the admin requests.
Ivan Zemlyanskiy
@QIvan
@mikeb01 thanks for the explanation!
but I'm still not quite sure how it works. I run Archive in DEDICATED mode, there should be a thread in the Archive (ArchiveConductor) which owns an Aeron instance and keeps calling invoke() method, right?
why if I create a subscription with that Aeron instance why it ends up badly? thanks

Best not to reuse the Archive's Aeron client if the Archive 'owns' the client instance.

worth adding to the Archive javadoc, what do you think?

Michael Barker
@mikeb01
If you don't supply an Aeron client to the original Archive, it constructs one internally. The internally constructed client runs in invoker mode, so when connecting the AeronArchive client call addSubscription which blocks.
If you want to share an Aeron client between the Archive and AeronArchive I would construct it externally so that you know which threading mode it is using and pass it into both.
Yes, we should document that.
I'm also looking at whether there is some up front checking we can do, e.g. in AeronArchive check to see if the supplied client is running in invoker mode and throw immediately.
Ivan Zemlyanskiy
@QIvan

so when connecting the AeronArchive client call addSubscription which blocks.

sorry, maybe I'm dumb today. But ok, we block the current thread where we call addSubscription, but there is another one, the Archive conductor, which calls invoke for the Aeron instance

am I missing something?
Ivan Zemlyanskiy
@QIvan
Michael Barker
@mikeb01
supplied.
It's that the internally constructed Aeron client has it's lock mode set to NoOp, so it can only be safely access by one thread. So sharing it will result in undefined behaviour. https://github.com/real-logic/aeron/blob/master/aeron-archive/src/main/java/io/aeron/archive/Archive.java#L848
So best practise, if you want to share the Aeron client amongst multiple services, construct it up front with the appropriate configuration and pass it to the various services via their context.
Ivan Zemlyanskiy
@QIvan
ah, I see, ok, so it's more about concurrent access to the Aeron instance
thank you very much!
Michael Barker
@mikeb01
No problem.
Ivan Zemlyanskiy
@QIvan

oh, yes, and the last thing I'd like to mention:
it's confusing when Aeron says

io.aeron.exceptions.DriverTimeoutException: FATAL - no response from MediaDriver within (ns): 10000000000

in a case when the MediaDriver kicks off our Aeron connection. Can we make a difference in such cases?
thanks

Michael Barker
@mikeb01
I would have to look into the specific cases to find out.
Ivan Zemlyanskiy
@QIvan
like in my last PR https://github.com/real-logic/aeron/pull/1085/checks?check_run_id=1350771446
my understanding we didn't update our timestamp because of the concurrent issue, correct?
Michael Barker
@mikeb01
Those cases are difficult as it is a concurrency problem you can be sure what the failure will be. I got a different failure with your test. I have some ideas about some up front checks that could be done to prevent issues, but will need to find some time to look into them.
tedvash
@tedvash
Any plans to support linux pipes?
Todd L. Montgomery
@tmontgomery
@tedvash not really. But, if someone wanted to do the equivalent of "nc" for Aeron, that would be cool.
Ivan Zemlyanskiy
@QIvan
@tmontgomery nc for Aeron would be very awesome! But how do you see it? Like a regular subscription? Or do you envision something more intelligent, like read from the logbuffer directly?
Todd L. Montgomery
@tmontgomery
reading from the log buffer directly would not be that useful
Chris Anderson
@injinj
A note on linking libaeron_static.a: gcc -o basic_sub basic_sub.o sample_util.o -rdynamic -laeron_static -ld
l -lm -pthread -lrt
Without the -rdynamic, the dlsym() function cannot find aeron_idle_strategy_sleeping
I can't find a way to export that symbol with attribute visibility on the definition
David Raymond
@pieceofchum

Hello I am trying to get the Aeron Cluster Basic Auction Cluster Application to run on 4 AWS EC2 instances.

Three of the instances are node0, node1, and node2 running the cluster members and one EC2 instance is running the Aeron Basic Auction Cluster Client. The cluster nodes (node0, node1, node2) come up and the RAFT seems to work if I kill the leader then there is an election and a new leader is chosen.

However the client just throws the following exception: connect timeout, step=3 egress.isConnected=false

io.aeron.exceptions.TimeoutException: connect timeout, step=3 egress.isConnected=false
at io.aeron.cluster.client.AeronCluster$AsyncConnect.checkDeadline(AeronCluster.java:1542)
at io.aeron.cluster.client.AeronCluster$AsyncConnect.poll(AeronCluster.java:1498)
at io.aeron.cluster.client.AeronCluster.connect(AeronCluster.java:119)
at org.aeron.simple.BasicAuctionClusterClient.main(BasicAuctionClusterClient.java:205)

I did replace the Consensus Module Context Cluster Members with the IP addresses of the cluster members in the Basic Auction Clustered Service Node code. For the Basic Auction Cluster Client I pass in the IP addresses of the cluster members to the ingressEndPoints whcih creates the ingressEndPoints to pass in to the Aeron Cluster. The UDP ports are open in the Security Group for the EC2 instances.

I am not sure what else I need to do or if there are known issues trying the get that sample working in AWS. Has anyone been able to get that sample working in AWS that could maybe help me.

Thanks

Martin Thompson
@mjpt777
@pieceofchum We only help those using Cluster on a support contract.