Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Repo info
Activity
  • 18:01
    StephanDollberg synchronize #1127
  • 17:13
    StephanDollberg commented #1127
  • 16:35
    StephanDollberg synchronize #1127
  • 15:01
    StephanDollberg synchronize #1127
  • 14:44
    StephanDollberg opened #1127
  • Jan 23 15:44

    mjpt777 on master

    [Java] Increase cluster termina… (compare)

  • Jan 22 18:48
    mjpt777 closed #1126
  • Jan 22 18:48
    mjpt777 labeled #1126
  • Jan 22 16:23
    tmontgomery commented #1126
  • Jan 22 13:23
    ratanasov1 opened #1126
  • Jan 20 19:02

    mjpt777 on master

    Tidy up warnings and formatting. (compare)

  • Jan 20 16:07

    mjpt777 on master

    [C++] Better example of ReplayM… (compare)

  • Jan 20 14:39

    mjpt777 on master

    [C++] Extra debugging info for … (compare)

  • Jan 20 14:28

    mjpt777 on master

    [C++] Remove the need to split … (compare)

  • Jan 19 20:17

    mjpt777 on master

    [C++] Tidy up archive tests. (compare)

  • Jan 19 19:41

    mjpt777 on master

    [C++] Tidy up archive tests. (compare)

  • Jan 19 19:05

    mjpt777 on master

    [C++] Fix archive tests to run … (compare)

  • Jan 19 16:01

    mjpt777 on master

    [Java] Improve ArchiveTest in s… (compare)

  • Jan 19 14:16

    mjpt777 on master

    [Java] Avoid boxing of boolean … (compare)

  • Jan 19 14:07

    mjpt777 on master

    [Java] Naming. (compare)

Jussi Virtanen
@jvirtanen
@mjpt777 Alright, I understand. Thank you for the answer; we'll have a look at the support options.
Martin Thompson
@mjpt777
@akhil_n_twitter You need to ensure you are setting the correct umask to enable group permissions. Aeron clients need read and write access to the files.
Akhil Nagpal
@akhil_n_twitter
@mjpt777 - thank you very much.
rinkigoyal
@rinkigoyal
I am trying to use aeron archive functionality for reproducing a state for debugging purpose.If your subscriber is subscribing to two streams -X & Y, then process few -10 messages from X && then same from Y (if available), and publish new messages on new stream Z.
My requirement is to be able to reproduce a state of stream Z by replaying exact sequence of events from streamx & y.
i.e. for an instant t, state has been derived by consuming 200 messages from X && 100 messages from Y,
we can not reproduce exact state because exact sequence of message is not available if using default aeron archive solution.
What is the best way to handle these kind of scenarios?
Ethan
@ethanf
@rinkigoyal I'm not an authority here, but I can point you in the right direction. Messages are sequenced per channel/streamid/sessionid. So if X/Y are on different streams, then there is no sequencing between them. The only sequencing happens in your application as you poll the two subscriptions. I suggest re-publishing the messages from X, Y into a new stream XY, and have your application subscribe to that stream and have the archive record it so you can replay the exact sequence of messages. You could also have your application re-publish the messages in the order it sees as more of an audit that gets recorded.
Todd L. Montgomery
@tmontgomery
Should also note that a sequenced stream like that is what Cluster does. The leader acts as the re-publisher for the clients. And all members see the same sequence.
tedvash
@tedvash
Hi, in terms of features, is there something in Aeron resembling the lbm late joiner feature?
Michael Barker
@mikeb01
@tedvash The closest would be to use Aeron Archive and MergeReplay. I would have a look at those to see if they meet your needs. https://github.com/real-logic/aeron/wiki/Aeron-Archive
Ivan Zemlyanskiy
@QIvan

hi! (please don't treat me as an employer of my company only, this question is more about my own curiosity)
if I've got an archive and I need a connection to it, I can come up with code like this one:

        try (MediaDriver md = MediaDriver.launch();
             Archive archive = Archive.launch();
             AeronArchive aeronArchive =
                     AeronArchive.connect(new AeronArchive.Context().aeron(archive.context().aeron())))
        {

        }

but some times it throws this exception

    io.aeron.exceptions.TimeoutException: ERROR - Archive connect timeout: correlationId=12 step=4
        at io.aeron.archive.client.AeronArchive$AsyncConnect.checkDeadline(AeronArchive.java:3056)
        at io.aeron.archive.client.AeronArchive$AsyncConnect.poll(AeronArchive.java:2942)
        at io.aeron.archive.client.AeronArchive.connect(AeronArchive.java:214)

looks like that code smells, but I can't see why, could you help me to figure it out?
thanks!

Martin Thompson
@mjpt777
@QIvan That code looks fine. The machine might be starved or resource so does not complete in time.
Michael Barker
@mikeb01
@QIvan The problem seems to be due to reusing the Aeron client that was constructed internally by the archive as a client for for the AeronArchive. If you create a new Aeron client or use the client field already defined in the test then it works without error.
Martin Thompson
@mjpt777
@mikeb01 good spot. The Aeron client used in the Archive is single threaded with no lock and using the invoker.
Michael Barker
@mikeb01
Best not to reuse the Archive's Aeron client if the Archive 'owns' the client instance.
Locally for me it gets stuck trying to add the subscription because nothing is invoking the client's conductor to service the admin requests.
Ivan Zemlyanskiy
@QIvan
@mikeb01 thanks for the explanation!
but I'm still not quite sure how it works. I run Archive in DEDICATED mode, there should be a thread in the Archive (ArchiveConductor) which owns an Aeron instance and keeps calling invoke() method, right?
why if I create a subscription with that Aeron instance why it ends up badly? thanks

Best not to reuse the Archive's Aeron client if the Archive 'owns' the client instance.

worth adding to the Archive javadoc, what do you think?

Michael Barker
@mikeb01
If you don't supply an Aeron client to the original Archive, it constructs one internally. The internally constructed client runs in invoker mode, so when connecting the AeronArchive client call addSubscription which blocks.
If you want to share an Aeron client between the Archive and AeronArchive I would construct it externally so that you know which threading mode it is using and pass it into both.
Yes, we should document that.
I'm also looking at whether there is some up front checking we can do, e.g. in AeronArchive check to see if the supplied client is running in invoker mode and throw immediately.
Ivan Zemlyanskiy
@QIvan

so when connecting the AeronArchive client call addSubscription which blocks.

sorry, maybe I'm dumb today. But ok, we block the current thread where we call addSubscription, but there is another one, the Archive conductor, which calls invoke for the Aeron instance

am I missing something?
Ivan Zemlyanskiy
@QIvan
Michael Barker
@mikeb01
supplied.
It's that the internally constructed Aeron client has it's lock mode set to NoOp, so it can only be safely access by one thread. So sharing it will result in undefined behaviour. https://github.com/real-logic/aeron/blob/master/aeron-archive/src/main/java/io/aeron/archive/Archive.java#L848
So best practise, if you want to share the Aeron client amongst multiple services, construct it up front with the appropriate configuration and pass it to the various services via their context.
Ivan Zemlyanskiy
@QIvan
ah, I see, ok, so it's more about concurrent access to the Aeron instance
thank you very much!
Michael Barker
@mikeb01
No problem.
Ivan Zemlyanskiy
@QIvan

oh, yes, and the last thing I'd like to mention:
it's confusing when Aeron says

io.aeron.exceptions.DriverTimeoutException: FATAL - no response from MediaDriver within (ns): 10000000000

in a case when the MediaDriver kicks off our Aeron connection. Can we make a difference in such cases?
thanks

Michael Barker
@mikeb01
I would have to look into the specific cases to find out.
Ivan Zemlyanskiy
@QIvan
like in my last PR https://github.com/real-logic/aeron/pull/1085/checks?check_run_id=1350771446
my understanding we didn't update our timestamp because of the concurrent issue, correct?
Michael Barker
@mikeb01
Those cases are difficult as it is a concurrency problem you can be sure what the failure will be. I got a different failure with your test. I have some ideas about some up front checks that could be done to prevent issues, but will need to find some time to look into them.
tedvash
@tedvash
Any plans to support linux pipes?
Todd L. Montgomery
@tmontgomery
@tedvash not really. But, if someone wanted to do the equivalent of "nc" for Aeron, that would be cool.
Ivan Zemlyanskiy
@QIvan
@tmontgomery nc for Aeron would be very awesome! But how do you see it? Like a regular subscription? Or do you envision something more intelligent, like read from the logbuffer directly?
Todd L. Montgomery
@tmontgomery
reading from the log buffer directly would not be that useful
Chris Anderson
@injinj
A note on linking libaeron_static.a: gcc -o basic_sub basic_sub.o sample_util.o -rdynamic -laeron_static -ld
l -lm -pthread -lrt
Without the -rdynamic, the dlsym() function cannot find aeron_idle_strategy_sleeping
I can't find a way to export that symbol with attribute visibility on the definition
David Raymond
@pieceofchum

Hello I am trying to get the Aeron Cluster Basic Auction Cluster Application to run on 4 AWS EC2 instances.

Three of the instances are node0, node1, and node2 running the cluster members and one EC2 instance is running the Aeron Basic Auction Cluster Client. The cluster nodes (node0, node1, node2) come up and the RAFT seems to work if I kill the leader then there is an election and a new leader is chosen.

However the client just throws the following exception: connect timeout, step=3 egress.isConnected=false

io.aeron.exceptions.TimeoutException: connect timeout, step=3 egress.isConnected=false
at io.aeron.cluster.client.AeronCluster$AsyncConnect.checkDeadline(AeronCluster.java:1542)
at io.aeron.cluster.client.AeronCluster$AsyncConnect.poll(AeronCluster.java:1498)
at io.aeron.cluster.client.AeronCluster.connect(AeronCluster.java:119)
at org.aeron.simple.BasicAuctionClusterClient.main(BasicAuctionClusterClient.java:205)

I did replace the Consensus Module Context Cluster Members with the IP addresses of the cluster members in the Basic Auction Clustered Service Node code. For the Basic Auction Cluster Client I pass in the IP addresses of the cluster members to the ingressEndPoints whcih creates the ingressEndPoints to pass in to the Aeron Cluster. The UDP ports are open in the Security Group for the EC2 instances.

I am not sure what else I need to do or if there are known issues trying the get that sample working in AWS. Has anyone been able to get that sample working in AWS that could maybe help me.

Thanks

Martin Thompson
@mjpt777
@pieceofchum We only help those using Cluster on a support contract.
David Raymond
@pieceofchum
Ok np thanks for the quick response.
Ivan Zemlyanskiy
@QIvan
hi! could you provide more details why is it so important to provide own errorCounter if we provide own Aeron instance for Archive?
I think it'd be better with a comment over there
thanks!
Ivan Zemlyanskiy
@QIvan

hi once again... I hope I'm not bothering you too much with my messages
I notice that the approach to provide to RecordingSignalAdapter instance the subscription from aeronArchive.controlResponsePoller().subscription(); is kind of "asking for troubles" if you use that instance of aeronArchive for any requests/response (like listRecordings, stopPosition, etc)
for example here https://github.com/real-logic/aeron/blob/932717568bc3906cdadfa48d4173c0f7adffb50a/aeron-system-tests/src/test/java/io/aeron/archive/ReplicateRecordingTest.java#L569

because aeronArchive polls exactly the same subscription for responses, so if you've got both instances you can lose either a response or a signal

it's hard to reproduce such behavior but it's still possible even you don't use different threads for RecordingSignalAdapter and sending requests for aeronArchive