Nestor Marin
@nemacrux
Sure, it's the best option, thanks. Right now I'm having an issue with the zmq bridge. I have an app with a UI in Java (it's a publisher client) that lets me specify the transport/middleware/topic and start the client. It also lets me stop and restart the client at any moment, even with different values for those parameters. With the zmq bridge, clicking the stop button not only stops the client but also crashes the JVM (and the whole app with it). The crash seems to be caused by a null pointer in the JNI layer, but I'm not sure. With the qpid bridge it works fine: you can start/stop the connection whenever you want (and with any transport). The crash only happens when zmq is the middleware and you stop the client. It happens on both Windows and Ubuntu. This is the environment: Windows 10/java 1.8 x86/OpenMama 2.4.0 x86/zmq bridge x86/ and Ubuntu 14.04/java 1.8/OpenMama 2.4.0/zmq. In both cases I'm using the forwarder (which acts as a broker) provided by ZMQ in the link you posted in your blog, but I don't think that is related, since it also crashes with simple pub/sub. Do you have any idea? (If you need more information, please let me know.)
fquinner
@fquinner
Hmm - this is an interesting one - does the crash happen as soon as you click the "stop" button or does it only crash after you press "start" again? What MAMA functions do the stop / start buttons call?
Nestor Marin
@nemacrux
It happens as soon as you click the stop button. This button calls a stop method which starts a new thread to shut down the client. The code is pretty much the same as the shutdownMama() method in the MamaProxy.java example, except that this one is only a publisher client. The start button calls a start function that is also similar to the start() method in MamaProxy.java. If you want, I can share the code.
fquinner
@fquinner
So it's only the publishing application that crashes when you attempt to stop it?
Nestor Marin
@nemacrux
I've only observed this issue on the publishing application, since it's the only one that can start/stop an openmama connection without closing itself
fquinner
@fquinner
When you do this, though, do the subscribing applications stay up?
Nestor Marin
@nemacrux
Yes, they do, so when the publisher crashes I can start it again and publish new messages (orders), which are delivered without problems to the subscribing applications (those were never stopped)
fquinner
@fquinner
Interesting. And has data flow stopped at this point? Do you know what the last function you call before the crash is? e.g. stop / transport destroy?
Nestor Marin
@nemacrux
I'm not sure. The stop button calls a shutdown() function, which first calls publisher.destroy() (publisher is a MamaPublisher), then Mama.stop(bridge), then waits until Mama.start(bridge) has returned, then calls transport.destroy(), then Mama.Close(), and finally the thread which started the connection is stopped. If I put a log message before each of those calls I can see that all of them are called, but I can't see which one produced the issue. For the first question: yes, the data flow has stopped before I stop the connection, since the data flow is produced manually by clicking a "send order" button (this app is like a simple trader desktop app).
fquinner
@fquinner
Hi Nestor, nothing obvious comes to mind, though most of the shutdown testing is done on the subscribing side rather than the publisher side, and we do have publisher events in play now. Can you raise an issue on the github project with your recreation code and a stack dump from the crash?
Nestor Marin
@nemacrux
Hi Frank, I have created a Java project with the code needed to recreate the issue and pushed it to github. I'm going to raise the issue and provide the link to the Java project. Is that OK?
fquinner
@fquinner
That is perfect, thanks Nestor, I'll have a look
Nestor Marin
@nemacrux
@fquinner, I have just created the issue and pushed the app to github, you can find it here: https://github.com/macrux/publisherapp, thanks for your support
fquinner
@fquinner
Wow, thanks Nestor, that's quite the write-up - I'll see what I can do when I get to a machine
fquinner
@fquinner
OK Nestor I can recreate here - problem certainly does not appear to be obvious... will keep digging
Nestor Marin
@nemacrux
Yes, it is quite strange and deep, since I think the problem occurs in the JNI layer. Again, thanks for your continued support
fquinner
@fquinner
What a nightmare this issue is turning out to be. Turns out I can work around the issue by simply not dlclosing the library in mama.c ... a crash there would usually mean there is a thread or event about to come back which is going to attempt to access a function defined in the shared object, but I can't see where that would come from... or why qpid is immune, because its bridge's threading model is pretty much identical to zmq's
Damian Maguire
@dmagOM
Mmmm, that's pretty interesting.
Is there a core file or anything?
fquinner
@fquinner
Yip, though it didn't really provide much of use. For some reason I'm finding it hard to get gdb to force-load the zmq lib, it keeps ignoring it
Nestor Marin
@nemacrux
Using the default queue rather than a queue from the queue group seems to be a good solution, thanks
fquinner
@fquinner
OK... I think this is an application issue: the non-default queue is being left behind
The default queue is only a workaround, I wouldn't recommend it :)
I can fix in application by adding a stopDispatch call and queue destroyTimedWait before the stop... I have no idea why qpid works - it really shouldn't which is what completely threw me
Nestor Marin
@nemacrux
About using the default queue: for now (and for me) it's OK, just to get out of trouble... are there any additional considerations you think I should take into account?
fquinner
@fquinner
The default queue is used for some mama internal stuff, so saturating it might impact things like timer precision under stress, but realistically it's fine. It's just not a great habit to get used to, plus it typically limits event processing to a single thread (as opposed to queue groups of N threads)
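To make the single-thread versus N-thread point concrete, here is a minimal sketch in plain Java. It does not use OpenMAMA at all: each "queue" is a stand-in modelled as a single-thread executor, events are handed out round-robin, and the class and method names (QueueThreadingSketch, threadsUsed) are made up for illustration. Treat it as a picture of the threading difference, not of how MAMA queues are implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Stand-in model only: plain executors play the role of MAMA queues.
class QueueThreadingSketch {

    // Spreads `events` callbacks round-robin over `queues` single-thread
    // dispatchers and returns how many distinct threads processed them.
    static int threadsUsed(int queues, int events) {
        List<ExecutorService> dispatchers = new ArrayList<>();
        for (int i = 0; i < queues; i++) {
            dispatchers.add(Executors.newSingleThreadExecutor());
        }
        Set<Thread> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(events);
        for (int i = 0; i < events; i++) {
            dispatchers.get(i % queues).submit(() -> {
                seen.add(Thread.currentThread()); // record who ran the callback
                done.countDown();
            });
        }
        boolean finished;
        try {
            finished = done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = false;
        }
        for (ExecutorService d : dispatchers) {
            d.shutdown();
        }
        if (!finished) {
            throw new IllegalStateException("events were not processed in time");
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // One queue (like using only the default queue): one processing thread.
        System.out.println("one queue:   " + threadsUsed(1, 100) + " thread(s)");
        // Four queues (like a queue group of 4): up to four processing threads.
        System.out.println("four queues: " + threadsUsed(4, 100) + " thread(s)");
    }
}
```

With one queue every callback runs on the same thread, so a slow callback delays everything behind it. With several queues, callbacks on different queues can run in parallel, at the cost of ordering guarantees across queues.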
Nestor Marin
@nemacrux
Well, that last part about processing in a single thread is interesting, since I'm building an OMS, so the events are going to be orders and order updates. I would like those to be processed as fast as possible :worried:
fquinner
@fquinner
Ha, well it's one of those things you can change when you need to, it's pretty easy to retrofit
Nestor Marin
@nemacrux
Sure, anyway using the default queue is going to be the option for now, thanks

About this:

I can fix in application by adding a stopDispatch call and queue destroyTimedWait before the stop...

Could I do this in the shutdown method (I mean, before calling the stop method for the publisher), or have I misunderstood something?
fquinner
@fquinner
Yup that's where to do it - I had some rough code that prevented the crash there but I'm afk now - I'll clean up and update ticket with details tomorrow
Nestor Marin
@nemacrux
OK, I'm going to work on this :+1:
fquinner
@fquinner
Have fun :smile:
Nestor Marin
@nemacrux
Hi @fquinner, I added just two lines before starting the shutdown thread: queueGroup.stopDispatch(); queueGroup.destroyTimedWait(1000); and it worked fine with the queue group's queue
fquinner
@fquinner
Yeah, that's the one - glad to see it worked for you too!
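The ordering problem behind this fix can be sketched in plain Java. This is a model under stated assumptions, not OpenMAMA code: FakeBridge and FakeQueue are hypothetical stand-ins, and only the method names stopDispatch and destroyTimedWait are borrowed from the messages above. The "bridge" plays the role of the native library that gets unloaded on close, and a call into it after unload stands in for what would be a crash in native code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in model only: none of these classes are OpenMAMA types.
class ShutdownOrderSketch {

    // Plays the role of the native bridge library that is unloaded on close.
    static final class FakeBridge {
        final AtomicBoolean loaded = new AtomicBoolean(true);
        final AtomicInteger callsAfterUnload = new AtomicInteger();
        final CountDownLatch firstLateCall = new CountDownLatch(1);

        void onEvent() {
            if (!loaded.get()) { // in native code this would be the crash
                callsAfterUnload.incrementAndGet();
                firstLateCall.countDown();
            }
        }
    }

    // Plays the role of a non-default queue with its own dispatcher thread.
    static final class FakeQueue {
        private final AtomicBoolean running = new AtomicBoolean(true);
        private final Thread dispatcher;

        FakeQueue(FakeBridge bridge) {
            dispatcher = new Thread(() -> {
                while (running.get()) {
                    bridge.onEvent(); // dispatcher keeps calling into the bridge
                    Thread.yield();
                }
            });
            dispatcher.setDaemon(true);
            dispatcher.start();
        }

        void stopDispatch() {
            running.set(false);
        }

        // Waits up to timeoutMs for the dispatcher to exit; true if it did.
        boolean destroyTimedWait(long timeoutMs) {
            try {
                dispatcher.join(timeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return !dispatcher.isAlive();
        }
    }

    // Returns how many times the dispatcher called into the bridge after unload.
    static int shutdown(boolean drainQueueFirst) {
        FakeBridge bridge = new FakeBridge();
        FakeQueue queue = new FakeQueue(bridge);
        if (drainQueueFirst) {
            queue.stopDispatch(); // the two added lines, in model form
            if (!queue.destroyTimedWait(1000)) {
                throw new IllegalStateException("dispatcher did not stop in time");
            }
            bridge.loaded.set(false); // unload only once the queue is quiet
        } else {
            bridge.loaded.set(false); // unload while the dispatcher still runs
            try {
                bridge.firstLateCall.await(1, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            queue.stopDispatch();
            queue.destroyTimedWait(1000);
        }
        return bridge.callsAfterUnload.get();
    }

    public static void main(String[] args) {
        System.out.println("late calls, queue drained first: " + shutdown(true));
        System.out.println("late calls, queue left running: " + shutdown(false));
    }
}
```

When the queue is stopped and waited on first, the dispatcher thread is gone before the bridge is unloaded, so nothing can call into it afterwards. When the queue is left behind, the dispatcher calls into the unloaded bridge, which matches the symptom of a crash around the library unload.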
darogross
@darogross
Does anyone know if I can compile the ZeroMQ Bridge for OSX so that I can use the python zeromq module to send messages from my application through the Bridge? (I am new to ZeroMQ Bridge and am not certain of the architecture, but I am hoping to keep everything python for portability and speed of development.) Thanks.
fquinner
@fquinner
@darogross Funny, I have actually been thinking about that recently. The issue with using zeromq from python is that zeromq itself doesn't define a binary format for the payload, so you need a library in python which understands whatever payload format the bridge uses. Now there are qpid bindings for python which might work if you send a qpid payload over a zeromq link, but personally I think a json payload format would be ideal for stuff like this. Are you planning on pub / sub / both with python?