    Craig Patrick
    @cpats007
    Hi all - anyone know the best way to reduce latency in PHP talking to Kafka? I'm currently running at around 30ms for just the process of writing data to Kafka - seems like a long time to me?
    Magnus Edenhill
    @edenhill
    @cpats007 use librdkafka 0.9.4 and set queue.buffering.max.ms to your liking
    Craig Patrick
    @cpats007

    thanks @edenhill - I've already set it to 1ms, but just going to upgrade to version 0.9.4 and see if that helps. Also, the way I was calling poll was with the following:

    $this->getProducer()->poll(-1);

    which from what I can gather waits until it gets a response. I tried changing to a while loop of $producer->poll(10); etc but didn't see any difference - so are there any implications in using my original methodology of $this->getProducer()->poll(-1); or any reasons not to do it that way etc?

    Magnus Edenhill
    @edenhill
    produce(); poll(-1); will not necessarily serve the delivery report of the message produced by that specific call to produce(). It may return sooner than that because of other callbacks (stats, error, etc.).
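A minimal sketch of the loop approach, assuming the producer object exposes `poll()` and `getOutQLen()` as `RdKafka\Producer` does in php-rdkafka. The function name `drainProducer` and its default timeouts are hypothetical, chosen only for illustration. It keeps polling until the outbound queue is empty or a deadline passes, rather than relying on a single `poll(-1)`.

```php
<?php
/**
 * Poll until the producer's outbound queue is empty or the deadline passes.
 *
 * $producer is assumed to offer poll($timeoutMs) and getOutQLen(),
 * as RdKafka\Producer does. Returns true if the queue drained in time.
 * drainProducer() itself is a hypothetical helper, not part of any library.
 */
function drainProducer($producer, $maxWaitMs = 1000, $pollMs = 10)
{
    $deadline = microtime(true) + $maxWaitMs / 1000;

    // Each poll() call serves whatever callbacks are ready
    // (delivery reports, stats, errors), so keep going until
    // nothing is left in flight.
    while ($producer->getOutQLen() > 0) {
        if (microtime(true) >= $deadline) {
            return false; // gave up; messages may still be queued
        }
        $producer->poll($pollMs);
    }

    return true;
}
```

Called right after `produce()`, for example `drainProducer($this->getProducer(), 500)`, this waits for the queue to empty, so an unrelated stats or error callback cannot end the wait early.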
    Craig Patrick
    @cpats007

    ah I see, so it’s best to use the loop method?

    also, I’ve just upgraded to 0.9.4 - ran a few quick tests and still getting around 30ms for Kafka

    my configs are:
    $conf->set('client.id', 'track');
    $conf->set('socket.blocking.max.ms', 1);
    $conf->set('queue.buffering.max.ms', 1);
    $conf->set('batch.num.messages', 1);
    Magnus Edenhill
    @edenhill
    Maybe that's the network + broker latency? If you enable stats and check the per-broker average rtt (see https://github.com/edenhill/librdkafka/wiki/Statistics), or enable debug=protocol and look for rtt printouts of the ProduceRequests, you will get an idea of that latency.
    I also suggest you don't use batch.num.messages=1; there is really no point in forcing single-message batches, it will only (potentially) slow things down. The batching is already limited by queue.buffering.max.ms.
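As a sketch of that advice, the settings from the earlier message can be kept as they were with `batch.num.messages` simply left out, so the library default applies. The helper names `lowLatencyProducerSettings` and `applySettings` are hypothetical; the property names and values are the ones quoted in the conversation, and the target is assumed to be any object with a `set($name, $value)` method, such as `RdKafka\Conf`.

```php
<?php
/**
 * Settings quoted in the conversation, minus batch.num.messages.
 * Values are strings because librdkafka properties are set as strings.
 */
function lowLatencyProducerSettings()
{
    return [
        'client.id'              => 'track',
        'socket.blocking.max.ms' => '1',
        'queue.buffering.max.ms' => '1',
        // batch.num.messages intentionally omitted: batching is already
        // bounded by queue.buffering.max.ms.
    ];
}

/**
 * Apply name => value pairs to a config object with a set() method.
 */
function applySettings($conf, array $settings)
{
    foreach ($settings as $name => $value) {
        $conf->set($name, $value);
    }
    return $conf;
}
```

With the extension loaded this would be used as `applySettings(new RdKafka\Conf(), lowLatencyProducerSettings())`.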
    Craig Patrick
    @cpats007

    okay, thanks for that recommendation i’ll change it

    how would I go about enabling stats in PHP?

    Magnus Edenhill
    @edenhill
    not sure as I haven't used that client myself, but look for something resembling stats_cb, stats, statistics, ...
    if that's not implemented, go with the debug=protocol approach
    Craig Patrick
    @cpats007

    okay, appreciate that - thanks

    how do I set the debug? ;)

    Magnus Edenhill
    @edenhill
    $conf->set('debug', 'protocol');
    Craig Patrick
    @cpats007
    thanks @edenhill - appreciate the help
    does that output to anywhere in particular?
    Magnus Edenhill
    @edenhill
    stderr by default
    Craig Patrick
    @cpats007
    thanks
    Craig Patrick
    @cpats007

    I can see a couple of these currently in my logs:

    Unable to write data to Kafka: Local: Message timed out

    but this is before doing the changes above
    Magnus Edenhill
    @edenhill
    is that from your delivery report callback?
    Craig Patrick
    @cpats007
    yeah
    I have no idea how to get statistics in the PHP client lol, and nothing in the error logs from FPM or anything
    Magnus Edenhill
    @edenhill
    debug=protocol is easier
    Craig Patrick
    @cpats007
    I’ve got $conf->set('debug', 'protocol'); but again, no logs and unable to get the stderr output - unless I’m missing something?
    Magnus Edenhill
    @edenhill
    should just print to the terminal
    unless you are running it from some web server thingie, in that case you will probably need to do something else. I'm not a PHP man so I wouldn't know
    Craig Patrick
    @cpats007
    yeah this is a web based application, so is running through php-fpm - I’ve no idea how to do it lol :D

    I can’t even set a stats_cb as a config value as it throws the following error:

    RdKafka\Exception: Property "stats_cb" must be set through dedicated .._set_..() function
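The error says the callback cannot be passed as a plain property. A hedged sketch of the other half of the job follows: once a statistics JSON document is in hand, pull out the per-broker average rtt that was suggested earlier. `brokerRttAvgMs` is a hypothetical helper. The commented wiring assumes a php-rdkafka version that offers `RdKafka\Conf::setStatsCb()` and the `statistics.interval.ms` property; whether the 3.0.1 extension in use here has that setter is not confirmed by the conversation.

```php
<?php
/**
 * Return the average round-trip time per broker, in milliseconds,
 * from a librdkafka statistics JSON document.
 * librdkafka reports rtt values in microseconds.
 * brokerRttAvgMs() is a hypothetical helper for illustration.
 */
function brokerRttAvgMs($statsJson)
{
    $stats = json_decode($statsJson, true);
    $result = [];

    if (!is_array($stats) || !isset($stats['brokers']) || !is_array($stats['brokers'])) {
        return $result; // not a stats document, or no brokers yet
    }

    foreach ($stats['brokers'] as $name => $broker) {
        if (isset($broker['rtt']['avg'])) {
            $result[$name] = $broker['rtt']['avg'] / 1000; // microseconds to ms
        }
    }

    return $result;
}

// Assumed wiring, only if the installed extension provides setStatsCb():
//
// $conf->set('statistics.interval.ms', '1000');
// $conf->setStatsCb(function ($kafka, $json, $jsonLen) {
//     error_log('kafka rtt ms: ' . json_encode(brokerRttAvgMs($json)));
// });
```

Writing through `error_log()` is deliberate: under php-fpm, worker stderr is usually not captured unless the pool enables `catch_workers_output`, which may be why the debug=protocol output never appeared.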

    Craig Patrick
    @cpats007
    Anyone able to explain how Kafka batch messaging works in an FPM environment? ie: does librdkafka batch messages itself, or are they batched per application? Let’s say I have 100 fpm processes running, each producing messages etc, if they are set to batch will they batch on a library level or per application / worker?
    Craig Patrick
    @cpats007
    also, does it create an asynchronous producer by default?
    Paul Dragoonis
    @dragoonis
    Hey @edenhill that's me getting closer to the delivery of this big data migration project :)
    Prod is using docker with AlpineOS - I assume there aren't any alpine packages for your C lib? :)
    Paul Dragoonis
    @dragoonis
    I've put this together
    # Kafka Installation
    RUN apk update && apk add make g++ python autoconf php5-dev && \
        # librdkafka
        cd /tmp && git clone --depth 1 --branch v0.9.4 https://github.com/edenhill/librdkafka.git && \
        cd /tmp/librdkafka && ./configure && make && make install && rm -rf /tmp/librdkafka && \
        # php-rdkafka
        cd /tmp && git clone --depth 1 --branch 3.0.1 https://github.com/arnaud-lb/php-rdkafka.git && \
        cd /tmp/php-rdkafka && phpize && ./configure && make all -j 5 && make install && \
        echo "extension=rdkafka.so" > /etc/php5/conf.d/rdkafka.ini && \
        rm -rf /tmp/php-rdkafka
    Magnus Edenhill
    @edenhill
    you can probably use 'set -e' at the start of that thing to avoid all the &&
    Paul Dragoonis
    @dragoonis
    it's a Dockerfile, not a bash script
    If I moved it to a bash script, then indeed I could use -e :) but i'm lazy
    Craig Patrick
    @cpats007
    hey @edenhill - do you know how librdkafka is handled in an FPM environment? ie: does it instantiate the library for each “worker” (handling up to x thousand connections using the same instantiation) or does every process fire up the library? Also, does it handle queuing per library, or per application - ie: if I set queuing to 1000 before sending, is that 1000 per application or 1000 stored in the lib before they are sent (or somewhere locally)? Trying to optimise performance and resources.
    Magnus Edenhill
    @edenhill
    I have no idea, I don't use PHP. Since there is a startup and termination cost to each kafka client instance I would recommend reusing them as far as possible.
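One way to follow the reuse advice inside a single PHP process is a lazily initialised holder, sketched below. `ProducerHolder` and the factory callable are hypothetical names. Two caveats apply: each FPM worker is a separate OS process with its own librdkafka client and queue, and ordinary PHP objects do not survive the end of a request, so this only avoids creating several producers within one request.

```php
<?php
/**
 * Keeps at most one producer per PHP process/request.
 * ProducerHolder is a hypothetical helper; the factory is any callable
 * that builds the producer, e.g. one returning a new RdKafka\Producer.
 */
final class ProducerHolder
{
    private static $instance = null;

    public static function get(callable $factory)
    {
        // Pay the client start-up cost only on first use.
        if (self::$instance === null) {
            self::$instance = $factory();
        }
        return self::$instance;
    }
}
```

Every call site then asks `ProducerHolder::get($factory)` instead of constructing its own producer, so batching settings apply to one shared queue per request.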
    Craig Patrick
    @cpats007
    yeah that’s what I’m hoping to do - in a “normal” FPM set up, it’s common to have low workers to handle a high amount of requests to avoid that issue - I just wondered how it worked with librdkafka :) thanks for responding
    Craig Patrick
    @cpats007
    different note @edenhill - using the rdkafka_performance function in the examples directory, how / where do I define the hosts / brokers to send the messages to?
    Magnus Edenhill
    @edenhill
    -b <broker1,broker2>
    try rdkafka_performance -h for help
    Craig Patrick
    @cpats007
    perfect, thank you
    ahhh I was checking the source - I’ll go for that. Superb - thanks
    Magnus Edenhill
    @edenhill
    Make sure to buckle up before you run it!
    Craig Patrick
    @cpats007
    :D
    Magnus Edenhill
    @edenhill
    (that will inevitably come back and bite me)
    Craig Patrick
    @cpats007
    lol