    Ahmed Abdul Hamid
    @ahmedahamid

    oh, here it is:

    Within an uncommitted transaction, all updates (UPDATE, DELETE, or INSERT) that change transactional tables such as InnoDB tables are cached until a COMMIT statement is received by the server. At that point, mysqld writes the entire transaction to the binary log before the COMMIT is executed.

    Modifications to nontransactional tables cannot be rolled back. If a transaction that is rolled back includes modifications to nontransactional tables, the entire transaction is logged with a ROLLBACK statement at the end to ensure that the modifications to those tables are replicated.
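
    (As a rough illustration of the quoted behavior, here is a sketch with made-up database and table names: a rolled-back transaction that touches both a transactional and a nontransactional table.)

mysql testdb <<'SQL'
START TRANSACTION;
UPDATE innodb_tbl SET col = 1 WHERE id = 10;   -- InnoDB: cached until COMMIT/ROLLBACK
UPDATE myisam_tbl SET col = 2 WHERE id = 10;   -- MyISAM: applied immediately, cannot be rolled back
ROLLBACK;
SQL

    (Because the MyISAM change cannot be undone, the whole transaction is still written to the binary log with a ROLLBACK at the end, so that replicas apply the same nontransactional change.)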

    blafasel4
    @blafasel4
    Thanks, never read that actually. I'm not 100% sure what that means ATM, will dig into it a little deeper
    Ahmed Abdul Hamid
    @ahmedahamid
    of course
    please keep me posted
    kg1911
    @kg1911

    Hi, has someone tried, or does anyone know how, to execute the reassign-partitions tool on the __consumer_offsets topic in order to scale the partitions out to new hosts? All of that while data is being written to this topic.

    From my tests, I get a producer error when the partition the data is being produced to is being migrated.

    Ex: Got error produce response with correlation id 13 on topic-partition test__consumer_offset-9, retrying (2 attempts left). Error: NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
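
    (For context, a sketch of the standard tool invocation for this kind of move; the broker ids, ZooKeeper host, file name, and throttle value below are placeholders, not taken from this conversation. NOT_LEADER_FOR_PARTITION is a retriable error, so a producer with retries remaining would normally recover once leadership settles on the new replicas.)

# Describe the desired placement for the partitions being moved
cat > offsets-reassignment.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"__consumer_offsets","partition":9,"replicas":[4,5,6]}
]}
EOF

# Start the move with a replication throttle so live traffic is less affected
bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file offsets-reassignment.json --execute --throttle 50000000

# Re-run with --verify until it reports completion; this also removes the throttle
bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file offsets-reassignment.json --verify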

    Ran Tavory
    @rantav
    Question: How may I configure Brooklin to mirror a topic from one cluster to another cluster, using a different topic name? For example, I'd like to replicate topic0 => topic0-aggregate. It sounds like an easy one but I haven't figured out how to do that.
    Ran Tavory
    @rantav
    And another question: I am replicating a Kafka topic from cluster source to cluster destination. I noticed that in a failure condition, such as a Kafka broker in the destination cluster crashing, Brooklin workers enter an error storm for about 5 minutes until they recover. During that error storm no data is being replicated, there are a lot of errors in the logs (KafkaException: Producer is closed forcefully.), and metrics that indicate the errors. Is this expected? Is there a way to mitigate such error conditions (e.g. help Brooklin recover faster)?
    Ran Tavory
    @rantav
    And (yet) another question. I'm replicating a remote Kafka cluster (latency > 50ms) and, probably as a result, I'm seeing a lot of messages like INFO [Consumer clientId=consumer-1, groupId=mirror] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 0: org.apache.kafka.common.errors.DisconnectException. (org.apache.kafka.clients.FetchSessionHandler). Although these messages are INFO and I assume there's a retry etc., I suspect they are an indication of something not going right and perhaps they hinder throughput. What are the recommended settings for high-latency consumption, or specifically for these kinds of symptoms?
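
    (Not an official recommendation, just the Kafka consumer knobs that usually come up for high-latency links, sketched as Brooklin server config. The property prefix below, a connector named "mirror" plus a "consumer." namespace, is an assumption and should be checked against the Kafka connector wiki; the values are illustrative only.)

# Hypothetical consumer overrides for a >50ms link: bigger socket buffer, larger fetches,
# and a more generous request timeout
cat >> config/server.properties <<'EOF'
brooklin.server.connector.mirror.consumer.receive.buffer.bytes=1048576
brooklin.server.connector.mirror.consumer.fetch.min.bytes=65536
brooklin.server.connector.mirror.consumer.fetch.max.wait.ms=500
brooklin.server.connector.mirror.consumer.max.partition.fetch.bytes=4194304
brooklin.server.connector.mirror.consumer.request.timeout.ms=60000
EOF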
    adosapati
    @adosapati
    Did anyone containerize Brooklin? We are currently running Brooklin in our test environment and have big-memory machines (128 GB RAM) in the Brooklin cluster to support mirroring 1000's of topics. Is it a good idea to containerize to start with?
    Ran Tavory
    @rantav
    adosapati
    @adosapati
    oh cool, Thanks!
    Sanjay Kumar
    @sanjay24
    Has anyone tried mirroring compacted topics? I'm seeing the following error.
    com.linkedin.datastream.server.api.transport.SendFailedException: com.linkedin.datastream.common.DatastreamRuntimeException: org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, or is otherwise corrupt.
            at com.linkedin.datastream.server.EventProducer.onSendCallback(EventProducer.java:293)
            at com.linkedin.datastream.server.EventProducer.lambda$send$0(EventProducer.java:194)
            at com.linkedin.datastream.kafka.KafkaTransportProvider.doOnSendCallback(KafkaTransportProvider.java:189)
            at com.linkedin.datastream.kafka.KafkaTransportProvider.lambda$send$0(KafkaTransportProvider.java:155)
            at com.linkedin.datastream.kafka.KafkaProducerWrapper.lambda$null$0(KafkaProducerWrapper.java:198)
            at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1235)
            at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
            at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
            at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:635)
            at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:604)
            at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:561)
            at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:485)
            at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
            at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:700)
            at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
            at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:532)
            at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:524)
            at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
            at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
            at java.lang.Thread.run(Thread.java:748)
    kg1911
    @kg1911

    Hi, does anyone know whether reassigning partitions with the partition reassignment tool, without stopping the producer on the topic, is feasible without message loss?

    I can't find this information anywhere!

    Ran Tavory
    @rantav
    I've run extensive testing on Brooklin, thought you'd be interested in the results: https://github.com/AppsFlyer/kafka-mirror-tester/blob/master/results-brooklin.md Plus, if there are ways to fix the issues I've encountered with Brooklin during those tests, I'd be very happy to hear your suggestions.
    Sonam Mandal
    @somandal
    @sanjay24 are you using "null" keys when you write to the log compacted topic? Log compacted topics cannot have "null" keys.
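
    (For anyone who wants to reproduce the null-key constraint quickly with the stock CLI tools, a sketch with made-up hosts and topic name; a null-key record sent to a compacted topic is rejected by the broker, which surfaces on the producer side much like the CorruptRecordException above.)

# Create a log-compacted topic
bin/kafka-topics.sh --zookeeper zk1:2181 --create --topic compacted-test \
  --partitions 1 --replication-factor 1 --config cleanup.policy=compact

# Produce keyed records; key and value are separated by ':'
bin/kafka-console-producer.sh --broker-list kafka1:9092 --topic compacted-test \
  --property parse.key=true --property key.separator=: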
    Celia Kung
    @celiakung
    @rantav Awesome. Thanks for testing out Brooklin and for publishing the analysis! We have a config called “pausePartitionOnError” documented here: https://github.com/linkedin/brooklin/wiki/Kafka-Connectors-Shared-Logic

    pausePartitionOnError : A flag indicating whether to auto-pause a topic partition if dispatching its data for delivery to the destination system fails

    pauseErrorPartitionDurationMs: The time duration (in milliseconds) to keep a topic partition paused after encountering send errors, before attempting to auto-resume

    I think turning this feature on may help when Brooklin is having trouble producing to the destination cluster
    Flappiness due to consumer rebalance is also something we’re actively working to address. We are coming up with a version of Brooklin which manages and distributes the partitions rather than using Kafka group coordination. When it’s ready, we could use kafka-mirror-tester to re-run the tests :)
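
    (A minimal sketch of what enabling this could look like in the Brooklin server config. The exact property prefix depends on the connector name in your deployment; the key names below follow the pattern of the other connector properties and should be verified against the wiki page linked above.)

# Hypothetical connector-level settings: auto-pause a partition on send errors,
# then auto-resume it after 30 seconds
cat >> config/server.properties <<'EOF'
brooklin.server.connector.kafkaMirroringConnector.pausePartitionOnError=true
brooklin.server.connector.kafkaMirroringConnector.pauseErrorPartitionDurationMs=30000
EOF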
    Ran Tavory
    @rantav
    Great @celiakung I'll give pausePartitionOnError a try and update
    Ran Tavory
    @rantav
    @celiakung I ran the experiment again, this time with pausePartitionOnError=true and pauseErrorPartitionDurationMs=30000. Things are somewhat better; still, time to recovery (rebalance, I suppose) is long. Happy you're working to address that. See the update here https://github.com/AppsFlyer/kafka-mirror-tester/blob/master/results-brooklin.md#experiment-kill-a-broker-in-kafka-destination-aka-kafka-destination-node-flapping-take-2
    Erik
    @strtok
    Hello! Is there any good comparison between the soon-to-be-released MM2 and Brooklin, for those just now looking at Kafka mirroring options?
    skaur05
    @skaur05
    Thanks a lot for giving us a breather with the Brooklin release. At this point we are testing it against a very high-throughput topic for metrics, and we would appreciate your help with the tuning efforts. We are doing very extensive tuning to make it work for us.
    Are there any recommendations for producer-side settings for the mirrormaker connector that could be useful for high-throughput topics where even 10 minutes of lag is not affordable?
    We are using version 1.0.2 of Brooklin.
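
    (Not guidance from the maintainers, just the usual Kafka producer throughput knobs, sketched under the kafkaTransportProvider prefix on the assumption that producer properties are passed through the transport provider config; verify the prefix and the values against your own setup.)

# Larger batches, a small linger, and compression generally trade a little latency
# for much higher producer throughput; values are illustrative only
cat >> config/server.properties <<'EOF'
brooklin.server.transportProvider.kafkaTransportProvider.batch.size=262144
brooklin.server.transportProvider.kafkaTransportProvider.linger.ms=20
brooklin.server.transportProvider.kafkaTransportProvider.compression.type=lz4
brooklin.server.transportProvider.kafkaTransportProvider.buffer.memory=268435456
EOF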
    skaur05
    @skaur05
    here is the error message:
    [2019-12-12 15:39:36,611] ERROR Partition rewind failed due to (com.linkedin.datastream.connectors.kafka.mirrormaker.KafkaMirrorMakerConnectorTask)
    java.lang.IllegalStateException: No current assignment for partition influx_metrics_monitoring-106
            at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:259)
            at org.apache.kafka.clients.consumer.internals.SubscriptionState.seek(SubscriptionState.java:264)
            at org.apache.kafka.clients.consumer.KafkaConsumer.seek(KafkaConsumer.java:1508)
            at com.linkedin.datastream.connectors.kafka.AbstractKafkaBasedConnectorTask.lambda$seekToLastCheckpoint$5(AbstractKafkaBasedConnectorTask.java:615)
            at java.util.HashMap.forEach(HashMap.java:1289)
            at com.linkedin.datastream.connectors.kafka.AbstractKafkaBasedConnectorTask.seekToLastCheckpoint(AbstractKafkaBasedConnectorTask.java:615)
            at com.linkedin.datastream.connectors.kafka.AbstractKafkaBasedConnectorTask.rewindAndPausePartitionOnException(AbstractKafkaBasedConnectorTask.java:251)
            at com.linkedin.datastream.connectors.kafka.mirrormaker.KafkaMirrorMakerConnectorTask.lambda$sendDatastreamProducerRecord$0(KafkaMirrorMakerConnectorTask.java:259)
            at com.linkedin.datastream.server.FlushlessEventProducerHandler.lambda$send$1(FlushlessEventProducerHandler.java:77)
            at com.linkedin.datastream.server.EventProducer.onSendCallback(EventProducer.java:303)
            at com.linkedin.datastream.server.EventProducer.lambda$send$0(EventProducer.java:194)
            at com.linkedin.datastream.kafka.KafkaTransportProvider.doOnSendCallback(KafkaTransportProvider.java:189)
            at com.linkedin.datastream.kafka.KafkaTransportProvider.lambda$send$0(KafkaTransportProvider.java:155)
            at com.linkedin.datastream.kafka.KafkaProducerWrapper.lambda$null$0(KafkaProducerWrapper.java:198)
            at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1235)
            at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
            at org.apache.kafka.clients.producer.internals.ProducerBatch.abort(ProducerBatch.java:157)
            at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortBatches(RecordAccumulator.java:717)
            at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortBatches(RecordAccumulator.java:704)
            at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortIncompleteBatches(RecordAccumulator.java:691)
            at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:185)
            at java.lang.Thread.run(Thread.java:748)
    AE794596
    @AE794596
    Hi! @celiakung I have some problems with mirroring Kafka clusters.
    Now I have 3 DCs; the ZK and Kafka clusters in the different DCs have no correlation with each other:
    DC1:10.18.21.1 (Brooklin, Zookeeper)
    DC2:10.18.21.2 (Kafka, Zookeeper)
    DC3:10.18.21.3 (Kafka, Zookeeper)
    The brooklin config on DC1 is here:

    First, I tried to mirror the Kafka topic “test1” from DC2 to DC3, and I started the task on DC1:

    bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n ds2to3 -s "kafka://10.18.21.2:9092/test1" -c kafkaMirroringConnector -t kafkaTransportProvider -m '{"owner":"test-user","system.reuseExistingDestination":"false"}'

    It works! I can see the topic “test1” on DC3. But how do I mirror “test2” from DC3 to DC2 without changing the config and restarting Brooklin?

    After reading the “help”, I tried to add the param “-d”, but it doesn't work:

    bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n ds3to2 -s "kafka://10.18.21.3:9092/test2" -d "kafka://10.18.21.2:9092" -dp 1 -c kafkaMirroringConnector -t kafkaTransportProvider -m '{"owner":"test-user","system.reuseExistingDestination":"false"}'

    AE794596
    @AE794596
    @celiakung is the destination set via "brooklin.server.transportProvider.kafkaTransportProvider.bootstrap.servers" in server.properties, or via the param "-d"?
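
    (For illustration only, the two places a destination can appear, based on the property name quoted above and the -d URI used earlier in this channel; this is just the shape of each setting, not an answer as to which one takes precedence.)

# (1) Cluster-wide default for the transport provider, in server.properties:
#     brooklin.server.transportProvider.kafkaTransportProvider.bootstrap.servers=<destination-brokers>
# (2) Per-datastream destination supplied at creation time:
bin/brooklin-rest-client.sh -o CREATE ... -s "kafka://<source-brokers>/<topic>" -d "kafka://<destination-brokers>" ...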
    ctinmota
    @ctinmota
    Hi all
    I'm trying to run the streaming text files example, but I get this error all the time: java.lang.IllegalArgumentException: zkUtils should be present
    at org.apache.commons.lang.Validate.isTrue(Validate.java:136)
    at com.linkedin.datastream.kafka.KafkaTransportProviderAdmin.createTopic(KafkaTransportProviderAdmin.java:249)
    Do you have any recommendations to solve it? I tried different configurations, but without success :(
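
    (The error suggests the Kafka transport provider could not find a ZooKeeper connection for the destination cluster when creating the topic. The key name below is an assumption modeled on the other transportProvider properties mentioned in this channel; check it against the sample config shipped in the tarball.)

cat >> config/server.properties <<'EOF'
brooklin.server.transportProvider.kafkaTransportProvider.zookeeper.connect=localhost:2181
EOF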
    Ahmed Abdul Hamid
    @ahmedahamid
    Hi @ctinmota, which version of Brooklin are you using?
    Ahmed Abdul Hamid
    @ahmedahamid
    I just tried the steps in Streaming Text Files to Kafka with brooklin-1.0.2.tgz but couldn't reproduce the issue
    ctinmota
    @ctinmota
    Hi, thanks for checking. I replaced all my components with the ones from the example, the versions were the same, and now it is working. Probably this happened because I previously tried to run it with 3 instances and something went wrong. I'll recheck this and see if it happens again when switching from one to multiple instances and vice versa.
    Ahmed Abdul Hamid
    @ahmedahamid
    sure, np
    adosapati
    @adosapati
    We finally rolled out the 1st version of Brooklin to our production environment. We replaced only one of the mirror makers for the initial phase; we will be replacing the entire stack eventually once we have gained confidence. The rebalances are taking quite long, as mentioned above by @rantav, which is killing us at the moment; we are trying to fine-tune a few options to minimize them, if that's even possible.
    Also, we are running in VMs right now and will be pushing a container version later this week. Just to get an idea, @celiakung @ahmedahamid, can you please share your largest Brooklin config, i.e. how many nodes and what CPU/memory spec? I just want to get an idea of how the configuration is on your end: whether it's smaller machines with a large # of nodes, or bigger machines with a small #. I know it all depends on multiple factors, but I would like to get a sense of it.
    Ahmed Abdul Hamid
    @ahmedahamid
    @adosapati sure, will get back to you shortly
    adosapati
    @adosapati
    Thank you! :+1:
    holstad
    @holstad
    @adosapati the biggest cluster we have today is about 720 nodes with 32 cores and 64 GB RAM
    adosapati
    @adosapati
    wow, that's HUGE!
    At Wayfair, for Kafka MirrorMakers we have altogether around 2000 nodes with 8 CPUs and 8 GB RAM across all the data centers. The 1st Brooklin cluster we released last week has 20 nodes with 64 GB RAM and 16 CPUs. This is only a small part, and we are brainstorming whether to grow this cluster or go with individual clusters by application, FWIW.
    adosapati
    @adosapati
    Also, can you share these, please?
    1. What % of total memory are you giving to the heap?
    2. What is the ratio of # of tasks to CPU cores? Or any recommendations you could give us?
    @ahmedahamid @holstad
    Thomas Law
    @thomaslaw
    1. 32GB for the footprint described above
    2. We generally try to keep < 10 tasks per node.
    @adosapati
    adosapati
    @adosapati
    Hey, thank you! We have been running 100 tasks on each node, which is why we might be experiencing OOMs. Also, it is interesting to see that you have only 50% of total memory allocated to the JVM.
    May I ask you for a quick 15-20 minute call with our team at Wayfair? We are about to push our Brooklin to mirror our largest Kafka clusters in production, and we want to make sure we are on the right track with the system settings. Our questions will be mostly around:
    1. Kafka mirrormaker tasks to source Kafka topic partitions ratio
    2. Number of Brooklin clusters in relation to # of Kafka clusters (source & destination combinations)
    3. Memory allocation for JVM and memory consumption patterns for brooklin
    4. Any other settings for us to focus on?
    Ahmed Abdul Hamid
    @ahmedahamid
    @adosapati No problem. I'll talk to some folks and get back to you.
    adosapati
    @adosapati
    That will be awesome, appreciate it!