    Ahmed Abdul Hamid
    @ahmedahamid
    @rgibbard We're currently test-driving that. Are you interested in following its development, knowing that it has a dependency on Oracle's Big Data adapter?
    Ahmed Abdul Hamid
    @ahmedahamid
    @sanjay24 We haven't looked into what it would take to have Brooklin work in setups where it's mirroring Kafka clusters that are configured and used to operate under exactly-once semantics (assuming you're referring to this)
    We have mostly been running it with the usual at-least-once expectations.
    Revanth
    @revanthpobala
    I am trying to run Brooklin following this wiki page: https://github.com/linkedin/brooklin/wiki/Streaming-Text-Files-to-Kafka
    When I execute this command to create a datastream:
    bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n first-file-datastream -s NOTICE -c file -p 1 -t kafkaTransportProvider -m '{"owner":"test-user"}'
    I get a connection refused exception.
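    A connection-refused error on that port usually means no Brooklin server is listening there yet. Assuming the start script and default config shipped with the Brooklin distribution (both are assumptions to check against your install), the server would be started first with:
    bin/brooklin-server-start.sh config/server.properties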
    Ahmed Abdul Hamid
    @ahmedahamid
    @revanthpobala I've responded to the ticket you opened.
    Mario Alberto Romero Sandoval
    @mariors
    Hi, I am dealing with the SSL configuration for both the source and destination clusters in Brooklin. Is there any way to send those configurations in the POST message to the API? I mean, what would the ideal setup be when working with a single destination cluster but multiple source clusters using TLS?
    I found linkedin/brooklin#619, but I think it will only work with one source cluster.
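    One server-level approach (as opposed to per-datastream settings in the POST body) is to put the TLS client properties in the Brooklin server configuration: destination-side settings under the transport provider prefix, and source-side settings under the mirroring connector's consumer prefix. The consumer prefix, connector name, and file paths below are assumptions to verify against your Brooklin version; a minimal sketch:
    brooklin.server.transportProvider.kafkaTransportProvider.security.protocol=SSL
    brooklin.server.transportProvider.kafkaTransportProvider.ssl.truststore.location=/etc/brooklin/truststore.jks
    brooklin.server.transportProvider.kafkaTransportProvider.ssl.truststore.password=changeit
    brooklin.server.connector.kafkaMirroringConnector.consumer.security.protocol=SSL
    brooklin.server.connector.kafkaMirroringConnector.consumer.ssl.truststore.location=/etc/brooklin/truststore.jks
    brooklin.server.connector.kafkaMirroringConnector.consumer.ssl.truststore.password=changeit
    Note this gives a single consumer configuration shared by all source clusters, which is essentially the limitation noted for linkedin/brooklin#619.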
    Yαkηyα Δαβο
    @yakhyadabo
    Question: Has anyone tried to deploy a Brooklin cluster using Kubernetes?
    I did some research but found nothing.
    Sebastian Cheung CQF
    @scheung38
    Hello, I'd like to migrate an on-prem Kafka/ksqlDB data ingestion cluster into Azure. What is the best practice, and are there any examples showing how easy or difficult this can be? Is this categorized as mirroring?
    iftachby
    @iftachby

    Hello, I am trying to test Brooklin as a replacement for Kafka MirrorMaker. I have 3 source clusters in different DCs and 2 aggregate clusters (the aggregate clusters share a DC with 2 of the source clusters, so DC A and DC B each have a source cluster plus an aggregate cluster, and DC C has just a source cluster). I am trying to mirror from all 3 DCs into each aggregate cluster. I created 2 Brooklin clusters, 1 in each DC that has an aggregate cluster, and on each Brooklin cluster a datastream per source cluster.

    It seems that each aggregate cluster only gets messages from the source cluster in the same DC, i.e. the aggregate in DC A gets messages from source A, and the aggregate in DC B gets messages from the source in DC B.

    The output of bin/brooklin-rest-client.sh -o READALL (snippet of the connection strings):
    "source" : {
    "connectionString" : "kafka://kafka-source.service.A.consul:9092/(topicX|topicY)"
    },

    "source" : {
    "connectionString" : "kafka://kafka-source.service.B.consul:9092/(topicX|topicY)"
    },

    "source" : {
    "connectionString" : "kafka://kafka-source.service.C.consul:9092/(topicX|topicY)"
    },

    Of course, the connection string addresses are pingable from all machines.
    Any idea what I'm doing wrong?

    iftachby
    @iftachby
    Upon further inspection, it seems like Brooklin by default does not allow more than one datastream to write to the same topic? Is it possible to overcome this? In KMM we mirrored all 3 sources into 1 topic on each aggregate cluster and would like to keep it this way if possible.
    iftachby
    @iftachby
    Hi. Can anyone please help me understand how to edit a running datastream (if that's possible)? I want to change the topics being mirrored. Do I have to delete it and recreate it, or is there another way?
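    If in-place edits are not available in your build, a delete-and-recreate sequence along the lines of the other REST client calls in this log would look roughly as follows (the datastream name and topic regex are placeholders, and the DELETE operation should be verified against your brooklin-rest-client version):
    bin/brooklin-rest-client.sh -o DELETE -u http://localhost:32311/ -n my-mirror-datastream
    bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n my-mirror-datastream -s "kafka://kafka-source.service.A.consul:9092/(topicX|topicZ)" -c kafkaMirroringConnector -t kafkaTransportProvider -m '{"owner":"test-user"}'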
    Sanjay Kumar
    @sanjay24
    hi @ahmedahamid when are you release 1.0.3?
    Sanjay Kumar
    @sanjay24
    releasing*
    ivorodrigues
    @ivorodrigues
    Hi all, I have a question:
    is this output from the status API reporting negative lag?
    If so, how is that possible, and should I be worried about it?
    [
      {
        "key": {
          "topic": "my.topic",
          "partition": 0,
          "datastreamTaskPrefix": "my-kafka-cluster-dc1-dc2",
          "datastreamTaskName": "my-kafka-cluster-dc1-dc2_d83dfd5e-1e80-4422-9d76-cebd07a3d205",
          "connectorTaskStartTime": 1604657249726
        },
        "value": {
          "brokerOffset": 8979,
          "consumerOffset": 9108,
          "assignmentTime": 1604657254571,
          "lastRecordReceivedTimestamp": 1604657479629,
          "lastBrokerQueriedTime": 1604657460109,
          "lastNonEmptyPollTime": 1604657479816
        }
      },
      {
        "key": {
          "topic": "my-topic",
          "partition": 2,
          "datastreamTaskPrefix": "my-kafka-cluster-dc1-dc2",
          "datastreamTaskName": "my-kafka-cluster-dc1-dc2_d83dfd5e-1e80-4422-9d76-cebd07a3d205",
          "connectorTaskStartTime": 1604657249726
        },
        "value": {
          "brokerOffset": 8183,
          "consumerOffset": 8256,
          "assignmentTime": 1604657254571,
          "lastRecordReceivedTimestamp": 1604657479698,
          "lastBrokerQueriedTime": 1604657460268,
          "lastNonEmptyPollTime": 1604657479829
        }
      },
      {
        "key": {
          "topic": "my-topic",
          "partition": 1,
          "datastreamTaskPrefix": "my-kafka-cluster-dc1-dc2",
          "datastreamTaskName": "my-kafka-cluster-dc1-dc2_d83dfd5e-1e80-4422-9d76-cebd07a3d205",
          "connectorTaskStartTime": 1604657249726
        },
        "value": {
          "brokerOffset": 7700,
          "consumerOffset": 7773,
          "assignmentTime": 1604657254571,
          "lastRecordReceivedTimestamp": 1604657479566,
          "lastBrokerQueriedTime": 1604657460198,
          "lastNonEmptyPollTime": 1604657479753
        }
      }
    ]
    ivorodrigues
    @ivorodrigues
    To calculate the lag in this case it's consumerOffset - brokerOffset, right?
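    As a worked example using the first entry above, and taking lag as brokerOffset - consumerOffset (the usual definition): 8979 - 9108 = -129. One plausible reading of the negative value, offered as an interpretation rather than an authoritative answer, is that the two numbers are not sampled at the same instant: lastBrokerQueriedTime (1604657460109) is about 19 seconds older than lastRecordReceivedTimestamp (1604657479629), so the broker offset is a stale snapshot relative to the consumer position.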
    André Cardoso
    @cardosoa2

    Greetings mates,
    I am exploring the Brooklin framework as a solution for data replication at Fanduel.
    But I am getting this error in the logs:
    Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The message is 1260380 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration. [2020-11-20 12:11:53,118] WARN Detect exception being thrown from callback for src partition: soccer.ly.agglomerated.events-6 while sending, metadata: null , exception: (com.linkedin.datastream.connectors.kafka.mirrormaker.KafkaMirrorMakerConnectorTask) com.linkedin.datastream.server.api.transport.SendFailedException: com.linkedin.datastream.common.DatastreamRuntimeException: org.apache.kafka.common.errors.RecordTooLargeException: The message is 1260380 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.
    The message is self-explanatory; however, my difficulty right now is changing the max request size property for the producer.
    I realise that LiKafkaProducerFactory.java instantiates a producer with 100MB.
    How can I instantiate a consumer with the same size? Is it done via the configuration file, or via the HTTP request when creating the datastream? Do you have any request example?

    Thank you in advance

    FYI: I am using the kafkaMirroringConnector.
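    For illustration, one place to try overriding this is the transport provider property prefix in the Brooklin server configuration, which is where the datastream-producer client settings live. Whether max.request.size is passed through this way depends on the transport provider implementation and Brooklin version, so treat the line below as an assumption to verify:
    brooklin.server.transportProvider.kafkaTransportProvider.max.request.size=104857600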
    André Cardoso
    @cardosoa2
    Current producer config in the logs:
    [2020-11-20 12:11:53,107] INFO ProducerConfig values:
        acks = 1
        batch.size = 16384
        bootstrap.servers = [....]
        buffer.memory = 33554432
        client.id = datastream-producer
        compression.type = none
        connections.max.idle.ms = 540000
        enable.idempotence = false
        interceptor.classes = []
        key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
        linger.ms = 0
        max.block.ms = 60000
        max.in.flight.requests.per.connection = 5
        max.request.size = 1048576
        metadata.max.age.ms = 300000

    The max.request.size is 1MB... shouldn't it be 100MB?

    iftachby
    @iftachby
    Hi everyone, is it possible to tell Brooklin to mirror a Kafka topic from latest or from a specific offset?
    Sanjay Kumar
    @sanjay24
    @iftachby You can specify "system.auto.offset.reset": "latest" as a metadata attribute while creating your datastream
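    For illustration, a minimal CREATE call carrying that metadata attribute might look like the following (the datastream name and source connection string are placeholders; the other flags mirror the examples already shown in this log):
    bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n my-mirror-datastream -s "kafka://kafka-source:9092/(topicX|topicY)" -c kafkaMirroringConnector -t kafkaTransportProvider -m '{"owner":"test-user","system.auto.offset.reset":"latest"}'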
    iftachby
    @iftachby
    @sanjay24 I already added the same parameter to my brooklin server.properties file and that helped. Thanks!
    Sanjay Kumar
    @sanjay24
    @iftachby Adding that in metadata can help you specify it at datastream level
    iftachby
    @iftachby
    @sanjay24 got it - thanks!
    iftachby
    @iftachby
    Hi everyone
    I wanted to ask if there are plans to release new versions of Brooklin (perhaps with a newer Kafka client? :) )
    Thanks!
    Mike Papetti
    @papetti23
    Is anyone using Brooklin to manage data in InfluxDB? We're well over cardinality limits and having to make trade-offs between memory, retention and cardinality.
    I feel like there might be an opportunity since Kafka is already in our stack, but I'm not quite sure whether mirroring or CDC is the better pattern since the InfluxDB instances are sharded out.
    Mike Papetti
    @papetti23
    Or Sybase? Anyone using Brooklin for change data capture on a very old version of Sybase?
    akhilc1
    @akhilc1
    Hi, I am getting a connection refused error while trying to run this command:
    bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311 -n first-file-datastream -s /home/qbuser/Downloads/invoice.pdf -c file -p 1 -t kafkaTransportProvider -m '{"owner":"test-user"}'
    Any help is appreciated
    akhilc1
    @akhilc1
    Got the solution: I had not run the Gradle build in the Kafka project folder. It's not mentioned anywhere in the instructions; it would be good if you added it.
    akhilc1
    @akhilc1
    Can I use Brooklin to sync data between two PostgreSQL databases?
    iftachby
    @iftachby
    @celiakung @ahmedahamid @somandal Hi, are there plans to release new versions of Brooklin with bug fixes / new features? I saw that a lot of fixes were pushed to master and would love to get them in a stable release :)
    Yαkηyα Δαβο
    @yakhyadabo

    Hi,

    I deployed Brooklin with Kubernetes to migrate data from one Kafka cluster to another.
    For some reason the container dies and restarts during the migration. At the end of the migration I noticed that there are more messages in the target cluster than in the source cluster; most of the messages are duplicated.

    I'd like to know how I can ensure that no message will be duplicated in the same datastream, even when the pod restarts after a failure.

    skaur05
    @skaur05
    Hi @celiakung @ahmedahamid @somandal, we are planning to run Brooklin against a secure Kafka cluster using Strimzi with the SASL_SSL security protocol. I am using the config below:
    brooklin.server.transportProvider.kafkaTransportProvider.security.protocol=SASL_SSL
    brooklin.server.transportProvider.kafkaTransportProvider.sasl.mechanism=OAUTHBEARER
    brooklin.server.transportProvider.kafkaTransportProvider.sasl.login.callback.handler.class=io.strimzi.kafka.oauth.client.JaasClientOauthLoginCallbackHandler
    I have been able to set up Brooklin with the above settings and the datastream is created; authorization with Keycloak works fine and no errors are observed. But Brooklin is not consuming from the source cluster. I am using SASL for both the transport provider and the mirror maker connector consumer settings.
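    For comparison, the consumer-side counterparts of the three transport provider lines above would go under the mirroring connector's consumer prefix; the prefix and connector name below are assumptions to verify against your Brooklin version and setup:
    brooklin.server.connector.kafkaMirroringConnector.consumer.security.protocol=SASL_SSL
    brooklin.server.connector.kafkaMirroringConnector.consumer.sasl.mechanism=OAUTHBEARER
    brooklin.server.connector.kafkaMirroringConnector.consumer.sasl.login.callback.handler.class=io.strimzi.kafka.oauth.client.JaasClientOauthLoginCallbackHandler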
    Piotr Wikieł
    @wikp
    Hello all, has anyone succeeded in deploying Brooklin with a large number of datastreams without problems? By "large number" I mean at least 500. We have such an installation (with a custom transport provider to Google Pub/Sub) and have problems with service restarts. It simply takes forever for an instance to finish coordinating and become healthy.
    Also, to the maintainers: do you have plans to migrate to newer versions of Kafka?
    Mike Papetti
    @papetti23
    Can you conclude anything related to resources?
    To answer your question, I have not.
    I was looking for a Sybase solution for change data capture.
    Tomer Setty
    @RagingPuppies

    Hi all, after trying to investigate by myself for a while, I would like to see if someone is familiar with this issue and can point me in some direction. I'm using Brooklin 1.0.2.

    After restarting a broker or after broker failures (anything that triggers a leader election), it seems like some Brooklin TransportProviders can't self-heal and get stuck in a loop.
    Brooklin is set with "acks": "all",
    which allows the producer to produce only when the ISR count is at least min.insync.replicas;
    after checking, we are set with the default, which is min.insync.replicas=1, so the leader alone should be enough to allow producing.

    Brooklin is set with "pausePartitionOnError": "true",
    a flag indicating whether to auto-pause a topic partition if dispatching its data for delivery to the destination system fails.

    When the Brooklin producer, a.k.a. the TransportProvider, receives an error, it pauses the partition per the configuration "pauseErrorPartitionDurationMs": "180000" (3 minutes),
    the time duration (in milliseconds) to keep a topic partition paused after encountering send errors, before attempting to auto-resume.

    Looking at the Brooklin logs, I could find the following errors at the corresponding time of the issue:
    "Flush interrupted."
    "This server is not the leader for that topic-partition."
    "Partition rewind failed due to"
    which means that at this moment our Brooklin producer is trying to work against a non-leader partition.
    Roughly 5 minutes later, I witnessed the following error message:
    "Expiring 227 record(s) for <topic_name>-12: 302797 ms has passed since last append"
    After comparing this with the Brooklin configuration, I spotted "request.timeout.ms": "300000", which is 5 minutes.

    For the next 20 minutes we received NotLeaderForPartitionException, which means we did not produce data, and it seems like we did not consume either.
    Later on there is only one exception, "Producer is closed forcefully."
    Reading a bit online, someone said it may be that the producer can't keep up with the consumer;
    "producersPerTask" and "numProducersPerConnector" in our configuration should do the job.

    At the same time, we have another Datastream that replicates a different source to the SAME destination cluster and topics, sharing the same configurations; the failing Datastream has 8 more maxTasks.
    The source of the failing Datastream is a remote Kafka cluster while the working Datastream's source is a local Kafka cluster, and the local one does not fail at all, not even a single exception.

    brooklin configurations:
    https://pastebin.com/raw/kHACqwcA

    A manual service restart solved the issue a couple of times.
    Let me know if there's a need to share the full exceptions.
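    For reference, the pause/resume settings mentioned above can also be expressed as connector-level properties in the Brooklin server configuration; the prefix and connector name in this sketch are assumptions to verify against your version and the pastebin config:
    brooklin.server.connector.kafkaMirroringConnector.pausePartitionOnError=true
    brooklin.server.connector.kafkaMirroringConnector.pauseErrorPartitionDurationMs=180000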