    Ahmed Elbahtemy
    @ahmedahamid
    @akaritakai could you please weigh in on @jaisonpjohn's question?
    rakeshsuresh1
    @rakeshsuresh1

    @rakeshsuresh1 All artifacts should be available on Bintray now.

    Awesome :)

    jaisonpjohn
    @jaisonpjohn

    @akaritakai could you please weigh in on @jaisonpjohn's question?

    Thanks @ahmedahamid

    Ahmed Elbahtemy
    @ahmedahamid
    my pleasure!
    rakeshsuresh1
    @rakeshsuresh1

    @ahmedahamid Could you take a look at this PR: linkedin/brooklin#637

    It was approved earlier but now has conflicts from other PRs merged to master; I have resolved the conflicts.

    Sam Obeid
    @sam-obeid
    Does anyone have an example or documentation on running Brooklin with two different SSL configurations for the source and destination clusters?
    Ahmed Elbahtemy
    @ahmedahamid
    @rakeshsuresh1 Sure. I pinged one of my coworkers for a review.
    Thomas Law
    @thomaslaw
    @sam-obeid you should be able to set the properties for the producer under the transportProvider configs. Likewise, you can set them for the consumer under the connector properties. Can you share your config (with any secrets removed)?
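A sketch of what @thomaslaw describes, with SSL passed through to the Kafka clients. The property prefixes below are assumptions based on the connector/transport provider names used elsewhere in this chat; the `ssl.*` keys are standard Kafka client settings, and the paths/passwords are placeholders:

```properties
# Destination (producer) SSL, under the transport provider configs
# (prefix is an assumption -- match it to your transport provider name):
brooklin.server.transportProvider.kafkaTransportProvider.producer.security.protocol=SSL
brooklin.server.transportProvider.kafkaTransportProvider.producer.ssl.truststore.location=/etc/ssl/dest.truststore.jks
brooklin.server.transportProvider.kafkaTransportProvider.producer.ssl.truststore.password=<secret>

# Source (consumer) SSL, under the connector configs:
brooklin.server.connector.kafkaMirroringConnector.consumer.security.protocol=SSL
brooklin.server.connector.kafkaMirroringConnector.consumer.ssl.truststore.location=/etc/ssl/source.truststore.jks
brooklin.server.connector.kafkaMirroringConnector.consumer.ssl.truststore.password=<secret>
```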
    Ahmed Elbahtemy
    @ahmedahamid
    @rakeshsuresh1 merged
    would you need us to release a new version?
    andrejzw
    @andrejzw
    Hi
    does Brooklin have support for AWS Kinesis today, so I can replicate data between Kinesis and Kafka?
    andrejzw
    @andrejzw
    Will you build an MQTT connector to exchange messages between an MQTT broker and Kafka?
    shanrd7
    @shanrd7
    Hello All
    Can we mirror topics under different names using Brooklin?
    rakeshsuresh1
    @rakeshsuresh1

    @ahmedahamid Would appreciate that very much :)

    would you need us to release a new version?

    shanrd7
    @shanrd7
    @ahmedahamid Please release a new version with all fixes
    Thank you
    shanrd7
    @shanrd7
    Is there a way we can start Brooklin on port 80?
    I have edited server.properties
    it still fails
    Shun-ping Chiu
    @Jyouping
    @rakeshsuresh1 I just saw your comments about StickyPartitionAssignmentStrategy. We don't really use the value maxPartitionPerTask to compute the partition assignment. The assignment is still computed in a round-robin fashion based on the number of partitions and tasks. For example, 20 partitions with 100 tasks will result in 20 tasks each working on a single partition. The reason to have maxPartitionPerTask is just to provide a check and fail hard, so that the user will be aware there might be too many partitions in a single task, which may cause an unexpected runtime ZooKeeper issue.
    Unless you have more than 2000 partitions in a single task, I think you can safely ignore that without setting that value.
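An illustrative sketch (not Brooklin's actual code) of the round-robin behavior described above: partitions are dealt out across tasks, and `maxPartitionPerTask` acts only as a hard sanity check rather than an input to the assignment.

```python
def assign_partitions(partitions, num_tasks, max_partitions_per_task=2000):
    # Deal partitions out round-robin across the fixed pool of tasks.
    assignment = {task: [] for task in range(num_tasks)}
    for i, partition in enumerate(partitions):
        assignment[i % num_tasks].append(partition)
    # maxPartitionPerTask is only a fail-hard check, not an assignment input.
    if any(len(p) > max_partitions_per_task for p in assignment.values()):
        raise ValueError("too many partitions assigned to a single task")
    return assignment

# 20 partitions across 100 tasks: the first 20 tasks each get one
# partition; the remaining 80 tasks receive nothing and sit idle.
result = assign_partitions(list(range(20)), 100)
```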
    adosapati
    @adosapati
    Hi, is there any way we can set the streams to mirror from the latest offset or the earliest offset, similar to what we have in Kafka MirrorMaker? Apologies if someone already asked this question; I haven't gone through the entire chat here.
    rakeshsuresh1
    @rakeshsuresh1

    @Jyouping I was addressing the scenario of having tasks on standby with the strategy. As you mentioned, 20 partitions with 100 tasks will result in 20 tasks working on 20 partitions, and 80 tasks will be on standby not performing any operation. Is there a reason to create all 100 tasks upfront?

    I was suggesting creating only the tasks necessary based on source partitions, and scaling up when needed by adding more tasks.

    For example, 20 partitions with 100 tasks will result in 20 tasks each working on a single partition

    Shun-ping Chiu
    @Jyouping
    @adosapati yes, you can set brooklin.server.connector.kafkaMirroringConnector.consumer.auto.offset.reset=earliest for example
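The setting @Jyouping mentions, as it would appear in the Brooklin server properties file (the connector name here follows the examples in this chat; valid values mirror the Kafka consumer's own `auto.offset.reset` options):

```properties
# Mirror from the earliest available offset in the source topic
brooklin.server.connector.kafkaMirroringConnector.consumer.auto.offset.reset=earliest
# ...or use =latest to mirror only new records from the tip
```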
    Shun-ping Chiu
    @Jyouping
    @rakeshsuresh1 Hi, Kafka consumers are quite efficient, and we should always allocate multiple partitions to a single consumer to increase resource utilization. As for StickyPartitionAssignmentStrategy, it assumes a scenario where we have such a large amount of traffic/topic-partitions that we have to create many tasks, in which case we may see Kafka rebalance issues due to the scale. Generally speaking, the number of partitions is much larger than the number of tasks in this case. Using StickyPartitionAssignment means Brooklin handles the assignment itself, and it needs to be used with brooklin.server.connector.kafkaMirroringConnector.enablePartitionAssignment = true to take full effect.
    Still, if you don't have more than 10k topic partitions with more than 100 tasks, you should just go with BroadcastStrategy, which relies on Kafka to perform the assignment and should be efficient enough.
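The pairing @Jyouping describes, as a properties-file fragment (the connector name follows the examples in this chat; only the flag itself is taken from the discussion above):

```properties
# StickyPartitionAssignmentStrategy means Brooklin computes the partition
# assignment itself; it only takes full effect together with this flag.
brooklin.server.connector.kafkaMirroringConnector.enablePartitionAssignment=true
```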
    adosapati
    @adosapati
    @Jyouping got it, Thank you!
    Lee Dongjin
    @dongjinleekr

    @ahmedahamid Hello. I am test-driving Brooklin for Kafka Mirroring task now. Here are some questions:

    1. As of 1.0.0, server health (i.e., /health/) returns the list of running connectors. And it seems there is no way yet to retrieve the list of available (i.e., on the classpath but not configured) connectors and transport providers. Right?
    2. If the user wants to add or modify the configuration of connectors or transport providers, the only way to do this is to edit the properties file and restart the Brooklin instance. Is that right?
    3. If the user wants to configure more than one destination Kafka cluster, they must add a transport provider for each destination Kafka cluster in the properties file. Right?

    IMHO, storing all connector & transport provider configurations in the properties file not only makes operation hard but also makes the Docker container hard to configure. (As you may remember, I maintain the Docker image of Brooklin.)

    Do you have any plan to make these dynamically configurable via some REST API?

    Ahmed Elbahtemy
    @ahmedahamid

    @dongjinleekr Hi, Lee.

    1. Correct.
    2. Yes.
    3. @Jyouping could you, please, confirm?

    Which aspects of operation do you find difficult given the current setup? Why do you need these configs to be dynamically modifiable at runtime?

    Ahmed Elbahtemy
    @ahmedahamid
    @shanrd7 @rakeshsuresh1 A new version (1.0.1) has just been published
    adosapati
    @adosapati
    Awesome news! I've been looking for it as well. Thank you!
    Ahmed Elbahtemy
    @ahmedahamid
    @adosapati you bet!
    @shanrd7 Are you getting an error like java.net.BindException: Permission denied:80? If the answer is yes, then you may find this useful: https://serverfault.com/questions/112795/how-to-run-a-server-on-port-80-as-a-normal-user-on-linux
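One common workaround from that serverfault thread, sketched here for convenience: grant the JVM the capability to bind privileged ports so Brooklin need not run as root, or redirect port 80 to an unprivileged port. The java path and the port 8080 below are assumptions; adjust for your install.

```shell
# Option 1: let the java binary bind ports < 1024 without root
sudo setcap 'cap_net_bind_service=+ep' "$(readlink -f "$(which java)")"

# Option 2: keep Brooklin on an unprivileged port (e.g. 8080) and
# redirect incoming traffic on port 80 to it
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080
```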
    adosapati
    @adosapati
    Morning! I have been testing Brooklin for the past week before we release to prod. I came across this bug when using a few server names in the source. It works fine when I define the same host as the destination; it only fails when I put it in the source.
    Here is the error message:
    Response status 400, serviceErrorMessage: msg=Invalid input params for create request; cause=kafkac14-cld1-g1-1.c.ma-azu-us-bd-dev.internal is not a valid hostname or ip; instance=kafkac1n3.dev.bo1.azudom.com-0000000066; id=df5a27;
    Could this be a bug in the hostname validation?
    shanrd7
    @shanrd7
    @ahmedahamid thank you
    Lee Dongjin
    @dongjinleekr

    @ahmedahamid

    Which aspects of operation do you find difficult given the current setup? Why do you need these configs to be dynamically modifiable at runtime?

    Well, it's not a critical problem. But from an operational view, I think separating Connector/TransportProvider settings from the cluster-wide ones would be better. Here is why: Connector/TransportProvider settings tend to be modified more frequently than cluster-wide settings. For example, every time a new system (e.g., a Kafka cluster or BigQuery) is added to the topology, the user must update the settings of all nodes and restart the Brooklin daemon. I experienced this situation often when I was running Kafka Connect, and I have already witnessed it again while test-driving Brooklin for Kafka mirroring.
    If someone would like to run Brooklin with a Docker image and Kubernetes, the situation becomes worse. If we could configure the Connectors/TransportProviders with a REST API call, we could avoid this problem.

    @celiakung What do you think?

    Ahmed Elbahtemy
    @ahmedahamid
    @adosapati our hostname validation logic (here) requires a hostname that conforms to RFC 1123 (see this Wikipedia page for more details). Which properties did you set for the source and destination?
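For reference, a minimal RFC 1123 hostname check, illustrative only and not Brooklin's actual validator: labels of 1-63 alphanumeric/hyphen characters that neither start nor end with a hyphen, joined by dots, at most 253 characters overall.

```python
import re

# One label: 1-63 chars of [A-Za-z0-9-], no leading or trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def is_rfc1123_hostname(host: str) -> bool:
    if not host or len(host) > 253:
        return False
    return all(_LABEL.match(label) for label in host.split("."))

# The hostname from the error message above passes this check:
is_rfc1123_hostname("kafkac14-cld1-g1-1.c.ma-azu-us-bd-dev.internal")  # True
```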
    adosapati
    @adosapati
    '{
      "name" : "c8_stream_delete",
      "connectorName" : "kafkaMirroringConnector",
      "transportProviderName" : "kafkaTransportProvider_gcp",
      "source" : {
        "connectionString" : "kafka://kafkac8n1.dev.ma1.azudom.com:9092/influx_metrics"
      },
      "Status" : "READY",
      "destination" : {
        "connectionString" : "kafka://kafkac14-cld1-g1-1.c.ma-azu-us-bd-dev.internal:9092/*"
      },
      "metadata" : {
        "datastreamUUID" : "addaeb93-4e26-47eb-89f8-888a4e9a7d1b",
        "group.id" : "c8_stream_delete",
        "owner" : "bigdataapp",
        "system.IsConnectorManagedDestination" : "true",
        "system.creation.ms" : "1568832237695",
        "system.destination.KafkaBrokers" : "kafkac14-cld1-g1-1.c.ma-azu-us-bd-dev.internal:9092",
        "system.destination.identityPartitioningEnabled" : "true",
        "system.reuseExistingDestination" : "false",
        "system.taskPrefix" : "c8_stream_delete"
      }
    }'
    The above code works perfectly fine. Source is : kafka://kafkac8n1.dev.ma1.azudom.com:9092/influx_metrics
    Destination: kafkac14-cld1-g1-1.c.ma-azu-us-bd-dev.internal:9092
    But it doesn't work the other way around when I flip the source and destination; it fails with "not a valid hostname"
    Thomas Law
    @thomaslaw
    @dongjinleekr probably not the answer you are looking for directly, but for transport providers, you can dynamically set multiple clusters for a single transport provider if you do not specify the brokers in the config and instead pass them as metadata in the datastream.
    I'm not sure what other transport-provider specific settings are also override-able this way within the stream metadata. If you have particular settings in mind I can check.
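Going by the metadata key visible in @adosapati's datastream earlier in this chat, the per-stream broker override @thomaslaw describes might look like the fragment below. This is a sketch: the broker hostnames are placeholders, and `system.destination.KafkaBrokers` is simply the key that appears in that example.

```json
{
  "metadata" : {
    "owner" : "bigdataapp",
    "system.destination.KafkaBrokers" : "dest-broker-1.example.com:9092,dest-broker-2.example.com:9092"
  }
}
```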
    Shun-ping Chiu
    @Jyouping
    @dongjinleekr That's correct, you need a transport provider for each destination cluster. However, we actually suggest that BMM sit near the destination clusters to reduce buffering in the producer. So ideally, it should be rare to configure multiple transport providers in a single BMM instance.
    Ahmed Elbahtemy
    @ahmedahamid
    @adosapati when you set the destination to kafkac14-cld1-g1-1.c.ma-azu-us-bd-dev.internal:9092, do you see that Brooklin is capable of producing to that destination successfully without any issues?
    adosapati
    @adosapati
    Yep, there are no issues when it is the destination.
    ./brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n scribe_stream -s "kafka://kafkac1n1.dev.ma1.azudom.com:9092/scribeLogs" -c kafkaMirroringConnector -t kafkaTransportProvider_azu -m '{"owner":"bigdataapp","system.reuseExistingDestination":"false"}' 2>/dev/null
    adosapati
    @adosapati
    That's the command I use to create new datastreams to that host. The transport provider kafkaTransportProvider_azu points to kafkac14-cld1-g1-1.c.ma-azu-us-bd-dev.internal:9092 in the config file
    Ahmed Elbahtemy
    @ahmedahamid
    @adosapati and were you able to confirm the destination Kafka cluster/topics received the data as expected?
    adosapati
    @adosapati
    Yes. Sorry, I should have mentioned that earlier. I see the messages flowing as expected.
    Lee Dongjin
    @dongjinleekr

    @thomaslaw

    but for transport providers, you can dynamically set multiple clusters for a single transport provider, if you do not specify the brokers in the config and instead pass them as metadata in the datastream.

    I see! Then the properties file effectively holds overridable default values. Is that right?

    @Jyouping Then, 1. the Brooklin cluster should be deployed close to where the target storage is located, and 2. changing transport provider settings frequently is not recommended. Right?