    Pritam Sarkar
    Pulling from Maven 0.14.0
    Shirshanka Das
    Can you paste your job pull / conf file here?
    Pritam Sarkar
    Config is very similar to what is described in the doc. Below is a snippet:

    ```
    #### JOB
    job.description=Gobblin job to pull event logs from Kafka

    # ... converter, writer, publisher configs per fork (elided) ...
    ```
    EventConverter takes the protobuf events and converts them into ParquetGroup records to be written with ParquetDataWriterBuilder.
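    The elided per-fork section would, following the usual Gobblin fork convention, look roughly like the sketch below. This is an assumption-laden reconstruction: only ParquetDataWriterBuilder is named in the chat, and the `.0` branch suffixes and the EventConverter package are hypothetical placeholders.

    ```
    # Sketch only -- branch suffixes and the converter package are hypothetical
    fork.operator.class=org.apache.gobblin.fork.IdentityForkOperator
    fork.branches=1

    # Branch 0 converter chain: protobuf event -> ParquetGroup
    converter.classes.0=com.example.EventConverter

    # Branch 0 writer: ParquetGroup records via the Parquet writer builder
    writer.builder.class.0=org.apache.gobblin.writer.ParquetDataWriterBuilder
    writer.destination.type.0=HDFS
    writer.output.format.0=PARQUET

    data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher
    ```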
    Shirshanka Das
    and what is the stack trace you get?
    Pritam Sarkar
    Oops! Looks like my logs were cleared. I've looked through the code a couple of times, so let me write down the stack.
    PartitionedDataWriter (106) => CloseOnFlushWriterWrapper (71) => this.writer = writerSupplier.get();
    Shirshanka Das
    and what is the error you get on this?
    is it an NPE? or something else?
    Shirshanka Das
    @pritamsarkar86 : I’m not able to reproduce your issue. Getting a real stack trace would help.
    @shirshanka I am getting the below error; I am running on a Windows machine.
    Shirshanka Das
    @softkumsh are you building from the top-level directory using ./gradlew ?
    Pritam Sarkar
    @shirshanka will send the stack by tomorrow for the parquet issue. Thanks.
    Shirshanka Das
    thanks @pritamsarkar86, on my local build, I’m able to generate parquet files from avro etc … not seeing the issue you described
    Pritam Sarkar
    Please post your config
    Shirshanka Das
    I have some uncommitted changes related to a test source to generate in-memory json, so the source config won’t work for you
    Pritam Sarkar
    I was trying to write ParquetGroup directly using ParquetDataWriterBuilder, but it looks like you might be doing it via Avro conversion first.
    I see.
    Shirshanka Das
    I’m basically enhancing the current test source in gobblin to generate in-mem json / avro etc
    so that it is easy to hook it up to any writer / converter pipeline
    dynamic generation of protobuf messages is a bit tricky since proto relies on code-gen from proto file to java class .. unless I’m missing something
    Pritam Sarkar
    I was trying to save that computation step via converting to ParquetGroup . I already have the Protobuf java classes.
    Shirshanka Das
    yeah I will be committing a native proto writer for parquet as well
    so that you can go straight into the parquet writer with a Message class and not worry about converting
    however, your current approach of converting to Parquet Group and handing it off to the Parquet Writer should work (based on my experiments)
    Pritam Sarkar
    I took the same approach for ORC that you took for Parquet (msg -> json -> avro -> parquet), and it works fine. Will try this out for Parquet as well.
    For parquet, what converters are you chaining ?
    Shirshanka Das
    here is my config file: https://pastebin.com/8EKQ538H
    it includes a sequential test source, with config that won’t work in the default gobblin
    since they contain uncommitted changes
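    As context: Gobblin chains converters through a comma-separated `converter.classes` list applied left to right, so a msg -> json -> avro -> parquet chain would look roughly like the sketch below. The converter class names are hypothetical placeholders, not the ones from the linked pastebin config.

    ```
    # Sketch: converters run in order; class names here are hypothetical
    converter.classes=com.example.MessageToJsonConverter,com.example.JsonToAvroConverter,com.example.AvroToParquetConverter
    writer.builder.class=org.apache.gobblin.writer.ParquetDataWriterBuilder
    ```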
    Pritam Sarkar
    Thanks @shirshanka
    Shirshanka Das
    @pritamsarkar86 : in the meantime if you can work on building gobblin locally and using the jars from there, it would help once I check in my changes
    Pritam Sarkar
        at org.apache.gobblin.writer.PartitionedDataWriter$2.get(PartitionedDataWriter.java:148)
        at org.apache.gobblin.writer.PartitionedDataWriter$2.get(PartitionedDataWriter.java:141)
        at org.apache.gobblin.writer.CloseOnFlushWriterWrapper.<init>(CloseOnFlushWriterWrapper.java:71)
        at org.apache.gobblin.writer.PartitionedDataWriter.<init>(PartitionedDataWriter.java:140)
        at org.apache.gobblin.runtime.fork.Fork.buildWriter(Fork.java:534)
        at org.apache.gobblin.runtime.fork.Fork.buildWriterIfNotPresent(Fork.java:542)
        at org.apache.gobblin.runtime.fork.Fork.processRecord(Fork.java:502)
        at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:103)
        at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:86)
        at org.apache.gobblin.runtime.fork.Fork.run(Fork.java:243)
        at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Thread
    @shirshanka this is the crash I was seeing
    Shirshanka Das
    @pritamsarkar86 : do you mind pasting the entire stack trace onto pastebin and sending the link here.. this stack trace doesn’t tell me the exception that was thrown
    Pritam Sarkar
    Thanks @shirshanka , I will try this out. Meanwhile we tried ORC and that worked without any issues.
    Shirshanka Das
    @pritamsarkar86 there isn’t a published release for this yet, we are considering building and releasing nightlies to help with earlier integration testing
    Shirshanka Das
    @pritamsarkar86 : it would be great if you could file the issue you were facing with the Parquet writer on Gobblin JIRA so we could take a look in detail. Specifically the stack trace that contains the Exception being thrown.
    Hi, I'm trying to deploy Gobblin in YARN mode. After launching it with "bin/gobblin-yarn.sh --verbose start" it started on YARN, but I see an issue in the log while fetching metadata from Kafka. It is using an old Kafka client (Kafka08ConsumerClient), while my cluster runs Cloudera Kafka 3.1.0, which is Kafka 1.0.1. Could this happen because the Kafka client is too old? Do you have any experience with Gobblin on YARN? Does it work? There is no documentation in the Deployment/YARN architecture section.
    2020-01-28 01:54:00 PST WARN  [DefaultQuartzScheduler_Worker-1] org.apache.gobblin.kafka.client.Kafka08ConsumerClient  - Fetching topic metadata from broker czrtim1hr.oskarmobil.cz:9092 has failed 1 times.
    java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
        at kafka.utils.Utils$.read(Utils.scala:381)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
        at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
        at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
        at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:79)
        at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
        at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:91)
        at kafka.javaapi.consumer.SimpleConsumer.send(SimpleConsumer.scala:68)
        at org.apache.gobblin.kafka.client.Kafka08ConsumerClient.fetchTopicMetadataFromBroker(Kafka08ConsumerClient.java:147)
        at org.apache.gobblin.kafka.client.Kafka08ConsumerClient.getFilteredMetadataList(Kafka08ConsumerClient.java:131)
        at org.apache.gobblin.kafka.client.Kafka08ConsumerClient.getTopics(Kafka08ConsumerClient.java:98)
        at org.apache.gobblin.kafka.client.AbstractBaseKafkaConsumerClient.getFilteredTopics(AbstractBaseKafkaConsumerClient.java:78)
        at org.apache.gobblin.source.extractor.extract.kafka.KafkaSource.getFilteredTopics(KafkaSource.java:765)
        at org.apache.gobblin.source.extractor.extract.kafka.KafkaSource.getWorkunits(KafkaSource.java:212)
        at org.apache.gobblin.runtime.SourceDecorator.getWorkunitStream(SourceDecorator.java:81)
        at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:408)
        at org.apache.gobblin.cluster.GobblinHelixJobLauncher.launchJob(GobblinHelixJobLauncher.java:378)
        at org.apache.gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:487)
        at org.apache.gobblin.cluster.HelixRetriggeringJobCallable.runJobLauncherLoop(HelixRetriggeringJobCallable.java:203)
        at org.apache.gobblin.cluster.HelixRetriggeringJobCallable.call(HelixRetriggeringJobCallable.java:159)
        at org.apache.gobblin.cluster.GobblinHelixJobScheduler.runJob(GobblinHelixJobScheduler.java:228)
        at org.apache.gobblin.cluster.GobblinHelixJob.executeImpl(GobblinHelixJob.java:61)
        at org.apache.gobblin.scheduler.BaseGobblinJob.execute(BaseGobblinJob.java:58)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
    2020-01-28 01:54:01 PST INFO  [DefaultQuartzScheduler_Worker-1] kafka.utils.Logging$class  - Reconnect due to socket error: java.nio.channels.ClosedChannelException
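    If the legacy 0.8 client is indeed the problem against a 1.0.x broker, pointing the source at the newer consumer client is the usual fix. The property key below is quoted from memory of the Gobblin Kafka source documentation and should be verified against the Gobblin version in use:

    ```
    # Sketch: move off Kafka08ConsumerClient's SimpleConsumer-based metadata fetch.
    # Verify this key against your Gobblin version's Kafka source docs.
    gobblin.source.kafka.consumerClient.class=org.apache.gobblin.kafka.client.Kafka09ConsumerClient$Factory
    kafka.brokers=czrtim1hr.oskarmobil.cz:9092
    ```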
    Chhaya Vankhede

    I am new to Gobblin. I downloaded incubator-gobblin-gobblin_0.11.0.tar.gz. While installing Gobblin on Windows 10 following the Getting Started instructions, running ./gradlew :gobblin-distribution:buildDistributionTar gives the result below.
    ```
    FAILURE: Build failed with an exception.

    * What went wrong:
    Could not determine the dependencies of task ':gobblin-distribution:buildDistributionTar'.
    > Configuration with name 'dataTemplate' not found.

    * Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

    BUILD FAILED
    ```
    How can I solve it?

    Shirshanka Das
    @tichyj : Gobblin on Yarn does work, in fact we are running it at LinkedIn currently. Which version of gobblin are you using?
    @csvankhede : can you build from gobblin incubator master on GitHub? https://github.com/apache/incubator-gobblin
    Chhaya Vankhede
    @shirshanka building from the Gobblin incubator master on GitHub, I am getting:
    ```
    > Configure project :gobblin-distribution
    Using flavor:standard for project gobblin-distribution

    FAILURE: Build failed with an exception.

    * Where:
    Build file 'C:\Users\csvan\incubator-gobblin\gobblin-restli\gobblin-flow-config-service\gobblin-flow-config-service-api\build.gradle' line: 1

    * What went wrong:
    Could not compile build file 'C:\Users\csvan\incubator-gobblin\gobblin-restli\gobblin-flow-config-service\gobblin-flow-config-service-api\build.gradle'.
    > startup failed:
      build file 'C:\Users\csvan\incubator-gobblin\gobblin-restli\gobblin-flow-config-service\gobblin-flow-config-service-api\build.gradle': 1: unexpected token: .. @ line 1, column 1.
      1 error

    * Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

    * Get more help at https://help.gradle.org

    Deprecated Gradle features were used in this build, making it incompatible with Gradle 5.0.
    Use '--warning-mode all' to show the individual deprecation warnings.
    See https://docs.gradle.org/4.9/userguide/command_line_interface.html#sec:command_line_warnings
    ```
    Shirshanka Das
    this might be a windows specific build issue… do you have access to a linux / unix environment?
    Chhaya Vankhede
    @shirshanka No