    Shirshanka Das
    @shirshanka
    GOBBLIN_JOB_CONFIG_DIR=<job-conf-dir> <gobblin-distribution-dir>/bin/gobblin.sh service standalone start
    Ziauddin135
    @Ziauddin135
    @shirshanka Sure, I will do it the same way. Could you please check whether the job directory and config look fine?
    Shirshanka Das
    @shirshanka
    looks okay… I’m surprised you’re still getting the error which makes me think it might be pulling the file from a different place
    unfortunately the parse-error is not very specific about which file and the line number of the error… otherwise it would be easy to debug… will file an issue to fix that
    Ziauddin135
    @Ziauddin135
    Yes, that is strange. Is it because I am using the gobblin-standalone.sh script and not gobblin.sh to start the job? I will check. Also, I followed the steps that were given: built the tar file with a Gradle build, untarred it at some other location, and started it with gobblin-standalone.
    Ziauddin135
    @Ziauddin135
    Hi @shirshanka @jhsenjaliya, I am still getting the same error:
    [root@azmaster gobblin-dist]# /gobblin/gobblin-dist/bin/gobblin.sh service standalone start
    WARN: HADOOP_HOME is not defined. Gobblin Hadoop libs will be used in classpath.
    Started the Gobblin standalone process [pid: 7098] ... [DONE]
    [root@azmaster gobblin-dist]# cd logs
    [root@azmaster logs]# ls
    standalone.err standalone.out
    [root@azmaster logs]# cat standalone.err
    Sep 24, 2020 5:22:48 AM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
    SEVERE: Service JobScheduler [FAILED] has failed in the STARTING state.
    com.typesafe.config.ConfigException$BadPath: Reader: 30: Token not allowed in path expression: ':' (you can double-quote this token if you really want it here)
    at com.typesafe.config.impl.Parser.parsePathExpression(Parser.java:1095)
    at com.typesafe.config.impl.Parser.parsePathExpression(Parser.java:1049)
    at com.typesafe.config.impl.Parser.access$000(Parser.java:27)
    at com.typesafe.config.impl.Parser$ParseContext.tokenToSubstitutionExpression(Parser.java:375)
    at com.typesafe.config.impl.Parser$ParseContext.parseValue(Parser.java:518)
    at com.typesafe.config.impl.Parser$ParseContext.consolidateValueTokens(Parser.java:400)
    at com.typesafe.config.impl.Parser$ParseContext.parseObject(Parser.java:796)
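Because the parse error names neither the file nor a usable location, one way to narrow it down is to scan the config directories for substitutions that contain a colon. The following is a minimal sketch in Python, assuming the offending token is a `${env:...}`-style substitution being read by the Typesafe Config parser; the function name `find_colon_substitutions` and the list of file suffixes are hypothetical, not part of Gobblin.

```python
import re
from pathlib import Path

# Matches ${...} substitutions whose body contains a ':' (e.g. ${env:GOBBLIN_WORK_DIR}).
COLON_SUBSTITUTION = re.compile(r"\$\{[^}]*:[^}]*\}")


def find_colon_substitutions(config_dir, suffixes=(".conf", ".pull", ".job", ".properties")):
    """Return (path, line_number, line) for each line with a ':' inside ${...}.

    The suffix list is an assumption about which files are worth checking.
    """
    hits = []
    for path in sorted(Path(config_dir).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            if COLON_SUBSTITUTION.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Running it against both the job config directory and the distribution's conf directory gives a list of candidate files and line numbers to compare with the `Reader: 30` position in the error.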
    Shirshanka Das
    @shirshanka
    @Ziauddin135 : and you are setting GOBBLIN_JOB_CONFIG_DIR ?
    Ziauddin135
    @Ziauddin135
    Yes @shirshanka, if I do not set it we get this error:
    Error: GOBBLIN_WORK_DIR or GOBBLIN_JOB_CONFIG_DIR is not set!
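A quick check before launching can show which of the two variables named in that message are unset. This is a minimal Python sketch; the helper name `missing_gobblin_env` is hypothetical, and whether the launch script needs both variables or only one of them is not stated in this conversation.

```python
import os

# The two variable names are taken from the error message above.
GOBBLIN_ENV_VARS = ("GOBBLIN_WORK_DIR", "GOBBLIN_JOB_CONFIG_DIR")


def missing_gobblin_env(env=None):
    """Return the names from GOBBLIN_ENV_VARS that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in GOBBLIN_ENV_VARS if not env.get(name)]
```

Calling `missing_gobblin_env()` with no argument inspects the current process environment, so it should be run from the same shell session that will start Gobblin.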
    vikrambohra
    @vikrambohra
    @Ziauddin135 Can you paste the exact job config file you are using here?
    Ziauddin135
    @Ziauddin135
    Hi, I put the application.conf in the conf directory and the above error is gone. Now I am getting this error:
    '2020-09-25 10:02:50 UTC ERROR [JobScheduler-0] org.apache.gobblin.scheduler.JobScheduler$NonScheduledJobRunner 637 - Failed to run job GobblinPushToExternalTest
    org.apache.gobblin.runtime.JobException: Failed to run job GobblinPushToExternalTest
    at org.apache.gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:445)
    at org.apache.gobblin.scheduler.JobScheduler$NonScheduledJobRunner.run(JobScheduler.java:635)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.RuntimeException: Failed to create job launcher: java.lang.NullPointerException
    at org.apache.gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:158)
    at org.apache.gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:107)
    at org.apache.gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:85)
    at org.apache.gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:68)
    at org.apache.gobblin.scheduler.JobScheduler.buildJobLauncher(JobScheduler.java:450)
    at org.apache.gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:443)
    ... 4 more
    Caused by: java.lang.NullPointerException
    at org.apache.gobblin.runtime.AbstractJobLauncher.tryLockJob(AbstractJobLauncher.java:827)
    at org.apache.gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:192)
    at org.apache.gobblin.runtime.local.LocalJobLauncher.<init>(LocalJobLauncher.java:86)
    at org.apache.gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:144)
    ... 9 more'
    Ziauddin135
    @Ziauddin135
    Why are we getting null at this line? 'super(jobProps, metadataTags, instanceBroker);'
    Ziauddin135
    @Ziauddin135

    This is my application config:

    taskexecutor.threadpool.size=2

    taskretry.threadpool.coresize=1
    taskretry.threadpool.maxsize=2
    fs.uri=hdfs://ip:8020
    writer.fs.uri=hdfs://ip:8020
    state.store.fs.uri=hdfs://ip:8020

    writer.output.format=AVRO
    writer.staging.dir=${env:GOBBLIN_WORK_DIR}/task-staging
    writer.output.dir=${env:GOBBLIN_WORK_DIR}/task-output

    data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher
    data.publisher.final.dir=${env:GOBBLIN_WORK_DIR}/job-output
    data.publisher.replace.final.dir=false

    jobconf.dir=${env:GOBBLIN_JOB_CONFIG_DIR}
    jobconf.fullyQualifiedPath=file://${env:GOBBLIN_JOB_CONFIG_DIR}

    state.store.dir=${env:GOBBLIN_WORK_DIR}/state-store

    task.data.root.dir=${env:GOBBLIN_WORK_DIR}/task
    gobblin.runtime.commit.sequence.store.dir=${env:GOBBLIN_WORK_DIR}/commit-sequence-store

    qualitychecker.row.err.file=${env:GOBBLIN_WORK_DIR}/err

    job.lock.dir=${env:GOBBLIN_WORK_DIR}/locks

    metrics.log.dir=${env:GOBBLIN_WORK_DIR}/metrics

    metrics.enabled=true
    admin.server.enabled=true
    admin.server.port=9000

    rest.server.host=localhost
    rest.server.port=9090
    job.execinfo.server.enabled=false
    job.history.store.enabled=false
    task.status.reportintervalinms=5000
    jobconf.monitor.interval=30000

    @vikrambohra this is my job configuration:

    job.name=GobblinPushToExternalTest
    job.description=Gobblin job for pushing data to S3

    data.publisher.final.dir=finaldest

    gobblin.dataset.profile.class=gobblin.data.management.copy.CopyableGlobDatasetFinder
    gobblin.dataset.pattern=/sagar/*

    fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
    writer.encrypted.fs.s3a.access.key=key
    writer.encrypted.fs.s3a.secret.key=value/3yIJ3lmrpx2
    fs.s3a.buffer.dir=/tmp/distcp-buffer-dir
    writer.fs.uri=s3a://my-bucket-30aug
    gobblin.copy.recursive.update=true
    type=hadoopJava
    job.class=gobblin.azkaban.AzkabanJobLauncher
    extract.namespace=gobblin.copy
    source.class=gobblin.data.management.copy.CopySource
    converter.classes=gobblin.converter.IdentityConverter
    writer.builder.class=gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
    data.publisher.type=gobblin.data.management.copy.publisher.CopyDataPublisher
    distcp.persist.dir=/tmp/distcp-persist-dir
    task.maxretries=0
    workunit.retry.enabled=false

    work.dir=/tmp/
    state.store.dir=${work.dir}/state-store
    writer.staging.dir=${work.dir}/taskStaging
    writer.output.dir=${work.dir}

    Ziauddin135
    @Ziauddin135
    These two parameters are null or empty: instanceBroker==>null, metadataTags==>[]
    and the job is not getting launched because of this error:
    Caused by: java.lang.RuntimeException: Failed to create job launcher: java.lang.NullPointerException
    at org.apache.gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:158)
    at org.apache.gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:107)
    at org.apache.gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:85)
    at org.apache.gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:68)
    at org.apache.gobblin.scheduler.JobScheduler.buildJobLauncher(JobScheduler.java:450)
    at org.apache.gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:443)
    vikrambohra
    @vikrambohra
    Let me try to run it locally and get back to you
    Shirshanka Das
    @shirshanka
    @Ziauddin135 : yes my bad I didn’t notice it earlier.. the application.conf file is automatically picked up from gobblin-dist/conf/standalone/application.conf. You don’t need to copy it to the conf directory… that is just for your pipeline (job) configuration file.
    @Ziauddin135 : the oneshot quick app makes this a bit easier and less error-prone (https://gobblin.readthedocs.io/en/latest/user-guide/Gobblin-CLI/#the-oneshot-quick-app)
    Ziauddin135
    @Ziauddin135
    @shirshanka Sure, but now I am getting an error from the AbstractJobLauncher class, called from the LocalJobLauncher class, and I am not able to find what is causing it. Is it because I have set "job.class=gobblin.azkaban.AzkabanJobLauncher" in the job config? Or is it because these two parameters are null or empty: instanceBroker==>null, metadataTags==>[]? It is not clear from the standard-out log provided above.
    Shirshanka Das
    @shirshanka
    @Ziauddin135 : are you trying to run a MR job or just a standalone job?
    Ziauddin135
    @Ziauddin135
    @shirshanka Yes, I am running in standalone mode only, sir.
    vikrambohra
    @vikrambohra
    If you are running in standalone mode can you try with LocalJobLauncher?
    vikrambohra
    @vikrambohra
    I just tried the cli with distcp option to copy a file locally and it worked
    bin/gobblin cli run distcp /Users/vbohra/gobblin_work_dir/from /Users/vbohra/gobblin_work_dir/to
    vikrambohra
    @vikrambohra
    This distcp option uses a distcp.template
    Shirshanka Das
    @shirshanka
    @Ziauddin135 : the job.class should definitely NOT be AzkabanJobLauncher since you are just running in standalone mode
    Shirshanka Das
    @shirshanka
    @Ziauddin135 I was able to run distcp after modifying your pull file a bit.. you should comment out the job.class and type
    also data.publisher.final.dir should point to an absolute path .. don’t use relative paths.. so e.g. data.publisher.final.dir=/my-bucket-30aug/finaldest
    Shirshanka Das
    @shirshanka
    Also @Ziauddin135, all your class references in the pull file (e.g. gobblin.data.management.copy.CopyableGlobDatasetFinder) need to be prefixed with org.apache. (so org.apache.gobblin.data.management.copy.CopyableGlobDatasetFinder).
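The three suggestions above (comment out job.class and type, use an absolute data.publisher.final.dir, prefix class references with org.apache.) can be turned into a small check on a pull file. This is a minimal sketch in Python, not a Gobblin tool: the function name `lint_pull_file` is hypothetical, and two rules are assumptions. A class reference is taken to be a value starting with `gobblin.` under a key ending in `.class`, `.classes`, or `.type`, and a path counts as absolute if it starts with `/` or carries a URI scheme.

```python
CLASS_KEY_SUFFIXES = (".class", ".classes", ".type")  # assumption: keys that hold class names


def lint_pull_file(text):
    """Return (line_number, message) warnings for a properties-style pull file."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines without a key=value pair
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ("job.class", "type"):
            warnings.append((number, f"comment out '{key}' when running standalone"))
            continue
        if key == "data.publisher.final.dir":
            if not value.startswith("/") and "://" not in value:
                warnings.append((number, "use an absolute path for data.publisher.final.dir"))
        if key.endswith(CLASS_KEY_SUFFIXES) and value.startswith("gobblin."):
            warnings.append((number, f"prefix the class with org.apache.: org.apache.{value}"))
    return warnings
```

For the job configuration posted earlier, this would flag job.class, type, the relative data.publisher.final.dir, and each `gobblin.`-prefixed class name, while leaving extract.namespace=gobblin.copy alone because it is not a class reference.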
    andyblum
    @andyblum
    Hi all I am new here. I have a question. I want to use JUST the underlying metrics and job store portions of the framework. I am not interested in the whole way Source To Target mapping is done. Is there any interest in making the Operational subsystem a separate component? Would there be value to the Gobblin community as a whole for that portion to be contributed back as a separate packageable unit?
    Shirshanka Das
    @shirshanka
    @andyblum welcome! We’re migrating the gitter community to slack today!
    Let’s have the conversation there!
    Shirshanka Das
    @shirshanka
    @/all Gitter has served us well over the years, but we are moving the community conversations to Slack to help the community better.
    Yes, threads are important :)
    Here is the invite link for the channel: https://join.slack.com/t/apache-gobblin/shared_invite/zt-hkwu51id-aVxL3bvtLdi778YHFV1b6A
    See you there!
    Shirshanka Das
    @shirshanka
    @andyblum : let us know if you are having some trouble getting to the slack room.
    Shirshanka Das
    @shirshanka
    @/all Gitter has served us well over the years, but we are moving the community conversations to Slack to help the community better.
    Yes, threads are important :)
    Here is the invite link for the channel: https://join.slack.com/t/apache-gobblin/shared_invite/zt-hkwu51id-aVxL3bvtLdi778YHFV1b6A
    See you there!
    I’ll broadcast this message here for a few days (maybe using exponential backoff ;)) until most folks have had a chance to see it
    priyasharma-crypto
    @priyasharma-crypto
    Hi everyone. I have a query regarding Hive-to-Hive copy using Gobblin.
    I am getting this error:
    2020-11-03 14:36:06 IST ERROR [request-allocator-0] org.apache.gobblin.data.management.copy.hive.UnpartitionedTableFileSet 84 - Source and target table are not compatible. Aborting copy of table externaltable
    org.apache.gobblin.data.management.copy.hive.HiveTableLocationNotMatchException: Desired target location file:/tmp/gobblintarget-copy and already registered target location hdfs://localhost:8020/tmp/gobblin-copy do not agree.
    Can anyone help with this, please?
    Has anyone tried a hive2hive copy before?
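To see where the two locations in that exception differ, they can be split into URI components. This is a minimal Python sketch using the standard library; the helper name `location_mismatch` is hypothetical, and it does not claim to reproduce how Gobblin itself compares locations.

```python
from urllib.parse import urlparse


def location_mismatch(desired, registered):
    """Return the URI components (scheme, authority, path) that differ."""
    d, r = urlparse(desired), urlparse(registered)
    diffs = {}
    if d.scheme != r.scheme:
        diffs["scheme"] = (d.scheme, r.scheme)
    if d.netloc != r.netloc:
        diffs["authority"] = (d.netloc, r.netloc)
    if d.path.rstrip("/") != r.path.rstrip("/"):
        diffs["path"] = (d.path, r.path)
    return diffs


diffs = location_mismatch(
    "file:/tmp/gobblintarget-copy",
    "hdfs://localhost:8020/tmp/gobblin-copy",
)
```

For the two locations in the log, all three components differ: the desired target is on the local file system (`file`) under /tmp/gobblintarget-copy, while the table is already registered on HDFS at localhost:8020 under /tmp/gobblin-copy.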
    Shirshanka Das
    @shirshanka
    @priyasharma-crypto : we have moved to slack. Can you join us here? https://join.slack.com/t/apache-gobblin/shared_invite/zt-isebqjkx-o9zPBz25tZbVe624s8MqQA
    Dawid
    @dawiddbb
    Hi all. I have deployed a Gobblin application that uses Hive registration on a secured Hadoop cluster. When Gobblin tries to initialize HiveMetaStoreClient, the exception below is thrown:
    2020-11-18 13:46:21 CET ERROR [HiveMetaStoreBasedRegister] org.apache.thrift.transport.TSaslTransport  - SASL negotiation failure
    javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
        at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
        at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:472)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1560)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:67)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:82)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:73)
        at org.apache.gobblin.hive.HiveMetaStoreClientFactory.createMetaStoreClient(HiveMetaStoreClientFactory.java:103)
        at org.apache.gobblin.hive.HiveMetaStoreClientFactory.create(HiveMetaStoreClientFactory.java:109)
        at org.apache.gobblin.hive.HiveMetaStoreClientFactory.create(HiveMetaStoreClientFactory.java:55)
        at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
        at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:868)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
        at org.apache.gobblin.util.AutoReturnableObject.<init>(AutoReturnableObject.java:38)
        at org.apache.gobblin.hive.HiveMetastoreClientPool.getClient(HiveMetastoreClientPool.java:135)
        at org.apache.gobblin.hive.metastore.HiveMetaStoreBasedRegister.registerPath(HiveMetaStoreBasedRegister.java:138)
        at org.apache.gobblin.hive.HiveRegister$1.call(HiveRegister.java:113)
        at org.apache.gobblin.hive.HiveRegister$1.call(HiveRegister.java:97)
        at org.apache.gobblin.util.executors.MDCPropagatingCallable.call(MDCPropagatingCallable.java:42)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
        at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
        at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
        at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
        at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSConte
    Does Gobblin support Kerberos-secured Hadoop when communicating with the Hive metastore?
    On the ApplicationMaster side I see only the HDFS_DELEGATION_TOKEN and YARN_AM_RM_TOKEN tokens.
    Dawid
    @dawiddbb
    Could someone help me?
    Thanks!
    Sudarshan Vasudevan
    @sv2000
    We have moved all our communication from Gitter to Slack
    Dawid
    @dawiddbb
    Thank you, @sv2000.