    Shankar
    @dallaybatta
    query.lang=${query_lang}
    query.per.page=${query_per_page}
    I had to change the query.per.page property and append yt (for the YouTube ingestion) to get it working. This works well in standalone mode. Can someone see why this is happening? I want to understand it before I dig further.
    Shankar
    @dallaybatta
    Luckily, I could see what was happening; the template contained the following property:
    page.query.parameter=${query}
    And I have to change this to
    page.query.parameter=${query_1}
    It took many hours to figure out ;) It is a candidate for the FAQs, an edge case ;)
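The clash above is a property-interpolation issue: `PropertiesConfiguration` resolves `${query}` against whatever property named `query` is in scope, so a template variable and a job variable that share one name silently shadow each other. A minimal stdlib-only sketch of that resolution behaviour (illustrative only, not Commons Configuration's actual implementation):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InterpolationDemo {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Resolve ${name} references in 'value' against the property map;
    // unknown names are left untouched.
    static String resolve(String value, Map<String, String> props) {
        Matcher m = VAR.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // The template and the job file both binding 'query' means whichever
        // is loaded last wins, silently changing the resolved value.
        props.put("query", "template-default");
        props.put("query_1", "india");
        System.out.println(resolve("${query}", props));
        System.out.println(resolve("${query_1}", props));
    }
}
```

Renaming the job-side variable to `query_1`, as above, gives each reference a distinct key to resolve against.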
    Hung Tran
    @htran1
    @dallaybatta, are you using the .pull extension in standalone mode? I think if you use the .conf extension then you should have the same problem.
    Shankar
    @dallaybatta
    I am using the .pull extension in standalone mode.
    We are using our own simple REST endpoint; the code looks like this:
    @RestLiCollection(name = "joblauncher", namespace = "gobblin.rest")
    public class JobLauncherResource extends CollectionResourceTemplate<String, JobLauncherInfo> {
    
        private static final Logger LOGGER = LoggerFactory.getLogger(JobLauncherResource.class);
    
        static HashMap<String, Properties> jobPropertyMap = new HashMap<String, Properties>();
    
        // export GOBBLIN_JOB_CONFIG_DIR=/home/vickey/development/gobblin/gobblin-dist/job-config-bpu/wikipedia.pull
        // http://localhost:8080/joblauncher?q=triggerjob&jobId=wikipedia&search=india&propfile=gobblin-standalone.properties
        @Finder("triggerjob")
    public List<JobLauncherInfo> search(@PagingContextParam PagingContext context,
                                            @QueryParam("jobId") String jobId,
                                            @QueryParam("propfile") String propertiesFile,
                                            @QueryParam("fbToken") @Optional String fbToken,
                                            @QueryParam("twToken") @Optional String twToken,
                                            @QueryParam("twTokenSecret") @Optional String twTokenSecret,
                                            @QueryParam("nvToken") @Optional String nvToken,
                                            @QueryParam("fkToken") @Optional String fkToken,
                                            @QueryParam("fkTokenSecret") @Optional String fkTokenSecret,
                                            @QueryParam("ytToken") @Optional String ytToken,                        
                                            @QueryParam("streamId") String streamId,
                                            @QueryParam("search") String search,
                                            @QueryParam("appPayLoad") String appPayLoad)
                throws Exception {
            String status = "jobId " + jobId + " search " + search;
            List<JobLauncherInfo> results = new ArrayList<JobLauncherInfo>();
            //String env_gobblin_work_dir = System.getenv("GOBBLIN_WORK_DIR");
            String env_gobblin_job_dir = System.getenv("GOBBLIN_JOB_CONFIG_DIR");
            /*
            if(env_gobblin_work_dir == null){
            throw new RuntimeException("Environment property for GOBBLIN_WORK_DIR is not set.");
            }
            */
            if (env_gobblin_job_dir == null) {
                throw new RuntimeException("Environment property for GOBBLIN_JOB_CONFIG_DIR is not set.");
            }
            String sysFilePropPath = getConfigDirectory(env_gobblin_job_dir) + "/conf/" + propertiesFile;
            String jobFilePropPath = env_gobblin_job_dir + "/" + jobId + ".pull";
    
            // Get it from the cache
            Properties sysProp = jobPropertyMap.get(propertiesFile);
            if (sysProp == null) {
                sysProp = ConfigurationConverter.getProperties(new PropertiesConfiguration(sysFilePropPath));
                jobPropertyMap.put(propertiesFile, sysProp);
            }
            // Get it from the cache
            Properties jobProp = jobPropertyMap.get(jobId);
            if (jobProp == null) {
                jobProp = ConfigurationConverter.getProperties(new PropertiesConfiguration(jobFilePropPath));
                jobPropertyMap.put(jobId, jobProp);
            }
            // Not sure if we need this one. This needs to be confirmed in the forums discussion.
            synchronized (this) {
                //ConfigurationConverter.getProperties(new PropertiesConfiguration(sysFilePropPath));
                //ConfigurationConverter.getProperties(new PropertiesConfiguration(jobFilePropPath));
                if (jobProp.get("page.query.parameter") != null) {
                    jobProp.put("page.query.parameter", search);
                }
    
                if (isNotEmpty(streamId)) {
                    jobProp.put("stream.id", streamId);
                }
                if (isNotEmpty(fbToken)) {
                    jobProp.put("facebook.access.token", fbToken);
                }
                if (isNotEmpty(twToken, twTokenSecret)) {
                    jobProp.put("twitter.access.token", twToken);
                    jobProp.put("access.token.secret", twTokenSecret);
                }
                if (isNotEmpty(nvToken
    I have local tests too, which run fine with the .pull file; these are unit tests.
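On the "not sure if we need this one" comment in the snippet above: the two get-then-put blocks on the static `HashMap` can race under concurrent requests (two threads may both load and insert). A sketch of a lock-free alternative, with a hypothetical `loadProps` standing in for `ConfigurationConverter.getProperties(new PropertiesConfiguration(path))`:

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

public class PropsCache {
    // computeIfAbsent loads each key at most once, atomically,
    // so no external synchronized block is needed for the cache itself.
    private static final ConcurrentHashMap<String, Properties> CACHE = new ConcurrentHashMap<>();

    static Properties get(String key) {
        return CACHE.computeIfAbsent(key, PropsCache::loadProps);
    }

    // Placeholder loader; the real code would parse the properties file here.
    private static Properties loadProps(String key) {
        Properties p = new Properties();
        p.setProperty("source", key);
        return p;
    }
}
```

Note this only protects the cache population; mutating the cached `Properties` afterwards (as the endpoint does with `jobProp.put(...)`) would still need its own coordination, so copying the cached object per request may be simpler.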
    Shankar
    @dallaybatta
    import java.io.IOException;
    import java.util.Properties;
    
    import org.apache.commons.configuration.ConfigurationConverter;
    import org.apache.commons.configuration.ConfigurationException;
    import org.apache.commons.configuration.PropertiesConfiguration;
    import org.testng.annotations.BeforeClass;
    import org.testng.annotations.Test;
    
    import com.google.common.io.Closer;
    import org.apache.gobblin.configuration.ConfigurationKeys;
    import org.apache.gobblin.runtime.JobLauncher;
    import org.apache.gobblin.runtime.JobLauncherFactory;
    import org.apache.gobblin.util.JobLauncherUtils;
    
    // Unit test that launches a job in standalone mode from a .pull file.
    public class YTTest2Impl {
    
        private Properties launcherProps = new Properties();
    
        @BeforeClass
        public void startUp() throws Exception {
            this.launcherProps = ConfigurationConverter.getProperties(new PropertiesConfiguration("gobblin-standalone.properties"));
        }
    
        @Test
        public void testLaunchJob2Impl() throws Exception {
            Properties jobProps = loadJobProps();
            jobProps.setProperty(ConfigurationKeys.JOB_NAME_KEY,
                    jobProps.getProperty(ConfigurationKeys.JOB_NAME_KEY) + "-testLaunchJob2Impl");
            try {
                runTest(jobProps);
            } finally {
                jobProps.clear();
            }
        }
    
        static Properties loadJobProps() throws IOException {
            Properties jobProps = new Properties();
            try {
                jobProps = ConfigurationConverter.getProperties(new PropertiesConfiguration("yt-v1.pull"));
            } catch (ConfigurationException e) {
                e.printStackTrace();
            }
            return jobProps;
        }
    
        public void runTest(Properties jobProps) throws Exception {
            String jobName = jobProps.getProperty(ConfigurationKeys.JOB_NAME_KEY);
            String jobId = JobLauncherUtils.newJobId(jobName);
            jobProps.setProperty(ConfigurationKeys.JOB_ID_KEY, jobId);
            Closer closer = Closer.create();
            try {
                JobLauncher jobLauncher = closer.register(JobLauncherFactory.newJobLauncher(this.launcherProps, jobProps));
                jobLauncher.launchJob(null);
            } finally {
                closer.close();
            }
        }
    
    }
    So there are three runtimes that I am working with:
    1) The custom REST endpoint which is plugged into Gobblin as a service.
    2) The unit test which I have shared; it uses standalone mode.
    3) I am now migrating to GaaS with Docker, as here: https://github.com/dallaybatta/gaas-docker
    The gaas-docker setup with the standalone cluster will be needed to address our scaling needs. The configuration/installation is complex, but gaas-docker makes development easier.
    I will have to do more work on the documentation; I will get to it once my migration is done, which is almost the case.
    Shankar
    @dallaybatta
    Does the flow status call work?
    I am trying it from a shell script,
    and it is failing with the following error on the GaaS node.
    java.lang.IllegalArgumentException: invalid version format: -X GET -H 'X-RESTLI-PROTOCOL-VERSION: 2.0.0' HTTP/1.1
        at io.netty.handler.codec.http.HttpVersion.<init>(HttpVersion.java:121)
        at io.netty.handler.codec.http.HttpVersion.valueOf(HttpVersion.java:76)
        at io.netty.handler.codec.http.HttpRequestDecoder.createMessage(HttpRequestDecoder.java:86)
        at io.netty.handler.codec.http.HttpObjectDecoder.decode(HttpObjectDecoder.java:219)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:372)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:358)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:350)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:372)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:358)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:571)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:512)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:426)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:398)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:877)
        at java.lang.Thread.run(Thread.java:748)
    The flow status call fails with the above error https://github.com/dallaybatta/gaas-docker/blob/master/test.sh
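For what it's worth, the "invalid version format" message shows the server received `-X GET -H 'X-RESTLI-PROTOCOL-VERSION: 2.0.0' HTTP/1.1` as its request line, which usually means the curl options were expanded as part of the URL string in the script (a shell quoting issue) rather than a GaaS problem. As a cross-check, the header can be sent from plain Java; the host, port, and flow identifiers below are placeholders:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class FlowStatusRequest {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint: adjust host, port, and flow group/name.
        URL url = new URL("http://localhost:6956/flowstatuses/(flowGroup:myGroup,flowName:myFlow)");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // The Rest.li protocol version travels as a normal HTTP header,
        // never as part of the request line.
        conn.setRequestProperty("X-RESTLI-PROTOCOL-VERSION", "2.0.0");
        // conn.getResponseCode() would perform the actual call; omitted here.
        System.out.println(conn.getRequestProperty("X-RESTLI-PROTOCOL-VERSION"));
    }
}
```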
    Abhishek Tiwari
    @abti
    the flow status will need a bit more setup to work
    @sv2000
    @sv2000
    tagging Sudarshan
    Shankar
    @dallaybatta
    @abti Ah I see, we need to get flowstatuses working, as that will help confirm when the flow execution is over.
    Abhishek Tiwari
    @abti
    basically, the flowstatus works but involves a bit of setup; it is configured to read and respond with status from a DB .. I think the implementation is not open sourced since it is based on our Espresso KV store
    we have a streaming job that reads the Kafka events from jobs being completed (that were triggered by GaaS) and writes to that DB
    that streaming job runs on the cluster .. so essentially eating our own dog food :smiley:
    Shankar
    @dallaybatta
    @abti There should be some way to plug in a custom implementation; if someone can share the starting information I can continue from it.
    Shankar
    @dallaybatta
    @all Anyone interested in running an existing application via GaaS can use gaas-docker. You can read this to do so: https://github.com/dallaybatta/gaas-docker/tree/master/examples
    Abhishek Tiwari
    @abti
    Arjun is probably going to work a bit on it, I will check with him
    Shankar
    @dallaybatta
    @abti ok thanks.
    Shankar
    @dallaybatta
    what does multi-hop mean exactly here (https://issues.apache.org/jira/browse/GOBBLIN-491)? Does it mean a flow across multiple GaaS deployments?
    Was looking at the multi-hop here https://en.wikipedia.org/wiki/Multi-hop_routing
    Abhishek Tiwari
    @abti
    let's chat about this tomorrow in our call.
    Shankar
    @dallaybatta
    Does GobblinClusterManager use the JobCatalog to schedule jobs when configured in GaaS? I am looking at the code and haven't found the link yet; I expect the flow invocation from the user to GaaS to be passed to the Cluster Master, which stores the JobSpec into the JobCatalog (org.apache.gobblin.runtime.job_catalog.NonObservingFSJobCatalog). Can anyone share some quick pointers?
    Shankar
    @dallaybatta
    Ok, got it. The EventBus handles the communication from StreamingJobConfigurationManager::fetchJobSpecs() via this call: postNewJobConfigArrival(jobSpec.getUri().toString(), jobSpec.getConfigAsProperties());
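That hand-off is a publish/subscribe pattern: the configuration manager posts an event and the scheduling side subscribes to it. A tiny stdlib-only sketch of the shape of the interaction (names are illustrative; Gobblin actually uses Guava's EventBus and a new-job-config-arrival event type):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class MiniEventBus {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    // Scheduler side: register interest in new job config arrivals.
    public void register(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    // Configuration-manager side: roughly analogous to
    // postNewJobConfigArrival(jobSpecUri, jobProperties).
    public void post(String jobSpecUri) {
        subscribers.forEach(s -> s.accept(jobSpecUri));
    }

    public static void main(String[] args) {
        MiniEventBus bus = new MiniEventBus();
        bus.register(uri -> System.out.println("scheduling job from " + uri));
        bus.post("fs:///jobs/wikipedia.pull");
    }
}
```

The design decouples the two components: the poster does not need to know who schedules the job, which is why the link is hard to find by following call sites alone.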
    Shivam Singh
    @isequal2
    I am trying to set up Gobblin on YARN, but I am getting this error: WARN [ZKHelixAdmin] Root directory exists. Cleaning the root directory: /GobblinYarn
    Exception in thread "main" java.net.ConnectException: Call From to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    What I understand from the documentation is that my fs.uri might not be set properly.
    If you have faced this, your help is appreciated.
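Connection refused on localhost:9000 usually means fs.uri points at an HDFS namenode that is not running (or is listening on a different host/port). As a hedged example, the relevant entries in the properties file might look like this, assuming a local namenode on the default port (adjust to your cluster):

```properties
# Hypothetical values - point these at your running namenode.
fs.uri=hdfs://localhost:9000
writer.fs.uri=${fs.uri}
state.store.fs.uri=${fs.uri}
```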
    Dawid Dawid
    @dawid.bartel85_gitlab
    Hello, could anyone help me solve this problem?
    2019-01-07 20:03:31 CET INFO  [DefaultQuartzScheduler_Worker-1] org.apache.gobblin.cluster.HelixUtils  - Waiting for job job_dot-i-json-dot-avro_1546887000021 to complete... State - IN_PROGRESS
    2019-01-07 20:03:31 CET INFO  [AMRM Heartbeater thread] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider  - Failing over to rm365
    2019-01-07 20:03:31 CET WARN  [AMRM Heartbeater thread] org.apache.hadoop.security.UserGroupInformation  - PriviledgedActionException as:gobblin (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1544639802277_223258_000001
    2019-01-07 20:03:31 CET WARN  [AMRM Heartbeater thread] org.apache.hadoop.ipc.Client$Connection$1  - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1544639802277_223258_000001
    2019-01-07 20:03:31 CET WARN  [AMRM Heartbeater thread] org.apache.hadoop.security.UserGroupInformation  - PriviledgedActionException as:gobblin (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1544639802277_223258_000001
    2019-01-07 20:03:31 CET INFO  [AMRM Heartbeater thread] org.apache.hadoop.io.retry.RetryInvocationHandler  - Exception while invoking allocate of class ApplicationMasterProtocolPBClientImpl over rm365 after 796 fail over attempts. Trying to fail over immediately.
    org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1544639802277_223258_000001
            at sun.reflect.GeneratedConstructorAccessor83.newInstance(Unknown Source)
            at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
            at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
            at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
            at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
            at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
            at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:483)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
            at com.sun.proxy.$Proxy18.allocate(Unknown Source)
            at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277)
            at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
    Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1544639802277_223258_000001
            at org.apache.hadoop.ipc.Client.call(Client.java:1502)
            at org.apache.hadoop.ipc.Client.call(Client.java:1439)
            at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
            at com.sun.proxy.$Proxy17.allocate(Unknown Source)
            at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
            ... 8 more
    2019-01-07 20:03:31 CET INFO  [AMRM Heartbeater thread] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider  - Failing over to rm180
    2019-01-07 20:03:31 CET INFO  [AMRM Heartbeater thread] org.apache.hadoop.io.retry.RetryInvocationHandler  - Exception while invoking allocate of class ApplicationMasterProtocolPBClientImpl over rm180 after 797 fail over attempts. Trying to fail over after sleeping for 2662ms.
    java.net.ConnectException: Call From node-35.hadoop.xxx/xx.xx.xx.xx to rm-1.xxx:8030 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
            at sun.reflect.GeneratedConstructorAccessor82.newInstance(Unknown Source)
            at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
            at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
            at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
            at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
            at org.apache.hadoop.ipc.Client.call(Client.java:1506)
            at org.apache.hadoop.ipc.Client.call(Client.java:1439)
            at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
            at com.sun.proxy.$Proxy17.allocate(Unknown Source)
            at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
            at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:483)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
            at com.sun.proxy.$Proxy18.allocate(Unknown Source)
            at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277)
            at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
    Caused by: java.net.ConnectException: Connection refused
            at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
            at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
            at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
            at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
            at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
            at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:648)
            at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:744)
            at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
            at org.apache.hadoop.ipc.Client.getConnection(Client.java:1555)
            at org.apache.hadoop.ipc.Client.call(Client.java:1478)
            ... 12 more
    2019-01-07 20:03:32 CET INFO  [DefaultQuartzScheduler_Worker-1] org.apache.gobblin.cluster.HelixUtils  - Waiting for job job_dot-i-json-dot-avro_1546887000021 to complete... State - IN_PROGRESS
    I have the application working in cluster mode, accessing Hadoop secured with Kerberos.
    This error occurs periodically after some time while the app is running.
    Sorry for my English ;)