These are chat archives for kite-sdk/kite

15th Sep 2014
Manthosh Kumar
@manthosh
Sep 15 2014 06:59
I think this is the problem
Hadoop.isHadoop1() returns true, even though I'm using Hadoop 2 jars
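A quick way to check which jar a class actually resolves from (and so which copy "wins" on the CLASSPATH) is to ask the system class loader for the class file as a resource. This is a hypothetical stand-alone helper, not part of Kite; the class names passed in `main` are just examples:

```java
import java.net.URL;

// Hypothetical diagnostic: report where a class would be loaded from,
// so you can see which jar wins on the classpath (e.g. whether the
// Hadoop classes come from a real Hadoop 2 jar or from a copy bundled
// inside avro-tools.jar).
public class ClasspathCheck {
    public static URL locate(String className) {
        // Turn the class name into a resource path and ask the system
        // class loader; null means the class is not on the classpath.
        String resource = className.replace('.', '/') + ".class";
        return ClassLoader.getSystemResource(resource);
    }

    public static void main(String[] args) {
        // Example class names to probe; swap in whatever you suspect
        // is being shadowed.
        System.out.println(locate("org.apache.hadoop.conf.Configuration"));
        System.out.println(locate("org.apache.hadoop.hdfs.DistributedFileSystem"));
    }
}
```

If the printed URL points inside avro-tools.jar instead of a hadoop-*.jar, that jar is shadowing the real Hadoop 2 classes.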
Manthosh Kumar
@manthosh
Sep 15 2014 07:40
The problem was that avro-tools.jar had higher priority in the CLASSPATH. I corrected the order, and now Hadoop.isHadoop1() returns false. But even now hdfs-site.xml is not loaded :(
Ryan Blue
@rdblue
Sep 15 2014 16:18
@manthosh, I think I know what the problem is
I have a line in the test that has a comment, "force FileSystem to load hdfs-site.xml"
the first time FileSystem is used, it loads that config file
Setting up a minicluster will cause it to load that config file, so the test isn't exactly like your environment, where the cluster is already running.
So I set up a minicluster on the right port in a different process, removed all of the FS references, and I could reproduce your error when I tried to load a dataset.
But adding the FileSystem.getLocal(c) line back fixed it
Could you try adding FileSystem.getLocal(new Configuration()) to the start of your app and see if that fixes the hdfs-site.xml loading issue?
We'll fix this in Kite by loading the local FS when the Datasets API is first used
Manthosh Kumar
@manthosh
Sep 15 2014 18:21
/**
 * Copyright 2013 Cloudera Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.kitesdk.examples.data;

import java.io.File;
import java.io.FileOutputStream;
import java.util.Random;

import org.apache.avro.generic.GenericData.Record;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.kitesdk.compat.Hadoop;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.DatasetDescriptor;
import org.kitesdk.data.DatasetWriter;
import org.kitesdk.data.Datasets;
import org.kitesdk.data.spi.DefaultConfiguration;

/**
 * Create a dataset on the local filesystem and write some user objects to it,
 * using Avro generic records.
 */
public class CreateUserDatasetGeneric extends Configured implements Tool {
  private static final String[] colors = { "green", "blue", "pink", "brown", "yellow" };

  @Override
  public int run(String[] args) throws Exception {

    System.out.println("Is hadoop 1? " + Hadoop.isHadoop1());

    // Create a dataset of users with the Avro schema
    DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
        .schema(new File("user.avsc"))
        .build();

    // Writing the conf to a file; hdfs-site.xml properties are not found in the output
    Configuration conf = DefaultConfiguration.get();
    conf.writeXml(new FileOutputStream("conf.xml"));

    Dataset<Record> users = Datasets.create(
        "dataset:hdfs:/tmp/data/test/users", descriptor, Record.class);

    // Get a writer for the dataset and write some users to it
    DatasetWriter<Record> writer = null;
    try {
      writer = users.newWriter();
      Random rand = new Random();
      GenericRecordBuilder builder = new GenericRecordBuilder(descriptor.getSchema());
      for (int i = 0; i < 100; i++) {
        Record record = builder.set("username", "user-" + i)
            .set("creationDate", System.currentTimeMillis())
            .set("favoriteColor", colors[rand.nextInt(colors.length)]).build();
        writer.write(record);
      }
    } finally {
      if (writer != null) {
        writer.close();
      }
    }

    return 0;
  }

  public static void main(String... args) throws Exception {
    FileSystem.getLocal(new Configuration());
    int rc = ToolRunner.run(new CreateUserDatasetGeneric(), args);
    System.exit(rc);
  }
}
This throws the same error as before.
hdfs-site.xml is still not loaded
Ryan Blue
@rdblue
Sep 15 2014 18:26
what Hadoop version are you using? I just found that this doesn't work in our default configuration, but it does when we run it against the CDH5 profile
Manthosh Kumar
@manthosh
Sep 15 2014 18:27
CDH5.1.0
Ryan Blue
@rdblue
Sep 15 2014 19:02
I just ran my test again for 5.1.0 and it works
after you load the local FS, what is the setting for fs.defaultFS?
Ryan Blue
@rdblue
Sep 15 2014 19:30
sorry, I was wrong. the fix does work for me in the default and cdh5 profiles
Manthosh Kumar
@manthosh
Sep 15 2014 19:39
hdfs://nameservice1
Since I'm not using Maven, could this be a dependency issue?
Manthosh Kumar
@manthosh
Sep 15 2014 19:56
It was a dependency issue indeed
Ryan Blue
@rdblue
Sep 15 2014 19:57
I'm not sure
did you get it working?
Manthosh Kumar
@manthosh
Sep 15 2014 19:57
Ordering. After moving hadoop-hdfs up in the CLASSPATH, everything worked
Even without FileSystem.getLocal(new Configuration())
Probably avro-tools was using Hadoop 1 classes, and so it was searching for hadoop-site.xml
Anyways, thanks all
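Since the fix came down to entry order, it can help to print the effective classpath in resolution order; earlier entries win when two jars contain the same class. A minimal stdlib-only sketch (`ClasspathOrder` is a hypothetical name, not a Kite class):

```java
import java.io.File;

// Print each classpath entry in resolution order. When two jars contain
// the same class (e.g. a hadoop-hdfs jar and an avro-tools jar that
// bundles Hadoop classes), the entry listed earlier is the one the JVM
// loads from.
public class ClasspathOrder {
    public static String[] entries() {
        return System.getProperty("java.class.path")
                     .split(File.pathSeparator);
    }

    public static void main(String[] args) {
        int i = 0;
        for (String entry : entries()) {
            System.out.println((i++) + ": " + entry);
        }
    }
}
```

Running this inside the failing app shows whether hadoop-hdfs really sits ahead of avro-tools.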
Ryan Blue
@rdblue
Sep 15 2014 19:59
No problem, I'm glad it's working
do you mind if I close that pull request?
Manthosh Kumar
@manthosh
Sep 15 2014 20:00
But I didn't give one
Ryan Blue
@rdblue
Sep 15 2014 20:01
oh, I must have been looking at a branch
Manthosh Kumar
@manthosh
Sep 15 2014 20:01
yeah