These are chat archives for kite-sdk/kite

12th
Sep 2014
Manthosh Kumar
@manthosh
Sep 12 2014 03:40
Does the regular configuration take hive-site.xml from the CLASSPATH? I tried keeping the configuration files on the CLASSPATH, but it didn't work.
Joey Echeverria
@joey
Sep 12 2014 17:16
Make sure you're putting the directory that contains the config files, not the files themselves on the classpath
for example, if hive-site.xml is at /home/joey/hive/hive-site.xml, then you want to put /home/joey/hive in the classpath, not /home/joey/hive/hive-site.xml
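A minimal sketch of the directory-versus-file advice above, assuming a throwaway temp directory and a stand-in hive-site.xml (both hypothetical, as is the class name). It builds a URLClassLoader with a single classpath entry and looks the file up by name. URLClassLoader treats an entry ending in "/" as a directory and anything else as a JAR, so the file-as-entry case is expected to find nothing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ClasspathEntryDemo {

    /** Looks up a resource through a loader whose only classpath entry is the given path. */
    public static URL lookup(Path classpathEntry, String resourceName) {
        try {
            // toUri() appends a trailing "/" for an existing directory, which is how
            // URLClassLoader tells a directory entry apart from a JAR entry.
            URL[] entries = { classpathEntry.toUri().toURL() };
            // A null parent keeps the application's own classpath out of the lookup.
            try (URLClassLoader loader = new URLClassLoader(entries, null)) {
                return loader.getResource(resourceName);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a temp directory holding a stand-in hive-site.xml (hypothetical content). */
    public static Path createConfigDir() {
        try {
            Path dir = Files.createTempDirectory("hive-conf");
            Files.write(dir.resolve("hive-site.xml"),
                    "<configuration/>".getBytes(StandardCharsets.UTF_8));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = createConfigDir();
        // Directory as the classpath entry: the file inside it can be found.
        System.out.println("directory entry: " + lookup(dir, "hive-site.xml"));
        // The file itself as the classpath entry: treated as a (broken) JAR.
        System.out.println("file entry:      "
                + lookup(dir.resolve("hive-site.xml"), "hive-site.xml"));
    }
}
```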
Manthosh Kumar
@manthosh
Sep 12 2014 18:26
It doesn't work when I try with Eclipse. It throws "Incomplete HDFS URI, no host: hdfs:/"
Manthosh Kumar
@manthosh
Sep 12 2014 18:52
Even when I try to write directly to HDFS
Manthosh Kumar
@manthosh
Sep 12 2014 18:58
It doesn't take core-site.xml from the CLASSPATH
Joey Echeverria
@joey
Sep 12 2014 18:59
hrm
can you confirm that the classpath is set correctly?
maybe do something like this:
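import java.net.URL;
import java.net.URLClassLoader;

// Get the system class loader
ClassLoader sysClassLoader = ClassLoader.getSystemClassLoader();

// Get its classpath URLs (the cast assumes the system class loader is a
// URLClassLoader, which holds on Java 8 and earlier)
URL[] urls = ((URLClassLoader) sysClassLoader).getURLs();

// Print every classpath entry
for (int i = 0; i < urls.length; i++) {
    System.out.println(urls[i].getFile());
}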
you can also see if you can find it on the classpath
Manthosh Kumar
@manthosh
Sep 12 2014 19:01
I already tried that
bin folder is in classpath
Joey Echeverria
@joey
Sep 12 2014 19:02
getClass().getResource("/core-site.xml");
that should return the URL to core-site.xml
if it returns null, something is wrong with the classpath
it might also be finding an empty file that's also somehow in the classpath
Manthosh Kumar
@manthosh
Sep 12 2014 19:04
It returns the file location :D
but it's not working
Ryan Blue
@rdblue
Sep 12 2014 19:08
it will return the first core-site.xml that it finds
what you may need to do is get the list of URLs with getResources()
then you can print all of the core-site.xml files that are found. if yours comes after any other, then it won't work.
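A minimal sketch of the getResources() check suggested above: it lists every copy of core-site.xml that a class loader can see, in search order. The class and method names are hypothetical, and using the thread's context class loader is an assumption about which loader matters in a given setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

class ResourceLister {

    /** Returns every copy of the named resource visible to the loader, in search order. */
    public static List<URL> findAll(ClassLoader loader, String resourceName) {
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        List<URL> copies = findAll(loader, "core-site.xml");
        if (copies.isEmpty()) {
            System.out.println("core-site.xml is not visible to this class loader");
        }
        // The first entry is the one a plain getResource() call would return.
        for (int i = 0; i < copies.size(); i++) {
            System.out.println((i == 0 ? "first: " : "later: ") + copies.get(i));
        }
    }
}
```

If more than one line is printed, the copy marked "first" is the one to inspect.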
Manthosh Kumar
@manthosh
Sep 12 2014 19:10
But even the Configuration object will take the first core-site.xml, right?
Ryan Blue
@rdblue
Sep 12 2014 19:10
I believe so. If your core-site.xml isn't first, then it will be ignored.
Manthosh Kumar
@manthosh
Sep 12 2014 19:13
getClass().getResource("/core-site.xml"); returned the URL of the core-site.xml in the bin folder, which has the correct configuration
Ryan Blue
@rdblue
Sep 12 2014 19:24
it looks like the classloader may not match then
Configuration uses the classloader from the current thread, or possibly Configuration.class to load the default resources
so could you get the resources for those?
also, it doesn't use /core-site.xml
it omits the "/" so you should try that, too
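A rough sketch of the check described above: look the same name up, without a leading "/", through each class loader that might be consulted. LoaderProbe is a hypothetical name. In a real project the third lookup would use Configuration.class.getClassLoader(); the probe's own class stands in here so the sketch runs without Hadoop on the classpath.

```java
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

class LoaderProbe {

    /** Looks the resource up through each candidate loader; a null value means "not found". */
    public static Map<String, URL> probe(String resourceName) {
        Map<String, URL> found = new LinkedHashMap<>();

        found.put("system", ClassLoader.getSystemClassLoader().getResource(resourceName));

        // The context class loader may be null on some threads.
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        found.put("thread context", context == null ? null : context.getResource(resourceName));

        // Stand-in for the loader of the class that reads the configuration.
        ClassLoader own = LoaderProbe.class.getClassLoader();
        found.put("LoaderProbe.class", own == null ? null : own.getResource(resourceName));

        return found;
    }

    public static void main(String[] args) {
        // ClassLoader.getResource takes a classpath-relative name, so no leading "/".
        probe("core-site.xml").forEach((label, url) -> System.out.println(label + " -> " + url));
    }
}
```

If the loaders disagree, the one that returns null or a different URL is the place to look.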
Manthosh Kumar
@manthosh
Sep 12 2014 19:27
Without "/", it returns null
Ryan Blue
@rdblue
Sep 12 2014 19:28
well, I guess that's progress :)
Manthosh Kumar
@manthosh
Sep 12 2014 19:30
ClassLoader cl = ClassLoader.getSystemClassLoader();
System.out.println(cl.getResource("core-site.xml"));

returns the correct location

ClassLoader cl = ClassLoader.getSystemClassLoader();
System.out.println(cl.getClass().getResource("core-site.xml"));

returns null

When I use '/', it's vice versa
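The difference between the two snippets above comes down to how names are resolved. ClassLoader.getResource always takes a classpath-relative name with no leading "/". Class.getResource treats a leading "/" as absolute and resolves any other name relative to the package of the class, and cl.getClass() is the class loader's own implementation class rather than an application class. The sketch below mirrors that documented rule in a hypothetical resolveName helper; it is an illustration, not the JDK's code, and it ignores array classes.

```java
import java.net.URL;

class ResourceNameDemo {

    /**
     * Mirrors the documented name-resolution rule of Class.getResource:
     * a leading "/" marks an absolute name, anything else is resolved
     * relative to the package of the anchor class.
     */
    public static String resolveName(Class<?> anchor, String name) {
        if (name.startsWith("/")) {
            return name.substring(1);
        }
        String className = anchor.getName();
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            return name; // default package: nothing to prepend
        }
        return className.substring(0, lastDot).replace('.', '/') + "/" + name;
    }

    public static void main(String[] args) {
        ClassLoader cl = ClassLoader.getSystemClassLoader();

        // Classpath-relative lookup through the loader itself.
        URL viaLoader = cl.getResource("core-site.xml");
        System.out.println("cl.getResource            -> " + viaLoader);

        // Relative lookup through the loader's implementation class: the name
        // is resolved against that class's package, not the classpath root.
        System.out.println("name actually searched    -> "
                + resolveName(cl.getClass(), "core-site.xml"));
        System.out.println("cl.getClass().getResource -> "
                + cl.getClass().getResource("core-site.xml"));
    }
}
```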
Ryan Blue
@rdblue
Sep 12 2014 19:34
those two look the same to me, what's the difference?
Manthosh Kumar
@manthosh
Sep 12 2014 19:34
Corrected now
Ryan Blue
@rdblue
Sep 12 2014 19:35
thanks
is it really cl.getClass().getResource ?
Manthosh Kumar
@manthosh
Sep 12 2014 19:36
Yeah. It is. I checked again
Ryan Blue
@rdblue
Sep 12 2014 19:37
ok, so the system classloader works when there is no /
Manthosh Kumar
@manthosh
Sep 12 2014 19:37
Exactly
Ryan Blue
@rdblue
Sep 12 2014 19:39
So your thread's classloader must not be searching the classpath the same way
can you print your thread's classloader's classpath?
it should be
Thread.currentThread().getContextClassLoader().
Manthosh Kumar
@manthosh
Sep 12 2014 19:44
Thread.currentThread().getContextClassLoader() returns the correct location without "/" and null with "/"
Ryan Blue
@rdblue
Sep 12 2014 19:47
ok, so both Thread classloaders and the system classloader appear to find it.
are you using a version of Kite with DefaultConfiguration?
and can you run DefaultConfiguration.get() to see what it looks like?
because it appears that it should be finding your Configuration without a problem
Manthosh Kumar
@manthosh
Sep 12 2014 19:48
I'm not using the one with DefaultConfiguration.get()
Ryan Blue
@rdblue
Sep 12 2014 19:48
no problem
Manthosh Kumar
@manthosh
Sep 12 2014 19:48
By DefaultConfiguration.get(), do you mean the one I created?
Ryan Blue
@rdblue
Sep 12 2014 19:49
no, the one that's in master
are you able to send me a dump of the Configuration you get when you run new Configuration()?
actually, are there configuration settings that you know should be in there?
if so, could you check that they are or aren't in a new Configuration()?
Manthosh Kumar
@manthosh
Sep 12 2014 20:06
Here's the dump of DefaultConfiguration.get()
Ryan Blue
@rdblue
Sep 12 2014 20:06
great. what is HDFS supposed to be?
Manthosh Kumar
@manthosh
Sep 12 2014 20:06
It turns out that this has the correct Configuration changes
Ryan Blue
@rdblue
Sep 12 2014 20:07
hdfs://nameservice1?
Manthosh Kumar
@manthosh
Sep 12 2014 20:07
Yeah
Ryan Blue
@rdblue
Sep 12 2014 20:07
ok, that's a good sign
there's a chance that the environment isn't set up when Kite loads its defaults
if that's the case, then the DefaultConfiguration changes in master should fix this problem
Manthosh Kumar
@manthosh
Sep 12 2014 20:08
Meaning?
Ryan Blue
@rdblue
Sep 12 2014 20:09
Can you try the Datasets.load command you were running before really quick, just to make sure that's the case?
Manthosh Kumar
@manthosh
Sep 12 2014 20:10
With DefaultConfiguration changes in master??? Now I tested with DefaultConfiguration class alone
Manthosh Kumar
@manthosh
Sep 12 2014 21:02
It didn't work even after trying with the source from master
Manthosh Kumar
@manthosh
Sep 12 2014 21:16
Here's the problem
It turns out that the Configuration object takes core-site.xml from the CLASSPATH but not hdfs-site.xml. That's why I think Kite should have the ability to add resources
Joey Echeverria
@joey
Sep 12 2014 21:17
oh, so it's not getting the configuration of the nameservice?
I think the correct solution is to always load hdfs-site if it's on the CLASSPATH
we should have been doing that since we're using Configuration specifically to talk to HDFS
Ryan Blue
@rdblue
Sep 12 2014 21:19
Kite needs to do that?
Seems reasonable to me
But isn't there something in HDFS that adds it to the defaults?
when HDFS is loaded, it should call Configuration#addDefaultResource()
I think what might be happening is that we call new Configuration() before the first call to HDFS
Joey Echeverria
@joey
Sep 12 2014 21:21
I think you're right
no
when you add a default resource
I thought it would reload in all Configuration objects
but maybe not
Ryan Blue
@rdblue
Sep 12 2014 21:22
yeah, it does if they have defaults configured
so we don't need to worry about ordering
but there are two versions of Kite to worry about
up until 0.17.0, we get a default configuration and then get HDFS information from that
with the addition of DefaultConfiguration, we don't look up HDFS information until we need to inside the FileSystem.get(...) call
so perhaps this works correctly on master?
Manthosh Kumar
@manthosh
Sep 12 2014 21:27
It didn't work for me even with master source
Ryan Blue
@rdblue
Sep 12 2014 21:28
let me see if I can get this running in a test
Manthosh Kumar
@manthosh
Sep 12 2014 21:32
Ok. Thanks
wow, I didn't know gitter would insert the whole gist
that's neat
anyway...
I added /tmp/test-classpath to the surefire plugin's config and ran that test
it gets a default config, checks that hdfs-site.xml has not been loaded
then it creates one, verifies that it is correctly loaded on the first FileSystem.get() call
and proceeds to make sure that the settings are honored by starting up HDFS, and creating/deleting a dataset
so it looks like on master, everything works as expected