These are chat archives for thunder-project/thunder

15th
Mar 2015
wolfbill
@wolfbill
Mar 15 2015 02:14
@laserson Hi man. Are you online now?
Uri Laserson
@laserson
Mar 15 2015 02:14
yip
wolfbill
@wolfbill
Mar 15 2015 02:17
Hi. I'd like to submit a Thunder test to my Spark cluster. I ran:
[spark@spark1 ~]$ spark-submit --master spark://spark1:7077 --deploy-mode cluster --class=/usr/local/spark110hadoop24/lib/spark-assembly-1.1.1-hadoop1.0.4.jar --py-files /usr/local/src/thunder/python/test/test_data.py
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Error: Must specify a primary resource (JAR or Python file)
Run with --help for usage help or --verbose for debug output
I don't know where the problem is. Could you tell me?
Uri Laserson
@laserson
Mar 15 2015 02:18
do you know about "thunder-submit"?
wolfbill
@wolfbill
Mar 15 2015 02:18
Yes I know
but only a little. I'll read more about it.
Uri Laserson
@laserson
Mar 15 2015 02:19
so, thunder-submit probably handles any necessary classpath issues
also, based on your error message, it appears you never actually told spark-submit what you want it to run
i.e., you generally need to tell spark-submit what class you want to run, or what python script to run
in your --class argument, you actually pointed to a jar
and iirc, you don't need to specify that jar anyway
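In other words, spark-submit expects its options first and then a primary resource, which for a Python job is the script itself. `--class` names a JVM main class for JAR applications, and `--py-files` only ships extra dependencies. Below is a minimal sketch that assembles such a command as an argument list. `build_submit_argv` is a hypothetical helper, not part of Spark or Thunder. It also leaves out `--deploy-mode cluster` because, as far as I know, Spark's standalone mode did not support cluster deploy mode for Python applications.

```python
from typing import List, Optional, Sequence


def build_submit_argv(script: str,
                      master: Optional[str] = None,
                      py_files: Sequence[str] = (),
                      app_args: Sequence[str] = ()) -> List[str]:
    """Assemble a spark-submit command line for a Python application.

    Hypothetical helper, not part of Spark or Thunder. Options come first;
    the script is the primary resource and must be the first positional
    argument. Anything after it is passed to the script, not to spark-submit.
    """
    argv = ["spark-submit"]
    if master is not None:
        argv += ["--master", master]
    if py_files:
        # --py-files takes a comma-separated list of extra .py/.zip/.egg
        # dependencies; it does not name the script to run.
        argv += ["--py-files", ",".join(py_files)]
    argv.append(script)      # the primary resource
    argv.extend(app_args)    # arguments for the script itself
    return argv
```

For example, `build_submit_argv("/usr/local/src/thunder/python/test/test_data.py", master="spark://spark1:7077")` puts the script last, after `--master`. On a machine with spark-submit on the PATH, the list can be handed straight to `subprocess.run(argv, check=True)`, which sidesteps shell quoting.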
wolfbill
@wolfbill
Mar 15 2015 02:28
Hi, when I run:
/usr/local/src/thunder/python/bin/thunder-submit --master spark://spark1:7077 --deploy-mode cluster --py-files /usr/local/src/thunder/python/test/test_data.py --verbose
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Error: Cannot load main class from JAR: spark://spark1:7077
Run with --help for usage help or --verbose for debug output
Uri Laserson
@laserson
Mar 15 2015 02:32
try something like:
export MASTER=spark:// ...
thunder-submit .../test_data.py
set master in an env var
don't bother with --deploy-mode or --py-files
i think
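Laserson's suggestion can be sketched in Python, under the assumption (his "i think") that thunder-submit picks up the master URL from the MASTER environment variable. `thunder_submit_call` is a hypothetical helper. It copies the environment so MASTER is set only for the one child process, and it passes nothing but the script path on the command line. The URL used below comes from wolfbill's first command, since the one in the suggestion is elided.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


def thunder_submit_call(script: str, master: str,
                        base_env: Optional[Mapping[str, str]] = None
                        ) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for running thunder-submit with MASTER set.

    Hypothetical helper. Assumes thunder-submit reads the master URL from
    the MASTER environment variable, so only the script is passed as an
    argument (no --master, --deploy-mode or --py-files).
    """
    # Copy the environment so the caller's os.environ is left untouched.
    env: Dict[str, str] = dict(os.environ if base_env is None else base_env)
    env["MASTER"] = master
    argv = ["thunder-submit", script]
    return argv, env
```

With `argv, env = thunder_submit_call("/usr/local/src/thunder/python/test/test_data.py", "spark://spark1:7077")`, running it is `subprocess.run(argv, env=env, check=True)`, assuming thunder-submit is on the PATH (otherwise use the full path from the log).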
wolfbill
@wolfbill
Mar 15 2015 02:34
/usr/local/src/thunder/python/bin/thunder-submit /usr/local/src/thunder/python/test/test_data.py
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Error: Cannot load main class from JAR: spark://192.168.56.131:7077
Run with --help for usage help or --verbose for debug output
Uri Laserson
@laserson
Mar 15 2015 02:35
hmm, it definitely thinks you're giving it a jar file when you're supplying the master URI
can you successfully run the interactive shell?
just thunder?
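Laserson's diagnosis can be illustrated with a toy model of how spark-submit picks its primary resource. Options that take a value consume the next argument, `--foo=bar` is read as `--foo bar`, and the first remaining positional argument becomes the application, treated as a JAR unless it ends in `.py`. This is a simplified, hypothetical re-implementation, not Spark's actual parser. `OPTIONS_WITH_VALUES` lists only the flags used in this conversation, and "no `--class`" stands in for "the JAR has no main class". If a wrapper dropped the `--master` flag but kept its value, the URI would land in the primary-resource slot. That would produce the same `Cannot load main class from JAR: spark://...` message.

```python
from typing import Dict, List, Optional, Tuple, Union

# Simplified subset of spark-submit flags that take a value (illustrative only).
OPTIONS_WITH_VALUES = {"--master", "--deploy-mode", "--class", "--py-files",
                       "--jars", "--name"}

Opts = Dict[str, Union[str, bool]]


def toy_parse(argv: List[str]) -> Tuple[Opts, Optional[str], List[str]]:
    """Split argv into (options, primary_resource, app_args).

    Scanning stops at the first positional argument: it becomes the primary
    resource and everything after it belongs to the application.
    """
    opts: Opts = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--") and "=" in arg:
            # "--foo=bar" is treated the same as "--foo bar".
            name, value = arg.split("=", 1)
            opts[name] = value
            i += 1
        elif arg in OPTIONS_WITH_VALUES and i + 1 < len(argv):
            opts[arg] = argv[i + 1]  # the next argument is the value
            i += 2
        elif arg.startswith("-"):
            opts[arg] = True  # boolean flag such as --verbose
            i += 1
        else:
            return opts, arg, argv[i + 1:]
    return opts, None, []


def diagnose(argv: List[str]) -> str:
    """Map an argument list to the error categories seen in this log."""
    opts, primary, _ = toy_parse(argv)
    if primary is None:
        return "Error: Must specify a primary resource (JAR or Python file)"
    if primary.endswith(".py"):
        return "ok: Python application " + primary
    if "--class" not in opts:
        return "Error: Cannot load main class from JAR: " + primary
    return "ok: JVM application " + primary
```

In this model, the very first command in the log fails because `--py-files` consumes the script path as its value, which leaves no primary resource at all.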
wolfbill
@wolfbill
Mar 15 2015 02:38
yes. When I run path_to_thunder, it runs OK.
Uri Laserson
@laserson
Mar 15 2015 02:42
and if you do sc.parallelize(range(10)).collect(), it works?
wolfbill
@wolfbill
Mar 15 2015 02:43
15/03/15 08:40:08 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/03/15 08:40:08 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
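When the driver reports "All masters are unresponsive", a sensible first check is whether the master URL's host and port are reachable from the machine running the driver. The sketch below only tests TCP connectivity. `parse_master_url` and `master_reachable` are hypothetical helpers, not part of Spark or Thunder. A common cause of this error on standalone clusters, though not confirmed in this thread, is a master URL that doesn't exactly match the one shown at the top of the master's web UI. An example would be an IP address (spark://192.168.56.131:7077) where the master advertises a hostname (spark://spark1:7077).

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def parse_master_url(url: str) -> Tuple[str, int]:
    """Split a standalone master URL such as spark://spark1:7077 into (host, port)."""
    parsed = urlparse(url)
    if parsed.scheme != "spark" or not parsed.hostname:
        raise ValueError("not a spark:// master URL: %r" % url)
    # 7077 is the default port of a standalone master.
    return parsed.hostname, parsed.port or 7077


def master_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the master's host and port succeeds."""
    host, port = parse_master_url(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hostnames.
        return False
```

A `True` result only shows the port is open. The URL can still differ from the one the master advertises, so it is worth comparing it against the web UI as well.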
wolfbill
@wolfbill
Mar 15 2015 03:17
sc.parallelize(range(10)).collect()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/spark110hadoop24/python/pyspark/context.py", line 283, in parallelize
numSlices = numSlices or self.defaultParallelism
File "/usr/local/spark110hadoop24/python/pyspark/context.py", line 254, in defaultParallelism
return self._jsc.sc().defaultParallelism()
File "/usr/local/spark110hadoop24/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/usr/local/spark110hadoop24/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o25.defaultParallelism.
: java.lang.NullPointerException
at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1284)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
wolfbill
@wolfbill
Mar 15 2015 03:36
@laserson hi man. I've already solved the problem, and when I type sc.parallelize(range(10)).collect(), it works OK.
wolfbill
@wolfbill
Mar 15 2015 06:29
Hi, I've since found out that it's a bug in thunder-submit.
Jeremy Freeman
@freeman-lab
Mar 15 2015 07:00
just to clarify, turns out there was a strange bug in the argument formatting in thunder-submit that should now be fixed... curiously, the same parsing was not affecting the basic thunder script
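The thread doesn't say what the formatting bug in thunder-submit actually was, so the following is only a general illustration. It shows how a wrapper around spark-submit can forward arguments without disturbing them: keep them as a list, put the wrapper's own options ahead of the user's, and quote each argument separately when a single shell string is unavoidable. `wrap_submit`, `as_shell_command`, and the `thunder.egg` file in the usage note are hypothetical.

```python
import shlex
from typing import List, Sequence


def wrap_submit(user_args: Sequence[str],
                wrapper_opts: Sequence[str]) -> List[str]:
    """Combine a wrapper's own spark-submit options with the user's arguments.

    Hypothetical helper. The wrapper's options are placed before everything
    the user typed, so the user's flag/value pairs stay adjacent and the
    user's script remains the first positional argument.
    """
    return ["spark-submit", *wrapper_opts, *user_args]


def as_shell_command(argv: Sequence[str]) -> str:
    """Render argv as one shell string, quoting each argument separately."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

For example, `wrap_submit(["--master", "spark://spark1:7077", "test_data.py"], ["--py-files", "thunder.egg"])` gives `['spark-submit', '--py-files', 'thunder.egg', '--master', 'spark://spark1:7077', 'test_data.py']`. Passing that list to `subprocess.run` avoids re-splitting entirely. `as_shell_command` is only needed when a single string must be written out, and `shlex.split` of its result gives back the original list.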