These are chat archives for thunder-project/thunder

May 2015
May 12 2015 06:49
Hi, is anyone online now? I ran this:
data = tsc.loadExample('fish-series').toTimeSeries().normalize()
I got the error:

Py4JJavaError Traceback (most recent call last)

<ipython-input-6-8c10b2caeb2c> in <module>()
1 os.chdir('/usr/local/thunder/0.5.0/python/thunder')
----> 2 data = tsc.loadExample('fish-series').toTimeSeries().normalize()

/tmp/spark-0fbfd256-b0a3-44eb-b758-fc59e2b3afb3/userFiles-fccb2c3b-cb12-497c-b50a-bfba70532ad9/thunder_python-0.5.0-py2.7.egg/thunder/utils/ in loadExample(self, dataset)
581 return self.loadSeries(tmpdir)
582 elif dataset == "fish-series":
--> 583 return self.loadSeries(tmpdir).astype('float')
584 elif dataset == "fish-images":
585 return self.loadImages(tmpdir, inputFormat="tif", npartitions=npartitions)

/tmp/spark-0fbfd256-b0a3-44eb-b758-fc59e2b3afb3/userFiles-fccb2c3b-cb12-497c-b50a-bfba70532ad9/thunder_python-0.5.0-py2.7.egg/thunder/utils/ in loadSeries(self, dataPath, nkeys, nvalues, inputFormat, minPartitions, confFilename, keyType, valueType, keyPath, varName)
94 if inputFormat.lower() == 'binary':
95 data = loader.fromBinary(dataPath, confFilename=confFilename, nkeys=nkeys, nvalues=nvalues,
---> 96 keyType=keyType, valueType=valueType)
97 elif inputFormat.lower() == 'text':
98 if nkeys is None:

/tmp/spark-0fbfd256-b0a3-44eb-b758-fc59e2b3afb3/userFiles-fccb2c3b-cb12-497c-b50a-bfba70532ad9/thunder_python-0.5.0-py2.7.egg/thunder/rdds/fileio/ in fromBinary(self, dataPath, ext, confFilename, nkeys, nvalues, keyType, valueType, newDtype, casting)
210 '',
211 '',
--> 212 conf={'recordLength': str(recordSize)})
214 data = (_, v):

/usr/local/spark/1.3.0/python/pyspark/context.pyc in newAPIHadoopFile(self, path, inputFormatClass, keyClass, valueClass, keyConverter, valueConverter, conf, batchSize)
520 jrdd = self._jvm.PythonRDD.newAPIHadoopFile(self._jsc, path, inputFormatClass, keyClass,
521 valueClass, keyConverter, valueConverter,
--> 522 jconf, batchSize)
523 return RDD(jrdd, self)

/usr/local/spark/1.3.0/python/lib/ in _call(self, *args)
536 answer = self.gateway_client.send_command(command)
537 return_value = get_return_value(answer, self.gateway_client,
--> 538 self.target_id,
540 for temp_arg in temp_args:

/usr/local/spark/1.3.0/python/lib/ in get_return_value(answer, gateway_client, target_id, name)
298 raise Py4JJavaError(
299 'An error occurred while calling {0}{1}{2}.\n'.
--> 300 format(target_id, '.', name), value)
301 else:
302 raise Py4JError(

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 23, spark-cluster-03): File file:/tmp/tmpvs6gCp/key02_00000-key01_00000-key00_00000.bin does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(

I'd like to know: when testing thunder on a cluster, must the data be on HDFS?
Jeremy Freeman
May 12 2015 16:33
@wolfbill there is a current issue that makes it so the loadExample methods, which load local example data, only work in either local mode or EC2 deployments
this is because the example data need to be made available to all workers, which is a little tricky to solve in the general case
until we solve it, you can generate test data using tsc.makeExample, or manually copy the folder thunder/python/thunder/utils/data to the same location on every worker, and then load one of the examples directly, for example using tsc.loadImages('thunder/python/thunder/utils/data/mouse/images')
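To illustrate why the path in the traceback fails on a cluster, here is a minimal sketch, not part of thunder. The helper name `is_visible_to_all_workers`, the set of "shared" schemes, and the `hdfs://namenode:9000/...` URL are all assumptions made for illustration. It only inspects the URI scheme, so it cannot detect a local path that happens to be copied or mounted identically on every worker.

```python
from urllib.parse import urlparse

# Hypothetical helper, not a thunder or Spark API. These schemes are an
# assumed list of storage backends that every worker resolves to the same data.
SHARED_SCHEMES = {"hdfs", "s3", "s3n", "s3a"}


def is_visible_to_all_workers(path):
    """Return True if the path's URI scheme points at shared storage.

    A bare path or a file: URI refers to the local disk of whichever
    machine opens it. On a multi-node cluster it only works if the same
    files exist at that location on every worker.
    """
    scheme = urlparse(path).scheme.lower()
    return scheme in SHARED_SCHEMES


# The temp file from the traceback exists only on the driver's disk.
print(is_visible_to_all_workers(
    "file:/tmp/tmpvs6gCp/key02_00000-key01_00000-key00_00000.bin"))  # False

# A made-up HDFS location, used here only as an example of shared storage.
print(is_visible_to_all_workers("hdfs://namenode:9000/data/fish-series"))  # True
```

So the data does not strictly have to be on HDFS, but it does have to be reachable at the same path from every worker, which HDFS, S3, a shared mount, or a manual copy would each provide.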