Oct 2015
Oct 02 2015 08:15
@PhC-PhD But in that case why would thunder find my images in the first place? I mean... images0000.tif is the first image of my dataset, so it doesn't make much sense to me.
I have tried what you said and it doesn't seem to work. I started thunder from ~ and my files are in ~/p1. I called tsc.convertImagesToSeries('p1','p3',inputFormat='tif') and this was the output for each node in my cluster.
15/10/02 09:06:04 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 12, org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/manager/Downloads/spark/python/lib/", line 111, in main process()
  File "/home/manager/Downloads/spark/python/lib/", line 106, in process serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/manager/Downloads/spark/python/lib/", line 263, in dump_stream vs = list(itertools.islice(iterator, batch))
  File "/home/user/Downloads/spark/python/pyspark/", line 1273, in takeUpToNumLeft
  File "/tmp/spark-d3de9d20-2ff5-4512-8757-36e407af6e16/userFiles-25f16584-bf78-4fab-ab90-2a88048ea630/thunder_python-0.5.1-py2.7.egg/thunder/rdds/fileio/", line 193, in <lambda>
  File "./thunder_python-0.5.1-py2.7.egg/thunder/rdds/fileio/", line 118, in _localRead raise FileNotFoundError(e) FileNotFoundError: [Errno 2] No such file or directory: 'p1/images0000.tif'

    at org.apache.spark.api.python.PythonRDD$$anon$
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.executor.Executor$
    at java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.util.concurrent.ThreadPoolExecutor$
Oct 02 2015 15:35
@rmchurch what are your requirements? Just to change the version of hadoop? So you don't need to build from src?
Philippe Castonguay
Oct 02 2015 15:44

@AlexandreLaborde Hum, have you tried to open this file with another command than 'convertImagesToSeries'?

E.g. open(PATH_TO_IMG)

Just to make sure that regular functions can find the file.
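A minimal sketch of that check, in case it helps. The helper name check_readable is made up for illustration and is not part of thunder. It resolves the path to an absolute one first, since a relative path like 'p1/images0000.tif' is resolved against the working directory of whichever process opens it, which on a worker node may not be the directory you started thunder from.

```python
import os


def check_readable(path):
    """Return (absolute_path, ok); ok is True if the file could be opened.

    Hypothetical helper for debugging, not a thunder function.
    """
    # Expand '~' and resolve against the current working directory.
    abs_path = os.path.abspath(os.path.expanduser(path))
    try:
        with open(abs_path, "rb") as f:
            f.read(1)  # touch the file to confirm it is actually readable
        return abs_path, True
    except (IOError, OSError):
        return abs_path, False


if __name__ == "__main__":
    # Path taken from the error message above; adjust to your dataset.
    print(check_readable("p1/images0000.tif"))
```

If this succeeds on the driver, running the same check on a worker node would show whether the workers can see the file at that location too.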
Michael Churchill
Oct 02 2015 18:09
@timberonce Only requirement is I have a machine with a Spark v1.4 installation with Hadoop v2.x, I can't install my own. But the pip package of thunder-python only works with Spark with Hadoop v1.x right? That's why I had to install Thunder from Github directly.
Michael Churchill
Oct 02 2015 19:21

@timberonce I spoke too soon when I thought I installed correctly from GH. I can run the "thunder" script on a single node just fine, but when I have worker nodes, calling "thunder --master $SPARKURL" I get an error:

python: can't open file '': [Errno 2] No such file or directory

This is because it can't find the thunder_python egg, so I'm missing an install step. Sorry, this all probably stems from my not understanding the Python flow. The egg is there in the thunder folder created from GH (~/thunder), but it's not present where my site-packages are.

Michael Churchill
Oct 02 2015 21:10
OK, I now understand the Thunder build-from-source routine (and understand Python deployment better). "pip install" doesn't create a binary egg; you have to do it yourself using "python clean bdist_egg", and then copy the resulting thunder/dist/thunder-python*.egg to <site-packages>/thunder/lib. Or you can run the helper script in thunder/bin/build to do it for you.
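For anyone following along, the copy step could be sketched roughly like this. The function name copy_eggs and the "*.egg" pattern are assumptions for illustration; the sketch does not run the build itself, and the helper script in thunder/bin/build may well do things differently.

```python
import glob
import os
import shutil


def copy_eggs(dist_dir, lib_dir, pattern="*.egg"):
    """Copy built egg files from dist_dir into lib_dir.

    Hypothetical helper: dist_dir would be thunder/dist and lib_dir
    would be <site-packages>/thunder/lib, per the message above.
    Returns the list of destination paths that were written.
    """
    if not os.path.isdir(lib_dir):
        os.makedirs(lib_dir)
    copied = []
    for egg in sorted(glob.glob(os.path.join(dist_dir, pattern))):
        dest = os.path.join(lib_dir, os.path.basename(egg))
        shutil.copy(egg, dest)  # eggs from bdist_egg are single zip files
        copied.append(dest)
    return copied
```

Called as copy_eggs("thunder/dist", "<site-packages>/thunder/lib") with the placeholder replaced by your real site-packages location, an empty return list would mean no egg was built yet.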