These are chat archives for thunder-project/thunder

2nd Oct 2015
alexandrelaborde
@AlexandreLaborde
Oct 02 2015 08:15
@PhC-PhD But in that case why would thunder find my images in the first place? I mean... images0000.tif is the first image of my dataset, so it doesn't make that much sense to me.
I have tried what you said and it doesn't seem to work... I started thunder from ~ and my files are in ~/p1. Called tsc.convertImagesToSeries('p1','p3',inputFormat='tif') and this was the output for each node in my cluster.
...
15/10/02 09:06:04 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 12, 10.40.11.152): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/manager/Downloads/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/home/manager/Downloads/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/manager/Downloads/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/home/user/Downloads/spark/python/pyspark/rdd.py", line 1273, in takeUpToNumLeft
  File "/tmp/spark-d3de9d20-2ff5-4512-8757-36e407af6e16/userFiles-25f16584-bf78-4fab-ab90-2a88048ea630/thunder_python-0.5.1-py2.7.egg/thunder/rdds/fileio/readers.py", line 193, in <lambda>
  File "./thunder_python-0.5.1-py2.7.egg/thunder/rdds/fileio/readers.py", line 118, in _localRead
    raise FileNotFoundError(e)
FileNotFoundError: [Errno 2] No such file or directory: 'p1/images0000.tif'

    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
...
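[A likely reading of the trace above, though only a guess: the relative path 'p1' is resolved against each worker's own working directory, not the directory the driver was launched from, so workers that weren't started from ~ can't find the files. A minimal sketch of the difference, assuming the images live under the driver user's home directory:]

```python
import os

# Hypothetical relative path, as passed to convertImagesToSeries
rel = "p1/images0000.tif"

# A worker resolves a relative path against its own cwd, which
# need not be the directory the driver was launched from
print(os.path.isabs(rel))       # False

# An absolute path names the same location on every node
# (assuming a shared filesystem or identical layout on each worker)
abspath = os.path.join(os.path.expanduser("~"), rel)
print(os.path.isabs(abspath))   # True
```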
timberonce
@timberonce
Oct 02 2015 15:35
@rmchurch what are your requirements? Just to change the version of Hadoop, so you don't need to build from src?
Philippe Castonguay
@PhABC
Oct 02 2015 15:44

@AlexandreLaborde Hum, have you tried to open this file with a command other than 'convertImagesToSeries'?

E.g. open(PATH_TO_IMG)

Just to make sure that regular functions can find the file.
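[A quick sanity check along those lines; the path below is hypothetical, substitute your own:]

```python
import os

def can_read(path):
    """Return True if the file exists and its first bytes are readable."""
    try:
        with open(path, "rb") as f:
            f.read(4)
        return True
    except IOError:
        return False

# e.g. can_read(os.path.expanduser("~/p1/images0000.tif"))
```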
Michael Churchill
@rmchurch
Oct 02 2015 18:09
@timberonce Only requirement is that I have a machine with a Spark v1.4 installation built with Hadoop v2.x; I can't install my own. But the pip package of thunder-python only works with Spark built against Hadoop v1.x, right? That's why I had to install Thunder from GitHub directly.
Michael Churchill
@rmchurch
Oct 02 2015 19:21

@timberonce I spoke too soon when I thought I had installed correctly from GH. I can run the "thunder" script on a single node just fine, but when I have worker nodes and call "thunder --master $SPARKURL" I get an error:

python: can't open file 'setup.py': [Errno 2] No such file or directory

This is because it can't find the thunder_python egg, so I'm missing an install step. Sorry, this probably all stems from my not understanding the setup.py Python flow. The egg is there in the thunder folder created from GH (~/thunder), but it's not present where my site-packages are

Michael Churchill
@rmchurch
Oct 02 2015 21:10
OK, I now understand the Thunder build-from-source routine (and understand Python deployment better): "pip install" doesn't create a binary egg; you have to build it yourself with "python setup.py clean bdist_egg" and copy the resulting thunder/dist/thunder-python*.egg to <site-packages>/thunder/lib. Or you can run the helper script in thunder/bin/build to do it for you.
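[The steps described in that message, written out as shell commands; this is a sketch, not verified against a Thunder checkout — the checkout location (~/thunder), the egg filename glob, and the <site-packages> placeholder are assumptions you'd adapt to your own setup:]

```shell
# From a GitHub checkout of thunder (assumed at ~/thunder)
cd ~/thunder

# "pip install" does not produce the binary egg the workers need;
# build it explicitly
python setup.py clean bdist_egg

# Copy the egg next to the installed package
# (replace <site-packages> with your actual site-packages directory)
cp dist/thunder*python*.egg "<site-packages>/thunder/lib/"

# Or let the helper script do the equivalent for you
# ~/thunder/bin/build
```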