These are chat archives for thunder-project/thunder
I have run into another issue running Thunder on my University's cluster environment. An error seems to be occurring when I try to convert my images to a series, no matter how I try to do it.
I have successfully performed image registration for a large tif stack. I am able to save the output using saveAsBinaryImages(). However, if I take the same structure and try saveAsBinarySeries(), I get an error along the lines of:
File "/share/pkg/spark/1.5.0/install/python/lib/pyspark.zip/pyspark/serializers.py", line 269, in dump_stream
IOError: [Errno 28] No space left on device
I also get the same error if I load in the above saved binaries in a fresh instance, and try to convert them using toTimeSeries().normalize().
That seems to suggest a memory problem? However, I have run this on the nodes with the largest available memory on our cluster (256GB). I may be able to access 512GB nodes with special permission, but I am getting the feeling that won't solve my problem.
As suggested by @freeman-lab above, I have also added export _JAVA_OPTIONS="-Xms512m -Xmx4g" to my ~/.bash_profile as well. Is it likely that my Java is needing more than 4GB for converting images to series? I have upped the 4g and will be running it again overnight to see if that fixes the issue, but I'd be happy to know if there are other things I can consider trying.
IOError: [Errno 28] No space left on device
my first thought is that it's storage, not memory, and you need to make sure you have sufficient storage on all nodes
converting images to series involves a shuffle, which writes temporary data to disk
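For anyone hitting the same error, a hedged sketch of pointing Spark's shuffle scratch space at a larger disk before the driver starts; the /scratch path is just a placeholder for whatever large local filesystem your cluster provides:

```shell
# Placeholder path: substitute your cluster's large local/scratch filesystem.
# Must be exported before the JVM starts, e.g. in your job script or ~/.bash_profile.
export SPARK_LOCAL_DIRS=/scratch/$USER/spark-tmp
mkdir -p "$SPARK_LOCAL_DIRS"

# Quick sanity check that the chosen directory actually has room for shuffle files.
df -h "$SPARK_LOCAL_DIRS"
```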
Do Images objects not support multi-indexing? When I try to convert using toSeries(), I get an error in imgblocks/strategy.py:
imgSlices = [slice(timepoint, timepoint+1, 1)] + list(blockSlices)
TypeError: can only concatenate tuple (not "int") to tuple
@freeman-lab Thanks. I found I had to set SPARK_LOCAL_DIRS to a different folder on our cluster for using temporary/scratch space. That has helped curb that error.
Two more questions I have come across:
Does tsc.loadImages() (or the other tsc.load functions) automatically parallelize the data it loads, or does the data need to be parallelized afterwards? I was under the impression that it is parallelized on load. However, when running Thunder on our cluster, I reserve a node with 16 slots but everything is being done on one of them.
When I submit my job, I use qsub -pe omp 16 to request a 16-slot node. I then use thunder-submit --master local myscript.py. I assumed Thunder would treat this as a local machine with 16 slots and recognize that the work can run in parallel. I then define tsc = ThunderContext.start(appName='myscript'). When I use tsc.loadImages(), everything runs on 1 slot rather than in parallel across the 16. Do I need to import parallelize from Spark to parallelize my loaded data?
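For what it's worth, a likely culprit here is the master URL: in Spark, `local` runs everything in a single worker thread, while `local[N]` uses N threads. A sketch of the submit line under the assumption that thunder-submit forwards the flag to spark-submit:

```shell
# "local" alone = 1 worker thread; "local[16]" = 16 threads on the node;
# "local[*]" = one thread per available core.
thunder-submit --master "local[16]" myscript.py
```

No explicit parallelize call should be needed; the loaded data is already an RDD, it just only had one thread to run on.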
This may be related to the above, but after updating SPARK_LOCAL_DIRS, when I run saveAsBinarySeries I now get an OutOfMemoryError: Java heap space. I suspect this is because the computation is running on only one slot and keeps exhausting memory. Is there anything else I should consider when debugging this error?
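One thing worth noting for the heap error: in local mode the executors live inside the driver JVM, so the heap that matters is the driver's. A hedged sketch, assuming thunder-submit forwards spark-submit's memory flags (the 8g value is an arbitrary example, not a recommendation):

```shell
# In local mode all tasks run in the driver JVM, so raise the driver heap
# rather than (or in addition to) _JAVA_OPTIONS.
thunder-submit --master "local[16]" --driver-memory 8g myscript.py
```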