These are chat archives for thunder-project/thunder
Also, could anyone please tell me how memory allocation is actually done in Thunder? Just a brief primer would help. I tried to run a 400MB dataset locally on a computer with 16GB of RAM. I tried loadImagesAsSeries and it filled up my RAM and started overflowing into swap! I tried to do the same thing by first calling loadImages and then converting to series. The loading works fine, but the conversion is where the memory usage just explodes. Why is this? I am using a series of .tif files for the images.
Also, when I downsampled temporally by a factor of 3, the data loaded fine as a series (about 5GB of memory used), but then trying to run ICA repeatedly gave me an error where Java was running out of heap space. Is there a place in the source code where I can set the heap usage limit?
Are these problems simply a result of running Thunder on a local machine? Would running on EC2 eliminate such issues? In any case, it would be great to understand how the memory is actually being handled. Any reference in this regard would be a great help! Thanks a lot!
Could you please let me know what the changes to pyspark were? Did you just use a prebuilt Spark for Hadoop 1.x? Which version did you use?
I think this info could be quite useful for people trying to get it running on Windows. It may also be helpful to include this info on the Thunder website. Thanks a lot! :)
java -Xms512m -Xmx1024m .\<yourapplication> will set your initial and max heap size to 512 MB and 1 GB, respectively. If you want to make the change permanent, you can add those flags to a system-level environment variable called
PYSPARK_PYTHON (especially if you have multiple Python versions installed), and perhaps others. Your labmate's error message should clear things up.
pip support for installing pyspark, so there are a lot of places the manual installation can go awry. I'm still having trouble getting thunder to run from the command line, because Windows doesn't recognize it as an executable or Python code (the thunder file in \bin has no file extension). If I run pyspark and then from thunder import ThunderContext to assign tsc, everything works fine, but it's a bit roundabout. It might be necessary to add a short .exe file to the bin to make things usable for Windows.
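Since a manual pyspark setup depends on environment variables such as PYSPARK_PYTHON, a quick check like the one below might help narrow down where an installation went wrong. This is only a rough sketch: missing_env_vars is a hypothetical helper, and SPARK_HOME is an assumption on my part (it is a standard Spark variable, but it was not named in this chat).

```python
import os

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# PYSPARK_PYTHON comes from the discussion above; SPARK_HOME is an assumption.
required = ["SPARK_HOME", "PYSPARK_PYTHON"]
missing = missing_env_vars(required)
if missing:
    print("Not set: " + ", ".join(missing))
else:
    print("All required variables are set")
```

Running this in the same shell that launches pyspark shows what that process will actually see, which can differ from the system-level settings.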
Thanks a lot, @GrantRVD and @freeman-lab. I'll get my labmate to post his issue on Github soon, most likely tomorrow. If/when it works on Windows, I can make a list of the steps that we took, in case it is helpful for others in the future. We only have Python 2.7 installed through Anaconda.
As for the Java error, I set an environment variable in Ubuntu as a fix. But now it seems that I am getting another error that I don't understand. I am posting the full log as an issue on Github here: thunder-project/thunder#172
Again, thanks a lot for your help and for making Thunder :)
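For anyone following the Java heap discussion, here is a small sketch of how the -Xms/-Xmx flags quoted earlier are put together from sizes in megabytes. heap_flags is a hypothetical helper written for illustration only, not part of Thunder or Spark, and it only builds the strings; where to place them depends on your setup.

```python
def heap_flags(initial_mb, max_mb):
    """Build JVM heap-size flags from sizes in megabytes (hypothetical helper)."""
    if initial_mb <= 0 or max_mb <= 0:
        raise ValueError("heap sizes must be positive")
    if initial_mb > max_mb:
        raise ValueError("initial heap cannot exceed max heap")
    # -Xms sets the initial heap size, -Xmx the maximum heap size.
    return ["-Xms%dm" % initial_mb, "-Xmx%dm" % max_mb]

# 512 MB initial heap, 1024 MB (1 GB) maximum heap, as in the example above.
print(" ".join(heap_flags(512, 1024)))  # prints: -Xms512m -Xmx1024m
```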
from thunder import ThunderContext. I only tried to run thunder as a command and then realized that it isn't an executable. I'll try this method and post here soon. Thanks again!
@vjlbym Give it a shot. Specifically, cd your command line to the pyspark directory and run .\pyspark.exe. Then, once you successfully get to the IPython prompt, use
    from thunder import ThunderContext
    tsc = ThunderContext(sc)
Then try to run the ICA example from the thunder homepage. If that completes successfully then I'd say you're about ready to go.