These are chat archives for thunder-project/thunder

Nov 2015
Nov 25 2015 04:05

To answer my own questions above:

  1. I believe loadImages() does not parallelize, because images cannot be parallelized in Spark. Is this correct? Once I loaded the images as binaries, things seemed to parallelize correctly. @freeman-lab, could you confirm whether this is the case for loading images?

  2. To solve the Java heap space OutOfMemoryError, I added --driver-memory #G to thunder-submit, where # was the amount of memory on the node I reserved when submitting to the cluster. This was without adding export _JAVA_OPTIONS="-Xms512m -Xmx4g" to my ~/.bash_profile. Would there be any reason to also set the _JAVA_OPTIONS environment variable?
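A minimal sketch of the fix in point 2, assuming thunder-submit forwards spark-submit options such as --driver-memory and that, as with spark-submit, options go before the application script. The helper name build_submit_command and the script name analysis.py are hypothetical. Passing the driver memory on the command line matters because in client mode the driver JVM is already running by the time application code executes, so setting spark.driver.memory from inside the script comes too late.

```python
import shlex


def build_submit_command(script, driver_mem_gb, extra_args=()):
    """Return an argv list for thunder-submit with an explicit driver heap size.

    Hypothetical helper: assumes thunder-submit passes --driver-memory through
    to spark-submit, and that options must precede the application script.
    """
    return ["thunder-submit", "--driver-memory", f"{driver_mem_gb}G",
            *extra_args, script]


if __name__ == "__main__":
    cmd = build_submit_command("analysis.py", 16)
    # Print the shell-quoted command; pass cmd to subprocess.run to launch it.
    print(shlex.join(cmd))  # thunder-submit --driver-memory 16G analysis.py
```

Here 16 stands for whatever amount of memory was reserved on the node.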

@goonetilleke I assume you are installing via pip, correct?
Michael Churchill
Nov 25 2015 16:08
@kkcthans If you follow the source code for loadImages(), you will find that it does call sc.parallelize over the file paths, in def read in fileio/.
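To illustrate the pattern described above, here is a local, Spark-free sketch. The small list of file paths is split into contiguous slices (the way a parallelized collection is partitioned), and each slice is read independently, so in Spark the file contents would be read by the tasks rather than by the driver. ThreadPoolExecutor stands in for Spark executors; the helper names and the reader function are hypothetical, and this is not Thunder's actual read code.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_evenly(items, num_slices):
    """Split items into num_slices contiguous chunks of near-equal size."""
    n = len(items)
    return [items[(i * n) // num_slices:((i + 1) * n) // num_slices]
            for i in range(num_slices)]


def read_partition(paths, reader):
    """Stand-in for one task: open every file in this slice with reader."""
    return [reader(p) for p in paths]


def load_images_locally(paths, reader, num_slices=4):
    """Distribute only the paths, read each slice in parallel, keep file order."""
    chunks = slice_evenly(list(paths), num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        # pool.map returns results in the same order as chunks
        results = pool.map(read_partition, chunks, [reader] * num_slices)
        return [record for chunk in results for record in chunk]


if __name__ == "__main__":
    paths = [f"img{i:03d}.tif" for i in range(10)]
    print([len(c) for c in slice_evenly(paths, 4)])  # [2, 3, 2, 3]
```

In PySpark the same shape is roughly sc.parallelize(paths, num_slices).map(reader): only the short path strings leave the driver, and the image bytes are loaded inside each partition.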