These are chat archives for thunder-project/thunder

31st
Mar 2015
Gilles Vanwalleghem
@Yassum
Mar 31 2015 00:21
Ok thanks, then I'll try playing around with the gaussian and position to see if I get anything out of it.
tomsains
@tomsains
Mar 31 2015 16:02
Hey, I'm finding the new release documentation really useful.
But I am having an issue calling the 'subset()' function on my tif files.
It returns the following error:
Py4JJavaError: An error occurred while calling o439.collect.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 58.0 failed 1 times, most recent failure: Lost task 0.0 in stage 58.0 (TID 80, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/Users/MeyerLab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/worker.py", line 79, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/Users/MeyerLab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/serializers.py", line 196, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/Users/MeyerLab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/serializers.py", line 127, in dump_stream
    for obj in iterator:
  File "/Users/MeyerLab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/serializers.py", line 185, in _batched
    for item in iterator:
  File "/Users/MeyerLab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/rddsampler.py", line 115, in func
    if self.getUniformSample(split) <= self._fraction:
  File "/Users/MeyerLab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/rddsampler.py", line 57, in getUniformSample
    self.initRandomGenerator(split)
  File "/Users/MeyerLab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/rddsampler.py", line 43, in initRandomGenerator
    self._random = numpy.random.RandomState(self._seed)
  File "mtrand.pyx", line 610, in mtrand.RandomState.__init__ (numpy/random/mtrand/mtrand.c:7397)
  File "mtrand.pyx", line 646, in mtrand.RandomState.seed (numpy/random/mtrand/mtrand.c:7697)
ValueError: Seed must be between 0 and 4294967295
Jeremy Freeman
@freeman-lab
Mar 31 2015 21:33
Thanks for reporting this @tomsains! This was a tricky one. It turns out there was a bug in Spark itself in 1.1 (that I actually fixed), and earlier versions of Thunder provided a workaround. When the Spark bug was fixed (in 1.2+), the workaround no longer worked, so I got rid of it. But that's causing an error on Spark 1.1 builds! I just posted an issue and committed a change that solves the problem across all Spark versions.
In short, had you been running on Spark 1.3 you would've been fine, but now it should work regardless.
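
For anyone still on Spark 1.1, here is a rough sketch of the kind of guard that avoids the "Seed must be between 0 and 4294967295" error in the traceback above. It assumes the failure comes from a seed wider than 32 bits reaching the NumPy random generator. The helper name `fold_seed` is hypothetical, and this is not necessarily the change that was committed to Thunder or Spark:

```python
# Largest seed accepted, per the error message in the traceback.
MAX_SEED = 4294967295  # 2**32 - 1


def fold_seed(seed):
    """Fold any integer seed into the range [0, MAX_SEED].

    Hypothetical helper for illustration only. Python's % operator
    always returns a non-negative result for a positive modulus,
    so negative seeds are handled as well.
    """
    return seed % (MAX_SEED + 1)


# A 64-bit style seed, like one a JVM-side generator might hand over
# (the specific value is made up for this example).
wide_seed = 2**40 + 12345

safe_seed = fold_seed(wide_seed)
print(0 <= safe_seed <= MAX_SEED)
```

The folded value can then be passed to `numpy.random.RandomState` in place of the original seed. Folding changes which random stream you get, so results will not match a run that used the unfolded seed on a platform that accepted it.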