These are chat archives for thunder-project/thunder

Feb 02 2017
zuxfoucault
@zuxfoucault
Feb 02 2017 12:10
more info:
/usr/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
    811         answer = self.gateway_client.send_command(command)
    812         return_value = get_return_value(
--> 813             answer, self.gateway_client, self.target_id, self.name)
    814 
    815         for temp_arg in temp_args:

/usr/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    306                 raise Py4JJavaError(
    307                     "An error occurred while calling {0}{1}{2}.\n".
--> 308                     format(target_id, ".", name), value)
    309             else:
    310                 raise Py4JError(

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 27, master): java.io.FileNotFoundException: File file:/tmp/tmpNHOsFd/series/fish/series-00000-00000-00000.bin does not exist
Jason Wittenbach
@jwittenbach
Feb 02 2017 13:06
@zuxfoucault the example data doesn’t come with the Thunder installation, so it has to be downloaded when you load it. It looks like perhaps it’s having problems downloading the data.
I would try loading the example data without engine=sc. If that works, then the issue is most likely with your Spark setup. If it does not work, then it’s likely that Thunder is having problems getting out to the internet.
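For reference, a minimal sketch of the two load paths being suggested, assuming the 'fish' example dataset and a live SparkContext named sc:

    import thunder as td

    # local mode: downloads the example data and wraps it as a numpy-backed Series
    data_local = td.series.fromexample('fish')

    # distributed mode: same loader, but the data is backed by a Spark RDD
    data_spark = td.series.fromexample('fish', engine=sc)

If the first call succeeds and the second fails, the problem is on the Spark side rather than in the download.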
zuxfoucault
@zuxfoucault
Feb 02 2017 15:15
@jwittenbach Thanks for responding! I actually walked through each line in series.fromexample() to see where the problem was, and found that the data nodes could not "see" the downloaded data (similar to the earlier discussion with AlexandreLaborde in this Gitter). Once I changed the default download location to an NFS mount that every node can see, it works fine!
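A hedged sketch of that workaround; the shared path below is hypothetical, and it assumes the example binaries (and their conf.json) were first copied to an NFS mount visible to every worker:

    import thunder as td

    # load pre-downloaded series binaries from a path every node can see
    series = td.series.frombinary('/mnt/nfs/thunder-data/series/fish', engine=sc)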
zuxfoucault
@zuxfoucault
Feb 02 2017 15:41
However, I encountered another error while executing examples = series.filter(lambda x: x.std() > 6).normalize().sample(100).toarray() in tutorials-basic.
ValueError                                Traceback (most recent call last)
<ipython-input-7-d01291816368> in <module>()
----> 1 examples = series.filter(lambda x: x.std() > 6).normalize().sample(100).toarray()

/raid/opt/big/anaconda/python2/lib/python2.7/site-packages/thunder/series/series.pyc in normalize(self, method, window, perc, offset)
   1079             return (y - b) / (b + offset)
   1080 
-> 1081         return self.map(get)
   1082 
   1083     def toimages(self, size='150'):

/raid/opt/big/anaconda/python2/lib/python2.7/site-packages/thunder/series/series.pyc in map(self, func, index, value_shape, dtype, with_keys)
    191         if isinstance(value_shape, int):
    192             values_shape = (value_shape, )
--> 193         new = super(Series, self).map(func, value_shape=value_shape, dtype=dtype, with_keys=with_keys)
    194 
    195         if index is not None:

/raid/opt/big/anaconda/python2/lib/python2.7/site-packages/thunder/base.pyc in map(self, func, value_shape, dtype, with_keys)
    466         if self.mode == 'spark':
    467             expand = lambda x: array(func(x), ndmin=1)
--> 468             mapped = self.values.map(expand, axis, value_shape, dtype, with_keys)
    469             return self._constructor(mapped, mode=self.mode).__finalize__(self, noprop=('index',))
    470 

/raid/opt/big/anaconda/python2/lib/python2.7/site-packages/bolt/spark/array.pyc in map(self, func, axis, value_shape, dtype, with_keys)
    153         """
    154         axis = tupleize(axis)
--> 155         swapped = self._align(axis)
    156 
    157         if with_keys:

/raid/opt/big/anaconda/python2/lib/python2.7/site-packages/bolt/spark/array.pyc in _align(self, axis)
    111 
    112         if tokeys or tovalues:
--> 113             return self.swap(tovalues, tokeys)
    114         else:
    115             return self

/raid/opt/big/anaconda/python2/lib/python2.7/site-packages/bolt/spark/array.pyc in swap(self, kaxes, vaxes, size)
    748 
    749         if len(kaxes) == self.keys.ndim and len(vaxes) == 0:
--> 750             raise ValueError('Cannot perform a swap that would '
    751                              'end up with all data on a single key')
    752 

ValueError: Cannot perform a swap that would end up with all data on a single key
Jason Wittenbach
@jwittenbach
Feb 02 2017 15:53
@zuxfoucault glad you got the data to load! This second error looks like an actual bug. Would you mind opening an issue on GitHub for it?
Long story short, that error tells me it’s attempting something (a swap on the underlying distributed array, which is like a transpose that involves moving data around your cluster) that it really shouldn’t need to do in this case. So we probably have a logic error somewhere.
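To make that concrete, a sketch of a swap on a bolt distributed array, assuming a live SparkContext sc; the shapes are illustrative only, and the kaxes/vaxes parameter names come straight from the traceback above:

    import numpy as np
    from bolt import array as bolt_array

    # a 3-d array distributed over its first axis (the "key" axis);
    # the remaining axes live inside each record (the "value" axes)
    b = bolt_array(np.arange(24).reshape(2, 3, 4), sc, axis=(0,))

    # swap exchanges key axes for value axes, shuffling records across the
    # cluster; the ValueError above fires when a swap would leave all of
    # the data under a single key
    swapped = b.swap(kaxes=(0,), vaxes=(0,))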
zuxfoucault
@zuxfoucault
Feb 02 2017 16:17
@jwittenbach Sure! I opened a new issue on GitHub for this error. Not sure if the issue title precisely described the situation though!
Jason Wittenbach
@jwittenbach
Feb 02 2017 16:20
Great, thanks! I’ll dive into it sometime in the next week :)