These are chat archives for thunder-project/thunder
15/03/12 17:33:12 INFO scheduler.TaskSetManager: Lost task 3.2 in stage 51.0 (TID 2932) on executor bottou04-10g.pa.cloudera.com: org.apache.spark.api.python.PythonException (Traceback (most recent call last):
  File "/data/4/yarn/nm/filecache/1326/spark-assembly-1.2.1-hadoop2.4.0.jar/pyspark/worker.py", line 90, in main
    command = pickleSer._read_with_length(infile)
  File "/data/4/yarn/nm/filecache/1326/spark-assembly-1.2.1-hadoop2.4.0.jar/pyspark/serializers.py", line 151, in _read_with_length
    return self.loads(obj)
  File "/data/4/yarn/nm/filecache/1326/spark-assembly-1.2.1-hadoop2.4.0.jar/pyspark/serializers.py", line 400, in loads
    return cPickle.loads(obj)
ImportError: No module named thunder.rdds.keys )
data.seriesStddev(), it works fine
setup.py bdist_egg or use the
Series object is a pair of integers. However, not all pairs are represented, because they had absolutely no data (so they should have an array of zeros)
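The fix described above (filling in the missing key pairs with zero arrays) can be sketched in plain Python; the names `records` and `fill_missing` are illustrative stand-ins, not Thunder API:

```python
# Hypothetical sketch: fill in missing (x, y) key pairs of a Series-like
# collection of records with zero arrays, so every coordinate in the grid
# is represented even when it had no data.
def fill_missing(records, shape, length):
    out = {}
    for x in range(shape[0]):
        for y in range(shape[1]):
            # keep the existing record, or substitute an all-zeros array
            out[(x, y)] = records.get((x, y), [0] * length)
    return out

records = {(0, 0): [1, 2, 3], (1, 1): [4, 5, 6]}
full = fill_missing(records, (2, 2), 3)
```

With a real Series the same effect would come from joining against the full set of keys, but the dictionary version shows the intent.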
imgs = data.toBlocks().toImages() where
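Conceptually, a Series-to-Images conversion is a transpose from (space, time) records to time-indexed spatial frames. A minimal NumPy sketch of that reshape (this mimics the idea, not the actual `toBlocks().toImages()` internals, and the dimensions are made up):

```python
import numpy as np

# assumed toy dimensions: a 2x3 spatial grid with 4 timepoints
nx, ny, nt = 2, 3, 4
series = np.arange(nx * ny * nt).reshape(nx * ny, nt)  # one row per pixel
images = series.T.reshape(nt, nx, ny)  # one spatial frame per timepoint
```

In distributed form this transpose is exactly the shuffle that makes the conversion expensive, which is why going through blocks helps.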
Series object, but performance isn't great, and I think there may be a more efficient approach
Series.applyValues(lambda v: v[time]).pack() in a for loop over time is actually a reasonable solution and saves you a costly shuffle.
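The per-timepoint idea above can be illustrated with a pure-Python stand-in: pull one timepoint from every series record per pass, instead of transposing everything at once. `series` and `frame_at` are illustrative names, not Thunder API:

```python
# Toy series: coordinate key -> time series of length 2
series = {(0, 0): [1, 2], (0, 1): [3, 4], (1, 0): [5, 6], (1, 1): [7, 8]}

def frame_at(series, t):
    # one "packed" frame for a single timepoint, keyed by coordinate,
    # analogous to applyValues(lambda v: v[t]).pack()
    return {k: v[t] for k, v in series.items()}

frames = [frame_at(series, t) for t in range(2)]
```

Each pass only moves one scalar per record, which is why looping over time can beat a full shuffle.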
SPARK_HOME to it locally, will the remotes get updated when I call
thunder-ec2 start or do I need to manually update it on the cluster?
ThunderContext.loadSeriesLocal method, which creates a Series from an array stored in either an npy or mat file
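The npy half of that can be sketched with plain NumPy: round-tripping an array through a `.npy` file, which is the kind of local load the method wraps. This uses only NumPy, not the actual `loadSeriesLocal` implementation, and the shape is made up:

```python
import os
import tempfile
import numpy as np

# toy array: 3 records with 4 timepoints each
arr = np.arange(12).reshape(3, 4)

# save to a temporary .npy file and read it back
path = os.path.join(tempfile.mkdtemp(), "series.npy")
np.save(path, arr)
loaded = np.load(path)
```

For mat files the analogous local read would go through `scipy.io.loadmat` instead.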