These are chat archives for thunder-project/thunder

1st
Dec 2016
PieterKJ
@PieterKJ
Dec 01 2016 14:12
Hi All. I am completely new to both spark and python and I am trying to perform simple calculations on my dataset. Let's say I have a .csv file of which I want to compute the mean of each column...
The code I used so far is the following:
file1 = sc.textFile("C:/Users/PiKr/Documents/Ordina/Visionworks - sensordata/flat files/testfile.csv").map(lambda x: x.split(","))
dat = td.series.fromrdd(file1, dtype='float64')
First problem here is that the shape is (1000, 6L), so it interprets rows as columns?
I tried something simple like: dat.map(lambda x: x.mean()).toarray()
But this gives the error 'too many values to unpack'
Any help would be appreciated :)
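[Editor's note: a likely cause of the "too many values to unpack" error is that `series.fromrdd` expects key-value records, i.e. `(key, value)` pairs, while `textFile(...).map(split)` yields plain lists of strings, so thunder's internal `k, v = record` unpacking fails. A minimal sketch of the expected record shape, using a plain Python list in place of the RDD (the sample lines here are made up):]

```python
import numpy as np

# Simulated lines, as sc.textFile would yield them (hypothetical data)
lines = ["1.0,2.0,3.0", "4.0,5.0,6.0"]

# fromrdd expects (key, value) records: key as an index tuple,
# value as a 1-D float array. Unkeyed lists of strings instead
# trigger "too many values to unpack" when thunder does `k, v = record`.
records = [((i,), np.array(line.split(","), dtype="float64"))
           for i, line in enumerate(lines)]

# Column means, computed directly with numpy for illustration
values = np.stack([v for _, v in records])
col_means = values.mean(axis=0)  # → array([2.5, 3.5, 4.5])
```

[On Spark, the same pairing would be done with something like `file1.zipWithIndex().map(lambda kv: ((kv[1],), np.array(kv[0], dtype='float64')))` before calling `td.series.fromrdd`.]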
PieterKJ
@PieterKJ
Dec 01 2016 15:19
I get this error 'too many values to unpack' every time I try to perform some function like toarray() or plot()
Davis Bennett
@d-v-b
Dec 01 2016 23:56
@jwittenbach what do I need to do to my series object for this to work?
from pyspark.mllib.clustering import KMeans
import thunder as td
KMeans.train(td.series.fromrandom(engine=sc).tordd(), k=2)
ah, now I remember, I can't pass a key-value-pair rdd to KMeans.train()
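[Editor's note: one way around this, sketched below with a plain Python list standing in for the RDD (the data is made up): strip the keys so only the vectors remain, which is what `KMeans.train` expects. On a real RDD the equivalent is `rdd.values()`.]

```python
import numpy as np

# Simulated (key, value) records, as series.tordd() would produce:
# keys are index tuples, values are 1-D arrays (hypothetical data).
rdd_like = [((0,), np.array([0.1, 0.2])), ((1,), np.array([0.9, 1.1]))]

# MLlib's KMeans.train expects a collection of plain vectors,
# so drop the keys before handing the data over.
vectors = [v for _, v in rdd_like]
```

[On Spark, the one-liner would then be `KMeans.train(td.series.fromrandom(engine=sc).tordd().values(), k=2)`.]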