These are chat archives for thunder-project/thunder
`Series` methods which assume the `values` have numeric type (e.g. `Series.max()`) break for the output of `RegressionModel`, where the `values` have type 'Object'
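A minimal sketch of the dtype mismatch described above, using plain NumPy rather than Thunder. The array contents and the `is_numeric` helper are hypothetical and only illustrate how an object-typed value differs from a numeric one:

```python
import numpy as np

# Hypothetical stand-in for one record value from a regression output:
# an object array packing several arrays of different lengths.
packed = np.empty(3, dtype=object)
packed[0] = np.array([0.5, 1.5])        # e.g. betas
packed[1] = np.array([0.9])             # e.g. a summary statistic
packed[2] = np.array([0.1, -0.1, 0.0])  # e.g. residuals

# A value of the kind numeric methods expect.
numeric = np.array([0.5, 1.5, 0.9])


def is_numeric(values):
    """Return True when the array's dtype is a NumPy numeric type."""
    return np.issubdtype(values.dtype, np.number)


print(is_numeric(numeric))  # True
print(is_numeric(packed))   # False
```

A check like this could guard reductions that only make sense for numeric values.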
`DataTable` class that would let us handle things like that, and it was motivated by exactly the issue you bring up: the `Regression` output and the desire to keep `Series` records restricted to storing an `ndarray` with numeric type elements. However it came to light Spark is actually going to be replacing their `SchemaRDD` object with something very similar to what I was working on. Thus we shelved working on the `DataTable` in order to see what the Spark folks come out with that we might leverage.
`RegressionModel`? For instance, I need to change the datatype of the output arrays to `float16`, which is impossible using the `Series.astype` method, so I use `Series.applyValues` to recast the data, but this somehow breaks `series.select`, so I also have to use `Series.applyValues` to get my data out and packed...
`map` operations -- e.g. if I wanted to collect all of the things in the second slot of the records:
data.rdd.map(lambda (k, v): v).collect()
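The `lambda (k, v): v` form above relies on Python 2 tuple parameters, which Python 3 removed (PEP 3113). A small sketch of the same "take the second slot" idea in a form that runs on both, using a plain list as a hypothetical stand-in for the RDD of `(key, value)` records so that no Spark is needed:

```python
# Hypothetical records: (key, value) pairs like those held in the RDD.
records = [((0, 0), [1.0, 2.0]), ((0, 1), [3.0, 4.0])]

# Index into the pair instead of unpacking it in the lambda signature.
second_slot = list(map(lambda kv: kv[1], records))

print(second_slot)  # [[1.0, 2.0], [3.0, 4.0]]
```

With an actual RDD the same lambda would presumably be passed to `data.rdd.map(...)` before calling `collect()`.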
`Data.apply` type of functions don't touch the index, which is what `Series.select` uses to grab the right values
newType = 'float16'
result = regressmodel.fit(imDat)
result = result.applyValues(lambda x: [x.astype(newType), x.astype(newType), x.astype(newType)])
result.cache()
betas = result.applyValues(lambda x: x)
stats = result.applyValues(lambda x: x)
resid = result.applyValues(lambda x: x)
betas = result.applyValues(lambda x: x)
`result.select('betas')` broke after I changed the types using
`applyValues`, you're returning a `list`; Thunder wants that to be an
`numpy.array` around the return value and it should work :smiley_cat:
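A rough sketch of the suggested fix in plain NumPy, outside Thunder. The function names and the sample input are hypothetical, and the three parts are assumed to share a shape so that `numpy.array` can stack them into one numeric array:

```python
import numpy as np

new_type = "float16"


def recast_as_list(x):
    # Pattern from the chat: returns a plain Python list of arrays.
    return [x.astype(new_type), x.astype(new_type), x.astype(new_type)]


def recast_as_array(x):
    # Same parts, wrapped in numpy.array so the result is one ndarray.
    return np.array([x.astype(new_type), x.astype(new_type), x.astype(new_type)])


x = np.array([1.0, 2.0, 3.0])  # hypothetical record value
as_list = recast_as_list(x)
as_array = recast_as_array(x)

print(type(as_list).__name__)          # list
print(type(as_array).__name__)         # ndarray
print(as_array.dtype, as_array.shape)  # float16 (3, 3)
```

If the parts had different lengths, `numpy.array` could not stack them this way and an object-typed array would be needed instead.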