These are chat archives for thunder-project/thunder
RegressionModel returns an "incorrect" data object
RegressionModel.fit return a tuple of Series objects, called
DataTable class so we can use it as a return value here. The nuts and bolts are really straightforward, as it's just a bunch of joins and maps. Then, when the Spark community gets around to releasing the newer version of the SchemaRDD, we would only need to change the back-end, as the API should stay the same. Thoughts?
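To make the "bunch of joins and maps" idea concrete, here is a minimal local sketch of what such a table could look like. Everything in it is hypothetical: `DataTable`, its `where` method, and the list-of-`(key, value)` record format are illustrative stand-ins, not thunder's API, and plain Python dicts take the place of the RDD joins a real back-end would use.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical stand-in for a keyed distributed collection: a list of
# (key, value) records, like the records of a Series.
Records = List[Tuple[Hashable, object]]


class DataTable:
    """Hypothetical sketch: several named, keyed outputs joined on key."""

    def __init__(self, **columns: Records) -> None:
        # Index each column by key; this plays the role of chained RDD.join calls.
        indexed = {name: dict(recs) for name, recs in columns.items()}
        # Inner-join semantics: keep only keys present in every column.
        shared = set.intersection(*(set(d) for d in indexed.values()))
        self.names = list(columns)
        # One row per key, holding that key's value from every column.
        self.rows: Dict[Hashable, Tuple[object, ...]] = {
            k: tuple(indexed[n][k] for n in self.names) for k in shared
        }

    def where(self, name: str, predicate) -> "DataTable":
        """Keep rows whose value in column `name` satisfies `predicate`."""
        i = self.names.index(name)
        out = DataTable.__new__(DataTable)
        out.names = self.names
        out.rows = {k: v for k, v in self.rows.items() if predicate(v[i])}
        return out
```

For example, `DataTable(betas=betas, stats=stats).where("stats", lambda r2: r2 > 0.5)` would select betas conditional on stats. Keeping the join inside the table is what would let the back-end change later without touching this interface.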
betas, stats, resid = RegressionModel.fit(data)
b = betas.pack()
s = stats.pack()
did you mean something else? agreed that to, e.g., select one conditional on the other, you need a join
b = betas.pack(sorting=True)
s = stats.pack(sorting=True)
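Why `sorting=True` matters when packing can be shown with a small local sketch. The `pack` function below is hypothetical and pure Python, not thunder's `Series.pack`; it assumes records are `((row, col), value)` pairs whose keys form a complete, zero-based grid. Without sorting it trusts the arrival order of the records; with sorting it orders them by key first.

```python
from typing import List, Tuple

Record = Tuple[Tuple[int, int], float]


def pack(records: List[Record], sorting: bool = False) -> List[List[float]]:
    """Hypothetical local analogue of packing keyed records into a 2-D grid.

    Keys are (row, col) coordinates. Without sorting, records are assumed to
    already arrive in row-major key order; with sorting=True they are sorted
    by key first, so the result no longer depends on arrival order.
    """
    if sorting:
        # Tuples compare lexicographically, so this is row-major order.
        records = sorted(records, key=lambda r: r[0])
    n_rows = max(k[0] for k, _ in records) + 1
    n_cols = max(k[1] for k, _ in records) + 1
    values = [v for _, v in records]
    # Reshape the flat value list into rows of length n_cols.
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
```

With records that arrive shuffled, `pack(shuffled, sorting=True)` places every value at its key, while `pack(shuffled)` silently scrambles the grid, which is exactly the mismatch you would hit when comparing `b` and `s` element by element.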
sorting=True, would it make sense to return the keys as well? (I'm thinking about plotting cell-based results)
RDD.join works, but how much speed do we lose by doing the sorting on the driver after the
k, v = betas.collectAsArray()
sorting=True for the collect methods, just for packing
k, v = betas.collectAsArray(sorting=True) and
k, v = stats.collectAsArray(sorting=True), which will give local arrays with both properly sorted
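The point of returning sorted keys alongside sorted values is that two outputs collected separately then line up position by position, so they can be paired for plotting without a join. Here is a minimal local sketch; `collect_as_arrays` is a hypothetical pure-Python stand-in for a sorted collect, not thunder's `collectAsArray`, and it assumes both outputs cover the same set of keys.

```python
from typing import Hashable, List, Tuple

Pair = Tuple[Hashable, float]


def collect_as_arrays(
    records: List[Pair], sorting: bool = False
) -> Tuple[List[Hashable], List[float]]:
    """Hypothetical local analogue of collecting keyed records to the driver.

    Returns parallel key and value lists; with sorting=True both are ordered
    by key, so outputs collected separately line up position by position.
    """
    if sorting:
        records = sorted(records, key=lambda r: r[0])
    keys = [k for k, _ in records]
    values = [v for _, v in records]
    return keys, values


betas = [((0, 1), 1.2), ((0, 0), 0.5)]
stats = [((0, 0), 0.1), ((0, 1), 0.9)]
kb, b = collect_as_arrays(betas, sorting=True)
ks, s = collect_as_arrays(stats, sorting=True)
# Same key order on both sides, so values can be zipped per cell.
assert kb == ks
cells = list(zip(kb, b, s))
```

The trade-off raised above is where the sorting happens: this version sorts on the driver after collecting, whereas an `RDD.join` would pair values by key on the cluster before anything is collected.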
DataTable in the future, then even that minor annoyance will go away.