haifengl on master: add KMedoidsImputer and remove … (compare)
haifengl on master: fix missing value detection bug (compare)
haifengl on master: refactor SVDImputer, refactor LLSImputer, refactor imputation test, and 1 more (compare)
haifengl on master: move names()/type()/measures() …, add KNNImputer (compare)
import math
new_col = lambda each_row: math.sqrt((3 * each_row.first_column) ** 4 + (3 * each_row.second_column) ** 4 + (3 * each_row.third_column) ** 4)
my_dataframe["new_col"] = my_dataframe.apply(new_col, axis=1)
import smile.data.formula.Formula;
import static smile.data.formula.Terms.*;
df.stream().map(row -> { ... })
or something?
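For what it's worth, here is a minimal sketch of one way to compute such a derived column with Smile's DataFrame API rather than a Formula, assuming the Smile 2.x Java API; the exact helper names (Tuple.getDouble, DoubleVector.of, DataFrame.merge) are recalled from that version and may differ in others:

import smile.data.DataFrame;
import smile.data.vector.DoubleVector;

// Assumes df is a smile.data.DataFrame with numeric columns
// first_column, second_column and third_column.
double[] values = df.stream()
        .mapToDouble(row -> Math.sqrt(
                Math.pow(3 * row.getDouble("first_column"), 4)
              + Math.pow(3 * row.getDouble("second_column"), 4)
              + Math.pow(3 * row.getDouble("third_column"), 4)))
        .toArray();
// Merge the computed values back into the frame as a new column.
DataFrame extended = df.merge(DoubleVector.of("new_col", values));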
The response variable Classification doesn't exist in the schema [...]
Should I pay attention to this message? How do I get rid of it?
stderr = true
smile.validation.metric package. The classification algorithms report posterior probabilities if they implement SoftClassifier. Some algorithms also have a score() method, which does not necessarily return probabilities though.
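For illustration, a minimal sketch of reading posterior probabilities from a SoftClassifier in Java (LogisticRegression is just one example; x, y and the two-class setup are assumptions, not taken from the discussion above):

import smile.classification.LogisticRegression;

// x: double[][] features, y: int[] labels in {0, 1}, assumed to exist.
LogisticRegression model = LogisticRegression.fit(x, y);

// SoftClassifier.predict fills the posterior probability array.
double[] posteriori = new double[2];          // two classes assumed
int label = model.predict(x[0], posteriori);  // posteriori[k] = P(class k | x[0])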
Hi, I have a question about the new(?) OLS.fit() method. If I try to use it to predict housing sale prices, like OLS.fit(Formula.lhs("SalePrice"), X_train_dataframe), it fails with "no response variable".
So it seems I can only use OLS if I pass it a dataframe that includes X and y together. However, if I use it like OLS.fit(Formula.lhs("SalePrice"), training_dataframe), where training_dataframe includes both the dependent and independent columns (X and y), then predict requires an array of the same size, including a column for the value I wish to predict.
However, doing that and passing, say, 0 for the y value results in wildly incorrect predictions, and changing that value affects the predictions. Is there not a way to use OLS without commingling X and y?
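For context, a minimal sketch of the fit/predict pattern being described, assuming the Smile 2.x Java API (training_dataframe and test_dataframe are placeholder names; whether the test frame must still carry a SalePrice column is exactly the open question here):

import smile.data.DataFrame;
import smile.data.formula.Formula;
import smile.regression.LinearModel;
import smile.regression.OLS;

// training_dataframe holds the predictors plus the SalePrice response.
LinearModel model = OLS.fit(Formula.lhs("SalePrice"), training_dataframe);

// Predict over a whole frame; each row is run through the formula.
double[] yhat = model.predict(test_dataframe);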