Juan C Rodriguez
@jcrodriguez1989
Hi, is it possible to give my own sort_metric function for AutoML?
If not, which one do you think could be similar to MAPE?
1 reply
Jordan Bentley
@jbentleyEG
@jakubhava for sparklyr, if I switch to an external backend instead of internal, could I expect more stability?
and can it generally use the same resources (cpus) as a cluster on the same machine as long as spark isn't doing any heavy lifting at the same time as H2O?
Jakub Háva
@jakubhava
@jbentleyEG Regarding backends, you should find the answer here; if not, please let us know: https://docs.h2o.ai/sparkling-water/2.4/latest-stable/doc/deployment/backends.html
btw: can I suggest using the sparkling-water channel for Sparkling Water related questions?
Simon Schmid
@SimonSchmid
Hello all,
when will version 3.30.0.4 of h2o-scala be available? I cannot find the jar to download. See https://mvnrepository.com/artifact/ai.h2o/h2o-scala
Michal Kurka
@michalkurka

@SimonSchmid we are in the process of deprecating this module; for your project you can either build the artifact from h2o-3 sources or just keep using the 3.30.0.3 artifact (there was no change).

h2o-scala was primarily meant for the Sparkling Water project, which no longer relies on it

What is your use case for h2o-scala?

Simon Schmid
@SimonSchmid
I was actually using it for sparkling water and was about to update to the latest version. Seems like I don't need the dependency anymore in this case. Thanks!
Michal Kurka
@michalkurka
yeah, for Sparkling Water you don’t need it
razou
@razou
Hello,
I wanted a way to perform zero-padding on a string column (with numbers like 1, 2, 43, …),
but I am unable to do it with apply:
df['x'].ascharacter().apply(lambda x: x.zfill(2), axis=1).head(10)
ValueError: Unimplemented: op <zfill> not bound in H2OFrame
Any idea?
Thanks
The apply function seems to work with only a limited number of operations.
Michal Kurka
@michalkurka
you are correct, apply only works for certain built-in functions; using an arbitrary function is not supported
razou
@razou
Thanks @michalkurka for your answer
Do you have any idea how one could do such operations (i.e. applying an arbitrary function to each row of a given frame)?
Michal Kurka
@michalkurka

you can convert the column to Pandas, apply your transformation, then convert back to to H2OFrame and cbind with the original frame

this is cumbersome but H2O currently doesn’t have the functionality to execute any python code on the backend
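For reference, a minimal sketch of that round-trip (hedged: it reuses razou's frame df and column "x" from above; as_data_frame, H2OFrame, and cbind are standard h2o-py calls):

import h2o

# H2OFrame column -> pandas, apply the arbitrary function there
col_pd = df["x"].ascharacter().as_data_frame()
col_pd["x_padded"] = col_pd["x"].apply(lambda v: str(v).zfill(2))

# pandas -> H2OFrame, then cbind back onto the original frame
padded_hf = h2o.H2OFrame(col_pd[["x_padded"]])
df = df.cbind(padded_hf)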

razou
@razou

you can convert the column to Pandas, apply your transformation, then convert back to to H2OFrame and cbind with the original frame

this is cumbersome but H2O currently doesn’t have the functionality to execute any python code on the backend

That’s what I have done, but I wanted to avoid that conversion, because working with pandas on large datasets is very slow, unlike H2O frames.

Michal Kurka
@michalkurka
right, if your use case involves specific data munging that we don’t support you might want to consider using Sparkling Water - do the munging in Spark and ML in H2O
razou
@razou
That would be the best solution. Thanks
Kriszhou1
@Kriszhou1
I don't know how to make h2o.ai accessible in an IFrame. Can anyone help me solve the problem?
John
@wafflejohnh_twitter

Hi all, when running autoML
h2o.automl.get_leaderboard(aml) seems to return a leaderboard with training metrics instead of the xval metrics described in the docs. Is this a known issue? I haven't been able to find what's causing this - appreciate any pointers from fellow experts (Y) Thanks!

autoML run is set up as such:
aml = H2OAutoML(max_models=20, seed=21, nfolds=5, sort_metric='rmse',
                keep_cross_validation_predictions=True,
                keep_cross_validation_models=False,
                project_name=project_name)
aml.train(y=target, training_frame=hf_train)
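For reference, a hedged sketch of the leaderboard call under discussion (extra_columns is an assumption here; it exists only in recent h2o-py releases):

import h2o.automl

# Pull the leaderboard of the finished run; extra_columns="ALL" appends
# additional columns next to the sorted metric columns in recent releases
lb = h2o.automl.get_leaderboard(aml, extra_columns="ALL")
print(lb.head(rows=10))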

Erin LeDell
@ledell
@wafflejohnh_twitter can you provide a reproducible example please? they should be the CV metrics...
razou
@razou

Hi,
I’m getting strange behaviors when reading data (a folder containing many csv files) from Amazon S3 with import_file:

  • I generate my data with Spark + Scala and write it in csv format to an S3 folder
  • When loading that data with h2o.import_file(path=s3_path, pattern=".*\.csv") I’m getting some data from column 28 in column 1, some data from column 29 in column 2, …

Does this function have a limitation on the number of columns, did I miss an additional option, or is it due to something else?

Thanks

John
@wafflejohnh_twitter
@ledell I reran the automl call in a new kernel and it is showing xval in the leaderboard now. Thanks for confirming. Could it be that if a previous AutoML run disables xval, the leaderboard will only show training scores on subsequent models, whether they were generated with or without xval scores?
9 replies
razou
@razou
It seems that the na_omit() function removes rows containing at least one NA entry? Is there an option to remove only rows where all entries are NA?
Thanks
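There seems to be no built-in option for that; a hedged workaround sketch (assuming df is the H2OFrame; isna and row-wise sum are standard h2o-py calls):

# Count NAs per row, then keep rows with at least one non-NA entry
na_per_row = df.isna().sum(axis=1, return_frame=True)
df_kept = df[na_per_row[0] < df.ncols, :]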
akshayi1
@akshayi1
Are there any plans to introduce walk-forward validation for time series predictions?
razou
@razou

Hi
I wanted to perform frequency encoding on the categorical features in a given frame.
Suppose I have a dictionary where keys are categorical columns and values are their frequencies. How can one replace each category with its frequency on the entire frame?

Example with a pandas data frame:

fe = pandasDF.groupby(col).size() / len(pandasDF)
pandasDF.loc[:, col_fe] = pandasDF[col].map(fe)

Is there an equivalent to the pandas DataFrame map function for H2O frames?

10 replies
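A hedged sketch of frequency encoding directly on an H2OFrame (assuming df is the frame and col the categorical column name; group_by().count() produces an "nrow" count column in h2o-py):

# Per-level counts, normalized to frequencies
counts = df.group_by(col).count().get_frame()
counts["freq"] = counts["nrow"] / df.nrows

# merge joins on the shared column name, attaching each level's frequency
df_fe = df.merge(counts[[col, "freq"]], all_x=True)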
Juan C Rodriguez
@jcrodriguez1989
Is there a way to disable GPU backend for AutoML?
On my Ubuntu laptop, when xgboost is included in AutoML, it crashes.
If I run just xgboost(..., backend = "auto") it is also crashing.
However, xgboost(..., backend = "cpu") works.
I've tried with several h2o versions.
I could provide a simple reprex and session info.
2 replies
razou
@razou
Hi,
For xgboost(), only one_hot_encoding seems to be applied whatever encoding is specified (e.g. even with 'categorical_encoding': 'enum', the model uses one-hot encoding). Is that correct?
3 replies
Chrinide
@chrinide
Is there any possibility of implementing Rotation Forest in H2O? Based on H2O's great Random Forest and Principal Component Analysis algorithms, it seems it would be very easy to get a Rotation Forest. A report on arXiv https://arxiv.org/abs/1809.06705 shows that Rotation Forest is the best classifier for problems with continuous features.
1 reply
razou
@razou

Hi
With a dictionary (imputer_dict) whose keys are the same as my frame's columns, I wanted to fill NAs in the frame with that dictionary's values, as is possible with pandas dataframes (e.g. DataFrame.fillna(value=imputer_dict)).

For each column (x) in my frame I wanted to replace its NAs with its value in imputer_dict (i.e. imputer_dict[x]).

If anybody has an idea,

Thanks

12 replies
razou
@razou

Hi
I was trying to impute a frame with a dictionary (each feature with its imputation value):
frame.impute(values=[imputer_dict[col] for col in frame.columns])
I wanted to skip some columns (e.g. x) by adding None as the value in imputer_dict, as described in the documentation, but I’m getting the following error.

The documentation is confusing to me. Does anybody have an idea of what caused the error?

2 replies
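A hedged workaround for the fillna-with-dict use case (assuming numeric columns; isna and ifelse are standard H2OFrame methods, and None entries are skipped):

# Fill each column's NAs from imputer_dict, skipping None entries
for col, val in imputer_dict.items():
    if val is not None:
        frame[col] = frame[col].isna().ifelse(val, frame[col])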
Gabriel Fields
@gabfields02

Hi,

I have a question.
A quick background on what I did with H2O: I am able to perform predictions in my own Java environment using the MOJO that I downloaded from H2O.

I am planning to use the outputs (Scoring History, Variable Importances, etc.) from the model that I built in H2O.
I want to use this data and display it in my own development environment.
I have checked H2O's documentation (http://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/index.html).
However, I can't seem to find any sample that shows this.

I want to ask: is it possible to retrieve and display this data using the MOJO?
If so, how can I do this? Is there a documentation reference I can look into?

Regards,
Gabriel

jwolf9
@jwolf9

Hi,
I have a question about the documentation for the stopping_rounds parameter for h2o’s models: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/stopping_rounds.html

It says: To disable this feature, specify 0. When disabled, the metric is computed on the validation data (if provided); otherwise, training data is used.

I was wondering if this was a typo; was it meant to say when enabled rather than when disabled? Or does this mean that if enabled, early stopping will be computed using the training data rather than the validation data?

Thanks!

1 reply
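For reference, a hedged usage sketch of the parameter being discussed, together with its companion parameters (a sketch, not tied to the doc passage's wording):

from h2o.estimators import H2OGradientBoostingEstimator

# stopping_rounds > 0 enables early stopping; 0 disables it
gbm = H2OGradientBoostingEstimator(
    ntrees=500,
    stopping_rounds=5,
    stopping_metric="AUC",
    stopping_tolerance=1e-3,
)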
Oscar Pan
@OscarDPan
image.png
Hi, I was playing with h2o GLM (binomial) and these are the predictions. Is it wrong? Why is p1 so small while the predicted labels are all "1"?
Gabriel Fields
@gabfields02

Hi. I am running H2O with Python.

So far, what I am able to do is build a GBM model and print its data. Below is my sample code.

gbm_model = H2OGradientBoostingEstimator(ntrees=100, max_depth=4, learn_rate=0.1)
gbm_model.train(predictors, response, training_frame=trainingFrame)
print(gbm_model)

Printing gbm_model displays tables of data like Scoring History and Variable Importances.
What I want to achieve is to retrieve each piece of data (with its header name) so that I can map and display the data in my own way.
So, I tried to access the Variable Importances data by looping through it.

print("Loop through Variable Importance Items")
varImp = gbm_model.varimp()

for varImpItem in varImp:
    for item in varImpItem:
        print(item)
    print(" ")

For additional info, gbm_model.varimp() returns a ModelBase object.

However, what was retrieved was only the data itself.
The header names (variable, relative_importance, scaled_importance, percentage) were not included in the display.

I want to ask, is there a way to retrieve the header names for this? If so, how can I do it?

razou
@razou
Hi @gabfields02
You can retrieve each piece of data from gbm_model._model_json['output']
  • For variable importance:
    gbm_model._model_json['output']['variable_importances']
    And if you want it in a dataframe:
var_imp = gbm_model._model_json['output']['variable_importances'].as_data_frame()
11 replies
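A possibly simpler route (hedged: the use_pandas flag on varimp exists in recent h2o-py releases):

# Returns a pandas DataFrame whose columns carry the header names:
# variable, relative_importance, scaled_importance, percentage
var_imp_df = gbm_model.varimp(use_pandas=True)
print(var_imp_df.columns.tolist())
print(var_imp_df.head())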
lohralexander
@lohralexander
Hi,
I am using EasyPredictModelWrapper to run trained models.
For numerical and categorical input values this works fine.
But I noticed that there are problems with time columns: time is not converted automatically, as it is in the Flow interface; instead an error is thrown.
Do Time Columns need to be addressed in a special way?
Regards
Oscar Pan
@OscarDPan

Hi,
I was comparing h2o.xgboost vs the native xgboost by following the instructions in estimator_base.py:

        import h2o
        import xgboost as xgb
        from h2o.estimators import H2OXGBoostEstimator

        h2o.init()
        training_hf = h2o.import_file("train.csv")
        h2o_booster = H2OXGBoostEstimator(distribution="bernoulli",
                                          seed=0,
                                          ntrees=10,
                                          max_depth=5,
                                          min_split_improvement=0.1,
                                          learn_rate=0.1,
                                          sample_rate=0.9,
                                          col_sample_rate_per_tree=0.9,
                                          min_rows=2
                                          )
        label = "response"
        features = training_hf.columns
        features.remove(label)
        training_hf[label] = training_hf[label].asfactor()

        h2o_booster.train(x=features, y=label, training_frame=training_hf)
        h2oPredict = h2o_booster.predict(training_hf).as_data_frame()['p1'].values

        nativeDMatrix = training_hf.convert_H2OFrame_2_DMatrix(features, label, h2o_booster)
        nativeDMatrix.feature_names = features
        nativeParams = h2o_booster.convert_H2OXGBoostParams_2_XGBoostParams()
        nativeModel = xgb.train(params=nativeParams[0], dtrain=nativeDMatrix, num_boost_round=nativeParams[1])
        nativePredict = nativeModel.predict(data=nativeDMatrix, ntree_limit=nativeParams[1])

Apparently the predictions are just very close (not because of rounding), but definitely not exactly the same. Did I miss anything from the instructions?

Moreover, the tree structures start to diverge a lot after the first few, which are very similar.

2 replies
SURAJ BHAGAT
@surajenv_twitter
Hello
I am trying to run: localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE, nthreads = -1) but am getting this error:
Error in h2o.init(ip = "localhost", port = 54321, startH2O = TRUE, nthreads = -1) :
H2O failed to start, stopping execution.
Erin LeDell
@ledell
@surajenv_twitter there's not enough info. Please search on Stack Overflow first... I think this question has been solved a few times already.
caomi8888
@caomi8888
Hi everyone, I'm new here. Actually I want to know if there are any cases of H2O being used for fault diagnosis of construction machines like excavators. Thanks a lot for your answers! :)
Gabriel Fields
@gabfields02
Hello,
Is there a way to run one H2O instance and access that same instance from different servers?
If so, how can I do it? Would it also be possible when running H2O with Python?
Thanks.
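For reference, a minimal sketch (hedged: the hostname is a placeholder, and h2o.connect attaches to an already-running cluster rather than starting a new one):

import h2o

# On the server machine, start one instance reachable from other hosts, e.g.
# h2o.init(ip="0.0.0.0", port=54321)

# From any other machine, attach to that same running instance
h2o.connect(ip="h2o-server.example.com", port=54321)  # placeholder host
print(h2o.cluster().cloud_name)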
Owen Ball
@ob83_gitlab

Hi All,

Has anyone managed to get XGBoost in h2o-3 to use a GPU backend when running in a docker container? And, if so, can you give me some pointers?

I'm running out of ideas and the only output I get from h2o to debug is:

ERRR on field: _backend: GPU backend (gpu_id: 0) is not functional. Check CUDA_PATH and/or GPU installation.

Is there any way to get something more verbose?

For context. I'm using Metaflow in combination with AWS Batch.

  • The AWS batch AMI is a slightly customised build on top of the ECS GPU Optimised AMI which includes nvidia driver 418.87.00
  • The container image is built on nvidia/cuda runtime centos 7. I've tried every version of CUDA, but currently have cuda 8 and then install the cuda 9 libraries
  • /usr/local/cuda symlink points to cuda 9
  • CUDA_PATH and LD_LIBRARY_PATH are set
  • h2o version is 3.30.1.1
  • Packages and Python 3 version are all managed with conda environment
  • Instance type is p3.2xlarge with nvidia Tesla

I have tested whether the container can see the GPU using pynvml and it seems to work. I also ran a test script using tensorflow-gpu and that seemed to work too. That leads me to conclude the problem is with h2o and/or cuda.

The documentation on configuring h2o to use GPUs with XGBoost is pretty limited in scope and as far as I can tell, I'm meeting the requirements.

Any help/advice much appreciated...

Thanks

9 replies
wprucknic
@wprucknic
I'm not sure where the H2O suggestion box is, but would it be possible to impose monotonicity on gam_columns?
DennisKr
@DennisKr
Hey, is there a possibility to get the same date-format parsing from the frame parsing process (water.parser.ParseTime) also with the EasyPredictModelWrapper? Right now it's a bit tedious to use time types with a MOJO, as it expects a unix timestamp and not the date format that was originally uploaded. Therefore the training data, or rather data in the same original format, can't be used directly for predictions...
Gabriel Fields
@gabfields02
Hello, does anyone know how to start two instances of H2O in Python? I have only found documentation for R. I was wondering if there is documentation in Python for this.
9 replies
Seiji Kumagai
@skumagai
Hi, I'd like to use context_path as an argument in h2o.init() in Python, so I made pull request #4911. Can somebody review it and let me know the next step I need to take?
Tom Roderick
@tomrod-pcci

Hi all! Found a bug in error handling:

  • h2o\model\model_base.py
  • Line 378:
    raise ValueError("'test_data' must be of type H2OFrame. Got: " + type(test_data))

type() in Python 3.7.4 does not return a string, so it can't be concatenated
It should probably be replaced with
raise ValueError("'test_data' must be of type H2OFrame. Got: " + type(test_data).__name__)
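For reference, a minimal sketch reproducing the reported bug in plain Python:

# Concatenating str with a type object raises TypeError in Python 3
try:
    "'test_data' must be of type H2OFrame. Got: " + type([])
except TypeError as e:
    print(e)  # can only concatenate str (not "type") to str

# The suggested fix yields a readable class name instead
print("Got: " + type([]).__name__)  # prints: Got: list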

1 reply
I'm not sure what the OSS contribution process looks like; I couldn't see any obvious path on GitHub.