I have a question about the documentation for the stopping_rounds parameter for h2o’s models: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/stopping_rounds.html
The page says: "To disable this feature, specify 0. When disabled, the metric is computed on the validation data (if provided); otherwise, training data is used."
I was wondering if this was a typo; was it meant to say "when enabled" rather than "when disabled"? Or does this mean that if enabled, early stopping will be computed using the training data rather than the validation data?
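For concreteness, this is the kind of setup I'm asking about — a minimal sketch where predictors, response, train and valid stand in for my own data:

from h2o.estimators import H2OGradientBoostingEstimator

# stopping_rounds > 0 enables early stopping; 0 disables it
gbm = H2OGradientBoostingEstimator(ntrees=500,
                                   stopping_rounds=3,
                                   stopping_metric="logloss",
                                   stopping_tolerance=1e-4)
gbm.train(x=predictors, y=response, training_frame=train, validation_frame=valid)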
Hi. I am running H2O with Python.
So far, what I am able to do is build a GBM model and print its data. Below is my sample code.
from h2o.estimators import H2OGradientBoostingEstimator

gbm_model = H2OGradientBoostingEstimator(ntrees=100, max_depth=4, learn_rate=0.1)
gbm_model.train(x=predictors, y=response, training_frame=trainingFrame)
Printing gbm_model displays tables of data like Scoring History and Variable Importances.
What I want is to retrieve each value together with its header name, so that I can map and display the data in my own way.
So, I tried to access the Variable Importances data by looping through it.
print("Loop through Variable Importance Items")
varImp = gbm_model.varimp()
for varImpItem in varImp:
for item in varImpItem:
For additional info, gbm_model.varimp() returns a plain list of tuples.
However, what was retrieved was only the data itself.
The header names (variable, relative_importance, scaled_importance, percentage) were not included in the output.
I want to ask, is there a way to retrieve the header names for this? If so, how can I do it?
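For reference, the kind of output I'm after would look something like this — a sketch assuming varimp(use_pandas=True) returns the table as a pandas DataFrame with its column names:

varimp_df = gbm_model.varimp(use_pandas=True)
print(list(varimp_df.columns))  # e.g. ['variable', 'relative_importance', 'scaled_importance', 'percentage']
for _, row in varimp_df.iterrows():
    print(row['variable'], row['percentage'])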
I was comparing h2o.xgboost vs the native xgboost by following the instructions written in estimator_base.py:
import h2o
import xgboost as xgb
from h2o.estimators import H2OXGBoostEstimator

h2o.init()
training_hf = h2o.import_file("train.csv")
h2o_booster = H2OXGBoostEstimator(distribution="bernoulli", seed=0, ntrees=10,
                                  max_depth=5, min_split_improvement=0.1,
                                  learn_rate=0.1, sample_rate=0.9,
                                  col_sample_rate_per_tree=0.9, min_rows=2)
label = "response"
features = training_hf.columns
features.remove(label)
training_hf[label] = training_hf[label].asfactor()
h2o_booster.train(x=features, y=label, training_frame=training_hf)
h2oPredict = h2o_booster.predict(training_hf).as_data_frame()['p1'].values

# convert the frame and parameters to their native XGBoost equivalents
nativeDMatrix = training_hf.convert_H2OFrame_2_DMatrix(features, label, h2o_booster)
nativeDMatrix.feature_names = features
nativeParams = h2o_booster.convert_H2OXGBoostParams_2_XGBoostParams()  # (params, num_boost_round)
nativeModel = xgb.train(params=nativeParams[0], dtrain=nativeDMatrix,
                        num_boost_round=nativeParams[1])
nativePredict = nativeModel.predict(data=nativeDMatrix, ntree_limit=nativeParams[1])
Apparently the predictions are only very close (and not just because of rounding), but definitely not exactly the same. Did I miss anything in the instructions?
Moreover, the tree structures start to diverge a lot after the first few, which are very similar.
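For what it's worth, here's how I'm measuring the gap and eyeballing the trees — just a sketch, using h2oPredict, nativePredict and nativeModel from the code above:

import numpy as np

# largest elementwise difference between the two scorings
print(np.max(np.abs(h2oPredict - nativePredict)))

# text dump of the first native tree, to compare against h2o's first tree
print(nativeModel.get_dump()[0])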
Has anyone managed to get XGBoost in h2o-3 to use a GPU backend when running in a docker container? And, if so, can you give me some pointers?
I'm running out of ideas and the only output I get from h2o to debug is:
ERRR on field: _backend: GPU backend (gpu_id: 0) is not functional. Check CUDA_PATH and/or GPU installation.
Is there any way to get something more verbose?
For context: I'm using Metaflow in combination with AWS Batch.
I have tested whether the container can see the GPU using pynvml, and it seems to work. I also ran a test script using tensorflow-gpu, and that worked too. That leads me to conclude the problem is with h2o and/or CUDA.
The documentation on configuring h2o to use GPUs with XGBoost is pretty limited in scope and as far as I can tell, I'm meeting the requirements.
Any help/advice much appreciated...
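For reference, this is how I'm requesting the GPU backend — a minimal sketch; the file name and columns stand in for my real setup:

import h2o
from h2o.estimators import H2OXGBoostEstimator

h2o.init()
frame = h2o.import_file("train.csv")
model = H2OXGBoostEstimator(backend="gpu", gpu_id=0, ntrees=50)
model.train(x=frame.columns[:-1], y=frame.columns[-1], training_frame=frame)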
Hi all! Found a bug in error handling:
raise ValueError("'test_data' must be of type H2OFrame. Got: " + type(test_data))
In Python 3.7.4, type(test_data) returns a type object, not a string, so the concatenation raises a TypeError instead of the intended ValueError.
It should probably be replaced with
raise ValueError("'test_data' must be of type H2OFrame. Got: " + type(test_data).__name__)
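A quick REPL session showing the failure mode:

>>> "'test_data' must be of type H2OFrame. Got: " + type(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "type") to str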
Would anyone be able to give me some pointers on optimising resource allocation for XGBoost training in h2o? I'm sure I read in the documentation that you need to leave a proportion of the available CPU/memory free for XGBoost. Is this correct, or should I be giving H2O as much as possible?
For example: if I have an instance with 10 CPUs and 100 GB RAM, what should I allocate directly to h2o, and what, if any, should I keep free for XGBoost?
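For example, something like this is what I'm doing now — the 60G / 8-thread split is just my guess, which is exactly what I'm asking about:

import h2o

# leave ~40 GB of RAM and 2 cores outside the JVM for native XGBoost?
h2o.init(max_mem_size="60G", nthreads=8)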
A question about the h2o.explain() features in h2o python: what does the FREE value mean? And why am I hitting OOM if I've got 10x as much memory?
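The FREE value I'm referring to shows up in the cluster status — assuming that's the right place to read it:

import h2o

h2o.init()
h2o.cluster().show_status()  # includes the cluster's free-memory figure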
I'm training a GBM multiclass classifier and I wanted to know what could cause the following error. Thanks.
raw_df = h2o.import_file()
df = h2o.deep_copy(raw_df[raw_df['x'] > 10, :], 'df')
df['split'] = df['y'].stratified_split(test_frac=0.2, seed=1)
train_valid = df[df['split'] == 'train', :].drop('split')
test = df[df['split'] == 'test', :].drop('split')
train_valid['col_split'] = train_valid['y'].stratified_split(test_frac=0.2, seed=1)
train = train_valid[train_valid['col_split'] == 'train', :].drop('col_split')
valid = train_valid[train_valid['col_split'] == 'test', :].drop('col_split')

raw_df['y'].unique().nrow  # => 95
df['y'].unique().nrow      # => 93
train['y'].unique().nrow   # => 93
Training the GBM algo with class_sampling_factors = [w1, ..., w93] gives:
OSError: Job with key $03017f00000132d4ffffffff$_af9c11386cb765249816853dfc3d47fe failed with an exception: java.lang.IllegalArgumentException: class_sampling_factors must have 95 elements
stacktrace:
java.lang.IllegalArgumentException: class_sampling_factors must have 95 elements
    at hex.tree.SharedTree$Driver.computeImpl(SharedTree.java:244)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:238)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1563)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Or the following one, when using "balance_classes": True in the GBM model:
OSError: Job with key $03017f00000132d4ffffffff$_acb90549c4fb00eefd9be1d55ab5448b failed with an exception: java.lang.IllegalArgumentException: Error during sampling - too few points?
stacktrace:
java.lang.IllegalArgumentException: Error during sampling - too few points?
    at water.util.MRUtils.sampleFrameStratified(MRUtils.java:309)
    at hex.tree.SharedTree$Driver.computeImpl(SharedTree.java:252)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:238)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1563)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
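My current suspicion is that the response column still carries the full 95-level factor domain even after the row filter, which would explain the 95-vs-93 mismatch. A quick check and the workaround I'm trying — a sketch assuming levels() reflects the domain and a character/factor round-trip rebuilds it:

# does the domain still have 95 levels even though only 93 appear?
print(len(df['y'].levels()[0]))

# re-derive the factor from the remaining rows so the domain matches
df['y'] = df['y'].ascharacter().asfactor()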