Oscar Pan
@OscarDPan
[attached image: image.png]
Hi, I was playing with H2O GLM (binomial) and these are the predictions. Are they wrong? Why is p1 so small while the predicted labels are all "1"?
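For context (a general H2O behavior, not a diagnosis of this specific model): binomial models label rows by comparing p1 to the model's chosen threshold (by default, the threshold that maximized F1 on the training metrics), not to 0.5, so small p1 values can still all be labeled "1". A minimal sketch of that labeling rule, with hypothetical numbers:

```python
# Hypothetical threshold and probabilities; H2O picks the default threshold
# by maximizing F1 on the training metrics, so it can be far below 0.5.
threshold = 0.02
p1 = [0.030, 0.050, 0.021]

# The label is 1 whenever p1 exceeds the threshold -- not whenever p1 > 0.5.
labels = [1 if p > threshold else 0 for p in p1]
print(labels)
```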
Gabriel Fields
@gabfields02

Hi. I am running H2O with Python.

So far, what I am able to do is build a GBM model and print its data. Below is my sample code.

gbm_model = H2OGradientBoostingEstimator(ntrees=100, max_depth=4, learn_rate=0.1)
gbm_model.train(predictors, response, training_frame=trainingFrame)
print(gbm_model)

Printing gbm_model displays tables of data like Scoring History and Variable Importances.
What I want to achieve is to retrieve each value (with its header name) so that I can map and display the data in my own way.
So, I tried to access the Variable Importances data by looping through it.

print("Loop through Variable Importance Items")
varImp = gbm_model.varimp()

for varImpItem in varImp:
    for item in varImpItem:
        print(item)
    print(" ")

For additional info, gbm_model.varimp() returns a ModelBase object.

However, what was retrieved was only the data itself.
The header names (variable, relative_importance, scaled_importance, percentage) were not included for the display.

I want to ask, is there a way to retrieve the header names for this? If so, how can I do it?

razou
@razou
Hi @gabfields02
You can retrieve each piece of data from gbm_model._model_json['output']
  • For variable importance:
    gbm_model._model_json['output']['variable_importances']
    And if you want it in a dataframe:
var_imp = gbm_model._model_json['output']['variable_importances'].as_data_frame()
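To keep the header names together with the values, one option is to zip each row with the known headers. This is a sketch assuming varimp() returns (variable, relative_importance, scaled_importance, percentage) tuples; the rows below are hypothetical stand-ins for a real model's output:

```python
headers = ["variable", "relative_importance", "scaled_importance", "percentage"]

# Hypothetical rows standing in for gbm_model.varimp() output.
varimp_rows = [
    ("age",    120.5, 1.00, 0.60),
    ("income",  80.3, 0.67, 0.40),
]

# Pair every value with its header name so it can be displayed any way you like.
records = [dict(zip(headers, row)) for row in varimp_rows]
for rec in records:
    print(rec["variable"], "->", rec["percentage"])
```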
lohralexander
@lohralexander
Hi,
I am using EasyPredictModelWrapper to run trained models.
For numerical and categorical input values this works fine.
But I noticed that there are problems with time columns: time is not converted automatically as in the Flow interface; instead, an error is thrown.
Do time columns need to be addressed in a special way?
Regards
Oscar Pan
@OscarDPan

Hi,
I was comparing h2o.xgboost vs the native xgboost by following the instructions written in estimator_base.py:

        h2o.init()
        training_hf = h2o.import_file("train.csv")
        h2o_booster = H2OXGBoostEstimator(distribution="bernoulli",
                                          seed=0,
                                          ntrees=10,
                                          max_depth=5,
                                          min_split_improvement=0.1,
                                          learn_rate=0.1,
                                          sample_rate=0.9,
                                          col_sample_rate_per_tree=0.9,
                                          min_rows=2
                                          )
        label = "response"
        features = training_hf.columns
        features.remove(label)
        training_hf[label] = training_hf[label].asfactor()

        h2o_booster.train(x=features, y=label, training_frame=training_hf)
        h2oPredict = h2o_booster.predict(training_hf).as_data_frame()['p1'].values

        nativeDMatrix = training_hf.convert_H2OFrame_2_DMatrix(features, label, h2o_booster)
        nativeDMatrix.feature_names = features
        nativeParams = h2o_booster.convert_H2OXGBoostParams_2_XGBoostParams()
        nativeModel = xgb.train(params=nativeParams[0], dtrain=nativeDMatrix, num_boost_round=nativeParams[1])
        nativePredict = nativeModel.predict(data=nativeDMatrix, ntree_limit=nativeParams[1])

Apparently the predictions are just very close (not merely rounding differences), but definitely not exactly the same. Did I miss anything from the instructions?

Moreover, the tree structure starts to diverge a lot after a few initial ones being very similar.

SURAJ BHAGAT
@surajenv_twitter
Hello
I am trying to run:
localH2O = h2o.init(ip="localhost", port = 54321, startH2O = TRUE, nthreads = -1)
but I am getting this error:
Error in h2o.init(ip = "localhost", port = 54321, startH2O = TRUE, nthreads = -1) :
H2O failed to start, stopping execution.
Erin LeDell
@ledell
@surajenv_twitter There's not enough info. Please search on Stack Overflow first... I think this question has been solved a few times already.
caomi8888
@caomi8888
Hi everyone, I'm new here. Actually, I want to know if there are any cases of H2O being used for fault diagnosis of construction machines like excavators. Thanks a lot for your answers! :)
Gabriel Fields
@gabfields02
Hello,
Is there a way to run one H2O instance and access that same instance from different servers?
If so, how can I do it? Would it also be possible when running H2O with Python?
Thanks.
Owen Ball
@ob83_gitlab

Hi All,

Has anyone managed to get XGBoost in h2o-3 to use a GPU backend when running in a docker container? And, if so, can you give me some pointers?

I'm running out of ideas and the only output I get from h2o to debug is:

ERRR on field: _backend: GPU backend (gpu_id: 0) is not functional. Check CUDA_PATH and/or GPU installation.

Is there any way to get something more verbose?

For context. I'm using Metaflow in combination with AWS Batch.

  • The AWS batch AMI is a slightly customised build on top of the ECS GPU Optimised AMI which includes nvidia driver 418.87.00
  • The container image is built on the nvidia/cuda runtime CentOS 7 image. I've tried every version of CUDA, but currently have CUDA 8 and then install the CUDA 9 libraries
  • /usr/local/cuda symlink points to cuda 9
  • CUDA_PATH and LD_LIBRARY_PATH are set
  • h2o version is 3.30.1.1
  • Packages and Python 3 version are all managed with conda environment
  • Instance type is p3.2xlarge with nvidia Tesla

I have tested whether the container can see the GPU using pynvml and it seems to work. I also ran a test script using tensorflow-gpu and that seemed to work too. That leads me to conclude the problem is with h2o and/or cuda.

The documentation on configuring h2o to use GPUs with XGBoost is pretty limited in scope and as far as I can tell, I'm meeting the requirements.

Any help/advice much appreciated...

Thanks

wprucknic
@wprucknic
I'm not sure where the H2O suggestion box is, but would it be possible to impose monotonicity on gam_columns?
DennisKr
@DennisKr
Hey, is there a possibility to get the same date-format parsing as in the frame parsing process (water.parser.ParseTime) with the EasyPredictModelWrapper? Right now it's a bit tedious to use time types with a MOJO, as it expects a Unix timestamp and not the date format that was originally uploaded. Therefore the training data, or rather data in the same original format, can't be used directly for predictions...
Gabriel Fields
@gabfields02
Hello, does anyone know how to start two instances of H2O in Python? I have only found documentation in R. I was wondering if there is a documentation in Python for this.
Seiji Kumagai
@skumagai
Hi, I'd like to use context_path as an argument in h2o.init() in python, so I made a pull request #4911. Can somebody review it and let me know the next step I need to take?
Tom Roderick
@tomrod-pcci

Hi all! Found a bug in error handling:

  • h2o\model\model_base.py
  • Line 378:
    raise ValueError("'test_data' must be of type H2OFrame. Got: " + type(test_data))

In Python 3.7.4, type(...) returns a type object, not a string, so the concatenation raises a TypeError.
It should probably be replaced with
raise ValueError("'test_data' must be of type H2OFrame. Got: " + type(test_data).__name__)
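A quick self-contained check of the reported bug (with a plain object standing in for the non-H2OFrame argument) confirms that the concatenation itself raises TypeError, while the .__name__ form produces the intended message:

```python
bad = object()

# The current code: str + type raises TypeError before ValueError is ever built.
broken = False
try:
    "'test_data' must be of type H2OFrame. Got: " + type(bad)
except TypeError:
    broken = True

# The proposed fix: type(...).__name__ is a plain string, so it concatenates.
msg = "'test_data' must be of type H2OFrame. Got: " + type(bad).__name__
print(broken, msg)
```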

I'm not sure what open-source contribution looks like; I couldn't see any obvious path on GitHub.
Sandeep Kunsoth
@sandeepkunsoth000_gitlab
Hi all, I am getting this error when running on a VM:
raise H2OServerError("HTTP %d %s:\n%r" % (status_code, response.reason, data))
h2o.exceptions.H2OServerError: HTTP 400 Bad Request
H2O version: 3.30.0.6. It worked some days ago; the same code is not working now. Please help; currently it only works locally.
schwannden
@schwannden
[attached image: image.png]
When I am training with XGBoost in H2O (non-GPU version) and I list all frames, I find that the train and validation frames within each CV fold contain exactly the same number of rows. Is this a mistake in the CV splitting?
Owen Ball
@ob83_gitlab

Would anyone be able to give me some pointers on optimising resource allocation for XGBoost training in H2O? I'm sure I read in the documentation that you need to leave a proportion of the available CPU/memory free for XGBoost. Is this correct, or should I be giving H2O as much as possible?

For example, if I have an instance with 10 CPUs and 100 GB RAM, what should I allocate directly to H2O and what, if anything, should I keep free for XGBoost?

Gabriel Fields
@gabfields02
Hi. I want to ask, how can I get a list of all the model IDs in my cluster in Python?
I can successfully retrieve a model using h2o.get_model(model_id).
However, I am manually inputting or hard coding the model_id.
Other than H2O Flow (Models >> List All Models), I want to know if there is a way to list all the model IDs in the cluster. Thanks.
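One possibility worth checking (an assumption, not confirmed here) is h2o.ls(), which lists the keys of every object in the cluster; model IDs can then be filtered client-side. The filtering step is sketched below with a hypothetical key list and a hypothetical naming pattern standing in for the real frame:

```python
# Hypothetical keys standing in for the single column returned by h2o.ls().
keys = [
    "GBM_model_python_1600000000000_1",
    "py_3_sid_8d2a",                      # a frame, not a model
    "GBM_model_python_1600000000000_2",
]

# Filter on whatever naming pattern your models actually use.
model_ids = [k for k in keys if k.startswith("GBM_model")]
print(model_ids)
```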
hududed
@hududed
Hi all, I was wondering if anyone has had success using the h2o.explain() features in H2O Python.
Basically the feature is non-existent in 3.30.1.3, which is the latest version for Python?
I created an issue here https://h2oai.atlassian.net/jira/software/c/projects/PUBDEV/issues/PUBDEV-7850?jql=project%20%3D%20%22PUBDEV%22%20ORDER%20BY%20created%20DESC
Gabriel Fields
@gabfields02
Hi. I just want to clarify. Is importing a MOJO for scoring the same as using an imported binary model with h2o.load_model(yourModel)?
Chen Kepeng
@kpchen
Hi all, I want to know how I can do i18n on the H2O Flow web UI? Any clues are appreciated.
razou
@razou
Hello, I wanted to know how to perform stratified sampling in H2O.
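For reference, a stratified_split-based approach appears later in this log (df['y'].stratified_split(test_frac=0.2, seed=1) on an H2OFrame). The underlying idea, sampling each class separately so the class proportions survive, can be sketched in plain Python with hypothetical data:

```python
import random

def stratified_sample(rows, label_idx, frac, seed=1):
    """Take `frac` of each class's rows, preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_idx], []).append(row)
    sample = []
    for members in by_class.values():
        k = max(1, int(len(members) * frac))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical data: 10 rows of class "a", 90 rows of class "b".
rows = [("a", i) for i in range(10)] + [("b", i) for i in range(90)]
sampled = stratified_sample(rows, 0, 0.2)
print(len(sampled))
```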
PlymouthUniversityStudent
@AidanConnelly

When importing from JDBC, what should the connection string look like?

jdbc:postgresql://172.17.0.1:5432/mot

No?

DennisKr
@DennisKr
Hey, I wanted to ask if there is any chance this feature request may be implemented https://h2oai.atlassian.net/browse/PUBDEV-7700?
PlymouthUniversityStudent
@AidanConnelly
(K/V:13.2 MB + POJO:17.1 MB + FREE:464.6 MB == MEM_MAX:494.9 MB), desiredKV=2.15 GB OOM
What does the FREE value mean? And why am I OOM if I've got 10x as much FREE as K/V and POJO?
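Reading the numbers already in the message suggests an answer: the allocation being requested (desiredKV = 2.15 GB) is larger than the entire heap (MEM_MAX = 494.9 MB), so the 464.6 MB of FREE can never satisfy it. The arithmetic:

```python
mem_max_mb = 494.9           # MEM_MAX from the log line
free_mb = 464.6              # FREE from the log line
desired_kv_mb = 2.15 * 1024  # desiredKV, converted from GB to MB

# The single request exceeds not just FREE but the whole heap.
print(desired_kv_mb > free_mb, desired_kv_mb > mem_max_mb)
```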
razou
@razou
Hello
I wanted to know how to perform resampling (with or without replacement) in H2O, and whether there is a native function for that.
The purpose is to down-sample or over-sample the target feature's classes in imbalanced data for classification.
Thank you
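For the over-sampling direction specifically, the idea (independent of any H2O API; a plain-Python sketch with hypothetical rows) is to resample each minority class with replacement up to the majority class size. GBM's balance_classes option, which appears later in this log, automates this kind of rebalancing server-side.

```python
import random

def oversample_minority(rows, label_idx, seed=1):
    """Resample (with replacement) every class up to the majority class size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_idx], []).append(row)
    target = max(len(v) for v in by_class.values())
    out = []
    for members in by_class.values():
        out.extend(members)                                   # keep originals
        out.extend(rng.choices(members, k=target - len(members)))  # pad with replacement
    return out

# Hypothetical imbalanced data: 10 positives, 90 negatives.
rows = [("pos", i) for i in range(10)] + [("neg", i) for i in range(90)]
balanced = oversample_minority(rows, 0)
print(len(balanced))
```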
Igor Trpovski
@igor_trpovski_gitlab
Hi everyone,
When I run grid search with parallelism set to 0 or some value other than 1, it always hangs. In other words, after some time the progress bar can reach 100%, but the grid never finishes. It happens with both the Cartesian and RandomDiscrete strategies. The model that I'm using is GBM, and the cross-validation folds are specified through fold_column.
When I run the grid with the default value for parallelism, some parameter combinations fail due to the dataset being very small (<100 rows), but the grid finishes.
Did anyone have a similar problem? I don't know how to debug this issue.
razou
@razou

Hi
I'm training a GBM multi-class classifier and I wanted to know what could cause the following errors. Thanks

raw_df = h2o.import_file()
df = h2o.deep_copy(raw_df[raw_df['x'] > 10, :], 'df')

df['split'] = df['y'].stratified_split(test_frac=0.2, seed=1)
train_valid = df[df['split'] == 'train', :].drop('split')
test = df[df['split'] == 'test', :].drop('split')


train_valid['col_split'] = train_valid['y'].stratified_split(test_frac=0.2, seed=1)

train = train_valid[train_valid['col_split'] == 'train', :].drop('col_split')
valid = train_valid[train_valid['col_split'] == 'test', :].drop('col_split')


raw_df['y'].unique().nrow => 95
df['y'].unique().nrow => 93
train['y'].unique().nrow => 93

Training the GBM algo with class_sampling_factors = [w1, ..., w93] gives:

OSError: Job with key $03017f00000132d4ffffffff$_af9c11386cb765249816853dfc3d47fe failed with an exception: java.lang.IllegalArgumentException: class_sampling_factors must have 95 elements
stacktrace: 
java.lang.IllegalArgumentException: class_sampling_factors must have 95 elements
    at hex.tree.SharedTree$Driver.computeImpl(SharedTree.java:244)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:238)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1563)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

or the following one, when using "balance_classes": True in GBM model

OSError: Job with key $03017f00000132d4ffffffff$_acb90549c4fb00eefd9be1d55ab5448b failed with an exception: java.lang.IllegalArgumentException: Error during sampling - too few points?
stacktrace: 
java.lang.IllegalArgumentException: Error during sampling - too few points?
    at water.util.MRUtils.sampleFrameStratified(MRUtils.java:309)
    at hex.tree.SharedTree$Driver.computeImpl(SharedTree.java:252)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:238)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1563)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
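One plausible reading of the 93-vs-95 mismatch (an assumption drawn from the counts in the message, not a confirmed diagnosis): filtering rows does not shrink the recorded factor domain of 'y', and class_sampling_factors is validated against that full domain. In miniature:

```python
# Hypothetical domain standing in for raw_df['y']'s 95 recorded levels.
domain = [f"class_{i}" for i in range(95)]

# After the raw_df['x'] > 10 filter, only 93 classes have rows left...
classes_present = set(domain[:93])

# ...but the column may still carry all 95 levels, and the validator
# compares class_sampling_factors against that recorded domain.
factors_needed = len(domain)
print(len(classes_present), factors_needed)
```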
Mwiza
@kundaMwiza
Hi all. H2O says in the documentation that splitting on a feature for regression GBMs is based on the reduction in squared error. Is this squared error based on the node residuals, i.e. (resid - mean resid)^2, or on the true response, i.e. (response - mean response)^2? I'm using gamma/Poisson distributions.
razou
@razou
Hello,
Does calibration (with the h2o lib) work only for binary classification?
Thanks