razou
@razou
Hello, I wanted to know how to perform stratified sampling in H2O?
1 reply
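For readers landing on this question: further down in this log, `stratified_split(test_frac=0.2, seed=1)` is called on an H2O column for this purpose. The idea behind a stratified split can be sketched in plain Python without H2O. The function name `stratified_split_labels` and its rounding rule are assumptions for illustration, not H2O's implementation.

```python
import random
from collections import defaultdict


def stratified_split_labels(labels, test_frac=0.2, seed=1):
    """Return one 'train'/'test' tag per label, putting roughly
    test_frac of each class into 'test'. Illustrative only."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    tags = ["train"] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = int(round(len(indices) * test_frac))
        for idx in indices[:n_test]:
            tags[idx] = "test"
    return tags


labels = ["a"] * 10 + ["b"] * 5
tags = stratified_split_labels(labels)
print(list(zip(labels, tags)))
```

With this particular rounding rule, a class that has a single row and `test_frac=0.2` stays entirely in train; other implementations may behave differently.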
PlymouthUniversityStudent
@AidanConnelly

When importing from JDBC, what should the connection string look like?

jdbc:postgresql://172.17.0.1:5432/mot

No?

2 replies
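The string above follows the usual PostgreSQL JDBC shape, `jdbc:postgresql://host:port/database`. One way to sanity-check such a string before handing it to an importer is to split it with the standard library. The helper name `parse_jdbc_url` is hypothetical, and the sketch only checks the format, not whether the database is reachable.

```python
from urllib.parse import urlsplit


def parse_jdbc_url(url):
    """Split a 'jdbc:<driver>://host:port/database' string into its parts."""
    prefix = "jdbc:"
    if not url.startswith(prefix):
        raise ValueError("not a JDBC URL: %r" % url)
    # Drop the 'jdbc:' prefix so the rest parses like an ordinary URL.
    parts = urlsplit(url[len(prefix):])
    return {
        "driver": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = parse_jdbc_url("jdbc:postgresql://172.17.0.1:5432/mot")
print(info)
```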
DennisKr
@DennisKr
Hey, I wanted to ask if there is any chance this feature request may be implemented https://h2oai.atlassian.net/browse/PUBDEV-7700?
PlymouthUniversityStudent
@AidanConnelly
(K/V:13.2 MB + POJO:17.1 MB + FREE:464.6 MB == MEM_MAX:494.9 MB), desiredKV=2.15 GB OOM
What does the FREE value mean? And why am I OOM if I've got 10x as much FREE as K/V and POJO?
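The numbers in that log line can be checked by hand. The line itself equates K/V + POJO + FREE with MEM_MAX, and desiredKV is reported separately. How H2O derives desiredKV is not stated in this log, so the sketch below is only arithmetic on the quoted values, and it assumes 1 GB = 1024 MB.

```python
import re

line = ("(K/V:13.2 MB + POJO:17.1 MB + FREE:464.6 MB == MEM_MAX:494.9 MB), "
        "desiredKV=2.15 GB OOM")

# Assumption: binary units, 1 GB = 1024 MB.
UNIT_TO_MB = {"MB": 1.0, "GB": 1024.0}


def parse_sizes(text):
    """Extract 'NAME:value UNIT' and 'NAME=value UNIT' pairs, in megabytes."""
    sizes = {}
    for name, value, unit in re.findall(r"([\w/]+)[:=]([\d.]+) (MB|GB)", text):
        sizes[name] = float(value) * UNIT_TO_MB[unit]
    return sizes


sizes = parse_sizes(line)
parts_sum = sizes["K/V"] + sizes["POJO"] + sizes["FREE"]
print("K/V + POJO + FREE =", round(parts_sum, 1), "MB")
print("MEM_MAX           =", sizes["MEM_MAX"], "MB")
print("desiredKV         =", round(sizes["desiredKV"], 1), "MB")
```

On these numbers desiredKV is more than four times MEM_MAX, so FREE being large relative to K/V and POJO is not the relevant comparison; FREE is still far smaller than desiredKV.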
razou
@razou
Hello
I wanted to know how to perform resampling (with or without replacement) in H2O. Is there a native function for that?
The purpose is to down-sample or over-sample the target feature's classes in imbalanced data for classification.
Thank you
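Later messages in this log mention the GBM options `balance_classes` and `class_sampling_factors`, which address the same goal inside model training. Independently of H2O, the resampling idea itself can be sketched in plain Python: classes with too many rows are sampled without replacement, classes with too few are sampled with replacement. The name `resample_by_class` and the fixed per-class target are assumptions for illustration.

```python
import random
from collections import defaultdict


def resample_by_class(rows, label_of, target_per_class, seed=0):
    """Bring every class to exactly target_per_class rows."""
    rng = random.Random(seed)

    # Group rows by their class label.
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)

    out = []
    for members in groups.values():
        if len(members) >= target_per_class:
            # Down-sample: draw without replacement.
            out.extend(rng.sample(members, target_per_class))
        else:
            # Over-sample: draw with replacement.
            out.extend(rng.choices(members, k=target_per_class))
    return out


rows = [("x%d" % i, "majority") for i in range(8)]
rows += [("y0", "minority"), ("y1", "minority")]
balanced = resample_by_class(rows, label_of=lambda r: r[1], target_per_class=4)
print(balanced)
```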
Igor Trpovski
@igor_trpovski_gitlab
Hi everyone,
When I run grid search with parallelism set to 0 or some value different than 1 it always hangs. In other words, after some time the progress bar can reach 100% but the grid never finishes. It happens with both the Cartesian and RandomDiscrete strategies. The model that I'm using is GBM and the cross validation folds are specified through fold_column.
When I run the grid with the default value for parallelism, some parameter combinations fail due to the dataset being very small (<100 rows) but the grid finishes.
Did anyone have a similar problem? I don't know how to debug this issue.
4 replies
razou
@razou

Hi
I'm training a GBM multiclass classifier and I wanted to know what could cause the following error. Thanks

raw_df = h2o.import_file()
df = h2o.deep_copy(raw_df[raw_df['x'] > 10, :], 'df')

df['split'] = df['y'].stratified_split(test_frac=0.2, seed=1)
train_valid = df[df['split'] == 'train', :].drop('split')
test = df[df['split'] == 'test', :].drop('split')


train_valid['col_split'] = train_valid['y'].stratified_split(test_frac=0.2, seed=1)

# Split train_valid (not df) on the column that was just created
train = train_valid[train_valid['col_split'] == 'train', :].drop('col_split')
valid = train_valid[train_valid['col_split'] == 'test', :].drop('col_split')


raw_df['y'].unique().nrow => 95
df['y'].unique().nrow => 93
train['y'].unique().nrow => 93

Training the GBM algo with class_sampling_factors = [w1, ..., w93]

OSError: Job with key $03017f00000132d4ffffffff$_af9c11386cb765249816853dfc3d47fe failed with an exception: java.lang.IllegalArgumentException: class_sampling_factors must have 95 elements
stacktrace: 
java.lang.IllegalArgumentException: class_sampling_factors must have 95 elements
    at hex.tree.SharedTree$Driver.computeImpl(SharedTree.java:244)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:238)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1563)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

or the following one, when using "balance_classes": True in GBM model

OSError: Job with key $03017f00000132d4ffffffff$_acb90549c4fb00eefd9be1d55ab5448b failed with an exception: java.lang.IllegalArgumentException: Error during sampling - too few points?
stacktrace: 
java.lang.IllegalArgumentException: Error during sampling - too few points?
    at water.util.MRUtils.sampleFrameStratified(MRUtils.java:309)
    at hex.tree.SharedTree$Driver.computeImpl(SharedTree.java:252)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:238)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1563)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
2 replies
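One possible reading of the first error, offered as an assumption rather than a confirmed diagnosis: the response column may still carry all 95 levels from `raw_df` in its domain even though only 93 of them occur after filtering, and the factor list is expected to match the domain. Under that assumption, the list of weights would need one entry per domain level, in domain order. The sketch below is plain Python; `align_factors` and the default weight of 1.0 for unobserved levels are hypothetical choices.

```python
def align_factors(domain, observed_factors, default=1.0):
    """Return one sampling factor per level in `domain`, in domain order.
    Levels without an entry in `observed_factors` get `default`."""
    unknown = set(observed_factors) - set(domain)
    if unknown:
        raise ValueError("factors given for levels not in domain: %s"
                         % sorted(unknown))
    return [observed_factors.get(level, default) for level in domain]


# Made-up domain of 95 levels, of which only the first 93 are observed.
domain = ["c%02d" % i for i in range(95)]
observed = {level: 2.0 for level in domain[:93]}
factors = align_factors(domain, observed)
print(len(factors))
```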
Mwiza
@kundaMwiza
Hi all. The H2O documentation says that splitting on a feature for regression GBMs is based on the reduction in squared error. Is this squared error based on the node residuals, i.e. (resid - mean resid)^2, or on the true response, i.e. (response - mean response)^2? I'm using gamma/Poisson distributions.
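To make the quantity in this question concrete: for a candidate split, the reduction in squared error is the parent node's sum of squared deviations from its mean, minus the same sum for each of the two children. The sketch below computes it for any list of target values. Whether H2O feeds it residuals or raw responses for gamma/Poisson is exactly the open question and is not answered here; the function names are hypothetical.

```python
def sum_sq_dev(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def se_reduction(targets, go_left):
    """Squared-error reduction for a split described by a boolean mask."""
    left = [t for t, is_left in zip(targets, go_left) if is_left]
    right = [t for t, is_left in zip(targets, go_left) if not is_left]
    return sum_sq_dev(targets) - sum_sq_dev(left) - sum_sq_dev(right)


# The targets could be residuals or responses; the formula is the same.
targets = [1.0, 2.0, 9.0, 10.0]
go_left = [True, True, False, False]
gain = se_reduction(targets, go_left)
print(gain)
```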
razou
@razou
Hello,
Does calibration (with the h2o lib) work only for binary classification?
Thanks
razou
@razou
Hello
How can I import/load a MOJO, or call load_model(), without printing the whole model content to the console/standard output?
7 replies
Antonio Pinto
@byo-ai
Byo.ai an intelligent assistant to make people carbon neutral/positive. Anyone with experience with one or more general purpose programming languages including but not limited to: Python, Java, C/C++ (also Pytorch,) feel free to send your CV to work@byo.ai (passion for the environment, clean technologies and artificial intelligence is a plus!)
razou
@razou
Hello
I wanted to pad numeric string column with zeros to the left:
Here is a solution with pandas dataframes
df[col] = df0[col].apply(lambda x: str(x).zfill(2) if x is not None else "00")
What would be the equivalent in H2O?
Thanks
3 replies
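For reference, this is what the pandas lambda above produces per value, written as a plain function. It only pins down the target behaviour that an H2O equivalent would need to reproduce; it does not use any H2O call, and `pad2` is a hypothetical name.

```python
def pad2(x):
    """Left-pad the string form of x with zeros to width 2; None becomes '00'."""
    return str(x).zfill(2) if x is not None else "00"


values = [7, 12, None, "3", 123]
padded = [pad2(v) for v in values]
print(padded)
```

Values that are already two or more characters wide are left unchanged.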
Naeemah Small
@naeemahaz
Hello
I have an issue with h2o.
Naeemah Small
@naeemahaz
ModuleNotFoundError: No module named 'h20' after installing it
razou
@razou
It seems like you have a typo. Instead of h2o you wrote h20 (with a zero).
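A quick way to tell a misspelled module name from a failed install is to ask Python whether it can find each spelling. This is a small standard-library sketch; the helper name `is_importable` is made up, and the result depends on what is installed in the current environment.

```python
import importlib.util


def is_importable(name):
    """True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Letter 'o' versus digit zero.
for candidate in ("h2o", "h20"):
    print(candidate, is_importable(candidate))
```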
razou
@razou
Hello,
I'm using a stratified split based on the y column, and there are only single entries for some values of y.
How can I force those entries to be in train instead of validation?
Tiago Magnus
@tiagomagnusss
Hello! I'm trying to download the bin of an h2o model through the API, what is the correct way to store it on disk so I can import it later through the API? (I'm using Python 3.8)
Of all the encodings I tried, only UTF-8 and UTF-16 worked, but they scrape off the magic number at the start (this error message: "Missing magic number 0x1CED at stream start")
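The "missing magic number" message is what one would expect if the downloaded bytes were decoded as text before being saved, since a model binary is not valid text in any encoding. A minimal sketch, assuming the API response body is available as a `bytes` object: the variable `payload` below is a made-up stand-in, not real model data, and the byte order chosen for the magic number is an assumption.

```python
import os
import tempfile

# Stand-in payload: two bytes mimicking the 0x1CED magic number,
# followed by arbitrary binary data.
payload = bytes([0x1C, 0xED, 0x00, 0xFF, 0x10])

# Text round-trip: undecodable bytes get replaced, so the data changes.
as_text = payload.decode("utf-8", errors="replace")
text_round_trip = as_text.encode("utf-8")

# Binary round-trip: bytes are written and read back untouched.
path = os.path.join(tempfile.mkdtemp(), "model.bin")
with open(path, "wb") as f:
    f.write(payload)
with open(path, "rb") as f:
    binary_round_trip = f.read()

print("text round-trip intact:  ", text_round_trip == payload)
print("binary round-trip intact:", binary_round_trip == payload)
```

The point of the sketch is the file mode: opening with `"wb"` and writing the raw bytes avoids any encoding step.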
razou
@razou
Hello
frame[x] = frame[x].asfactor() takes more than 400 ms. Is it possible to reduce this time?
Thanks
razou
@razou
Another question: converting a pandas.DataFrame into an h2o.H2OFrame is somewhat expensive (400 ms). Is there any way to optimize it?
hassan hawilo
@hassanhawilo_gitlab
DistributedException from /127.0.0.1:54321: 'Index 1684 out of bounds for length 1684', caused by java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684
at water.MRTask.getResult(MRTask.java:494)
at water.MRTask.getResult(MRTask.java:502)
at water.MRTask.doAll(MRTask.java:397)
at water.MRTask.doAll(MRTask.java:403)
at hex.Model.predictScoreImpl(Model.java:1784)
at hex.Model.score(Model.java:1618)
at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:403)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1575)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684
at hex.genmodel.GenModel.setCats(GenModel.java:707)
at hex.genmodel.GenModel.setInput(GenModel.java:686)
at hex.genmodel.algos.deeplearning.DeeplearningMojoModel.score0(DeeplearningMojoModel.java:70)
at hex.genmodel.algos.deeplearning.DeeplearningMojoModel.score0(DeeplearningMojoModel.java:158)
at hex.genmodel.algos.ensemble.StackedEnsembleMojoModel.score0(StackedEnsembleMojoModel.java:39)
at hex.generic.GenericModel.score0(GenericModel.java:93)
at hex.Model.score0(Model.java:1992)
at hex.Model.score0(Model.java:1959)
at hex.Model$BigScore.score0(Model.java:1903)
at hex.Model$BigScore.map(Model.java:1881)
at water.MRTask.compute2(MRTask.java:675)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1578)
at hex.Model$BigScore$Icer.compute1(Model$BigScore$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1574)
... 5 more
I have been trying to fix this error for a week now
I can't figure out what is causing it
Any clue?
Your help is much appreciated
razou
@razou
As your logs say: caused by java.lang.ArrayIndexOutOfBoundsException
Can you share the part of the code causing this?
hassan hawilo
@hassanhawilo_gitlab
sure
if(inputDataFrameIsPandas):
    dataToML = inputDataFrame.iloc[0:,0:]
    DataH2OFrameToML= h2o.H2OFrame(dataToML)
else:
    DataH2OFrameToML = inputDataFrame

predictionsDataFrame = MLModel.predict(DataH2OFrameToML)
thanks
It happens only when the stacked ensemble uses deep learning among its models
If the stacked ensemble does not use deep learning among its models, then everything works fine
razou
@razou
try this (to check if the dataframe is not empty):
    if not inputDataFrameIsPandas.empty:
        DataH2OFrameToML= h2o.H2OFrame(inputDataFrame.iloc[0:,0:])
    else:
        DataH2OFrameToML = inputDataFrame

    predictionsDataFrame = MLModel.predict(DataH2OFrameToML)
razou
@razou
@hassanhawilo_gitlab what is the difference between inputDataFrame and inputDataFrameIsPandas?
Your if statement tests inputDataFrameIsPandas, but you select from inputDataFrame
hassan hawilo
@hassanhawilo_gitlab
Sometimes users provide a dataframe already loaded in H2O format
If not, they provide it as a pandas dataframe and we load it into H2O
I will check whether there is any empty data in the dataframe and will let you know
razou
@razou

What I'm saying is that you were testing if this object inputDataFrameIsPandas is not None and you selected the data to predict on from another object inputDataFrame: inputDataFrameIsPandas is not the same as inputDataFrame

This may be better

    if not inputDataFrame.empty:
        DataH2OFrameToML= h2o.H2OFrame(inputDataFrame.iloc[0:,0:])
    else:
        DataH2OFrameToML = inputDataFrame

    predictionsDataFrame = MLModel.predict(DataH2OFrameToML)
hassan hawilo
@hassanhawilo_gitlab
I see what you mean, but inputDataFrameIsPandas is a boolean variable provided by the user so we know which data to load
I have checked the DataH2OFrameToML passed to the predict function and it is not empty
As I said, the code works fine if there is no deep learning model among the submodels of the stacked ensemble
For example, a stacked ensemble that uses DRF, GBM and XGBoost works fine
But once we add a deep learning model to them, it gives this error
hassan hawilo
@hassanhawilo_gitlab
still same error java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684
at hex.genmodel.GenModel.setCats(GenModel.java:707)
at hex.genmodel.GenModel.setInput(GenModel.java:686)
at hex.genmodel.algos.deeplearning.DeeplearningMojoModel.score0(DeeplearningMojoModel.java:70)
at hex.genmodel.algos.deeplearning.DeeplearningMojoModel.score0(DeeplearningMojoModel.java:158)
at hex.genmodel.algos.ensemble.StackedEnsembleMojoModel.score0(StackedEnsembleMojoModel.java:39)
at hex.generic.GenericModel.score0(GenericModel.java:93)
at hex.Model.score0(Model.java:1992)
at hex.Model.score0(Model.java:1959)
at hex.Model$BigScore.score0(Model.java:1903)
at hex.Model$BigScore.map(Model.java:1881)
at water.MRTask.compute2(MRTask.java:675)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1578)
at hex.Model$BigScore$Icer.compute1(Model$BigScore$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1574)
... 5 more
I am still getting the same error. I checked the dataframe passed to the model: it has the same columns as the training dataframe, and none of the data is NaN
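Since the trace ends in `GenModel.setCats`, one thing that may be worth checking (an assumption, not a confirmed cause) is whether every categorical value in the scoring data was also present in the training data. Matching column names and the absence of NaN do not rule that out. A plain-Python sketch follows; `unseen_levels` and the sample data are hypothetical.

```python
def unseen_levels(train_columns, score_columns):
    """Map column name -> sorted values that appear at scoring time
    but never appeared in the training data."""
    report = {}
    for name, train_values in train_columns.items():
        extra = set(score_columns.get(name, [])) - set(train_values)
        if extra:
            report[name] = sorted(extra)
    return report


# Made-up categorical columns, given as lists of values per column.
train = {"city": ["paris", "rome"], "plan": ["free", "pro"]}
score = {"city": ["paris", "oslo"], "plan": ["pro"]}
report = unseen_levels(train, score)
print(report)
```

An empty report would suggest looking elsewhere; a non-empty one would point at the columns to inspect first.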
hassan hawilo
@hassanhawilo_gitlab
I tried an older version of H2O; now the error changed to this:
java.lang.IllegalArgumentException: Unsupported MOJO model hex.genmodel.algos.deeplearning.DeeplearningMojoModel.
OSError: Job with key $03017f00000132d4ffffffff$_b756f6aab3e7b7d12d531ff7aec345c8 failed with an exception: java.lang.IllegalArgumentException: Unsupported MOJO model hex.genmodel.algos.deeplearning.DeeplearningMojoModel.
stacktrace:
java.lang.IllegalArgumentException: Unsupported MOJO model hex.genmodel.algos.deeplearning.DeeplearningMojoModel.
at hex.generic.Generic$MojoDelegatingModelDriver.computeImpl(Generic.java:91)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:222)
at hex.generic.Generic$MojoDelegatingModelDriver.compute2(Generic.java:70)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1443)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)