razou
@razou
It seems like you have a typo: instead of h2o you wrote h20 (with a zero).
razou
@razou
Hello,
I'm using a stratified split based on the y column, and some values of y have only a single entry.
How can I force those entries to be in train instead of validation?
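One way to approach this, sketched in plain Python rather than with H2O's own split helpers: do the split on row indices yourself, and send every label that occurs only once straight to the training set. The function name `stratified_split_keep_singletons` and the `valid_ratio` parameter are made up for this illustration.

```python
import random
from collections import defaultdict


def stratified_split_keep_singletons(labels, valid_ratio=0.2, seed=42):
    """Return (train_idx, valid_idx); labels seen only once always go to train."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, valid_idx = [], []
    for label, idxs in by_label.items():
        if len(idxs) == 1:
            # A single entry cannot be stratified, so keep it for training.
            train_idx.extend(idxs)
            continue
        rng.shuffle(idxs)
        # At least one row goes to validation and at least one stays in train.
        n_valid = min(len(idxs) - 1, max(1, round(len(idxs) * valid_ratio)))
        valid_idx.extend(idxs[:n_valid])
        train_idx.extend(idxs[n_valid:])
    return sorted(train_idx), sorted(valid_idx)


labels = ["a"] * 10 + ["b"] * 5 + ["rare"]
train_idx, valid_idx = stratified_split_keep_singletons(labels)
```

The index lists can then be used to select rows, for example with `df.iloc[train_idx]` on a pandas DataFrame before it is converted to an H2OFrame.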
Tiago Magnus
@tiagomagnusss
Hello! I'm trying to download the bin of an h2o model through the API, what is the correct way to store it on disk so I can import it later through the API? (I'm using Python 3.8)
Of all the encodings I tried, only UTF-8 and UTF-16 worked, but they strip the magic number at the start (I get this error message: "Missing magic number 0x1CED at stream start")
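A model binary is not text, so any text encoding is likely to alter it. A minimal sketch of the usual approach, assuming the download is available as raw bytes: write it in binary mode with no encoding at all. The payload below is a stand-in, and its byte layout is not meant to match a real H2O model file.

```python
import os
import tempfile


def save_model_bytes(payload: bytes, path: str) -> None:
    """Write the downloaded payload unchanged; 'wb' avoids any text encoding."""
    with open(path, "wb") as fh:
        fh.write(payload)


def read_model_bytes(path: str) -> bytes:
    with open(path, "rb") as fh:
        return fh.read()


# Stand-in payload; with the `requests` library this would be the raw body,
# i.e. `response.content` (bytes) rather than `response.text` (decoded str).
fake_payload = b"\x1c\xed\x00\x01binary-model-data\xff\xfe"
model_path = os.path.join(tempfile.mkdtemp(), "model.bin")
save_model_bytes(fake_payload, model_path)
round_trip = read_model_bytes(model_path)

# For comparison: pushing the same bytes through a text decode loses data.
lossy = fake_payload.decode("utf-8", errors="ignore").encode("utf-8")
```

Here `round_trip` is identical to the original payload, while `lossy` has dropped the bytes that are not valid UTF-8, which is the kind of damage that would remove a leading magic number.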
razou
@razou
Hello
frame[x] = frame[x].asfactor() takes more than 400 ms. Is it possible to reduce this time?
Thanks
razou
@razou
Another question: converting a pandas.DataFrame into an h2o.H2OFrame is somewhat expensive (400 ms). Is there any way to optimize it?
hassan hawilo
@hassanhawilo_gitlab
DistributedException from /127.0.0.1:54321: 'Index 1684 out of bounds for length 1684', caused by java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684
at water.MRTask.getResult(MRTask.java:494)
at water.MRTask.getResult(MRTask.java:502)
at water.MRTask.doAll(MRTask.java:397)
at water.MRTask.doAll(MRTask.java:403)
at hex.Model.predictScoreImpl(Model.java:1784)
at hex.Model.score(Model.java:1618)
at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:403)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1575)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684
at hex.genmodel.GenModel.setCats(GenModel.java:707)
at hex.genmodel.GenModel.setInput(GenModel.java:686)
at hex.genmodel.algos.deeplearning.DeeplearningMojoModel.score0(DeeplearningMojoModel.java:70)
at hex.genmodel.algos.deeplearning.DeeplearningMojoModel.score0(DeeplearningMojoModel.java:158)
at hex.genmodel.algos.ensemble.StackedEnsembleMojoModel.score0(StackedEnsembleMojoModel.java:39)
at hex.generic.GenericModel.score0(GenericModel.java:93)
at hex.Model.score0(Model.java:1992)
at hex.Model.score0(Model.java:1959)
at hex.Model$BigScore.score0(Model.java:1903)
at hex.Model$BigScore.map(Model.java:1881)
at water.MRTask.compute2(MRTask.java:675)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1578)
at hex.Model$BigScore$Icer.compute1(Model$BigScore$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1574)
... 5 more
I have been trying to fix this error for a week now.
I can't figure out what is causing it.
Any clue?
Your help is much appreciated.
razou
@razou
As your logs say, it is caused by java.lang.ArrayIndexOutOfBoundsException.
Can you share the part of the code that triggers this?
hassan hawilo
@hassanhawilo_gitlab
sure
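if inputDataFrameIsPandas:
    # pandas input: convert it to an H2OFrame before scoring
    dataToML = inputDataFrame.iloc[0:, 0:]
    DataH2OFrameToML = h2o.H2OFrame(dataToML)
else:
    # input was already provided as an H2OFrame
    DataH2OFrameToML = inputDataFrame

predictionsDataFrame = MLModel.predict(DataH2OFrameToML)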
thanks
It only happens when the stacked ensemble uses deep learning among its models.
If the stacked ensemble does not use deep learning, everything works fine.
razou
@razou
try this (to check if the dataframe is not empty):
    if not inputDataFrameIsPandas.empty:
        DataH2OFrameToML= h2o.H2OFrame(inputDataFrame.iloc[0:,0:])
    else:
        DataH2OFrameToML = inputDataFrame

    predictionsDataFrame = MLModel.predict(DataH2OFrameToML)
razou
@razou
@hassanhawilo_gitlab what is the difference between inputDataFrame and inputDataFrameIsPandas?
I ask because your if statement is on inputDataFrameIsPandas but your selection is on inputDataFrame.
hassan hawilo
@hassanhawilo_gitlab
Sometimes users provide a dataframe already loaded in H2O format.
If not, they provide it as pandas and we load it into H2O.
I will check whether there is any empty data in the dataframe and let you know.
razou
@razou

What I'm saying is that you were testing whether one object, inputDataFrameIsPandas, is not None, but selecting the data to predict on from another object, inputDataFrame. inputDataFrameIsPandas is not the same as inputDataFrame.

This may be better

    if not inputDataFrame.empty:
        DataH2OFrameToML= h2o.H2OFrame(inputDataFrame.iloc[0:,0:])
    else:
        DataH2OFrameToML = inputDataFrame

    predictionsDataFrame = MLModel.predict(DataH2OFrameToML)
hassan hawilo
@hassanhawilo_gitlab
I see what you mean, but inputDataFrameIsPandas is a boolean variable provided by the user so we know which data to load.
I have checked the DataH2OFrameToML passed to the predict function and it is not empty.
As I said, the code works fine if there is no deep learning model as a submodel in the stacked ensemble.
For example, a stacked ensemble that uses DRF, GBM and XGBoost works fine,
but once we add a deep learning model to them it gives this error.
hassan hawilo
@hassanhawilo_gitlab
still same error java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684
at hex.genmodel.GenModel.setCats(GenModel.java:707)
at hex.genmodel.GenModel.setInput(GenModel.java:686)
at hex.genmodel.algos.deeplearning.DeeplearningMojoModel.score0(DeeplearningMojoModel.java:70)
at hex.genmodel.algos.deeplearning.DeeplearningMojoModel.score0(DeeplearningMojoModel.java:158)
at hex.genmodel.algos.ensemble.StackedEnsembleMojoModel.score0(StackedEnsembleMojoModel.java:39)
at hex.generic.GenericModel.score0(GenericModel.java:93)
at hex.Model.score0(Model.java:1992)
at hex.Model.score0(Model.java:1959)
at hex.Model$BigScore.score0(Model.java:1903)
at hex.Model$BigScore.map(Model.java:1881)
at water.MRTask.compute2(MRTask.java:675)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1578)
at hex.Model$BigScore$Icer.compute1(Model$BigScore$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1574)
... 5 more
I am still getting the same error. I checked the dataframe passed to the model: it has the same columns as the training dataframe, and none of the data is NaN.
hassan hawilo
@hassanhawilo_gitlab
I tried an older version of H2O, and now the error has changed to this:
java.lang.IllegalArgumentException: Unsupported MOJO model hex.genmodel.algos.deeplearning.DeeplearningMojoModel.
OSError: Job with key $03017f00000132d4ffffffff$_b756f6aab3e7b7d12d531ff7aec345c8 failed with an exception: java.lang.IllegalArgumentException: Unsupported MOJO model hex.genmodel.algos.deeplearning.DeeplearningMojoModel.
stacktrace:
java.lang.IllegalArgumentException: Unsupported MOJO model hex.genmodel.algos.deeplearning.DeeplearningMojoModel.
at hex.generic.Generic$MojoDelegatingModelDriver.computeImpl(Generic.java:91)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:222)
at hex.generic.Generic$MojoDelegatingModelDriver.compute2(Generic.java:70)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1443)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
!welcome
@aw236

Hi all. I used H2O's Isolation Forest algorithm implementation in Python 3 in an AWS cluster environment (not sure which of these details is relevant). FYI, I am a data scientist, not a software engineer, so I am not proficient in Java, which I see a lot of the code is written in.

My question is: is there a way to extract/save/see the attributes and split values selected for each of the trees that are trained for the isolation forest? I have scoured the documentation and looked at the code on GitHub without seeing any obvious way to do so. My use case is demonstrating to a non-technical audience what these trees look like, since they are skeptical of the "black box" and do not understand which attributes/split values the observations are being isolated by.

Thanks.

Nitesh yadav
@nitesh585
Hi all,
I am working on a project where I want to save the leader of an H2O AutoML run and load it when it's needed.
I tried to do this with joblib but it didn't work.
Do I need to use H2O's own model-saving functions, or can joblib work?
Please guide me.
razou
@razou
Hello
I'm using h2o.import_file() function to load multiple csv files like this:
h2o_frame = h2o.import_file(CSV_PATH, pattern='{0}_[0-9]+.csv$'.format('train'))
But it fails when one of the CSV files is empty (containing only the column names):
Server error water.exceptions.H2OIllegalArgumentException: Error: File type mismatch. Cannot parse files [train_115092601.csv] and [train_202032.csv] of type CSV and CSV as one dataset.
How can I ignore the empty file or force the merge? Any other idea to solve this issue is welcome.
Thank you
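One possible workaround, sketched with the standard library only: filter out header-only files before anything is handed to H2O. The helper name `non_empty_csvs` and the demo file names are made up for the example.

```python
import csv
import glob
import os
import re
import tempfile


def non_empty_csvs(directory, pattern):
    """Return CSV paths matching `pattern` that contain at least one data row."""
    regex = re.compile(pattern)
    keep = []
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        if not regex.search(os.path.basename(path)):
            continue
        with open(path, newline="") as fh:
            reader = csv.reader(fh)
            next(reader, None)                     # skip the header row
            has_data = any(row for row in reader)  # any non-blank row left?
        if has_data:
            keep.append(path)
    return keep


# Tiny demonstration with two throwaway files.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "train_1.csv"), "w", newline="") as fh:
    fh.write("x,y\n1,2\n")
with open(os.path.join(demo_dir, "train_2.csv"), "w", newline="") as fh:
    fh.write("x,y\n")  # header only

usable = non_empty_csvs(demo_dir, r"train_[0-9]+\.csv$")
```

The remaining files could then be imported one at a time and combined (H2OFrame has an rbind method). That step is left out here because it needs a running cluster.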
Simon Schmid
@SimonSchmid
Hi all,
I have a question regarding this line: https://github.com/h2oai/h2o-3/blob/master/h2o-core/src/main/java/hex/Model.java#L1536 Why do we care about this? Isn't the response column ignored for scoring anyway? I encountered the error by chance and was just wondering why the check is there.
Frankie Logan
@12tafran
Hi all,
I am working with a tweedie glm model and was wondering why the AIC is showing up as NULL
razou
@razou
Hello,
I noticed that when using H2O, the Python logging module won't print anything. Has anybody experienced this?
import logging

logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(name)s %(levelname)s:%(message)s')
logger = logging.getLogger(__name__)

logger.info('Training size: %s', train.nrow)
logger.info('Validation size: %s', validation.nrow)
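Two things in plain Python logging can produce this kind of symptom, independent of H2O. First, logging.basicConfig does nothing if the root logger already has a handler attached. Whether H2O attaches one is an assumption here, not something confirmed in this thread. Second, extra arguments to logger.info need a matching %-placeholder in the message. A minimal sketch covering both, with a made-up logger name and a stand-in row count:

```python
import io
import logging

stream = io.StringIO()  # captured here only so the output can be inspected

# force=True (Python 3.8+) removes handlers that were attached to the root
# logger earlier; without it, basicConfig silently does nothing in that case.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(name)s %(levelname)s:%(message)s",
    stream=stream,
    force=True,
)
logger = logging.getLogger("h2o_demo")  # hypothetical logger name

train_rows = 1000  # stand-in for train.nrow
# Extra arguments need a matching %-placeholder in the message.
logger.info("Training size: %s", train_rows)

output = stream.getvalue()
```

In a real script the `stream` argument would simply be dropped so that messages go to stderr as usual.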
razou
@razou

Hello

I have a frame with some categorical columns (X1, ..., X5), where X3 contained only NaN in the initial CSV file.
I see strange behavior when reading the column types:

h2o_df['X3'] =  h2o_df['X3'].ascharacter().asfactor()
categorical_cols = [k for (k, v) in h2o_df.types.items() if v == 'enum' and k not in ['y']]

returns [X1, X2, X3, X4, X5, C1] instead of [X1, X2, X3, X4, X5]
Why was the column C1 added?

wendycwong
@wendycwong
Frankie.Logan: There is no reason for glm model not to have AIC. It is an oversight. I am adding it now for you. Here is the JIRA: https://h2oai.atlassian.net/browse/PUBDEV-8065 . You can check it here to see when it is done. Thank you for bringing it to my attention. Wendy
1pyroaqua
@1pyroaqua

Looking for feedback from the h2o community on how they productionize h2o models.

Based on the documentation, it looks like a Java application is the standard way to productionize h2o models.

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html

  1. Should h2o models always be productionized using Java application(s)?
  2. How do people who lack Java skills but have Python skills (like me) productionize h2o models?
  3. I was under the assumption that I could create a Python Flask/gunicorn application that starts an h2o cluster and uses the h2o.upload_mojo() function to load an h2o MOJO model for prediction, or uses h2o.mojo_predict_csv / h2o.mojo_predict_pandas to get predictions from an h2o model without even starting an h2o cluster. Is this not a good standard?
Jay van Zyl
@jayvanzyl
Hello, I have an issue with the rest api call when training a dl model:
Illegal argument for field: hidden of schema: DeepLearningParametersV3: cannot convert ""90"" to type int
Jay van Zyl
@jayvanzyl
Found the solution.
Nitesh yadav
@nitesh585
Hello, I'm working on a college project to build my own AutoML.
How does H2O AutoML decide which type of model (classification or regression) to train? How does it tell whether a problem is regression or classification?
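As far as the H2O documentation describes it, the decision follows the type of the response column: a categorical (factor/enum) response leads to classification and a numeric response leads to regression, so integer-coded classes have to be converted with asfactor() first. A rough sketch of that rule for a home-grown AutoML, in plain Python; `infer_problem_type` and its `is_categorical` flag are hypothetical names, and this is not H2O's actual implementation.

```python
def infer_problem_type(values, is_categorical=False):
    """Guess the task from the response column's type, not from the algorithm."""
    non_missing = [v for v in values if v is not None]
    if is_categorical or any(isinstance(v, (str, bool)) for v in non_missing):
        n_classes = len(set(non_missing))
        return "binomial" if n_classes == 2 else "multinomial"
    return "regression"


examples = {
    "price": infer_problem_type([10.5, 12.0, 9.9]),
    "churn": infer_problem_type(["yes", "no", "yes"]),
    # Integer codes are only treated as classes when flagged as categorical.
    "digit": infer_problem_type([0, 1, 2, 1], is_categorical=True),
}
```

The explicit flag mirrors the fact that numbers alone are ambiguous: 0/1/2 could be a count or a class label, so the caller has to say which.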
Simon Schmid
@SimonSchmid
Hi all, I found a bug in the AutoML class, see https://github.com/h2oai/h2o-3/blob/ff45788d86eda742eb0464d66d938094250b32e8/h2o-automl/src/main/java/ai/h2o/automl/AutoML.java#L93. The synchronization works properly for 2 concurrent calls but not for more than 2. If there are, e.g., 3 concurrent calls at 12:00:00, all will have the same startTime. The first one processed will take 12:00:00 as its startTime, and the second call will first wait and then retrieve new start times until it is at least 12:00:01. 12:00:01 will then be saved as lastStartTime, which means that the third call is actually fine with keeping 12:00:00. This then results in an error, because it produces duplicate model IDs together with the run of the first call. Not sure what the best fix is, probably just checking that startTime is a time after lastStartTime.
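The suggested check can be illustrated outside of H2O. The sketch below is Python, not the Java code in AutoML.java, and it bumps the timestamp instead of waiting, which is a simplification. The class name `StartTimeAllocator` and the injected `clock` are made up for the example.

```python
import threading


class StartTimeAllocator:
    """Hand out start times (whole seconds) that are strictly increasing."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in seconds
        self._last = None
        self._lock = threading.Lock()

    def next_start_time(self):
        with self._lock:
            now = int(self._clock())
            # Compare against the last value handed out, so a third caller
            # cannot reuse a second that an earlier caller already took.
            if self._last is not None and now <= self._last:
                now = self._last + 1
            self._last = now
            return now


# Three concurrent calls that all observe the same wall-clock second.
allocator = StartTimeAllocator(clock=lambda: 43200)  # frozen clock for the demo
results = []
threads = [
    threading.Thread(target=lambda: results.append(allocator.next_start_time()))
    for _ in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the comparison made against the last issued value, the three calls end up with three distinct start times. The trade-off of bumping is that an issued time can run slightly ahead of the wall clock; waiting until the clock passes the last issued value avoids that at the cost of latency.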
Simon Schmid
@SimonSchmid
Hi all, it's me again. From time to time, h2o runs a memory benchmark here https://github.com/h2oai/h2o-3/blob/master/h2o-core/src/main/java/water/HeartBeatThread.java#L163. What is the purpose of this? On Windows, it leads to a quite high CPU utilization when running. Took me a while to figure out where it comes from, a Thread Dump revealed it. I figured out that I can disable it by setting the system property sys.ai.h2o.heartbeat.benchmark.enabled=false. However, I am wondering what the downside is of disabling it.
wangxudong
@caozhuozi
Hi all. How can I delete an experiment using the Python client?
ldswaby
@ldswaby

Hello. I'm having an annoyingly time-consuming issue and was wondering if anyone here has any suggestions.

I'm basically using h2o to train and cross-validate some ANNs on a few different bio-logging data sets (some immersion and some acceleration). It works fine for the immersion datasets (which are all binary and <120MB), completing in just a few hours, but for some reason it hangs at 100% when training on the first acceleration one (these are float and considerably larger, 170MB-8GB).

I suspect it's a memory issue, but with no error message I have no idea how to troubleshoot this and proceed. I've been stuck on this for a week now, coming up to the climax of a master's project! Does anyone have any ideas?

The part where it hangs:

deeplearning Model Build progress: <progress bar> 100%

CODE:

#!/usr/bin/env python3

import h2o
from h2o.estimators import H2ODeepLearningEstimator
import glob
import re

h2o.init(min_mem_size='30G', max_mem_size="100G")

files = glob.glob('../Data/Reduced/ACC*.csv')

for f in files:

    # Load data
    data = h2o.import_file(f, header=1)
    data['Dive'] = data['Dive'].asfactor()
    data['BirdID'] = data['BirdID'].asfactor()

    # Extract model ID from filepath
    wdw = re.search(r"/ACC(\d+)_reduced", f).group(1)

    # Build, train, and cross-validate model
    dl_cross = H2ODeepLearningEstimator(model_id = 'ACC_window_' + wdw,
                                        distribution = "bernoulli",
                                        hidden = [200, 200],
                                        fold_column = 'BirdID',
                                        keep_cross_validation_models = True,
                                        keep_cross_validation_fold_assignment = True,
                                        keep_cross_validation_predictions = True,
                                        score_each_iteration = True,
                                        epochs = 50,
                                        train_samples_per_iteration = -1,
                                        activation = "RectifierWithDropout",
                                        #input_dropout_ratio = 0.2,
                                        hidden_dropout_ratios = [0.2, 0.2],
                                        single_node_mode = False,
                                        balance_classes = False,
                                        force_load_balance = False,
                                        seed = 23123,
                                        score_training_samples = 0,
                                        score_validation_samples = 0,
                                        stopping_rounds = 0)
    print('Training...')

    dl_cross.train(x = data.columns[1:-1],
                   y="Dive",
                   training_frame=data)

    # Save model
    print('Saving...')
    h2o.save_model(model=dl_cross, path="../Data/Reduced/H2O_ACC_XVal_Models/", force=True)

# Close session (h2o.shutdown() is deprecated in favor of the cluster object)
h2o.cluster().shutdown()