eXtreme Gradient Boosting (GBDT, GBRT or GBM) Library for large-scale and distributed machine learning, on single node, hadoop yarn and more.
import xgboost gives the warning cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
but it looks like it should be fixed in rhiever/tpot#284
was that never merged?
Hi, everybody! Could someone help me with xgboost parameters? I have the following code:
It doesn't output the RMSE metric during training. Where am I going wrong?
i was looking to build xgboost from source
just wanted to know if anyone on the group has tried doing it before?
also just wanted to confirm whether this is 100% open source
Hello. Does anyone know if I can slice xgboost's DMatrix by column, or block certain features from being used in a specific training run?
@Goorman it's probably easier to make a new DMatrix with those rows removed or censored in whatever way you need.
how can you use the pearson correlation coefficient as the loss function with the xgboost regressor?
@ckchow you probably meant columns removed, and yes, this is the only solution I see right now. The problem is that I have to construct the DMatrix from a sparse libsvm file, so to perform greedy feature selection, for example, I would have to create a new (big) libsvm file every iteration. Which is annoying.
Oh, I see. can't you construct DMatrices in memory from arrays of arrays?
At least in Java there is a float constructor, and I think there's a numpy constructor in python as well. might be out of luck if you're using the command line version.
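A minimal sketch of the in-memory route for Python, assuming the libsvm file is loaded once into a scipy sparse matrix (for example with `sklearn.datasets.load_svmlight_file`) and that columns are then dropped per iteration. `select_columns` is a hypothetical helper name, and the toy matrix stands in for the real data:

```python
import numpy as np
from scipy import sparse

def select_columns(X, keep):
    """Return a CSR matrix restricted to the column indices in `keep`."""
    return sparse.csr_matrix(X)[:, keep]

# Toy stand-in for a matrix loaded once from a libsvm file.
X = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0],
                                [4.0, 0.0, 5.0]]))

# Drop feature 1 for this round of feature selection.
X_sub = select_columns(X, [0, 2])
```

The sliced matrix, together with the labels, can then be passed to `xgboost.DMatrix(X_sub, label=y)`, so no new libsvm file is written per iteration.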
hi... does anyone understand why xgboost is so slow if you have lots of classes? This code shows the problem https://bpaste.net/show/f7573b5a2fb9 RandomForestClassifier takes about 15 seconds
but xgboost never terminates at all for me
I am training a binary classifier. In the problem I am working on, I can generate more training data at will: by running a simulation I can (deterministically) determine the correct label for any feature set. Each training case takes a bit to generate (say 0.5 seconds). The main motivation for training a classifier is that evaluating via simulation takes too long.
Is there a specific way to take advantage of my capacity to generate more data that I can use with xgboost, but couldn't with, say, an SVM?
It's almost an Active Learning problem
I'm not sure if there is anything beyond: "Generate more data, both for training and validation, until the validation error hits 0"
Hi everyone! Could anyone explain what are the arguments of a custom loss function?
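A sketch of the custom objective shape in the Python API, as far as I understand it: a function passed as `obj` to `xgb.train` receives the current predictions (raw margin scores) and the training `DMatrix`, and returns the per-row gradient and Hessian of the loss. The example below uses log loss; `FakeDMatrix` is a hypothetical stand-in so the snippet runs without xgboost installed:

```python
import numpy as np

def logistic_obj(preds, dtrain):
    """Custom objective: preds are raw margins, dtrain supplies the labels."""
    labels = dtrain.get_label()
    probs = 1.0 / (1.0 + np.exp(-preds))  # sigmoid of the raw margin
    grad = probs - labels                  # first derivative of log loss
    hess = probs * (1.0 - probs)           # second derivative of log loss
    return grad, hess

class FakeDMatrix:
    """Hypothetical stand-in exposing only get_label(), like a real DMatrix."""
    def __init__(self, labels):
        self._labels = np.asarray(labels, dtype=float)

    def get_label(self):
        return self._labels

grad, hess = logistic_obj(np.array([0.0, 0.0]), FakeDMatrix([1, 0]))
```

With real data the function would be passed as `xgb.train(params, dtrain, num_boost_round, obj=logistic_obj)`.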
Jay Kim (Data Scientist)
Hi everyone. I joined this room first time today, nice to meet you all
Asbjørn Nilsen Riseth
Is there a built-in way to run XGBoost with a weighted mean square loss function? Something like $\sum_{i=1}^{D} w_i (y_i - \hat{y}_i)^2$
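As far as I know, the usual route is per-row weights: the `weight` argument of `xgboost.DMatrix` scales each row's contribution, which with a squared-error objective amounts to a weighted squared loss. The sketch below only spells out the gradient and Hessian of the weighted sum of squared errors with respect to each prediction; `weighted_squared_error` is a hypothetical helper, not part of the library:

```python
import numpy as np

def weighted_squared_error(preds, labels, weights):
    """Gradient and Hessian of sum_i w_i * (y_i - yhat_i)^2 w.r.t. yhat_i."""
    grad = 2.0 * weights * (preds - labels)
    hess = 2.0 * weights
    return grad, hess

preds = np.array([1.0, 2.0, 3.0])
labels = np.array([1.5, 2.0, 1.0])
weights = np.array([1.0, 0.5, 2.0])
grad, hess = weighted_squared_error(preds, labels, weights)
```

Each row's gradient and Hessian are simply multiplied by its weight, which is why passing weights to the data matrix is enough in the squared-error case.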
is there a general reason why xgboost predict returns only nan?
this is for python
xgboost predict works badly with multiple threads
On Windows XP, I found a lot of issues with xgboost, especially,
For XGBoost, when considering time series data, is it worth creating features which represent a change in other features? For example, say I have the feature "total_active_users". Would it make sense to have a feature "change_in_total_active_users"? Or, would that just be redundant?
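One way to build such a feature, as a sketch: trees split on the values within a single row and do not compare a row with the one before it, so a change column can carry information that the raw total does not. The numbers below are made up for illustration:

```python
import numpy as np

total_active_users = np.array([100.0, 120.0, 115.0, 140.0])

# Change relative to the previous time step. The first row has no
# predecessor, so it is marked NaN (XGBoost treats NaN as missing by default).
change = np.concatenate(([np.nan], np.diff(total_active_users)))

# Feature matrix with both the raw total and its change as columns.
features = np.column_stack([total_active_users, change])
```

Whether the extra column helps in practice is an empirical question, best checked on a held-out validation period.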
Can someone suggest how to begin with xgboost?
I use xgboost4j-0.80.jar. predictleaf always returns 3 leaf indices for one label. Is this an error?
Can anyone answer me?
:joy: :joy_cat: :joy:
I used xgboost4j-0.80.jar with the number of training rounds set to 800 and a training data size of 2000000. When I use predictleaf to get the leaf index, the JVM crashed.
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x00007f42160bf902, pid=880, tid=0x00007f42175f2700
JRE version: Java(TM) SE Runtime Environment (8.0_171-b11) (build 1.8.0_171-b11)
Java VM: Java HotSpot(TM) 64-Bit Server VM (25.171-b11 mixed mode linux-amd64 compressed oops)
V [libjvm.so+0x6d6902] jni_SetFloatArrayRegion+0xc2
Core dump written. Default location: /data/suzhe/suzhe-1.0-SNAPSHOT/core or core.880
If you would like to submit a bug report, please visit: