
@ipashchenko you can use

`scale_pos_weight`

in binary classification.
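A common (though not the only) choice is the ratio of negative to positive samples; a quick sketch with made-up numbers:

```python
import numpy as np

# toy labels: 90 negatives, 10 positives (made-up class balance)
y = np.array([0] * 90 + [1] * 10)
ratio = (y == 0).sum() / (y == 1).sum()

params = {
    'objective': 'binary',
    'scale_pos_weight': ratio,  # up-weight the positive class
}
print(ratio)  # 9.0
```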
@PPACI can you try the latest code?

ok, I got something on this parameter: `histogram_pool_size`.
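My understanding (worth double-checking in the docs) is that it caps the histogram cache size in MB, with -1 meaning no limit; a sketch of setting it from Python, where the value is made up:

```python
params = {
    'objective': 'binary',
    # cap the histogram cache at ~1 GB; -1 (the default) means no limit
    'histogram_pool_size': 1024,
}
```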

I'm looking into numerical instability of the BinaryLogLoss of LightGBM, as printed below. Shouldn't the computed response be able to become numerically unstable and therefore affect the computation of the gradient and hessian? Normally one would ensure that it did not overflow when computing the exponential of a very small value, for example with an epsilon value. Am I way off on this, and can someone maybe help me understand the reasoning behind this code and why it is numerically stable?

https://github.com/Microsoft/LightGBM/blob/1c92e75d0342989359c469b1ffabc2901038c0f2/src/objective/binary_objective.hpp

```
void GetGradients(const double* score, score_t* gradients, score_t* hessians) const override {
  if (weights_ == nullptr) {
    #pragma omp parallel for schedule(static)
    for (data_size_t i = 0; i < num_data_; ++i) {
      // get label and label weights
      const int is_pos = is_pos_(label_[i]);
      const int label = label_val_[is_pos];
      const double label_weight = label_weights_[is_pos];
      // calculate gradients and hessians
      const double response = -label * sigmoid_ / (1.0f + std::exp(label * sigmoid_ * score[i]));
      const double abs_response = fabs(response);
      gradients[i] = static_cast<score_t>(response * label_weight);
      hessians[i] = static_cast<score_t>(abs_response * (sigmoid_ - abs_response) * label_weight);
    }
```
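For what it's worth, here is a small sketch (not LightGBM code) of the same expression in Python: when `label * sigmoid_ * score[i]` is very large, `exp` overflows to `inf` and the response cleanly becomes `-0.0`; when it is very negative, `exp` underflows to `0` and the response saturates at `-label * sigmoid_`. Neither path produces a NaN, which seems to be why no explicit epsilon is needed.

```python
import numpy as np

def response(label, score, sigmoid=1.0):
    # mirrors the C++ expression; np.exp overflows to inf rather than raising
    with np.errstate(over='ignore'):
        return -label * sigmoid / (1.0 + np.exp(label * sigmoid * score))

print(response(1, 1000.0))   # -0.0  (exp overflowed to inf -> response vanishes)
print(response(1, -1000.0))  # -1.0  (exp underflowed to 0 -> response saturates)
```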

I would like to use LambdaRank for a music recommendation task. The number of items is several thousand, and users would have listened to very few songs. While training, each user (group) will have few records, but while scoring I would have to score thousands of songs per user and rank them. Can LambdaRank be used for this problem? I have tried matrix factorization / item-similarity approaches, and the results were better than recommending popular songs.

I'm doing my calculation as NumRecords * NumColumns * 8 bytes = the minimum amount of RAM I would need
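Spelled out with made-up numbers (note this estimates the raw float64 matrix you load; LightGBM's constructed Dataset bins features internally, so it is usually much smaller):

```python
num_records = 10_000_000
num_columns = 100
ram_bytes = num_records * num_columns * 8  # 8 bytes per float64 cell
print(ram_bytes / 1024**3)  # ≈ 7.45 GiB
```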

I am seeing that the log loss on GPU training is increasing on every iteration

That doesn't seem normal

does anyone know what

`predict_raw_score`

means in lgb CLI config
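My understanding (worth verifying in the docs) is that `predict_raw_score = true` makes prediction output the raw, untransformed scores (e.g. before the sigmoid in binary classification) instead of probabilities. In a CLI config that would look like this, with placeholder file names:

```
task = predict
data = test.txt
input_model = LightGBM_model.txt
predict_raw_score = true
```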
Hi! Thanks for this great tool, guys! Would you have additional information on how refit works in the CLI? In the documentation, it's described as a way to "refit existing models with new data". Is it a matter of simply passing the new data to the refit.conf file? When I've tried this, it produces very erroneous results, suggesting that the model may not be updating correctly. Would you have any additional documentation or examples for how to use refit?
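For reference, my rough understanding of a refit config, with placeholder file names (this is a sketch, not a verified recipe):

```
task = refit
data = new_train.txt
input_model = old_model.txt
output_model = refit_model.txt
```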

I came up with the following hack:

```
predictions = {}
for start_day, end_day in tqdm_notebook(weeks):
    def r2_closure(y_pred, data):
        y_true = data.get_label()
        predictions[(start_day, end_day)] = (y_true, y_pred)
        return 'r2', r2_score(y_true, y_pred), True
    train_idx = np.where(dates < np.datetime64(start_day, 'D'))[0]
    test_idx = np.where((dates >= np.datetime64(start_day, 'D')) & (dates < np.datetime64(end_day, 'D')))[0]
    train = dataset.subset(train_idx)
    test = dataset.subset(test_idx)
    booster = lgb.train(params, train, valid_sets=test, feval=r2_closure, verbose_eval=False)
```

but there should be something easier!

Why can't it just do something like this:

`dataset.to_csv()` or like this: `dataset.to_numpy()`?

or maybe like this:

`test.save_binary('file.libsvm.bin')`

`booster.predict('file.libsvm.bin')`

?

The thing above calls r2_closure on every round, which is not what I want (that's why it's a dictionary), and the overall idea of obtaining predictions on the test set via feval & valid_sets is a hack

I have a trained model:

```
tree
version=v2
num_class=10
[...]
Tree=0
[...]
Tree=1
[...]
Tree=999
[...]
feature importances:
[...]
parameters:
[boosting: gbdt]
[objective: multiclass]
[...]
[num_iterations: 100]
[...]
```

In which order are the Trees 0 to 999 assigned to one of the 10 classes?
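As far as I know, the trees are laid out iteration-major: each boosting iteration writes one tree per class, in class order. So with `num_class=10`, Tree=0..9 belong to iteration 0 (classes 0..9), Tree=10..19 to iteration 1, and so on (1000 trees = 100 iterations × 10 classes). A tiny sketch of that mapping:

```python
NUM_CLASS = 10

def tree_to_class(tree_index, num_class=NUM_CLASS):
    # iteration-major layout: tree t*num_class + k belongs to class k
    return tree_index % num_class

print(tree_to_class(0))    # 0  (iteration 0, class 0)
print(tree_to_class(999))  # 9  (iteration 99, class 9)
```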

Hi all, I am an R user currently working with xgboost and SHAP values to facilitate the interpretation of boosted regression tree models. Is there a way to extract SHAP values from a LightGBM model in the R package? I am looking for something comparable to: https://slundberg.github.io/shap/notebooks/Census%20income%20classification%20with%20LightGBM.html

did some of you successfully run mmlspark on CentOS 7?

I've seen that there's this issue : Azure/mmlspark#390

Is anybody here?

@guolinke

Thank you very much for making LightGBM public! I have a question about the evaluation script used for the MSLR-10K and MSLR-30K datasets. Did you use the NDCG evaluation script from https://www.microsoft.com/en-us/research/project/mslr/?


hello, I want to know why the loss never changes

@xlamb1412 Your question is several months old by now, but yes, I've found the GPU version to be significantly faster than CPU. Testing on AWS instances, the worst GPU instance available (g2.2xlarge) can train in about the same amount of time as the best compute instance (c5.24xlarge). I'm exploring how to combine GPU training with MPI parallel learning. It would be sick if you could train on 8 or 16 GPUs at once