    Guolin Ke
    @stevespark please try histogram_pool_size when num_leaves is large
    @ipashchenko you can use scale_pos_weight in binary classification.
    @PPACI can you try the latest code?
    Guolin Ke
    @Goorman you can try the ignore_column parameter
    Is there a function within lightgbm or utility to transform the trees into a dataframe similar to xgboost?
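Not a definitive answer, but recent LightGBM versions expose `Booster.trees_to_dataframe()` for exactly this. Failing that, `booster.dump_model()` returns a nested dict that can be flattened by hand. A minimal sketch, with a hand-made toy tree standing in for real dump output (the helper `tree_to_rows` is mine, not part of the library):

```python
# Flatten a dump_model()-style nested tree dict into rows, one per node.
# The real structure comes from booster.dump_model()["tree_info"], which
# has more fields; only a few are mirrored here.

def tree_to_rows(tree_index, node, parent=None, rows=None):
    """Walk one tree depth-first and collect a dict per node."""
    if rows is None:
        rows = []
    if "split_feature" in node:  # internal split node
        rows.append({
            "tree": tree_index,
            "parent": parent,
            "split_feature": node["split_feature"],
            "threshold": node["threshold"],
        })
        idx = len(rows) - 1
        tree_to_rows(tree_index, node["left_child"], idx, rows)
        tree_to_rows(tree_index, node["right_child"], idx, rows)
    else:  # leaf node
        rows.append({
            "tree": tree_index,
            "parent": parent,
            "leaf_value": node["leaf_value"],
        })
    return rows

# Tiny hand-made tree in the shape of a dump_model() tree structure:
toy_tree = {
    "split_feature": 0,
    "threshold": 0.5,
    "left_child": {"leaf_value": -1.0},
    "right_child": {"leaf_value": 1.0},
}
rows = tree_to_rows(0, toy_tree)  # three rows: one split, two leaves
```

The resulting list of dicts drops straight into `pandas.DataFrame(rows)` if a dataframe is wanted.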
    Jayaram Prabhu Durairaj
    are there good methods to tune GBM trees, some step-by-step sensible method?
    Hi guys... Is there a way to train a LightGBM model in batches? Or any way to work with data larger than the RAM size?
    OK, I found something on this parameter: histogram_pool_size.
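For anyone reading later: `histogram_pool_size` caps the memory (in MB) that LightGBM spends on its histogram cache, which is what grows with `num_leaves`. A hedged sketch of where the parameter goes (all surrounding values are arbitrary):

```python
# Sketch: bounding histogram memory when num_leaves is large.
# histogram_pool_size is in MB; a smaller pool trades memory for
# recomputation of evicted histograms.
params = {
    "objective": "binary",
    "num_leaves": 4095,
    "histogram_pool_size": 1024,  # cap the histogram cache at ~1 GB
}
# booster = lgb.train(params, train_set)  # usual Python-API usage
```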
    M Hendra Herviawan
    Hi, how do I add "-O3 -mtune=native" during the build? I am using Ubuntu 17.
    I'm getting "[LightGBM] [Fatal] Unknown token bh in data file" when I try to run lightgbm on the allstate data set. Does anyone know what's causing this?
    Simon Østergaard Kristensen

    I'm looking into numerical instability of the BinaryLogLoss of LightGBM as printed below. Should the computed response not be able to become numerically unstable and therefore affect the computation of the gradient and hessian? Normally one would ensure that it did not overflow when computing the exponential of a very small value, for example with an epsilon value. Am I way off on this, and can someone maybe help me understand the reason behind this code and why it is numerically stable?

    void GetGradients(const double* score, score_t* gradients, score_t* hessians) const override {
      if (weights_ == nullptr) {
        #pragma omp parallel for schedule(static)
        for (data_size_t i = 0; i < num_data_; ++i) {
          // get label and label weights
          const int is_pos = is_pos_(label_[i]);
          const int label = label_val_[is_pos];
          const double label_weight = label_weights_[is_pos];
          // calculate gradients and hessians
          const double response = -label * sigmoid_ / (1.0f + std::exp(label * sigmoid_ * score[i]));
          const double abs_response = fabs(response);
          gradients[i] = static_cast<score_t>(response * label_weight);
          hessians[i] = static_cast<score_t>(abs_response * (sigmoid_ - abs_response) * label_weight);
        }
      }
    }
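On the stability question: the expression saturates instead of blowing up. When `label * sigmoid_ * score[i]` is very negative, `std::exp` underflows to 0 and the response tends to `-label * sigmoid_`; when it is very positive, `std::exp` overflows to `inf` and the division yields 0. Either way `|response| <= sigmoid_` and no NaN appears, which is why no epsilon guard is needed. A quick numpy sketch of the same term (for a positive label):

```python
import numpy as np

# The response term from GetGradients above, evaluated at extreme scores.
# IEEE-754 saturation keeps it bounded: exp overflow -> inf, and
# -1 / (1 + inf) -> -0.0, so nothing becomes NaN.
sigmoid, label = 1.0, 1.0
scores = np.array([-1e4, -10.0, 0.0, 10.0, 1e4])
with np.errstate(over="ignore"):
    response = -label * sigmoid / (1.0 + np.exp(label * sigmoid * scores))
# response stays within [-sigmoid, 0] for every score, however extreme
```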
    Jayaram Prabhu Durairaj
    can we plot a partial dependence plot?
    Aswath Ravindran
    I would like to use LambdaRank for a music recommendation task. The number of items is several thousand, and users would have listened to very few songs. While training, each user (group) will have few records, but while scoring I would have to score thousands of songs per user and rank them. Can LambdaRank be used for this problem? I have tried Matrix Factorization / item-similarity approaches, and the results were better than recommending popular songs.
    I had a question: If my data size is much larger than my GPU memory, can LightGBM still work on that ?
    I'm doing my calculation as NumRecords × NumColumns × 8 bytes = amount of RAM I would need at minimum
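Spelled out (the numbers below are just an illustration):

```python
# Back-of-the-envelope RAM estimate: NumRecords x NumColumns x 8 bytes,
# i.e. one float64 per cell of the raw matrix.
num_records = 10_000_000
num_columns = 100
bytes_needed = num_records * num_columns * 8
gib_needed = bytes_needed / 2**30  # ~7.45 GiB for this example
# Note: LightGBM's constructed Dataset is usually much smaller, since
# features are binned (by default to at most 255 bins, ~1 byte per cell).
```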
    I had a question: why, after I changed the weight of the training dataset by setting its 'weight' param, did my gbdt model not change at all?
    Hugo Mougard
    I often encounter the warning `UserWarning: Found num_iterations in params. Will use it instead of argument`. What does it mean exactly?
    Karthik Anantha Padmanabhan
    I have a question on gpu training
    I am seeing that the log loss on gpu training is increasing on every iteration
    That doesn't seem normal
    does anyone know what predict_raw_score means in the LightGBM CLI config?
    hi, it seems the bagging sampling operation makes the GPU version slow. does anybody have good ideas?
    Isabelle Tingzon
    Hi! Thanks for this great tool guys! Would you have additional information on how refit works on the CLI? In the documentation, it's described as a way to "refit existing models with new data". Is it a matter of simply passing the new data to the refit.conf file? <-- When I've tried this, it produces very erroneous results, indicating that the model may not be updating correctly. Would you have any additional documentation or examples for how to use refit?
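Not an authoritative answer, but the CLI refit flow is driven by a config file much like training. An untested sketch, assuming the documented `task = refit` and `input_model` keys (the filenames are placeholders):

```
# refit.conf (sketch; filenames are placeholders)
task = refit
data = new_train_data.txt        # new data, same feature layout as before
input_model = LightGBM_model.txt # the previously trained model
output_model = refit_model.txt
# refit_decay_rate (default 0.9) controls how strongly old leaf values
# are kept versus the values refit from the new data.
```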
    Nkululeko Thangelane
    Hi, how do you change the default metric on predict, so that the output does not reflect the F1 score?
    Boris Filippov
    what is the right way to convert an lgb.Dataset to raw data, in the case when it's just a subset of another (big) lgb.Dataset which was initialized from a filename, not a numpy/scipy/pandas df?
    Boris Filippov

    I came up with following hack:

    import numpy as np
    import lightgbm as lgb
    from sklearn.metrics import r2_score
    from tqdm import tqdm_notebook

    predictions = {}
    for start_day, end_day in tqdm_notebook(weeks):
        def r2_closure(y_pred, data):
            # stash this fold's predictions as a side effect of the eval metric
            y_true = data.get_label()
            predictions[(start_day, end_day)] = (y_true, y_pred)
            return 'r2', r2_score(y_true, y_pred), True
        train_idx = np.where(dates < np.datetime64(start_day, 'D'))[0]
        test_idx = np.where((dates >= np.datetime64(start_day, 'D')) & (dates < np.datetime64(end_day, 'D')))[0]
        train = dataset.subset(train_idx)
        test = dataset.subset(test_idx)
        booster = lgb.train(params, train, valid_sets=[test], feval=r2_closure, verbose_eval=False)

    but there should be something easier!
    Why can't it just do something like this: dataset.to_csv()? Or like this: dataset.to_numpy()?

    The code above calls r2_closure on every round, which is not what I want (that's why it's a dictionary), and the overall idea of obtaining predictions on the test set using feval & valid_sets is a hack.
    Boris Filippov
    or am I just incorrectly assuming that the order of rows in the dataset is the same as in the source file, so that I can slice it based on some source columns? Is this whole idea wrong? Can the order be different?
    I have a question regarding the model which is exported to a file by LightGBM

    I have a trained model:

    feature importances:
    [boosting: gbdt]
    [objective: multiclass]
    [num_iterations: 100]

    In which order are trees 0 to 999 assigned to the 10 classes?
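As far as I understand LightGBM's multiclass training: each boosting iteration grows one tree per class, so tree `i` belongs to class `i % num_class`. With `num_iterations = 100` and 10 classes, that accounts for the 1000 trees. A sketch of the mapping (`tree_class` is my own name, not a library structure):

```python
# Map tree index -> class for a multiclass LightGBM model:
# iteration 0 produces trees 0..9 (classes 0..9), iteration 1
# produces trees 10..19, and so on.
num_class = 10
num_trees = 1000  # num_iterations (100) * num_class (10)
tree_class = [i % num_class for i in range(num_trees)]
```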

    Jakob Gerstenlauer
    Hi all, I am an R user currently working with xgboost and SHAP values to facilitate the interpretation of boosted regression tree models. Is there a way to extract SHAP values from a LightGBM model in the R package? I'm looking for something comparable to: https://slundberg.github.io/shap/notebooks/Census%20income%20classification%20with%20LightGBM.html
    Fibinse Xavier`
    I'm trying to install LightGBM with GPU support on my MacBook Pro, because it has an AMD Radeon Pro card built in. Am I being naive?
    Hi there !
    did some of you successfully run mmlspark on CentOS 7?
    I've seen that there's this issue : Azure/mmlspark#390
    Paul Armen Gureghian
    is there a C++ example of LightGBM predict calling a Python-trained model?
    Is anybody here?
    Jacob Tran
    is the LightGBM GPU version better than the CPU one, in terms of speed and accuracy?
    Hi all, does LightGBM work with Scala 2.12.x on top of Apache Spark?
    Robert William
    Evening guys
    Thank you very much for making LightGBM public! I have a question about the evaluation script used for the MSLR-10K and MSLR-30K datasets. Did you use the NDCG evaluation script from https://www.microsoft.com/en-us/research/project/mslr/?
    What are the different ways to deploy mmlspark model? is there a way to export a model in MOJO format ? Thanks
    Hello, I want to know why the loss never changes.
    @xlamb1412 Your question is several months old by now, but yes, I've found the GPU version to be significantly faster than the CPU one. Testing on AWS instances, the worst GPU instance available (g2.2xlarge) can train in about the same amount of time as the best compute instance (c5.24xlarge). I'm exploring how to combine GPU training with MPI parallel learning. It would be sick if you could train on 8 or 16 GPUs at once.