    Travis Brady
    @travisbrady
    Are there plans to have OPE estimators à la https://github.com/VowpalWabbit/estimators available in mainline vw so they can be called from outside Python? (context: I'm a binding author and would love access to this)
    2 replies
    Travis Brady
    @travisbrady
    Hi all. I wrote up a short little post about vw 8.9.0 mostly for folks newer to vw but thought I'd share here. I'd love any feedback or corrections, I'm sure I've missed plenty. https://travisbrady.github.io/posts/vowpal-8.9.0/
    Jack Gerrits
    @jackgerrits
    Thanks for sharing Travis! Love the post! (And thanks for the call out :) ) By the way, I haven't forgotten about supporting bindings; it's been a bit lower on the todo list for a while though. We've started work on a new C binding interface that provides a more complete view of VW functionality and actually reports errors the way a C API should.
    3 replies
    Sam Lendle
    @lendle
    For posterity, here's a recording of the NeurIPS 2020 presentation: https://slideslive.com/38942331/vowpal-wabbit, which I discovered in Travis's post above but haven't been able to find elsewhere. Great presentation, it helped clear up many of the questions I had last week. Thanks!
    Jack Gerrits
    @jackgerrits
    We also just put up a blog post that contains timestamps to each section in case anyone is looking for anything in particular: https://vowpalwabbit.org/blog/neurips2020.html
    Lena Gangan
    @lenagangan_twitter
    Is it reasonable to use CB (or maybe even non-contextual bandits) with >20K arms (the number of book titles in an online bookstore)?
    Wenjuan Dou
    @darlwen
    I am using pyvw (the python API of vw); is there any way to print debug info when using vw.learn()?
    6 replies
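
    A minimal sketch of one common approach, assuming the 8.9-era pyvw API: construct the workspace without --quiet, so VW writes its usual progress table (average loss, example counter, current features) to stderr, and add --audit for per-feature detail.

    from vowpalwabbit import pyvw

    # No --quiet: VW prints its normal progress output to stderr.
    # --audit additionally prints per-feature details for every example.
    model = pyvw.vw("--audit")

    model.learn("1 | price:1.0 size:2.0")  # diagnostics appear on stderr
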
    Wenjuan Dou
    @darlwen
    And one more question: in https://vowpalwabbit.org/blog/neurips2020.html, where can I get the source code for the CB visualizations?
    1 reply
    Dan Swain
    @dantswain
    Hi! I was wondering how people are using VW in a cloud-based environment? It seems like VW is generally set up to write things to disk, so it's not clear to me how one would operate many learners in an environment where disk is ephemeral and, for scaling purposes, a request may be routed to any one of multiple servers. Is anyone using VW like this?
    15 replies
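
    One pattern that fits this setup, sketched below under assumptions (the object store itself is hypothetical; pyvw's save() and the -i load flag are real): treat the model file as a byte blob, keep it in shared storage, and rehydrate it on whichever replica handles the next request. --save_resume preserves the full training state so online learning can continue after a reload.

    import os, tempfile
    from vowpalwabbit import pyvw

    model = pyvw.vw("--cb_explore_adf --save_resume --quiet")
    # ... model.learn(...) on incoming events ...

    # Serialize via a temp file and ship the bytes to shared storage.
    with tempfile.NamedTemporaryFile(suffix=".vw", delete=False) as f:
        path = f.name
    model.save(path)
    with open(path, "rb") as f:
        model_bytes = f.read()  # upload these bytes to S3 / blob store / etc.
    os.remove(path)

    # Any replica can later rehydrate from the same bytes:
    with tempfile.NamedTemporaryFile(suffix=".vw", delete=False) as f:
        f.write(model_bytes)
        path = f.name
    replica = pyvw.vw(f"-i {path} --save_resume --quiet")
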
    Max Pagels
    @maxpagels_twitter

    I'm using explore_eval to evaluate exploration algorithms (e-greedy with different epsilon values). Can someone confirm that explore_eval isn't intended for use with more than one pass over the data?

    The core issue is that I'd like to evaluate the best policy + exploration algorithm for a system in which the policy is trained once per week and then deployed. So the model itself is stationary for a week, but across e.g. a year it isn't. I'd like to use data generated by this system to do offline evaluation of new policies + exploration algorithms.

    Wenjuan Dou
    @darlwen

    Hi all, I am reading the code to understand how vw does epsilon-greedy exploration.
    I found the following code in cb_explore_adf_greedy.cc:

    template <bool is_learn>
    void cb_explore_adf_greedy::predict_or_learn_impl(VW::LEARNER::multi_learner& base, multi_ex& examples)
    {
      // Explore uniform random an epsilon fraction of the time.
      VW::LEARNER::multiline_learn_or_predict<is_learn>(base, examples, examples[0]->ft_offset);

      // After the base predict, preds is sorted best-first (lowest predicted cost at index 0).
      ACTION_SCORE::action_scores& preds = examples[0]->pred.a_s;

      uint32_t num_actions = (uint32_t)preds.size();

      // Number of top actions tied for the best score.
      size_t tied_actions = fill_tied(preds);

      // Every action gets an equal epsilon / num_actions share of exploration mass ...
      const float prob = _epsilon / num_actions;
      for (size_t i = 0; i < num_actions; i++) preds[i].score = prob;
      // ... and the remaining 1 - epsilon is split across the tied best actions
      // (or given entirely to the first action with _first_only).
      if (!_first_only)
      {
        for (size_t i = 0; i < tied_actions; ++i) preds[i].score += (1.f - _epsilon) / tied_actions;
      }
      else
        preds[0].score += 1.f - _epsilon;
    }

    So it gives every action a score of epsilon/num_actions, and then adds 1-epsilon to the action with the lowest predicted cost. Is this how it does exploration based on epsilon? I am a little confused about it; can someone help explain it?

    Wenjuan Dou
    @darlwen
    @here could someone help answer my question above? Thanks a lot!
    Max Pagels
    @maxpagels_twitter

    Epsilon greedy works as follows (example with 4 arms):

    Per round, choose the best arm given context (i.e. the arm with the lowest predicted cost) with probability 1-epsilon. With probability epsilon, choose an arm uniformly at random (which may again be the best arm).

    With epsilon = 0.1 and 4 arms, at any given round the probability of choosing the best arm is (1 - 0.1) + 0.1 x 1/4 = 0.925 ("exploit"). The probability of choosing each suboptimal arm is 0.1 x 1/4 = 0.025 ("explore").

    3 replies
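
    The same arithmetic as a minimal Python sketch (mirroring the cb_explore_adf_greedy code quoted above, ignoring tie handling):

    # Epsilon-greedy PMF over num_actions arms; index 0 is the greedy
    # (lowest predicted cost) arm, matching VW's output ordering.
    def epsilon_greedy_pmf(num_actions, epsilon):
        pmf = [epsilon / num_actions] * num_actions  # uniform exploration mass
        pmf[0] += 1.0 - epsilon                      # remaining mass on the greedy arm
        return pmf

    print(epsilon_greedy_pmf(4, 0.1))  # [0.925, 0.025, 0.025, 0.025]
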
    Max Pagels
    @maxpagels_twitter

    Watched the great content at https://slideslive.com/38942331/vowpal-wabbit, thanks to all involved! A related question:

    I am implementing a ranking system where the action sets per slot are not disjoint, i.e. I basically want a ranking without duplicates. The video mentions that the theory behind slates is worked out for the intersected/joint action case, but that it's still being worked on in VW.

    Am I shooting myself in the foot if I use CCB instead of slates now? Is there some rough estimate of when joint action sets will be supported in slates mode? Is slates mode planned as a replacement for CCB? @jackgerrits is probably the one to ask :)

    olgavrou
    @olgavrou
    Hi @maxpagels_twitter it sounds like you do have a CCB problem. Slates is an extension of CCB where the action set is supposed to be disjoint and there is a single global reward for the entire slate. In CCB you have a joint action space and therefore it will do a ranking for you. In CCB you can specify rewards for each slot. Attaching the documentation for each in case you want to take a closer look: CCB and Slates
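
    For reference, a CCB example in VW's text format looks roughly like this (a sketch based on the CCB docs; the slot label, when present, is chosen_action:cost:probability, and unlabeled slots are simply predicted):

    ccb shared |User user=Tom time_of_day=morning
    ccb action |Action article=politics
    ccb action |Action article=sports
    ccb action |Action article=music
    ccb slot 0:0.8:0.9 |Slot position=first
    ccb slot |Slot position=second
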
    Max Pagels
    @maxpagels_twitter
    @olgavrou thanks, that was what I was thinking. Just wondered since the video mentioned slates with joint action spaces is on the roadmap, and I was wondering about its status. I'd rather have something with a global reward. For now though, I'll use CCB.

    Regarding CCBs, I have a follow-up question. The docs mention this:

    "If action_ids_to_include is excluded then all actions are implicitly included". What's the use case for action_ids_to_include?

    It also states "This is currently unsupported". Does that refer to action_ids_to_include or the exclusion of action_ids_to_include :)?

    olgavrou
    @olgavrou
    We do want to get there eventually but it isn't currently on the roadmap (at least not in the near future). A suggestion regarding the global reward might be to assign a global reward and distribute it evenly across all slots? But I haven't tried that out myself, so I'm not sure what results you will get there :)
    Nishant Kumar
    @nishantkr18
    Hey everyone! Will we be having the RLOS fest this year? I believe the applications should have opened by now?
    Lalit Jain
    @lalitkumarj
    Hi all, I am trying to get active learning working with VW. I'm successfully able to send unlabeled examples; however, vw sends back a single float (presumably a prediction) which is always 0. In active_interactor.py (which seems quite out of date) it seems that sometimes vw should send back a (prediction, tag, importance) triple, which I can then send back along with the features. This is also the model in these slides: https://cilvr.cs.nyu.edu/diglib/lsml/lecture12_active.pdf. Would anybody be able to provide some guidance on what could be going wrong? Thank you!!
    Lalit Jain
    @lalitkumarj
    One additional data point: I just backed up to version 8.2.0 and things seem to be working fine there.
    AnkitRai-22
    @AnkitRai-22
    Hi everyone, I plan to contribute to RLOSF 2021, problem number 20 - "AutoML for online learning". We are supposed to implement AutoML HPO (hyperparameter optimization) techniques for VW, but there are many algorithms available for this. I am planning to use ParamILS. Any suggestions or comments would be highly appreciated.
    Max Pagels
    @maxpagels_twitter

    CCBs: I always get undefined loss with --passes >1 on my example dataset. Is this intended?

    More generally, there doesn't seem to be a ccb_adf option, only ccb_explore_adf, so it's not clear how to properly evaluate the policy (as opposed to the exploration algorithm) offline.

    Utkarsh Sharma
    @utkarshsharma00
    Hi, I plan to contribute to RLOSF 2021. As per the website, applications opened on 14th January 2021, but I am not able to find a link to the application form. Any help would be highly appreciated.
    Max Pagels
    @maxpagels_twitter
    Related to my CCB question, pretty sure it's a bug. Made a Github issue: VowpalWabbit/vowpal_wabbit#2781
    Josh Minor
    @jishminor
    To leverage the contextual bandit adf learner in vw, must the data samples supplied always have one action labeled with a:c:p? If I have existing data for contexts, actions and rewards (no probabilities), can this be used to train a model which would then be used to warm start an online learning session where vw generates predicted actions?
    3 replies
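
    For context, labeled training data for --cb_adf looks like the sketch below (based on the CB ADF input-format docs): exactly one action line carries an action:cost:probability label, and in ADF form the leading action id in the label is ignored, so it is conventionally written as 0. The probability field is what IPS/DR-style off-policy updates divide by, which is why logged probabilities matter.

    shared |User user=Tom time_of_day=morning
    0:1.0:0.5 |Action article=politics
    |Action article=sports
    |Action article=music
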
    Wenjuan Dou
    @darlwen

    Hi everyone, in the vw source code, when computing the prediction we have the following code:

    float finalize_prediction(shared_data* sd, vw_logger& logger, float ret)
    {
      if (std::isnan(ret))
      {
        ret = 0.;
        if (!logger.quiet)
        { std::cerr << "NAN prediction in example " << sd->example_number + 1 << ", forcing " << ret << std::endl; }
        return ret;
      }
      // Clamp the raw prediction to the range of labels seen so far.
      if (ret > sd->max_label) return (float)sd->max_label;
      if (ret < sd->min_label) return (float)sd->min_label;
      return ret;
    }

    If I use squaredloss, the input to this function is 1.36777e+09, but after finalize_prediction it becomes 0. Does that make sense?

    peterychang
    @peterychang

    Hi, I plan to contribute to RLOSF 2021. As per the website, applications opened on 14th January 2021, but I am not able to find a link to the application form. Any help would be highly appreciated.

    Sorry about that, the date has been moved back to Feb 1 per https://www.microsoft.com/en-us/research/academic-program/rl-open-source-fest/

    Jack Gerrits
    @jackgerrits
    @darlwen what is max_label and min_label when it is called?
    8 replies
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys!
    I am trying to run some experiments using --cb_explore_adf and I noticed that very often the model gets biased (the probability mass function output is mostly the same, regardless of context). I've tried using regularizers, modifying the learning rate, adding decay, and some other things, but I'm still not convinced the model is unbiased, because when I run a few predictions for visualization, the PMF is often the same, or at least the highest probability is at the same index.
    That being said, I would like to know if anyone has any suggestion as to what might be causing this? (I know that my dataset is not biased, though it is not perfectly balanced.)
    Also, when sampling an action from the PMF, why don't we always grab the index at which max(prob) occurs? I.e., why is it recommended to use sample_custom_pmf (from: https://vowpalwabbit.org/tutorials/cb_simulation.html#getting-a-decision-from-vowpal-wabbit)? As I understand it, this adds some kind of randomization, but isn't the model already exploring when we train it with --cb_explore_adf?
    Would love to hear your feedback.
    Cheers!
    olgavrou
    @olgavrou
    Hi @Favoreto_B_twitter if you are doing epsilon-greedy then the pmf provided will have a probability (1-e) on the predicted action and the remaining probability is distributed evenly on the remaining actions. The reason you see it always at index 0 is that VW will swap the predicted action with the first index so that it is always at index 0.
    For your second question: if we didn't sample from the pmf and just returned the predicted action (i.e. the one with the highest probability), then we would not be doing any exploration; we would be exploiting 100% of the time. Sampling from the pmf means exactly that: we sample the predicted action with higher probability (1-e) (exploiting) and one of the other actions with lower probability (exploring).
    The model doesn't explore, the model learns and predicts. The exploration happens with what you decide to eventually show to the user.
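
    A minimal sketch of that sampling step (modeled on sample_custom_pmf from the tutorial linked above): draw a uniform number and walk the cumulative PMF, so each action is chosen with exactly its PMF probability. Keeping the returned probability around is what lets you log it for later off-policy evaluation.

    import random

    # Sample an action index from the PMF returned by --cb_explore_adf.
    def sample_pmf(pmf):
        draw = random.random()          # uniform in [0, 1)
        cumulative = 0.0
        for index, prob in enumerate(pmf):
            cumulative += prob
            if cumulative > draw:
                return index, prob      # chosen action and its probability
        return len(pmf) - 1, pmf[-1]    # guard against floating-point shortfall

    action, prob = sample_pmf([0.925, 0.025, 0.025, 0.025])
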
    Bernardo Favoreto
    @Favoreto_B_twitter
    Thanks @olgavrou.
    I am not using only epsilon-greedy, though. I've seen bias for other algorithms as well, but the chosen action is not necessarily always at index 0 (could you elaborate on that?). One other thing I've noticed is that, depending on the namespace interactions I use, some contextual features don't seem to influence the model's prediction at all (e.g., with -q UA (user-action), changing a Location feature doesn't change the prediction). Any idea why that is (it happened to me while using the softmax explorer)?
    The second part is pretty clear to me now, thanks!
    pushpendre
    @pushpendre
    Hi, I was wondering if there are any online regression models implemented in VW beyond a linear model? For example, is there a tree-based regressor in VW that can be trained online, or a DNN-based regressor?
    pushpendre
    @pushpendre

    For example,

    The bandit bakeoff paper mentions that

    We run our CB algorithms in an online fashion using Vowpal Wabbit: .... we consider online CSC or regression oracles. Online CSC itself reduces to multiple online regression problems in VW...

    I understand the loss function and the gradient updates, but I want to know: what is the online regression model class implemented in VW?

    6 replies
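
    For readers with the same question, a minimal pyvw sketch of the default setup: VW's base regression oracle is a linear model over hashed features, updated one example at a time by online gradient descent (the flags shown are standard VW options).

    from vowpalwabbit import pyvw

    # Default reduction stack: squared loss, linear model, online updates.
    model = pyvw.vw("--loss_function squared --quiet")
    model.learn("2.0 | x:1.0 y:3.0")        # one online update toward label 2.0
    print(model.predict("| x:1.0 y:3.0"))   # prediction has moved toward 2.0
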
    pushpendre
    @pushpendre
    Just for the record, my question above is still open; the thread (the first 7 replies) went in another direction.
    pushpendre
    @pushpendre
    Hi everyone, one more question: how do importance weights interact with AdaGrad? IIUC importance weights are derived for vanilla SGD and not for AdaGrad. I was wondering how exactly these two tweaks are implemented together?
    Josh Minor
    @jishminor
    This message was deleted
    4 replies
    pushpendre
    @pushpendre

    what is the online regression model class implemented in VW?
    one more question: how do importance weights interact with AdaGrad?

    figured both out. thanks.

    Raphael Ottoni
    @raphaottoni
    Hello guys
    is anybody here?

    I am following the tutorial on CTR with cb_explore_adf and I would love to know if it is possible for the article feature in the Action namespace to be numeric...
    In the tutorial, you tell us to do it like this:

    shared |User user=Tom time_of_day=morning
    |Action article=politics
    |Action article=sports
    |Action article=music
    |Action article=food

    Is it possible to pass numerical values and let the model generalize better when a new value falls in the middle?

    shared |User user=Tom time_of_day=morning
    |Action price:2.99
    |Action price:10.99

    So later, when I want to test a new price, let's say 6.99, it will have a better estimate for it?
    2 replies
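
    On the numeric-feature point, a small sketch (using plain regression rather than cb_explore_adf, just to illustrate the feature handling): a value written as price:2.99 becomes a single price weight multiplied by the value, so a linear VW model's estimate varies continuously with price and an unseen 6.99 interpolates, unlike article=politics-style categorical features, which each hash to their own weight.

    from vowpalwabbit import pyvw

    # One weight for the "price" feature; the prediction is linear in its value.
    model = pyvw.vw("--quiet")
    for label, price in [(1.0, 2.99), (5.0, 10.99)]:
        for _ in range(50):                      # repeat so the weights settle
            model.learn(f"{label} |Action price:{price}")

    print(model.predict("|Action price:6.99"))   # lands between the two trained labels
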
    Raphael Ottoni
    @raphaottoni
    I also opened a stack overflow question, so I could update the findings and help others 😊
    Raphael Ottoni
    @raphaottoni
    @olgavrou, I'm a little bit confused by the answer you gave to @Favoreto_B_twitter. You said, and I quote, "The reason you see it always at index 0 is that VW will swap the predicted action with the first index so that it is always at index 0"... Are you saying that the internal index of an arm could change?! In one interaction the index 0 of the PMF would be related to arm1, but in the next to arm2? How am I supposed to know which arm is at each index given the pmf? How can I validate those things?
    @olgavrou does this happen with --cb_explore_adf? I think it doesn't... due to the order we pass on predict, right?
    shared |User user=Tom time_of_day=morning
    |Action article=politics
    |Action article=sports
    |Action article=music
    |Action article=food
    In this example, politics would always be index 0 and food always index 3 in the PMF, right?