    Max Pagels
    @maxpagels_twitter

    If anyone has any comments on this message I posted I'd be very grateful:

    @pmineiro thanks for your patience answering all my questions. I did a quick sanity check: I'd expect explore_eval with 100% exploration, run against a "world" that never changes and where exactly half of the actions are positive (-1 cost) and half negative (+1 cost), to report an estimated average loss of 0, but that's not the case. I'm not sure if this is due to some systematic bias, because in this particular case --cb_explore_adf reports the loss I'd expect. I opened an issue, but I'm not sure whether it's a bug or intended behaviour: VowpalWabbit/vowpal_wabbit#2621

    olgavrou
    @olgavrou

    @maxpagels_twitter : you definitely do not ever run --cb_explore (or --cb_explore_adf) on an offline CB dataset without --explore_eval. you only run --cb_explore either 1) online, i.e., acting in the real-world, 2) offline with a supervised dataset and --cbify (to simulate #1) or 3) offline with --explore_eval and an offline CB dataset (to simulate #1). nothing else is coherent.

    @maxpagels_twitter I think Paul was referring to your question here

    Max Pagels
    @maxpagels_twitter
    @olgavrou yeah, I already read that and tested explore_eval as suggested, but it gives a loss I wouldn't expect against a uniform random dataset with exactly as much positive as negative feedback. The reported loss is systematically wrong, which is why I'm wondering whether it's a feature of explore_eval or a bug.
    Paul Mineiro
    @pmineiro

    In contextual bandits, and in VW, doing this will fail because of the issue @pmineiro mentioned. The way to overcome this is to keep track of all predictions and their context in some DB or memory store and learn only when a reward arrives for a particular prediction/context, or a suitable amount of time has passed such that you can assume zero reward and learn on that.
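A rough illustration of the join described above: keep every served prediction in memory under an event id, emit a training tuple when its reward arrives, and emit anything older than a timeout with an assumed default reward of 0. This is a minimal sketch under stated assumptions; the names (`JoinBuffer`, `Event`, `event_id`) and the timeout policy are hypothetical, it is not how Azure Personalizer implements its join, and a real system would use a durable store rather than an in-process map.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// A served prediction waiting for its reward (hypothetical structure).
struct Event {
  std::string context;  // serialized features shown to the policy
  int action;           // action the policy chose
  float probability;    // probability with which it was chosen
  long long logged_at;  // timestamp in seconds
};

// A joined (context, action, probability, reward) tuple ready for learning.
struct TrainingExample {
  Event event;
  float reward;
};

class JoinBuffer {
 public:
  explicit JoinBuffer(long long timeout_s, float default_reward = 0.f)
      : timeout_s_(timeout_s), default_reward_(default_reward) {}

  // Record a prediction at the moment it is served.
  void on_predict(const std::string& event_id, Event e) {
    assert(e.probability > 0.f && e.probability <= 1.f);
    pending_.insert_or_assign(event_id, std::move(e));
  }

  // Join a reward with its prediction; empty if the id is unknown
  // (for example because it already timed out).
  std::optional<TrainingExample> on_reward(const std::string& event_id, float reward) {
    auto it = pending_.find(event_id);
    if (it == pending_.end()) return std::nullopt;
    TrainingExample ex{it->second, reward};
    pending_.erase(it);
    return ex;
  }

  // Emit every event older than the timeout with the assumed default reward.
  std::vector<TrainingExample> expire(long long now_s) {
    std::vector<TrainingExample> out;
    for (auto it = pending_.begin(); it != pending_.end();) {
      if (now_s - it->second.logged_at >= timeout_s_) {
        out.push_back({it->second, default_reward_});
        it = pending_.erase(it);
      } else {
        ++it;
      }
    }
    return out;
  }

  std::size_t pending() const { return pending_.size(); }

 private:
  long long timeout_s_;
  float default_reward_;
  std::unordered_map<std::string, Event> pending_;
};
```

Since VW's CB learners minimize cost, the joined rewards would still need converting to costs (for example cost = -reward) before being written out as training examples.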

    This join operation is done for you by Azure Personalizer (https://azure.microsoft.com/en-us/services/cognitive-services/personalizer/). We've done presentations and workshops at AI NextCon conferences where we show the detailed dataflow diagram; maybe you can find one of those... or you could just use APS.

    Max Pagels
    @maxpagels_twitter

    More questions: why, in cb_explore_adf with epsilon set to 0.0, do I see probability distributions with values other than 0.0 or 1.0? This only happens at the start of a dataset:

    maxpagels@MacBook-Pro:~$ vw --cb_explore_adf test --epsilon 0.0
    Num weight bits = 18
    learning rate = 0.5
    initial_t = 0
    power_t = 0.5
    using no cache
    Reading datafile = test
    num sources = 1
    average  since         example        example  current  current  current
    loss     last          counter         weight    label  predict features
    0.666667 0.666667            1            1.0    known        0:0.333333...        6
    0.833333 1.000000            2            2.0    known        1:0.5...        6
    0.416667 0.000000            4            4.0    known        2:1...        6
    0.208333 0.000000            8            8.0    known        2:1...        6
    0.104167 0.000000           16           16.0    known        2:1...        6
    0.052083 0.000000           32           32.0    known        2:1...        6
    0.026042 0.000000           64           64.0    known        2:1...        6
    0.013021 0.000000          128          128.0    known        2:1...        6
    0.006510 0.000000          256          256.0    known        2:1...        6
    
    finished run
    number of examples = 486
    weighted example sum = 486.000000
    weighted label sum = 0.000000
    average loss = 0.003429
    total feature number = 4374
    maxpagels@MacBook-Pro:~$

    All examples have the same number of arms (3), and I see the same thing at the start of other datasets too. One large dataset I have takes some 20,000 examples before giving correct probabilities.

    --first works as expected, but not --epsilon, which at 0.0 exploration should be greedy, i.e. the probability vector should have one value of 1.0 and the rest 0.0.
    Max Pagels
    @maxpagels_twitter
    Update on the above: apparently if the raw predicted costs for 2 or more arms are exactly the same, the probability mass is split evenly among the tied arms (so the tie is effectively broken at random when sampling), resulting in probabilities other than 1.0 even with --epsilon 0.0
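The tie behaviour can be reproduced with a small model of the cb_explore_adf_greedy logic quoted later in this channel: every action gets epsilon / K, and the remaining 1 - epsilon is split evenly among the actions whose predicted cost exactly equals the minimum. This is a standalone sketch (the function name `greedy_pmf` is hypothetical, and it works on unsorted costs rather than VW's sorted action_scores), not VW's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Epsilon-greedy distribution with even tie-splitting: every action gets
// epsilon / K, and 1 - epsilon is shared by all actions whose predicted
// cost exactly equals the minimum. Output is in the input's action order.
std::vector<float> greedy_pmf(const std::vector<float>& predicted_costs, float epsilon) {
  assert(!predicted_costs.empty());
  assert(epsilon >= 0.f && epsilon <= 1.f);
  const float best = *std::min_element(predicted_costs.begin(), predicted_costs.end());
  const auto tied = std::count(predicted_costs.begin(), predicted_costs.end(), best);
  const float k = static_cast<float>(predicted_costs.size());

  std::vector<float> pmf(predicted_costs.size(), epsilon / k);
  for (std::size_t i = 0; i < predicted_costs.size(); ++i)
    if (predicted_costs[i] == best) pmf[i] += (1.f - epsilon) / static_cast<float>(tied);
  return pmf;
}
```

With epsilon = 0 and three arms that all share the same predicted cost, each arm gets 1/3, which is consistent with the `0:0.333333...` on the first line of the log above; two tied arms give 0.5 each, as on the second line.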
    olgavrou
    @olgavrou
    Hi @omelyanchikd thanks for bringing this up, this looks like a bug in cover. The distribution (and the resulting prediction) should not be affected by the number of predictions in the test dataset. Will ping you again once some progress is made here.
    Diana Omelianchyk
    @omelyanchikd
    Thank you @olgavrou. I will be looking forward to it. We have decided to go with bagging approach for now.
    Wes
    @wmelton
    @olgavrou can you help me understand how vw treats vectors passed as features? E.g. feature=[0.3,-1.3,...,n] - when using this, vw does not throw an error, yet it is not apparent how vw interprets this. Does it understand it as a vector or does it treat it more like a string and one-hot encode it, or something different entirely?
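VW's text input format expresses numeric features as `name:value` pairs, so one common way to pass a dense vector is to give each component its own named feature. A minimal sketch that only builds the feature string; the namespace name and the `f<i>` naming scheme are hypothetical choices, and no VW API is called.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Turn a dense vector into VW text-format features, one `name:value` pair per
// component, e.g. {0.3, -1.3} -> "|emb f0:0.3 f1:-1.3".
// The namespace ("emb") and the "f<i>" feature names are hypothetical.
std::string vector_to_vw_features(const std::string& ns, const std::string& prefix,
                                  const std::vector<float>& values) {
  assert(!ns.empty());
  std::ostringstream out;
  out << '|' << ns;
  for (std::size_t i = 0; i < values.size(); ++i)
    out << ' ' << prefix << i << ':' << values[i];
  return out.str();
}
```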
    Allegra Latimer
    @alatimer
    Hi all--I'm wondering if anyone can point me to papers that demonstrate real-world/industry examples of building a model to filter a large action space down to a reasonable number of actions before using CB to select a recommended action from that subset. I have seen cases where the action space was naturally limited by context, e.g. by business rules (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/thompson.pdf) or curation by domain experts (https://arxiv.org/abs/1003.0146), but I haven't been able to find a good example of a hierarchical modeling approach where e.g. a high-recall recommender system is used to first filter actions down to a manageable subset before applying CB. Any ideas?
    Amil Khare
    @amil.khare_gitlab
    Hey all!
    I was wondering if VW can handle one-class datasets, i.e. finding data points similar to those in an initial dataset. Is there some way VW can help in such cases?
    Allegra Latimer
    @alatimer
    Hi all, I see CATS is a relatively new algorithm for bandits in continuous action spaces that uses a tree policy class. Does that mean we can use a tree policy class generally in VW? E.g. if I am using vw --cb_explore_adf, is there a command-line argument to make the policy class decision trees?
    olgavrou
    @olgavrou
    Hi @alatimer, right now that will not be possible. CATS uses the cats_tree reduction under the hood, which is a single-line reduction, while cb_explore_adf is a multi-line reduction. offset_tree was also added, though it works with cb_explore (again, it is single-line, so no ADF is involved here).
    George Fei
    @georgefei
    Hi all, I’m wondering if the python wrapper or the cli tool provides a way to output the reward estimation of each arm in the cb/cb_explore mode (I found this post asking the exact same question but it was unanswered: https://stackoverflow.com/questions/60678450/how-to-get-cost-reward-current-estimation-for-all-arms-using-vowpal-wabbit-with)?
    Wenjuan Dou
    @darlwen
    Hi all, I've been reading the VW source code recently and I'm confused about how the reduction stack works. More specifically, how does the setup_base function initialize all the enabled reductions?
    My training setting is: vw -d debug.txt --foreground --ccb_explore_adf --cb_type mtr --epsilon 0.01 --ftrl -f debug.model, and the debug info shows:
    Enabled reductions: ftrl, scorer, csoaa_ldf, cb_adf, cb_explore_adf_greedy, cb_sample, shared_feature_merger, ccb_explore_adf
    How do all these reductions get enabled?
    Wenjuan Dou
    @darlwen
    @olgavrou could you pls help explain it?
    Alexey Taymanov
    @ataymano

    hi @darlwen,
    The stack of reductions for every vw run is defined by 2 things:
    1) a DAG of dependencies defined in the setup function of every reduction,
    e.g. here:
    https://github.com/VowpalWabbit/vowpal_wabbit/blob/b8732ffec3f8c7150dace1c41434bf3cdb4d8436/vowpalwabbit/cb_explore_adf_greedy.cc#L96
    if we have the cb_explore_adf reduction included, we also include the cb_adf one.
    2) the topological order here: https://github.com/VowpalWabbit/vowpal_wabbit/blob/b8732ffec3f8c7150dace1c41434bf3cdb4d8436/vowpalwabbit/parse_args.cc#L1246

    So the final stack of reductions for each vw run is actually the sub-stack of 2) that contains:
    1) reductions that you explicitly provided on your command line
    2) reductions defined in the input model file (if any)
    3) reductions populated as dependencies.

    In your case, ccb_explore_adf and ftrl are provided explicitly by you; the others are populated as dependencies:
    ccb_explore_adf -> cb_sample
    ccb_explore_adf -> cb_explore_adf_greedy -> cb_adf -> csoaa_ldf
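The two rules above can be modelled in a few lines: start from the explicitly requested reductions, add everything reachable through the dependency edges, and keep the result in a fixed global order. This is a toy model, not VW's code. In the test, the global order is simply the order of the debug output in the question and the dependency map holds only the edges listed in this message, so `scorer` and `shared_feature_merger`, which those edges do not account for, are absent from the result.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Deps = std::map<std::string, std::vector<std::string>>;

// Return the requested reductions plus their transitive dependencies,
// listed in the fixed global order (bottom of the stack first).
std::vector<std::string> enabled_reductions(const std::vector<std::string>& global_order,
                                            const Deps& deps,
                                            const std::vector<std::string>& requested) {
  std::set<std::string> enabled;
  std::vector<std::string> todo(requested);
  while (!todo.empty()) {  // depth-first walk over the dependency edges
    std::string name = todo.back();
    todo.pop_back();
    if (!enabled.insert(name).second) continue;  // already visited
    auto it = deps.find(name);
    if (it != deps.end()) todo.insert(todo.end(), it->second.begin(), it->second.end());
  }
  std::vector<std::string> stack;
  for (const auto& name : global_order)
    if (enabled.count(name)) stack.push_back(name);
  assert(stack.size() == enabled.size() && "every enabled reduction must appear in the order");
  return stack;
}
```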

    Wenjuan Dou
    @darlwen

    thanks @ataymano, much clearer now. In VW::LEARNER::base_learner* setup_base(options_i& options, vw& all),
    when it enters the following branch,

     else
      {
        all.enabled_reductions.push_back(std::get<0>(setup_func));
        return base;
      }

    my understanding is that it won't call auto setup_func = all.reduction_stack.top(); anymore. For example, when we get "ftrl_setup", it enters the else branch; then how does it get the rest of the reductions (scorer, ccb_explore_adf, etc.) enabled?
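The `else` question is easier to see with a toy version of the recursion. This sketch assumes, as a simplified model rather than VW's real code, that a reduction's setup function returns nullptr when the reduction is not enabled, and otherwise calls `setup_base` itself to build the learner underneath it. Under that assumption, the `return base;` in the `else` branch hands the finished learner back to the setup function that asked for it, so `enabled_reductions` fills up bottom-first, which would explain why the debug output starts with `ftrl`. `Learner`, `Toy`, `Setup`, `reduction`, and `base_learner` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <set>
#include <stack>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for VW's learner and global state (hypothetical).
struct Learner {
  std::string name;
  std::shared_ptr<Learner> base;  // learner this reduction sits on top of
};
struct Toy;
using Setup = std::function<std::shared_ptr<Learner>(Toy&)>;
struct Toy {
  std::stack<std::pair<std::string, Setup>> reduction_stack;  // top = outermost
  std::vector<std::string> enabled_reductions;
  std::set<std::string> options;  // reductions switched on by the command line
};

// Same shape as setup_base: pop the top entry and try to set it up; if it is
// not enabled (nullptr), move on to the next entry, otherwise record and return it.
std::shared_ptr<Learner> setup_base(Toy& all) {
  assert(!all.reduction_stack.empty());
  auto setup_func = all.reduction_stack.top();
  all.reduction_stack.pop();
  auto base = setup_func.second(all);
  if (base == nullptr) return setup_base(all);
  all.enabled_reductions.push_back(setup_func.first);
  return base;
}

// A reduction's setup: bail out if not enabled, otherwise recurse into
// setup_base to build everything below it first.
Setup reduction(const std::string& name) {
  return [name](Toy& all) -> std::shared_ptr<Learner> {
    if (!all.options.count(name)) return nullptr;
    auto below = setup_base(all);
    return std::make_shared<Learner>(Learner{name, below});
  };
}

// The bottom learner never recurses, which ends the chain.
Setup base_learner(const std::string& name) {
  return [name](Toy&) { return std::make_shared<Learner>(Learner{name, nullptr}); };
}
```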

    Kev. Noel
    @arita37
    Thanks for your presentation at NeurIPS!
    Sam Lendle
    @lendle
    howdy! I'm trying to get a pyvw.vw object to process a data file when I instantiate it with a --data argument. Based on this fairly recent s.o. answer https://stackoverflow.com/a/62876763, my understanding is that it should do just that, but I am not having any luck. I'm using vw version 8.9.0, did something change in a recent release? I have confirmed that using the same options from the command line works so I don't think I'm doing something obviously wrong like using a wrong file name
    Andrew Clegg
    @andrewclegg
    what does ring_size do? will increasing this help I/O performance, or is it not something to worry about?
    Jack Gerrits
    @jackgerrits
    Ring_size refers to the initial size of the example pool. It will resize if it needs more room to store parsed examples waiting to be processed by the learner. I wouldn't worry too much about changing it.
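A rough picture of the example pool described above: a fixed number of preallocated example slots that grows when the parser needs more. This is a hypothetical sketch of the idea only; `ExamplePool`, `initial_size`, and the doubling growth policy are assumptions, not VW's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Example { int id = 0; /* parsed features would live here */ };

// Hypothetical example pool: starts with `initial_size` preallocated slots
// and doubles its capacity whenever every slot is in use.
class ExamplePool {
 public:
  explicit ExamplePool(std::size_t initial_size) { grow(initial_size); }

  Example* acquire() {
    if (free_.empty()) grow(total_);  // all slots busy: double the pool
    Example* ex = free_.back();
    free_.pop_back();
    return ex;
  }

  void release(Example* ex) { free_.push_back(ex); }  // slot can be reused

  std::size_t capacity() const { return total_; }

 private:
  void grow(std::size_t n) {
    assert(n > 0);
    for (std::size_t i = 0; i < n; ++i) {
      storage_.push_back(std::make_unique<Example>());
      free_.push_back(storage_.back().get());
    }
    total_ += n;
  }

  std::vector<std::unique_ptr<Example>> storage_;  // owns every slot
  std::vector<Example*> free_;                     // slots not currently in use
  std::size_t total_ = 0;
};
```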
    Andrew Clegg
    @andrewclegg
    thanks!
    buildvoc
    @buildvoc
    I was wondering if you could help with a very simple question (please note that I am a beginner with VW): could someone explain multiclass and multilabel to me and how they work in VW? For example, VW can answer questions like “What’s the predicted price of this house?” or “Should I buy this house today?”, but it's more difficult to apply it to a problem like "What type of house is this?" (multiclass) or "What are the adjectives that best describe this house?" (multilabel).
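One way to see the difference: VW's --oaa (multiclass) and --multilabel_oaa (multilabel) options both build on one binary scorer per class. Multiclass then predicts the single highest-scoring class, while multilabel predicts every class whose scorer says "yes". The sketch below shows only that final decision step; the per-class scores are made-up inputs, classes are indexed from 0 for simplicity, the threshold of 0 is an assumption, and none of this is VW's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Multiclass: exactly one answer, the class with the highest score.
std::size_t predict_multiclass(const std::vector<float>& class_scores) {
  assert(!class_scores.empty());
  auto best = std::max_element(class_scores.begin(), class_scores.end());
  return static_cast<std::size_t>(best - class_scores.begin());
}

// Multilabel: every class whose binary scorer is above the threshold,
// so the answer can contain zero, one, or many labels.
std::vector<std::size_t> predict_multilabel(const std::vector<float>& class_scores,
                                            float threshold = 0.f) {
  std::vector<std::size_t> labels;
  for (std::size_t i = 0; i < class_scores.size(); ++i)
    if (class_scores[i] > threshold) labels.push_back(i);
  return labels;
}
```

For the house examples: "What type of house is this?" wants exactly one label, so it fits the argmax rule; "Which adjectives describe it?" can have several correct labels at once, so it fits the per-class yes/no rule.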
    George Fei
    @georgefei
    hey @jackgerrits could you help answer my question from 2 weeks ago: I’m wondering if the python wrapper or the cli tool provides a way to output the reward estimation of each arm in the cb/cb_explore mode (I found this post asking the exact same question but it was unanswered: https://stackoverflow.com/questions/60678450/how-to-get-cost-reward-current-estimation-for-all-arms-using-vowpal-wabbit-with)?
    Sam Lendle
    @lendle
    I have a bunch of questions, mostly about bandits & policy optimization options, not so much about bandits w/ exploration. Questions below in separate messages so responses/discussions can be threaded. In exchange for everyone's patience, I'll update the wiki where I can.
    Is my understanding correct that dm, ips, and dr all run the cost sensitive classification reduction with estimated cost for each action as the cost in the csc? The difference between dm, ips, dr is in the estimated costs:
    • dm: naively estimates cost w/ regression. Subject to bias due to confounding
    • ips: estimate cost as reported cost/probability, or 0 if cost is not reported. (c(a) = cost/probability * I(observed action = a)). Unbiased if probabilities are correct, usually high variance
    • dr: doubly robust, uses regression and ips, often lower variance
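The three estimators in the list can be written out for a single logged example (observed action, its cost, its logging probability p). A minimal sketch of the formulas as described above: `reg` stands for whatever regression estimate of each action's cost is available (a hypothetical input), and the DR line uses the standard doubly robust form, regression estimate plus an importance-weighted residual on the observed action. It illustrates the estimators, not VW's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class CbType { dm, ips, dr };

// Estimated cost of every action for one logged example.
//   reg[a]   : regression estimate of action a's cost (hypothetical input)
//   observed : index of the logged action, cost: its cost, prob: its probability
std::vector<float> estimated_costs(CbType type, const std::vector<float>& reg,
                                   std::size_t observed, float cost, float prob) {
  assert(observed < reg.size() && prob > 0.f);
  std::vector<float> est(reg.size());
  for (std::size_t a = 0; a < reg.size(); ++a) {
    const float indicator = (a == observed) ? 1.f : 0.f;
    switch (type) {
      case CbType::dm:  // regression only
        est[a] = reg[a];
        break;
      case CbType::ips:  // cost / p on the observed action, 0 elsewhere
        est[a] = indicator * cost / prob;
        break;
      case CbType::dr:  // regression + importance-weighted residual
        est[a] = reg[a] + indicator * (cost - reg[a]) / prob;
        break;
    }
  }
  return est;
}
```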
    Is mtr the same as “importance weighted regression” in https://arxiv.org/pdf/1802.04064.pdf?
    If so, is the method:
    1. Estimate costs w/ regression and 1/probability importance weights. (Essentially the same regression for cost as in the dm method, but with importance weights?)
    2. Policy: predict cost for each action, take action w/ lowest predicted cost
    What does --baseline do with respect to both simple regression and with contextual bandits? What is an example's ‘enabled flag’ referred to in the help string for --check_enabled?
    Is there a reason MTR isn’t/can’t be implemented for --cb rather than --cb_adf? Related: it’s trivial to manually convert a simple cb example to an adf cb example. I would think that --cb gets internally converted to an adf type problem, but since mtr is not available for --cb, it suggests that is not the case. What else is different between --cb and --cb_adf, when there are not actually any dependent features other than an action indicator?
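The manual conversion mentioned above can be sketched as string manipulation: the context features go on a `shared` line, each action gets its own line carrying only an action-indicator feature, and the cost/probability label moves onto the chosen action's line. This is a minimal sketch assuming VW's text formats for --cb (label `action:cost:probability`, actions numbered from 1) and --cb_adf (a `shared` line, one line per action, the chosen line labelled `0:cost:probability` as in VW's CB tutorials, and a blank line ending the multiline example); the `|a id=<k>` indicator naming and the function name are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Convert one --cb example (chosen action, cost, probability, context features)
// into the multiline --cb_adf text format with an indicator feature per action.
// The "|a id=<k>" indicator naming is a hypothetical choice.
std::string cb_to_adf(int chosen_action, float cost, float prob,
                      const std::string& context_features, int num_actions) {
  assert(chosen_action >= 1 && chosen_action <= num_actions);  // --cb actions are 1-based
  std::ostringstream out;
  out << "shared |" << context_features << '\n';
  for (int k = 1; k <= num_actions; ++k) {
    if (k == chosen_action) out << "0:" << cost << ':' << prob << ' ';
    out << "|a id=" << k << '\n';
  }
  out << '\n';  // blank line terminates the multiline example
  return out.str();
}
```

For example, the --cb line `2:1:0.5 |c user=tom` with 3 actions would become the four lines `shared |c user=tom`, `|a id=1`, `0:1:0.5 |a id=2`, `|a id=3`, followed by a blank line.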

    This is the only thing I’ve found that describes the implementation for csoaa: http://users.umiacs.umd.edu/~hal/tmp/multiclassVW.html. As I read it, that means csc based bandit methods:

    1. Fit a regression for each action where the target of the regression is the estimated cost of that action
    2. Policy: predict cost for each action from the regression models, take action w/ lowest predicted cost
      Is that right?

    If so, is it reasonable to think of ips and mtr as essentially the same except:

    1. IPS uses cost * I(action = observed action)/probability as target and 1 as weight
    2. Mtr uses cost as target and I(action = observed action)/probability as weight
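The comparison above can be made concrete by listing the (target, weight) row each scheme would hand to the regression for every action of one logged example. This is a minimal sketch of the two descriptions in the question, not a statement of how VW's csoaa or mtr code is structured; `RegressionRow`, `ips_rows`, and `mtr_rows` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct RegressionRow {
  float target;
  float weight;  // 0 means the row contributes nothing
};

// IPS: target = cost * I(a = observed) / prob, weight = 1.
std::vector<RegressionRow> ips_rows(std::size_t num_actions, std::size_t observed,
                                    float cost, float prob) {
  assert(observed < num_actions && prob > 0.f);
  std::vector<RegressionRow> rows;
  for (std::size_t a = 0; a < num_actions; ++a)
    rows.push_back({a == observed ? cost / prob : 0.f, 1.f});
  return rows;
}

// MTR: target = cost, weight = I(a = observed) / prob.
std::vector<RegressionRow> mtr_rows(std::size_t num_actions, std::size_t observed,
                                    float cost, float prob) {
  assert(observed < num_actions && prob > 0.f);
  std::vector<RegressionRow> rows;
  for (std::size_t a = 0; a < num_actions; ++a)
    rows.push_back({cost, a == observed ? 1.f / prob : 0.f});
  return rows;
}
```

Under this description, MTR only ever learns from the observed action, with importance weight 1/p, while IPS produces a weight-1 row for every action and the unobserved ones are pulled towards a target of 0.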
    How is the progressive validation loss reported by the driver for bandits when not exploring? Is it just the IPS estimate, mean(cost * I(observed action = predicted action) / probability), or something more sophisticated, like https://arxiv.org/abs/1210?
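The IPS formula proposed in the question, written out for a batch of logged examples and a deterministic policy (one predicted action per example). This sketches the question's formula only; whether the VW driver reports exactly this quantity is what is being asked, so it should not be read as the answer. `Logged` and `ips_loss` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Logged {
  int observed_action;
  int predicted_action;  // action the evaluated policy would choose
  float cost;
  float probability;     // logging probability of observed_action
};

// mean(cost * I(observed action == predicted action) / probability)
float ips_loss(const std::vector<Logged>& data) {
  assert(!data.empty());
  double sum = 0.0;
  for (const auto& d : data) {
    assert(d.probability > 0.f);
    if (d.observed_action == d.predicted_action) sum += d.cost / d.probability;
  }
  return static_cast<float>(sum / static_cast<double>(data.size()));
}
```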
    Is there an --eval type option for --cb_adf? How is the data specified?
    Travis Brady
    @travisbrady
    Are there plans to have OPE estimators à la https://github.com/VowpalWabbit/estimators available in mainline vw so they can be called from outside Python? (context: I'm a binding author and would love access to this)
    Travis Brady
    @travisbrady
    Hi all. I wrote up a short little post about vw 8.9.0 mostly for folks newer to vw but thought I'd share here. I'd love any feedback or corrections, I'm sure I've missed plenty. https://travisbrady.github.io/posts/vowpal-8.9.0/
    Jack Gerrits
    @jackgerrits
    Thanks for sharing Travis! Love the post! (And thanks for the call out :) ) By the way, I haven’t forgotten about supporting bindings, it’s been a bit lower on the todo list for a while though. We’ve started work on a new C binding interface that provides a more complete view of VW functionality and actually reports errors the way a C API should.
    Sam Lendle
    @lendle
    For posterity, here's a recording of the NeurIPS 2020 presentation: https://slideslive.com/38942331/vowpal-wabbit, which I discovered in Travis's post above but haven't been able to find elsewhere. Great presentation, it helped clear up many of the questions I had last week. Thanks!
    Jack Gerrits
    @jackgerrits
    We also just put up a blog post that contains timestamps to each section in case anyone is looking for anything in particular: https://vowpalwabbit.org/blog/neurips2020.html
    Lena Gangan
    @lenagangan_twitter
    Is it reasonable to use CB (or maybe even a non-contextual bandit) with >20K arms (the number of book titles in an online bookstore)?
    Wenjuan Dou
    @darlwen
    I am using pyvw (the Python API of VW); is there any way to print debug info when using vw.learn()?
    Wenjuan Dou
    @darlwen
    One more question: in https://vowpalwabbit.org/blog/neurips2020.html, where can I get the source code for the CB visualizations?
    Dan Swain
    @dantswain
    Hi! I was wondering how people are using VW in a cloud-based environment? It seems like VW is generally set up to write things to disk, so it's not clear to me how one would operate many learners in an environment where disk is ephemeral and for scaling purposes a request may be routed to any one of multiple servers. Is anyone using VW like this?
    Max Pagels
    @maxpagels_twitter

    I'm using explore_eval to evaluate exploration algorithms (e-greedy with different epsilon values). Can someone confirm that explore_eval isn't intended for use with more than one pass over the data?

    The core issue I have is that I'd like to evaluate the best policy + exploration algorithm for a system in which the policy is trained once per week and then deployed. So the model itself is stationary for a week, but across e.g. a year, it isn't. I'd like to use data generated by this system to do offline evaluation of new policies + exploration algorithms.

    Wenjuan Dou
    @darlwen

    Hi all, I am reading the code to understand how VW does epsilon-greedy exploration.
    I found the following code in cb_explore_adf_greedy.cc:

    template <bool is_learn>
    void cb_explore_adf_greedy::predict_or_learn_impl(VW::LEARNER::multi_learner& base, multi_ex& examples)
    {
      // Explore uniform random an epsilon fraction of the time.
      VW::LEARNER::multiline_learn_or_predict<is_learn>(base, examples, examples[0]->ft_offset);
    
      ACTION_SCORE::action_scores& preds = examples[0]->pred.a_s;
    
      uint32_t num_actions = (uint32_t)preds.size();
    
      size_t tied_actions = fill_tied(preds);
    
      const float prob = _epsilon / num_actions;
      for (size_t i = 0; i < num_actions; i++) preds[i].score = prob;
      if (!_first_only)
      {
        for (size_t i = 0; i < tied_actions; ++i) preds[i].score += (1.f - _epsilon) / tied_actions;
      }
      else
        preds[0].score += 1.f - _epsilon;
    }

    It gives the action with the largest cost a score of 1-epsilon, and the rest of the actions a score of epsilon/num_actions. Is this how it does exploration based on epsilon? I am a little confused about it; can someone help explain it?

    Wenjuan Dou
    @darlwen
    @here could someone help answer my question above? Thanks a lot!
    Max Pagels
    @maxpagels_twitter

    Epsilon greedy works as follows (example with 4 arms):

    Per round, choose the best arm given context (i.e. arm with lowest cost) with probability 1-epsilon. With probability epsilon, choose an arm uniformly at random.

    With epsilon = 0.1, at any given round, the probability of choosing the best arm is 1-0.1 plus 0.1 x 1/4 -> 0.925 ("exploit"). The probability of choosing a suboptimal arm is 0.1 * (1/4) = 0.025 ("explore")
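The same arithmetic in general form, for K arms and exploration rate ε, with a* the single arm of lowest predicted cost (ties ignored here). This is what the cb_explore_adf_greedy snippet above computes when there is one best action: every arm starts at ε/K and the best arm additionally receives 1 - ε.

$$P(a^*) = (1-\epsilon) + \frac{\epsilon}{K}, \qquad P(a) = \frac{\epsilon}{K} \quad (a \neq a^*)$$

$$K = 4,\ \epsilon = 0.1: \quad P(a^*) = 0.9 + 0.025 = 0.925, \qquad P(a) = 0.025, \qquad 0.925 + 3 \times 0.025 = 1$$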

    Max Pagels
    @maxpagels_twitter

    Watched the great content at https://slideslive.com/38942331/vowpal-wabbit, thanks to all involved! A related question:

    I am implementing a ranking system where the action sets per slot are not disjoint, i.e. I basically want a ranking without duplicates. The video mentions that the theory behind slates is worked out for the intersected/joint action case, but that it's still being worked on in VW.

    Am I shooting myself in the foot if I use CCB instead of slates now? Is there some rough estimate of when joint action sets will be supported in slates mode? Is slates mode planned as a replacement for CCB? @jackgerrits is probably the one to ask :)
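For the no-duplicates ranking, the slot-by-slot idea can be sketched as greedy slot filling in which an action chosen for one slot is removed from the candidates of later slots. This is a toy model with a made-up cost table; it assumes CCB excludes actions already assigned to earlier slots (worth confirming against the VW documentation), ignores exploration entirely, and `fill_slots` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Fill slots one at a time: each slot takes the lowest-cost action that has
// not been used by an earlier slot. cost[s][a] is the predicted cost of
// action a in slot s (made-up inputs; every row must have the same length).
std::vector<std::size_t> fill_slots(const std::vector<std::vector<float>>& cost) {
  assert(!cost.empty());
  const std::size_t num_actions = cost[0].size();
  assert(cost.size() <= num_actions);  // enough actions for distinct slots
  std::vector<bool> used(num_actions, false);
  std::vector<std::size_t> ranking;
  for (const auto& slot_costs : cost) {
    assert(slot_costs.size() == num_actions);
    std::size_t best = num_actions;
    float best_cost = std::numeric_limits<float>::infinity();
    for (std::size_t a = 0; a < num_actions; ++a)
      if (!used[a] && slot_costs[a] < best_cost) {
        best = a;
        best_cost = slot_costs[a];
      }
    assert(best < num_actions);
    used[best] = true;  // never offer this action to a later slot
    ranking.push_back(best);
  }
  return ranking;
}
```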