    Fedor Shabashev
    @fshabashev
    I wonder if it is possible to use Vowpal Wabbit with a Unix socket (file socket) instead of a TCP socket.
    The documentation only describes TCP socket usage, but a file socket would be convenient so I wouldn't have to use a port.
    Diana Omelianchyk
    @omelyanchikd

    Good day, Vowpal Community, @all
    We wanted to switch our contextual bandit models from the epsilon-greedy approach to the online cover approach. However, when we ran the simple snippet of code below to check how online cover would perform for us, the result was not what we expected.

    import vowpalwabbit.pyvw as pyvw

    # training data in VW contextual bandit format: chosen_action:cost:probability | features
    data_train = ["1:0:0.5 |features a b", "2:-1:0.5 |features a c", "2:0:0.5 |features b c",
                  "1:-2:0.5 |features b d", "2:0:0.5 |features a d", "1:0:0.5 |features a c d",
                  "1:-1:0.5 |features a c", "2:-1:0.5 |features a c"]
    data_test = ["|features a b", "|features a b"]

    # train with the online cover exploration algorithm and save the model
    model1 = pyvw.vw(cb_explore=2, cover=10, save_resume=True)
    for data in data_train:
        model1.learn(data)
    model1.save("saved_model.model")

    # load the saved model into a second vw instance
    model2 = pyvw.vw(i="saved_model.model")

    # predict twice on the test data with both models (no learning happens here)
    for data in data_test:
        print(data)
        print(model1.predict(data))
        print(model2.predict(data))
    for data in data_test:
        print(data)
        print(model1.predict(data))
        print(model2.predict(data))

    Output for this snippet was like this:

    |features a b
    [0.75, 0.25]
    [0.5, 0.5]
    |features a b
    [0.7642977237701416, 0.2357022762298584]
    [0.5, 0.5]
    |features a b
    [0.7763931751251221, 0.22360679507255554]
    [0.5, 0.5]
    |features a b
    [0.7867993116378784, 0.21320071816444397]
    [0.5917516946792603, 0.40824827551841736]

    For some reason, the loaded model2 does not seem to produce predictions influenced by the saved weights (it starts from a uniform distribution over the two actions). Moreover, although no learning happens for model1 or model2 on the test dataset, the predicted probabilities change over time for both models. Is this expected behavior for the online cover approach? And if so, could you please point me to any documentation or article explaining why it happens?

    Diana Omelianchyk
    @omelyanchikd
    Many thanks in advance :)
    Paul Mineiro
    @pmineiro
    @maxpagels_twitter : the purpose of explore_eval is to estimate the online performance of a learning algorithm as it learns, but using an off-policy dataset. it's different than evaluating or learning a policy over an off-policy dataset, because you have to account for the change in information revealed to the algorithm as the result of making different decisions. as such, it is far less data efficient, but sometimes necessary. one use case is to evaluate exploration strategies offline, hence the name.
    1 reply
    Wes
    @wmelton
    Are there any practical examples in the wild of taking action and context data in JSON format, like that shown in Personalizer's documentation (https://docs.microsoft.com/en-us/azure/cognitive-services/personalizer/concepts-features#actions-represent-a-list-of-options), and converting it to the VW format for use with cb or cb_adf? The VW website example for news recommendation only uses static strings as actions, whereas real-world news recommendations would use article features in the actions to improve decision quality. Appreciate any help/guidance.
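    A minimal sketch of the kind of conversion being asked about here, assuming Personalizer-style JSON with per-action "id" and "features" fields; the helper name to_vw_adf and the chosen-action/cost/probability inputs are hypothetical values you would have logged yourself:

    # Hypothetical converter: Personalizer-style JSON features -> VW cb_adf text.
    def to_vw_adf(context, actions, chosen=None, cost=None, prob=None):
        def feats(dicts):
            # flatten a list of {name: value} dicts into VW feature tokens
            toks = []
            for d in dicts:
                for name, value in d.items():
                    if isinstance(value, (int, float)) and not isinstance(value, bool):
                        toks.append(f"{name}:{value}")    # numeric feature
                    else:
                        toks.append(f"{name}={value}")    # categorical feature (hashed as a string)
            return " ".join(toks)

        lines = ["shared |Context " + feats(context)]
        for i, action in enumerate(actions):
            label = f"0:{cost}:{prob} " if chosen == i else ""  # label only the action that was shown
            lines.append(label + "|Action id=" + action["id"] + " " + feats(action["features"]))
        return "\n".join(lines)

    context = [{"timeOfDay": "morning"}, {"device": "mobile"}]
    actions = [
        {"id": "article-a", "features": [{"topic": "sports"}, {"length": 350}]},
        {"id": "article-b", "features": [{"topic": "politics"}, {"length": 900}]},
    ]
    print(to_vw_adf(context, actions, chosen=0, cost=-1.0, prob=0.8))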
    Max Pagels
    @maxpagels_twitter

    @pmineiro thanks. So just to be clear, let's say I have logged bandit data and want to know whether an epsilon-greedy algorithm at 10% or 20% would be better. Do I:

    • use explore_eval for both and choose the one with the best average loss?
    • run vw --cb_explore <n> --epsilon 0.1 and vw --cb_explore <n> --epsilon 0.2 and choose the one with the best average loss?

    As far as I can tell I should be using explore_eval, which is why I'm wondering what the use case for the second option is, i.e. comparing different exploration algorithms by simply comparing the losses of the respective --cb_explore experiments. Is there any situation where that is a valid approach?

    Paul Mineiro
    @pmineiro
    @maxpagels_twitter : you definitely do not ever run --cb_explore (or --cb_explore_adf) on an offline CB dataset without --explore_eval. you only run --cb_explore either 1) online, i.e., acting in the real-world, 2) offline with a supervised dataset and --cbify (to simulate #1) or 3) offline with --explore_eval and an offline CB dataset (to simulate #1). nothing else is coherent.
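    For concreteness, a minimal pyvw sketch of option 3 above: the two logged cb_adf examples are made up, and the assumption is that each logged example records the action taken, its cost, and the probability with which the logging policy chose it.

    import vowpalwabbit.pyvw as pyvw

    # Made-up logged cb_adf examples: a shared context line plus one line per action,
    # with an action:cost:probability label on the action that was actually shown.
    logged = [
        ["shared |User morning mobile", "0:-1.0:0.5 |Action sports", "|Action politics"],
        ["shared |User evening desktop", "|Action sports", "0:0.0:0.5 |Action politics"],
    ]

    # Replay the log through explore_eval once per candidate epsilon; the average
    # loss reported in the run summary estimates how that explorer would have done online.
    for eps in (0.1, 0.2):
        model = pyvw.vw(f"--explore_eval --cb_explore_adf --epsilon {eps}")
        for example in logged:
            model.learn(example)
        model.finish()  # prints the run summary for this epsilon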
    Wes
    @wmelton
    @pmineiro In online cb scenarios, if you are predicting clicks on 3 pieces of content, why is it necessary to explicitly update the model when no action was taken by the user? In traditional Bayes-Bernoulli approaches, “regret” was implicit because rewards and trials were tracked separately. Trying to make the mental shift here. The challenge I see in our current implementation is that if we update the model with a cost of 0 (no action) but the user shortly after takes an action (cost -1), the model now sees the probability as 50%, which seems odd to me. Outside of batch updates (which seem to defeat the purpose of “online” learning), is there a way to tell VW the incremental value of a given prediction so as not to dilute the model?
    Paul Mineiro
    @pmineiro
    @wmelton : the short answer is that by fitting the zeros you are regressing against an unbiased target. the long answer is very long.
    Wes
    @wmelton
    @pmineiro haha that makes sense. Am i correct in assuming that omitting zero cost outcomes would reduce performance significantly? Are there any solid papers or videos that are helpful in describing typical real-time data flows for using vw in RL scenario like this? It seems like outside of fixed window batch scenarios it would be very difficult to do this efficiently.
    Paul Mineiro
    @pmineiro
    @wmelton it's hard to understand your question. in your 3-pieces-of-content-recommendation-problem, when a user takes no action in response to a piece of content that is presumed bad (cost 0) and you need to tell the learning algorithm about it, why is that surprising? of course you wait for some amount of time before concluding the user has taken no response, and you only update the model once per decision. azure personalizer (https://azure.microsoft.com/en-us/services/cognitive-services/personalizer/) parametrizes this delay as the "experimental unit window". i suggest you use that as the dataflows have been all worked out already.
    Wes
    @wmelton
    @pmineiro i appreciate your help and feedback. We considered using Personalizer but it is exceptionally expensive for a startup. From what you’ve shared here, i think i understand now the correct way to handle this. Thanks for your time and help! If you have a coffee or beer fund, happy to drop something in there for the help. Thanks!
    Max Pagels
    @maxpagels_twitter
    Regarding @wmelton's question, I think he is wondering because in e.g. standard Bernoulli bandits with beta posterior updates, you only need to record trials + successes, and that can technically be done by incrementing a trial count at the point of prediction and incrementing successes only if you get a positive reward. Whereas the algorithms VW uses require you to make a prediction, keep track of the context, then wait for a positive or negative reward. Both types of feedback are explicitly needed
    Paul Mineiro
    @pmineiro
    @maxpagels_twitter : ok. in the bandit (no context) case with discrete actions, the model parameters are a (c, n) pair per action. the update is (c += 1 if success else 0, n += 1) ... since the n update is constant you can apply it anytime you want and still get the same answer. you do have to remember what action was taken to be able to apply the "c" update later, so that's the analog of "remembering context". however, by applying the "n update" before you "know c" you are actually creating a pessimistic (over recent trials) estimator by assuming "c = 0 for now", whereas the rest of the technique uses an optimistic estimator (to explore). but counts are "data linear" (e.g., you can just remove some of the c-and-n if you decide later those interactions were lying spammers) whereas in more complicated model spaces we don't know how to be data linear.
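    To make the (c, n) bookkeeping above concrete, a small sketch of the two update orders being described; the numbers are made up:

    # Per-action state for a (no-context) Bernoulli bandit: c = successes, n = trials.
    c, n = 3, 10                       # this action looks like a 30% success rate so far

    # Wait for the outcome, then apply both updates together.
    success = True
    c, n = c + (1 if success else 0), n + 1
    print(c / n)                       # 4/11, about 0.36

    # Alternative order: bump n at prediction time, fill in c when the reward arrives.
    c2, n2 = 3, 10
    n2 += 1                            # at prediction time: assume "c = 0 for now"
    print(c2 / n2)                     # 3/11, about 0.27: pessimistic over recent trials
    c2 += 1                            # the reward arrives later
    print(c2 / n2)                     # 4/11 again: counts are order-independent ("data linear")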
    Wes
    @wmelton
    @maxpagels_twitter exactly! Thanks for helpfully asking what I apparently wasn't asking clearly lol. @pmineiro In this case, should we ignore reward signals after a window has expired, or should we still process them, trusting that the central limit theorem will help us achieve accuracy over time as we observe more events?
    @maxpagels_twitter on a different topic, in your analysis, how are you extracting confidence intervals for prediction accuracy? I haven't uncovered in the documentation (yet) how to observe model confidence per inference or in total over time.
    Paul Mineiro
    @pmineiro

    @pmineiro In this case, should we ignore reward signals after a window has expired, or should we still process them trusting that central limit theorem will help us achieve accuracy over time as we observe more events?

    I'm not trying to be salty, but there's no CLT issue here. When you update VW, you are saying "for this context i observed this reward". If you do it again, you are saying "i happened to observe the exact same context again but this time i got this other reward". So the best estimate after that is the average of the first and second reward, which is probably not what you want. With respect to time limit, if you define reward as, e.g., "1 if a click within 30 minutes of presentation else 0" then what happens after 30 minutes is irrelevant.
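    A tiny illustration of that point, mirroring Wes's cost-0-then-cost-minus-1 scenario: with squared loss, the best fit for a context that has been updated twice is the mean of the two observed values.

    # Same context logged twice: first with cost 0 ("no click yet"), then with cost -1
    # (the click that arrived later). The regressor's best constant fit for that context
    # is the average of the two observed costs, not just the later, "real" cost.
    costs_seen = [0.0, -1.0]
    best_fit = sum(costs_seen) / len(costs_seen)
    print(best_fit)  # -0.5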

    Wes
    @wmelton
    @pmineiro I don't mind saltiness - just here to learn. I referenced the CLT to highlight what, in my mind at least, is the same situation as in the example you gave. At a sufficiently large sample size, errors in reporting reward with perfect accuracy should regress to the mean over time, correct? I may have wrongly assumed this conclusion given my current understanding that features are “shared” across many users in a given model, so I assumed attribution errors would ultimately more or less tend to the mean given a large enough corpus of events. If I'm totally wrong, no sweat haha. Like I said, just here to learn - trying to make the mental leap from a more traditional non-contextual approach to this one.
    Paul Mineiro
    @pmineiro
    @wmelton using the bandit analogy, if you do 2 vw updates you'll get the equivalent of (n += 2) in the bandit setting. with vw, every time you send in a reward ("c") you get the equivalent of an increment in the number of trials ("n"). so it'll cause you problems.
    Max Pagels
    @maxpagels_twitter

    @wmelton yeah, just to be clear:

    If you have a bernoulli bandit, what some people do is that when an arm is pulled, they record +1 trials and update the posterior, and only when they get a reward for that pull do they update +1 successes. In a context-free setting this is sort of OK and will be kind of eventually consistent. I've done this before, primarily because it saves me from keeping track of pulls that get zero rewards and assigning those explicitly. It isn't "correct", however. In bandit settings you should learn when the reward is available, not do such a half-step. But it's a practical compromise.

    In contextual bandits, and in VW, doing this will fail because of the issue @pmineiro mentioned. The way to overcome this is to keep track of all predictions and their context in some DB or memory store and learn only when a reward arrives for a particular prediction/context, or a suitable amount of time has passed such that you can assume zero reward and learn on that.
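    A minimal sketch of that prediction/reward join, assuming an in-memory store, a --cb_explore 3 model, and a 30-minute zero-reward window; the event ids and helper names are made up for illustration:

    import time
    import vowpalwabbit.pyvw as pyvw

    model = pyvw.vw("--cb_explore 3 --epsilon 0.1")
    pending = {}  # event_id -> (vw_features, chosen_action, probability, decision_time)

    def decide(event_id, vw_features):
        # vw_features is an already-formatted VW feature string, e.g. "time_of_day=morning device=mobile"
        probs = model.predict("| " + vw_features)      # cb_explore returns a probability per action
        action = probs.index(max(probs)) + 1           # naive argmax; sample from probs in practice
        pending[event_id] = (vw_features, action, probs[action - 1], time.time())
        return action

    def reward(event_id, cost):
        # A reward (or an explicit zero) arrived: learn once per decision, then forget it.
        if event_id not in pending:
            return                                     # already expired and learned as cost 0
        vw_features, action, prob, _ = pending.pop(event_id)
        model.learn(f"{action}:{cost}:{prob} | {vw_features}")

    def expire(max_age_seconds=30 * 60):
        # No reward within the window: assume cost 0 and learn on that, once per decision.
        now = time.time()
        for event_id, (_, _, _, decided_at) in list(pending.items()):
            if now - decided_at > max_age_seconds:
                reward(event_id, 0.0)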

    @wmelton regarding the second question, I've found no flags directly in VW for this. I've made my own system with bootstrapping
    Max Pagels
    @maxpagels_twitter

    If anyone has any comments on this message I posted I'd be very grateful:

    @pmineiro thanks for your patience answering all my questions. I did a quick sanity check: I'd expect explore_eval with 100% exploration against a "world" that never changes and where exactly half of actions are positive (-1 cost) and half negative (+1) would get an estimated average loss of 0, but that's not the case. I'm not sure if this is due to some systemic bias, because in this particular case --cb_explore_adf reports the loss I'd expect. I made an issue but I'm not sure if it's a bug or intended behaviour: VowpalWabbit/vowpal_wabbit#2621

    olgavrou
    @olgavrou

    @maxpagels_twitter : you definitely do not ever run --cb_explore (or --cb_explore_adf) on an offline CB dataset without --explore_eval. you only run --cb_explore either 1) online, i.e., acting in the real-world, 2) offline with a supervised dataset and --cbify (to simulate #1) or 3) offline with --explore_eval and an offline CB dataset (to simulate #1). nothing else is coherent.

    @maxpagels_twitter I think Paul was referring to your question here

    Max Pagels
    @maxpagels_twitter
    @olgavrou yeah, I already read that and tested explore_eval as suggested, but it gives a loss i wouldn't expect against a uniform random dataset with exactly as much positive and negative feedback. The reported loss is systematically wrong. Which is why I'm wondering if it's a feature of explore_eval or if there is a bug
    Paul Mineiro
    @pmineiro

    In contextual bandits, and in VW, doing this will fail because of the issue @pmineiro mentioned. The way to overcome this is to keep track of all predictions and their context in some DB or memory store and learn only when a reward arrives for a particular prediction/context, or a suitable amount of time has passed such that you can assume zero reward and learn on that.

    This join operation is done for you by Azure Personalizer (https://azure.microsoft.com/en-us/services/cognitive-services/personalizer/). We've done presentations and workshops at AI NextConn conferences where we show the detailed dataflow diagram, maybe you can find one of those ... or you could just use APS.

    Max Pagels
    @maxpagels_twitter

    More questions: why, in cb_explore_adf with epsilon set to 0.0, do I see probability distributions with values other than 0.0 or 1.0? This only happens at the start of a dataset:

    maxpagels@MacBook-Pro:~$ vw --cb_explore_adf test --epsilon 0.0
    Num weight bits = 18
    learning rate = 0.5
    initial_t = 0
    power_t = 0.5
    using no cache
    Reading datafile = test
    num sources = 1
    average  since         example        example  current  current  current
    loss     last          counter         weight    label  predict features
    0.666667 0.666667            1            1.0    known        0:0.333333...        6
    0.833333 1.000000            2            2.0    known        1:0.5...        6
    0.416667 0.000000            4            4.0    known        2:1...        6
    0.208333 0.000000            8            8.0    known        2:1...        6
    0.104167 0.000000           16           16.0    known        2:1...        6
    0.052083 0.000000           32           32.0    known        2:1...        6
    0.026042 0.000000           64           64.0    known        2:1...        6
    0.013021 0.000000          128          128.0    known        2:1...        6
    0.006510 0.000000          256          256.0    known        2:1...        6
    
    finished run
    number of examples = 486
    weighted example sum = 486.000000
    weighted label sum = 0.000000
    average loss = 0.003429
    total feature number = 4374
    maxpagels@MacBook-Pro:~$

    All examples have the same number of arms (3), and on different datasets, I see the same thing at the start of a dataset. One large dataset I have takes some 20,000 examples before giving correct probabilities

    --first works as expected, but not --epsilon, which at 0.0 exploration should be greedy, i.e. the probability vector should have one value of 1.0 and the rest 0.0.
    Max Pagels
    @maxpagels_twitter
    Update on the above: apparently if the raw predicted cost for 2 or more arms is exactly the same, ties are broken at random, resulting in a probability other than one even with --epsilon 0.0
    olgavrou
    @olgavrou
    Hi @omelyanchikd thanks for bringing this up, this looks like a bug in cover. The distribution (and the resulting prediction) should not be affected by the number of predictions in the test dataset. Will ping you again once some progress is made here.
    Diana Omelianchyk
    @omelyanchikd
    Thank you @olgavrou. I will be looking forward to it. We have decided to go with the bagging approach for now.
    Wes
    @wmelton
    @olgavrou can you help me understand how vw treats vectors passed as features? E.g. feature=[0.3,-1.3,...,n] - when using this, vw does not throw an error, yet it is not apparent how vw interprets this. Does it understand it as a vector or does it treat it more like a string and one-hot encode it, or something different entirely?
    1 reply
    Allegra Latimer
    @alatimer
    Hi all--I'm wondering if anyone can point me to papers that demonstrate real-world/industry examples of building a model to filter a large action space down to a reasonable number of actions before using CB to select a recommended action from that subset. I have seen cases where the action space was naturally limited by context, eg by business rules (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/thompson.pdf) or curation by domain experts (https://arxiv.org/abs/1003.0146), but I haven't been able to find a good example of a hierarchical modeling approach where eg a high-recall recommender system is used to first filter down actions to a manageable subset before applying CB. Any ideas?
    Amil Khare
    @amil.khare_gitlab
    Hey all!
    I was wondering if VW can handle cases with one-class datasets, basically finding similar data points based on an initial dataset. Is there some way VW can help in such cases?
    Allegra Latimer
    @alatimer
    Hi all, I see CATS is a relatively new algorithm for bandits in continuous action space that uses a tree policy class. Does that mean we can use a tree policy class generally in VW? EG if I am using vw --cb_explore_adf, is there a command line argument to make the policy class be decision trees?
    olgavrou
    @olgavrou
    Hi @alatimer right now that will not be possible. CATS uses the cats_tree reduction under the hood which is a single line reduction. cb_explore_adf is a multi line reduction. Offset_tree was also added though that will use cb_explore (again it is single line so no adf involved here).
    2 replies
    George Fei
    @georgefei
    Hi all, I’m wondering if the python wrapper or the cli tool provides a way to output the reward estimation of each arm in the cb/cb_explore mode (I found this post asking the exact same question but it was unanswered: https://stackoverflow.com/questions/60678450/how-to-get-cost-reward-current-estimation-for-all-arms-using-vowpal-wabbit-with)?
    1 reply
    Wenjuan Dou
    @darlwen
    Hi all, I have been reading the VW source code recently and I am confused about how the reduction stack works. In more detail: how does the setup_base function initialize all the enabled reductions?
    My train setting is: vw -d debug.txt --foreground --ccb_explore_adf --cb_type mtr --epsilon 0.01 --ftrl -f debug.model, and based on the debug info:
    Enabled reductions: ftrl, scorer, csoaa_ldf, cb_adf, cb_explore_adf_greedy, cb_sample, shared_feature_merger, ccb_explore_adf
    how do all these reductions get enabled?
    Wenjuan Dou
    @darlwen
    @olgavrou could you pls help explain it?
    Alexey Taymanov
    @ataymano

    hi @darlwen ,
    The stack of reductions for every vw run is defined by two things:
    1) the DAG of dependencies defined in the setup function of every reduction.
    i.e. here:
    https://github.com/VowpalWabbit/vowpal_wabbit/blob/b8732ffec3f8c7150dace1c41434bf3cdb4d8436/vowpalwabbit/cb_explore_adf_greedy.cc#L96
    if the cb_explore_adf reduction is included, we also include the cb_adf one.
    2) the topological order here: https://github.com/VowpalWabbit/vowpal_wabbit/blob/b8732ffec3f8c7150dace1c41434bf3cdb4d8436/vowpalwabbit/parse_args.cc#L1246

    So the final stack of reductions for each vw run is actually the sub-stack of the order from 2) that contains:
    1) reductions that you explicitly provided on your command line
    2) reductions defined in the input model file (if any)
    3) reductions populated as dependencies.

    In your case you have ccb_explore_adf and ftrl provided explicitly by you; the others are populated as dependencies:
    ccb_explore_adf -> cb_sample
    ccb_explore_adf -> cb_explore_adf_greedy -> cb_adf -> csoaa_ldf

    Wenjuan Dou
    @darlwen

    thanks @ataymano, much clearer now. In VW::LEARNER::base_learner* setup_base(options_i& options, vw& all),
    when it enters the following logic,

     else
      {
        all.enabled_reductions.push_back(std::get<0>(setup_func));
        return base;
      }

    my understanding is that it won't call auto setup_func = all.reduction_stack.top(); anymore. For example, when we get "ftrl_setup" it enters the else branch, so how do the rest of the reductions (scorer, ccb_explore_adf, etc.) get enabled?

    2 replies
    Kev. Noel
    @arita37
    thanks for your presentation at NeurIPS!
    Sam Lendle
    @lendle
    howdy! I'm trying to get a pyvw.vw object to process a data file when I instantiate it with a --data argument. Based on this fairly recent S.O. answer https://stackoverflow.com/a/62876763, my understanding is that it should do just that, but I am not having any luck. I'm using vw version 8.9.0; did something change in a recent release? I have confirmed that using the same options from the command line works, so I don't think I'm doing something obviously wrong like using the wrong file name
    6 replies
    Andrew Clegg
    @andrewclegg
    what does ring_size do? will increasing this help I/O performance, or is it not something to worry about?
    Jack Gerrits
    @jackgerrits
    Ring_size refers to the initial size of the example pool. It will resize if it needs more room to store parsed examples waiting to be processed by the learner. I wouldn't worry too much about changing it.
    Andrew Clegg
    @andrewclegg
    thanks!
    buildvoc
    @buildvoc
    Was wondering if you could help with a very simple question (please note that I am a beginner in VW): could someone explain multiclass or multilabel to me and how it works in VW? For example, VW can answer questions like “What’s the predicted price of this house?” or “Should I buy this house today?”, but it's more difficult to apply it to a problem like “What type of house is this?” (multiclass) or “What are the adjectives that best describe this house?” (multilabel)
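    Not a full answer, but a small sketch of the two input formats involved, using VW's --oaa (one-against-all) multiclass reduction and the --multilabel_oaa reduction; the house-type and adjective label ids are made-up stand-ins:

    import vowpalwabbit.pyvw as pyvw

    # Multiclass ("What type of house is this?"): exactly one of k labels per example.
    # Labels are integers 1..k; here 1=bungalow, 2=townhouse, 3=cottage (made-up mapping).
    mc = pyvw.vw("--oaa 3 --quiet")
    mc.learn("2 | bedrooms:3 floors:2 has_garden")
    mc.learn("1 | bedrooms:2 floors:1 has_garden")
    print(mc.predict("| bedrooms:3 floors:2"))       # an integer class id

    # Multilabel ("What adjectives describe this house?"): zero or more of k labels,
    # written as a comma-separated list; here 1=sunny, 2=spacious, 3=quiet.
    ml = pyvw.vw("--multilabel_oaa 3 --quiet")
    ml.learn("1,3 | bedrooms:2 floors:1 big_windows")
    ml.learn("2 | bedrooms:5 floors:3")
    print(ml.predict("| bedrooms:2 big_windows"))    # a list of predicted label ids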
    George Fei
    @georgefei
    hey @jackgerrits could you help answer my question from 2 weeks ago: I’m wondering if the python wrapper or the cli tool provides a way to output the reward estimation of each arm in the cb/cb_explore mode (I found this post asking the exact same question but it was unanswered: https://stackoverflow.com/questions/60678450/how-to-get-cost-reward-current-estimation-for-all-arms-using-vowpal-wabbit-with)?
    9 replies