    Дмитрий Корнильцев
    @kornilcdima:matrix.org
    Hi everyone! Does anyone have a working Python example of VW's new feature "CATS, CATS pdf for Continuous Actions"?
    I'm really new to RL and this library. Can someone help me understand how to process data with this new bandit?
    My task is the following: I have prices and I need to find the optimal price (not too high, not too low); my reward is a click rate.
    Appreciate any help )
    Jack Gerrits
    @jackgerrits
    Olga created a tutorial Jupyter notebook which has not yet been merged, but it is a great resource: VowpalWabbit/jupyter-notebooks#6
    4 replies
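    For reference, a minimal sketch of the CATS loop with the Python wrapper (not from the notebook; the context features and the click-rate function below are made up):

    from vowpalwabbit import pyvw

    # CATS model: 32 discretized actions over prices in [0, 100], epsilon-greedy exploration
    vw = pyvw.vw("--cats 32 --bandwidth 3 --min_value 0 --max_value 100 --epsilon 0.2 --quiet")

    def click_rate(price):
        # hypothetical stand-in for the observed click rate at a given price
        return 0.10 if 40 <= price <= 60 else 0.02

    for _ in range(1000):
        context = "f_hour:10 f_segment:1"                  # hypothetical context features
        price, pdf_value = vw.predict(f"ca | {context}")   # sampled continuous action + pdf value at it
        cost = -click_rate(price)                          # VW minimizes cost, so cost = -reward
        vw.learn(f"ca {price}:{cost}:{pdf_value} | {context}")

    vw.finish()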
    Jack Gerrits
    @jackgerrits
    What version of vw are you using?
    9 replies
    Chang Liu
    @changliu94
    Hi everyone! Does anyone know if there is any command/implementation in vw that can tackle a non-stationary environment? Thanks in advance!
    1 reply
    Harsh Khilawala
    @HarshKhilawala_gitlab
    I am new here and want to get started contributing to VowpalWabbit. Can anyone please help me get started?
    3 replies
    Mónika Farsang
    @MoniFarsang
    Hi, does someone know whether the RL Open Source Fest results are already out?
    Nishant Kumar
    @nishantkr18
    Yes they are now. Congratulations to everyone selected!
    1 reply
    daraya123
    @daraya123

    Hi all, I had a problem when using the SquareCB algorithm to train a contextual bandit model, specifically when saving it and then loading it again.
    I trained and saved the SquareCB model in this way (using the simulation setting from https://vowpalwabbit.org/tutorials/cb_simulation.html):

    vw = pyvw.vw("--cb_explore_adf -q UA -f squarecb.model --save_resume --quiet --squarecb")
    num_iterations = 5000
    ctr = run_simulation(vw, num_iterations, users, times_of_day, actions, get_cost)
    plot_ctr(num_iterations, ctr)
    vw.finish()

    and then loaded the model :

    vw_loaded = pyvw.vw('--cb_explore_adf -q UA -i squarecb.model')
    num_iterations = 5000
    ctr = run_simulation(vw_loaded, num_iterations, users, times_of_day, actions, get_cost, do_learn=False)
    
    plot_ctr(num_iterations, ctr)
    print(ctr[-1])

    and the loaded model seems to be doing random exploration.
    Could anyone explain how to save and load this model correctly? Thanks in advance.

    11 replies
    daraya123
    @daraya123
    image.png
    George Fei
    @georgefei
    Hi all, if I use explore_eval to evaluate the reward estimation + exploration algo combination, is it fair to compare the average loss reported by explore_eval with the realized average loss on the training data?
    10 replies
    Chang Liu
    @changliu94
    Can anyone here help me understand how the bagging algorithm does counterfactual learning from logged bandit data? From the bagging algorithm in the bake-off paper, we can see it reduces to an oracle where the probability of choosing an action a is decided by the proportion of the current policies that evaluate action a as optimal. So how will the probability in the logged data be used? I am perplexed here.
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys!
    I was reading Microsoft's Personalizer docs and wondering why changing the Reward wait time triggers retraining. I understand that there might be delay-related bias, but I don't understand how retraining the model is any better than simply using the existing model with an updated reward wait time. In my mind, offline training doesn't suffer from delay-related bias (because all the data is there, ready for use).
    Does this make sense? What am I missing?
    What happens if the user decides to add/modify attributes? Is there retraining, or is it simply a matter of changing the Rank input?
    Thanks!
    Max Pagels
    @maxpagels_twitter
    The key issue is this: say your wait time is one minute and the reward is a click. Now, let's say the click arrived 1.5 minutes after the prediction. It's thus not in your training data and is assumed to be whatever default reward you specified
    Now, if you change the wait time to 2 minutes, the training data must be recreated so as to include your click, which means the model must be retrained too; otherwise you are training on different definitions of reward (old stuff has rewards calculated with a 1-minute cutoff, newer stuff with a 2-minute cutoff). This leads to general weirdness
    If you add features, at least in VW-land, you don't need to retrain but can just continue training on new data. So in Personalizer I think it's just a matter of changing the Rank input
    Of course, if you add informative features later on, they will only be recorded for events after the change, but in an online system that doesn't really matter since it will correct itself over time
    Bernardo Favoreto
    @Favoreto_B_twitter
    Awesome, Max! That's exactly what I thought about in both scenarios. Because Personalizer saves data even after the reward wait time, we'd be able to create new data considering the updated wait time.
    Thanks!
    1 reply
    John
    @JohnLangford
    @changliu94 the probability of an action is passed to the base algorithm that bagging reduces to where it is used in the update rule.
    1 reply
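    (A conceptual sketch, not VW internals: in online bagging each bag member replays the logged example a Poisson(1)-distributed number of times, and the base CB learner it reduces to uses the logged probability in its update, e.g. via IPS/DR weighting. The policy.update API below is hypothetical.)

    import numpy as np

    def bagged_update(policies, context, action, cost, prob, rng=np.random.default_rng()):
        for policy in policies:                  # one base CB learner per bag member
            for _ in range(rng.poisson(1.0)):    # bootstrap via Poisson resampling
                # the base learner weights the logged action's cost by 1/prob (IPS)
                # or uses prob in a DR/MTR-style update
                policy.update(context, action, cost, prob)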
    Marcos Passos
    @marcospassos
    Hey guys! Does anyone know if there is a way to bypass the first-letter limitation on namespace names? We need to use arbitrary keys that may begin with the same letter
    1 reply
    Max Pagels
    @maxpagels_twitter
    OPE tutorial is up for those interested: https://vowpalwabbit.org/tutorials/off_policy_evaluation.html
    3 replies
    Chang Liu
    @changliu94
    Thank you, Max! It is very much appreciated that we have a tutorial on off-policy evaluation!
    Bernardo Favoreto
    @Favoreto_B_twitter
    How are arrays interpreted in VW? According to https://docs.microsoft.com/en-us/azure/cognitive-services/personalizer/concepts-features#categorize-features-with-namespaces, we can use an array as a feature value, as long as it's a numeric array.
    I was wondering how this gets interpreted by VW. The docs show an example of a feature called "grams" whose value is an array (e.g., [150, 300, 450]), but it is still unclear to me what happens when we use arrays as feature values.
    Thanks!
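    (I don't know the exact expansion Personalizer applies, but one plausible VW-text-format equivalent is flattening the array into indexed numeric features; the namespace and feature names below are made up:)

    |nutrition grams_0:150 grams_1:300 grams_2:450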
    kornilcdima
    @kornilcdima

    Hey everybody. I've just started to use VW, and I'm solving a dynamic pricing problem where price is a discrete action space (10 arms). Prices are cut into buckets, and every bucket holds 10% of the prices. My cost is CTR. My probability is a constant 0.1, since I have 10 arms and each of them appears in 10% of cases. My goal is to find optimal prices which lead to increased CTR.
    I know that CATs is better for my case, but I prefer not to use it as a first attempt.

    I have the following questions:
    1). What is the main difference between --cb and --cb_explore? As I understand it, --cb_explore just gives probabilities and --cb doesn't. I've noticed it was mentioned that --cb doesn't do exploration and --cb_explore does. Am I right on this point?
    2). VW requires the following format: action:cost:probability. The probability here is nothing but a pmf. Would it be right in my case to just set 0.1 for all cases?
    3). I do a kind of pre-training on logged data (an existing dataset) to learn a policy with the parameters --cb_explore 10 --cover 13. After that I use the pre-trained model with the flag -i. I get the output with probabilities and take the highest probability as the predicted value. Will I be exploring in this case?

    Please forgive my naive questions and many thanks for answers in advance )

    20 replies
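    For question 2, a sketch of what logged lines in the action:cost:probability format could look like for 10 price buckets logged with a constant 0.1 probability, plus illustrative pre-train/reuse commands (the file and feature names are made up; costs use the cost = -reward convention, e.g. -1 for a click):

    # train.dat: one line per impression, label = action:cost:probability
    3:-1.0:0.1 | hour=10 segment=retail
    7:0.0:0.1 | hour=11 segment=retail

    # pre-train an exploration policy on the log, then load it with -i for further use
    vw --cb_explore 10 --cover 13 -d train.dat -f policy.vw
    vw --cb_explore 10 --cover 13 -i policy.vw -d new.dat -p preds.txt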
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hello everyone!
    I have a question concerning the use of Slates + CB and CCB + CB.
    I've come across the following presentation from Netflix (https://www.slideshare.net/FaisalZakariaSiddiqi/netflix-talk-at-ml-platform-meetup-sep-2019) and was wondering if they used Slates.
    Apparently, they do. However, I don't understand how we can use a single slate first to pick a title for a slot and then, at the same prediction, choose a thumbnail. That's why I believe they instead use Slates for title recommendation on multiple slots and CB for thumbnail selection afterward. Would that make sense?
    I believe that if the actions for other slots depend on the first slot's action (e.g., the option of thumbnails for a title depends on the title), Slates cannot be used.
    For CCB + CB, an example could be using CCB to order topics in a list and then CB to pick the written text for each topic.
    Is using Slates or CCB + CB reasonable? Is it very use-case-specific? I'm afraid I'm missing something here.
    Thanks!
    Max Pagels
    @maxpagels_twitter

    So as far as I know, Netflix actually predefines the possible combos of e.g. title and image, and those form a single arm. Of course the number of combos is massive, so I don't think they use them all.

    There has to be some prefiltering going on, since I suspect showing (title: crime dramas, show: top gear, picture: jurassic park) would lead to issues :). So I think they aren't using slates as slates are defined in VW, merely a large action space where one action is one predefined combo of title, genre, picture and so on.

    I may also be wrong here

    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys, regarding CCB and Slates... what's the use of slot attributes? What sort of attributes can a slot have? Would love to hear some examples!
    Thanks
    2 replies
    kornilcdima
    @kornilcdima
    This message was deleted
    kornilcdima
    @kornilcdima

    Hey guys,
    Does anyone have an example of daemon-style code for CATs? Right now I'm using the Python wrapper, which I took from Olga's notebook example, and it works fine. However, I only have a vague idea of how to launch it in daemon mode.
    Is it something like this?
    Pre-training the model on historical data:

    vw --cats 6 --bandwidth 0.5 --min_value 0 --max_value 3 --epsilon 0.3 -d train.dat -f model.vw

    Starting the trained model as a daemon:

    vw --cats 6 --bandwidth 0.5 --min_value 0 --max_value 3 --epsilon 0.3 --save_resume --daemon --quiet --num_children 1 --port 8080 -i model.vw -f model.vw

    Updating the model on new data:

    vw --cats 6 --bandwidth 0.5 --min_value 0 --max_value 3 --epsilon 0.3 --save_resume -i model.vw -d train.dat -f model.vw
    olgavrou
    @olgavrou
    hi @kornilcdima here is some documentation on how to use vw in daemon mode, and it should work fine if you start vw with the appropriate cats arguments: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format#on-demand-model-saving
    you can also gather vw's predictions by passing in the cli argument: -p <predictions_file>
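    (For completeness, a sketch of talking to the daemon once it is up: VW's daemon mode reads examples over TCP and writes predictions back on the same connection, so something like the line below should work. The features are made up, and the exact output format depends on the reduction.)

    echo "ca | f_2:1.2 f_3:0.4" | nc localhost 8080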
    kornilcdima
    @kornilcdima

    hi @kornilcdima here is some documentation on how to use vw in daemon mode, and it should work fine if you start vw with the appropriate cats arguments: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format#on-demand-model-saving

    @olgavrou, thanks for the answer. I already launched VW for a discrete action space, so, as I understand it, I should use the same syntax. Then the main question is: in what situation should I use --cats_pdf instead of --cats?

    olgavrou
    @olgavrou
    @kornilcdima cats will call cats_pdf under the hood and then sample from the pdf for you. So you would use cats when you want vw to do the pdf sampling for you, and cats_pdf when you want the entire pdf and/or want to do the sampling yourself (see here: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/CATS,-CATS-pdf-for-Continuous-Actions)
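    A rough sketch of that difference from the Python side (the model arguments and features are illustrative; the --cats_pdf output of (left, right, pdf_value) segments matches what is shown later in this thread):

    from vowpalwabbit import pyvw
    import random

    vw_cats = pyvw.vw("--cats 32 --bandwidth 2 --min_value 0 --max_value 80 --epsilon 0.2 --quiet")
    vw_cats_pdf = pyvw.vw("--cats_pdf 32 --bandwidth 2 --min_value 0 --max_value 80 --epsilon 0.2 --quiet")

    # --cats: VW samples from the pdf internally and returns (action, pdf_value)
    action, pdf_value = vw_cats.predict("ca | f_2:1.0 f_3:0.5")

    # --cats_pdf: VW returns the whole pdf as (left, right, pdf_value) segments;
    # sampling an action from it is up to you
    segments = vw_cats_pdf.predict("ca | f_2:1.0 f_3:0.5")
    (left, right, density), = random.choices(segments, weights=[(r - l) * p for l, r, p in segments])
    action = random.uniform(left, right)
    pdf_value = density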
    Ryan Angi
    @rangi513

    @maxpagels_twitter Thanks a ton for your OPE tutorial and your really insightful questions above - I've found it extremely useful.
    Currently I have a logged dataset generated from an online bandit policy --cb_explore_adf using --epsilon 0.05 --cb_type dr. I want to determine whether I should be using dr or mtr (IWR) for my cb_type for my online bandit (assuming I restart my policy in the future). I can run --cb_adf over the logged dataset: vw --cb_adf -d train.dat -q AF --cb_type mtr; however, based on the OPE tutorial and the comments/questions above, I understand that I shouldn't compare the PV loss across different OPE estimators. Is there a method I should use to determine the best cb_type option for my online policy? (mtr shows a much lower loss than dr, but I understand this isn't really comparable.)

    Please let me know if I'm thinking about this completely wrong and whether I should continue to use doubly robust and spend my time fiddling with hyperparameters instead of focusing too much on the OPE estimator.

    16 replies
    George Fei
    @georgefei
    Hi everyone, I came across this thesis https://core.ac.uk/download/pdf/154670973.pdf while searching online for best practices for setting hyperparameters like learning rate, learning rate decay, etc. for vw. The thesis was written in 2015 and concludes that "the performance of vw can seriously deteriorate over time in an online setting with nonstationary data", because the learning rate strictly decreases if we make it decay, and if we set the learning rate to be fixed, we risk underfitting/overfitting. Are the concerns raised in that paper still relevant, and are there methods implemented now to address them?
    4 replies
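    (For reference, the knobs involved, as I understand them: -l sets the base learning rate, --power_t the within-pass decay exponent (0 keeps the rate constant), and --decay_learning_rate the per-pass multiplier. Illustrative commands:)

    vw -d train.dat -l 0.5 --power_t 0      # constant learning rate
    vw -d train.dat -l 0.5 --power_t 0.5    # default-style decaying learning rate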
    Max Pagels
    @maxpagels_twitter
    Anyone else have issues with PLT? Can't see any difference at all if I change kary_tree, lr
    2 replies
    CLI, 8.10.1 (git commit: 3887696)
    kornilcdima
    @kornilcdima
    Hey, I have 2 questions about CATs.
    1. What value should I put for the pdf when I pre-train a model on historical data? This differs from a discrete action space, where I could set a pmf based on the prior distribution of arms; in the case of CATs the action space is continuous. I have 2 options in mind: (1) use a constant, (2) use the pdf value from vw.predict.
    2. I'd like to imitate Thompson Sampling (TS). Does it make sense to use --bootstrap in order to imitate TS? According to this article it does: https://arxiv.org/abs/1706.04687
    olgavrou
    @olgavrou
    @kornilcdima yes, cats will expect the value that the pdf had at the predicted action, which is the prob value that you see from vw.predict (the predicted action and the pdf_value at that action), so using that is the right way to go here if you are training on historical data
    George Fei
    @georgefei
    Hey team, I have a very basic question regarding --save_resume. I noticed that after setting a non-default learning rate, power_t, lambda, and other hyperparameters in the first pass with --save_resume, when I load the model for the second pass the parameters displayed in the output go back to the default ones. This makes me wonder whether the original hyperparameters are still being used in the subsequent passes.
    9 replies
    kornilcdima
    @kornilcdima

    Hi everyone, could someone suggest what is best to do in my case?
    I'm using CATs for predicting CTR (the balance is 4-10%), where the action space is deciding on the optimal price for buying a click. The price space is non-stationary and changes over time. The reward function is: -1 for a win, 0 for a loss. I also tried using the probability of a win (from LogReg) instead of -1.
    I failed to get any good pre-trained policy on my historical data. I tried different exploration policies and different parameters, but all the time I get a very low cumulative reward rate (see the picture).
    The distribution of predicted prices that I get from vw.predict is uniform, and with a small bandwidth it does not cover the whole range of prices.

    Is it a good idea to do pre-training then, since I only get a uniform distribution of prices?

    image.png
    image.png
    George Fei
    @georgefei
    Hi everyone, I noticed the average loss when using cb_explore is lower than that when using cb, on the same logged data with all the hyperparameters fixed. Is it because cb_explore uses the pmf to pick an action stochastically and cb always picks the action that is estimated to perform the best? If this is the case, when tuning the hyperparameters in backtesting using logged data, should I run in cb_explore mode since it's closer to the production setting?
    2 replies
    kornilcdima
    @kornilcdima

    Hey, I have a question about the --cats_pdf output.
    When I print the pdf value, it gives 2 tuples with the chosen action range and the exploit probability. The values inside are always constant. Is this normal behavior? I expected to see different values, i.e. different ranges and pdf_values. My version of python VW is '8.10.0'.

    If I use --cats_pdf, then pdf output is:

    1 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    2 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    3 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    4 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    5 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    6 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    7 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    8 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    9 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)

    When I use just --cats, I put values for learning in the following manner:

    prediction, pdf_value = vw.predict(f'ca | {f_2} {f_3} {f_4} {f_5} {f_6}')
    vw.learn(f'ca {price_pred}:{cost}:{pdf_value} |{f_2} {f_3} {f_4} {f_5} {f_6}')

    Model's parameters:

    min_value = 0
    max_value = 80
    bandwidth = 16
    
    vw = pyvw.vw(f"--cats_pdf {num_actions} --bandwidth {bandwidth} \
                 --min_value {min_value} --max_value {max_value} \
                 --dsjson --chain_hash --{exploration}")
    2 replies
    Max Pagels
    @maxpagels_twitter

    I have a silly IPS question. Let's say I have a uniform random logging policy that chooses between two actions and always receives a reward of 1 regardless of context. Evidence would suggest that no matter what policy I deploy afterwards, I would continue to get a reward of 1 per round.

    Now let's say I have a candidate policy P that happens to choose the same actions at each timestep, though not with any randomness/exploration.

    Based on this data, per round, the IPS estimate is 1/0.5 = 2, and since both policies agree each round, the average IPS over the history is also 2, when you would expect it to be 1, given that, regardless of context or exploit/explore, that's the reward the logging policy saw each round. The candidate policy, if deployed, won't get a reward of 2 per round, but rather 1.

    What assumption am I violating in this example? Is there some stationarity requirement? I thought the IPS estimator is a martingale.

    14 replies
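    (One way to sanity-check the expectation in this setup is a quick simulation: the per-round IPS contribution is r * 1[a_logged == a_P] / 0.5, which averages to 1 over the logging randomization even though it is 2 on the rounds where the policies happen to agree.)

    import random

    def average_ips(num_rounds, seed=1):
        random.seed(seed)
        total = 0.0
        for _ in range(num_rounds):
            a_logged = random.choice([0, 1])    # uniform random logging policy, p = 0.5
            r = 1.0                             # reward is always 1
            total += (r / 0.5) if a_logged == 0 else 0.0   # candidate policy always picks action 0
        return total / num_rounds

    print(average_ips(100_000))   # ≈ 1.0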
    Max Pagels
    @maxpagels_twitter
    I'd add that with snips instead of ips, I get the expected 1.0.
    MochizukiShinichi
    @MochizukiShinichi
    Hello everyone, VW newbie here :) I'm trying to use vw's contextual bandit implementation to solve an optimization problem where the arm would be a product and the feedback would be click/dismiss. Unlike optimizing for CTR, where the cost is 0/1, I'm thinking of assigning a value representing the estimated monetary impact to each (arm, feedback) combo. For instance, a click on product X would yield a reward of $5. Are there any caveats I should be aware of when using non-binary rewards? Thanks in advance!
    3 replies
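    (A sketch of how a monetary reward could be encoded, assuming the standard cb label format action:cost:probability and the cost = -reward convention; the feature names and probabilities here are made up:)

    2:-5.0:0.25 | user_segment=a device=mobile
    4:0.0:0.25 | user_segment=b device=desktop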
    kornilcdima
    @kornilcdima

    @olgavrou is it possible to take a saved CATs model and change the flag --cats to --cats_pdf in order to get not only a continuous prediction but also the ranges themselves?

    From what I see, if I use the saved model and switch the flag, the model's output is still a prediction and pdf_value. But it would be good to get the range buckets as well.

    2 replies