    George Fei
    @georgefei
    2. How do I interpret the output of --explore_eval? More specifically, update count, violation count, and final multiplier: which variables do they correspond to in the algorithm on slide 9 of https://pdfs.semanticscholar.org/presentation/f2c3/d41ef70df24b68884a5c826f0a4b48f17095.pdf? Do I also look at the average loss to compare different exploration algorithm + hyperparameter combinations?

    3. In order to use --explore_eval I have to convert my data from cb format to cb_adf format, since the cb format is not supported with --explore_eval. For the example data with two arms below, are the two representations equivalent?

    2:10.02:0.5 | x0:0.47 x1:0.84 x2:0.29
    1:8.90:0.5 | x0:0.51 x1:0.65 x2:0.67

    shared | x0:0.47 x1:0.84 x2:0.29
    | a1
    0:10.02:0.5 | a2

    shared | x0:0.51 x1:0.65 x2:0.67
    0:8.90:0.5 | a1
    | a2
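
    The command I'd then run on the adf-format data would be something like this (a sketch; flags per my understanding of the docs):

    vw -d data_adf.txt --cb_explore_adf --explore_eval --epsilon 0.1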

    Wes
    @wmelton
    Hello all - I've been evaluating Microsoft Personalizer for our company, which I have largely assumed is VW under the hood with MS-specific tech/services written on top of it.
    My question is this: within a given namespace, does the order of features or their names matter? I'm assuming yes, but the VW documentation out there doesn't make it clear how to handle a situation where two documents contain the same keywords, but after tokenization the keywords are not in the same order, due to variance in the number of keywords found in each document. I'd appreciate guidance there.
    Finally, I referenced Personalizer only because it sparked this train of thought: its documentation covers only the JSON input format, but seems to neglect any instruction regarding variation in keyword order when your features are keywords extracted from a document. Thanks!
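    To illustrate what I mean (made-up tokens): would these two examples be treated identically?

    |doc cat dog fish
    |doc fish cat dog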
    Max Pagels
    @maxpagels_twitter
    @georgefei did you already get an answer to your questions? I'd be very interested in them, too
    In particular, I imagine lots of folks do evaluation by grid-searching over cb_type ips/dm/dr and choosing the one with the best reported loss. Isn't that wrong, especially considering DM is biased? --eval throws an error if you use DM.
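    I.e., running something like this (sketch; the data file name is made up) and picking the lowest average loss:

    vw -d logged.dat --cb_adf --cb_type ips
    vw -d logged.dat --cb_adf --cb_type dr
    vw -d logged.dat --cb_adf --cb_type dm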
    George Fei
    @georgefei
    Related to my third question above, about whether the same data in cb and cb_adf formats will yield the same result: I'm not sure about the VW implementation, but in the contextual bandit bake-off paper the reward estimation is formulated differently for each case:
    [image: reward-estimation formulas from the bake-off paper]
    Max Pagels
    @maxpagels_twitter
    I think there is a very clear need for a policy evaluation tutorial on vowpalwabbit.org. I'd be happy to write one, assuming someone can help answer questions as they arise, since I have a couple of outstanding ones myself. Would folks find this valuable?
    Max Pagels
    @maxpagels_twitter

    OPE PR: VowpalWabbit/vowpalwabbit.github.io#193

    @lalo @olgavrou et al., note that I will need expert advice on this. There is a checklist that needs to be confirmed with absolute certainty or, where untrue, commented on to give me the correct interpretation.

    Max Pagels
    @maxpagels_twitter
    @pmineiro would also be a very good additional reviewer
    olgavrou
    @olgavrou
    @maxpagels_twitter thanks for taking a stab at this, much appreciated! Will add reviewers to the PR
    Max Pagels
    @maxpagels_twitter
    @olgavrou no problem. I have a bunch more to come :)
    Travis Brady
    @travisbrady
    Does anyone here have experience using CBs in VW with a "no-op" arm that by definition can't generate a reward?
    For example, imagine a use case where we intend to potentially contact a user automatically, so the arms are "email user", "send text message to user", "send push notification to user", or "do nothing". The first three options all have click-through as the reward, but "do nothing" of course has no such obvious reward. Is there a standard way to handle this? All pointers much appreciated.
    Max Pagels
    @maxpagels_twitter
    In my view, doing nothing is a valid action but even doing nothing can be either good or bad, and per the CB problem setup it needs some reward. Perhaps CTR isn't the best reward metric to use? Can you find another signal that applies to all actions?
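    For instance (a cb_adf-style sketch with made-up features and costs), once you have a signal that applies to every arm, "do nothing" gets logged like any other action:

    shared | user_age:34 channel=email
    0:-1:0.25 | action=email_user
    | action=send_text
    | action=send_push
    | action=do_nothing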
    Wenjuan Dou
    @darlwen
    @here For contextual bandit policies, VW provides ips, dr, dm, and mtr for now. For mtr, it uses a linear model to optimize the policy. I wonder whether we can use a tree-based or DNN-based model?
    Crystal Wang
    @cwang506
    Hi everyone! I'm currently using VWClassifier to predict binary labels (-1, 1) on a dummy dataset where y = sigmoid(X@w) for some random X and w. I am able to use pyvw to fit the training dataset perfectly when I use VWClassifier without any regularization, but I'm noticing strange behavior once I add regularization. For example, when I add l1 regularization of 1e-3, all of my training and testing labels get pushed to 1, and the ROC_AUC score between the predicted and actual labels is 0.5 for both training and test. Compared to sklearn's SGDClassifier and LogisticRegression I get vastly different results: the labels do not get pushed to 1, and the ROC_AUC scores are all >0.5 when I compare the predicted outcome to the actual outcome. Here is the code I'm running; any help would be greatly appreciated! Thanks :)
    [screenshot: the training code described above]
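    (Roughly, the setup is the sketch below; this is a reconstruction from the description above, not the exact code in the screenshot, and the data generation and parameters are made up:)

    import numpy as np
    from sklearn.metrics import roc_auc_score
    from vowpalwabbit.sklearn_vw import VWClassifier

    # Dummy data as described: y = sigmoid(X @ w), thresholded to {-1, 1}
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    w = rng.normal(size=10)
    y = np.where(1.0 / (1.0 + np.exp(-(X @ w))) > 0.5, 1, -1)

    # Without regularization, the training set is fit essentially perfectly
    clf = VWClassifier()
    clf.fit(X, y)
    print("no reg AUC:", roc_auc_score(y, clf.decision_function(X)))

    # With l1 = 1e-3, all predictions get pushed to 1 and AUC drops to 0.5
    clf_l1 = VWClassifier(l1=1e-3)
    clf_l1.fit(X, y)
    print("l1 AUC:", roc_auc_score(y, clf_l1.decision_function(X)))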
    Crystal Wang
    @cwang506
    [screenshot]
    Дмитрий Корнильцев
    @kornilcdima:matrix.org
    Hi everyone! Does someone have a working Python example of VW's new CATS feature ("CATS, CATS pdf for Continuous Actions")?
    I'm really new to RL and this library. Can someone help me understand how to process data with this new bandit?
    My task is the following: I have prices and I need to find the optimal price (not too high, not too low); my reward is click rate.
    Appreciate any help )
    Jack Gerrits
    @jackgerrits
    Olga created a tutorial Jupyter notebook which has not yet been merged, but it is a great resource: VowpalWabbit/jupyter-notebooks#6
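    Until that's merged, here's roughly the shape of a CATS predict/learn loop (a sketch: the price range, discretization, bandwidth, features, and the get_click feedback function are all placeholders):

    from vowpalwabbit import pyvw

    # 32 discretized actions over the continuous range [0, 100] with smoothing
    # bandwidth 3; all numbers here are placeholders, not recommendations
    vw = pyvw.vw("--cats 32 --bandwidth 3 --min_value 0 --max_value 100 --quiet")

    def get_click(price, context):
        # hypothetical feedback standing in for your observed click signal
        return price < 50

    for context in ["segment=a", "segment=b"]:
        # predict returns the sampled action (a price) and the pdf value at it
        price, pdf_value = vw.predict(f"| {context}")

        cost = -1.0 if get_click(price, context) else 0.0  # lower cost is better

        # continuous-action label format: ca action:cost:pdf_value
        vw.learn(f"ca {price}:{cost}:{pdf_value} | {context}")

    vw.finish()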
    Jack Gerrits
    @jackgerrits
    What version of vw are you using?
    Chang Liu
    @changliu94
    Hi everyone! Does anyone know if there is any command/implementation in VW that can tackle non-stationary environments? Thanks in advance!
    Harsh Khilawala
    @HarshKhilawala_gitlab
    I am new here and want to get started contributing to VowpalWabbit. Can anyone please help me get started?
    Mónika Farsang
    @MoniFarsang
    Hi, does someone know whether the RL Open Source Fest results are out yet?
    Nishant Kumar
    @nishantkr18
    Yes, they are out now. Congratulations to everyone selected!
    daraya123
    @daraya123

    Hi all, I had a problem when using the SquareCB algorithm to train a contextual bandit model, specifically when saving it and loading it again.
    I trained and saved the SquareCB model this way (using the simulation setting from https://vowpalwabbit.org/tutorials/cb_simulation.html):

    vw = pyvw.vw("--cb_explore_adf -q UA -f squarecb.model --save_resume --quiet --squarecb")
    num_iterations = 5000
    ctr = run_simulation(vw, num_iterations, users, times_of_day, actions, get_cost)
    plot_ctr(num_iterations, ctr)
    vw.finish()

    and then loaded the model:

    vw_loaded=pyvw.vw('--cb_explore_adf -q UA -i squarecb.model')
    num_iterations = 5000
    ctr = run_simulation(vw_loaded, num_iterations, users, times_of_day, actions, get_cost, do_learn=False)
    
    plot_ctr(num_iterations, ctr)
    print(ctr[-1])

    and the loaded model seems to be doing random exploration.
    Could anyone explain how to save and load this model correctly? Thanks in advance.

    daraya123
    @daraya123
    [image: CTR plot for the loaded model]
    George Fei
    @georgefei
    Hi all, if I use explore_eval to evaluate a reward-estimation + exploration-algorithm combination, is it fair for me to compare explore_eval's output average loss with the realized average loss of the training data?
    Chang Liu
    @changliu94
    Can anyone here help me understand how the bagging algorithm does counterfactual learning from logged bandit data? From the bagging algorithm in the bake-off paper, we can see it reduces to an oracle where the probability of choosing an action a is determined by the proportion of the current policies that evaluate action a as optimal. So how is the probability in the logged data used? I am perplexed here.
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys!
    I was reading Microsoft's Personalizer docs and wondering why changing the Reward wait time triggers retraining. I understand that there might be delay-related bias, but I don't understand how retraining the model is any better than simply using the existing model with an updated reward wait time. In my mind, offline training doesn't suffer from delay-related bias (because all the data is there, ready for use).
    Does this make sense? What am I missing?
    Also, what happens if the user decides to add/modify attributes? Is there retraining, or is it simply a matter of changing the Rank input?
    Thanks!
    Max Pagels
    @maxpagels_twitter
    The key issue is this: say your wait time is one minute and the reward is a click. Now, let's say the click arrived 1.5 minutes after the prediction. It's thus not in your training data, and the reward is assumed to be whatever default you specified.
    Now, if you change the wait time to 2 minutes, the training data must be recreated so as to include your click, which means the model too must be retrained; otherwise you are training on different definitions of reward (old examples have rewards calculated with a 1-minute cutoff, newer ones with a 2-minute cutoff). This leads to general weirdness.
    If you add features, at least in VW-land, you don't need to retrain; you just continue training on new data. So in Personalizer I think it's just a matter of changing the Rank input.
    Of course, if you add informative features later on, they will only be recorded for events after the change, but in an online system that doesn't really matter, since it will correct itself over time.
    Bernardo Favoreto
    @Favoreto_B_twitter
    Awesome, Max! That's exactly what I thought in both scenarios. Because Personalizer saves data even after the reward wait time, we'd be able to recreate the data using the updated wait time.
    Thanks!
    John
    @JohnLangford
    @changliu94 the probability of an action is passed to the base algorithm that bagging reduces to, where it is used in the update rule.
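    In command-line terms that's something like this (sketch; the file name is made up). Each logged line's action:cost:probability triple feeds the off-policy update inside every bag member:

    vw -d logged.dat --cb_explore_adf --bag 5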
    Marcos Passos
    @marcospassos
    Hey guys! Does anyone know if there is a way to bypass the first-letter limitation on namespace names? We need to use arbitrary keys that may begin with the same letter.
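    To illustrate (hypothetical namespaces): with |user and |url in the same example, an interaction like -q uu can't tell them apart, because interactions are keyed by the first letter only:

    |user age:30 |url length:12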
    Max Pagels
    @maxpagels_twitter
    OPE tutorial is up for those interested: https://vowpalwabbit.org/tutorials/off_policy_evaluation.html
    Chang Liu
    @changliu94
    Thank you, Max! It is very much appreciated that we have a tutorial on off-policy evaluation!
    Bernardo Favoreto
    @Favoreto_B_twitter
    How are arrays interpreted in VW? According to https://docs.microsoft.com/en-us/azure/cognitive-services/personalizer/concepts-features#categorize-features-with-namespaces, we can use an array as a feature value, as long as it's a numeric array.
    I was wondering how this gets interpreted by VW. The docs show an example of a feature called "grams" whose value is an array (e.g., [150, 300, 450]), but it's still unclear to me what happens when feature values are arrays.
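    For concreteness, the docs' example is the JSON below; my (unverified) reading is that each element becomes its own numeric feature, keyed by its position in the array:

    {"grams": [150, 300, 450]}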
    Thanks!
    kornilcdima
    @kornilcdima

    Hey everybody. I've just started to use VW, and I'm solving a dynamic pricing problem where price is a discrete action space (10 arms). Prices are cut into buckets; every bucket holds 10% of prices. My cost is CTR. My probability is a constant 0.1, since I have 10 arms and each of them appears in 10% of cases. My goal is to find optimal prices which increase CTR.
    I know that CATS is better for my case, but I prefer not to use it as a first attempt.

    I have the following questions:
    1) What is the main difference between --cb and --cb_explore? As I understand it, --cb_explore gives probabilities and --cb doesn't. I've noticed it mentioned that --cb doesn't do exploration and --cb_explore does. Am I right on this point?
    2) VW requires the format action:cost:probability, and the probability here is nothing but a pmf value. Would it be right in my case to just set 0.1 for all examples? (See the sketch after this list.)
    3) I do a kind of pretraining on logged data (an existing dataset) to learn a policy with the parameters --cb_explore 10 --cover 13. After that I use the pre-trained model with the flag -i. I get the output probabilities and take the highest as the predicted action. Will I be exploring in this case?
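
    Regarding 2), a logged line would look something like this (a sketch with made-up feature names; cost -1 on a click and 0 otherwise, so lower cost is better, and with a uniform-random logging policy over 10 arms the probability is indeed 0.1):

    4:-1:0.1 | price_bucket=4 segment=b
    7:0:0.1 | price_bucket=7 segment=a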

    Please forgive my naive questions and many thanks for answers in advance )

    Bernardo Favoreto
    @Favoreto_B_twitter
    Hello everyone!
    I have a question concerning the use of Slates + CB and CCB + CB.
    I've come across the following presentation from Netflix (https://www.slideshare.net/FaisalZakariaSiddiqi/netflix-talk-at-ml-platform-meetup-sep-2019) and was wondering if they used Slates.
    Apparently, they do. However, I don't understand how we can use a single slate to first pick a title for a slot and then, in the same prediction, choose a thumbnail. That's why I believe they instead use Slates for title recommendation across multiple slots and a CB for thumbnail selection afterward. Would that make sense?
    I believe that if the actions for other slots depend on the first slot's action (e.g., the option of thumbnails for a title depends on the title), Slates cannot be used.
    For CCB + CB, an example could be using CCB to order topics in a list and then CB to pick the written text for each topic.
    Is using Slates or CCB + CB reasonable? Is it very use-case-specific? I'm afraid I'm missing something here.
    Thanks!
    Max Pagels
    @maxpagels_twitter

    As far as I know, Netflix actually predefines the possible combos of, e.g., title and image, and each combo forms a single arm. Of course the number of combos is massive, so I don't think they use all of them.

    There has to be some prefiltering going on, since I suspect showing (title: crime dramas, show: Top Gear, picture: Jurassic Park) would lead to issues :). So I think they aren't using slates as slates are defined in VW, merely a large action space where one action is one predefined combo of title, genre, picture, and so on.

    I may also be wrong here

    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys, regarding CCB and Slates: what's the use of slot attributes? What sort of attributes can a slot have? I'd love to hear some examples!
    Thanks
    kornilcdima
    @kornilcdima

    Hey guys,
    Does anyone have an example of daemon-style code for CATS? Right now I'm using the Python wrapper, which I took from Olga's notebook example, and it works fine. However, I have only a vague idea of how to launch it in daemon style.
    Is it something like this?
    Pre-training the model on historical data:

    vw --cats 6 --bandwidth 0.5 --min_value 0 --max_value 3 --epsilon 0.3 -d train.dat -f model.vw

    Starting the daemon with the trained model:

    vw --cats 6 --bandwidth 0.5 --min_value 0 --max_value 3 --epsilon 0.3 --save_resume --daemon --quiet --num_children 1 --port 8080 -i model.vw -f model.vw

    Updating the model with new data:

    vw --cats 6 --bandwidth 0.5 --min_value 0 --max_value 3 --epsilon 0.3 --save_resume -i model.vw -d train.dat -f model.vw
    olgavrou
    @olgavrou
    Hi @kornilcdima, here is some documentation on how to use VW in daemon mode; it should work fine as long as you start VW with the appropriate CATS arguments: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format#on-demand-model-saving
    You can also gather VW's predictions by passing the CLI argument -p <predictions_file>
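    For a quick smoke test of a running daemon, you can pipe an example over TCP and read back the prediction (port 8080 as in your command above; the feature name is made up):

    echo "| price_bucket=3" | nc localhost 8080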