    Raphael Ottoni
    @raphaottoni
    reward = Arm_value * Gaussian Sample
    vw = pyvw.vw("--cb_explore_adf -q :: --epsilon 0.2")
    Raphael Ottoni
    @raphaottoni
    Screen Shot 2021-02-22 at 19.39.41.png
    Above is a graph of this setup. I really don't know why the agent chose the most expensive arm!
    If I simply divide the rewards by 100 and run the very same experiment:
    Screen Shot 2021-02-22 at 19.40.32.png
    This bugs me =(
    I forgot to mention: the reward is actually -1 * arm_value * Gaussian sample
    Max Pagels
    @maxpagels_twitter
    @raphaottoni could you provide a github gist of your data?
    Raphael Ottoni
    @raphaottoni
    There is no training data... just a "simulator", which is an object that returns a sample from one of these curves given the arm_id:
    { curve_type: "Gaussian", curve_id: "arm_1", mean: 20.0, std: 0.0},
    { curve_type: "Gaussian", curve_id: "arm_2", mean: 5.0, std: 0.0},
    { curve_type: "Gaussian", curve_id: "arm_3", mean: 4.0, std: 0.0}
    {"399": "arm_1", "499": "arm_2", "599": "arm_3"}
    Each step, VW chooses an arm, I sample from that arm's curve and multiply the result by the arm's price. Then I flip the sign so it becomes a reward instead of a cost and fit it to the model...
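    For concreteness, a rough sketch of the kind of simulator loop being described (my reconstruction, not the actual code; the pyvw calls follow the pattern of VW's contextual-bandit tutorial and may need adapting to your pyvw version, and the "context" feature and helper names are made up):

    import numpy as np
    from vowpalwabbit import pyvw

    # arm -> (price, mean of its Gaussian curve); std is 0.0 in the setup above
    arms = {"arm_1": (399, 20.0), "arm_2": (499, 5.0), "arm_3": (599, 4.0)}

    vw = pyvw.vw("--cb_explore_adf -q :: --epsilon 0.2 --quiet")
    rng = np.random.default_rng(0)

    def to_adf(label=None):
        # one shared line plus one line per arm; the cb label goes on the chosen arm's line
        lines = ["shared | context"]
        for i, arm in enumerate(arms):
            lbl = ""
            if label is not None and label[0] == i:
                lbl = "0:{}:{} ".format(label[1], label[2])  # cost:probability
            lines.append(lbl + "| " + arm)
        return "\n".join(lines)

    for step in range(5000):
        pmf = vw.predict(to_adf())                 # probability per arm, in arm order
        chosen = rng.choice(len(pmf), p=np.asarray(pmf) / sum(pmf))
        price, mean = list(arms.values())[chosen]
        cost = price * rng.normal(mean, 0.0)       # VW minimizes cost, i.e. cost = -reward
        learn_ex = vw.parse(to_adf((chosen, cost, pmf[chosen])), pyvw.vw.lContextualBandit)
        vw.learn(learn_ex)
        vw.finish_example(learn_ex)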
    Raphael Ottoni
    @raphaottoni
    The problem appears to be solved if we apply a log function to the reward.
    Max Pagels
    @maxpagels_twitter

    There is no training data... just a "simulator", which is an object that returns a sample from one of these curves given the arm_id:

    If you have a simulator, you are training on some VW data somewhere. Could you provide that dataset as a gist?

    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys, I was checking the Slates formulation out of curiosity, and it got me thinking. I could swear streaming services like Netflix used slates for recommendations. I know they personalize both the title recommendation and the thumbnail for each title. Thus, it seemed like the perfect use case for Slates (here, the title would be one slot and the image another slot): there is a single global reward (play or not), and the action sets are disjoint.
    However, when trying to visualize how this would work in VW, I noticed that it probably wouldn't. What made me think this is that Slates predicts for all slots at once, and therefore there is no way we could select the title first, then pre-filter the possible thumbnails for that title, and then make a prediction for the thumbnail slot.
    Am I missing something here? What are some use cases of Slates for personalization using VW? The only one that comes to mind is "whole page optimization".
    Thanks!
    2 replies
    Jui Pradhan
    @JuiP
    Hi everyone, I was looking at the estimators repository issue VowpalWabbit/estimators#1; we already have an implementation of the IPS estimator in Python. My question is: why is "convert current IPS estimator to Python" listed as a goal for this project? Can someone please clarify?
    2 replies
    pushpendre
    @pushpendre

    Hi, I was wondering if I could get a pointer to the implementation of --cb k --cb_type dr in the source code? Basically I am trying to understand the parameters that are learned at the end of off-policy CB training in VW. E.g. I ran

    vw --cb 3 --cb_type ips  -f cb.model -d train.txt --invert_hash readable_ips.model
    vw --cb 3 --cb_type dm  -f cb.model -d train.txt --invert_hash readable_dm.model
    vw --cb 3 --cb_type dr  -f cb.model -d train.txt --invert_hash readable_dr.model

    and the dr model, as expected, contains a number of parameters equal to ips + dm combined, but I want to know exactly what linear regression formula is being implemented in dr.

    3 replies
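    For reference, the doubly robust estimator that --cb_type dr is based on (Dudík et al., 2011) combines the direct-method regression $\hat{r}(x, a)$ with an IPS correction; for a logged example $(x, a_{\text{logged}}, r, p)$, the value assigned to choosing action $a$ is

    $$\hat{V}_{DR}(x, a) = \hat{r}(x, a) + \frac{\mathbb{1}[a = a_{\text{logged}}]}{p}\,\bigl(r - \hat{r}(x, a_{\text{logged}})\bigr),$$

    which is why the dr model stores both the dm-style regression weights and what ips needs; exactly how VW maps this onto the weights in readable_dr.model is the part the question is asking about.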
    CP500
    @CP500
    Hi everyone, just a newbie question on CATS: does it give you a PMF when you call predict?
    vw = pyvw.vw("--cats_pdf 7 --bandwidth 0.1 --min_value 0 --max_value 1")
    ex = vw.parse('ca | c1:0.5 c2:1.3', labelType=8)
    vw.predict(ex)
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys, I would like to know if anyone has found an appropriate way of calculating feature importance after training a model?
    I tried using the sklearn/eli5 permutation methods, but neither worked properly.
    Then I decided to code my own, where I first train a model and then do permutation importance on a held-out set. I'm a bit concerned as to whether the results are significant, mainly because of all the interactions created on the fly by VW. I should mention I am aware of the multicollinearity/correlation problem, and this is not my biggest concern.
    Does it even make sense to calculate feature importance in VW? (I assume so, because this was one of the topics in the VW presentation at https://slideslive.com/38942331/vowpal-wabbit)
    Thanks!
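    For what it's worth, a minimal sketch of the permutation-importance loop being described; it is model-agnostic rather than VW-specific, and `score` is a hypothetical helper (e.g. the average loss of your trained model on the held-out set) with X assumed to be a dense numpy matrix of held-out features:

    import numpy as np

    def permutation_importance(model, X, y, score, n_repeats=5, seed=0):
        rng = np.random.default_rng(seed)
        baseline = score(model, X, y)          # loss on the untouched held-out set
        importances = {}
        for j in range(X.shape[1]):
            drops = []
            for _ in range(n_repeats):
                X_perm = X.copy()
                rng.shuffle(X_perm[:, j])      # break the link between feature j and y
                drops.append(score(model, X_perm, y) - baseline)
            # positive value = permuting the feature increases the loss, i.e. the feature matters
            importances[j] = float(np.mean(drops))
        return importances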
    olgavrou
    @olgavrou
    @CP500 --cats_pdf should give you a PDF (probability density function) rather than a PMF (probability mass function), since CATS predicts in a continuous action space. The PDF is returned as (left:right:pdf_value) triples, so you can check that it integrates to 1 by summing (right - left) * pdf_value over all the returned triples.
    7 replies
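    A quick way to do that check in Python, assuming the prediction comes back as (left, right, pdf_value) triples as described above (variable names are mine):

    prediction = vw.predict(ex)  # from the snippet above: vw is the --cats_pdf model, ex the parsed example
    total_mass = sum((right - left) * pdf_value for (left, right, pdf_value) in prediction)
    assert abs(total_mass - 1.0) < 1e-6  # a proper pdf should integrate to 1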
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hello everyone!
    Does anyone know where (and if) I can find the notebook for the Estimators library? I would like to use this lib, but there's not much documentation/examples on how to do so.
    Also, does it make sense to use this lib for CCBs?
    Thanks!
    Jack Gerrits
    @jackgerrits
    @Favoreto_B_twitter that repo is very much still a work in progress - so docs/examples are not there yet unfortunately. For CCB the approach that has been taken so far is to do CFE on the first slot only - so in this context I think it does make sense to use it. But it would need some adapting and I am not positive here.
    9 replies
    Max Pagels
    @maxpagels_twitter
    Is there an estimate on when the pypi version of pyvw will have CATS support? 8.9.0 doesn't support CATS labels
    Max Pagels
    @maxpagels_twitter
    I'm fiddling around with CATS, and have a simple setup with a fixed context. Per round, I ask for an action (range 0-100) and calculate a cost that is zero at 50 and otherwise grows quadratically with the absolute distance from 50 in either direction. I've tried gridsearching a whole mess of bandwidth, epsilon and learning rate values, but the learning is just all over the place. I would have expected the system to converge to an optimal prediction of 50.0 per round pretty easily, since the context is always fixed. Instead, it either bounces around or gets stuck on some non-optimal values around 40. Any tips?
    olgavrou
    @olgavrou
    Hi @maxpagels_twitter, what is the parameter you pass to --cats? Have you experimented with that at all? For CATS I would try different combinations of the number of discrete actions used by the algorithm (passed to the --cats arg) and bandwidths (bandwidth being a property of the continuous range), e.g. a grid of num_actions [8, 16, 32, 64, 128, 256, 1024] and bandwidths [1, 2, 4, 6, 8, 10, 14, 20]. Depending on the number of discrete actions, you might need more data for CATS to converge to something sensible. CATS label support in pyvw should be available in the next release (coming soon-ish, we don't want to wait another year for the next VW release). Let me know if you get better results from CATS or not :)
    Max Pagels
    @maxpagels_twitter
    I tried gridsearching a whole mess of options, including a bunch of action counts, and can get relatively close to an optimum, but the hyperparams seem to be super important to get just right or the learning is way off. But I'll experiment further and report back
    3 replies
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hello guys!
    Can someone help me understand why propensity scores are important when training a CB?
    I've been thinking about this lately and just couldn't wrap my head around a good explanation...
    Let's take epsilon-greedy, for example. When we train a CB model with epsilon-greedy, the pmf output is always the same (just the indexes change). This makes me assume that propensity scores aren't supposed to teach the CB how to output probabilities. Moreover, I believe they are used for "importance weighting", i.e., prob(new_policy)/prob(logging_policy), but isn't that only relevant when we use IPS? I think I'm missing something quite obvious here...
    Also, when we offline train a new model using logged CB data, how is the new CB able to achieve better performance than the logging policy? I mean, it's an excellent thing, but I would like to understand how that is possible.
    Thanks!
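    For reference, the standard reason propensities matter even outside --cb_type ips is importance weighting: if the logging policy picked action $a$ for context $x$ with probability $p(a \mid x)$ and we observed cost $c$, then for any other policy $\pi$ (with full support of the logging policy)

    $$\mathbb{E}\!\left[\frac{\mathbf{1}[\pi(x) = a]}{p(a \mid x)}\, c\right] = \mathbb{E}\bigl[c_{\pi(x)}\bigr],$$

    i.e. dividing by the logged propensity re-weights the data so that it looks as if it had been collected by $\pi$. That correction is what makes it possible to train and evaluate a new policy offline on another policy's logs, and the new policy can beat the logging policy simply because the logging policy explored (and paid for) actions the new policy learns to avoid.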
    George Fei
    @georgefei
    Hi all, I have a few questions related to contextual bandit evaluation:
    1. How do I compare the performance of different policies' decisions using --eval? Do I look at the average loss in the output? If the costs in the input data are all negative and a lower cost is better, does a lower average loss mean one policy is better? What does average loss represent?
    2. How do I interpret the output of --explore_eval? More specifically, update count, violation count, and final multiplier (what variables do they correspond to in the algorithm on slide 9 of https://pdfs.semanticscholar.org/presentation/f2c3/d41ef70df24b68884a5c826f0a4b48f17095.pdf)? Do I also look at the average loss to compare different exploration algo + hyperparameter combinations?

    3. In order to use --explore_eval I have to convert my data from cb format to cb_adf format, since the cb format is not supported when using --explore_eval. For the example data with two arms below, are the two ways to represent the data equivalent? (A sketch of the mechanical conversion follows the examples.)

    2:10.02:0.5 | x0:0.47 x1:0.84 x2:0.29
    1:8.90:0.5 | x0:0.51 x1:0.65 x2:0.67

    shared | x0:0.47 x1:0.84 x2:0.29
    | a1
    0:10.02:0.5 | a2

    shared | x0:0.51 x1:0.65 x2:0.67
    0:8.90:0.5 | a1
    | a2
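    Whether the two encodings behave identically inside VW is exactly the open question here, but for what it's worth, here is a small sketch of the mechanical conversion from the cb lines to the cb_adf blocks shown above (it simply reproduces the mapping in the example, with arm 1 -> a1 and arm 2 -> a2):

    def cb_line_to_adf(line):
        # "chosen_action:cost:probability | features"  ->  shared line + one line per arm
        label, features = line.split("|", 1)
        action, cost, prob = label.strip().split(":")
        block = ["shared |" + features.rstrip()]
        for i, arm in enumerate(["a1", "a2"], start=1):
            lbl = f"0:{cost}:{prob} " if i == int(action) else ""
            block.append(f"{lbl}| {arm}")
        return "\n".join(block) + "\n"

    print(cb_line_to_adf("2:10.02:0.5 | x0:0.47 x1:0.84 x2:0.29"))
    # shared | x0:0.47 x1:0.84 x2:0.29
    # | a1
    # 0:10.02:0.5 | a2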

    Wes
    @wmelton
    Hello all - I've been evaluating Microsoft Personalizer for our company, which I have largely assumed is VW under the hood with MS-specific tech/services written on top of it.
    My question is this - within a given namespace, does the order of features or their names matter? I'm assuming yes, but the VW documentation out there doesn't make it super clear how to handle a situation where two given documents have the same keywords in them, but after tokenization the keywords are not in the same order, due to variance in the number of keywords found in each document. Appreciate guidance there.
    Finally, I referenced Personalizer only because it sparked this train of thought, largely because its documentation uses only the JSON format of input data but seems to neglect any instruction regarding variation in keyword order when your features are keywords extracted from a document. Thanks!
    18 replies
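    On the ordering part specifically: as far as I understand VW's input format, each feature name is hashed together with its namespace independently, so the order of features inside a namespace should not change the model; only the set of names (and their values) does. For example, these two lines should describe identical examples (illustrative keywords):

    1 |keywords revenue growth churn
    1 |keywords churn revenue growth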
    Max Pagels
    @maxpagels_twitter
    @georgefei did you already get an answer to your questions? I'd be very interested in them, too
    1 reply
    In particular, I imagine lots of folks do evaluation by gridsearching over cb_type ips/dm/dr and choosing the one with the best reported loss. Isn't that wrong, especially considering dm is biased? --eval throws an error if you use DM.
    George Fei
    @georgefei
    Related to my third question above about whether the same data in cb and cb_adf formats will yield the same result: I'm not sure about the VW implementation, but in the contextual bandit bake-off paper the reward estimation is formulated differently for each case:
    image.png
    49 replies
    Max Pagels
    @maxpagels_twitter
    I think there is a very clear need for a policy evaluation tutorial on vowpalwabbit.org. I'd be happy to write one, assuming someone can help answer questions as they arise, since I have a couple of outstanding ones myself. Would folks find this valuable?
    10 replies
    Max Pagels
    @maxpagels_twitter

    OPE PR: VowpalWabbit/vowpalwabbit.github.io#193

    @lalo @olgavrou et al., note that I will need expert advice on this. There is a checklist that needs to be confirmed with absolute certainty or, if untrue, commented on to give me the correct interpretation.

    Max Pagels
    @maxpagels_twitter
    @pmineiro would also be a very good additional reviewer
    olgavrou
    @olgavrou
    @maxpagels_twitter thanks for taking a stab at this, much appreciated! Will add reviewers to the PR
    Max Pagels
    @maxpagels_twitter
    @olgavrou no problem. I have a bunch more to come :)
    Travis Brady
    @travisbrady
    Anyone here have experience using CBs in VW with a "no op" arm that by definition can't generate a reward?
    For example, imagine a use case where we intend to potentially contact a user automatically. So the arms are "email user", "send text message to user", "send push notification to user" or "do nothing". The first 3 options all have "click through" as the reward, but "do nothing" of course has no such obvious reward. Is there a standard way to handle this? All pointers much appreciated.
    Max Pagels
    @maxpagels_twitter
    In my view, doing nothing is a valid action but even doing nothing can be either good or bad, and per the CB problem setup it needs some reward. Perhaps CTR isn't the best reward metric to use? Can you find another signal that applies to all actions?
    4 replies
    Wenjuan Dou
    @darlwen
    @here For the policies of contextual bandits, VW provides ips, dr, dm and mtr for now. For mtr, it uses a linear model to optimize the policy. I wonder whether we can use a tree-based or DNN-based model?
    Crystal Wang
    @cwang506
    Hi everyone! I'm currently using VWClassifier to predict binary labels (-1, 1) on a dummy dataset where y = sigmoid(X@w) for some random X and w. I am able to use pyvw to fit the training dataset perfectly when I use VWClassifier without any regularization, but I'm noticing strange behavior once I add in regularization. For example, when I add in l1 regularization of 1e-3, all of my training and testing labels get pushed to 1, and the ROC_AUC score between the predicted and actual labels is 0.5 for both training and test. When compared to the sklearn packages SGDClassifier and LogisticRegression, I get vastly different results: the labels do not get pushed to 1, and the ROC_AUC scores are all >0.5 when I compare the predicted outcome to the actual outcome. Here is the code I'm running, and any help would be greatly appreciated! Thanks :)
    image.png
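    Since the actual code is only attached as screenshots, here is a rough reconstruction of that kind of comparison (my sketch, not the original; the VWClassifier kwargs follow vowpalwabbit.sklearn_vw as I understand it, and note that the regularization strengths of the two libraries are parameterized differently, so the same numeric value is not directly comparable):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from vowpalwabbit.sklearn_vw import VWClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 10))
    w = rng.normal(size=10)
    p = 1.0 / (1.0 + np.exp(-X @ w))                  # y = sigmoid(X @ w)
    y = np.where(rng.uniform(size=5000) < p, 1, -1)   # binary labels in {-1, 1}

    vw_clf = VWClassifier(l1=1e-3)                    # the l1 value that pushes everything to 1
    vw_clf.fit(X, y)
    print("VW AUC:", roc_auc_score(y, vw_clf.predict(X)))

    # note: VW's --l1 is applied per update (truncated gradient), so its scale differs
    # from sklearn's penalty on the full objective
    logreg = LogisticRegression(penalty="l1", C=1.0 / 1e-3, solver="liblinear").fit(X, y)
    print("LogisticRegression AUC:", roc_auc_score(y, logreg.predict(X)))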
    Crystal Wang
    @cwang506
    image.png