    Max Pagels
    @maxpagels_twitter
    Anyone else have issues with PLT? I can't see any difference at all if I change kary_tree or lr.
    2 replies
    CLI, 8.10.1 (git commit: 3887696)
    kornilcdima
    @kornilcdima
    Hey, I have 2 questions about CATs.
    1. What value should I put for the pdf when I pre-train a model on historical data? This differs from the discrete action space case, where I could set a pmf based on the prior distribution of arms; with CATs the action space is continuous. I have 2 options in mind: 1) use a constant, 2) use the pdf value from vw.predict.
    2. I'd like to imitate Thompson Sampling (TS). Does it make sense to use --bootstrap to imitate TS? According to this article it does: https://arxiv.org/abs/1706.04687
    olgavrou
    @olgavrou
    @kornilcdima yes, CATs expects the value that the pdf had at the predicted action, which is the probability value you see from vw.predict (the predicted action and the pdf value at that action), so using that is the right way to go if you are training on historical data
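
    A minimal sketch (hypothetical features, cost, and flag values) of the predict-then-learn loop described above, reusing the pdf value returned by vw.predict as the third field of the label, with the pyvw API and the "ca action:cost:pdf_value | features" format that appears elsewhere in this chat:

        from vowpalwabbit import pyvw

        # Continuous-action setup; the numbers here are placeholders, not a recommendation.
        vw = pyvw.vw("--cats 32 --bandwidth 16 --min_value 0 --max_value 80 --epsilon 0.2 --quiet")

        features = "price_sensitivity:0.7 hour:14"

        # Predict first: for CATs the prediction is (chosen_action, pdf_value_at_that_action).
        action, pdf_value = vw.predict(f"ca | {features}")

        cost = 1.0  # hypothetical observed cost for the chosen continuous action
        vw.learn(f"ca {action}:{cost}:{pdf_value} | {features}")
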
    George Fei
    @georgefei
    Hey team, I have a very basic question regarding --save_resume. I noticed that after setting a non-default learning rate, power_t, lambda, and other hyperparameters in the first pass with --save_resume, the parameters displayed in the output go back to the defaults when I load the model for the second pass. This makes me wonder whether the original hyperparameters are still being used in the subsequent passes.
    9 replies
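
    A minimal sketch (hypothetical hyperparameter values) of the two-pass workflow being asked about, using the pyvw API seen elsewhere in this chat: the first run trains with non-default settings and writes a resumable model with -f, the second loads it with -i; the question is whether the -l and --power_t from the first run still apply after loading.

        from vowpalwabbit import pyvw

        # Pass 1: non-default hyperparameters, resumable model written when the instance finishes.
        vw1 = pyvw.vw("-l 0.3 --power_t 0.2 --save_resume -f pass1.model --quiet")
        vw1.learn("1 | a:1 b:0.5")
        vw1.finish()

        # Pass 2: load the saved state and keep training.
        vw2 = pyvw.vw("-i pass1.model --save_resume --quiet")
        vw2.learn("0 | a:0.5 b:1")
        vw2.finish()
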
    kornilcdima
    @kornilcdima

    Hi everyone, could someone suggest what would be better to do in my case?
    I'm using CATs for predicting CTR (the class balance is 4-10%), where the action space is the price to pay for buying a click. The price space is non-stationary and changes over time. The reward function is -1 for a win and 0 for a loss. I also tried using the probability of a win (from logistic regression) instead of -1.
    I failed to get any good pre-trained policy on my historical data. I tried different exploration policies and different parameters; every time I get a very low cumulative reward rate (see the attached screenshots).
    The distribution of predicted prices that I get from vw.predict is uniform, and with a small bandwidth it does not cover the whole range of prices.

    Is it even a good idea to do pre-training, given that I only get a uniform distribution of prices?

    [two screenshots attached]
    George Fei
    @georgefei
    Hi everyone, I noticed the average loss when using cb_explore is lower than that when using cb, on the same logged data with all the hyperparameters fixed. Is it because cb_explore uses the pmf to pick an action stochastically and cb always picks the action that is estimated to perform the best? If this is the case, when tuning the hyperparameters in backtesting using logged data, should I run in cb_explore mode since it's closer to the production setting?
    2 replies
    kornilcdima
    @kornilcdima

    Hey, I have a question about the --cats_pdf output.
    When I print the pdf value, I get two tuples, each with a chosen action range and a pdf value. The values inside are always constant. Is this normal behavior? I expected to see different values, i.e. different ranges and pdf values. My Python VW version is 8.10.0.

    If I use --cats_pdf, then pdf output is:

    1 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    2 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    3 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    4 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    5 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    6 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    7 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    8 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    9 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)

    When I use just --cats, I pass the values for learning in the following manner:

    prediction, pdf_value = vw.predict(f'ca | {f_2} {f_3} {f_4} {f_5} {f_6}')
    vw.learn(f'ca {price_pred}:{cost}:{pdf_value} |{f_2} {f_3} {f_4} {f_5} {f_6}')

    Model's parameters:

    min_value = 0
    max_value = 80
    bandwidth = 16
    
    vw = pyvw.vw(f"--cats_pdf {num_actions} --bandwidth {bandwidth} \
                 --min_value {min_value} --max_value {max_value} \
                 --dsjson --chain_hash --{exploration}")
    2 replies
    Max Pagels
    @maxpagels_twitter

    I have a silly IPS question. Let's say I have a uniform random logging policy that chooses between two actions and always receives a reward of 1 regardless of context. Evidence would suggest that no matter what policy I deploy afterwards, I would continue to get a reward of 1 per round.

    Now let's say I have a candidate policy P that happens to choose the same actions at each timestep, though not with any randomness/exploration.

    Based on this data, the per-round IPS estimate is 1/0.5 = 2, and since both policies agree on every round, the average IPS over the history is also 2. You would expect it to be 1, given that, regardless of context or explore/exploit, that is the reward the logging policy saw each round. The candidate policy, if deployed, won't get a reward of 2 per round, but rather 1.

    What assumption am I violating in this example? Is there some stationarity requirement? I thought the IPS estimator is a martingale.

    14 replies
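
    A worked sketch (plain Python, not VW) of the scenario described above: uniform logging over two actions (propensity 0.5), a reward of 1 every round, and a candidate policy that happens to match the logged action on every round in the sample.

        rewards = [1.0] * 10
        propensities = [0.5] * 10
        agreements = [1.0] * 10  # 1 if the candidate's action equals the logged action

        weights = [a / p for a, p in zip(agreements, propensities)]

        ips = sum(r * w for r, w in zip(rewards, weights)) / len(rewards)    # 2.0
        snips = sum(r * w for r, w in zip(rewards, weights)) / sum(weights)  # 1.0, as in the follow-up below
        print(ips, snips)
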
    Max Pagels
    @maxpagels_twitter
    I'd add that with SNIPS instead of IPS, I get the expected 1.0.
    MochizukiShinichi
    @MochizukiShinichi
    Hello everyone, VW newbie here :) I'm trying to use the VW contextual bandit implementation to solve an optimization problem where the arm is a product and the feedback is click/dismiss. Unlike optimizing for CTR, where the cost is 0/1, I'm thinking of assigning a value representing the estimated monetary impact to each (arm, feedback) combo. For instance, a click on product X would yield a reward of $5. Are there any caveats I should be aware of when using non-binary rewards? Thanks in advance!
    3 replies
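
    For concreteness, a hypothetical pair of cb-format examples (action:cost:probability | features, with made-up features and propensities) using monetary values. VW minimizes cost, so a click worth $5 would be logged with cost -5 and a dismissal with cost 0:

        2:-5.0:0.25 | user_segment=a time_of_day=evening
        1:0.0:0.25 | user_segment=b time_of_day=morning
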
    kornilcdima
    @kornilcdima

    @olgavrou is it possible to take a saved CATs model and change the flag --cats to --cats_pdf in order to get not only a continuous prediction but also the ranges themselves?

    From what I see, if I use the saved model and switch the flag, the model's output is still just a prediction and a pdf value. But it would be good to get the range buckets as well.

    2 replies
    George Fei
    @georgefei

    Hi everyone, I noticed the average loss when using cb_explore is lower than that when using cb, on the same logged data with all the hyperparameters fixed. Is it because cb_explore uses the pmf to pick an action stochastically and cb always picks the action that is estimated to perform the best? If this is the case, when tuning the hyperparameters in backtesting using logged data, should I run in cb_explore mode since it's closer to the production setting?

    Could someone quickly confirm if my understanding is correct? I also had a typo in the original question: the average loss when using cb_explore is higher than that when using cb.

    Marco Rossi
    @marco-rossi29

    (quoting @georgefei's question above)

    answered in the original question thread

    Ryan Angi
    @rangi513

    @pmineiro I watched your presentation on Distributionally Robust Optimization from December at NeurIPS and it was really well done. One question I have has to do with your first point on why this works for --cb_adf (offline optimization) but not --cb_explore_adf (online optimization). I similarly see loss improvements using this offline with data collected online from policy A to train another policy, policy B (with a slight increase in the constant learning rate). However, I'm trying to rationalize why it would be a bad idea to add --cb_dro to my online policy that is sequentially trained with minibatches with --save_resume and --cb_explore_adf (epsilon greedy).

    Will this not work well for me because the exponentially weighted (tau) averages of the sufficient statistics will no longer be able to keep track of what time t it is? Or is it some other reason?

    Would the better way to think about this feature be: use --cb_dro offline to discover the best hyperparameters, and then use those hyperparameters in the online setting? My hope is to use this almost as a regularization technique when I have a lot of features, to improve online learning, but I would love some guidance if I have a fundamental misunderstanding of this feature and should just be using --l1 regularization online instead.

    1 reply
    Max Pagels
    @maxpagels_twitter

    A feature request (or actually two) came to mind, and I'm wondering a) whether there is a need for it and b) how technically challenging it would be to implement.

    The first is that the CLI average loss would be less confusing to newcomers if it stated whether the loss is a progressive validation (PV) or holdout loss (the little h might not be apparent) and, more importantly, what exactly the reported loss is (e.g. 0/1, RMSE, etc.).

    The second is being able to use some --cb_type but report a different loss, e.g. train with DR but report IPS. I guess this is trickier to implement, but for consistency in policy evaluation it would be nice.

    Thoughts? Does anyone else think these might be improvements to the experience?

    Jacob Alber
    @lokitoth
    Hey @maxpagels_twitter, it seems like this issue covers the request for using a different loss function in reporting. Does that match the second part of the request? VowpalWabbit/vowpal_wabbit#2222
    1 reply
    André Monteiro
    @drelum
    Hi everyone. The issue VowpalWabbit/vowpal_wabbit#2943 is marked as fixed but the error is still present in version 8.10.2. Anyone else facing the same problem?
    [error screenshot attached]
    olgavrou
    @olgavrou
    @drelum 8.10.2 and 8.10.1 were patch releases that solved very specific things; unfortunately they did not include the bug fix from that issue. If you use one of the latest Python wheels built from master, the problem should not persist: see https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Python#bleeding-edge-latest-commit-on-master
    Sean Pinkney
    @spinkney

    I'm struggling with how to exactly input this data into vowpal wabbit cb

    This is one example. I have the machine-session as a name. Then I have a numeric feature that should be split up into a vector (1231242), "clues" with confidences between 0 and 1 associated with them, and the action (i.e. demos with K classes). In this case there are 3 actions. Ideally, the algorithm would choose action 1, as there are 2 clues, each with high confidence. The outcome should be in only 1 class.

        machine session  vector    clues        confidences        demos
     1:       1       1 1231242    {1, 2, 5}    {1.0, 1.0, 0.8}    {1, 1, 2}

    I don't have probabilities associated with the actions, and I'm not sure how this would look. In one sense, https://vowpalwabbit.org/tutorials/cb_simulation.html shows something like what I'd like to try:

    shared | User machine=1 session=1
    :0.0:|Action demo=1 |Vector :1 :2 :3 :1 :2 :4 :2
    :0.0:|Action demo=1 |Vector :1 :2 :3 :1 :2 :4 :2
    :0.2:|Action demo=2 |Vector :1 :2 :3 :1 :2 :4 :2

    But maybe I should just do multiclass, though this leaves out that there are 2 clues for action 1:

    1:0 2:0.2 |Vector :1 :2 :3 :1 :2 :4 :2
    4 replies
    Ryan Angi
    @rangi513
    In contextual bandits we care about minimizing regret (maximizing reward) over time. Generally, OPE methods and progressive validation loss are helpful in determining the average performance of a policy offline. Do we ever care about measuring how accurate (RMSE or otherwise) the greedy linear model underneath the policy is at estimating cost against a test set? If I did care about measuring the performance of the cost-sensitive classifier underneath the policy, is that something I could extract from a VW cb_explore_adf policy, or do I need to train a new regression from scratch in VW with the same parameters?
    2 replies
    kornilcdima
    @kornilcdima
    Is it possible somehow to change the posterior distribution for a chosen context? Since MABT selected an optimal arm many times, the variance of the posterior decreased, and now it is no longer able to choose another arm that has become optimal for that context.
    1 reply
    Jack Gerrits
    @jackgerrits
    Just a quick announcement/FYI: we're using issues as a way to communicate and discuss deprecations and removals for VW. Take a look at the 'Deprecation' tag (there are just two issues) and if there's something you have an opinion on, please feel free to comment: https://github.com/VowpalWabbit/vowpal_wabbit/issues?q=is%3Aissue+is%3Aopen+label%3ADeprecation We're hoping this is a reasonable way to communicate changes, allowing us to make progress while not adversely affecting anyone.
    MochizukiShinichi
    @MochizukiShinichi
    Hey folks, could anyone please point me to some resources I can read on the algorithm details of the --cb_adf implementation in VowpalWabbit?
    1 reply
    K Krishna Chaitanya
    @kkchaitu27
    Hi everyone, I have a question regarding action probabilities in the contextual bandit input format for VowpalWabbit. In the wiki it is said that the input format must be action:cost:probability | features. What is the probability here: is it the probability of the action getting a reward/cost, or something else? I read somewhere that it is the probability of exploration for that action; what does that mean?
    Adam Stepan
    @AdamStepan

    Hello, I am trying to train a VW model using the C++ API with this piece of code:

        vw* vw = VW::initialize("-f train1.vw --progress 1");
        {
            ezexample ex(vw, false);
    
            ex.set_label("1");
            ex.addf('a', "a", 0.0);
            ex.addf('a', "b", 1.0);
            ex.addf('a', "c", 2.0);
            ex.train();
            ex.finish();
        }
        {
            ezexample ex1(vw, false);
    
            ex1.set_label("0");
            ex1.addf('a', "a", 2.0);
            ex1.addf('a', "b", 1.0);
            ex1.addf('a', "c", 0.0);
            ex1.train();
            ex1.finish();
        }
    
        VW::finish(*vw);

    This snippet generates the model, but the number of examples and the number of features are 0. Am I doing something wrong? I also tried using example instead of ezexample and the result was the same; in either case I did not see a progress log.

    6 replies
    Max Pagels
    @maxpagels_twitter
    @kkchaitu27 Contextual bandits have exploration, i.e. there should always be a nonzero probability of choosing some action. The reason for this is to try out different actions to learn what works and what doesn't. This probability is the one mentioned in the docs. Its value depends on the exploration algorithm: for epsilon-greedy with two actions and 10 percent exploration, the best action is chosen with probability 0.95 and the other with 0.05. If you use cb_explore when collecting data, VW calculates these probabilities for you.
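
    A small sketch (plain Python, not VW internals) of the epsilon-greedy probabilities described above: with exploration rate epsilon and K actions, the greedy action gets 1 - epsilon + epsilon/K and every other action gets epsilon/K.

        def epsilon_greedy_pmf(num_actions: int, greedy_action: int, epsilon: float):
            pmf = [epsilon / num_actions] * num_actions
            pmf[greedy_action] += 1.0 - epsilon
            return pmf

        print(epsilon_greedy_pmf(2, 0, 0.1))  # [0.95, 0.05]
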
    K Krishna Chaitanya
    @kkchaitu27
    @maxpagels_twitter Thanks for your response. How do I compute the probability if I have historical data? Is it equal to the number of times that action was chosen divided by the total number of times the context appeared?
    1 reply
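
    A sketch (plain Python, hypothetical log) of the empirical estimate the question proposes: the propensity of an action in a given context taken as count(context, action) / count(context).

        from collections import Counter

        logged = [("ctx1", "a1"), ("ctx1", "a2"), ("ctx1", "a1"), ("ctx2", "a1")]
        context_counts = Counter(x for x, _ in logged)
        pair_counts = Counter(logged)

        def empirical_propensity(context, action):
            return pair_counts[(context, action)] / context_counts[context]

        print(empirical_propensity("ctx1", "a1"))  # 2/3
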
    Ryan Angi
    @rangi513

    I'm happy to turn this into a github issue, but want to make sure I'm not attempting some unintended behavior first.

    I am attempting to do multiple passes over a cb_adf dataset to hopefully improve the quality of my Q function. I'm thinking of trying an offline bandit using the whole dataset and multiple passes instead of online learning with iterative updates. However, I get the following error after the first pass:

    libc++abi.dylib: terminating with uncaught exception of type VW::vw_exception: cb_adf: badly formatted example, only one line can have a cost
    [1]    90720 abort      vw --cb_adf --passes 2 -c -d train.dat

    Here is my command and dataset for reproducibility:
    vw --cb_adf --passes 2 -c -d train.dat

    train.dat

    shared | a:1 b:0.5
    0:0.1:0.75 | a:0.5 b:1 c:2
    | a:1 c:3
    
    shared | s_1 s_2
    0:1.0:0.5 | a:1 b:1 c:1
    | a:0.5 b:2 c:1

    I'm using version 8.10.1. I found this SO post and VowpalWabbit/vowpal_wabbit@431c270 by @jackgerrits that maybe was supposed to fix this, but it could also be unrelated.

    Are multiple passes not supported for --cb_adf? If so, maybe better error messaging would be useful here?

    2 replies
    K Krishna Chaitanya
    @kkchaitu27

    This is a sample dataset I created

    1:1:1.0 2:2 3:3 4:4 | a b c
    1:1 2:2:1.0 3:3 4:4 | a b c
    1:1 2:2 3:3:1.0 4:4 | a b c
    1:1 2:2 3:3 4:4:1.0 | a b c
    1:1 2:2:0.7 3:3 4:4 | d e f

    When I run

    vw -d sampledata.vw --cb 4

    I get

    Num weight bits = 18
    learning rate = 0.5
    initial_t = 0
    power_t = 0.5
    using no cache
    Reading datafile = sampledata.vw
    num sources = 1
    Enabled reductions: gd, scorer, csoaa_ldf, cb_adf, shared_feature_merger, cb_to_cbadf
    average  since         example        example  current  current  current
    loss     last          counter         weight    label  predict features
    [critical] vw (cb_adf.cc:279): cb_adf: badly formatted example, only one cost can be known.

    Why are the reductions csoaa_ldf, cb_adf, shared_feature_merger, cb_to_cbadf enabled when I just use --cb?
    8 replies
    K Krishna Chaitanya
    @kkchaitu27
    Hi all, I want to deploy a VowpalWabbit model for online learning. I am using historical data to warm-start the model, and I then want to use that trained model to start doing online serving and learning. I can see that mmlspark provides functionality to serve a VW model, but how do I train a VW model in real time? Are there any resources for deploying a VW contextual bandit model in an online learning and serving fashion?
    1 reply
    MochizukiShinichi
    @MochizukiShinichi
    Hello folks, I'm reading VowpalWabbit/vowpal_wabbit#1306 on removing arms in ADF-format data, but I don't seem to have found a solution in the thread. Could anyone please let me know how to format multiline examples to indicate that some arms are no longer eligible?
    3 replies
    Owain Steer
    @0wainSteer_twitter
    Hey everyone, I'm currently following the OPE tutorial with both IPS and DR estimators on my own data. I'm finding that the average loss when using DR changes from positive to negative at times with the default 'squared' loss function, but not with the 'classic' method of squared loss. I was assuming this is because the importance-weight-aware updates skew the results, but I don't know how to interpret the negative loss or whether this should be happening at all. Would appreciate any insight into this, thanks!
    George Fei
    @georgefei
    Hi all, I have a quick question. For cb_explore usage like the following: vw --cb_explore 2 --cover 10 -d train.txt -l 0.1 --power_t 0 --save_resume vw.model --l1 1e-08; since --power_t is set to 0 and the learning rate doesn't decay, does having --save_resume make any difference to the model performance?
    10 replies
    Rohit Choudhary
    @tanquerey
    Should I use a compute-optimized or memory-optimized machine to increase training speed? Or does it not matter?
    1 reply