    George Fei
    @georgefei
    Hi everyone, I noticed the average loss when using cb_explore is lower than that when using cb, on the same logged data with all the hyperparameters fixed. Is it because cb_explore uses the pmf to pick an action stochastically and cb always picks the action that is estimated to perform the best? If this is the case, when tuning the hyperparameters in backtesting using logged data, should I run in cb_explore mode since it's closer to the production setting?
    2 replies
    kornilcdima
    @kornilcdima

    Hey, I have a question about the --cats_pdf output.
    When I print the pdf value, I get two tuples, each containing an action range and a density value. The values inside are always constant. Is this normal behavior? I expected to see different values, i.e. different ranges and pdf values. My Python VW version is '8.10.0'.

    If I use --cats_pdf, then pdf output is:

    1 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    2 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    3 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    4 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    5 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    6 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    7 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    8 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)
    9 (0.0, 17.923999786376953, 0.04280378296971321) (17.923999786376953, 80.0, 0.003750000149011612)

    When I use just --cats, I pass the values for learning in the following manner:

    prediction, pdf_value = vw.predict(f'ca | {f_2} {f_3} {f_4} {f_5} {f_6}')
    vw.learn(f'ca {price_pred}:{cost}:{pdf_value} |{f_2} {f_3} {f_4} {f_5} {f_6}')

    Model's parameters:

    min_value = 0
    max_value = 80
    bandwidth = 16
    
    vw = pyvw.vw(f"--cats_pdf {num_actions} --bandwidth {bandwidth} \
                 --min_value {min_value} --max_value {max_value} \
                 --dsjson --chain_hash --{exploration}")
    2 replies
    Max Pagels
    @maxpagels_twitter

    I have a silly IPS question. Let's say I have a uniform random logging policy that chooses between two actions and always receives a reward of 1 regardless of context. Evidence would suggest that no matter what policy I deploy afterwards, I would continue to get a reward of 1 per round.

    Now, let's say I have a candidate policy P that happens to choose the same actions at each timestep, though not with any randomness/exploration.

    Based on this data, the per-round IPS estimate is 1/0.5 = 2, and since both policies agree on every round, the average IPS over the history is also 2, when you would expect it to be 1 given that, regardless of context or explore/exploit, a reward of 1 is what the logging policy saw each round. The candidate policy, if deployed, won't get a reward of 2 per round, but rather 1.

    What assumption am I violating in this example? Is there some stationarity requirement? I thought the IPS estimator is a martingale.

    14 replies
    Max Pagels
    @maxpagels_twitter
    I'd add that with snips instead of ips, I get the expected 1.0.
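    To spell out the arithmetic (a plain-Python sketch with made-up data, nothing VW-specific):

        # Uniform-random logging over two actions: propensity 0.5 for every logged action.
        # Reward is always 1, and the candidate policy happens to agree with every logged action.
        logged = [("a", 0.5, 1.0)] * 10                # (logged_action, propensity, reward)
        candidate = lambda: "a"                        # deterministic candidate policy

        w = [1.0 / p if candidate() == act else 0.0 for act, p, _ in logged]
        r = [rew for _, _, rew in logged]

        ips = sum(wi * ri for wi, ri in zip(w, r)) / len(logged)   # 2.0
        snips = sum(wi * ri for wi, ri in zip(w, r)) / sum(w)      # 1.0
        print(ips, snips)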
    MochizukiShinichi
    @MochizukiShinichi
    Hello everyone, VW newbie here :) I'm trying to use the VW contextual bandits implementation to solve an optimization problem where the arm would be a product and the feedback would be click/dismiss. Unlike optimizing for CTR, where the cost is 0/1, I'm thinking of assigning a value representing the estimated monetary impact to each (arm, feedback) combo. For instance, a click on product X would yield a reward of $5. Are there any caveats I should be aware of when using non-binary rewards? Thanks in advance!
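    For concreteness, here is how I'm imagining a single logged interaction would be encoded (a hedged sketch of the --cb label format with made-up feature names; since VW minimizes cost, a $5 click becomes a cost of -5):

        2:-5.0:0.3 | user_segment=a product=X

    where 2 is the chosen arm, -5.0 the cost, and 0.3 the probability with which the logging policy chose that arm.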
    3 replies
    kornilcdima
    @kornilcdima

    @olgavrou is it possible to take a saved CATS model and change the flag --cats to --cats_pdf in order to get not only a continuous prediction but also the ranges themselves?

    From what I see, if I use the saved model and switch the flag, the model's output is still a prediction and pdf_value. But it would be good to get the range buckets as well.

    2 replies
    George Fei
    @georgefei

    Hi everyone, I noticed the average loss when using cb_explore is lower than that when using cb, on the same logged data with all the hyperparameters fixed. Is it because cb_explore uses the pmf to pick an action stochastically and cb always picks the action that is estimated to perform the best? If this is the case, when tuning the hyperparameters in backtesting using logged data, should I run in cb_explore mode since it's closer to the production setting?

    Could someone quickly confirm whether my understanding is correct? I also had a typo in the original question: the average loss when using cb_explore is higher than that when using cb.

    Marco Rossi
    @marco-rossi29

    Hi everyone, I noticed the average loss when using cb_explore is lower than that when using cb, on the same logged data with all the hyperparameters fixed. Is it because cb_explore uses the pmf to pick an action stochastically and cb always picks the action that is estimated to perform the best? If this is the case, when tuning the hyperparameters in backtesting using logged data, should I run in cb_explore mode since it's closer to the production setting?

    Could someone quickly confirm whether my understanding is correct? I also had a typo in the original question: the average loss when using cb_explore is higher than that when using cb.

    answered in the original question thread

    Ryan Angi
    @rangi513

    @pmineiro I watched your presentation on Distributionally Robust Optimization from December at NeurIPS and it was really well done. One question I have has to do with your first point on why this works for --cb_adf (offline optimization) but not --cb_explore_adf (online optimization). I similarly see loss improvements using this offline with data collected online from Policy A to train another policy, Policy B (with a slight increase in the constant learning rate). However, I'm trying to rationalize why it would be a bad idea to add --cb_dro to my online policy that is sequentially trained with minibatches with --save_resume and --cb_explore_adf (epsilon greedy).

    Will this not work well for me because the (tau) exponentially weighted averages of the sufficient statistics will no longer be able to keep track of what time t it is at? Or is it some other reason?

    Would the better way to think about using this feature be: use --cb_dro offline to discover the best hyperparameters to use, and then use those hyperparameters in the online setting? My hope is to use this almost as a regularization technique, if I have a lot of features, to improve online learning, but I would love some guidance if I have some fundamental misunderstanding of this feature and I should just be using --l1 regularization online instead.

    1 reply
    Max Pagels
    @maxpagels_twitter

    A feature request (or actually two) came to mind and I'm wondering if there is a) a need for it and b) how technically challenging it is to implement.

    The first is that the CLI average loss would be less confusing to newcomers if it stated whether the loss is a PV or holdout loss (the little h might not be apparent) and, more importantly, what exactly the reported loss is (e.g. 0/1, RMSE, etc.).

    The second is being able to use some --cb_type but report a different loss. E.g. train with DR but report IPS. I guess this is more tricky to implement but for consistency in policy evaluation, it would be nice.

    Thoughts? Does anyone else think these might be improvements to the experience?

    Jacob Alber
    @lokitoth
    Hey @maxpagels_twitter, it seems like this issue covers the request for using a different loss function in reporting. Does that match the second part of the request? VowpalWabbit/vowpal_wabbit#2222
    1 reply
    André Monteiro
    @drelum
    Hi everyone. The issue VowpalWabbit/vowpal_wabbit#2943 is marked as fixed but the error is still present in version 8.10.2. Anyone else facing the same problem?
    olgavrou
    @olgavrou
    @drelum 8.10.2 and 8.10.1 were patch releases and solved very specific things, they did not include the bug fix from that issue unfortunately. If you use one of the latest python wheels from master then the problem should not persist: see https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Python#bleeding-edge-latest-commit-on-master
    Sean Pinkney
    @spinkney

    I'm struggling with how to exactly input this data into vowpal wabbit cb

    This is one example. I have the machine-session as a name. Then I have a numeric feature which should be split up as a vector (vector 1231242), "clues" which have confidences between 0 and 1 associated with them, and the action (i.e. demos with K classes). In this case, there are 3 actions. Ideally, the algorithm would choose action 1, as there are 2 clues each with high confidence. The outcome should be in only 1 class.

        machine session  vector    clues        confidences        demos
     1:       1       1 1231242    {1, 2, 5}    {1.0, 1.0, 0.8}    {1, 1, 2}

    I don't have probabilities associated with the actions and I'm not sure how this would look. In one sense, the tutorial at https://vowpalwabbit.org/tutorials/cb_simulation.html shows something like what I'd like to try:

    shared | User machine=1 session=1
    :0.0:|Action demo=1 |Vector :1 :2 :3 :1 :2 :4 :2
    :0.0:|Action demo=1 |Vector :1 :2 :3 :1 :2 :4 :2
    :0.2:|Action demo=2 |Vector :1 :2 :3 :1 :2 :4 :2

    But maybe I should just do multiclass, as below, though this leaves out that there are 2 clues for action 1:

    1:0 2:0.2 |Vector :1 :2 :3 :1 :2 :4 :2
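    Or, written out as a full --cb_adf (multiline) example, this is my own hedged sketch (the namespace names, the clue features, and the -1.0 cost for a good outcome are all assumptions):

        shared |User machine=1 session=1 |Clues clue_1:1.0 clue_2:1.0 clue_5:0.8
        0:-1.0:0.33 |Action demo=1
        |Action demo=2
        |Action demo=3

    Here the chosen action line carries the cost:probability label; the 0.33 would only be valid if the logged actions really were chosen uniformly at random, otherwise the true logging probabilities are needed.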
    4 replies
    Ryan Angi
    @rangi513
    In contextual bandits we care about minimizing regret (maximizing reward) over time. Generally, OPE methods and progressive validation loss are helpful in determining the average performance of a policy offline. Do we ever care about measuring how accurate (RMSE or otherwise) the greedy linear model underneath the policy is at estimating cost against a test set? If I did care about measuring the performance of the cost-sensitive classifier underneath the policy, is that something I could extract from a VW cb_explore_adf policy, or do I need to train a new regression from scratch in VW with the same parameters?
    2 replies
    kornilcdima
    @kornilcdima
    Is it somehow possible to change the posterior distribution for a given context? Because MABT selected an optimal arm many times, the variance of the posterior has decreased, and it is no longer able to choose another arm that has since become optimal for that context.
    1 reply
    Jack Gerrits
    @jackgerrits
    Just a quick announcement/FYI: we're using issues as a way to communicate and discuss deprecations and removals for VW. Take a look at the 'Deprecation' tag (there are just two issues) and if there is something you have an opinion on, please feel free to comment: https://github.com/VowpalWabbit/vowpal_wabbit/issues?q=is%3Aissue+is%3Aopen+label%3ADeprecation We're hoping this is a reasonable way to communicate changes that lets us make progress while not adversely affecting anyone.
    MochizukiShinichi
    @MochizukiShinichi
    Hey folks, could anyone please point me to some resources on the algorithmic details of the --cb_adf implementation in VowpalWabbit?
    1 reply
    K Krishna Chaitanya
    @kkchaitu27
    Hi everyone, I have a doubt regarding action probabilities in the VowpalWabbit input format for contextual bandits. The wiki says the input format must be action:cost:probability | features. What is the probability here? Is it the probability of the action getting a reward/cost, or something else? I read somewhere that it is the probability of exploration for that action; what does that mean?
    Adam Stepan
    @AdamStepan

    Hello, I am trying to train a VW model using the C++ API with this piece of code:

        vw* vw = VW::initialize("-f train1.vw --progress 1");
        {
            ezexample ex(vw, false);
    
            ex.set_label("1");
            ex.addf('a', "a", 0.0);
            ex.addf('a', "b", 1.0);
            ex.addf('a', "c", 2.0);
            ex.train();
            ex.finish();
        }
        {
            ezexample ex1(vw, false);
    
            ex1.set_label("0");
            ex1.addf('a', "a", 2.0);
            ex1.addf('a', "b", 1.0);
            ex1.addf('a', "c", 0.0);
            ex1.train();
            ex1.finish();
        }
    
        VW::finish(*vw);

    This snippet generates the model, but the number of examples and the number of features are 0. Am I doing something wrong? I also tried to use example instead of ezexample and the result was the same; in either case, I did not see a progress log...

    6 replies
    Max Pagels
    @maxpagels_twitter
    @kkchaitu27 Contextual bandits have exploration, i.e. there should always be a nonzero probability of choosing some action. The reason for this is to try out different actions to learn what works and what doesn't. This probability is the one mentioned in the docs. Its value depends on the exploration algorithm; for epsilon-greedy with two actions and 10 percent exploration, the best action is chosen with prob 0.95 and the other with 0.05. If you use cb_explore when collecting data, VW calculates these probabilities for you.
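    (The arithmetic behind those numbers, as a quick sketch:)

        def epsilon_greedy_probs(num_actions, epsilon, best_action):
            # exploration mass is spread uniformly; the greedy action gets the rest
            probs = [epsilon / num_actions] * num_actions
            probs[best_action] += 1.0 - epsilon
            return probs

        print(epsilon_greedy_probs(2, 0.1, best_action=0))  # [0.95, 0.05]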
    K Krishna Chaitanya
    @kkchaitu27
    @maxpagels_twitter Thanks for your response. How do I compute the probability if I have historical data? Is it equal to the number of times that action was chosen divided by the total number of times the context appeared?
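    Something like this is what I have in mind (a rough sketch in plain Python, not VW code; the context keys and actions are made up):

        from collections import Counter, defaultdict

        # logs: (context_key, action) pairs from the historical system
        logs = [("ctx_a", 1), ("ctx_a", 1), ("ctx_a", 2), ("ctx_b", 1)]

        counts = defaultdict(Counter)
        for ctx, action in logs:
            counts[ctx][action] += 1

        def propensity(ctx, action):
            total = sum(counts[ctx].values())
            return counts[ctx][action] / total if total else None

        print(propensity("ctx_a", 1))  # 2/3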
    1 reply
    Ryan Angi
    @rangi513

    I'm happy to turn this into a github issue, but want to make sure I'm not attempting some unintended behavior first.

    I am attempting to do multiple passes over a cb_adf dataset to hopefully improve the quality of my q function. I'm thinking of trying an offline bandit using the whole dataset and multiple passes instead of online with iterative updates. However, I get the following error after the first pass:

    libc++abi.dylib: terminating with uncaught exception of type VW::vw_exception: cb_adf: badly formatted example, only one line can have a cost
    [1]    90720 abort      vw --cb_adf --passes 2 -c -d train.dat

    Here is my command and dataset for reproducibility:
    vw --cb_adf --passes 2 -c -d train.dat

    train.dat

    shared | a:1 b:0.5
    0:0.1:0.75 | a:0.5 b:1 c:2
    | a:1 c:3
    
    shared | s_1 s_2
    0:1.0:0.5 | a:1 b:1 c:1
    | a:0.5 b:2 c:1

    I'm using version 8.10.1. I found this SO post and VowpalWabbit/vowpal_wabbit@431c270 by @jackgerrits, which may have been intended to fix this, but it could also be unrelated.

    Are multiple passes not supported for --cb_adf? If so, maybe some better error messaging might be useful here?

    2 replies
    K Krishna Chaitanya
    @kkchaitu27

    This is a sample dataset I created

    1:1:1.0 2:2 3:3 4:4 | a b c
    1:1 2:2:1.0 3:3 4:4 | a b c
    1:1 2:2 3:3:1.0 4:4 | a b c
    1:1 2:2 3:3 4:4:1.0 | a b c
    1:1 2:2:0.7 3:3 4:4 | d e f

    when I do

    vw -d sampledata.vw --cb 4

    I get
    Num weight bits = 18
    learning rate = 0.5
    initial_t = 0
    power_t = 0.5
    using no cache
    Reading datafile = sampledata.vw
    num sources = 1
    Enabled reductions: gd, scorer, csoaa_ldf, cb_adf, shared_feature_merger, cb_to_cbadf
    average  since         example        example  current  current  current
    loss     last          counter         weight    label  predict features
    [critical] vw (cb_adf.cc:279): cb_adf: badly formatted example, only one cost can be known.
    Why are these reductions (csoaa_ldf, cb_adf, shared_feature_merger, cb_to_cbadf) enabled when I just specify --cb <num_actions>?
    8 replies
    K Krishna Chaitanya
    @kkchaitu27
    Hi all, I want to deploy a VowpalWabbit model for online learning. I am using historical data to warm-start the model, and I want to take that trained model and start doing online serving and learning. I can see mmlspark provides functionality to serve a VW model, but how do I train a VW model in real time? Are there any resources for deploying a VW contextual bandit model in an online learning and serving fashion?
    2 replies
    MochizukiShinichi
    @MochizukiShinichi
    Hello folks, I'm reading VowpalWabbit/vowpal_wabbit#1306 on removing arms in ADF-format data, but I don't seem to have found a solution in the thread. Could anyone please let me know how to format multiline examples to indicate that some arms are no longer eligible?
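    My current guess (an assumption on my part, not something I've confirmed) is that with ADF the action set is just whatever action lines appear in the multiline example, so an ineligible arm is simply left out, e.g. (made-up features, with arm b omitted because it is no longer eligible):

        shared | user_id=42 hour=9
        | arm=a price:5
        | arm=c price:12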
    3 replies
    Owain Steer
    @0wainSteer_twitter
    Hey everyone, I'm currently following the OPE tutorial with both IPS and DR estimators on my own data. I'm finding that the average loss while using DR sometimes changes from positive to negative when using the default 'squared' loss function, but not when using the 'classic' variant of squared loss. I was assuming this is because of the importance-weight-aware updates skewing results, but I don't know how to interpret the negative loss or whether this should be happening at all. Would appreciate any insight into this, thanks!
    1 reply
    George Fei
    @georgefei
    Hi all, I have a quick question. For cb_explore usage like the following: vw --cb_explore 2 --cover 10 -d train.txt -l 0.1 --power_t 0 --save_resume vw.model --l1 1e-08; since --power_t is set to 0 and the learning rates don't decay, does having --save_resume or not make any difference to the model performance?
    10 replies
    Rohit Choudhary
    @tanquerey
    Should I use a compute-optimized or memory-optimized machine to increase training speed? Or does it not matter?
    2 replies
    Rohit Choudhary
    @tanquerey

    I am trying to load a model from a file

    modelfromfile = pyvw.vw(quiet=True).load('some.model')

    But I am getting the following error:
    AttributeError: 'vw' object has no attribute 'load'
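    For reference, what I'm trying to achieve is something like the following sketch, assuming the construction-time -i / initial_regressor route is the right way to do it (that part is my assumption):

        from vowpalwabbit import pyvw

        # pass the saved model file to the constructor instead of calling .load()
        model = pyvw.vw("--quiet -i some.model")
        # or, equivalently, with keyword arguments
        model = pyvw.vw(quiet=True, initial_regressor="some.model")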

    1 reply
    George Fei
    @georgefei
    Hi everyone, I have a batch contextual bandit problem where we make decisions for one cohort and receive the reward afterwards for everyone in the cohort. During backtesting I noticed that the order of the training samples fed into the model matters a lot when using the default update settings (--adaptive, --normalized and --invariant). Different orderings have different optimal hyperparameter choices, and the validation loss also differs a lot. To address this, I was thinking of adopting a nested validation scheme: reshuffling the data multiple times before feeding it into the model and taking the hyperparameters that perform best on average. Another solution is to use --bfgs to do batch updates. I believe the second method is preferred; is that correct?
    26 replies
    K Krishna Chaitanya
    @kkchaitu27
    Hi everyone, I am experimenting with contextual bandits with continuous actions using the following line of code.
    vw = pyvw.vw("--cats_pdf 300 --bandwidth 1 --min_value 1 --max_value 300")
    I see that I have to give the data for continuous actions as follows: action:cost:pdf_value |[namespace] <features>. What should I give for pdf_value? Can I set it to 1.0, as I do not actually have a probability density function value for the data I have? How does this pdf_value affect the learning process in the algorithm?
    2 replies
    Bernardo Favoreto
    @Favoreto_B_twitter

    Hey guys, I was wondering... what influences the time to load a model? I've tested model files of different sizes, but it doesn't seem like there's any correlation.

    Is it the total number of weights?

    The number of non-zero weights? (I don't think so, because that is correlated with file size, which isn't correlated with loading time.) And by the way, is there an easy way to count the number of non-zero weights? I'm currently iterating over all weights and counting those that aren't 0.
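    For reference, this is roughly what I'm doing now (assuming the num_weights()/get_weight() calls exposed by the Python bindings; 'some.model' is a placeholder):

        from vowpalwabbit import pyvw

        model = pyvw.vw("--quiet -i some.model")
        # count the weights that are not exactly zero
        nonzero = sum(1 for i in range(model.num_weights()) if model.get_weight(i) != 0.0)
        print(nonzero)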

    Jack Gerrits
    @jackgerrits
    A model contains a command line which dictates how VW is instantiated, so if the command lines are different then the load times for two models can differ; it depends on what the reductions being set up are doing. For two models with the same command line, the next factor is reduction-specific data (this depends on the reduction, but I would say most of the time it is a constant-time operation), and then the non-zero model weights, as you mentioned.
    4 replies
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hello again, I have a follow-up to my previous question regarding model load time. We're currently building a multi-tenant system to serve VW models, and I was wondering... is there a recommended approach in this scenario? Can we keep models in memory, or do we have to load them for every prediction?
    Any input here is welcome!
    4 replies
    Bernardo Favoreto
    @Favoreto_B_twitter

    Good morning everyone... I have a simple question regarding one of the Personalizer's examples.

    Why does the JSON input in this example define the features as an array of objects? Why not simply an object? Does it have anything to do with namespaces?

    3 replies
    memeplex
    @memeplex

    Hi all, I was wondering how lrq (aka factorization machine) plays with Importance Weight Aware Updates (https://arxiv.org/pdf/1011.1576.pdf) since the development in that paper is for linear models:

    In this paper we focus on linear models i.e. p = <w, x> where w is a vector of weights

    but lrq models are not linear given that they involve products of weights. So what about the property:

    Therefore all gradients of a given example point to the same direction and only differ in magnitude.

    that's assumed in the paper? I wasn't able to find any related discussion.
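    For context, my mental model of what --lrq ab k computes (an assumption about the reduction on my part, not a quote from the paper or the code) is roughly

        \hat{y} = \langle w, x \rangle + \sum_{i \in a} \sum_{j \in b} x_i x_j \langle l_i, r_j \rangle

    where l_i and r_j are learned k-dimensional latent vectors for features in namespaces a and b, so the prediction is quadratic in the weights and the "all gradients point in the same direction" property from the linear case doesn't obviously carry over.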

    Ignacio Amaya
    @xinofekuator:matrix.org
    [m]
    Hi all, I found this answer (https://stackoverflow.com/questions/48687328/how-much-preprocessing-vowpal-wabbit-input-needs) which says that normalization is used for SGD by default. I was wondering if this is also true for the numerical variables in contextual bandits (using cb or cb_explore).
    1 reply
    Tahseen Shabab
    @thshabab_gitlab
    Hi all, great to be a part of this community! I have a quick question regarding the use of the Softmax Explorer in cb_explore_adf. The VW documentation states that the Softmax Explorer predicts a score indicating the quality of each action. Is this score equal to the reward, proportional to the reward, or something else? Thank you :)
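    To be concrete, I'm picturing a standard softmax over per-action scores (my own sketch, not VW code), and my question is whether the score being exponentiated is the estimated reward, the estimated cost, or something else:

        import math

        def softmax_probs(scores, lam=1.0):
            # higher score => higher sampling probability; lam controls greediness
            exps = [math.exp(lam * s) for s in scores]
            z = sum(exps)
            return [e / z for e in exps]

        print(softmax_probs([1.0, 0.5, 0.1], lam=2.0))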
    6 replies
    Jack Gerrits
    @jackgerrits
    We're running a survey to better understand how VW is used. If you have a moment we'd greatly appreciate your input. More details on the blog: https://vowpalwabbit.org/blog/vowpalwabbit-survey.html
    bef55
    @bef55
    I am trying to perform an active learning task to find the documents that will be most helpful to get labeled. But when I run vw in --active mode, it returns an importance weight of 1.0 for every unlabeled example I feed it. Specifically, I run this command: vw --active --port 6075 --daemon --foreground -i existing_model_filename.ext Then I run python3.9 active_iterator.py localhost 6075 unlabeled_examples_filename.ext. All of the over 800K unlabeled examples return an importance of exactly 1.0, even though the predictions are variable and largely accurate. In the past I have received highly useful and variable importance weights, and I cannot figure out what is wrong now. The only possibility that even occurs to me is that in active_iterator.py I had to change the first line of the recvall function to buf=s.recv(n).decode() from buf=s.recv(n), and I changed the sendall calls from sock.sendall(line) to sock.sendall(line.encode()). Any ideas? Thanks very much.
    16 replies
    Priyanshu Agarwal
    @priyanshuone6
    Hey, I am Priyanshu Agarwal, currently a third-year undergraduate pursuing my bachelor's degree. I want to contribute to this project. Could you please guide me on how to start contributing and point out some beginner-friendly issues to get me started? Thanks!
    2 replies