cb_type ips/dm/dr and choosing the one with the best reported loss. Isn't that wrong, especially considering dm is biased? --eval throws an error if you use dm.
@lalo @olgavrou et al., note that I will need expert advice on this. There is a checklist that needs to be confirmed with absolute certainty or, if untrue, commented on to give me the correct interpretation.
Hi all, I had a problem when using the SquareCB algorithm to train a contextual bandit model, specifically when saving it and loading it again.
I trained and saved SquareCB model in this way (using the simulation setting as in https://vowpalwabbit.org/tutorials/cb_simulation.html):
vw = pyvw.vw("--cb_explore_adf -q UA -f squarecb.model --save_resume --quiet --squarecb")
num_iterations = 5000
ctr = run_simulation(vw, num_iterations, users, times_of_day, actions, get_cost)
plot_ctr(num_iterations, ctr)
vw.finish()
and then loaded the model :
vw_loaded = pyvw.vw('--cb_explore_adf -q UA -i squarecb.model')
num_iterations = 5000
ctr = run_simulation(vw_loaded, num_iterations, users, times_of_day, actions, get_cost, do_learn=False)
plot_ctr(num_iterations, ctr)
print(ctr[-1])
and the loaded model seems to be doing random exploration.
Could anyone explain how to save and load this model correctly? Thanks in advance.
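Not an authoritative fix, but one thing worth checking is that the argument string used on load matches the one used at training time (minus -f, plus -i), so the exploration algorithm (--squarecb) can't silently differ between the two runs. A minimal sketch, with helper names and the shared-base idea being my own:

```python
# Hypothetical sketch: derive the train-time and load-time argument strings
# from one shared base, so flags like --squarecb stay identical across runs.
BASE_ARGS = "--cb_explore_adf -q UA --squarecb --quiet"

def train_args(model_path):
    """Arguments for the initial training run that writes the model."""
    return f"{BASE_ARGS} --save_resume -f {model_path}"

def load_args(model_path):
    """Arguments for reloading the saved model for further use."""
    return f"{BASE_ARGS} -i {model_path}"

# e.g. pyvw.vw(train_args("squarecb.model")) for training, and later
#      pyvw.vw(load_args("squarecb.model")) for loading
```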
Hey everybody. I've just started to use VW, and I'm solving a dynamic pricing problem where price is a discrete action space (10 arms). Prices are cut into buckets, and each bucket holds 10% of the prices. My cost is CTR. My probability is a constant 0.1, since I have 10 arms and each of them appears in 10% of cases. My goal is to find the optimal prices that increase CTR.
I know that CATS would be better for my case, but I'd prefer not to use it as a first attempt.
I have the following questions:
1) What is the main difference between --cb and --cb_explore? As I understand it, --cb_explore gives probabilities and --cb doesn't. I've also seen it mentioned that --cb doesn't do exploration while --cb_explore does. Am I right on this point?
2) VW requires the format action:cost:probability, and the probability here is nothing but the pmf. Would it be right in my case to just set 0.1 for all examples?
3) I do a kind of pretraining on logged data (an existing dataset) to learn a policy with --cb_explore 10 --cover 13. After that I use the pre-trained model with the -i flag, get the output probabilities, and take the action with the highest probability as the predicted value. Will I be exploring in this case?
Please forgive my naive questions, and many thanks in advance for your answers )
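On question 2): under a uniformly random logging policy over 10 arms, the logged probability would indeed be a constant 0.1 on every line. A small sketch of building --cb input lines in that shape (the helper and the feature names are made up for illustration):

```python
# Hypothetical helper: format one logged example in VW's --cb input format,
# "action:cost:probability | features".
def to_vw_cb_line(price_bucket, cost, prob, features):
    feats = " ".join(f"{k}={v}" for k, v in features.items())
    return f"{price_bucket}:{cost}:{prob} | {feats}"

# A logged impression: price bucket 3 was shown (p = 0.1 under uniform
# logging), with cost -1.0 encoding a click (VW minimizes cost).
line = to_vw_cb_line(3, -1.0, 0.1, {"hour": "morning", "segment": "new_user"})
# line == "3:-1.0:0.1 | hour=morning segment=new_user"
```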
So as far as I know, Netflix actually does it so that the possible combos of e.g. title and image are predefined, and those form a single arm. Of course the number of combos is massive, so I don't think they use all of them.
There has to be some prefiltering going on, since I suspect showing (title: crime dramas, show: Top Gear, picture: Jurassic Park) would lead to issues :). So I think they aren't using slates as slates are defined in VW, merely a large action space where one action is one predefined combo of title, genre, picture and so on.
I may also be wrong here
Does anyone have an example of daemon-style code for CATS? Right now I'm using a Python wrapper which I took from Olga's notebook example, and it works fine. However, I only have a rough idea of how to launch it in daemon style.
Is it something like this?
pre-training the model on historical data
vw --cats 6 --bandwidth 0.5 --min_value 0 --max_value 3 --epsilon 0.3 -d train.dat -f model.vw
raising the ready model
vw --cats 6 --bandwidth 0.5 --min_value 0 --max_value 3 --epsilon 0.3 --save_resume --daemon --quiet --num_children 1 --port 8080 -i model.vw -f model.vw
updating the model on new data
vw --cats 6 --bandwidth 0.5 --min_value 0 --max_value 3 --epsilon 0.3 --save_resume -i model.vw -d train.dat -f model.vw
hi @kornilcdima, here is some documentation on how to use vw in daemon mode, and it should work fine if you start vw with the appropriate cats arguments: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format#on-demand-model-saving
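For what it's worth, once the daemon is up it speaks newline-delimited examples over TCP, with one prediction line back per example. A rough client sketch; the host/port and the 'action,pdf_value' reply shape for CATS are my assumptions, so treat this as a guess rather than gospel:

```python
import socket

def query_daemon(example_line, host="localhost", port=8080):
    """Send one VW example line to a running daemon and read the reply line."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall((example_line.strip() + "\n").encode())
        return sock.makefile().readline().strip()

def parse_cats_prediction(reply):
    """Parse an assumed CATS reply of the form 'action,pdf_value'."""
    action, pdf_value = reply.split(",")
    return float(action), float(pdf_value)

# e.g. parse_cats_prediction(query_daemon("| hour=morning price_sensitivity:0.4"))
```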
@olgavrou, thanks for the answer. I already launched VW for a discrete action space, so as I understand it, I should use the same syntax. Then the main question is: in what situations should I use --cats_pdf instead of --cats?
--cats runs --cats_pdf under the hood and then samples from the pdf for you. So you would use --cats when you want vw to do the pdf sampling for you, and --cats_pdf when you want the entire pdf and/or want to do the sampling yourself (see here: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/CATS,-CATS-pdf-for-Continuous-Actions)
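To make the cats_pdf side concrete: the returned pdf is, as I understand it, a list of (left, right, pdf_value) segments, and "doing the sampling yourself" amounts to inverse-transform sampling over those segments. A sketch under that assumption:

```python
import random

def sample_from_pdf(segments, rng=random.random):
    """Sample a continuous action from a piecewise-constant pdf.

    segments: list of (left, right, pdf_value) spans covering the action range.
    """
    # probability mass of each segment = width * density
    masses = [(right - left) * value for left, right, value in segments]
    total = sum(masses)
    u = rng() * total  # uniform draw over the total mass
    for (left, right, _), mass in zip(segments, masses):
        if u <= mass:
            # land proportionally inside this segment
            return left + (u / mass) * (right - left)
        u -= mass
    return segments[-1][1]  # guard against floating-point rounding
```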
@maxpagels_twitter Thanks a ton for your OPE tutorial and your really insightful questions above - I've found it extremely useful.
Currently I have a logged dataset generated from an online bandit policy with --epsilon 0.05 --cb_type dr. I want to determine whether I should be using mtr (IWR) as the cb_type for my online bandit (assuming I restart my policy in the future). I can run --cb_adf over the logged dataset:
vw --cb_adf -d train.dat -q AF --cb_type mtr
However, based on the OPE tutorial and the comments/questions above, I understand that I shouldn't compare the PV loss across different OPE estimators. Is there a method I should use to determine the best cb_type option for my online policy? (mtr shows a much lower loss than dr, but I understand this isn't really comparable.)
Please let me know if I'm thinking about this completely wrong, and whether I should continue to use doubly robust and spend my time tuning hyperparameters instead of focusing too much on the OPE estimator.