--cats
? have you experimented with that at all? For cats I would try different combinations of number of discrete actions used by the algorithm (passed in to the --cats arg) and bandwidths (bandwidth being a property of the continuous range). e.g. I would try a grid of num_actions [8, 16, 32, 64, 128, 256, 1024] and e.g. bandwidths [1, 2, 4, 6, 8, 10, 14, 20]. For different number of discrete actions you might need more data for CATS to converge to something sensible. CATS label support in pyvw should be available in the next release (coming soon-ish, we don't want to wait another year for the next vw release). Let me know if you get better results from CATS or not :)
prob(new_policy)/prob(logging_policy)
, but isn't this only for when we use IPS? I think I'm missing something quite obvious here...3.In order to use -explore_eval I have to convert my data from cb format to cb_adf format since the cb format is not supported when using -explore_eval. For the example data with two arms below, are the two ways to represent the data equivalent?:
2:10.02:0.5 | x0:0.47 x1:0.84 x2:0.29
1:8.90:0.5 | x0:0.51 x1:0.65 x2:0.67shared | x0:0.47 x1:0.84 x2:0.29
| a1
0:10.02:0.5 | a2shared | x0:0.51 x1:0.65 x2:0.67
0:8.90:0.5 | a1
| a2
cb_type ips/dm/dr
and choosing the one with the best reported loss. Isn't that wrong, especially considering dm is biased? --eval throws an error if you use DM.
OPE PR: VowpalWabbit/vowpalwabbit.github.io#193
@lalo @olgavrou et al, Note that I will need expert advice on this. There is a checklist that needs to be confirmed to absolute certainty or, if untrue, commented on to provide me with the correct interpretation.
Hi all, I had a problem when using SquareCB algorithm to train contextual bandit model, especially when saving & loading it again.
I trained and saved SquareCB model in this way (using the simulation setting as in https://vowpalwabbit.org/tutorials/cb_simulation.html):
vw = pyvw.vw("--cb_explore_adf -q UA -f squarecb.model --save_resume --quiet --squarecb")
num_iterations = 5000
ctr = run_simulation(vw, num_iterations, users, times_of_day, actions, get_cost)
plot_ctr(num_iterations, ctr)
vw.finish()
and then loaded the model :
vw_loaded=pyvw.vw('--cb_explore_adf -q UA -i squarecb.model')
num_iterations = 5000
ctr = run_simulation(vw_loaded, num_iterations, users, times_of_day, actions, get_cost, do_learn=False)
plot_ctr(num_iterations, ctr)
print(ctr[-1])
and the loaded model seems like doing a random exploration.
Could anyone explain how to save and load this model correctly? Thanks in advance.
Hey everybody. I've just started to use VW. And I'm solving a dynamic pricing problem where price is discrete action space (10 arms). Prices are cut on buckets, every bucket stores 10% of prices. My cost is CTR. My probability is constant 0.1 since I have 10 arms each of them appears in 10% of cases. My goal is to find optimal prices which lead to increasing CTR.
I know that CATs is better for my case but I prefer not using it as the first attempt.
I have the following questions questions:
1). What is the main difference between --cb and -cb_explore. As I understood --cb_explore just gives probabilities and --cb doesn't. I’ve noticed that it was mentioned that --cb doesn't do exploration and --cb_explore does. Am I right at this point?
2). VW requires the following format action: cost: probability. And probability here is nothing but pmf. Would it be right in my case just to set 0.1 for all cases.
3). I do kind of pretraining on logged data (existing dataset) to learn policy with parameters: --cb_explore 10 cover 13. After that I use a pre-trained model with the flag -i. I get the output with probas and take the highest proba as predicted value. Will I be exploring in this case?
Please forgive my naive questions and many thanks for answers in advance )