prob(new_policy)/prob(logging_policy), but isn't this only for when we use IPS? I think I'm missing something quite obvious here...

3. In order to use --explore_eval I have to convert my data from cb format to cb_adf format, since cb format is not supported with --explore_eval. For the example data with two arms below, are the two ways to represent the data equivalent?
2:10.02:0.5 | x0:0.47 x1:0.84 x2:0.29
1:8.90:0.5 | x0:0.51 x1:0.65 x2:0.67

shared | x0:0.47 x1:0.84 x2:0.29
| a1
0:10.02:0.5 | a2

shared | x0:0.51 x1:0.65 x2:0.67
0:8.90:0.5 | a1
| a2
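If the mapping between the two formats above holds, the conversion can be scripted. A minimal sketch in plain Python (the helper name and the fixed two-arm action set a1/a2 are my assumptions, not anything from VW itself):

```python
def cb_to_cb_adf(line, num_actions=2):
    """Convert a single-line cb example 'action:cost:prob | features'
    into a multi-line cb_adf example with one line per action."""
    label, features = line.split("|", 1)
    action, cost, prob = label.strip().split(":")
    out = ["shared |" + features.rstrip()]
    for a in range(1, num_actions + 1):
        if a == int(action):
            # the chosen action carries the label; the leading index is
            # written as 0, matching the example above
            out.append(f"0:{cost}:{prob} | a{a}")
        else:
            out.append(f"| a{a}")
    return "\n".join(out)

print(cb_to_cb_adf("2:10.02:0.5 | x0:0.47 x1:0.84 x2:0.29"))
```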
--cb_type ips/dm/dr
and choosing the one with the best reported loss. Isn't that wrong, especially considering that dm is biased? --eval throws an error if you use dm.
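For context on the prob(new_policy)/prob(logging_policy) ratio mentioned earlier in the thread, the IPS value estimate can be sketched in a few lines. The numbers here are made up purely for illustration; this is the textbook estimator, not VW's internal code:

```python
def ips_estimate(logged):
    """Inverse propensity scoring: average logged cost reweighted by the
    ratio of target-policy to logging-policy probabilities. Unbiased as
    long as the logging probabilities are correct, unlike dm."""
    return sum(cost * p_new / p_log for cost, p_log, p_new in logged) / len(logged)

# each tuple: (cost, logging prob of logged action, target-policy prob of it)
logged = [(10.02, 0.5, 0.9), (8.90, 0.5, 0.1)]
print(ips_estimate(logged))  # (10.02*1.8 + 8.90*0.2) / 2 = 9.908
```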
OPE PR: VowpalWabbit/vowpalwabbit.github.io#193
@lalo @olgavrou et al., note that I will need expert advice on this. There is a checklist that needs to be confirmed with absolute certainty or, where untrue, commented on to give me the correct interpretation.
Hi all, I ran into a problem when using the SquareCB algorithm to train a contextual bandit model, specifically when saving and then loading it again.
I trained and saved the SquareCB model as follows (using the simulation setup from https://vowpalwabbit.org/tutorials/cb_simulation.html):
vw = pyvw.vw("--cb_explore_adf -q UA -f squarecb.model --save_resume --quiet --squarecb")
num_iterations = 5000
ctr = run_simulation(vw, num_iterations, users, times_of_day, actions, get_cost)
plot_ctr(num_iterations, ctr)
vw.finish()
and then loaded the model:
vw_loaded=pyvw.vw('--cb_explore_adf -q UA -i squarecb.model')
num_iterations = 5000
ctr = run_simulation(vw_loaded, num_iterations, users, times_of_day, actions, get_cost, do_learn=False)
plot_ctr(num_iterations, ctr)
print(ctr[-1])
and the loaded model seems to be doing random exploration.
Could anyone explain how to save and load this model correctly? Thanks in advance.
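One thing worth checking (a guess on my part, not a confirmed diagnosis): the load command above omits --squarecb, so the loaded stack may fall back to the default exploration. Repeating the same exploration flags at load time would look like this on the CLI (file names are placeholders):

```shell
# train: SquareCB exploration, resumable model
vw --cb_explore_adf -q UA --squarecb --save_resume -f squarecb.model -d train.dat

# load: repeat the same exploration flags alongside -i, with -t for test-only mode
vw --cb_explore_adf -q UA --squarecb -i squarecb.model -t -d test.dat
```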
Hey everybody. I've just started to use VW, and I'm solving a dynamic pricing problem where price is a discrete action space (10 arms). Prices are cut into buckets, and every bucket holds 10% of the prices. My cost is CTR. My logged probability is a constant 0.1, since I have 10 arms and each of them appears in 10% of cases. My goal is to find optimal prices that increase CTR.
I know that CATS would suit my case better, but I prefer not to use it as a first attempt.
I have the following questions:
1). What is the main difference between --cb and --cb_explore? As I understand it, --cb_explore outputs probabilities and --cb doesn't. I've also seen it mentioned that --cb doesn't do exploration while --cb_explore does. Am I right on this point?
2). VW requires the format action:cost:probability, and the probability here is just the logging policy's pmf value for the chosen action. Would it be right in my case to simply set it to 0.1 everywhere?
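If the logging policy really was uniform over the 10 price buckets, then every logged probability is 0.1. A line can be built like this (the helper and the feature names are invented for illustration):

```python
def make_cb_line(action, cost, prob, features):
    """Format one logged event in VW's cb format: action:cost:probability | features."""
    feats = " ".join(f"{k}:{v}" for k, v in features.items())
    return f"{action}:{cost}:{prob} | {feats}"

# VW minimizes cost, so a reward such as CTR is usually logged as a negative cost.
# Uniform logging over 10 arms -> probability 0.1 for every event.
line = make_cb_line(action=3, cost=-0.12, prob=0.1, features={"hour": 14, "region": 2})
print(line)  # 3:-0.12:0.1 | hour:14 region:2
```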
3). I do a kind of pretraining on logged data (an existing dataset) to learn a policy with --cb_explore 10 --cover 13. After that I use the pre-trained model via the -i flag, get the output probabilities, and take the action with the highest probability as the prediction. Will I be exploring in this case?
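On question 3: taking the argmax of the pmf is pure exploitation; to actually explore you sample an action from the pmf and log the probability of whatever you drew. A sketch of the difference in plain Python (the pmf values are made up; this is not the VW API):

```python
import random

# example pmf over 10 arms, as --cb_explore might output it
pmf = [0.73, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03]

# exploitation: always pick the highest-probability arm
greedy_arm = max(range(len(pmf)), key=lambda i: pmf[i])

# exploration: sample an arm in proportion to the pmf,
# keeping its probability for logging the next round of data
sampled_arm = random.choices(range(len(pmf)), weights=pmf, k=1)[0]
logged_prob = pmf[sampled_arm]

print(greedy_arm)   # always 0
print(sampled_arm)  # usually 0, occasionally another arm
```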
Please forgive my naive questions and many thanks for answers in advance )
So as far as I know, Netflix actually does it so that the possible combos of e.g. title and image are predefined, and each combo forms a single arm. Of course the number of combos is massive, so I don't think they use all of them.
There has to be some prefiltering going on, since I suspect showing (title: crime dramas, show: top gear, picture: jurassic park) would lead to issues :). So I think they aren't using slates as slates are defined in VW, merely a large action space where one action is one predefined combo of title, genre, picture and so on.
I may also be wrong here.