Hi. I am using https://vowpalwabbit.org/tutorials/cb_simulation.html. I need to save the model and use it later, as the reward (feedback) comes to the system a few hours later.
vw1 = pyvw.vw("--cb_explore_adf -q UA --quiet --epsilon 0.2 save_resume=True")
num_iterations = 5000
ctr = run_simulation(vw1, num_iterations, users, times_of_day, actions, get_cost)
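For the delayed-reward part, one possible approach is to log each decision when it is made and build the labeled example only once the feedback arrives. The sketch below assumes the example layout used in the linked tutorial (a shared |User line followed by one |Action line per article, with the chosen action prefixed by 0:cost:probability). The names pending, record_decision and to_labeled_example are hypothetical, not part of VW. Persisting the model itself between sessions is a separate step (VW has the -f / -i and --save_resume options for that) and is not shown here.

```python
# Hypothetical in-memory store of decisions that are still waiting for feedback,
# keyed by an event id chosen by the application.
pending = {}

def record_decision(event_id, context, actions, chosen_index, prob):
    """Remember what was shown, which action was chosen, and with what probability."""
    pending[event_id] = {
        "context": context,
        "actions": list(actions),
        "chosen_index": chosen_index,
        "prob": prob,
    }

def to_labeled_example(event_id, cost):
    """Build the multiline training example once the delayed cost is known."""
    d = pending.pop(event_id)
    lines = ["shared |User user={} time_of_day={}".format(
        d["context"]["user"], d["context"]["time_of_day"])]
    for i, action in enumerate(d["actions"]):
        # Only the chosen action carries the 0:cost:probability label.
        label = "0:{}:{} ".format(cost, d["prob"]) if i == d["chosen_index"] else ""
        lines.append("{}|Action article={}".format(label, action))
    return "\n".join(lines)
```

The resulting string could then be passed to the learner hours later, provided the probability logged at decision time is the one stored with the event.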
Hello, can anyone tell me how to make a cb_explore_adf agent respond to requests in daemon mode properly? I send the multiline commands via echo ... | netcat ... like in the documentation and get no response. If I launch with the --audit flag, I receive a bunch of info with unintuitive formatting (see attachment). I assume the first value in each line is the action probability, and the very last line is some combined gradients or whatnot. This is very different from a pmf output like the one in the Python example on the website.
Sorry, couldn't attach the img to my prev message thread
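One thing that may be worth checking for the daemon question: in VW's text format a multiline example is terminated by an empty line, so a payload that ends right after the last action line may leave the daemon waiting for more input. The sketch below is an illustration under that assumption, not a confirmed fix. build_payload and query_daemon are hypothetical helpers, and host and port must match however the daemon was started.

```python
import socket

def build_payload(lines):
    """Join the lines of one multiline example and terminate it with a blank line."""
    return ("\n".join(lines) + "\n\n").encode("utf-8")

def query_daemon(lines, host, port, timeout=5.0):
    """Send one multiline example and return the raw text the daemon sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_payload(lines))
        sock.shutdown(socket.SHUT_WR)  # tell the server no more input follows
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")
```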
For some reason I get a pmf of size n+1 for data with n distinct actions.
When I train cb_explore_adf on datapoints with only 3 actions (no features apart from shared|...) and supply one of these examples for testing (with action:cost:proba removed, obviously), I get 4 actions in the output file. Why might that be?
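To compare the size of the predicted pmf with the number of action lines, a small parser can help. This sketch assumes each prediction line has the form action:probability,action:probability,... and that the action lines are the non-empty lines of the example that do not start with shared. parse_pmf and count_action_lines are hypothetical helper names.

```python
def parse_pmf(line):
    """Parse one prediction line into {action_index: probability}."""
    pmf = {}
    for pair in line.strip().split(","):
        action, prob = pair.split(":")
        pmf[int(action)] = float(prob)
    return pmf

def count_action_lines(example):
    """Count non-empty lines of a multiline example that are not the shared line."""
    return sum(
        1
        for ln in example.splitlines()
        if ln.strip() and not ln.lstrip().startswith("shared")
    )
```

If count_action_lines already returns n+1 for the test example, the extra entry comes from the input text rather than from the model.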
I trained the contextual bandit as vw1 = pyvw.vw("-d data/cb_load.dat --cb_explore_adf -q UA -P 1 --invert_hash mymodel.inverted") on https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/test/train-sets/cb_load.dat with the --invert_hash option, and I got the file mymodel.inverted.
I could understand the user and action features. What does 18107:0.137426 mean in "User^time_of_day=afternoonAction^article=politics:18107:0.137426"?
I think 18107 is the hash value for "User^time_of_day=afternoonAction^article=politics" and 0.137426 is the weight. Is this correct?
How can I get the probability corresponding to the user and action features from the weights?
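Reading each line of the inverted model as name:hash:weight, as suggested above, the fields can be split from the right so that any colon inside the feature name does not matter. The weights feed a predicted cost per action, not a probability; the probability would come from the exploration step on top of those costs. The sketch below assumes that reading of the file and plain epsilon-greedy exploration (the action with the lowest predicted cost gets 1 - epsilon + epsilon/K, every other action gets epsilon/K). parse_inverted_line and epsilon_greedy_pmf are hypothetical names.

```python
def parse_inverted_line(line):
    """Split 'name:hash:weight' from the right into its three fields."""
    name, hash_value, weight = line.strip().rsplit(":", 2)
    return name, int(hash_value), float(weight)

def epsilon_greedy_pmf(costs, epsilon):
    """Turn per-action predicted costs into epsilon-greedy action probabilities."""
    k = len(costs)
    best = min(range(k), key=lambda i: costs[i])  # lowest predicted cost wins
    pmf = [epsilon / k] * k
    pmf[best] += 1.0 - epsilon
    return pmf
```

Under these assumptions, the predicted cost of an action would be the sum of the weights of the features active for that user and action, and the pmf follows from comparing those sums across actions.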
options: --cb_adf --cb_explore_adf --cb_type mtr --csoaa_ldf multiline --csoaa_rank --quadratic UA
Hi all, is there any way to add an importance weight to offline training of contextual bandits? Similar to linear regression, where we specify an importance weight of 2 in the training example "1 2 second_house | price:.18 sqft:.15 age:.35 1976".
This would help reduce the training time of contextual bandits, as the training data points number in the billions. Many data points are repeated, so using an importance weight instead of duplicates would give a good reduction.
--epsilon value and then load with another?
I'm NG Sai, a final-year UG at IIIT Sri City. I got to know about the Microsoft RLOS programme via LinkedIn. My experience with open source includes contributing to C++ ML libraries such as shogun, tensorflow-lite support, and mlpack, where I did GSoC'21 and currently serve as a member. My github.
I came across the Safe Contextual Bandits project. Is this topic taken for this summer, or will it be available for the year 2022? My forte is implementing algorithms from research papers, so I wanted to inquire about this.
Thanks in advance!
vw -h) but didn't find them, and they are not in the documentation either. I'm interested in knowing the values for gamma_scale (I believe I saw in the presentation that it's set to 1000, but it would be good to confirm) and gamma_exponent.
vw.get_arguments() on the model also doesn't show the default values.
ccb shared | s_1 s_2
ccb action | a:1 b:1 c:1
ccb action | a:0.5 b:2 c:1
ccb action | a:0.5
ccb action | c:1
ccb slot | d:4
ccb slot 1:0.8:0.8,0:0.2 0,1,3 | d:7
[warning] Unlabeled example in train set, was this intentional?
I didn't understand ccb slot | d:4. What does it mean? Does it mean this is the fourth slot? In L7, what's happening? + the error term. I would be really grateful for an explanation, thanks.
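On the slot lines: going by the CCB input format, the part between ccb slot and the | is an optional label plus an optional list of action ids to include, and everything after the | is features. Under that reading, d:4 is a feature named d with value 4 rather than a slot index, and the slot line "ccb slot | d:4" simply carries no label. The sketch below splits such a line. parse_ccb_slot is a hypothetical helper, and the layout chosen:cost:probability[,action:probability...] followed by the included ids is an assumption taken from that format description.

```python
def parse_ccb_slot(line):
    """Split a 'ccb slot ...' line into label, included action ids, and features."""
    head, _, features = line.partition("|")
    tokens = head.split()[2:]  # drop the leading "ccb" and "slot"
    result = {
        "chosen": None,   # index of the action chosen for this slot
        "cost": None,     # observed cost for the chosen action
        "probs": {},      # action index -> logged probability
        "include": [],    # explicitly included action ids, if any
        "features": features.strip(),
    }
    for tok in tokens:
        if ":" in tok:
            # Label token: chosen:cost:prob, optionally followed by action:prob pairs.
            first, *rest = tok.split(",")
            action, cost, prob = first.split(":")
            result["chosen"] = int(action)
            result["cost"] = float(cost)
            result["probs"][int(action)] = float(prob)
            for pair in rest:
                a, p = pair.split(":")
                result["probs"][int(a)] = float(p)
        else:
            # Plain comma-separated list of action ids to include.
            result["include"] = [int(x) for x in tok.split(",")]
    return result
```

Applied to "ccb slot 1:0.8:0.8,0:0.2 0,1,3 | d:7", this reads as: action 1 was chosen with cost 0.8 and probability 0.8, action 0 had probability 0.2, and only actions 0, 1 and 3 were eligible for the slot.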
In CCB, what's the difference between the weight of an example and its cost? I thought weight is just the inverse of cost.
I tried feeding in the training data from the CCB page, and the weight seems to be equal to the example counter. Is that intentional?