Raphael Ottoni
@raphaottoni
I am asking because I am not sure whether the |Action namespace in cb_explore_adf treats this as a feature that is quadratically related to the reward.
This is the only explanation I could think of for why the model converges when the feature is a plain value, but stops converging when those values are multiplied by a constant (say, the price each arm represents).
Raphael Ottoni
@raphaottoni
0:-79.8:0.866666661699613 |Price value:399 |Action price=399
|Price value:499 |Action price=499
|Price value:599 |Action price=599
I am trying things like that... but it won't converge either =(
These are the curves... one would think it is easy to converge:
{ curve_type: "Gaussian", curve_id: "arm_1", mean: 20.0, std: 0.0},
{ curve_type: "Gaussian", curve_id: "arm_2", mean: 5.0, std: 0.0},
{ curve_type: "Gaussian", curve_id: "arm_3", mean: 4.0, std: 0.0}
Those are the arm_prices:
{"399": "arm_1", "499": "arm_2", "599": "arm_3"}
reward = Arm_value * Gaussian Sample
vw = pyvw.vw("--cb_explore_adf -q :: --epsilon 0.2")
Raphael Ottoni
@raphaottoni
Above is a graph of this setup. I really don't know why the agent chose the most expensive arm!
If I simply divide the rewards by 100 and run the very same experiment:
this thing bugs me
=(
I forgot to mention: the reward is actually -1 * arm_value * Gaussian sample
Max Pagels
@raphaottoni could you provide a github gist of your data?
Raphael Ottoni
@raphaottoni
There is no training data... just a "simulator", which is an object that returns a sample from those curves for the given arm_id:
{ curve_type: "Gaussian", curve_id: "arm_1", mean: 20.0, std: 0.0},
{ curve_type: "Gaussian", curve_id: "arm_2", mean: 5.0, std: 0.0},
{ curve_type: "Gaussian", curve_id: "arm_3", mean: 4.0, std: 0.0}
{"399": "arm_1", "499": "arm_2", "599": "arm_3"}
At each step VW chooses an arm, I sample from its curve and multiply the result by the arm's price. Then I flip the sign so that it acts as a reward instead of a cost, and then fit the model with it...
Raphael Ottoni
@raphaottoni
The problem appears to be solved if we apply a log function upon the reward.
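To make the scale issue concrete, here is a minimal sketch of the cost each arm produces under the setup described above (std is 0, so every sample equals the mean). The `/ 100` and log variants mirror the two workarounds mentioned in this thread. The exact log transform is not shown in the thread, so `-log(price * sample)` is an assumption, as are all function names.

```python
import math

# Arm means and prices copied from the messages above (std = 0, so a sample equals the mean).
ARM_MEANS = {"arm_1": 20.0, "arm_2": 5.0, "arm_3": 4.0}
ARM_PRICES = {"arm_1": 399, "arm_2": 499, "arm_3": 599}


def raw_cost(arm: str) -> float:
    """Cost as described in the thread: -1 * price * sample."""
    return -ARM_PRICES[arm] * ARM_MEANS[arm]


def scaled_cost(arm: str, divisor: float = 100.0) -> float:
    """The 'divide the rewards by 100' variant."""
    return raw_cost(arm) / divisor


def log_cost(arm: str) -> float:
    """Assumed form of the log fix: negate the log of the positive reward."""
    return -math.log(ARM_PRICES[arm] * ARM_MEANS[arm])


for arm in ARM_MEANS:
    print(arm, raw_cost(arm), scaled_cost(arm), round(log_cost(arm), 3))
```

All three variants rank arm_1 as the cheapest arm; only the magnitude changes. Raw costs in the thousands may lead to very large updates, which would be consistent with rescaling helping, although the thread does not confirm the cause.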
Max Pagels

there is no data training data... just a "simulator" which is a object that would return a sample from those curves regarding the arm_id:

if you have a simulator, you are training on some VW data somewhere. could you provide that dataset as a gist?

Bernardo Favoreto
Hey guys, I was checking out the Slates formulation out of curiosity, and it got me thinking. I could swear streaming services like Netflix used slates for recommendations. I know they personalize both the title recommendation and the thumbnail for each title. Thus, it seemed like the perfect use case for Slates (here, the title would be one slot and the image another): there is a single global reward (play or not), and the action sets are disjoint.
However, when trying to visualize how this would work in VW, I noticed that it probably wouldn't. What made me think this is that Slates predicts for all slots at once, and therefore there is no way we could first select the title, then pre-filter the possible thumbnails for that title, and then make a prediction for the thumbnail slot.
Am I missing something here? What are some use-cases of Slates for personalization using VW? The only one that comes into mind is "whole page optimization".
Thanks!
2 replies
@JuiP
Hi everyone, I was looking at the estimators repository issue: VowpalWabbit/estimators#1, we already have an implementation of ips estimator in Python. My question is why is "convert current IPS estimator to Python" mentioned as a Goal for this project? Can someone please clarify?
2 replies
pushpendre
@pushpendre

Hi I was wondering if I could get a pointer to the implementation of the --cb k --cb_type dr in the source code? Basically I am trying to understand the parameters that are learnt at the end of off-policy CB training in VW. E.g. I did

vw --cb 3 --cb_type ips  -f cb.model -d train.txt --invert_hash readable_ips.model
vw --cb 3 --cb_type dm  -f cb.model -d train.txt --invert_hash readable_dm.model
vw --cb 3 --cb_type dr  -f cb.model -d train.txt --invert_hash readable_dr.model

and the dr model obviously contains parameters equal to ips + dm, but I want to know exactly what linear regression formula is implemented in dr.
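For reference, these are the textbook per-action cost estimates for the three estimator types, written for a logged example with context $x$, observed action $a_{\text{obs}}$, observed cost $c_{\text{obs}}$ and logging probability $p_{\text{obs}}$, where $\hat{f}$ is a learned cost regressor. This is the standard formulation from the literature; whether VW's source matches it term by term is not confirmed in this thread, and the symbol names are chosen here for illustration.

```latex
\begin{align}
\hat{c}_{\mathrm{DM}}(x, a)  &= \hat{f}(x, a) \\
\hat{c}_{\mathrm{IPS}}(x, a) &= \frac{\mathbb{1}[a = a_{\text{obs}}]}{p_{\text{obs}}}\, c_{\text{obs}} \\
\hat{c}_{\mathrm{DR}}(x, a)  &= \hat{f}(x, a)
  + \frac{\mathbb{1}[a = a_{\text{obs}}]}{p_{\text{obs}}}\,\bigl(c_{\text{obs}} - \hat{f}(x, a)\bigr)
\end{align}
```

Under this formulation, dr needs a regressor $\hat{f}$ in addition to the policy, which would be consistent with the dr model containing both sets of parameters.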

3 replies
CP500
@CP500
Hi everyone, Just a newbie question on CATS. Does it give you a PMF when you call predict?
vw = pyvw.vw("--cats_pdf 7 --bandwidth 0.1 --min_value 0 --max_value 1")
ex = vw.parse('ca | c1:0.5 c2:1.3', labelType=8)
vw.predict(ex)
Bernardo Favoreto
Hey guys, I would like to know if anyone found an appropriate way of calculating feature importance after training a model?
I tried using sklearn/eli5 permutation methods but neither worked properly.
Then, I decided to code my own, where I first train a model, then do permutation importance on a held-out set. I'm a bit concerned as to whether the results are significant, mainly because of all the interactions created on the fly with VW. I should mention I am aware of the multicollinearity/correlation problem, and this is not my biggest concern.
Does it even make sense to calculate feature importance in VW? (I assume so because this was one of the topics from the VW presentation at https://slideslive.com/38942331/vowpal-wabbit.)
Thanks!
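The held-out permutation approach described above can be sketched as follows. This is a generic illustration, not VW-specific code: `toy_predict` is a hypothetical stand-in for a trained model, and examples are assumed to be plain feature dictionaries.

```python
import random
from typing import Callable, Dict, List

Example = Dict[str, float]


def mean_squared_error(predict: Callable[[Example], float],
                       examples: List[Example], targets: List[float]) -> float:
    return sum((predict(x) - y) ** 2 for x, y in zip(examples, targets)) / len(examples)


def permutation_importance(predict: Callable[[Example], float],
                           examples: List[Example], targets: List[float],
                           n_repeats: int = 5, seed: int = 0) -> Dict[str, float]:
    """Average increase in held-out MSE when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, examples, targets)
    features = sorted({name for x in examples for name in x})
    importance = {}
    for name in features:
        increases = []
        for _ in range(n_repeats):
            column = [x.get(name, 0.0) for x in examples]
            rng.shuffle(column)
            # Rebuild each example with the shuffled value for this feature only.
            permuted = [{**x, name: v} for x, v in zip(examples, column)]
            increases.append(mean_squared_error(predict, permuted, targets) - baseline)
        importance[name] = sum(increases) / n_repeats
    return importance


# Hypothetical stand-in for a trained model: it depends on feature "a" only.
def toy_predict(x: Example) -> float:
    return 3.0 * x["a"]


held_out = [{"a": float(i), "b": float(i % 3)} for i in range(20)]
targets = [toy_predict(x) for x in held_out]
scores = permutation_importance(toy_predict, held_out, targets)
print(scores)
```

Because the raw feature is permuted before the model sees it, any interaction terms built from that feature are permuted along with it, so the score reflects the feature's total contribution, interactions included.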
olgavrou
@olgavrou
@CP500 cats pdf should give you a pdf (probability density function) and not a pmf (probability mass function), as cats is predicting in a continuous action space. The PDF is in the form of (left:right:pdf_value) triples, so you could check that the pdf integrates to 1 by summing (right - left) * pdf_value over all the returned triples
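A minimal sketch of that integration check, assuming the prediction has already been converted into a list of `(left, right, pdf_value)` tuples. The helper name and the example values are hypothetical, not actual VW output.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, float]  # (left, right, pdf_value)


def pdf_total_mass(segments: Iterable[Segment]) -> float:
    """Integrate a piecewise-constant pdf: sum of width * height per segment."""
    return sum((right - left) * value for left, right, value in segments)


# Hypothetical piecewise-constant pdf over [0, 1] with a peak between 0.4 and 0.6.
example_pdf = [(0.0, 0.4, 0.5), (0.4, 0.6, 3.0), (0.6, 1.0, 0.5)]

mass = pdf_total_mass(example_pdf)
print(mass)
```

Compare the result to 1 with a small tolerance rather than exact equality, since the segment widths are floating-point differences.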
7 replies
Bernardo Favoreto
Hello everyone!
Does anyone know where (and if) I can find the notebook for the Estimators library? I would like to use this lib, but there's not much documentation/examples on how to do so.
Also, does it make sense to use this lib for CCBs?
Thanks!
Jack Gerrits
@jackgerrits
@Favoreto_B_twitter that repo is very much still a work in progress - so docs/examples are not there yet unfortunately. For CCB the approach that has been taken so far is to do CFE on the first slot only - so in this context I think it does make sense to use it. But it would need some adapting and I am not positive here.
9 replies
Max Pagels
Is there an estimate on when the pypi version of pyvw will have CATS support? 8.9.0 doesn't support CATS labels
Max Pagels
I'm fiddling around with CATS, and have a simple setup with a fixed context. Per round, I ask for an action (range 0-100) and calculate a cost that is zero at 50 and otherwise grows quadratically with the absolute distance from 50 in either direction. I've tried gridsearching a whole mess of bandwidth, epsilon and learning rate values, but the learning is just all over the place. I would have expected the system to converge to an optimal prediction of 50.0 per round pretty easily since the context is always fixed. Instead, it either bounces around or gets stuck on some non-optimal values around 40. Any tips?
olgavrou
@olgavrou
Hi @maxpagels_twitter what is the parameter you pass to --cats? have you experimented with that at all? For cats I would try different combinations of number of discrete actions used by the algorithm (passed in to the --cats arg) and bandwidths (bandwidth being a property of the continuous range). e.g. I would try a grid of num_actions [8, 16, 32, 64, 128, 256, 1024] and e.g. bandwidths [1, 2, 4, 6, 8, 10, 14, 20]. For different number of discrete actions you might need more data for CATS to converge to something sensible. CATS label support in pyvw should be available in the next release (coming soon-ish, we don't want to wait another year for the next vw release). Let me know if you get better results from CATS or not :)
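The suggested grid can be enumerated with a short sketch like the one below. It only builds argument strings and defines the quadratic cost from the setup described earlier; the helper names are hypothetical, and the 0-100 range is taken from that setup.

```python
from itertools import product
from typing import List

# Values suggested in the message above.
NUM_ACTIONS = [8, 16, 32, 64, 128, 256, 1024]
BANDWIDTHS = [1, 2, 4, 6, 8, 10, 14, 20]


def quadratic_cost(action: float, target: float = 50.0) -> float:
    """Cost that is zero at the target and grows with the squared distance."""
    return (action - target) ** 2


def cats_arg_grid(min_value: int = 0, max_value: int = 100) -> List[str]:
    """One VW argument string per (num_actions, bandwidth) combination."""
    return [
        f"--cats {n} --bandwidth {b} --min_value {min_value} --max_value {max_value}"
        for n, b in product(NUM_ACTIONS, BANDWIDTHS)
    ]


grid = cats_arg_grid()
print(len(grid), grid[0])
```

Each string could then be passed to a separate VW instance and scored by average cost over the run. With this cost, values reach 2500 at the ends of the range; dividing by 2500 would bound the cost in [0, 1], though whether that helps convergence here is untested.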
Max Pagels
I tried gridsearching a whole mess of options, including a bunch of action counts, and can get relatively close to an optimum, but the hyperparams seem to be super important to get just right or the learning is way off. But I'll experiment further and report back
3 replies
Bernardo Favoreto
Hello guys!
Can someone help me understand why propensity scores are important when training a CB?
Let's take epsilon-greedy, for example. When we train a CB model with epsilon-greedy, the pmf output is always the same (just the indexes change). This makes me assume that propensity scores aren't supposed to teach the CB how to output probabilities. Moreover, I believe they are used for "importance weighting", i.e., prob(new_policy)/prob(logging_policy), but isn't this only for when we use IPS? I think I'm missing something quite obvious here...
Also, when we offline train a new model using logged CB data, how is the new CB able to achieve better performance than the logging policy? I mean, it's an excellent thing, but I would like to understand how that is possible.
Thanks!
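As a rough illustration of the importance-weighting idea in the question above, here is a minimal inverse propensity scoring (IPS) sketch for a deterministic target policy. The data, the policies, and the function names are all made up for the example; this is not VW's implementation.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, float]
# (context, logged action, observed cost, probability the logging policy gave that action)
LoggedExample = Tuple[Context, int, float, float]


def ips_estimate(logged: List[LoggedExample],
                 policy: Callable[[Context], int]) -> float:
    """Estimate the average cost of `policy` from data logged by another policy."""
    total = 0.0
    for context, action, cost, propensity in logged:
        # Only examples where the target policy agrees with the log contribute,
        # reweighted by 1 / propensity to undo the logging policy's sampling bias.
        if policy(context) == action:
            total += cost / propensity
    return total / len(logged)


# Made-up log: uniform random logging over two actions; action 0 costs 1, action 1 costs 0.
logged = [({}, 0, 1.0, 0.5), ({}, 1, 0.0, 0.5), ({}, 0, 1.0, 0.5), ({}, 1, 0.0, 0.5)]


def always_0(context: Context) -> int:
    return 0


def always_1(context: Context) -> int:
    return 1


print(ips_estimate(logged, always_0), ips_estimate(logged, always_1))
```

In this toy log the logging policy's average cost is 0.5, while the estimate for `always_1` is 0.0. Because the logging policy explored, the log contains evidence about actions it did not favor, which is how a policy trained offline can end up better than the policy that generated the data.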
George Fei
@georgefei
Hi all, I have a few questions related to contextual bandit evaluation:
1. How do I compare the performance of different policies’ decisions using --eval? Do I look at the average loss in the output? If the costs in the input data are all negative and a lower cost is better, does a lower average loss mean one policy is better? What does average loss represent?
2. How do I interpret the output of --explore_eval? More specifically, the update count, violation count, and final multiplier (what variables do they correspond to in the algorithm on slide 9 of https://pdfs.semanticscholar.org/presentation/f2c3/d41ef70df24b68884a5c826f0a4b48f17095.pdf)? Do I also look at the average loss to compare different exploration algo + hyperparameter combinations?

3. In order to use --explore_eval I have to convert my data from cb format to cb_adf format, since the cb format is not supported when using --explore_eval. For the example data with two arms below, are the two ways to represent the data equivalent?

2:10.02:0.5 | x0:0.47 x1:0.84 x2:0.29
1:8.90:0.5 | x0:0.51 x1:0.65 x2:0.67

shared | x0:0.47 x1:0.84 x2:0.29
| a1
0:10.02:0.5 | a2

shared | x0:0.51 x1:0.65 x2:0.67
0:8.90:0.5 | a1
| a2
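The textual conversion used in the example above can be sketched as follows. The function name is hypothetical, and the `a1`, `a2` action features and the `0` action index in the ADF label simply follow the example. The sketch only rewrites the text; it says nothing about whether the two formats lead to equivalent learned models.

```python
from typing import List


def cb_to_cb_adf(line: str, num_actions: int) -> List[str]:
    """Rewrite one cb-format line as the lines of one cb_adf multi-line example."""
    label, features = line.split("|", 1)
    action, cost, prob = label.strip().split(":")
    chosen = int(action)  # cb-format actions are 1-based
    lines = [f"shared | {features.strip()}"]
    for a in range(1, num_actions + 1):
        # Only the chosen action carries the cost:probability label.
        prefix = f"0:{cost}:{prob} " if a == chosen else ""
        lines.append(f"{prefix}| a{a}")
    return lines


cb_lines = [
    "2:10.02:0.5 | x0:0.47 x1:0.84 x2:0.29",
    "1:8.90:0.5 | x0:0.51 x1:0.65 x2:0.67",
]

# Multi-line examples are separated by a blank line in the output file.
print("\n\n".join("\n".join(cb_to_cb_adf(line, 2)) for line in cb_lines))
```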

Wes
@wmelton
Hello all - I've been evaluating Microsoft Personalizer for our company, which I have largely assumed is VW under the hood with MS-specific tech/service written on top of it.
My question is this: within a given namespace, does the order of features or their names matter? I'm assuming yes, but the VW documentation out there doesn't make it super clear how to handle a situation where two given documents have the same keywords in them but, after tokenization, the keywords are not in the same order due to variance in the number of keywords found in each document. I'd appreciate guidance there.
Finally, I referenced Personalizer only because it sparked this train of thought, largely because its documentation uses only the JSON format for input data but seems to omit any instruction regarding variation in keyword order when your features are keywords extracted from a document. Thanks!
18 replies
Max Pagels
@georgefei did you already get an answer to your questions? I'd be very interested in them, too
Particularly, I imagine lots of folks do evaluation by gridsearching learning with cb_type ips/dm/dr and choosing the one with the best reported loss. Isn't that wrong, especially considering dm is biased? --eval throws an error if you use DM.
George Fei
@georgefei
related to my third question above about whether the same data in cb and cb_adf formats are going to yield the same result. Not sure about the vw implementation but in the contextual bandit bake-off paper, the reward estimation is formulated differently for each case:
49 replies
Max Pagels
I think there is a very clear need for a policy evaluation tutorial on vowpalwabbit.org. I'd be happy to write one, assuming someone can help answer questions as they arise, since I have a couple of outstanding ones myself. Would folks find this valuable?
10 replies
Max Pagels