Hi there, I have been pointed to this channel regarding my query; it seems this is a perfect question for you.
For my online advertising project I'm using Thompson Sampling to try to optimise which of my headlines yields the most clicks. I'm using Thompson Sampling because, after reading several papers, it looks like it is one of the best-suited approaches when there is delayed feedback (I can only update the TS params every 30 minutes).
My question is: is there a general formula for TS that would tell me how many experiences (ad impressions in my scenario) I would need in order for the optimisation to be considered effective?
Hi @pmineiro, thank you for the clarification. I haven't had the chance yet to fully play with VW because (for now) I just want to do something very basic (with no context), so I went and chose TS, which was simple to understand and to code.
Drawing the distribution is the solution I'm currently using to see how the variance evolves over time, and the only way I've found so far is, like you mentioned, to simulate with different params to see how many impressions are needed in order to reach, for example, 95% confidence.
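To be concrete, here's roughly the kind of simulation I'm running (a minimal sketch; the click rates are made up, and I'm assuming Bernoulli clicks with Beta(1,1) priors):

import numpy as np

true_ctrs = [0.030, 0.025, 0.020]   # hypothetical click-through rates per headline
alpha = np.ones(len(true_ctrs))     # Beta posterior: 1 + clicks
beta = np.ones(len(true_ctrs))      # Beta posterior: 1 + non-clicks
rng = np.random.default_rng(0)

for impression in range(1, 500_001):
    arm = int(np.argmax(rng.beta(alpha, beta)))   # Thompson sample each arm, play the max
    click = rng.random() < true_ctrs[arm]
    alpha[arm] += click
    beta[arm] += 1 - click
    if impression % 1000 == 0:
        # Monte Carlo estimate of P(headline 0 is best) under the current posterior
        draws = rng.beta(alpha, beta, size=(2000, len(true_ctrs)))
        if (draws.argmax(axis=1) == 0).mean() >= 0.95:
            print(f"reached 95% confidence after {impression} impressions")
            break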
But I was wondering if there are some general formulas that would apply to any MAB algorithm.
vw --cb_explore_adf --bootstrap 100 -d train.dat to get confidence intervals on the final PVL (progressive validation loss)
Add --progress 1 to print the loss of each training example.
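Putting both together (assuming train.dat is CB ADF data):

vw --cb_explore_adf --bootstrap 100 --progress 1 -d train.dat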
I'm new here! It's been a few years since I've used VW, so I'm really glad to have found this community :) I'm currently writing the second edition of my book Data Science at the Command Line, and VW will play a big role in Chapter 9: Modeling Data. I'm also working on the Data Science Toolbox, which will include VW and many other command-line tools.
I was wondering, when installing VW via pip, is the command-line tool vw also installed? The documentation seems to suggest so, but I'm unable to locate it. I'm on Ubuntu.
pip will not install the command-line tool as far as I know. https://vowpalwabbit.org/start.html has info about how to get the C++ command-line tool by building from source (or via brew on macOS). Please feel free to reach out to me if you have any more questions!
Hi all, I’m a newbie to contextual bandits and learning to use VW.
Could anyone help me understand if I'm using it correctly?
Problem: I have a few hundred thousand historical data points and I want to use them to learn a warm-start model. I saw some tutorials in the wiki showing how to do this with the CLI, but I wonder if I can use the Python version in this way, assuming the data has already been formatted:
vw = pyvw.vw("--cb 20 -q UA --cb_type ips")
for i in range(len(historical_data)):
    vw.learn(historical_data[i])
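(where each historical_data[i] is a plain CB-format string whose label is action:cost:probability, e.g. something like the following with made-up feature names)

3:0.5:1.0 |U age_25 region_us |A topic_sports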
my questions are:
1) Is this the correct way to warm start the model?
2) If so, what probability should I use for each training instance? If the logging policy was deterministic, I guess it would be 1.0?
3) For exploitation/exploration after having this initial model, can I save the policy and then apply --cb_explore 20 -q UA --cb_type ips --epsilon 0.2 -i cb.model to continue the learning (roughly as sketched below)?
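Concretely, the flow I have in mind for (1) and (3) is something like this (a rough sketch on my side; the model path and toy data are made up):

from vowpalwabbit import pyvw

historical_data = ["3:0.5:1.0 |U age_25 region_us |A topic_sports"]  # toy stand-in for my logs

# 1) warm start offline
vw = pyvw.vw("--cb 20 -q UA --cb_type ips")
for ex in historical_data:
    vw.learn(ex)
vw.save("cb.model")

# 3) later, continue learning online with exploration, loading the saved model:
#    vw --cb_explore 20 -q UA --cb_type ips --epsilon 0.2 -i cb.model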
Thanks for the help in advance!
Hi guys, I am working on a project similar to a news recommendation engine, which predicts the most relevant articles given a user feature vector. I wanted to use VW's contextual bandits for this.
I have tried using VW, but it seems that VW only outputs a single action per trial. Instead, I want some sort of ranking mechanism so that I can get the top k articles per trial, roughly as sketched below.
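Conceptually, something like this (just a sketch of what I'm after; I'm assuming predict with --cb_explore_adf returns one probability per action, and the features are made up):

from vowpalwabbit import pyvw

vw = pyvw.vw("--cb_explore_adf -q UA --quiet")
adf_example = [                      # shared context plus one line per candidate article
    "shared |U age_25 region_us",
    "|A topic_sports",
    "|A topic_finance",
    "|A topic_travel",
]
probs = vw.predict(adf_example)      # PMF over the candidate articles
top_k = sorted(range(len(probs)), key=lambda a: probs[a], reverse=True)[:2]
print(top_k)                         # indices of the top 2 articles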
Is there any way to use VW for such a use case?
I have asked this question on Stack Overflow as well. (https://stackoverflow.com/questions/63635815/how-to-learn-to-rank-using-vowpal-wabbits-contextual-bandit )
Thanks in Advance.
Hi! Thanks to VW authors for the CCB support, finding it very useful!
Quick question: how is offline policy evaluation handled for CCBs in VW? IPS, DM, something else? Was wondering if there is a paper I can read about this. Was looking into https://arxiv.org/abs/1605.04812 but wasn't sure this estimator is the one VW uses specifically for CCBs.
@pmineiro excellent, thanks for the response.
A second question: let's say I have collected bandit data from several policies deployed to production one after the other, i.e. thought of as a whole, it is nonstationary.
Can I use all of the logged data to train a new policy, even though the logged data is generated by X different policies? If so, are ips/dm/dr all acceptable choices or do they break against nonstationary logged data?
How about offline evaluation of a policy? This paper https://arxiv.org/pdf/1210.4862.pdf suggests that IPS can't be used; is explore_eval the right option?
What I'm looking for is the "correct" way for a data scientist to offline-test and learn new policies, possibly with different exploration strategies, using as much data as possible from N previous deployments with N different policies. The same question also applies to automatic retraining of policies on new data as part of a production system; I'm unsure of the "proper" way to do it. For concreteness, the kind of invocation I'm imagining is below.
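(Just my guess from reading the wiki, so the flags may well be off:)

vw --explore_eval --epsilon 0.2 -d logged_data.dat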
Nice, thanks! I've used the Personalizer service, just curious as to how it works under the hood. So with IPS & DM it's OK to train a model on logged dataset A -> deploy the model -> collect logged data B -> train on A+B -> repeat with an ever-growing dataset?
What is the purpose of