I don't know what you mean by
multiple lines where cost label will come
2,3,4,5 is not a CB label, you can see CB labels here
cost-sensitive one-against-all algorithm?
@Sharad24 from the exercises page:
Send the code change as a diff (use git diff) and the output of vw --version with your change
Hi everyone! I have been advised to come to this Gitter since you guys are the experts on this subject.
I have a small project where I need to show different adverts to a bunch of people. What I have been doing so far is simply showing all the ads to everyone at random for a period of time, then taking the best performing one (highest CTR) and from there on only showing 'that' ad to everyone else.
This works in a sense, but the 'best' performing ad from the testing phase doesn't always perform best in the exploitation phase.
So I wanted to 'up' the game and try to smartly learn which ad is performing well based on its past success and slowly converge to showing that ad (knowing that the success of an ad might change over time, as described above). And so I discovered the MAB problem, which I think might be perfect for my use case.
I have been reading quite a lot of papers on the subject, and they all agree that algorithms such as UCB1 or Thompson Sampling are a great solution for display advertising use cases.
However, after reading all these papers and seeing how the algorithms were implemented, they all run UCB or TS on 'every single' user event. So somehow, for every single 'impression' they know whether it yielded a click or not, and from there they do the computation on every single event to decide which 'arm' to pull (which ad to show).
In my case, since I'm dealing with high traffic, I cannot do this. Instead, I have a system that aggregates all the ad events (clicks, impressions, etc.) in real time.
So, for example, I can aggregate the click counts and impression counts for any ad over the 'past 5 minutes'.
So if I were to use a MAB, the algorithm would receive as input the aggregated sums of events (sum of clicks and sum of impressions, and thus the true CTR) for the past 5 minutes (5 minutes here is just an example).
This is where I got a bit confused, because I'm not exactly sure whether I can use the standard forms of TS or UCB on aggregated data like this.
Would it be correct to simply feed the aggregated numbers into the UCB formula, for example?
Or use them to compute the Beta distribution for TS?
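In case it helps later readers: batching doesn't break either algorithm, because Beta-Bernoulli Thompson Sampling and UCB1 both only need the running click/impression totals per ad, which your aggregated 5-minute sums give you directly. A minimal sketch in plain Python (the names `clicks`, `impressions`, `update`, etc. are my own, not from any library):

```python
import math
import random

# Running totals per ad, folded in from the aggregated feed every window.
clicks = {"ad_a": 0, "ad_b": 0}
impressions = {"ad_a": 0, "ad_b": 0}

def update(ad, batch_clicks, batch_impressions):
    """Fold one aggregation window's counts into the running totals."""
    clicks[ad] += batch_clicks
    impressions[ad] += batch_impressions

def pick_thompson():
    """Thompson Sampling: sample a CTR from each ad's Beta posterior."""
    def sample(ad):
        # Beta(1 + clicks, 1 + non-clicks): posterior under a uniform prior.
        return random.betavariate(1 + clicks[ad],
                                  1 + impressions[ad] - clicks[ad])
    return max(clicks, key=sample)

def pick_ucb1(total_impressions):
    """UCB1: empirical CTR plus an exploration bonus."""
    def score(ad):
        n = impressions[ad]
        if n == 0:
            return float("inf")  # force at least one window per ad
        return clicks[ad] / n + math.sqrt(2 * math.log(total_impressions) / n)
    return max(clicks, key=score)

update("ad_a", 12, 1000)   # aggregated counts from one 5-minute window
update("ad_b", 30, 1000)
total = sum(impressions.values())
print(pick_ucb1(total))    # prints ad_b: higher CTR wins here
```

One caveat: with batching, the allocation within a window is fixed, so you choose once per window and update once per window. The literature calls this the "batched bandits" setting, and the regret guarantees degrade gracefully with batch size rather than breaking.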
--eval (not included in the current argument listing on the wiki but referenced in the tests) to simulate this policy and get an average loss on a test set. I naively expected to use
--testonly to do the same for the vw-learned policies, but the loss reported is dismal in comparison. However, if I use the actual actions predicted by those runs and run
--eval on those, the losses are much better. What is the difference between what
--testonly is doing vs. running
--eval on the outputs of --testonly?
--eval (which I was unaware of until now) doesn't do any learning or even prediction; it just takes whatever label is present in the input and computes the multiclass loss function experienced by the policy ... I've never used it, but looking at the code it appears to be designed to be used with either 1) non-vw policies from the universe, or 2) a vw policy that has already had its predictions placed in the file as cb-labeled examples ... perhaps the intention was an apples-to-apples comparison of a vw policy with some exogenous baseline system
--testonly, which provides both predictions and an average loss. I take those predictions, put them as the first column in the test data file, then run
--eval on that file: it now has both the action predicted by the trained model and the action taken by the exploration policy.
--testonly run is very different from the run using
--eval on the same predictions.
--eval on my constant policy. I somehow assumed that doing an eval this way was independent of the model, but I guess it still loads, for example, the accumulated loss estimates from that file, so the loss estimates will be different.
output_example ( https://github.com/VowpalWabbit/vowpal_wabbit/blob/ac3a2c21a9760b68ce49368b11a35bf95faeb8b8/vowpalwabbit/cb_algs.cc#L96 ) which calls
get_cost_estimate ( https://github.com/VowpalWabbit/vowpal_wabbit/blob/ac3a2c21a9760b68ce49368b11a35bf95faeb8b8/vowpalwabbit/cb_algs.h#L64 ). Confusingly,
get_cost_estimate is reporting the loss of the surrogate doubly-robust objective ( https://github.com/VowpalWabbit/vowpal_wabbit/blob/ac3a2c21a9760b68ce49368b11a35bf95faeb8b8/vowpalwabbit/gen_cs_example.h#L130 ) when a model is loaded. If you rerun your repro with
--cb_type ips you will see more consistency. Since (the default) doubly-robust is generally helpful, I would suggest rendering all decisions from the vw policy to a file and then calling
--eval on the rendered file without loading the model, to compare with the decisions from your exogenous policy.
--cb_type ips when calling
--cb_type dr there makes the production policy look way better than the learned policies. I think it's because DR never uses the losses on anything but the one static action in the static policy, whereas the other policies pick other actions, and DR often starts estimating massive losses even when they haven't switched actions much (I added examples of this here https://colab.research.google.com/drive/1mX6rnF8ZTER_vCyDPPGPjM8yYn56Jdv2#scrollTo=VRfgxBEEkau9&line=2&uniqifier=1 and below that).
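For anyone following along, the difference between the two estimators is easy to see in isolation. A toy sketch of the IPS and DR cost estimates for a single logged example (my own function and variable names, not VW code), which shows how a bad reward model can dominate the DR estimate for actions the logged policy never took:

```python
def ips_estimate(policy_action, logged_action, logged_cost, logged_prob):
    """Inverse propensity scoring: unbiased, but only the logged action
    (reweighted by its propensity) ever contributes."""
    if policy_action == logged_action:
        return logged_cost / logged_prob
    return 0.0

def dr_estimate(policy_action, logged_action, logged_cost, logged_prob,
                predicted_cost):
    """Doubly robust: reward-model estimate plus an IPS correction term.
    predicted_cost(a) is a learned cost model; when the policy picks an
    action the logs never took, the estimate is the model's guess alone."""
    correction = 0.0
    if policy_action == logged_action:
        correction = (logged_cost - predicted_cost(logged_action)) / logged_prob
    return predicted_cost(policy_action) + correction

# Logged event: action 1 taken with probability 0.5, observed cost 1.0.
model = {1: 0.9, 2: 5.0}.get  # hypothetical learned cost model
print(ips_estimate(1, 1, 1.0, 0.5))        # 2.0 (cost / propensity)
print(dr_estimate(2, 1, 1.0, 0.5, model))  # 5.0, driven entirely by the model
```

This is why a static policy that always matches the logged action can look artificially good (or bad) under DR relative to policies that deviate: the deviating policies inherit the model's cost estimates wholesale.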
--eval? I guess I can calculate that or check the source, just in case you know...