    Jack Gerrits
    @jackgerrits

    I don't know what you mean by "multiple lines where cost label will come".
    2,3,4,5 is not a CB label; you can see the CB labels here

    Shivansh Mundra
    @Shivanshmundra
    Sorry, by the label I mean: for every feature line like 91:1.0 92:1.0 1:1.0 2:1.0 93:1.0 94:1.0 4:1.0 95:1.0, we have multiple correct actions, like 2,3,4,5. This kind of data comes from tag selection on a text paragraph. So how do we simulate this kind of data?
    Jack Gerrits
    @jackgerrits
    This response from Paul should help answer your question VowpalWabbit/vowpal_wabbit#2262
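    (For reference, a minimal sketch of simulating such "multiple correct actions" data as cost-sensitive one-against-all examples; the action count, feature ids, and file name here are made up:)

        import random

        # csoaa labels are "action:cost" pairs, so an example with several
        # zero-cost actions has several equally correct answers.
        NUM_ACTIONS = 5

        def make_example(features, correct_actions):
            costs = " ".join("%d:%.1f" % (a, 0.0 if a in correct_actions else 1.0)
                             for a in range(1, NUM_ACTIONS + 1))
            feats = " ".join("%d:1.0" % f for f in features)
            return "%s | %s" % (costs, feats)

        with open("csoaa_train.dat", "w") as out:
            for _ in range(1000):
                features = random.sample(range(1, 100), 8)
                correct = set(random.sample(range(1, NUM_ACTIONS + 1), 3))
                out.write(make_example(features, correct) + "\n")

        # train with: vw --csoaa 5 csoaa_train.dat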
    Shivansh Mundra
    @Shivanshmundra
    Thanks @jackgerrits, the above discussion and http://users.umiacs.umd.edu/~hal/tmp/multiclassVW.html cleared up some doubts!
    Also, do you know how we can generate a graph from the verbose output via Python or the command-line interface? The --progressive flag dumps data in a non-uniform way, so is there an existing function for this?
    Jack Gerrits
    @jackgerrits
    I'm sorry, I don't understand what you are asking
    Shivansh Mundra
    @Shivanshmundra
    [image: average loss plot from training]
    Sorry for not making it clear: we make a loss graph during training, such as this one in the CB content personalization notebook.
    Is there any built-in method that can give this graph, or the loss data in a structured format, for simpler methods like the cost-sensitive one-against-all algorithm?
    Jack Gerrits
    @jackgerrits
    No, there isn't an inbuilt way to produce a graph like this right now.
    You can query things like the loss of each example, or the model's sum loss so far,
    so you could reasonably produce a similar plot for loss.
    But CTR is not queryable at the moment; it is a bit more specific to the scenario.
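    (A minimal sketch of that approach with the Python bindings, assuming a cb-formatted data file train.dat and that the workspace exposes get_sum_loss() as in the pyvw API:)

        import matplotlib.pyplot as plt
        from vowpalwabbit import pyvw

        vw = pyvw.vw("--cb 4 --quiet")
        avg_loss = []
        with open("train.dat") as f:
            for i, line in enumerate(f, 1):
                vw.learn(line.strip())
                # running sum of loss divided by examples seen so far
                # gives a progressive average loss curve
                avg_loss.append(vw.get_sum_loss() / i)
        vw.finish()

        plt.plot(avg_loss)
        plt.xlabel("examples seen")
        plt.ylabel("average loss")
        plt.show()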
    Shivam Kumar Jha
    @thealphadollar
    Hey @jackgerrits, I wanted to know whether we are allowed to get a review of our execution plan from you and the other mentors?
    Sharad Chitlangia
    @Sharad24
    @jackgerrits For the C++ screening exercise, just a .txt with the code diff should be enough, right?
    Jack Gerrits
    @jackgerrits
    @thealphadollar We are not able to review them due to time pressures

    @Sharad24 from the exercises page:

    Send the code change as a diff (use git diff) and the output of vw --version with your change

    Eric Madisson
    @Banjolus_twitter

    Hi everyone! I have been advised to come to this Gitter since you are the experts on this subject.

    I have a small project where I need to show different adverts to a bunch of people. What I have been doing so far is simply showing all the ads to everyone at random for a period of time, then taking the best-performing one (highest CTR) and from then on showing only that ad to everyone else.
    This works in a sense, but the ad that performed best during the testing phase often doesn't perform best during the exploitation phase.

    So I wanted to up my game and try to smartly learn which ad is performing well based on its past success, and slowly converge to showing that ad (knowing that the success of an ad might change over time, as described above). That is how I discovered the MAB problem, which I think might be perfect for my use case.

    I have been reading quite a lot of papers on the subject, and they all agree that algorithms such as UCB1 or Thompson Sampling are a great solution for display-advertising use cases.

    However, after reading all these papers and seeing how the algorithms were implemented, I noticed they perform UCB or TS on every single user event. Somehow, for every single impression, they know whether it yielded a click or not, and they run the computation on every event to decide which arm to pull (which ad to show).

    In my case, since I'm dealing with high traffic, I cannot do this. Instead, I have a system that aggregates all the ad events (clicks, impressions, etc.) in real time.
    So, for example, I can aggregate the click counts and impression counts for any ad over the past 5 minutes.

    So if I were to use a MAB, the algorithm's input would be the aggregated sums of events (sum of clicks and sum of impressions, and therefore the true CTR) for the past 5 minutes (5 minutes here is just an example).

    This is where I got a bit confused, because I'm not exactly sure whether I can use the standard form of TS or UCB on aggregated data like this.
    Would it be correct to simply feed the aggregated numbers into, for example, the UCB formula?
    Or to use them to compute the Beta distribution for TS?

    Paul Mineiro
    @pmineiro
    @Banjolus_twitter ... first let me emphasize that UCB and TS based approaches are only good when you have relatively little contextual information available. If you have no contextual information available (and the environment is stationary), you'll actually find UCB and TS to be much better than the algorithms we employ for contextual bandits in VW. This advantage continues if your contextual information is restricted to a small-cardinality categorical (i.e., if you have a small number of contexts that you can count and they are completely distinct, so that the contexts are merely "context 1", "context 2", ...) and you run UCB or TS independently on each context. This becomes increasingly inefficient as the number of contexts increases, and when you get to contexts with rich features (e.g., contexts with features from the text, web graph, inbound search, etc.), running UCB or TS requires either ignoring or clustering the contextual information, and at that point actual contextual bandit algorithms dominate. Similar considerations apply when you have information about the "arms" (e.g., contextual bandit algorithms can work with advertisements which only occur once by leveraging the features on the advertisements).
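    (To make "run UCB or TS independently on each context" concrete, a minimal sketch of per-context Thompson Sampling; all names are hypothetical:)

        import random

        class PerContextTS:
            def __init__(self, n_arms):
                self.n_arms = n_arms
                self.stats = {}  # context id -> (clicks, impressions) per arm

            def _get(self, context):
                return self.stats.setdefault(
                    context, ([0] * self.n_arms, [0] * self.n_arms))

            def choose(self, context):
                clicks, imps = self._get(context)
                # one independent Beta posterior per (context, arm) pair
                samples = [random.betavariate(1 + clicks[a], 1 + imps[a] - clicks[a])
                           for a in range(self.n_arms)]
                return max(range(self.n_arms), key=samples.__getitem__)

            def record(self, context, arm, clicked):
                clicks, imps = self._get(context)
                imps[arm] += 1
                clicks[arm] += int(clicked)

    (Note how self.stats grows with the number of distinct contexts, which is the inefficiency Paul points out.)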
    Eric Madisson
    @Banjolus_twitter
    Hi Paul, for the moment I will ignore any context. All I have is the number of clicks and the number of impressions. I was going to look into contextual bandits once I had the simple bandit up and running.
    (In the future I will be able to have context such as device, country, gender, etc., but for now, only clicks and impressions.)
    Paul Mineiro
    @pmineiro
    @Banjolus_twitter ... So the best answer to your question is, "use a contextual bandit algorithm, which among other things is robust to delays in receiving the reward." Having said that, I have seen TS be effective in a scenario where the number of contexts was small (circa 10) and the content being personalized was relatively static. Because TS essentially does a softmax, the policy randomizes between viable alternatives in between data aggregations, so while you react more slowly, you can avoid starving any arm. Hard-max versions of UCB are not advised with data aggregation delay. If you do a "randomize over all non-dominated actions" style of UCB, that can be OK, but TS will exploit more and in my experience works better. Finally, because of nonstationarity, you need to be decaying the counts or these algorithms will "miss out" on the fact that the rewards have changed.
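    (A minimal sketch of TS driven by aggregated counts with decay, along the lines Paul describes; the decay factor and update cadence are made up:)

        import random

        DECAY = 0.9  # applied once per aggregation window; tune for your drift rate

        class BatchedTS:
            def __init__(self, n_ads):
                self.clicks = [0.0] * n_ads
                self.imps = [0.0] * n_ads

            def update(self, window_clicks, window_imps):
                # decay old evidence, then fold in the latest 5-minute aggregates
                for a in range(len(self.clicks)):
                    self.clicks[a] = DECAY * self.clicks[a] + window_clicks[a]
                    self.imps[a] = DECAY * self.imps[a] + window_imps[a]

            def choose(self):
                # sample a CTR estimate per ad from its Beta posterior; serving
                # by sampled value randomizes across viable ads between windows
                samples = [random.betavariate(1 + c, 1 + max(i - c, 0.0))
                           for c, i in zip(self.clicks, self.imps)]
                return max(range(len(samples)), key=samples.__getitem__)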
    jdong95
    @jdong951_twitter
    Hey, have the selection results for the RL Open Source Fest been released?
    Jack Gerrits
    @jackgerrits
    If they are not already out then they should be soon
    Aditya Singh
    @adityauser
    Hi Jack
    Jack Gerrits
    @jackgerrits
    Replied
    Shivam Kumar Jha
    @thealphadollar
    Results for the RL Open Source Fest are out. I just received a rejection mail; I'm pretty sure people will have received acceptances as well. Congratulations everyone <3
    Sanit
    @sanitgupta
    Congrats to everyone who got in!
    Sanit
    @sanitgupta
    Also, I was wondering if those of us who didn't get in could get some feedback on our applications. I think it would be really helpful for us.
    Jack Gerrits
    @jackgerrits
    @sanitgupta, unfortunately with over 200 applicants we aren't able to provide feedback on individual submissions. On behalf of the team, thank you for your interest and the time and effort in your submission. I'd encourage you to apply again next year!
    Prateek Chanda
    @prateekiiest
    Hey @jackgerrits, I just got to know about this project from information related to the RL Fest. I am currently working at the MSR India campus, and hence I am not eligible under the RL Fest contribution guidelines. I just wanted to know whether this project is open to contributions outside the RL Fest as well, and which parts of the project I could start contributing to. Thanks, and congrats to everyone who got selected.
    Jack Gerrits
    @jackgerrits
    Hi @prateekiiest, welcome! Vowpal Wabbit has always been super open to community contributions. I'd recommend taking a look through the issues to see if something catches your eye. Or, if you have another idea, feel free to open a feature request and volunteer to work on it.
    Prateek Chanda
    @prateekiiest
    Thanks @jackgerrits
    shreyshrivastava4799
    @shreyshrivastava4799
    Congratulations to everyone selected. I did some work on Project 7. I would be glad if it could be of some help.
    shreyshrivastava4799/vowpal_wabbit@cd2544a
    Hanan Shteingart
    @chanansh
    Hi, has anyone managed to install VW on a Mac with Python 3.7/3.8 and Anaconda? I am finding many threads about this issue on the internet but no clear solution. Should I just use Docker?
    First I had make issues regarding Boost: https://stackoverflow.com/questions/51271997/vowpal-wabbit-python3-interface-installation-on-osx-10-13-2-python-3-5-1-anaco
    Then, when I was able to compile from a cloned version, I got segmentation faults:
    VowpalWabbit/vowpal_wabbit#1095
    Sharad Chitlangia
    @Sharad24
    Installing without any Anaconda environment (not even base) worked for me.
    Jack Gerrits
    @jackgerrits
    If you install without conda it should work on macOS
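    (One way to do that, sketched with a plain virtualenv; the environment name is arbitrary:)

        conda deactivate                 # make sure no conda env is active
        python3 -m venv vw-env
        source vw-env/bin/activate
        pip install vowpalwabbit         # the PyPI package name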
    Peter Gorniak
    @sumpfork
    I'm trying to establish the performance of some vw cb models by comparing them to a baseline (namely, the current single-action constant policy in a production system). After some searching I found --eval (not included in the current argument listing on the wiki but referenced in the tests) to simulate this policy and get an average loss on a test set. I naively expected to use --testonly to do the same for the vw-learned policies, but the loss reported is dismal in comparison. However, if I take the actual actions predicted by those runs and run --eval on them, the losses are much better. What is the difference between what --testonly is doing vs. running --eval on the outputs of --testonly?
    Paul Mineiro
    @pmineiro
    @sumpfork --eval (which i was unaware of until now) doesn't do any learning or even prediction, it just takes whatever label is present in the input and computes the multiclass loss function experienced by the policy ... i've never used it but looking at the code it appears to be designed to be used with either 1) non-vw policies from the universe or 2) a vw policy that has already had the predictions placed in the file as cb labeled examples ... perhaps the intention was apples-to-apples comparison of a vw policy with some exogenous baseline system
    Peter Gorniak
    @sumpfork
    Which is what I'm trying to do. Why are the results different from using --testonly on an existing model though?
    I thought that was also supposed to only do predictions.
    Paul Mineiro
    @pmineiro
    @sumpfork hard for me to say with the information provided. if you could make a jupyter notebook somewhere or some other concise repro that would help
    Peter Gorniak
    @sumpfork
    Hmm, OK, I can't really share the data. I can give my commands. But in principle: I trained a model on a training set and saved the model. Then I loaded the model and ran --testonly, which provides both predictions and an average loss. I take those predictions and put them as the first column in the test data file, then run --eval on that file - it now has both the action predicted by the trained model and the action taken by the exploration policy.
    I observe that the loss reported by the --testonly run is very different from the run using --eval on the same predictions.
    Paul Mineiro
    @pmineiro
    @sumpfork 1) i'll need the exact commands, often times due to the confusing nature of vw the problem is in obscure semantics of command line parameters. 2) make up a small dataset that reproduces your issue.
    Peter Gorniak
    @sumpfork
    ok, I'll prepare something
    Peter Gorniak
    @sumpfork
    @pmineiro I think I figured out my discrepancy, and it makes some sense: https://colab.research.google.com/drive/1mX6rnF8ZTER_vCyDPPGPjM8yYn56Jdv2?usp=sharing
    Peter Gorniak
    @sumpfork
    I was not loading the model when running --eval on my constant policy. I somehow assumed that doing an eval this way was independent of the model, but I guess it still loads, for example, the accumulated loss estimates from that file, so the loss estimates will be different.
    That makes sense, but it makes me wonder what the best way is to compare any optimized policies to an existing non-exploration policy.
    Paul Mineiro
    @pmineiro
    @sumpfork The repro is very helpful. The loss for cb is computed here in output_example ( https://github.com/VowpalWabbit/vowpal_wabbit/blob/ac3a2c21a9760b68ce49368b11a35bf95faeb8b8/vowpalwabbit/cb_algs.cc#L96 ) which calls get_cost_estimate ( https://github.com/VowpalWabbit/vowpal_wabbit/blob/ac3a2c21a9760b68ce49368b11a35bf95faeb8b8/vowpalwabbit/cb_algs.h#L64 ). Confusingly, get_cost_estimate is reporting the loss of the surrogate doubly-robust objective ( https://github.com/VowpalWabbit/vowpal_wabbit/blob/ac3a2c21a9760b68ce49368b11a35bf95faeb8b8/vowpalwabbit/gen_cs_example.h#L130 ) when a model is loaded. If you rerun your repro with --cb_type ips you will see more consistency. Since (the default) doubly-robust is generally helpful, I would suggest rendering all decisions from the vw policy to a file and then calling --eval on the rendered file without loading the model to compare with the decisions from your exogenous policy.
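    (A sketch of that suggested workflow as command lines; the file names and action count are made up, and eval.dat is assumed to carry the predicted action as the first column ahead of each logged cb example, as described above:)

        vw --cb 4 train.dat -f model.vw                  # train and save the policy
        vw --cb 4 -t -i model.vw test.dat -p preds.txt   # render its decisions
        # prepend each line of preds.txt to the matching line of test.dat -> eval.dat
        vw --cb 4 --eval eval.dat                        # score without loading the model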
    Peter Gorniak
    @sumpfork
    Should I still use --cb_type ips when calling --eval, though? Using --cb_type dr there makes the production policy look way better than the learned policies, I think because DR never uses the losses on anything but the one static action in the static policy, whereas the other policies pick other actions, and DR often starts estimating massive losses even when they haven't switched actions much (I added examples of this here https://colab.research.google.com/drive/1mX6rnF8ZTER_vCyDPPGPjM8yYn56Jdv2#scrollTo=VRfgxBEEkau9&line=2&uniqifier=1 and below that).
    If so, what is ips using as the probability for the actions specified through --eval? I guess I can calculate that or check the source, just in case you know...