Wenjuan Dou
@darlwen
@here could someone help answer my question above? Thanks a lot!
Max Pagels

Epsilon greedy works as follows (example with 4 arms):

Per round, choose the best arm given context (i.e. the arm with the lowest cost) with probability 1-epsilon. With probability epsilon, choose an arm uniformly at random.

With epsilon = 0.1, at any given round, the probability of choosing the best arm is (1 - 0.1) + 0.1 x 1/4 = 0.925 ("exploit"). The probability of choosing any particular suboptimal arm is 0.1 x 1/4 = 0.025 ("explore").
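As a quick sketch of the arithmetic above (illustrative Python, not VW's internal code; the function names are mine):

```python
import random

def epsilon_greedy_probs(n_arms, best_arm, epsilon=0.1):
    """Per-arm selection probabilities for epsilon-greedy.

    Each arm receives epsilon / n_arms from the uniform exploration step;
    the best arm additionally receives the (1 - epsilon) exploitation mass.
    """
    probs = [epsilon / n_arms] * n_arms
    probs[best_arm] += 1.0 - epsilon
    return probs

def choose_arm(probs, rng=random):
    """Sample an arm index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# With 4 arms and epsilon = 0.1: best arm gets 0.925, each other arm 0.025.
probs = epsilon_greedy_probs(n_arms=4, best_arm=2, epsilon=0.1)
```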

3 replies
Max Pagels

Watched the great content at https://slideslive.com/38942331/vowpal-wabbit, thanks to all involved! A related question:

I am implementing a ranking system where the action sets per slot are not disjoint, i.e. I basically want a ranking without duplicates. The video mentions that the theory behind slates is worked out for the intersected/joint action case, but that it's still being worked on in VW.

Am I shooting myself in the foot if I use CCB instead of slates now? Is there some rough estimate of when joint action sets will be supported in slates mode? Is slates mode planned as a replacement for CCB? @jackgerrits is probably the one to ask :)

olgavrou
@olgavrou
Hi @maxpagels_twitter it sounds like you do have a CCB problem. Slates is an extension of CCB where the action set is supposed to be disjoint and there is a single global reward for the entire slate. In CCB you have a joint action space and therefore it will do a ranking for you. In CCB you can specify rewards for each slot. Attaching the documentation for each in case you want to take a closer look: CCB and Slates
Max Pagels
@olgavrou thanks, that was what I was thinking. Just wondered since the video mentioned slates with joint action spaces is on the roadmap, and I was wondering about its status. I'd rather have something with a global reward. For now though, I'll use CCB.

Regarding CCBs, I have a follow-up question. The docs mention this:

"If action_ids_to_include is excluded then all actions are implicitly included". What's the use case for action_ids_to_include?

It also states "This is currently unsupported". Does that refer to action_ids_to_include or the exclusion of action_ids_to_include :)?

olgavrou
@olgavrou
We do want to get there eventually, but it isn't currently on the roadmap (at least not in the near future). A suggestion regarding the global reward might be to assign a global reward and distribute it evenly across all slots? But I haven't tried that out myself, so I'm not sure what results you'd get there :)
Nishant Kumar
@nishantkr18
Hey everyone! Will we be having the RLOS fest this year? I believe the applications should have opened by now?
Lalit Jain
@lalitkumarj
Hi all, I am trying to get active learning working with VW. I'm successfully able to send unlabeled examples; however, vw sends back a single float (presumably a prediction) which is always 0. In active_interactor.py (which seems quite out of date) it seems that sometimes vw should send back a list of prediction, tag, importance which I can then send back along with the features. This is also the model in these slides: https://cilvr.cs.nyu.edu/diglib/lsml/lecture12_active.pdf. Would anybody be able to provide some guidance on what could be going wrong? Thank you!!
Lalit Jain
@lalitkumarj
One additional data point: I just rolled back to version 8.2.0 and things seem to be working fine there.
AnkitRai-22
@AnkitRai-22
Hi everyone, I plan to contribute to RLOSF 2021, problem number 20 - "AutoML for online learning". We are supposed to implement AutoML HPO (Hyperparameter Optimization) techniques for VW, but there are many algorithms available for this. I am planning to use ParamILS. Any suggestions or comments would be highly appreciated.
Max Pagels

CCBs: I always get undefined loss with --passes > 1 on my example dataset. Is this intended?

More generally, there doesn't seem to be a ccb_adf option, only ccb_explore_adf, so it's not clear how to properly evaluate the policy (as opposed to the exploration algorithm) offline.

Utkarsh Sharma
@utkarshsharma00
Hi, I plan to contribute to RLOSF 2021. As per the website the applications have started from 14th January 2021, but I am not able to find a link to application form. Any help would be highly appreciated.
Max Pagels
Related to my CCB question, pretty sure it's a bug. Made a Github issue: VowpalWabbit/vowpal_wabbit#2781
Josh Minor
@jishminor
To leverage the contextual bandit adf learner in vw, must the data samples supplied always have one action labeled with a:c:p? If I have existing data for contexts, actions, and rewards (no probabilities), can this be used to train a model which would then be used to warm-start an online learning session where vw generates predicted actions?
3 replies
Wenjuan Dou
@darlwen

Hi everyone, in the vw source code, when computing the prediction, we have the following code:

```cpp
float finalize_prediction(shared_data* sd, vw_logger& logger, float ret)
{
  if (std::isnan(ret))
  {
    ret = 0.;
    if (!logger.quiet)
    { std::cerr << "NAN prediction in example " << sd->example_number + 1 << ", forcing " << ret << std::endl; }
    return ret;
  }
  if (ret > sd->max_label) return (float)sd->max_label;
  if (ret < sd->min_label) return (float)sd->min_label;
  return ret;
}
```

If I use squared loss, the input to the above function is 1.36777e+09, but after finalize_prediction it becomes 0. Does that make sense?
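One plausible way this can happen, sketched in Python below (only the clamping logic from the quoted function; the min_label/max_label values are hypothetical): if every label seen so far is 0, then sd->max_label is 0, and any large raw prediction is clamped down to it.

```python
def finalize_prediction(pred, min_label, max_label):
    """Python sketch of the clamping in the C++ function quoted above
    (NaN handling omitted)."""
    if pred > max_label:
        return max_label
    if pred < min_label:
        return min_label
    return pred

# Hypothetical values: if all labels observed so far were 0,
# min_label == max_label == 0, so a huge raw prediction clamps to 0.
clamped = finalize_prediction(1.36777e9, 0.0, 0.0)  # -> 0.0
```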

peterychang
@peterychang

Hi, I plan to contribute to RLOSF 2021. As per the website the applications have started from 14th January 2021, but I am not able to find a link to application form. Any help would be highly appreciated.

Sorry about that, the date has been moved back to Feb 1 per https://www.microsoft.com/en-us/research/academic-program/rl-open-source-fest/

Jack Gerrits
@jackgerrits
@darlwen what is max_label and min_label when it is called?
8 replies
Bernardo Favoreto
Hey guys!
I am trying to run some experiments using --cb_explore_adf and I noticed that very often the model gets biased (the probability mass function output is mostly the same, regardless of context). I've tried using regularizers, modifying the LR, adding decay, and some other things, but I'm still not convinced the model isn't biased, because when I run a few predictions for visualization, the PMF is often the same, or at least the highest probability is at the same index.
That being said, I would like to know if anyone has any suggestions about what might be causing this? (I know that my dataset is not biased, though not perfectly balanced.)
Also, when sampling an action from the PMF, why don't we always grab the index at which max(prob) occurs? I.e., why is it recommended to use sample_custom_pmf (from: https://vowpalwabbit.org/tutorials/cb_simulation.html#getting-a-decision-from-vowpal-wabbit)? As I understand it, this is to add some kind of randomization, but isn't the model already exploring when we train it with explore_adf?
Would love to hear your feedback.
Cheers!
olgavrou
@olgavrou
Hi @Favoreto_B_twitter if you are doing epsilon-greedy then the pmf provided will have a probability (1-e) on the predicted action and the remaining probability is distributed evenly on the remaining actions. The reason you see it always at index 0 is that VW will swap the predicted action with the first index so that it is always at index 0.
For your second question, if we didn't sample from the pmf and just returned the predicted action (i.e. the one with the highest probability), then we would not be doing any exploration; we would be exploiting 100% of the time. Sampling from the pmf means exactly that: we will sample the predicted action with higher probability (1-e) (exploiting) and one of the other actions with lower probability (exploring).
The model doesn't explore, the model learns and predicts. The exploration happens with what you decide to eventually show to the user.
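The sampling step described here can be sketched like this (mirroring the inverse-CDF loop in the cb_simulation tutorial linked above; the function name is mine):

```python
import random

def sample_pmf(pmf, rng=random):
    """Draw an index from a probability mass function by walking the
    cumulative sum until it exceeds a uniform draw."""
    draw = rng.random()
    cumulative = 0.0
    for index, prob in enumerate(pmf):
        cumulative += prob
        if cumulative > draw:
            return index, prob
    # Guard against floating-point shortfall in the cumulative sum.
    return len(pmf) - 1, pmf[-1]

# With epsilon-greedy and 4 actions, the pmf might look like
# [0.925, 0.025, 0.025, 0.025]: index 0 (the predicted action) is
# sampled ~92.5% of the time, the others ~2.5% each.
```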
Bernardo Favoreto
Thanks @olgavrou.
I am not only using epsilon-greedy, though. I've seen bias for other algorithms as well, but the chosen action is not necessarily always at index 0 (could you elaborate on that?). One other thing I've noticed is that, depending on the namespace interactions I use, some contextual features don't seem to influence the model's prediction at all (e.g., if I'm using -q UA (user-action) and change a Location feature, the prediction doesn't change). Any idea why that is (it happened to me while using the softmax explorer)?
The second part is pretty clear to me now, thanks!
pushpendre
@pushpendre
Hi, I was wondering if there are any online regression models implemented in VW beyond a linear model? For example, is there a tree-based regressor in VW that can be trained online? Or a DNN-based regressor?
pushpendre
@pushpendre

For example,

The bandit bakeoff paper mentions that

We run our CB algorithms in an online fashion using Vowpal Wabbit: .... we consider online CSC or regression oracles. Online CSC itself reduces to multiple online regression problems in VW...

I understand the loss function and the gradient updates, but I want to know: what is the online regression model class implemented in VW?

6 replies
pushpendre
@pushpendre
Just for the record, my question above is still open; the thread (through the first 7 replies) went in another direction.
pushpendre
@pushpendre
Hi everyone, one more question: how do importance weights interact with AdaGrad? IIUC, importance weights are derived for vanilla SGD and not for AdaGrad. I was wondering how exactly these two tweaks are implemented together?
Josh Minor
@jishminor
This message was deleted
4 replies
pushpendre
@pushpendre

What is the online regression model class implemented in VW?

figured both out. thanks.

Raphael Ottoni
@raphaottoni
Hello guys
is anybody here?

I am following the tutorial on CTR with cb_explore_adf and I would love to know if it is possible for the article namespace feature to be numeric...
In the tutorial, you tell us to do it like this:

```text
shared |User user=Tom time_of_day=morning
|Action article=politics
|Action article=sports
|Action article=music
|Action article=food
```

Is it possible to pass numerical values and let the model generalize better when there is a new value in between?

```text
shared |User user=Tom time_of_day=morning
|Action price:2.99
|Action price:10.99
```

So later, when I want to test a new price, say 6.99, it will have a better estimate for it?
2 replies
Raphael Ottoni
@raphaottoni
I also opened a Stack Overflow question, so I can update it with my findings and help others.
Raphael Ottoni
@raphaottoni
@olgavrou, Iām little bit confused by the answer you gabe to @Favoreto_B_twitter. You said, and I quote, ā The reason you see it always at index 0 is that VW will swap the predicted action with the first index so that it is always at index 0ā .. what you are saying is that the internal index of a arm could change ?! In on intersction the index 0 of the PMF would be related to arm1 but in the next to arm2? How Am I suppose to know which arms are at each index given the pmf ? How can I validate those things ?
@olgavrou does it happe with the ācb_explore_adf ? I think it doesnt ... due to the order we pass on predicit, right ?
shared |User user=Tom time_of_day=morning
|Action article=politics
|Action article=sports
|Action article=music
|Action article=food
In this example , politicis would always be index 0 abd food always index 3, in the PMF right ?
Wilson Cheung
@wcheung-code
Hey all! Just wanted to celebrate that I finally got Vowpal Wabbit to successfully install on my Windows machine after 3 long nights of reading documentation after work. I am looking forward to playing with VW more this weekend and starting to prepare my application for the RL Open Source Fest to see where I can help contribute :) Looking forward to meeting you all!
Bernardo Favoreto
Hey guys, I am testing a new use case on data from a real website and would like to know: what do you generally do to find the best contextual attributes to use? Pure intuition? Statistical analysis? Would love to hear some thoughts on that!
6 replies
Harsh Sharma
@hs2361

Hello everyone,
I'm Harsh Sharma, an undergraduate student from IIIT, Gwalior, pursuing Computer Science and Engineering. I'm interested in participating in the Microsoft RL Open Source fest this year, and I'm specifically interested in working on these projects:
17 - RL-based query planner for open-source SQL engine
20 - AutoML for online learning

Since I've worked with Deep Learning for the NL2SQL task before, I would like to work on 17. Could someone please clarify what the "query planner" refers to here? Does it mean join query optimization? Also, I'd be really grateful if someone could guide me on what the first step would be to implement such a query planner in an SQL engine.

Bernardo Favoreto

I have a question about using VW with cb_explore_adf and softmax explorer for ranking.

I am trying to use VW to perform ranking using the contextual bandit framework, specifically using --cb_explore_adf --softmax --lambda X. The choice of softmax is because, according to VW's docs: "This is a different explorer, which uses the policy not only to predict an action but also predict a score indicating the quality of each action." This quality-related score is what I would like to use for ranking.

The scenario is this: I have a list of items [A, B, C, D], and I would like to sort it in an order that maximizes a pre-defined metric (e.g., CTR). One of the problems, as I see it, is that we cannot evaluate the items individually because we can't know for sure which item made the user click or not.

To test some approaches, I've created a dummy dataset. As a way to try and solve the above problem, I am using the entire ordered list to evaluate whether a click happens or not (e.g., given the context for user X, he will click if the items are [C, A, B, D]). Then, I reward the items individually according to their position P on the list, i.e., reward = 1/2^P for 0 <= P < len(list). Here, the rewards for C, A, B, D are 1, 0.5, 0.25, and 0.125, respectively. If there's no click, the reward is zero for all items. The reasoning behind this is that more important items will stabilize at the top and less important ones at the bottom.
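For concreteness, that reward scheme could be written as follows (a sketch; the function name is mine and the scheme is the one described above, not anything built into VW):

```python
def positional_rewards(ranked_items, clicked):
    """Reward halves with each position down the list (1, 0.5, 0.25, ...);
    every item gets zero reward if there was no click."""
    if not clicked:
        return {item: 0.0 for item in ranked_items}
    return {item: 1.0 / (2 ** pos) for pos, item in enumerate(ranked_items)}

rewards = positional_rewards(["C", "A", "B", "D"], clicked=True)
# -> {"C": 1.0, "A": 0.5, "B": 0.25, "D": 0.125}
```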

Also, one of the difficulties I found was defining a sampling function for this approach. Typically, we're interested in selecting only one option, but here I have to sample multiple times (4 in the example). Because of that, it's not very clear how I should incorporate exploration when sampling items. I have a few ideas:

• Copy the probability mass function and assign it to copy_pmf. Draw a random number between 0 and max(copy_pmf) and, for each probability value in copy_pmf, increment the sum_prob variable (very similar to the tutorial here: https://vowpalwabbit.org/tutorials/cb_simulation.html). When sum_prob > draw, we add the current item/prob to a list. Then, we remove this probability from copy_pmf, set sum_prob = 0, and draw a new number again between 0 and max(copy_pmf) (which may or may not change).
• Another option is drawing a random number and, if the maximum probability, i.e., max(pmf), is greater than this number, we exploit. If it isn't, we shuffle the list and return it (explore). This approach requires tuning the lambda parameter, which controls the output pmf (I have seen cases where the max prob is > 0.99, which would mean around a 1% chance of exploring; I have also seen instances where max prob is ~0.5, which is around 50% exploration).
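The first idea (sample, remove, repeat) might look like this, as a sketch under my own assumptions rather than anything VW provides:

```python
import random

def sample_ranking(actions, pmf, rng=random):
    """Build a full ranking by repeatedly sampling from the pmf without
    replacement: draw against the remaining probability mass, remove the
    sampled item, and repeat until the list is exhausted."""
    remaining = list(zip(actions, pmf))
    ranking = []
    while remaining:
        total = sum(prob for _, prob in remaining)
        draw = rng.random() * total
        cumulative = 0.0
        for i, (action, prob) in enumerate(remaining):
            cumulative += prob
            if cumulative > draw:
                ranking.append(action)
                del remaining[i]
                break
        else:
            # Floating-point shortfall: take the last remaining item.
            ranking.append(remaining.pop()[0])
    return ranking
```

Scaling the draw by the remaining total mass is equivalent to renormalizing the leftover pmf on each round, so no explicit renormalization step is needed.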

I would like to know if there are any suggestions regarding this problem, specifically sampling and the reward function. Also, if there are any things I might be missing here.

Thank you!

olgavrou
@olgavrou
Hi @raphaottoni, I can see why my response was confusing; I was mixing up cb_explore and cb_explore_adf. The tutorial uses cb_explore_adf and will return a pmf which we need to sample from. The pmf will have a larger probability on the action that the model predicted (giving us a higher probability of exploiting the predicted action) and smaller probabilities on the rest of the actions (giving us a smaller probability of exploring). There is no index swapping here; you are right.
olgavrou
@olgavrou
Hi @Favoreto_B_twitter have you checked out Conditional Contextual Bandits? It seems like your description is pointing towards that: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Conditional-Contextual-Bandit
Bernardo Favoreto
Hey @olgavrou, I did come across CCBs but never really dug deep enough to fully understand how they work. I've searched for tutorials but could not find any; are there some?
Indeed, they seem like a good option, but I feel like the lack of material on the topic might be a barrier. If you have something to point me to, I would love to see it!
Thanks!
14 replies
Bernardo Favoreto
@olgavrou After some reading, I am pretty confident CCB is the way to go for the problem I described. However, it's still unclear to me whether I should use CCB or Slates.
I can't really see how they differ. Apparently, Slates is built on top of CCB, but what is its purpose?
Marcos Passos
@marcospassos
Hi everyone! I just watched Milind Agarwal's talk about Contextual Bandits Data Visualization, and I got very interested in the library he mentioned, but I could not find it anywhere. Is it available somewhere?