Regarding CCBs, I have a follow-up question. The docs mention this:
"If action_ids_to_include is excluded then all actions are implicitly included". What's the use case for action_ids_to_include?
It also states "This is currently unsupported". Does that refer to action_ids_to_include or the exclusion of action_ids_to_include :)?
Hi everyone, in the VW source code, when computing a prediction, we have the following code:
float finalize_prediction(shared_data* sd, vw_logger& logger, float ret)
{
  if (std::isnan(ret))
  {
    // NaN predictions are forced to 0 (with a warning unless quiet).
    ret = 0.;
    if (!logger.quiet)
    { std::cerr << "NAN prediction in example " << sd->example_number + 1 << ", forcing " << ret << std::endl; }
    return ret;
  }
  // Otherwise, clamp the prediction to the label range seen so far.
  if (ret > sd->max_label) return (float)sd->max_label;
  if (ret < sd->min_label) return (float)sd->min_label;
  return ret;
}
If I use squared loss, the input to the above function is 1.36777e+09, but after finalize_prediction it becomes 0. Does that make sense?
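For what it's worth, in the quoted code a value like 1.36777e+09 can only come back as 0 if it is NaN (forced to 0), or if sd->max_label is itself 0 so the clamp returns 0. A standalone sketch of the same clamping logic, with hypothetical label bounds in place of VW's shared_data:

#include <cmath>
#include <iostream>

// Re-implementation of the clamping above, with hypothetical bounds.
float clamp_prediction(float ret, float min_label, float max_label)
{
  if (std::isnan(ret)) { return 0.f; }        // NaN is forced to 0
  if (ret > max_label) { return max_label; }  // clamp from above
  if (ret < min_label) { return min_label; }  // clamp from below
  return ret;
}

int main()
{
  // If all labels seen so far are 0, max_label is 0 and a huge
  // prediction like 1.36777e+09 is clamped straight down to 0.
  std::cout << clamp_prediction(1.36777e9f, 0.f, 0.f) << "\n";   // 0
  // With a wider observed label range it is clamped to the bound.
  std::cout << clamp_prediction(1.36777e9f, 0.f, 100.f) << "\n"; // 100
  return 0;
}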
Hi, I plan to contribute to RLOSF 2021. According to the website, applications opened on 14 January 2021, but I am not able to find a link to the application form. Any help would be highly appreciated.
Sorry about that, the date has been moved back to Feb 1 per https://www.microsoft.com/en-us/research/academic-program/rl-open-source-fest/
I'm training a model with --cb_explore_adf and I noticed that very often the model gets biased (the probability mass function output is mostly the same, regardless of context). I've tried using regularizers, modifying the LR, adding decay, and some other things, but I'm still not convinced the model is not biased, because when I run a few predictions for visualization, the PMF is often the same, or at least the highest probability is at the same index.
Also, why is it recommended to use sample_custom_pmf instead of just taking the action where max(prob) occurs (from: https://vowpalwabbit.org/tutorials/cb_simulation.html#getting-a-decision-from-vowpal-wabbit)? As I understand it, this is to add some kind of randomization, but isn't the model already exploring when we train it with explore_adf?
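For reference, the tutorial's sample_custom_pmf draws an action in proportion to the PMF instead of always taking the argmax; a minimal sketch of that sampling step (sample_pmf is my own illustrative helper, not a VW API):

#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

// Draw an index in proportion to the probabilities in pmf, instead of
// always returning the argmax. This is the extra randomization step
// the tutorial's sample_custom_pmf performs at prediction time.
std::size_t sample_pmf(const std::vector<float>& pmf, std::mt19937& rng)
{
  std::uniform_real_distribution<float> uniform(0.f, 1.f);
  float draw = uniform(rng);
  float sum_prob = 0.f;
  for (std::size_t i = 0; i < pmf.size(); ++i)
  {
    sum_prob += pmf[i];
    if (sum_prob > draw) { return i; }
  }
  return pmf.size() - 1;  // guard against floating-point round-off
}

int main()
{
  std::mt19937 rng(42);
  const std::vector<float> pmf = {0.7f, 0.1f, 0.1f, 0.1f};
  // Over many draws, index 0 is chosen ~70% of the time, but the other
  // actions still get picked occasionally, which is what keeps exploring.
  for (int i = 0; i < 10; ++i) { std::cout << sample_pmf(pmf, rng) << " "; }
  std::cout << "\n";
  return 0;
}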
The bandit bakeoff paper mentions that:
"We run our CB algorithms in an online fashion using Vowpal Wabbit: ... we consider online CSC or regression oracles. Online CSC itself reduces to multiple online regression problems in VW..."
I understand the loss function and the gradient updates, but I want to know: what is the online regression model class implemented in VW?
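As I understand it, the base learner is an online linear model updated by per-example gradient descent. Below is a minimal sketch of such an online least-squares update, deliberately leaving out VW's adaptive/normalized/invariant update refinements and feature hashing:

#include <iostream>
#include <vector>

// One online SGD step for linear regression with squared loss:
// w <- w + lr * (y - w.x) * x, applied example by example.
void sgd_update(std::vector<float>& w, const std::vector<float>& x,
                float y, float lr)
{
  float pred = 0.f;
  for (std::size_t i = 0; i < w.size(); ++i) { pred += w[i] * x[i]; }
  const float err = y - pred;
  for (std::size_t i = 0; i < w.size(); ++i) { w[i] += lr * err * x[i]; }
}

int main()
{
  std::vector<float> w(2, 0.f);
  // Stream of (x, y) pairs exactly following y = 2*x0 + 1*x1.
  const float xs[4][2] = {{1, 0}, {0, 1}, {1, 1}, {2, 1}};
  const float ys[4] = {2, 1, 3, 5};
  for (int pass = 0; pass < 50; ++pass)
    for (int i = 0; i < 4; ++i)
      sgd_update(w, {xs[i][0], xs[i][1]}, ys[i], 0.1f);
  std::cout << "w = " << w[0] << ", " << w[1] << "\n";  // ~2, ~1
  return 0;
}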
I am following the tutorial on CTR with cb_explore_adf, and I would love to know if it is possible for the article feature in the Action namespace to be numeric. In the tutorial, you tell us to do it like this:
shared |User user=Tom time_of_day=morning
|Action article=politics
|Action article=sports
|Action article=music
|Action article=food
Is it possible to pass numerical values instead and let the model generalize better when a new value appears in the middle of the range? For example:
shared |User user=Tom time_of_day=morning
|Action price:2.99
|Action price:10.99
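As I understand the text format, article=politics is a single string feature with an implicit value of 1, whereas price:2.99 is a feature named price with the numeric value 2.99, so different prices share one weight. A toy sketch of that difference (the hashing below is illustrative only, not VW's actual hashing):

#include <cstddef>
#include <functional>
#include <iostream>
#include <string>

// Toy model of the two feature spellings:
//   "article=politics" -> one hashed slot per distinct string, value 1.0
//   "price:2.99"       -> one hashed slot for "price", value 2.99
struct Feature { std::size_t index; float value; };

Feature parse_token(const std::string& token)
{
  std::hash<std::string> h;
  const std::size_t colon = token.find(':');
  if (colon == std::string::npos) { return {h(token), 1.f}; }
  return {h(token.substr(0, colon)), std::stof(token.substr(colon + 1))};
}

int main()
{
  // Two different prices land in the SAME weight slot, differing only
  // in value, so the model can interpolate to unseen prices...
  const Feature a = parse_token("price:2.99"), b = parse_token("price:10.99");
  std::cout << (a.index == b.index) << " " << a.value << " " << b.value << "\n";
  // ...whereas two articles occupy unrelated slots.
  const Feature c = parse_token("article=politics"), d = parse_token("article=sports");
  std::cout << (c.index == d.index) << "\n";
  return 0;
}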
Hello everyone,
I'm Harsh Sharma, an undergraduate student at IIIT Gwalior, pursuing Computer Science and Engineering. I'm interested in participating in the Microsoft RL Open Source Fest this year, specifically in working on these projects:
17 - RL-based query planner for open-source SQL engine
20 - AutoML for online learning
Since I've worked with deep learning for the NL2SQL task before, I would like to work on 17. Could someone please clarify what the "query planner" here refers to? Does it mean join query optimization? Also, I'd be really grateful if someone could guide me on what the first step would be to implement such a query planner in a SQL engine.
I have a question about using VW with cb_explore_adf and the softmax explorer for ranking.
I am trying to use VW to perform ranking using the contextual bandit framework, specifically using --cb_explore_adf --softmax --lambda X. The choice of softmax is because, according to VW's docs: "This is a different explorer, which uses the policy not only to predict an action but also predict a score indicating the quality of each action." This quality-related score is what I would like to use for ranking.
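For intuition on the role of --lambda, here is a toy reconstruction of a softmax PMF over per-action scores, p_i proportional to exp(lambda * s_i); a simplified sketch, not VW's actual --softmax implementation:

#include <cmath>
#include <iostream>
#include <vector>

// Toy softmax over per-action scores: p_i proportional to exp(lambda * s_i).
// Larger lambda concentrates mass on the top-scoring action (more greedy);
// lambda = 0 gives a uniform PMF (pure exploration).
std::vector<float> softmax_pmf(const std::vector<float>& scores, float lambda)
{
  std::vector<float> pmf(scores.size());
  float total = 0.f;
  for (std::size_t i = 0; i < scores.size(); ++i)
  {
    pmf[i] = std::exp(lambda * scores[i]);
    total += pmf[i];
  }
  for (float& p : pmf) { p /= total; }
  return pmf;
}

int main()
{
  const std::vector<float> scores = {1.0f, 0.5f, 0.2f, 0.1f};
  for (float lambda : {0.f, 2.f, 10.f})
  {
    std::cout << "lambda=" << lambda << ":";
    for (float p : softmax_pmf(scores, lambda)) { std::cout << " " << p; }
    std::cout << "\n";
  }
  return 0;
}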
The scenario is this: I have a list of items [A, B, C, D], and I would like to sort it in an order that maximizes a pre-defined metric (e.g., CTR). One of the problems, as I see it, is that we cannot evaluate the items individually, because we can't know for sure which item made the user click or not.
To test some approaches, I've created a dummy dataset. As a way to try to solve the above problem, I am using the entire ordered list to evaluate whether a click happens or not (e.g., given the context for user X, he will click if the items are [C, A, B, D]). Then, I reward the items individually according to their position P on the list, i.e., reward = 1/2^(P-1) for 1 <= P <= len(list). Here, the rewards for C, A, B, D are 1, 0.5, 0.25, and 0.125, respectively. If there's no click, the reward is zero for all items. The reasoning behind this is that more important items will stabilize on top and less important ones on the bottom.
Also, one of the difficulties I found was defining a sampling function for this approach. Typically, we're interested in selecting only one option, but here I have to sample multiple times (4 in the example). Because of that, it's not very clear how I should incorporate exploration when sampling items. I have a few ideas (see the sketch after this message):
1) Make a copy of the PMF, copy_pmf. Draw a random number between 0 and max(copy_pmf), and for each probability value in copy_pmf, increment a sum_prob variable (very similar to the tutorial here: https://vowpalwabbit.org/tutorials/cb_simulation.html). When sum_prob > draw, add the current item/probability to a list. Then remove this probability from copy_pmf, set sum_prob = 0, and draw a new number, again between 0 and max(copy_pmf) (which might have changed or not).
2) Draw a random number between 0 and 1; if max(pmf) is greater than this number, we exploit. If it isn't, we shuffle the list and return that (explore). This approach requires tuning the lambda parameter, which controls the output pmf (I have seen cases where the max prob is > 0.99, which would mean around a 1% chance of exploring; I have also seen instances where the max prob is ~0.5, which is around 50% exploration).
I would like to know if there are any suggestions regarding this problem, specifically the sampling and the reward function, and whether there is anything I might be missing here.
Thank you!
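One way to read idea 1 is as sampling without replacement: repeatedly draw an item in proportion to the remaining probability mass, remove it, and renormalize. A rough C++ sketch under that reading (my own helper, not a VW API):

#include <cstddef>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// Idea 1 as sampling without replacement: repeatedly draw an index in
// proportion to the remaining probabilities, then remove the chosen item.
// discrete_distribution renormalizes the remaining mass on each rebuild.
std::vector<std::size_t> sample_ranking(std::vector<float> pmf, std::mt19937& rng)
{
  std::vector<std::size_t> ids(pmf.size());
  std::iota(ids.begin(), ids.end(), 0);
  std::vector<std::size_t> ranking;
  while (!ids.empty())
  {
    std::discrete_distribution<std::size_t> d(pmf.begin(), pmf.end());
    const std::size_t pick = d(rng);
    ranking.push_back(ids[pick]);
    ids.erase(ids.begin() + pick);  // remove the chosen item...
    pmf.erase(pmf.begin() + pick);  // ...and its probability mass
  }
  return ranking;
}

int main()
{
  std::mt19937 rng(7);
  // PMF over items [A, B, C, D]: high-probability items tend to land
  // near the top of the ranking, but the order still varies (exploration).
  for (std::size_t i : sample_ranking({0.6f, 0.2f, 0.1f, 0.1f}, rng))
  { std::cout << i << " "; }
  std::cout << "\n";
  return 0;
}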
Guys, would you please help me with two questions:
1) I am using cb_explore_adf for a pricing agent. I was trying two types of reward: i) sales and ii) sales x price, where each arm is a price. I have noticed that cb_explore_adf converges well when the reward is sales, but when we multiply the sales by the arm price, it simply doesn't converge at all. Is it possible that it is sensitive to scale? Sales are in units (about 40 at most) and prices are in cents (e.g., 399).
2) Another quick question: how do I pass features from multiple namespaces to the -q UA parameter? I mean, I want to add more variables from another namespace, something like -q [UA, MA].
You can pass -q multiple times: -q UA -q UM -q MA and so on, depending on how many namespaces you have. There is also the --interactions flag; here you could use, for example, --interactions UA UM UAM. If you want to interact all namespaces with each other, there is the wildcard -q ::. You should notice that using this is slower, because lots of features are created on the fly. Also, you should see a warning saying that some repeated features were ignored (by default in VW).