hi @darlwen ,
The stack of reductions for every vw run is determined by two things:
1) The DAG of dependencies that is defined in the setup function of every reduction.
i.e. here:
https://github.com/VowpalWabbit/vowpal_wabbit/blob/b8732ffec3f8c7150dace1c41434bf3cdb4d8436/vowpalwabbit/cb_explore_adf_greedy.cc#L96
if we have the cb_explore_adf reduction included, we also include the cb_adf one.
2) The topological order here: https://github.com/VowpalWabbit/vowpal_wabbit/blob/b8732ffec3f8c7150dace1c41434bf3cdb4d8436/vowpalwabbit/parse_args.cc#L1246
So, the final stack of reductions for each vw run is actually a sub-stack of 2) that contains:
1) reductions that you explicitly provided in your command line
2) reductions that are defined in the input model file (if any)
3) reductions populated as dependencies.
In your case, ccb_explore_adf and ftrl are provided explicitly by you; the others are populated as dependencies:
ccb_explore_adf -> cb_sample
ccb_explore_adf -> cb_explore_adf_greedy -> cb_adf -> csoaa_ldf
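The resolution described above can be sketched in a few lines of Python. Note that the dependency table and topological order below are simplified illustrations built from the names in this thread, not VW's actual tables:

```python
# Illustrative sketch: start from the explicitly requested reductions,
# pull in dependencies from the DAG, then keep the sub-stack of the
# topological order that contains only enabled reductions.

DEPENDENCIES = {
    "ccb_explore_adf": ["cb_sample", "cb_explore_adf_greedy"],
    "cb_explore_adf_greedy": ["cb_adf"],
    "cb_adf": ["csoaa_ldf"],
}

TOPOLOGICAL_ORDER = [
    "ftrl", "scorer", "csoaa_ldf", "cb_adf",
    "cb_explore_adf_greedy", "cb_sample", "ccb_explore_adf",
]

def resolve_stack(requested):
    enabled = set()

    def enable(name):
        if name in enabled:
            return
        enabled.add(name)
        for dep in DEPENDENCIES.get(name, []):
            enable(dep)

    for name in requested:
        enable(name)
    # The final stack is the topological order restricted to what is enabled.
    return [r for r in TOPOLOGICAL_ORDER if r in enabled]

print(resolve_stack(["ccb_explore_adf", "ftrl"]))
```

Running this with the explicitly requested `ccb_explore_adf` and `ftrl` pulls in cb_sample, cb_explore_adf_greedy, cb_adf, and csoaa_ldf as dependencies, matching the chains above.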
thanks @ataymano, much clearer now. In VW::LEARNER::base_learner* setup_base(options_i& options, vw& all),
when it enters the following logic,
else
{
  all.enabled_reductions.push_back(std::get<0>(setup_func));
  return base;
}
my understanding is that it won't call auto setup_func = all.reduction_stack.top();
anymore. For example, when we get "ftrl_setup" it enters the else
branch; then how does it make the rest of the reductions (scorer, ccb_explore_adf, etc.) enabled?
I'm having trouble getting a pyvw.vw object to process a data file when I instantiate it with a --data argument. Based on this fairly recent S.O. answer https://stackoverflow.com/a/62876763, my understanding is that it should do just that, but I am not having any luck. I'm using vw version 8.9.0; did something change in a recent release? I have confirmed that the same options work from the command line, so I don't think I'm doing something obviously wrong like using a wrong file name.
reported cost/probability, or 0 if cost is not reported (c(a) = cost/probability * I(observed action = a)). Unbiased if the probabilities are correct, usually high variance.
This is the only thing I've found that describes the implementation for csoaa: http://users.umiacs.umd.edu/~hal/tmp/multiclassVW.html. As I read it, that means csc based bandit methods:
If so, is it reasonable to think of ips and mtr as essentially the same, except:
- ips uses cost * I(action = observed action) / probability as the target and 1 as the weight
- mtr uses cost as the target and I(action = observed action) / probability as the weight
mean(cost * I(observed action = predicted action) / probability), or something more sophisticated, like https://arxiv.org/abs/1210?
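The IPS estimate described above is easy to sketch. This is a minimal illustration of the formula as stated in the thread (the function name and data layout are mine, not VW's API):

```python
# IPS sketch: each logged (observed_action, cost, probability) round
# contributes cost / probability when the evaluated policy picks the
# observed action, and 0 otherwise; the estimate is the mean over rounds.

def ips_estimate(logged, policy_actions):
    """logged: list of (observed_action, cost, probability) tuples;
    policy_actions: the action the evaluated policy would pick each round."""
    total = 0.0
    for (obs_a, cost, prob), pred_a in zip(logged, policy_actions):
        if pred_a == obs_a:
            total += cost / prob
    return total / len(logged)

logged = [(0, 1.0, 0.5), (1, 0.0, 0.25), (0, 1.0, 0.5)]
policy = [0, 1, 1]  # actions the evaluated policy would choose
print(ips_estimate(logged, policy))  # (1.0/0.5 + 0.0 + 0) / 3
```

This makes the unbiasedness/variance trade-off visible: rounds where the logged probability is small contribute very large terms, which is where the high variance comes from.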
I'm using explore_eval to evaluate exploration algorithms (e-greedy with different epsilon values). Can someone confirm that explore_eval isn't intended for use with more than one pass over the data?
The core issue is that I'd like to evaluate the best policy + exploration algorithm for a system in which the policy is trained once per week and then deployed. So the model itself is stationary for a week, but across e.g. a year it isn't. I'd like to use data generated by this system to do offline evaluation of new policies + exploration algorithms.
Hi all, I am reading the code to understand how vw does epsilon-greedy exploration.
I found the following code in cb_explore_adf_greedy.cc:
void cb_explore_adf_greedy::predict_or_learn_impl(VW::LEARNER::multi_learner& base, multi_ex& examples)
{
  // Explore uniform random an epsilon fraction of the time.
  VW::LEARNER::multiline_learn_or_predict<is_learn>(base, examples, examples[0]->ft_offset);
  ACTION_SCORE::action_scores& preds = examples[0]->pred.a_s;
  uint32_t num_actions = (uint32_t)preds.size();
  size_t tied_actions = fill_tied(preds);
  const float prob = _epsilon / num_actions;
  for (size_t i = 0; i < num_actions; i++) preds[i].score = prob;
  if (!_first_only)
  {
    for (size_t i = 0; i < tied_actions; ++i) preds[i].score += (1.f - _epsilon) / tied_actions;
  }
  else
    preds[0].score += 1.f - _epsilon;
}
It gives the best action (the one with the lowest predicted cost) a score of 1 - epsilon plus epsilon/num_actions, and each of the rest of the actions a score of epsilon/num_actions. Is this how it does exploration based on epsilon? I am a little confused about it; can someone help explain?
Epsilon greedy works as follows (example with 4 arms):
Per round, choose the best arm given the context (i.e. the arm with the lowest predicted cost) with probability 1 - epsilon. With probability epsilon, choose an arm uniformly at random.
With epsilon = 0.1, at any given round, the probability of choosing the best arm is (1 - 0.1) + 0.1 * (1/4) = 0.925 ("exploit"). The probability of choosing any particular suboptimal arm is 0.1 * (1/4) = 0.025 ("explore").
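The pmf the C++ code above produces (in the first_only case, with no ties) can be sketched in a few lines; the function name is mine, not VW's:

```python
# Epsilon-greedy pmf sketch: every action gets epsilon / num_actions,
# and the greedy action (index 0 after the base learner's prediction)
# additionally gets 1 - epsilon, mirroring preds[0].score += 1.f - _epsilon.

def epsilon_greedy_pmf(epsilon, num_actions, best_index=0):
    pmf = [epsilon / num_actions] * num_actions
    pmf[best_index] += 1.0 - epsilon
    return pmf

pmf = epsilon_greedy_pmf(0.1, 4)
print(pmf[0])    # ~0.925, the "exploit" probability
print(pmf[1])    # ~0.025, the "explore" probability per suboptimal arm
print(sum(pmf))  # ~1.0, a valid probability distribution
```

This matches the 4-arm numbers worked out above: the best arm is chosen with probability 0.925 and each other arm with probability 0.025.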
Watched the great content at https://slideslive.com/38942331/vowpal-wabbit, thanks to all involved! A related question:
I am implementing a ranking system where the action sets per slot are not disjoint, i.e. I basically want a ranking without duplicates. The video mentions that the theory behind slates is worked out for the intersected/joint action case, but that it's still being worked on in VW.
Am I shooting myself in the foot if I use CCB instead of slates now? Is there some rough estimate of when joint action sets will be supported in slates mode? Is slates mode planned as a replacement for CCB? @jackgerrits is probably the one to ask :)
Regarding CCBs, I have a follow-up question. The docs mention this:
"If action_ids_to_include is excluded then all actions are implicitly included". What's the use case for action_ids_to_include?
It also states "This is currently unsupported". Does that refer to action_ids_to_include or the exclusion of action_ids_to_include :)?
Hi everyone, in the vw source code, when computing the prediction, we have the following code:
float finalize_prediction(shared_data* sd, vw_logger& logger, float ret)
{
  if (std::isnan(ret))
  {
    ret = 0.;
    if (!logger.quiet)
    { std::cerr << "NAN prediction in example " << sd->example_number + 1 << ", forcing " << ret << std::endl; }
    return ret;
  }
  if (ret > sd->max_label) return (float)sd->max_label;
  if (ret < sd->min_label) return (float)sd->min_label;
  return ret;
}
If I use squaredloss, the input to the above function is 1.36777e+09, but after finalize_prediction it becomes 0. Does that make sense?
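A direct Python port of the C++ above makes the two paths easy to check: a NaN is forced to 0, while any finite out-of-range value is clamped to min_label/max_label rather than zeroed (the label bounds below are illustrative values, not VW's defaults):

```python
import math

# Sketch of finalize_prediction above: NaN predictions are forced to 0,
# everything else is clamped to [min_label, max_label].

def finalize_prediction(ret, min_label, max_label):
    if math.isnan(ret):
        return 0.0
    if ret > max_label:
        return max_label
    if ret < min_label:
        return min_label
    return ret

print(finalize_prediction(float("nan"), 0.0, 1.0))  # 0.0 (NaN path)
print(finalize_prediction(1.36777e+09, 0.0, 1.0))   # 1.0 (clamped to max_label)
```

So a finite 1.36777e+09 would come out as max_label, not 0; only the NaN branch produces 0.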
Hi, I plan to contribute to RLOSF 2021. According to the website, applications opened on 14 January 2021, but I am not able to find a link to the application form. Any help would be highly appreciated.
Sorry about that, the date has been moved back to Feb 1 per https://www.microsoft.com/en-us/research/academic-program/rl-open-source-fest/