    Max Pagels
    @maxpagels_twitter

    Regarding CCBs, I have a follow-up question. The docs mention this:

    "If action_ids_to_include is excluded then all actions are implicitly included". What's the use case for action_ids_to_include?

    It also states "This is currently unsupported". Does that refer to action_ids_to_include or the exclusion of action_ids_to_include :)?

    olgavrou
    @olgavrou
    We do want to get there eventually but it isn't currently on the roadmap (at least not in the near future). A suggestion regarding the global reward might be to assign a global reward and distribute it evenly to all slots? But I haven't tried that out myself, so I'm not sure what results you will get there :)
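    A minimal sketch of that even split in the CCB text format (hypothetical numbers: a global reward of 1.0 over two slots, encoded as cost -0.5 per slot; slot labels are chosen_action:cost:probability):

    ccb shared |User user=Tom
    ccb action |Action article=politics
    ccb action |Action article=sports
    ccb action |Action article=music
    ccb slot 0:-0.5:0.5 |Slot first
    ccb slot 1:-0.5:0.5 |Slot second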
    Nishant Kumar
    @nishantkr18
    Hey everyone! Will we be having the RLOS fest this year? I believe applications should have opened by now?
    Lalit Jain
    @lalitkumarj
    Hi all, I am trying to get active learning working with VW. I'm successfully able to send unlabeled examples; however, vw sends back a single float (presumably a prediction) which is always 0. In active_interactor.py (which seems quite out of date) it seems that sometimes vw should send back a list of prediction, tag, importance, which I can then send back along with the features. This is also the model in these slides: https://cilvr.cs.nyu.edu/diglib/lsml/lecture12_active.pdf. Would anybody be able to provide some guidance on what could be going wrong? Thank you!!
    Lalit Jain
    @lalitkumarj
    One additional data point: I just rolled back to version 8.2.0 and things seem to be working fine there.
    AnkitRai-22
    @AnkitRai-22
    Hi everyone, I plan to contribute to RLOSF 2021, problem number 20 - "AutoML for online learning". We are supposed to implement AutoML HPO (hyperparameter optimization) techniques for VW, but there are many algorithms available for this. I am planning to use ParamILS. Any suggestions or comments would be highly appreciated.
    Max Pagels
    @maxpagels_twitter

    CCBs: I always get undefined loss with --passes >1 on my example dataset. Is this intended?

    More generally, there doesn't seem to be a ccb_adf option, only ccb_explore_adf, so it's not clear how to properly evaluate the policy (not the exploration algorithm) offline.

    Utkarsh Sharma
    @utkarshsharma00
    Hi, I plan to contribute to RLOSF 2021. As per the website, applications opened on 14th January 2021, but I am not able to find a link to the application form. Any help would be highly appreciated.
    Max Pagels
    @maxpagels_twitter
    Related to my CCB question, pretty sure it's a bug. Made a Github issue: VowpalWabbit/vowpal_wabbit#2781
    Josh Minor
    @jishminor
    To leverage the contextual bandit adf learner in vw, must the data samples supplied always have one action labeled with a:c:p? If I have existing data for contexts, actions and rewards (no probabilities), can this be used to train a model which would then be used to warm start an online learning session where vw generates predicted actions?
    3 replies
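    For reference, a logged cb_adf example in the VW input format looks like the sketch below: the chosen action carries an action:cost:probability label (the leading action index is ignored in ADF mode), and the probability is the p in a:c:p:

    shared | s_1 s_2
    0:1.0:0.5 | a:1 b:1 c:1
    | a:0.5 b:2 c:1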
    Wenjuan Dou
    @darlwen

    Hi everyone, in the vw source code, when computing the prediction, we have the following code:

    float finalize_prediction(shared_data* sd, vw_logger& logger, float ret)
    {
      if (std::isnan(ret))
      {
        ret = 0.;
        if (!logger.quiet)
        { std::cerr << "NAN prediction in example " << sd->example_number + 1 << ", forcing " << ret << std::endl; }
        return ret;
      }
      if (ret > sd->max_label) return (float)sd->max_label;
      if (ret < sd->min_label) return (float)sd->min_label;
      return ret;
    }

    If I use squared loss, then the prediction passed into the above function is 1.36777e+09, but after finalize_prediction it becomes 0. Does that make sense?

    peterychang
    @peterychang

    Hi, I plan to contribute to RLOSF 2021. As per the website, applications opened on 14th January 2021, but I am not able to find a link to the application form. Any help would be highly appreciated.

    Sorry about that, the date has been moved back to Feb 1 per https://www.microsoft.com/en-us/research/academic-program/rl-open-source-fest/

    Jack Gerrits
    @jackgerrits
    @darlwen what is max_label and min_label when it is called?
    8 replies
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys!
    I am trying to run some experiments using --cb_explore_adf and I noticed that very often the model gets biased (the probability mass function output is mostly the same, regardless of context). I've tried using regularizers, modifying the LR, adding decay, and some other things, but I'm still not convinced the model isn't biased, because when I run a few predictions for visualization, the PMF is often the same, or at least the highest probability is at the same index.
    That being said, I would like to know if anyone has any suggestions about what might be causing this? (I know that my dataset is not biased, though not perfectly balanced.)
    Also, when sampling an action from the PMF, why don't we always grab the index at which max(prob) occurs? I.e., why is it recommended to use sample_custom_pmf (from: https://vowpalwabbit.org/tutorials/cb_simulation.html#getting-a-decision-from-vowpal-wabbit)? As I understand it, this is to add some kind of randomization, but isn't the model already exploring when we train it with explore_adf?
    Would love to hear your feedback.
    Cheers!
    olgavrou
    @olgavrou
    Hi @Favoreto_B_twitter, if you are doing epsilon-greedy then the pmf provided will have probability (1-e) on the predicted action, and the remaining probability is distributed evenly over the remaining actions. The reason you see it always at index 0 is that VW will swap the predicted action with the first index so that it is always at index 0.
    For your second question: if we didn't sample from the pmf and just returned the predicted action (i.e. the one with the highest probability), then we would not be doing any exploration; we would be exploiting 100% of the time. Sampling from the pmf means exactly that: we will sample the predicted action with higher probability (1-e) (exploiting), and one of the other actions with lower probability (exploring).
    The model doesn't explore, the model learns and predicts. The exploration happens with what you decide to eventually show to the user.
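    To make that concrete, a minimal sketch along the lines of the tutorial's sample_custom_pmf helper. With epsilon-greedy, e=0.2, and 4 actions, the pmf looks roughly like [0.85, 0.05, 0.05, 0.05], and sampling from it works like this:

    import random

    def sample_custom_pmf(pmf):
        # normalize, then draw one index in proportion to its probability
        total = sum(pmf)
        pmf = [p / total for p in pmf]
        draw = random.random()
        sum_prob = 0.0
        for index, prob in enumerate(pmf):
            sum_prob += prob
            if sum_prob > draw:
                return index, prob

    index, prob = sample_custom_pmf([0.85, 0.05, 0.05, 0.05])
    # index == 0 (exploit) about 85% of the time, another index (explore) otherwise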
    Bernardo Favoreto
    @Favoreto_B_twitter
    Thanks @olgavrou.
    I am not using only epsilon-greedy, though. I've seen bias for other algorithms as well, but the chosen action is not necessarily always at index 0 (could you elaborate on that?). One other thing I've noticed is that, depending on the namespace interactions I use, some contextual features don't seem to influence the model's prediction at all (e.g., if I'm using -q UA (user-action) and change a Location feature, it doesn't change the prediction). Any idea why that is (it happened to me while using the softmax explorer)?
    The second part is pretty clear to me now, thanks!
    pushpendre
    @pushpendre
    Hi, I was wondering if there are any online regression models implemented in VW beyond a linear model? For example, is there a tree-based regressor in VW that can be trained online? Or a DNN-based regressor?
    pushpendre
    @pushpendre

    For example,

    The bandit bakeoff paper mentions that

    We run our CB algorithms in an online fashion using Vowpal Wabbit: .... we consider online CSC or regression oracles. Online CSC itself reduces to multiple online regression problems in VW...

    I understand the loss function and the gradient updates, but I want to know: what is the online regression model class implemented in VW?

    6 replies
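    For context on the question above: VW's default model class is linear, trained with online gradient descent, so the regression oracle in the quote reduces to an online linear regressor. A minimal sketch of the plain squared-loss version (VW's actual updates add adaptive, normalized, and invariant refinements):

    import numpy as np

    def online_linear_regression(stream, dim, lr=0.1):
        # one gradient step per example, never revisiting past data
        w = np.zeros(dim)
        for x, y in stream:
            pred = w @ x
            w -= lr * (pred - y) * x  # gradient of 0.5 * (pred - y)^2
        return w

    # toy usage: recover w* = [3, -1] from a stream of 500 examples
    rng = np.random.default_rng(0)
    stream = [(x, 3.0 * x[0] - 1.0 * x[1]) for x in rng.normal(size=(500, 2))]
    print(online_linear_regression(stream, dim=2))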
    pushpendre
    @pushpendre
    Just for the record, my question above is still open; the thread (the first 7 replies) went in another direction.
    pushpendre
    @pushpendre
    Hi everyone, one more question: how do importance weights interact with AdaGrad? IIUC, the importance-weight-aware updates are derived for vanilla SGD and not for AdaGrad. I was wondering how exactly these two tweaks are implemented together?
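    To illustrate the distinction (a sketch of the idea from Karampatziakis & Langford's importance-weight-aware updates, shown for plain SGD, not VW's actual closed-form implementation): simply scaling the gradient by an importance weight h can overshoot the label badly, while simulating h updates in many small steps cannot:

    import numpy as np

    def naive_update(w, x, y, h, lr):
        # scale the squared-loss gradient by the importance weight h
        return w - lr * h * (w @ x - y) * x

    def iw_aware_update(w, x, y, h, lr, steps_per_unit=1000):
        # approximate h infinitesimal updates: the prediction approaches
        # the label y but never blows past it, however large h is
        for _ in range(int(h * steps_per_unit)):
            w = w - (lr / steps_per_unit) * (w @ x - y) * x
        return w

    x, y = np.array([1.0, 2.0]), 1.0
    w0 = np.zeros(2)
    print(naive_update(w0, x, y, h=50.0, lr=0.1) @ x)     # 25.0: wild overshoot
    print(iw_aware_update(w0, x, y, h=50.0, lr=0.1) @ x)  # ~1.0: lands on the label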
    Josh Minor
    @jishminor
    This message was deleted
    4 replies
    pushpendre
    @pushpendre

    what is the online regression model class implemented in VW?
    one more question: how do importance weights interact with AdaGrad?

    figured both out. thanks.

    Raphael Ottoni
    @raphaottoni
    Hello guys
    is anybody here?

    I am following the tutorial on CTR with cb_explore_adf and I would love to know if it is possible for the article feature in the Action namespace to be numeric...
    in the tutorial, you tell us to do it like this:

    shared |User user=Tom time_of_day=morning
    |Action article=politics
    |Action article=sports
    |Action article=music
    |Action article=food

    is it possible to pass numerical values and let the model generalize better when there is a new value in between?

    shared |User user=Tom time_of_day=morning
    |Action price:2.99
    |Action price:10.99

    so later, when I want to test a new price, let's say 6.99, it will have a better estimate for it?
    2 replies
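    One way to see the difference (a sketch using the Python bindings; the numbers and labels are made up): price:2.99 passes 2.99 as a numeric feature value, so a single learned weight scales with price and the model can interpolate to an unseen 6.99, whereas article=politics creates a separate indicator feature per distinct value:

    from vowpalwabbit import pyvw

    vw = pyvw.vw("--cb_explore_adf -q UA --quiet")

    # logged interaction: the 10.99 action was chosen with probability 0.5
    # and observed cost -1 (label format is action:cost:probability)
    vw.learn([
        "shared |User user=Tom time_of_day=morning",
        "|Action price:2.99",
        "0:-1:0.5 |Action price:10.99",
    ])

    # predict over a slate that includes the unseen price 6.99
    pmf = vw.predict([
        "shared |User user=Tom time_of_day=morning",
        "|Action price:2.99",
        "|Action price:6.99",
        "|Action price:10.99",
    ])
    print(pmf)  # one probability per action line, in the order supplied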
    Raphael Ottoni
    @raphaottoni
    I also opened a Stack Overflow question, so I could update the findings and help others 😊
    Raphael Ottoni
    @raphaottoni
    @olgavrou, I'm a little bit confused by the answer you gave to @Favoreto_B_twitter. You said, and I quote, "The reason you see it always at index 0 is that VW will swap the predicted action with the first index so that it is always at index 0"... Are you saying that the internal index of an arm could change?! In one interaction index 0 of the PMF would be related to arm1, but in the next to arm2? How am I supposed to know which arm is at each index given the pmf? How can I validate these things?
    @olgavrou does it happen with --cb_explore_adf? I think it doesn't... due to the order we pass at predict time, right?
    shared |User user=Tom time_of_day=morning
    |Action article=politics
    |Action article=sports
    |Action article=music
    |Action article=food
    In this example, politics would always be index 0 and food would always be index 3 in the PMF, right?
    Wilson Cheung
    @wcheung-code
    Hey all! Just wanted to celebrate that I finally got Vowpal Wabbit to successfully install on my Windows machine after 3 long nights of reading documentation after work. I am looking forward to playing with VW more this weekend and starting to prepare my application for the RL Open Source Fest to see where I can help contribute :) Looking forward to meeting you all!
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys, I am testing a new use case on data from a real website and would like to know: what do you generally do to find the best contextual attributes to use? Purely intuition? Statistical analysis? Would love to hear some thoughts on that!
    6 replies
    Harsh Sharma
    @hs2361

    Hello everyone,
    I'm Harsh Sharma, an undergraduate student from IIIT, Gwalior, pursuing Computer Science and Engineering. I'm interested in participating in the Microsoft RL Open Source fest this year, and I'm specifically interested in working on these projects:
    17 - RL-based query planner for open-source SQL engine
    20 - AutoML for online learning

    Since I've worked with Deep Learning for the NL2SQL task before, I would like to work on 17. Could someone please clarify what the "query planner" refers to here? Does it mean join query optimization? Also, I'd be really grateful if someone could guide me as to what the first step would be to implement such a query planner in an SQL engine.

    Bernardo Favoreto
    @Favoreto_B_twitter

    I have a question about using VW with cb_explore_adf and softmax explorer for ranking.

    I am trying to use VW to perform ranking using the contextual bandit framework, specifically using --cb_explore_adf --softmax --lambda X. The choice of softmax is because, according to VW's docs: "This is a different explorer, which uses the policy not only to predict an action but also predict a score indicating the quality of each action." This quality-related score is what I would like to use for ranking.

    The scenario is this: I have a list of items [A, B, C, D], and I would like to sort it in an order that maximizes a pre-defined metric (e.g., CTR). One of the problems, as I see it, is that we cannot evaluate the items individually, because we can't know for sure which item made the user click or not.

    To test some approaches, I've created a dummy dataset. As a way to try and solve the above problem, I am using the entire ordered list as a way to evaluate if a click happens or not (e.g., given the context for user X, he will click if the items are [C, A, B, D]). Then, I reward the items individually according to their position on the list, i.e., reward = 1/2^P for position 0 <= P < len(list). Here, the rewards for C, A, B, D are 1, 0.5, 0.25, and 0.125, respectively. If there's no click, the reward is zero for all items. The reasoning behind this is that more important items will stabilize on top and less important ones at the bottom.

    Also, one of the difficulties I found was defining a sampling function for this approach. Typically, we're interested in selecting only one option, but here I have to sample multiple times (4 in the example). Because of that, it's not very clear how I should incorporate exploration when sampling items. I have a few ideas:

    • Copy the probability mass function and assign it to copy_pmf. Draw a random number between 0 and max(copy_pmf) and, for each probability value in copy_pmf, increment the sum_prob variable (very similar to the tutorial here: https://vowpalwabbit.org/tutorials/cb_simulation.html). When sum_prob > draw, we add the current item/prob to a list. Then, we remove this probability from copy_pmf, set sum_prob = 0, and draw a new number again between 0 and max(copy_pmf) (which might change or not). (See the sketch following this message.)
    • Another option is drawing a random number and, if the maximum probability, i.e., max(pmf), is greater than this number, we exploit. If it isn't, we shuffle the list and return it (explore). This approach requires tuning the lambda parameter, which controls the output pmf (I have seen cases where the max prob is > 0.99, which would mean around a 1% chance of exploring; I have also seen instances where the max prob is ~0.5, which is around 50% exploration).

    I would like to know if there are any suggestions regarding this problem, specifically sampling and the reward function. Also, if there are any things I might be missing here.

    Thank you!
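    The first idea above is essentially sampling without replacement from the pmf; a minimal sketch in plain Python (the function name is made up):

    import random

    def sample_slate(pmf, k):
        # repeatedly draw an index in proportion to its remaining
        # probability, then remove it, implicitly renormalizing the rest
        pmf = list(pmf)
        indices = list(range(len(pmf)))
        slate = []
        for _ in range(k):
            draw = random.random() * sum(pmf)
            acc = 0.0
            for pos, prob in enumerate(pmf):
                acc += prob
                if acc >= draw:
                    slate.append(indices.pop(pos))
                    pmf.pop(pos)
                    break
        return slate

    print(sample_slate([0.7, 0.1, 0.1, 0.1], k=4))  # e.g. [0, 2, 1, 3]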

    olgavrou
    @olgavrou
    Hi @raphaottoni, I can see why my response was confusing: I was mixing up cb_explore and cb_explore_adf. The tutorial uses cb_explore_adf, which will return a pmf from which we need to sample. The pmf will have a larger probability on the action that the model predicted (giving us a higher probability that we will exploit the predicted action) and smaller probabilities for the rest of the actions (giving us a smaller probability that we will explore). There is no index swapping here, you are right.
    olgavrou
    @olgavrou
    Hi @Favoreto_B_twitter have you checked out Conditional Contextual Bandits? It seems like your description is pointing towards that: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Conditional-Contextual-Bandit
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey @olgavrou, I did come across CCBs but never really dug deep enough to fully understand how they work. I've searched for tutorials but could not find any. Are there any?
    Indeed, they seem like a good option, but I feel like the lack of material on the topic might be a barrier. If you have something to point me to, I would love to see it!
    Thanks!
    14 replies
    Bernardo Favoreto
    @Favoreto_B_twitter
    @olgavrou After some reading, I am pretty confident CCB is the way to go for the problem I described. However, it's still unclear to me whether I should use CCB or Slates.
    I can't really see how they differ. Apparently, Slates are built on top of CCB, but what is their purpose?
    1 reply
    Marcos Passos
    @marcospassos
    Hi everyone! I just watched Milind Agarwal's talk about Contextual Bandits Data Visualization, and I got very interested in the library he mentioned, but I could not find it anywhere. Is it available somewhere?
    Arunagirinadan Sudharshan
    @SudhanAruna
    Hi everyone, I'm a student applying for RLOS 2021 and I'm planning to work on the "vw daemon to use gRPC" task. I wanted to know whether I can email any mentors or maintainers to clarify certain doubts, which would help me finish the proposal. Please let me know if it's possible.
    Raphael Ottoni
    @raphaottoni

    Guys, would you please help me with two questions:
    1) Using cb_explore_adf for a pricing agent, I was trying two types of reward: i) sales and ii) sales × price, where each arm is a price. I have noticed that cb_explore_adf converges well when the reward is sales, but when we multiply the sales by the arm price, it simply doesn't converge at all. Is it possible that it is sensitive to scale? Sales are in units (like 40 at most) and prices are in cents (e.g. 399).

    2) Another quick question: how do I pass features from multiple namespaces in the -q UA parameter? I mean, I want to add more variables from another namespace, something like -q [UA, MA].

    6 replies
    (U)ser and (M)erchant and (A)ction
    Bernardo Favoreto
    @Favoreto_B_twitter
    @raphaottoni I can help with your second question: adding multiple interactions is pretty straightforward in VW. You can do it in a few ways:
    1) Simply repeat the -q flag for each interaction, e.g. -q UA -q UM -q MA and so on, depending on how many namespaces you have
    2) You can also pass higher-order interactions (i.e., cubic as well as quadratic) using the --interactions flag. Here, you could use, for example, --interactions UA --interactions UAM
    3) Finally, you can specify all possible interactions by using a special "symbol" (don't know if that's the appropriate way of calling it). For example, -q :: specifies all possible quadratic interactions. You should note that using this is slower because lots of features are created on the fly. Also, you should see a warning saying that some repeated features were ignored (by default in VW).
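    A quick sketch of those three styles via the Python bindings (the flag combinations are illustrative only):

    from vowpalwabbit import pyvw

    # 1) repeat -q for several quadratic interactions
    vw_q = pyvw.vw("--cb_explore_adf -q UA -q UM -q MA --quiet")

    # 2) --interactions also accepts cubic (or longer) combinations
    vw_i = pyvw.vw("--cb_explore_adf --interactions UA --interactions UAM --quiet")

    # 3) wildcard: all quadratic namespace pairs (slower; duplicates are dropped)
    vw_all = pyvw.vw("--cb_explore_adf -q :: --quiet")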
    Raphael Ottoni
    @raphaottoni
    @Favoreto_B_twitter thank you!
    7 replies
    @Favoreto_B_twitter Could you please point me to the documentation on this --interactions? I didn't get what the difference is between -q and --interactions, or how to say whether it is quadratic or cubic
    Bernardo Favoreto
    @Favoreto_B_twitter
    @raphaottoni Unfortunately there isn't any specification of it in the docs (at least I did not find one). I feel your pain, I just discovered this by accident while watching their latest presentation, so I don't really know the details.
    As for how to tell when it is quadratic or cubic, you can just think of how many namespaces you are interacting: if you use UAM, it is a cubic interaction; if you use UA, it is quadratic. I believe this is valid, though I cannot confirm it.
    3 replies