    olgavrou
    @olgavrou
    Hi @Favoreto_B_twitter if you are doing epsilon-greedy then the pmf provided will have a probability (1-e) on the predicted action and the remaining probability is distributed evenly on the remaining actions. The reason you see it always at index 0 is that VW will swap the predicted action with the first index so that it is always at index 0.
    For your second question, if we didn't sample from the pmf and just returned the predicted action (i.e. the one with the highest probability), then we would not be doing any exploration; we would be exploiting 100% of the time. Sampling from the pmf means exactly that: we sample the predicted action with a higher probability (1-e) (exploiting) and one of the other actions with a lower probability (exploring).
    The model doesn't explore, the model learns and predicts. The exploration happens with what you decide to eventually show to the user.
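A minimal sketch of the pmf described above, assuming at least two actions and that the exploration mass e is split evenly over the non-predicted actions, as the message states (VW's own split may differ in detail, for example by spreading e over all actions). The function names are hypothetical helpers, not VW API.

```python
import random

def epsilon_greedy_pmf(num_actions, predicted, epsilon):
    """PMF as described above: 1 - epsilon on the predicted action,
    epsilon shared evenly by the other actions (assumes num_actions >= 2)."""
    other = epsilon / (num_actions - 1)
    pmf = [other] * num_actions
    pmf[predicted] = 1.0 - epsilon
    return pmf

def sample_from_pmf(pmf, rng=random):
    """Draw one action index with probability proportional to its pmf entry."""
    draw = rng.random() * sum(pmf)
    cumulative = 0.0
    for index, prob in enumerate(pmf):
        cumulative += prob
        if draw < cumulative:
            return index
    return len(pmf) - 1  # guard against floating-point round-off

pmf = epsilon_greedy_pmf(num_actions=4, predicted=2, epsilon=0.2)
rng = random.Random(0)
counts = [0] * len(pmf)
for _ in range(10_000):
    counts[sample_from_pmf(pmf, rng)] += 1
print(pmf)     # probability per action index
print(counts)  # how often each index was actually shown
```

Always returning the argmax would put every draw on index 2; sampling keeps a small share of traffic on the other actions, which is the exploration.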
    Bernardo Favoreto
    @Favoreto_B_twitter
    Thanks @olgavrou.
    I am not using only epsilon-greedy, though. I've seen bias for other algorithms as well, but the chosen action is not necessarily always at index 0 (could you elaborate on that?). One other thing I've noticed is that, depending on the namespace interactions I use, some contextual features don't seem to influence at all the model's prediction (e.g., if I'm using -q UA (user-action) and change a Location feature, it doesn't change the prediction), any idea why is that (that happened to me while using softmax explorer)?
    The second part is pretty clear to me now, thanks!
    pushpendre
    @pushpendre
    Hi, I was wondering if there are any online regression models implemented in VW beyond a linear model ? For example, is there a tree-based regressor in VW that can be trained online? or a DNN based regressor?
    pushpendre
    @pushpendre

    For example,

    The bandit bakeoff paper mentions that

    We run our CB algorithms in an online fashion using Vowpal Wabbit: .... we consider online CSC or regression oracles. Online CSC itself reduces to multiple online regression problems in VW...

    I understand the loss function and the gradient updates but I want to know what is online regression model class implemented in VW ?

    6 replies
    pushpendre
    @pushpendre
    Just for the record, my question above is still open; the thread (up to the first 7 replies) went in another direction.
    pushpendre
    @pushpendre
    Hi everyone, one more question, how do importance weights interact with AdaGrad ? IIUC importance weights are derived for vanilla SGD and not for AdaGrad. I was wondering how exactly these two tweaks are implemented together?
    pushpendre
    @pushpendre

    what is online regression model class implemented in VW ?
    one more question, how do importance weights interact with AdaGrad ?

    figured both out. thanks.

    Raphael Ottoni
    @raphaottoni
    Hello guys
    is anybody here?

    I am following the tutorial on CTR with cb_explore_adf and I would love to know whether the article feature in the Action namespace can be numeric...
    In the tutorial, you tell us to do it like this:

    shared |User user=Tom time_of_day=morning
    |Action article=politics
    |Action article=sports
    |Action article=music
    |Action article=food

    Is it possible to pass numerical values and let the model generalize better when there is a new value in the middle?

    shared |User user=Tom time_of_day=morning
    |Action price:2.99
    |Action price:10.99

    So later, when I want to test a new price, let's say 6.99, will it have a better estimate for it?
    2 replies
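For reference, in VW's text format `price:2.99` is a single feature named `price` with value 2.99, while `article=politics` is a feature whose name is the whole string and whose value is 1. A numeric feature therefore shares one weight across all prices, so an unseen price such as 6.99 gets an estimate that is linear in the price; whether that is "better" depends on how close to linear the reward really is. A minimal sketch of writing both kinds of features, where `adf_example` is a hypothetical helper and not part of VW:

```python
def adf_example(shared, actions):
    """Build a multi-line cb_explore_adf example in VW text format.

    shared:  dict of context features
    actions: list of dicts, one per action
    Numeric values (int/float) are written as name:value, everything
    else as name=value (a categorical, one-hot style feature).
    """
    def fmt(features):
        parts = []
        for name, value in features.items():
            if isinstance(value, (int, float)):
                parts.append(f"{name}:{value}")
            else:
                parts.append(f"{name}={value}")
        return " ".join(parts)

    lines = [f"shared |User {fmt(shared)}"]
    lines += [f"|Action {fmt(action)}" for action in actions]
    return "\n".join(lines)

example = adf_example(
    {"user": "Tom", "time_of_day": "morning"},
    [{"price": 2.99}, {"price": 10.99}, {"price": 6.99}],
)
print(example)
```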
    Raphael Ottoni
    @raphaottoni
    I also opened a Stack Overflow question, so I could update the findings and help others 😊
    Raphael Ottoni
    @raphaottoni
    @olgavrou, I'm a little bit confused by the answer you gave to @Favoreto_B_twitter. You said, and I quote, "The reason you see it always at index 0 is that VW will swap the predicted action with the first index so that it is always at index 0". Are you saying that the internal index of an arm could change?! In one interaction index 0 of the PMF would be related to arm1 but in the next to arm2? How am I supposed to know which arm is at each index given the pmf? How can I validate those things?
    @olgavrou does it happen with --cb_explore_adf? I think it doesn't, due to the order we pass on predict, right?
    shared |User user=Tom time_of_day=morning
    |Action article=politics
    |Action article=sports
    |Action article=music
    |Action article=food
    In this example, politics would always be index 0 and food always index 3 in the PMF, right?
    Wilson Cheung
    @wcheung-code
    Hey all! Just wanted to celebrate that I finally got Vowpal Wabbit to install successfully on my Windows machine after 3 long nights of reading documentation after work. I am looking forward to playing with VW more this weekend and to start preparing my application for RL Open Source Fest to see where I can help contribute :) Looking forward to meeting you all!
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys, I am testing a new use case on data from a real website and would like to know: what do you generally do to find the best contextual attributes to use? Pure intuition? Statistical analysis? Would love to hear some thoughts on that!
    6 replies
    Harsh Sharma
    @hs2361

    Hello everyone,
    I'm Harsh Sharma, an undergraduate student from IIIT, Gwalior, pursuing Computer Science and Engineering. I'm interested in participating in the Microsoft RL Open Source fest this year, and I'm specifically interested in working on these projects:
    17 - RL-based query planner for open-source SQL engine
    20 - AutoML for online learning

    Since I've worked with Deep Learning for the NL2SQL task before, I would like to work on 17. Could someone here please clarify what the "query planner" here refers to? Does it mean join query optimization? Also, I'd be really grateful if someone could guide me as to what would be the first step to implement such a query planner in an SQL engine.

    Bernardo Favoreto
    @Favoreto_B_twitter

    I have a question about using VW with cb_explore_adf and softmax explorer for ranking.

    I am trying to use VW to perform ranking using the contextual bandit framework, specifically using --cb_explore_adf --softmax --lambda X. The choice of softmax is because, according to VW's docs: "This is a different explorer, which uses the policy not only to predict an action but also predict a score indicating the quality of each action." This quality-related score is what I would like to use for ranking.

    The scenario is this: I have a list of items [A, B, C, D], and I would like to sort it in an order that maximizes a pre-defined metric (e.g., CTR). One of the problems, as I see, is that we cannot evaluate the items individually because we can't know for sure which item made the user click or not.

    To test some approaches, I've created a dummy dataset. As a way to try and solve the above problem, I am using the entire ordered list as a way to evaluate if a click happens or not (e.g., given the context for user X, he will click if the items are [C, A, B, D]). Then, I reward the items individually according to their position on the list, i.e., reward = 1/P for 0 < P < len(list). Here, the reward for C, A, B, D is 1, 0.5, 0.25, and 0.125, respectively. If there's no click, the reward is zero for all items. The reasoning behind this is that more important items will stabilize on top and less important on the bottom.
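A small sketch of this position-based reward. Note that the formula as written (1/P) would give 1, 0.5, 0.333..., 0.25, while the listed values 1, 0.5, 0.25, 0.125 match a halving scheme, 1/2^(P-1), so the sketch supports both and leaves the choice to the caller. `position_rewards` and the scheme names are hypothetical.

```python
def position_rewards(ranked_items, clicked, scheme="halving"):
    """Per-item reward from 1-based list position; all zeros without a click.

    scheme="reciprocal": reward = 1 / P
    scheme="halving":    reward = 1 / 2**(P - 1)
    """
    if scheme not in ("reciprocal", "halving"):
        raise ValueError(f"unknown scheme: {scheme}")
    rewards = {}
    for position, item in enumerate(ranked_items, start=1):
        if not clicked:
            rewards[item] = 0.0
        elif scheme == "reciprocal":
            rewards[item] = 1.0 / position
        else:
            rewards[item] = 1.0 / 2 ** (position - 1)
    return rewards

ranking = ["C", "A", "B", "D"]
print(position_rewards(ranking, clicked=True, scheme="halving"))
print(position_rewards(ranking, clicked=True, scheme="reciprocal"))
print(position_rewards(ranking, clicked=False))
```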

    Also, one of the difficulties I found was defining a sampling function for this approach. Typically, we're interested in selecting only one option, but here I have to sample multiple times (4 in the example). Because of that, it's not very clear how I should incorporate exploration when sampling items. I have a few ideas:

    • Copy the probability mass function and assign it to copy_pmf. Draw a random number between 0 and max(copy_pmf) and, for each probability value in copy_pmf, increment the sum_prob variable (very similar to the tutorial here: https://vowpalwabbit.org/tutorials/cb_simulation.html). When sum_prob > draw, we add the current item/prob to a list. Then, we remove this probability from copy_pmf, set sum_prob = 0, and draw a new number again between 0 and max(copy_pmf) (which might change or not).
    • Another option is drawing a random number and, if the maximum probability, i.e., max(pmf), is greater than this number, we exploit. If it isn't, we shuffle the list and return this (explore). This approach requires tuning the lambda parameter, which controls the output pmf (I have seen cases where the max prob is > 0.99, which would mean around a 1% chance of exploring. I have also seen instances where max prob is ~0.5, which is around 50% exploration).
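A sketch of a variant of the first idea: build the ranking by sampling without replacement, one position at a time. It deliberately differs from the description above in one respect: each draw is taken between 0 and the sum of the remaining probabilities rather than their max, so every remaining item is picked in proportion to its mass. `sample_ranking` is a hypothetical helper, not VW API.

```python
import random

def sample_ranking(items, pmf, rng=random):
    """Order items by repeated sampling without replacement from pmf."""
    remaining = list(zip(items, pmf))
    ranking = []
    while remaining:
        total = sum(prob for _, prob in remaining)
        if total <= 0:
            # No mass left on the remaining items: pick uniformly.
            chosen = rng.randrange(len(remaining))
        else:
            draw = rng.random() * total
            cumulative = 0.0
            chosen = len(remaining) - 1  # round-off guard
            for index, (_, prob) in enumerate(remaining):
                cumulative += prob
                if draw < cumulative:
                    chosen = index
                    break
        ranking.append(remaining.pop(chosen)[0])
    return ranking

rng = random.Random(7)
print(sample_ranking(["A", "B", "C", "D"], [0.7, 0.2, 0.05, 0.05], rng))
```

High-probability items tend to land near the top, but every ordering keeps a nonzero chance, so exploration comes from the pmf itself rather than from a separate explore/exploit coin flip.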

    I would like to know if there are any suggestions regarding this problem, specifically sampling and the reward function. Also, if there are any things I might be missing here.

    Thank you!

    olgavrou
    @olgavrou
    Hi @raphaottoni, I can see why my response is confusing: I was mixing up cb_explore and cb_explore_adf. The tutorial uses cb_explore_adf, which returns a pmf from which we need to sample. The pmf will have a larger probability on the action that the model predicted (giving us a higher probability that we will exploit the predicted action) and smaller probabilities for the rest of the actions (giving us a smaller probability that we will explore). There is no index swapping here, you are right.
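The point about ordering can be sketched as follows, assuming the pmf returned for a cb_explore_adf example is a plain list with one probability per action line, in the same order the lines were passed. The pmf values here are made up for illustration, and `label_pmf` is a hypothetical helper.

```python
import random

actions = ["politics", "sports", "music", "food"]
pmf = [0.05, 0.85, 0.05, 0.05]  # hypothetical model output, same order as actions

def label_pmf(actions, pmf):
    """Pair each action with its probability by position."""
    if len(actions) != len(pmf):
        raise ValueError("expected one probability per action")
    return list(zip(actions, pmf))

labeled = label_pmf(actions, pmf)

# Sample an index from the pmf, then look the action up by that index.
rng = random.Random(42)
chosen_index = rng.choices(range(len(pmf)), weights=pmf, k=1)[0]
chosen_action, chosen_prob = labeled[chosen_index]

print(labeled)
print(chosen_index, chosen_action, chosen_prob)
```

Keeping the sampled index together with its probability is useful later, since the label for the learn step needs the probability with which the shown action was chosen.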
    olgavrou
    @olgavrou
    Hi @Favoreto_B_twitter have you checked out Conditional Contextual Bandits? It seems like your description is pointing towards that: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Conditional-Contextual-Bandit
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey @olgavrou, I did come across CCBs but never really dug deep to fully understand how they work. I've searched for tutorials but could not find any. Are there any?
    Indeed, they seem like a good option, but I feel like the lack of material on the topic might be a barrier. If you have something to point me to, I would love to see it!
    Thanks!
    14 replies
    Bernardo Favoreto
    @Favoreto_B_twitter
    @olgavrou After some reading, I am pretty confident CCB is the way to go for the problem I described. However, it's still unclear to me whether I should use CCB or Slates?
    I can't really see how they differ. Apparently, Slates are built on top of CCB, but what is their purpose?
    1 reply
    Marcos Passos
    @marcospassos
    Hi everyone! I just watched Milind Agarwal's talk about Contextual Bandits Data Visualization, and I got very interested in the library he mentioned, but I could not find it anywhere. Is it available somewhere?
    Arunagirinadan Sudharshan
    @SudhanAruna
    Hi everyone, I'm a student applying for RLOS 2021 and I'm planning to work on the "vw daemon to use gRPC" task. I wanted to know whether I can email any mentors or maintainers to clarify certain doubts, which would help me finish the proposal. Please let me know if it's possible.
    Raphael Ottoni
    @raphaottoni

    Guys, would you please help me with two questions:
    1) Using cb_explore_adf for a pricing agent, I was trying two types of reward: i) sales and ii) sales x price, where each arm is a price. I have noticed that cb_explore_adf converges well when the reward is sales, but when we multiply the sales by the arm price, it simply doesn't converge at all. Is it possible that it is sensitive to scale? Sales are in units (40 at most) and prices are in cents (e.g. 399).

    2) Another quick question: how do I pass features from multiple namespaces in the -q UA parameter? I mean, I want to add more variables from another namespace, something like -q [UA, MA].

    6 replies
    (U)ser and (M)erchant and (A)ction
    Bernardo Favoreto
    @Favoreto_B_twitter
    @raphaottoni I can help with your second question: adding multiple interactions is pretty straightforward in VW. You can do it in a few ways:
    1) Simply pass multiple interactions after the -q flag, e.g. -q UA UM MA and so on, depending on how many namespaces you have
    2) You can also pass different types of interactions (i.e., quadratic or cubic) using the --interactions flag. Here, you could use, for example --interactions UA UM UAM
    3) Finally, you can specify all possible interactions by using a special "symbol" (don't know if it's the appropriate way of calling it). For example, specify all possible quadratic interactions -q ::. You should notice that using this is slower because there are lots of features created on-the-fly. Also, you should see a warning saying that some repeated features were ignored (by default in VW).
    Raphael Ottoni
    @raphaottoni
    @Favoreto_B_twitter thank you!
    7 replies
    @Favoreto_B_twitter Could you please point me to the documentation on --interactions? I didn't get what the difference is between -q and --interactions, or how to say whether it is quadratic or cubic.
    Bernardo Favoreto
    @Favoreto_B_twitter
    @raphaottoni Unfortunately there isn't any specification of it in the docs (at least I did not find one). I feel your pain; I just discovered this by accident when watching their latest presentation, so I don't really know the details.
    As for how to tell whether it is quadratic or cubic, you can just think of how many namespaces you are interacting: if you use UAM, this is a cubic interaction. If you use UA, this is a quadratic one. I believe this is valid, though I cannot confirm.
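A conceptual sketch of what "quadratic" and "cubic" mean here: an interaction crosses every feature of one namespace with every feature of the others, and the crossed feature's value is the product of the original values. This only illustrates the idea; it is not how VW implements it internally (VW hashes feature names), and `interact` is a hypothetical helper.

```python
from itertools import product

def interact(*namespaces):
    """Cross features across namespaces (each a dict of name -> value)."""
    crossed = {}
    for combo in product(*(ns.items() for ns in namespaces)):
        name = "*".join(feature_name for feature_name, _ in combo)
        value = 1.0
        for _, feature_value in combo:
            value *= feature_value
        crossed[name] = value
    return crossed

user = {"user=Tom": 1.0, "time_of_day=morning": 1.0}   # namespace U
action = {"price": 3.99}                               # namespace A
merchant = {"city=sao_paulo": 1.0}                     # namespace M

quadratic = interact(user, action)         # two namespaces, like UA
cubic = interact(user, action, merchant)   # three namespaces, like UAM
print(quadratic)
print(cubic)
```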
    3 replies
    Raphael Ottoni
    @raphaottoni
    I know I use the namespace Action to set the price, like:
    |Action price=399
    Am I also expected to use a namespace called Price, so that it helps convergence, since I am building a reward that is a Gaussian value times the price the arm represents?
    Am I supposed to build another namespace like:
    |Price value:399
    or
    |Price value=399
    I am asking this because I am not sure whether the |Action namespace in cb_explore_adf treats this as a feature that is quadratically related to the reward.
    This is the only thing I thought would explain why the model converges when the reward is a plain value, but stops converging when those values are multiplied by a constant (let's say the price each arm represents).
    Raphael Ottoni
    @raphaottoni
    shared |Merchant merchant_id=Restaurante_Japidin city=sao_paulo radius=500
    0:-79.8:0.866666661699613 |Price value:399 |Action price=399
    |Price value:499 |Action price=499
    |Price value:599 |Action price=599
    I am trying things like that... but it won't converge either =(
    Those are the curves... one would think it is easy to converge:
    { curve_type: "Gaussian", curve_id: "arm_1", mean: 20.0, std: 0.0},
    { curve_type: "Gaussian", curve_id: "arm_2", mean: 5.0, std: 0.0},
    { curve_type: "Gaussian", curve_id: "arm_3", mean: 4.0, std: 0.0}
    Those are the arm_prices:
    {"399": "arm_1", "499": "arm_2", "599": "arm_3"}
    reward = Arm_value * Gaussian Sample
    vw = pyvw.vw("--cb_explore_adf -q :: --epsilon 0.2")
    Raphael Ottoni
    @raphaottoni
    Screen Shot 2021-02-22 at 19.39.41.png
    Above is a graph of this setup. I really don't know why the agent chose the most expensive arm!
    If I simply divide the rewards by 100 and run the very same experiment:
    Screen Shot 2021-02-22 at 19.40.32.png
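A sketch of the rescaling step, assuming the label format shown in the example above (action:cost:probability on the chosen action's line, with cost being the negated reward). The divisor of 100 simply mirrors the experiment described here; whether scale is the actual cause of the non-convergence is not established in this thread. `cb_label` is a hypothetical helper.

```python
def cb_label(action_index, reward, probability, reward_scale=100.0):
    """Format an action:cost:probability label from a raw reward.

    VW minimizes cost, so the reward is negated; reward_scale shrinks
    large rewards (e.g. price in cents times units sold) before logging.
    """
    cost = -reward / reward_scale
    return f"{action_index}:{cost}:{probability}"

price_cents, sales = 399, 20.0           # arm_1: price 399, mean sales 20
label = cb_label(0, price_cents * sales, 0.866666661699613)
print(label + " |Price value:399 |Action price=399")
```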