    Raphael Ottoni
    @raphaottoni
    @olgavrou, I'm a little bit confused by the answer you gave to @Favoreto_B_twitter. You said, and I quote, “The reason you see it always at index 0 is that VW will swap the predicted action with the first index so that it is always at index 0”. What you are saying is that the internal index of an arm could change?! In one interaction, index 0 of the PMF would be related to arm1, but in the next to arm2? How am I supposed to know which arm is at each index given the pmf? How can I validate those things?
    @olgavrou does it happen with --cb_explore_adf? I think it doesn't... due to the order we pass on predict, right?
    shared |User user=Tom time_of_day=morning
    |Action article=politics
    |Action article=sports
    |Action article=music
    |Action article=food
    In this example, politics would always be index 0 and food always index 3 in the PMF, right?
    Wilson Cheung
    @wcheung-code
    Hey all! Just wanted to celebrate that I finally got Vowpal Wabbit to successfully install on my Windows machine after 3 long nights of reading documentation after work. I am looking forward to playing with VW more this weekend and starting to prepare my application for RL Open Source Fest to see where I can help contribute :) Looking forward to meeting you all!
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys, I am testing a new use case on data from a real website and would like to know: what do you generally do to find the best contextual attributes to use? Purely intuition? Statistical analysis? Would love to hear some thoughts on that!
    6 replies
    Harsh Sharma
    @hs2361

    Hello everyone,
    I'm Harsh Sharma, an undergraduate student from IIIT, Gwalior, pursuing Computer Science and Engineering. I'm interested in participating in the Microsoft RL Open Source fest this year, and I'm specifically interested in working on these projects:
    17 - RL-based query planner for open-source SQL engine
    20 - AutoML for online learning

    Since I've worked with Deep Learning for the NL2SQL task before, I would like to work on 17. Could someone please clarify what the "query planner" here refers to? Does it mean join query optimization? Also, I'd be really grateful if someone could guide me as to what the first step would be to implement such a query planner in an SQL engine.

    Bernardo Favoreto
    @Favoreto_B_twitter

    I have a question about using VW with cb_explore_adf and softmax explorer for ranking.

    I am trying to use VW to perform ranking using the contextual bandit framework, specifically using --cb_explore_adf --softmax --lambda X. The choice of softmax is because, according to VW's docs: "This is a different explorer, which uses the policy not only to predict an action but also predict a score indicating the quality of each action." This quality-related score is what I would like to use for ranking.

    The scenario is this: I have a list of items [A, B, C, D], and I would like to sort it in an order that maximizes a pre-defined metric (e.g., CTR). One of the problems, as I see, is that we cannot evaluate the items individually because we can't know for sure which item made the user click or not.

    To test some approaches, I've created a dummy dataset. As a way to try and solve the above problem, I am using the entire ordered list as a way to evaluate whether a click happens or not (e.g., given the context for user X, he will click if the items are [C, A, B, D]). Then, I reward the items individually according to their position on the list, i.e., the reward halves with each position: reward = 1/2^(P-1) for position P (1-indexed). Here, the rewards for C, A, B, D are 1, 0.5, 0.25, and 0.125, respectively. If there's no click, the reward is zero for all items. The reasoning behind this is that more important items will stabilize at the top and less important ones at the bottom.
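    A tiny sketch of that reward assignment (the helper name is made up, just to make the scheme concrete): rewards decay geometrically by position, and everything gets zero when there is no click.

    def position_rewards(ranked_items, clicked):
        """Reward 1, 0.5, 0.25, ... by position if the list produced a click, else 0."""
        if not clicked:
            return {item: 0.0 for item in ranked_items}
        return {item: 1.0 / (2 ** pos) for pos, item in enumerate(ranked_items)}

    print(position_rewards(["C", "A", "B", "D"], clicked=True))
    # {'C': 1.0, 'A': 0.5, 'B': 0.25, 'D': 0.125}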

    Also, one of the difficulties I found was defining a sampling function for this approach. Typically, we're interested in selecting only one option, but here I have to sample multiple times (4 in the example). Because of that, it's not very clear how I should incorporate exploration when sampling items. I have a few ideas:

    • Copy the probability mass function and assign it to copy_pmf. Draw a random number between 0 and max(copy_pmf) and, for each probability value in copy_pmf, increment the sum_prob variable (very similar to the tutorial here: https://vowpalwabbit.org/tutorials/cb_simulation.html). When sum_prob > draw, we add the current item/prob to a list. Then, we remove this probability from copy_pmf, set sum_prob = 0, and draw a new number again between 0 and max(copy_pmf) (which might change or not). See the sketch after this list.
    • Another option is drawing a random number and, if the maximum probability, i.e., max(pmf), is greater than this number, we exploit. If it isn't, we shuffle the list and return it (explore). This approach requires tuning the lambda parameter, which controls the output pmf (I have seen cases where the max prob is > 0.99, which would mean around a 1% chance of exploring; I have also seen instances where max prob is ~0.5, which is around 50% exploration).
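    A minimal sketch of the first option (the function name is mine, not VW's); it is a slight variant that renormalizes over the remaining probability mass instead of drawing up to max(copy_pmf), so each draw is a proper sample:

    import random

    def sample_order_from_pmf(pmf):
        """Return a full ordering of action indices sampled from a pmf.

        After each draw the chosen index is removed and the remaining
        probabilities are renormalized, so higher-probability actions tend
        to land near the top while every ordering stays possible.
        """
        remaining = list(enumerate(pmf))          # (action_index, probability)
        order = []
        while remaining:
            total = sum(p for _, p in remaining)
            draw = random.uniform(0.0, total)
            cumulative = 0.0
            for pos, (action, prob) in enumerate(remaining):
                cumulative += prob
                if cumulative >= draw:
                    order.append(action)
                    del remaining[pos]
                    break
        return order

    # pmf as returned by predict() for actions [A, B, C, D]
    print(sample_order_from_pmf([0.1, 0.2, 0.6, 0.1]))   # e.g. [2, 1, 3, 0]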

    I would like to know if there are any suggestions regarding this problem, specifically sampling and the reward function. Also, if there are any things I might be missing here.

    Thank you!

    olgavrou
    @olgavrou
    Hi @raphaottoni I can see why my response is confusing, I was mixing up cb_explore and cb_explore_adf. The tutorial does cb_explore_adf and returns a pmf from which we need to sample. The pmf will have a larger probability on the action that the model predicted (giving us a higher probability that we will exploit the predicted action) and smaller probabilities for the rest of the actions (giving us a smaller probability that we will explore). There is no index swapping here, you are right.
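    For reference, sampling a single action from the returned pmf looks roughly like this (a sketch along the lines of the cb_simulation tutorial, not part of the VW API; the index you get back follows the order of the action lines you passed to predict):

    import random

    def sample_action_from_pmf(pmf):
        """Sample one action index from the pmf returned by cb_explore_adf."""
        total = sum(pmf)
        draw = random.uniform(0.0, total)   # tolerate pmfs that don't sum exactly to 1
        cumulative = 0.0
        for index, prob in enumerate(pmf):
            cumulative += prob
            if cumulative >= draw:
                return index, prob
        return len(pmf) - 1, pmf[-1]        # numerical fall-through safeguard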
    olgavrou
    @olgavrou
    Hi @Favoreto_B_twitter have you checked out Conditional Contextual Bandits? It seems like your description is pointing towards that: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Conditional-Contextual-Bandit
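    For a rough idea of the CCB text format that wiki page describes (a hedged sketch; the namespaces and values below are made up, and the slot label is chosen_action:cost:probability):

    ccb shared |User user=Tom time_of_day=morning
    ccb action |Action article=politics
    ccb action |Action article=sports
    ccb action |Action article=music
    ccb slot 0:0.0:0.5 |Slot position=first
    ccb slot 2:-1.0:0.25 |Slot position=second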
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey @olgavrou, I did come across CCBs but never really dug deep enough to fully understand how they work. I've searched for tutorials but could not find any; are there some?
    Indeed, they seem like a good option, but I feel like the lack of material on the topic might be a barrier. If you have something to point me to, I would love to see it!
    Thanks!
    14 replies
    Bernardo Favoreto
    @Favoreto_B_twitter
    @olgavrou After some reading, I am pretty confident CCB is the way to go for the problem I described. However, it's still unclear to me whether I should use CCB or Slates.
    I can't really see how they differ. Apparently, Slates are built on top of CCB, but what is their purpose?
    1 reply
    Marcos Passos
    @marcospassos
    Hi everyone! I just watched Milind Agarwal's talk about Contextual Bandits Data Visualization, and I got very interested in the library he mentioned, but I could not find it anywhere. Is it available somewhere?
    Arunagirinadan Sudharshan
    @SudhanAruna
    Hi everyone, I'm a student applying for RLOS 2021 and I'm planning to work on the "VW daemon to use gRPC" task. I wanted to know whether I can email any mentors or maintainers to clarify certain doubts, which would help me finish the proposal. Please let me know if it's possible.
    Raphael Ottoni
    @raphaottoni

    Guys, would you please help me with two questions:
    1) Using cb_explore_adf for a pricing agent, I was trying two types of reward: i) sales and ii) sales × price, where each arm is a price. I have noticed that cb_explore_adf converges well when the reward is sales, but when we multiply the sales by the arm price, it simply doesn't converge at all. Is it possible that it is sensitive to scale? Sales are in units (like 40 at most) and prices are in cents (e.g., 399).

    2) Another quick question: how do I pass features from multiple namespaces to the -q UA parameter? I mean, I want to add more variables from another namespace, something like -q [UA, MA].

    6 replies
    (U)ser and (M)erchant and (A)ction
    Bernardo Favoreto
    @Favoreto_B_twitter
    @raphaottoni I can help with your second question: adding multiple interactions is pretty straightforward in VW. You can do it in a few ways:
    1) Simply pass the -q flag multiple times, e.g. -q UA -q UM -q MA and so on, depending on how many namespaces you have
    2) You can also pass different orders of interaction (i.e., quadratic or cubic) using the --interactions flag. Here, you could use, for example, --interactions UA --interactions UM --interactions UAM
    3) Finally, you can specify all possible interactions by using a special "symbol" (don't know if it's the appropriate way of calling it). For example, specify all possible quadratic interactions with -q ::. You should notice that using this is slower because lots of features are created on the fly. Also, you should see a warning saying that some repeated interactions were ignored (by default in VW).
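    A quick sketch of how those flags look when constructing the model in Python (the interaction flags are real VW flags; the surrounding arguments are just placeholders):

    from vowpalwabbit import pyvw

    # quadratic User/Action, User/Merchant and Merchant/Action interactions,
    # passed by repeating the -q flag once per pair
    vw = pyvw.vw("--cb_explore_adf -q UA -q UM -q MA --quiet")

    # quadratic plus a cubic User/Action/Merchant interaction via --interactions
    vw2 = pyvw.vw("--cb_explore_adf --interactions UA --interactions UAM --quiet")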
    Raphael Ottoni
    @raphaottoni
    @Favoreto_B_twitter thank you!
    7 replies
    @Favoreto_B_twitter Could you please point me to the documentation on this --interactions? I didn't get what the difference is between -q and --interactions, or how to say whether it is quadratic or cubic.
    Bernardo Favoreto
    @Favoreto_B_twitter
    @raphaottoni Unfortunately there isn't any specification of it in the docs (at least I did not find it). I feel your pain; I just discovered this by accident when watching their latest presentation, so I don't really know the details.
    As for how to say when it is quadratic or cubic, you can just think of how many namespaces you are interacting: if you use UAM, this is a cubic interaction. If you use UA, it is quadratic. I believe this is valid, though I cannot confirm.
    3 replies
    Raphael Ottoni
    @raphaottoni
    Right now I use the namespace Action to set the price, like:
    |Action price=399
    Is it expected that I also use a namespace called Price, so it helps convergence, since I am building a reward that is a Gaussian value times the price the arm represents?
    Am I supposed to build another namespace like:
    |Price value:399
    or
    |Price value=399
    I am asking this because I am not sure if the |Action namespace for cb_explore_adf treats this as a feature that is quadratically related to the reward.
    This is the only thing I thought would explain why the model converges when the reward is the raw value, but when those values are multiplied by a constant (let's say the price each arm represents) it stops converging.
    Raphael Ottoni
    @raphaottoni
    shared |Merchant merchant_id=Restaurante_Japidin city=sao_paulo radius=500
    0:-79.8:0.866666661699613 |Price value:399 |Action price=399
    |Price value:499 |Action price=499
    |Price value:599 |Action price=599
    I am trying things like that... but it won't converge either =(
    These are the curves... one would think it is easy to converge:
    { curve_type: "Gaussian", curve_id: "arm_1", mean: 20.0, std: 0.0},
    { curve_type: "Gaussian", curve_id: "arm_2", mean: 5.0, std: 0.0},
    { curve_type: "Gaussian", curve_id: "arm_3", mean: 4.0, std: 0.0}
    Those are the arm_prices:
    {"399": "arm_1", "499": "arm_2", "599": "arm_3"}
    reward = Arm_value * Gaussian Sample
    vw = pyvw.vw("--cb_explore_adf -q :: --epsilon 0.2")
    Raphael Ottoni
    @raphaottoni
    Screen Shot 2021-02-22 at 19.39.41.png
    Above is a graph of this setup; I really don't know why the agent chose the most expensive arm!
    If I simply divide the rewards by 100 and run the very same experiment:
    Screen Shot 2021-02-22 at 19.40.32.png
    this thing bugs me
    =(
    I forgot to mention: the reward is actually -1 × arm_value × Gaussian sample
    Max Pagels
    @maxpagels_twitter
    @raphaottoni could you provide a github gist of your data?
    Raphael Ottoni
    @raphaottoni
    there is no training data... just a "simulator", which is an object that returns a sample from those curves given the arm_id:
    { curve_type: "Gaussian", curve_id: "arm_1", mean: 20.0, std: 0.0},
    { curve_type: "Gaussian", curve_id: "arm_2", mean: 5.0, std: 0.0},
    { curve_type: "Gaussian", curve_id: "arm_3", mean: 4.0, std: 0.0}
    {"399": "arm_1", "499": "arm_2", "599": "arm_3"}
    Each step, VW chooses an arm, I sample from its curve and multiply the result by the arm's price. Then I flip the sign (so the reward becomes a cost) and fit it to the model...
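    A hypothetical reconstruction of that loop (following the parse/learn pattern from the cb_simulation tutorial; the example format, curve means, and prices come from the messages above, everything else is assumed and not Raphael's actual code):

    import random
    from vowpalwabbit import pyvw

    means  = {"arm_1": 20.0, "arm_2": 5.0, "arm_3": 4.0}   # Gaussian means, std = 0
    prices = {"arm_1": 399, "arm_2": 499, "arm_3": 599}
    arms   = list(means)

    vw = pyvw.vw("--cb_explore_adf -q :: --epsilon 0.2 --quiet")
    shared = "shared |Merchant merchant_id=Restaurante_Japidin city=sao_paulo radius=500"

    def example_text(chosen=None, cost=None, prob=None):
        lines = [shared]
        for i, arm in enumerate(arms):
            label = f"0:{cost}:{prob} " if chosen == i else ""
            lines.append(f"{label}|Price value:{prices[arm]} |Action price={prices[arm]}")
        return "\n".join(lines)

    for _ in range(5000):
        pmf = vw.predict(example_text())
        chosen = random.choices(range(len(arms)), weights=pmf)[0]
        reward = means[arms[chosen]] * prices[arms[chosen]] / 100   # sales * price, scaled
        ex = vw.parse(example_text(chosen, -reward, pmf[chosen]),   # label cost = -reward
                      pyvw.vw.lContextualBandit)
        vw.learn(ex)
        vw.finish_example(ex)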
    Raphael Ottoni
    @raphaottoni
    The problem appears to be solved if we apply a log function to the reward.
    Max Pagels
    @maxpagels_twitter

    there is no training data... just a "simulator", which is an object that returns a sample from those curves given the arm_id:

    If you have a simulator, you are training on some VW data somewhere. Could you provide that dataset as a gist?

    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys, I was checking out the Slates formulation out of curiosity, and it got me thinking. I could swear streaming services like Netflix used slates for recommendations. I know they personalize both the title recommendation and the thumbnail for each title. Thus, it seemed like the perfect use case for Slates (here, the title would be one slot and the image another slot): there is a single global reward (play or not), and the action sets are disjoint.
    However, when trying to visualize how this would work in VW, I noticed that it probably wouldn't. What made me think this is that Slates predicts for all slots at once, and therefore there is no way we could select the title first, then pre-filter the possible thumbnails for that title, and then make a prediction for the thumbnail slot.
    Am I missing something here? What are some use cases of Slates for personalization using VW? The only one that comes to mind is "whole page optimization".
    Thanks!
    2 replies
    Jui Pradhan
    @JuiP
    Hi everyone, I was looking at the estimators repository issue VowpalWabbit/estimators#1; we already have an implementation of the IPS estimator in Python. My question is: why is "convert current IPS estimator to Python" mentioned as a goal for this project? Can someone please clarify?
    2 replies
    pushpendre
    @pushpendre

    Hi, I was wondering if I could get a pointer to the implementation of --cb k --cb_type dr in the source code? Basically, I am trying to understand the parameters that are learned at the end of off-policy CB training in VW. E.g., I did

    vw --cb 3 --cb_type ips  -f cb.model -d train.txt --invert_hash readable_ips.model
    vw --cb 3 --cb_type dm  -f cb.model -d train.txt --invert_hash readable_dm.model
    vw --cb 3 --cb_type dr  -f cb.model -d train.txt --invert_hash readable_dr.model

    and the dr model obviously contains parameters equal to ips + dm, but I want to know exactly what linear regression formula is being implemented in dr.
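    For reference, the textbook doubly robust estimate that dr is based on combines the dm regressor with an ips correction (this is the standard formula, not a claim about VW's exact code path):

    \hat{r}_{DR}(x, a) = \hat{r}(x, a) + \frac{\mathbb{1}[a = a_{\text{logged}}]\,\left(r - \hat{r}(x, a)\right)}{p(a_{\text{logged}} \mid x)}

    where \hat{r}(x, a) is the learned regressor (the dm part), r is the observed cost/reward, and p is the logged probability of the chosen action (the ips part).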

    3 replies
    CP500
    @CP500
    Hi everyone, Just a newbie question on CATS. Does it give you a PMF when you call predict?
    vw = pyvw.vw("--cats_pdf 7 --bandwidth 0.1 --min_value 0 --max_value 1")
    ex = vw.parse('ca | c1:0.5 c2:1.3', labelType=8)
    vw.predict(ex)
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys, I would like to know if anyone has found an appropriate way of calculating feature importance after training a model.
    I tried using sklearn/eli5 permutation methods but neither worked properly.
    Then, I decided to code my own, where I first train a model and then do permutation importance on a held-out set. I'm a bit concerned as to whether the results are significant, mainly because of all the interactions created on the fly by VW. I should mention I am aware of the multicollinearity/correlation problem, and this is not my biggest concern.
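    A toy sketch of that permutation loop (plain regression rather than CB, with made-up feature names and data, just to make the mechanics concrete):

    import random
    from vowpalwabbit import pyvw

    vw = pyvw.vw("--quiet")
    train   = [(2.0, 1.0, 0.5), (4.0, 2.0, 0.1), (6.0, 3.0, 0.9), (8.0, 4.0, 0.2)]
    holdout = [(3.0, 1.5, 0.3), (5.0, 2.5, 0.7), (7.0, 3.5, 0.4)]
    for y, a, b in train:
        vw.learn(f"{y} |n feat_a:{a} feat_b:{b}")

    def mse(rows):
        return sum((vw.predict(f"|n feat_a:{a} feat_b:{b}") - y) ** 2
                   for y, a, b in rows) / len(rows)

    baseline = mse(holdout)
    for col, name in [(1, "feat_a"), (2, "feat_b")]:
        values = [row[col] for row in holdout]
        random.shuffle(values)
        permuted = [tuple(values[j] if i == col else v for i, v in enumerate(row))
                    for j, row in enumerate(holdout)]
        print(name, "importance:", mse(permuted) - baseline)  # loss increase after shuffling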
    Does it even make sense to calculate feature importance in VW? (I assume so because this was one of the topics from the VW presentation at https://slideslive.com/38942331/vowpal-wabbit.)
    Thanks!
    olgavrou
    @olgavrou
    @CP500 cats_pdf should give you a pdf (probability density function) and not a pmf (probability mass function), as CATS is predicting in a continuous action space. The PDF is in the form of (left:right:pdf_value) triples, so you could check that the pdf integrates to 1 by summing (right - left) * pdf_value over all the returned triples.
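    A small sketch of that check in Python, assuming predict returns the (left, right, pdf_value) triples described above (the model arguments and example line are taken from the question):

    from vowpalwabbit import pyvw

    vw = pyvw.vw("--cats_pdf 7 --bandwidth 0.1 --min_value 0 --max_value 1 --quiet")
    ex = vw.parse("ca | c1:0.5 c2:1.3", labelType=8)
    pdf = vw.predict(ex)       # assumed: list of (left, right, pdf_value) triples
    vw.finish_example(ex)

    # the density should integrate to (approximately) 1 over [min_value, max_value]
    print(sum((right - left) * value for left, right, value in pdf))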
    7 replies