    Raphael Ottoni
    @raphaottoni
    I know I can use the Action namespace to set the price, like:
    |Action price=399
    Is it expected that I also use a namespace called Price, so it helps convergence, since I am building a reward that is a Gaussian value times the price the arm represents?
    Am I supposed to build another namespace like:
    |Price value:399
    or
    |Price value=399
    I am asking this because I am not sure whether the |Action namespace for cb_explore_adf treats this as a feature that is quadratically related to the reward.
    This is the only thing I could think of that would explain why the model converges when the reward is the raw value, but when those values are multiplied by a constant (let's say the price each arm represents) it stops converging.
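    For context, assuming standard VW text-format semantics: a feature token without a colon is a string feature with value 1, while name:number is a numeric feature, and -q :: crosses every pair of namespaces, multiplying the paired feature values together:
    |Action price=399   ->  one string feature named "price=399" with value 1
    |Action price:399   ->  one numeric feature named "price" with value 399
    So with -q :: a numeric |Price value:399 feature would be multiplied with each |Action feature, which directly scales those interaction features by the price.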
    Raphael Ottoni
    @raphaottoni
    shared |Merchant merchant_id=Restaurante_Japidin city=sao_paulo radius=500
    0:-79.8:0.866666661699613 |Price value:399 |Action price=399
    |Price value:499 |Action price=499
    |Price value:599 |Action price=599
    I am trying things like that... but it won't converge either =(
    These are the curves... one would think it is easy to converge:
    { curve_type: "Gaussian", curve_id: "arm_1", mean: 20.0, std: 0.0},
    { curve_type: "Gaussian", curve_id: "arm_2", mean: 5.0, std: 0.0},
    { curve_type: "Gaussian", curve_id: "arm_3", mean: 4.0, std: 0.0}
    Those are the arm_prices:
    {"399": "arm_1", "499": "arm_2", "599": "arm_3"}
    reward = Arm_value * Gaussian Sample
    vw = pyvw.vw("--cb_explore_adf -q :: --epsilon 0.2")
    Raphael Ottoni
    @raphaottoni
    Screen Shot 2021-02-22 at 19.39.41.png
    Above is a graph of this setup; I really don't know why the agent chose the most expensive arm!
    If I simply divide the rewards by 100 and run the very same experiment:
    Screen Shot 2021-02-22 at 19.40.32.png
    this thing bugs me
    =(
    I forgot to mention: reward is actually -1 X arm_value X Gaussian Sample
    Max Pagels
    @maxpagels_twitter
    @raphaottoni could you provide a github gist of your data?
    Raphael Ottoni
    @raphaottoni
    There is no training data... just a "simulator", which is an object that returns a sample from those curves given the arm_id:
    { curve_type: "Gaussian", curve_id: "arm_1", mean: 20.0, std: 0.0},
    { curve_type: "Gaussian", curve_id: "arm_2", mean: 5.0, std: 0.0},
    { curve_type: "Gaussian", curve_id: "arm_3", mean: 4.0, std: 0.0}
    {"399": "arm_1", "499": "arm_2", "599": "arm_3"}
    Each step VW chooses an arm, I sample from that arm's curve and multiply the result by the arm's price. Then I flip the sign (so VW sees it as a cost rather than a reward) and fit it to the model...
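    A minimal sketch of that loop, assuming the usual pyvw cb_explore_adf pattern (names such as arm_means are illustrative, not the actual simulator code, and exact pyvw calls may vary by version):

    import random
    from vowpalwabbit import pyvw

    arm_means = {399: 20.0, 499: 5.0, 599: 4.0}   # price -> Gaussian mean (std = 0)
    prices = list(arm_means)
    shared = "shared |Merchant merchant_id=Restaurante_Japidin city=sao_paulo radius=500"

    vw = pyvw.vw("--cb_explore_adf -q :: --epsilon 0.2 --quiet")

    for step in range(10000):
        # ask VW for a PMF over the arms
        pred = vw.parse("\n".join([shared] + [f"|Action price={p}" for p in prices]),
                        pyvw.vw.lContextualBandit)
        pmf = vw.predict(pred)
        vw.finish_example(pred)

        chosen = random.choices(range(len(prices)), weights=pmf)[0]
        prob = pmf[chosen]

        # simulate the reward and flip the sign: VW minimises cost
        reward = prices[chosen] * random.gauss(arm_means[prices[chosen]], 0.0)
        cost = -reward

        # learn from the labelled ADF example (only the chosen action carries a label)
        lines = [shared]
        for i, p in enumerate(prices):
            label = f"0:{cost}:{prob} " if i == chosen else ""
            lines.append(f"{label}|Action price={p}")
        ex = vw.parse("\n".join(lines), pyvw.vw.lContextualBandit)
        vw.learn(ex)
        vw.finish_example(ex)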
    Raphael Ottoni
    @raphaottoni
    The problem appears to be solved if we apply a log transform to the reward.
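    For reference, both fixes mentioned above shrink the magnitude of the label fed to VW; roughly (a sketch, not the actual code):

    import math

    reward = 79.8                     # arm_value * gaussian_sample
    cost_raw = -reward                # the original cost that failed to converge
    cost_scaled = -reward / 100.0     # rewards divided by 100, as in the second plot
    cost_log = -math.log1p(reward)    # the log transform that appears to fix it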
    Max Pagels
    @maxpagels_twitter

    There is no training data... just a "simulator", which is an object that returns a sample from those curves given the arm_id:

    If you have a simulator, you are training on some VW data somewhere. Could you provide that dataset as a gist?

    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys, I was checking out the Slates formulation out of curiosity, and it got me thinking. I could swear streaming services like Netflix used slates for recommendations. I know they personalize both the title recommendation and the thumbnail for each title. Thus, it seemed like the perfect use case for Slates (here, the title would be one slot and the image another): there is a single global reward (play or not), and the action sets are disjoint.
    However, when trying to visualize how this would work in VW, I noticed that it probably wouldn't. What made me think this is that Slates predicts for all slots at once, so there is no way we could first select the title, then pre-filter the possible thumbnails for that title, and then make a prediction for the thumbnail slot.
    Am I missing something here? What are some use cases of Slates for personalization using VW? The only one that comes to mind is "whole page optimization".
    Thanks!
    2 replies
    Jui Pradhan
    @JuiP
    Hi everyone, I was looking at the Estimators repository issue VowpalWabbit/estimators#1; we already have an implementation of the IPS estimator in Python. My question is: why is "convert current IPS estimator to Python" mentioned as a goal for this project? Can someone please clarify?
    2 replies
    pushpendre
    @pushpendre

    Hi, I was wondering if I could get a pointer to the implementation of --cb k --cb_type dr in the source code? Basically I am trying to understand the parameters that are learnt at the end of off-policy CB training in VW. E.g. I ran

    vw --cb 3 --cb_type ips  -f cb.model -d train.txt --invert_hash readable_ips.model
    vw --cb 3 --cb_type dm  -f cb.model -d train.txt --invert_hash readable_dm.model
    vw --cb 3 --cb_type dr  -f cb.model -d train.txt --invert_hash readable_dr.model

    and the dr model, as expected, contains as many parameters as ips and dm combined, but I want to know exactly what linear regression formula is being implemented for dr.
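    For reference, the textbook doubly robust estimate (not necessarily VW's exact internal code) combines the direct-method prediction with an importance-weighted correction; a minimal sketch, where dm_predict stands for whatever regressor DM learns:

    def dr_estimate(context, action_eval, action_logged, cost_logged, prob_logged, dm_predict):
        """Doubly robust cost estimate for evaluating action_eval on one logged example."""
        estimate = dm_predict(context, action_eval)
        if action_eval == action_logged:
            # correct the regressor's error on the action we actually observed
            estimate += (cost_logged - dm_predict(context, action_logged)) / prob_logged
        return estimate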

    3 replies
    CP500
    @CP500
    Hi everyone, just a newbie question on CATS. Does it give you a PMF when you call predict?
    vw = pyvw.vw("--cats_pdf 7 --bandwidth 0.1 --min_value 0 --max_value 1")
    ex = vw.parse('ca | c1:0.5 c2:1.3', labelType=8)
    vw.predict(ex)
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hey guys, I would like to know if anyone has found an appropriate way of calculating feature importance after training a model?
    I tried using the sklearn/eli5 permutation methods, but neither worked properly.
    Then I decided to code my own, where I first train a model and then do permutation importance on a held-out set. I'm a bit concerned as to whether the results are significant, mainly because of all the interactions created on the fly by VW. I should mention I am aware of the multicollinearity/correlation problem, and this is not my biggest concern.
    Does it even make sense to calculate feature importance in VW? (I assume so, because this was one of the topics from the VW presentation at https://slideslive.com/38942331/vowpal-wabbit)
    Thanks!
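    For what it's worth, the hand-rolled permutation importance being described can be sketched roughly like this (a generic sketch; predict, loss and the held-out data are placeholders, not a specific VW API):

    import random

    def permutation_importance(predict, loss, heldout, feature_names, repeats=5):
        # heldout: list of (features_dict, label); predict(features_dict) -> prediction
        baseline = sum(loss(predict(f), y) for f, y in heldout) / len(heldout)
        importances = {}
        for name in feature_names:
            scores = []
            for _ in range(repeats):
                values = [f[name] for f, _ in heldout]
                random.shuffle(values)   # break the link between this feature and the label
                permuted = [({**f, name: v}, y) for (f, y), v in zip(heldout, values)]
                scores.append(sum(loss(predict(f), y) for f, y in permuted) / len(permuted))
            importances[name] = sum(scores) / repeats - baseline
        return importances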
    olgavrou
    @olgavrou
    @CP500 cats_pdf should give you a PDF (probability density function) and not a PMF (probability mass function), since CATS is predicting in a continuous action space. The PDF comes back as (left:right:pdf_value) triples, so you can check that it integrates to 1 by summing (right - left) * pdf_value over all the returned triples.
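    A quick sketch of that check, using made-up triples in the (left, right, pdf_value) form described above:

    triples = [(0.0, 0.5, 1.2), (0.5, 1.0, 0.8)]   # illustrative values, not real output
    total = sum((right - left) * pdf_value for left, right, pdf_value in triples)
    print(total)   # should be ~1.0 if the returned pdf is a proper density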
    7 replies
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hello everyone!
    Does anyone know where (and whether) I can find the notebook for the Estimators library? I would like to use this lib, but there's not much documentation or many examples on how to do so.
    Also, does it make sense to use this lib for CCBs?
    Thanks!
    Jack Gerrits
    @jackgerrits
    @Favoreto_B_twitter that repo is very much still a work in progress - so docs/examples are not there yet unfortunately. For CCB the approach that has been taken so far is to do CFE on the first slot only - so in this context I think it does make sense to use it. But it would need some adapting and I am not positive here.
    9 replies
    Max Pagels
    @maxpagels_twitter
    Is there an estimate on when the pypi version of pyvw will have CATS support? 8.9.0 doesn't support CATS labels
    Max Pagels
    @maxpagels_twitter
    I'm fiddling around with CATS, and have a simple setup with a fixed context. Per round, I ask for an action (range 0-100) and calculate a cost that is zero at 50 and otherwise grows quadratically with the absolute distance from 50 in either direction. I've tried grid-searching a whole mess of bandwidth, epsilon and learning rate values, but the learning is just all over the place. I would have expected the system to converge to an optimal prediction of 50.0 per round pretty easily since the context is always fixed. Instead, it either bounces around or gets stuck on some non-optimal values around 40. Any tips?
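    For reference, the cost being described is essentially (a sketch of the experiment's cost function, not VW code):

    def cost(action, target=50.0):
        # zero at the target, growing quadratically with the absolute distance from it
        return ((action - target) / target) ** 2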
    olgavrou
    @olgavrou
    Hi @maxpagels_twitter what is the parameter you pass to --cats? have you experimented with that at all? For cats I would try different combinations of number of discrete actions used by the algorithm (passed in to the --cats arg) and bandwidths (bandwidth being a property of the continuous range). e.g. I would try a grid of num_actions [8, 16, 32, 64, 128, 256, 1024] and e.g. bandwidths [1, 2, 4, 6, 8, 10, 14, 20]. For different number of discrete actions you might need more data for CATS to converge to something sensible. CATS label support in pyvw should be available in the next release (coming soon-ish, we don't want to wait another year for the next vw release). Let me know if you get better results from CATS or not :)
    Max Pagels
    @maxpagels_twitter
    I tried gridsearching a whole mess of options, including a bunch of action counts, and can get relatively close to an optimum, but the hyperparams seem to be super important to get just right or the learning is way off. But I'll experiment further and report back
    3 replies
    Bernardo Favoreto
    @Favoreto_B_twitter
    Hello guys!
    Can someone help me understand why propensity scores are important when training a CB?
    I've been thinking about this lately and just couldn't wrap my head around a good explanation...
    Let's take epsilon-greedy, for example. When we train a CB model with epsilon-greedy, the PMF output is always the same (just the indexes change). This makes me assume that propensity scores aren't supposed to teach the CB how to output probabilities. Moreover, I believe they are used for importance weighting, i.e., prob(new_policy)/prob(logging_policy), but isn't that only when we use IPS? I think I'm missing something quite obvious here...
    Also, when we offline train a new model using logged CB data, how is the new CB able to achieve better performance than the logging policy? I mean, it's an excellent thing, but I would like to understand how that is possible.
    Thanks!
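    For reference, the standard reason propensities are logged is off-policy estimation: dividing by the logging probability gives an unbiased estimate of how a different policy would have done, which is roughly how an offline-trained policy can end up better than the logging policy. A minimal IPS sketch (illustrative names; a deterministic target policy is assumed):

    def ips_value(logged, new_policy):
        # logged: list of (context, action, cost, prob) tuples from the logging policy
        total = 0.0
        for context, action, cost, prob in logged:
            if new_policy(context) == action:
                total += cost / prob
        return total / len(logged)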
    George Fei
    @georgefei
    Hi all, I have a few questions related to contextual bandit evaluation:
    1. How do I compare the performance of different policies' decisions using --eval? Do I look at the average loss in the output? If the costs in the input data are all negative and a lower cost is better, does a lower average loss mean one policy is better? What does average loss represent?
    2. How do I interpret the output of --explore_eval? More specifically, update count, violation count, and final multiplier (what variables do they correspond to in the algorithm on slide 9 of https://pdfs.semanticscholar.org/presentation/f2c3/d41ef70df24b68884a5c826f0a4b48f17095.pdf)? Do I also look at the average loss to compare different exploration algo + hyperparameter combinations?

    3. In order to use --explore_eval I have to convert my data from cb format to cb_adf format, since the cb format is not supported with --explore_eval. For the example data with two arms below, are the two ways of representing the data equivalent?

    2:10.02:0.5 | x0:0.47 x1:0.84 x2:0.29
    1:8.90:0.5 | x0:0.51 x1:0.65 x2:0.67

    shared | x0:0.47 x1:0.84 x2:0.29
    | a1
    0:10.02:0.5 | a2

    shared | x0:0.51 x1:0.65 x2:0.67
    0:8.90:0.5 | a1
    | a2

    Wes
    @wmelton
    Hello all - I've been evaluating Microsoft Personalizer for our company, which I have largely assumed is VW under the hood with MS-specific tech/services written on top of it.
    My question is this: within a given namespace, does the order of features or their names matter? I'm assuming yes, but the VW documentation out there doesn't make it super clear how to handle a situation where two documents have the same keywords in them, but after tokenization the keywords are not in the same order, due to variance in the number of keywords found in each document. Appreciate guidance there.
    Finally, I referenced Personalizer only because it sparked this train of thought: its documentation uses only the JSON format for input data, but seems to neglect any instruction regarding variation in keyword order when your features are keywords extracted from a document. Thanks!
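    As far as I understand VW's hashing, each feature is hashed from its namespace plus feature name, so within a namespace the order of bag-of-words style features should not change the model (unless you use options such as --ngram that build features from adjacent tokens). A quick sanity-check sketch:

    from vowpalwabbit import pyvw

    vw = pyvw.vw("--quiet")
    vw.learn("1 |Doc apple banana cherry")
    a = vw.predict("|Doc apple banana cherry")
    b = vw.predict("|Doc cherry banana apple")
    print(a, b)   # expected to be identical if feature order within a namespace doesn't matter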
    18 replies
    Max Pagels
    @maxpagels_twitter
    @georgefei did you already get an answer to your questions? I'd be very interested in them, too
    1 reply
    In particular, I imagine lots of folks do evaluation by grid-searching learning with --cb_type ips/dm/dr and choosing the one with the best reported loss. Isn't that wrong, especially considering dm is biased? --eval throws an error if you use dm.
    George Fei
    @georgefei
    Related to my third question above, about whether the same data in cb and cb_adf formats is going to yield the same result: I'm not sure about the VW implementation, but in the contextual bandit bake-off paper the reward estimation is formulated differently for each case:
    image.png
    49 replies
    Max Pagels
    @maxpagels_twitter
    I think there is a very clear need for a policy evaluation tutorial on vowpalwabbit.org. I'd be happy to write one, assuming someone can help answer questions as they arise, since I have a couple of outstanding ones myself. Would folks find this valuable?
    10 replies
    Max Pagels
    @maxpagels_twitter

    OPE PR: VowpalWabbit/vowpalwabbit.github.io#193

    @lalo @olgavrou et al, note that I will need expert advice on this. There is a checklist that needs to be confirmed with absolute certainty or, where untrue, commented on so I have the correct interpretation.

    Max Pagels
    @maxpagels_twitter
    @pmineiro would also be a very good additional reviewer