MochizukiShinichi
@MochizukiShinichi
Happy holidays everyone! Could anyone please share some knowledge on how to read the logs for contextual bandits in VW? Specifically: 1. Does the 'average loss' represent the average reward value estimated counterfactually? 2. Is it calculated on a holdout set? 3. If 1 holds, where can I find the loss of the underlying classifier/oracle? Thanks in advance!
jonpsy
@jonpsy:matrix.org
Hello VW team, can someone please explain this format in CCB
ccb shared | s_1 s_2
ccb action | a:1 b:1 c:1
ccb action | a:0.5 b:2 c:1
ccb action | a:0.5
ccb action | c:1
ccb slot  | d:4
ccb slot 1:0.8:0.8,0:0.2 0,1,3 | d:7
feeding this to the algorithm prints this
jonpsy
@jonpsy:matrix.org
[warning] Unlabeled example in train set, was this intentional? I didn't understand that warning.
What exactly is the slot argument? What does ccb slot | d:4 mean? Does it mean this is the fourth slot? And in line 7 of the example, what's happening, and what about the extra terms? Would be really grateful for an explanation, thanks
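As a note for later readers, here is how the slot label on the last line might be decomposed, assuming the grammar `chosen_action:cost:prob[,action:prob...]` followed by an optional comma-separated list of explicitly included action ids (my reading of the CCB wiki, not an authoritative parser):

```python
def parse_ccb_slot_label(tokens):
    """Parse CCB slot label tokens, e.g. ['1:0.8:0.8,0:0.2', '0,1,3'].

    Returns (outcome, included_actions) where outcome is
    (chosen_action, cost, [(action, prob), ...]) or None for an
    unlabeled slot, and included_actions restricts which actions this
    slot may choose from (None means all actions are allowed).
    """
    outcome, included = None, None
    for tok in tokens:
        if ':' in tok:  # outcome: chosen:cost:prob, then action:prob pairs
            parts = tok.split(',')
            chosen, cost, prob = parts[0].split(':')
            dist = [(int(chosen), float(prob))]
            for pair in parts[1:]:
                action, p = pair.split(':')
                dist.append((int(action), float(p)))
            outcome = (int(chosen), float(cost), dist)
        else:           # explicit action inclusion list
            included = [int(a) for a in tok.split(',')]
    return outcome, included

# 'ccb slot 1:0.8:0.8,0:0.2 0,1,3 | d:7' -> the tokens before the bar
outcome, included = parse_ccb_slot_label(['1:0.8:0.8,0:0.2', '0,1,3'])
```

Under this reading, `ccb slot | d:4` is simply an unlabeled slot with no action restriction, and `d:4` is a feature named d with value 4 rather than a slot index.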
jonpsy
@jonpsy:matrix.org
Thanks a ton, I was also going through your answer in stackoverflow. I'll be sure to update you on this.
jonpsy
@jonpsy:matrix.org
@bassmang: actually, could you review VowpalWabbit/vowpal_wabbit#3546? We can discuss the details over there; let me know if you'd like that.
andy-soft
@andy-soft
Hello VW team, I am using VW from C# for NLP tasks, and there are some examples I could not reproduce because of the constant incompatibilities across VW versions. Some papers published in 2015 claim that VW is an excellent and fast POS tagger and also an ultra-fast and precise NERC (Named Entity Recognition Classifier). I've implemented the experiments and found this not to hold: the F1 scores obtained were significantly lower than those published, and only in English; as soon as you switch to other languages like Spanish, the thing becomes unusable.
Has anyone experienced issues like this? I am very disappointed by this.
I am currently using it for NLP intent detection (an OAA classifier on POS-tagged + morphologically analyzed text).
Although it runs smooth and fine, the F1 score must always be re-calculated afterward, and the loss is tricky: you can get a really low loss and a bad F1, and vice-versa! Precision-recall behavior is also an issue.
12 replies
jonpsy
@jonpsy:matrix.org

In CCB, what's the difference between the weight of an example vs. its cost? I thought weight was just the inverse of cost?

I tried feeding the training data from the CCB page, and the weight seems to be equal to the example counter; is that intentional?

jonpsy
@jonpsy:matrix.org
Okay, I recall that in the CB example the cost was just the negative of the reward.
Priyanshu Agarwal
@priyanshuone6
Did anyone get a chance to look at my comment?
3 replies
andreacimino
@andreacimino
Question regarding Conditional Contextual Bandit (CCB).
As it stands now, looking at the source code, it seems that the selection of the action for a slot does not take the previously selected actions into account as "features". Let me explain better:
Suppose there are N slots and M actions (M >= N).
The algorithm chooses action M_1 in slot N_1; then a decision for slot N_2 must be taken, and some other actions are similar to M_1.
There is a high chance that in slot N_2 an action similar to M_1 will be chosen.
I would like to pass the decision made at the previous step as "context", to promote "diversity".
I am not an expert, but I would like to know if someone has experience with this.
jonpsy
@jonpsy:matrix.org
@jackgerrits: Hey, thanks for the detailed review. Would you mind reviewing the PR sometime soon? I think it's done; we could merge it today/tomorrow :-)
jeanjean
Hey everyone, I have a quick question. I just started using the VW library and managed to extract the audit logs for the CB explore model. I was wondering what the scale of the feature weights is. Does the model assign them at random, and why are there negative values? I couldn't find anything meaningful in the documentation. Would appreciate any assistance.
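A note on the question above: the audit weights are learned coefficients of a (hashed) linear model, not random assignments; they start at zero and are pushed up or down by gradient updates, so negative values are expected. A toy sketch of the idea (plain SGD on squared loss, not VW's exact adaptive/normalized update rule):

```python
def sgd_linear(examples, lr=0.1, epochs=50):
    """Fit a linear model with stochastic gradient descent.

    examples: list of (features, target) where features is a
    dict[str, float]. Weights start at 0.0 and each update moves
    them against the prediction error, in either direction.
    """
    weights = {}
    for _ in range(epochs):
        for features, target in examples:
            pred = sum(weights.get(f, 0.0) * v for f, v in features.items())
            err = pred - target
            for f, v in features.items():
                weights[f] = weights.get(f, 0.0) - lr * err * v
    return weights

# Feature 'b' is anti-correlated with the target, so its learned
# weight ends up negative while 'a' ends up positive.
w = sgd_linear([({'a': 1.0, 'b': 1.0}, 0.0), ({'a': 1.0}, 1.0)])
```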
4 replies
jonpsy
@jonpsy:matrix.org
@bassmang: Hey, just saw you updated the master branch with some major changes for labels.
I see that you've used Union[example, Costs] instead of using kwargs; should I go by that method too, then?
Also, we should check in from_example(..) whether the example type is what it claims to be, no?
4 replies
jonpsy
@jonpsy:matrix.org
:point_up: Edit: Also we should check in from_example(..) if example labelType is what it claims to be. No?
MochizukiShinichi
@MochizukiShinichi
Hi team, my invert_hash output from cb_adf always has one additional line for each feature without a feature name, what does this mean? Example below
Version 8.10.1
Id
Min label:-1
Max label:1
bits:18
lda:0
0 ngram:
0 skip:
options: --cb_adf --cb_type dr --csoaa_ldf multiline --csoaa_rank
Checksum: 4264491651
event_sum 0
action_sum 0
:0
s^age:7950:0.108916
:7951:0.242307
s^year:39846:-1.6944
:39847:-1.82654
jonpsy
@jonpsy:matrix.org
hey, do we have a meeting today? I haven't seen an agenda for a while
2 replies
jonpsy
@jonpsy:matrix.org

I had a few questions if you guys don't mind.

a) Are you guys conducting RLOS this year? Not that I'd stop contributing if you aren't, but it certainly gives motivation to contribute :)

b) On the topic of AutoML, I read the wiki and some merged PRs. I'd like to contribute; how can I help? Since it's quite volatile right now, my PRs may be slow, and I don't want to end up being the bottleneck. Other projects which interest me: Python model introspection, CB in Python.

c) I'm currently in a 6-month internship, but I think RLOS allows for this to be part-time? Would that be okay?

mustaphabenm
@mustaphabenm

Hello everyone, I'm trying a toy example to understand the model weights in the context of the CATS algorithm.
Training data : ca 1.23:-1:0.7 | a:1
Test data : | a:1
The vw command: vw --cats 4 --bandwidth 1 --min_value 0 --max_value 32 -d train.vw --invert_hash m.ih -f model.vw --noconstant
The output is:

Version 8.11.0
Id
Min label:-1
Max label:0
bits:18
lda:0
0 ngram:
0 skip:
options: --bandwidth 1 --binary --cats 4 --cats_pdf 4 --cats_tree 4 --cb_explore_pdf --get_pmf --max_value 32 --min_value 0 --pmf_to_pdf 4 --sample_pdf --tree_bandwidth 0 --random_seed 2147483647
Checksum: 2730770910
:1
initial_t 0
norm normalizer 0.357143
t 1
sum_loss 0
sum_loss_since_last_dump 0
dump_interval 2
min_label -1
max_label 0
weighted_labeled_examples 1
weighted_labels 1
weighted_unlabeled_examples 0
example_number 1
total_features 1
total_weight 0.357143
sd::oec.weighted_labeled_examples 1
current_pass 1
a:108232:-0.190479 0.714286 1
a[1]:108233:-0.190479 0.714286 1

Bernardo Favoreto

Hey guys, just trying out the new VW release (it looks awesome, by the way)!
I was wondering what the proper way of using --automl is. I tried --cb_explore_adf --automl 5 --oracle_type one_diff and it didn't give me any warning; is that correct?
Moreover, is there a way for me to know which interactions the model found to be the best after training with --automl?

Thank you!

12 replies
Max Pagels
Fantastic work by the VW team on the 9.0 release, congrats!
Bernardo Favoreto

I just tried to use another experimental feature, experimental_full_name_interactions, but wasn't able to.
Scenario: I changed one of the namespaces in my dataset to begin with the same letter as another (Session became Usession, just for testing purposes; I also have the User namespace).
Then I tried running the following command:
vw --cb_explore_adf -c train.dat --passes 5 experimental_full_name_interactions Usession|User -f regular.vw
And got the following output:
User: command not found
Is this a bug or am I doing something wrong?

Thanks

18 replies
George Fei
@georgefei
Hey team, quick questions: why do I get 4 lines per feature in the invert_hash output if I use the legacy cb option?
Version 9.0.0
Id
Min label:-2
Max label:0.189197
bits:18
lda:0
0 ngram:
0 skip:
options: --cb 2 --cb_force_legacy --cb_type dr --csoaa 2 --random_seed 123
Checksum: 955399906
:1
initial_t 0
norm normalizer 508
t 32
sum_loss -32.2039
sum_loss_since_last_dump 0
dump_interval 64
min_label -2
max_label 0.189197
weighted_labeled_examples 32
weighted_labels 0
weighted_unlabeled_examples 0
example_number 32
total_features 128
total_weight 127
sd::oec.weighted_labeled_examples 32
current_pass 1
l1_state 0
l2_state 1
d:20940:0.0795723 0.676124 1
d[1]:20941:-0.253962 17.0199 1
d[2]:20942:0.0793786 0.305578 1
d[3]:20943:-0.253947 8.55648 1
e:69020:0.0795723 0.676124 1
e[1]:69021:-0.253962 17.0199 1
e[2]:69022:0.0793786 0.305578 1
e[3]:69023:-0.253947 8.55648 1
a:108232:-0.253785 17.065 1
a[1]:108233:0.0793965 0.706623 1
a[2]:108234:-0.253947 8.55648 1
a[3]:108235:0.0793786 0.305578 1
b:129036:-0.253785 17.065 1
b[1]:129037:0.0793965 0.706623 1
b[2]:129038:-0.253947 8.55648 1
b[3]:129039:0.0793786 0.305578 1
f:139500:0.0795723 0.676124 1
f[1]:139501:-0.253962 17.0199 1
f[2]:139502:0.0793786 0.305578 1
f[3]:139503:-0.253947 8.55648 1
Constant:202096:-0.238701 17.7411 1
Constant[1]:202097:-0.238162 17.7265 1
Constant[2]:202098:-0.238136 8.86206 1
Constant[3]:202099:-0.238136 8.86206 1
c:219516:-0.253785 17.065 1
c[1]:219517:0.0793965 0.706623 1
c[2]:219518:-0.253947 8.55648 1
c[3]:219519:0.0793786 0.305578 1
34 replies
Kwame Porter Robinson
@robinsonkwame
If anyone knows of a better space to ask, please let me know, but I'm a PhD student interested in funding continued development of a VW branch that's been sitting since 2020.
3 replies
Ryan Angi
@rangi513

I was attempting to use the VW estimators library for some bias correction on a pandas dataframe before building a Q function (model) outside of VW, but I noticed there are no Doubly Robust or MTR methods in this Python library. Is that intentional, or are they named something else and I am missing them?

Also, a small usage snippet in the README might be useful. I've been reading through basic-usage.py, but it is somewhat difficult to figure things out from just that script.

Tobias S
@Tobias2020_gitlab

Hello VW community,

We are evaluating the use of Vowpal Wabbit for our recommender system (multi-armed bandit). We want to show different images (combinatorial) and predict which ones the user will interact with.
Going through the documentation, Vowpal Wabbit supports the combinatorial setup with Slates. For the reward, it is stated that:
"A single, global, reward is produced that signals the outcome of an entire slate of decisions. There's a linearity assumption on the impact of each action on the observed reward."
I.e. the semi-feedback we need is not supported by default.

Our question: Is there a way to work with semi-feedback and Slots in Vowpal Wabbit?

Thank you :)

2 replies
Rajan
@rajan-chari
Hi Tobias, what do you mean by semi-feedback?
Debraj Maji
@snnipetr

Hello all, I am trying to install Vowpal Wabbit on my local machine and have successfully built it and also run the tests without any failures. However, whenever I try to run make install it gives an error.

make: *** No rule to make target 'install'.  Stop.

I am unable to resolve it. Any help would be appreciated.

5 replies
musram
@musram
I have a doubt regarding the continuous contextual bandit.
I train the model with bandwidth = 1000 and num_actions = 20:
vw = pyvw.vw("--cats " + str(num_actions) + " --bandwidth " + str(bandwidth) + " -d data/cats.acpx --min_value 0 --max_value 20000 --json --chain_hash --coin --epsilon 0.2 -f saved_model2.model --save_resume -q :: --quiet")
I want to generate the readable model file using invert_hash, and I use pyvw.vw(" -d data/cats.acpx -t -i saved_model2.model --invert_hash model2.humanreadable"). But the output I get in model2.humanreadable is just ['Version 8.11.0\n', 'Id \n', 'Min label:0\n', 'Max label:0\n', 'bits:18\n', 'lda:0\n', '0 ngram:\n', '0 skip:\n', 'options: --bandwidth 1000 --binary --cats 20 --cats_pdf 20 --cats_tree 20 --cb_explore_pdf --coin --epsilon 0.200000002980232 --get_pmf --max_value 20000 --min_value 0 --pmf_to_pdf 20 --quadratic :: --sample_pdf --tree_bandwidth 1 --random_seed 11542015123243797559\n', 'Checksum: 1406756192\n', ':0\n'], with no feature weights. Is something wrong in the way I pass the params?
David Chanin
@chanind
Is there a way to convert the .json format to the standard vw format? I want to verify that I'm using the JSON format correctly with cb_explore_adf, but I can't find any examples of using JSON with namespaces in cb_explore_adf
6 replies
also is --json the same as --dsjson?
musram
@musram
I have taken this example from https://github.com/VowpalWabbit/jupyter-notebooks/blob/master/cats_tutorial.ipynb, with the command vw = pyvw.vw("--cats_pdf " + str(num_actions) + " --bandwidth " + str(bandwidth) + " --min_value 0 --max_value 100 --json --chain_hash --coin --epsilon 0.2 -q :: ")
I printed the actions and probs, which look like [(0.0, 0.5625, 0.0020000000949949026), (0.5625, 2.5625, 0.4020000100135803), (2.5625, 100.0, 0.0020000000949949026)]. The last value in each tuple corresponds to the pdf. Now, how do you sample using this pdf? If I am not wrong, to sample from a pdf we draw x ~ unif(0,1) and then cdf_inv(x) is the sample. Is something similar done here? If so, there should be parameters for this pdf; e.g., a Gaussian has a mean and variance. How do I get those?
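A note on the sampling question above: the tuples look like (left_edge, right_edge, density) segments of a piecewise-constant pdf, so there is no parametric family (no Gaussian mean/variance) involved; sampling is plain inverse-CDF over the segments. A sketch under that assumption (my reading of the output, not VW's actual code):

```python
import random

def sample_pdf(segments, u=None):
    """Inverse-CDF sample from a piecewise-constant pdf described by
    (left, right, density) segments. Draw u ~ unif(0,1), walk the
    segments accumulating mass, and invert the cdf within the
    segment where the cumulative mass first reaches u."""
    if u is None:
        u = random.random()
    cum = 0.0
    for left, right, density in segments:
        mass = (right - left) * density
        if cum + mass >= u:
            # the cdf is linear within a flat segment, so invert directly
            return left + (u - cum) / density
        cum += mass
    return segments[-1][1]  # numerical edge case: return the right edge

# the tuples printed in the message above (densities rounded)
segments = [(0.0, 0.5625, 0.002), (0.5625, 2.5625, 0.402), (2.5625, 100.0, 0.002)]
x = sample_pdf(segments, u=0.5)  # lands in the high-density middle segment
```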
2 replies
David Chanin
@chanind
I'm trying to do off-policy evaluation for a contextual bandit, as described here: https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/off_policy_evaluation.html. However, when I follow this guide, it says the average loss is negative. What does it mean to have a negative loss? I thought loss could only be positive? Does a larger negative number mean better or worse results?
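A note for readers hitting the same question: VW works with costs (negative rewards), so a negative average loss corresponds to positive estimated reward, and more negative is better. A toy sketch of the idea, assuming the reported loss behaves like an IPS-style cost estimate (an assumption about the tutorial, not a statement about VW internals):

```python
def ips_average_loss(logged):
    """Inverse propensity scoring estimate of a target policy's cost.

    logged: list of (cost, logged_prob, target_prob) per event, where
    cost is the negative of the observed reward. Each observed cost is
    reweighted by target_prob / logged_prob. Because costs are negative
    rewards, a *negative* average loss means the target policy is
    estimated to earn positive reward.
    """
    total = sum(cost * (p_target / p_logged)
                for cost, p_logged, p_target in logged)
    return total / len(logged)

# reward 1 logged as cost -1: a good target policy gives a negative loss
loss = ips_average_loss([(-1.0, 0.5, 1.0), (0.0, 0.5, 0.0)])
```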
8 replies
Atharv Sonwane
@threewisemonkeys-as

Hello. I had a question about the Compiler Optimisation project for RLOS Fest.

I wanted to clarify the aims for the project. From the project description: "We will develop RL agents using VowpalWabbit" and "Implement VW agents for CompilerGym".

Now the tasks in CompilerGym such as Optimisation Phase Ordering are multi-state MDPs which are usually tackled with full RL approaches (as opposed to Contextual Bandits).
The closest method I could find to full RL in VW is learning to search. However, these are all imitation-style learning algorithms which require an oracle as a reference policy to learn from, and no such oracle is available in the compiler optimisation case.

One approach that came to mind is to modify the reward and observation space of CompilerGym so that each observation would contain context from previous states and reward would be reward to go to termination. This would allow for the use of Contextual Bandits but would be a little awkward compared to using a full RL agent.
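For what it's worth, the first idea above (context carrying previous observations, reward replaced by reward-to-go) can be written as a small trajectory-to-CB transform. Everything here is hypothetical naming for illustration, not an existing VW or CompilerGym API:

```python
def trajectory_to_cb_examples(trajectory, history=1):
    """Turn one episodic trajectory into contextual-bandit examples.

    trajectory: list of (observation, action, reward) steps.
    Emits (context, action, cost) triples where context stacks the
    last `history` observations plus the current one, and cost is the
    negative reward-to-go (sum of rewards from this step onward),
    matching VW's cost-minimisation convention.
    """
    examples = []
    for t, (_, action, _) in enumerate(trajectory):
        context = [obs for obs, _, _ in trajectory[max(0, t - history): t + 1]]
        reward_to_go = sum(r for _, _, r in trajectory[t:])
        examples.append((context, action, -reward_to_go))
    return examples

# three steps with rewards 1, 2, 3
traj = [('s0', 0, 1.0), ('s1', 1, 2.0), ('s2', 0, 3.0)]
cb_examples = trajectory_to_cb_examples(traj)
```

As noted above, this buys CB compatibility at the price of ignoring the full MDP structure, so it is only a rough stand-in for a proper RL agent.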

The other approach I thought of was using core VW classification algorithms to learn a Q-function within a larger RL agent. If this is the case then is there a particular reason for using VW algorithms (such as for scale and performance) as opposed to neural nets or other ML models?

In general, I am just trying to get an idea of the expected approach, since CompilerGym hosts full-RL problems whereas VW is targeted towards contextual bandits.

Jack Gerrits
@jackgerrits
Applications close on April 4 for this year's RLOS Fest. If you are a student or know a student that would be interested. Consider applying! https://www.microsoft.com/en-us/research/academic-program/rl-open-source-fest/
Raj Gupta
@rajuthegr8

Hello. I want to work on the project "Introduce Feature Engineering Language in VowpalWabbit" for RLOS Fest.

I have completed parts 1 and 2 of the screening exercise, and I wanted to clarify something about part 3, where the DataBase is made of DataRows and each row is a map<string,float>.

Can I assume the keys will be disjoint across the different rows?

What will be the number of keys in a row, and the number of rows, relative to the length of the queries? I am asking because this part asks for ideas about how to optimize the function select(), but the constraints are a little vague.
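Not knowing the exercise's exact spec, one common way to speed up a select() over rows of string-to-float maps is an inverted index from key to row ids, so a query touches only rows that can match. A hypothetical sketch (the class and method names are mine, not the exercise's):

```python
from collections import defaultdict

class DataBase:
    """Rows are dict[str, float]; an inverted index maps each key to
    the ids of the rows containing it, built once at load time."""

    def __init__(self, rows):
        self.rows = rows
        self.index = defaultdict(list)
        for i, row in enumerate(rows):
            for key in row:
                self.index[key].append(i)

    def select(self, key):
        """Return (row_id, value) pairs for rows containing `key`,
        skipping every row that cannot match."""
        return [(i, self.rows[i][key]) for i in self.index.get(key, [])]

db = DataBase([{'a': 1.0}, {'b': 2.0}, {'a': 3.0, 'b': 4.0}])
```

Whether this pays off depends on exactly the constraints asked about above: many small rows with sparse, mostly disjoint keys favour the index, while a few dense rows do not.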

3 replies
Raj Gupta
@rajuthegr8
Hi everyone,
I have submitted my application for the project "Introduce Feature Engineering Language in VowpalWabbit". I hope I am one of the students who gets selected, and I look forward to working with the team behind Vowpal Wabbit.
Thank you
Saahil Ali
@programmer290399

Hello Everyone!!
I have applied for the "Native CSV parsing" project in RLOSF.
One thing I noticed about the screening exercise is that the output for the second example seems to have a discrepancy; I am not sure if that is the case, so it'd be really great if someone could confirm. Shouldn't the first line of output have C:1 rather than C:2?

Anyways, I hope I too get selected this year...
Best of luck to @threewisemonkeys-as , @rajuthegr8, and all other applicants 👍

Bernardo Favoreto
Hey guys!
Thinking about Contextual Bandits (and variations thereof) in production, how could one identify when it's appropriate to reset a model (i.e., start training from scratch)? My premise is that the model gradually becomes less sensitive to new data, so resetting from time to time seems appropriate (correct?).
I'm not sure if this should be based on a given period (e.g., every week), amount of data, or something else.
I know this is somewhat subjective but I would love to hear if anyone has any thoughts on the subject.
6 replies
Cyprien Courtot
@c.courtot_gitlab
Hello - for contextual bandits with continuous actions (CATS), the cost doesn't necessarily have to be within [0,1], does it?
2 replies
Ryan Angi
@rangi513
Is there a way to disable off-policy evaluation completely for a contextual bandit when you specify --cb_type? I'm trying to estimate the performance difference (average cumulative regret) between ips, dr, and a biased policy in a simulation environment, given the same exploration algorithm. The only workaround I have thought of is to set the probability to 1 for every training example and use ips. Any ideas?