I'm NG Sai, a final-year UG at IIIT Sri City. I got to know about the Microsoft RLOS programme via LinkedIn. My open source experience includes contributing to C++ ML libraries such as shogun, tensorflow-lite support, and mlpack, where I did GSoC'21 and currently serve as a member. My github.
I came across the Safe Contextual Bandits project. Is this topic taken for this summer, or will it be available in 2022? My forte is implementing algorithms from research papers, so I wanted to inquire about this.
Thanks in advance!
vw -h) but didn't find them, and they aren't in the documentation either. I'm interested in knowing the values for gamma_scale (I believe I saw in the presentation that it's set to 1000, but it would be good to confirm) and gamma_exponent.
vw.get_arguments() on the model also doesn't show the default values.
ccb shared | s_1 s_2
ccb action | a:1 b:1 c:1
ccb action | a:0.5 b:2 c:1
ccb action | a:0.5
ccb action | c:1
ccb slot | d:4
ccb slot 1:0.8:0.8,0:0.2 0,1,3 | d:7
[warning] Unlabeled example in train set, was this intentional? I didn't understand
ccb slot | d:4, what does it mean? Does it mean this is the fourth slot? And what's happening in line 7 (L7) of the example, plus the error term? I'd be really grateful for an explanation, thanks
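A minimal sketch of how the two slot lines above could be read, assuming the CCB text layout `ccb slot [chosen_action:cost:probability[,action:probability,...]] [allowed_action_ids] | features`. Under that assumption `d:4` is a feature named `d` with value 4 (not a slot index), and a slot line with nothing before the `|` carries no label. The function name `parse_ccb_slot` is hypothetical and is not part of VW.

```python
def parse_ccb_slot(line):
    """Split a 'ccb slot' line into (label, allowed_action_ids, features).

    Assumed layout (not taken from this chat):
    ccb slot [chosen:cost:prob[,action:prob,...]] [id,id,...] | name:value ...
    """
    head, _, feature_part = line.partition("|")
    tokens = head.split()
    if tokens[:2] != ["ccb", "slot"]:
        raise ValueError("not a ccb slot line")

    label = None   # stays None when the slot is unlabeled
    allowed = []   # explicit list of action ids usable in this slot
    for token in tokens[2:]:
        if ":" in token:
            first, *others = token.split(",")
            action, cost, prob = first.split(":")
            label = {
                "chosen_action": int(action),
                "cost": float(cost),
                "probability": float(prob),
                # remaining (action, probability) pairs of the logged distribution
                "other_actions": [
                    (int(a), float(p)) for a, p in (o.split(":") for o in others)
                ],
            }
        else:
            allowed = [int(x) for x in token.split(",")]

    features = {}
    for feat in feature_part.split():
        name, _, value = feat.partition(":")
        features[name] = float(value) if value else 1.0
    return label, allowed, features


unlabeled = parse_ccb_slot("ccb slot | d:4")
labeled = parse_ccb_slot("ccb slot 1:0.8:0.8,0:0.2 0,1,3 | d:7")
print(unlabeled)
print(labeled)
```

Read this way, the first slot has no label at all, while the second slot logs action 1 as chosen with cost 0.8 and probability 0.8, action 0 with probability 0.2, and restricts the slot to actions 0, 1 and 3.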
In CCB, what's the difference between the weight of an example and its cost? I thought weight is just an inverse of cost?
I tried feeding in the training data from the CCB page, and the weight seems to be equal to the example counter. Is that intentional?
Union[example, Costs] instead of using
kwargs, should I go by that method too then?
example type is what it claims to be. No?
Version 8.10.1
Id
Min label:-1
Max label:1
bits:18
lda:0
0 ngram:
0 skip:
options: --cb_adf --cb_type dr --csoaa_ldf multiline --csoaa_rank
Checksum: 4264491651
event_sum 0
action_sum 0
:0
s^age:7950:0.108916
:7951:0.242307
s^year:39846:-1.6944
:39847:-1.82654
I had a few questions if you guys don't mind.
a) Are you guys conducting RLOS this year? Not that I'd stop contributing if you aren't, but it certainly gives motivation to contribute :)
b) On the topic of AutoML, I read the wiki and some merged PRs. I’d like to contribute, but how can I help given that it's quite volatile right now? My PRs may be slow, so I don’t want to end up being the bottleneck. Other projects that interest me: Python model introspection, CB in Python.
c) I’m currently in a 6-month internship, but I think RLOS allows for this to be part-time? Would that be okay?
Thanks for your answers
Hello everyone, I'm trying a toy example to understand the model weights in the context of the CATS algorithm
Training data :
ca 1.23:-1:0.7 | a:1
Test data :
The vw command
vw --cats 4 --bandwidth 1 --min_value 0 --max_value 32 --d train.vw --invert_hash m.ih -f model.vw --noconstant
the output is :
Version 8.11.0
Id
Min label:-1
Max label:0
bits:18
lda:0
0 ngram:
0 skip:
options: --bandwidth 1 --binary --cats 4 --cats_pdf 4 --cats_tree 4 --cb_explore_pdf --get_pmf --max_value 32 --min_value 0 --pmf_to_pdf 4 --sample_pdf --tree_bandwidth 0 --random_seed 2147483647
Checksum: 2730770910
:1
initial_t 0
norm normalizer 0.357143
t 1
sum_loss 0
sum_loss_since_last_dump 0
dump_interval 2
min_label -1
max_label 0
weighted_labeled_examples 1
weighted_labels 1
weighted_unlabeled_examples 0
example_number 1
total_features 1
total_weight 0.357143
sd::oec.weighted_labeled_examples 1
current_pass 1
a:108232:-0.190479 0.714286 1
a:108233:-0.190479 0.714286 1
Can anyone please help me use these weights to get the final results? Thank you
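As a starting point for inspecting the dump, here is a minimal sketch that pulls the `name:index:weight` entries out of an --invert_hash file, assuming one entry per line as in the output above. It only extracts the numbers; it does not reproduce how CATS turns them into a prediction. The helper name `parse_invert_hash_weights` is hypothetical.

```python
def parse_invert_hash_weights(text):
    """Return (feature_name, hash_index, weight) for every weight line.

    Header lines such as 'bits:18' or 'Checksum: ...' are skipped because
    their first token does not have the name:index:weight shape.
    """
    weights = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        fields = parts[0].split(":")
        if len(fields) != 3:
            continue
        name, index, weight = fields
        try:
            weights.append((name, int(index), float(weight)))
        except ValueError:
            # e.g. 'sd::oec.weighted_labeled_examples' has three fields but no numbers
            continue
    return weights


sample = """bits:18
sd::oec.weighted_labeled_examples 1
current_pass 1
a:108232:-0.190479 0.714286 1
a:108233:-0.190479 0.714286 1
"""
for name, index, weight in parse_invert_hash_weights(sample):
    print(name, index, weight)
```

The two trailing numbers on each weight line are ignored here on purpose, since their meaning depends on the reduction stack that produced the model.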
Hey guys, just trying out the new VW release (it looks awesome, by the way)!
I was wondering what's the proper way of using
--automl. I tried using
--cb_explore_adf --automl 5 --oracle_type one_diff and it didn't give me any warning, is that correct?
Moreover, is there a way for me to know which interactions the model found to be the best after training with
I just tried to use another experimental feature,
experimental_full_name_interactions, but wasn't able to.
Scenario: I've changed one of the namespaces in my dataset to begin with the same letter (changed Session to Usession - just for testing purposes, and I also have the
Then, I tried running the following command:
vw --cb_explore_adf -c train.dat --passes 5 experimental_full_name_interactions Usession|User -f regular.vw
And got the following output:
User: command not found
Is this a bug or am I doing something wrong?
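The `User: command not found` line is what a shell prints when it tries to run `User` as a program, which suggests the unquoted `|` was taken as a pipe before vw ever saw it. A small sketch using Python's standard `shlex` module to show the difference quoting makes; the command string is the one from the message, and `shell_like_tokens` is a hypothetical helper name.

```python
import shlex


def shell_like_tokens(command):
    """Tokenize roughly as a POSIX shell would, keeping operators like | separate."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    return list(lexer)


unquoted = (
    "vw --cb_explore_adf -c train.dat --passes 5 "
    "experimental_full_name_interactions Usession|User -f regular.vw"
)
quoted = (
    "vw --cb_explore_adf -c train.dat --passes 5 "
    "experimental_full_name_interactions 'Usession|User' -f regular.vw"
)

# Unquoted: '|' becomes its own token, so everything after it is a second command.
print(shell_like_tokens(unquoted))
# Quoted: 'Usession|User' survives as a single argument.
print(shell_like_tokens(quoted))
```

Whether vw then accepts the argument is a separate question; this only shows why the shell, not vw, produced the error.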
Version 9.0.0
Id
Min label:-2
Max label:0.189197
bits:18
lda:0
0 ngram:
0 skip:
options: --cb 2 --cb_force_legacy --cb_type dr --csoaa 2 --random_seed 123
Checksum: 955399906
:1
initial_t 0
norm normalizer 508
t 32
sum_loss -32.2039
sum_loss_since_last_dump 0
dump_interval 64
min_label -2
max_label 0.189197
weighted_labeled_examples 32
weighted_labels 0
weighted_unlabeled_examples 0
example_number 32
total_features 128
total_weight 127
sd::oec.weighted_labeled_examples 32
current_pass 1
l1_state 0
l2_state 1
d:20940:0.0795723 0.676124 1
d:20941:-0.253962 17.0199 1
d:20942:0.0793786 0.305578 1
d:20943:-0.253947 8.55648 1
e:69020:0.0795723 0.676124 1
e:69021:-0.253962 17.0199 1
e:69022:0.0793786 0.305578 1
e:69023:-0.253947 8.55648 1
a:108232:-0.253785 17.065 1
a:108233:0.0793965 0.706623 1
a:108234:-0.253947 8.55648 1
a:108235:0.0793786 0.305578 1
b:129036:-0.253785 17.065 1
b:129037:0.0793965 0.706623 1
b:129038:-0.253947 8.55648 1
b:129039:0.0793786 0.305578 1
f:139500:0.0795723 0.676124 1
f:139501:-0.253962 17.0199 1
f:139502:0.0793786 0.305578 1
f:139503:-0.253947 8.55648 1
Constant:202096:-0.238701 17.7411 1
Constant:202097:-0.238162 17.7265 1
Constant:202098:-0.238136 8.86206 1
Constant:202099:-0.238136 8.86206 1
c:219516:-0.253785 17.065 1
c:219517:0.0793965 0.706623 1
c:219518:-0.253947 8.55648 1
c:219519:0.0793786 0.305578 1
I was attempting to use the VW estimators library for some bias correction on a pandas dataframe before building a Q function (model) outside of VW, but I noticed there are no Doubly Robust or MTR methods in this Python library. Is that intentional, or are they named something else and I am missing them?
Also, a small Usage snippet in the README might be useful. I've been reading through basic-usage.py, but it is somewhat difficult to figure things out from this script alone.
Hello VW community,
We are evaluating the use of Vowpal Wabbit for our recommender system (multi-armed bandit). We want to show different images (combinatorial) and predict which ones the user will interact with.
Going through the documentation, Vowpal Wabbit supports the combinatorial setup with Slates. For the reward, it is stated that:
"A single, global, reward is produced that signals the outcome of an entire slate of decisions. There’s a linearity assumption on the impact of each action on the observed reward."
I.e. the semi-feedback we need is not supported by default.
Our question: Is there a way to work with semi-feedback and Slots in Vowpal Wabbit?
Thank you :)
Hello all, I am trying to install Vowpal Wabbit on my local machine. I have successfully built it and run the tests without any failures. However, whenever I try to run make install, it gives an error.
make: *** No rule to make target 'install'. Stop.
I am unable to resolve it. Any help would be appreciated.
Hello. I had a question about the Compiler Optimisation project for RLOS Fest.
I wanted to clarify the aims for the project. From the project description: "We will develop RL agents using VowpalWabbit" and "Implement VW agents for CompilerGym".
Now the tasks in CompilerGym such as Optimisation Phase Ordering are multi-state MDPs which are usually tackled with full RL approaches (as opposed to Contextual Bandits).
The closest method I could find to full RL in VW is learning to search. However, these are all imitation-style learning algorithms that require an oracle as a reference policy to learn from, and no such oracle is available in the Compiler Optimisation case.
One approach that came to mind is to modify the reward and observation space of CompilerGym so that each observation would contain context from previous states and reward would be reward to go to termination. This would allow for the use of Contextual Bandits but would be a little awkward compared to using a full RL agent.
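To make the "reward to go" idea concrete, here is a minimal sketch that converts the per-step rewards of one finished episode into the summed reward from each step to termination. It is independent of CompilerGym and VW; the function name and the optional discount factor `gamma` are assumptions added for illustration and are not part of the project description.

```python
def reward_to_go(rewards, gamma=1.0):
    """Return, for each step t, the (optionally discounted) sum of rewards from t to the end."""
    totals = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the total already computed for the next step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        totals[t] = running
    return totals


episode_rewards = [1.0, 2.0, 3.0]
print(reward_to_go(episode_rewards))             # [6.0, 5.0, 3.0]
print(reward_to_go(episode_rewards, gamma=0.5))  # [2.75, 3.5, 3.0]
```

Each (context, action, reward-to-go) triple could then be logged as one contextual bandit example, with the caveat that the label is only known once the episode has ended.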
The other approach I thought of was using core VW classification algorithms to learn a Q-function within a larger RL agent. If this is the case then is there a particular reason for using VW algorithms (such as for scale and performance) as opposed to neural nets or other ML models?
In general, I am just trying to get an idea of the expected approach, since CompilerGym hosts full-RL problems whereas VW is targeted towards contextual bandits.