Hi, I am trying to get acquainted with Trax through the quickstart notebooks and wanted to use trax/rl. How do I use ppo_trainer with a custom gym env? I looked at trax/rl/ppo_trainer_test.py for reference.
(Issues similar to this: https://colab.research.google.com/drive/1TnYMIt7Zm-iCN-Az3jeO8QoIpQ7YvHiD)
Also, I eventually want to build a simple DQN that uses transformer_decoder as the model. How should I go about it? Does the transformer always expect inputs as in trax/rl/supervised/inputs.py? How do I include both states and actions in the training stream? Any guidance/resources would be very helpful, TIA.
Hi @PranavMahajan25 - thanks for trying it out! We'd be very interested in taking the colab as an example of RL once it works.
I ran the colab and it doesn't error out for me, so it looks like you got it working! (Maybe don't reuse the same object with a different net; it looks like there may have been some caching of weights.)
PS: I like the idea of starting with the test and modifying it in place to get what you want :)
Re: states and actions both in the training stream: @koz4k added some code for doing similar things in the rl/ directory, mostly related to simple.py. That part of the code is under heavy development; we want to try something similar as well.
PPS: The colab uses trax 1.0.0, maybe upgrade to the latest 1.2.2?
IDS and PAD_AMOUNT are global constants that are constructed by loading a txt file of Crime and Punishment. You can instead load multiple txt files, apply the tokenizer, and then generate the corresponding token ids for each one.
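A rough sketch of that multi-file idea, in case it helps: read every .txt file in a directory, tokenize each one, and keep the token ids and pad amount per file. ByteTokenizer, load_token_ids and MAX_LEN are hypothetical names made up for this example. The byte-level tokenizer is only a stand-in so the snippet runs without a trained model; in the colab you would presumably pass the SentencePiece TOKENIZER instead, which exposes the same EncodeAsIds method.

```python
import os
import tempfile

MAX_LEN = 64  # hypothetical fixed sequence length; pick the value your model uses


class ByteTokenizer:
    """Stand-in tokenizer (an assumption) with a SentencePiece-like EncodeAsIds."""

    def EncodeAsIds(self, text):
        # One id per UTF-8 byte, shifted by 1 so that 0 stays free for padding.
        return [b + 1 for b in text.encode("utf-8")]


def load_token_ids(directory, tokenizer, max_len=MAX_LEN):
    """Return {filename: (ids, pad_amount)} for every .txt file in `directory`."""
    result = {}
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            ids = tokenizer.EncodeAsIds(f.read())
        ids = ids[:max_len]  # truncate texts that do not fit
        result[name] = (ids, max_len - len(ids))
    return result


# Small demonstration with two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("a.txt", "hello"), ("b.txt", "crime and punishment")]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(text)
    ALL_IDS = load_token_ids(tmp, ByteTokenizer())

for name, (ids, pad_amount) in ALL_IDS.items():
    print(name, len(ids), pad_amount)
```

Keeping a pad amount per file, rather than one global constant, is what lets files of different lengths share the same fixed sequence length.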
Hi, thank you for making Reformer and Trax available.
I have a question regarding the TPU Crime and Punishment example. The language model obviously learns made-up words: scandlchedness, raggong, innatummed, quisten... Some great words there, but...
Is this an artifact of the hashing, or what do you think causes it?
@nkitaev I'm using this to feed in multiple text files. Do you think I can tweak any of the hyperparameters in the parse_config to run the model longer than half an hour without running into memory issues?
    def my_inputs(n_devices):
      while True:
        file = random.choice(os.listdir('files'))
        with GFile('/files/' + file) as f:
          text = f.read()
        IDS = TOKENIZER.EncodeAsIds(text)
The MultifactorSchedule parameters control the learning rate schedule, which only affects how long training takes and not how much memory is used. You can try running with a few more warmup steps and more steps_per_cycle in the cyclic cosine schedule.
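To make the effect of those two knobs concrete, here is a small sketch of a linear warmup multiplied by a cyclic cosine decay. It is an illustration only and not Trax's actual MultifactorSchedule code; the function name learning_rate and the default values are assumptions chosen for the example.

```python
import math


def learning_rate(step, constant=0.01, warmup_steps=100, steps_per_cycle=1000):
    """Illustrative schedule: linear warmup times a cyclic cosine decay.

    All defaults are made-up values, not the ones from the colab config.
    """
    # Ramp linearly from 0 to 1 over the first `warmup_steps` steps.
    warmup = min(1.0, step / warmup_steps)
    # Fraction of cosine cycles completed since warmup ended.
    progress = max(0.0, (step - warmup_steps) / steps_per_cycle)
    # Decays from 1 to 0 within a cycle, then restarts at 1.
    cosine = 0.5 * (1.0 + math.cos(math.pi * (progress % 1.0)))
    return constant * warmup * cosine


for step in (0, 50, 100, 600, 1100):
    print(step, learning_rate(step))
```

Raising warmup_steps or steps_per_cycle stretches the curve over more training steps. Nothing here touches batch size or sequence length, which is why it should not change memory use.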
my_inputs will let you feed in your own data, and you can tune the model hyperparameters as well.
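For what it's worth, a custom input function along those lines might look like the sketch below. It samples one already-tokenized text per device, pads it to a fixed length at a random offset, and yields an (inputs, targets, mask) tuple. make_inputs and its parameters are hypothetical, and the tuple layout is my assumption about what the trainer expects, so check it against the notebook's own my_inputs before relying on it.

```python
import random

import numpy as np


def make_inputs(all_ids, max_len, seed=0):
    """Build an input function that samples one tokenized text per device.

    `all_ids` is a list of token-id lists, one per source file (hypothetical).
    """
    rng = random.Random(seed)

    def my_inputs(n_devices):
        while True:
            inputs, mask = [], []
            for _ in range(n_devices):
                ids = np.asarray(rng.choice(all_ids), dtype=np.int32)[:max_len]
                pad_total = max_len - len(ids)
                left = rng.randint(0, pad_total)  # random offset inside the window
                pad = (left, pad_total - left)
                inputs.append(np.pad(ids, pad, mode="constant"))
                # The mask is 1.0 on real tokens and 0.0 on padding.
                ones = np.ones(len(ids), dtype=np.float32)
                mask.append(np.pad(ones, pad, mode="constant"))
            inputs = np.stack(inputs)
            # Language modelling: the targets are the inputs themselves.
            yield inputs, inputs, np.stack(mask)

    return my_inputs


# Toy data standing in for tokenized files; ids start at 1 so 0 means padding.
EXAMPLE_IDS = [[5, 6, 7], [8, 9, 10, 11, 12]]
batch = next(make_inputs(EXAMPLE_IDS, max_len=8)(n_devices=2))
print(batch[0].shape, batch[2].sum(axis=1))
```

Tokenizing every file once up front, instead of re-reading and re-encoding a file on each step, keeps the generator itself cheap.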