    Afroz Mohiuddin
    @afrozenator

    @johngrabner - There aren't any examples of image-to-text right now (that is what your task is, right? We have image-to-label using Reformer/Transformer), but one way to proceed would be to add your dataset to TFDS (this should be easy, it has very nice documentation) and then pose it as a text generation problem with the image as input (a rough sketch of such a TFDS builder follows below).

    But Trax is a library of deep learning models that lets you do these kinds of things.
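
    A minimal sketch of the kind of TFDS builder this suggests, pairing line images with transcriptions. The class name, directory path, image shape, and the load_pairs helper are hypothetical placeholders, not part of TFDS or Trax:

    import tensorflow_datasets as tfds

    class HandwrittenLines(tfds.core.GeneratorBasedBuilder):
      """Hypothetical image-to-text dataset: line images paired with transcriptions."""
      VERSION = tfds.core.Version('1.0.0')

      def _info(self):
        return tfds.core.DatasetInfo(
            builder=self,
            features=tfds.features.FeaturesDict({
                'image': tfds.features.Image(shape=(1024, 128, 1)),
                'text': tfds.features.Text(),
            }),
            supervised_keys=('image', 'text'),
        )

      def _split_generators(self, dl_manager):
        return [tfds.core.SplitGenerator(name=tfds.Split.TRAIN,
                                         gen_kwargs={'path': '/path/to/scans'})]

      def _generate_examples(self, path):
        # load_pairs is a hypothetical helper returning (image_path, transcription) pairs.
        for key, (image_path, transcription) in enumerate(load_pairs(path)):
          yield key, {'image': image_path, 'text': transcription}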

    johngrabner
    @johngrabner
    Adding my data to TFDS is the idea. I just need to wait for the prof who did the original transcribing to publish. By "pose it as a text generation problem with the image as input", do you mean pose it to the community using the data entered into TFDS?
    Are there any models of image+label to label (i.e., one letter at a time)?
    Afroz Mohiuddin
    @afrozenator

    By "pose it as a text generation problem with the image as input" you mean pose it to the community using the data entered into TFDS?

    No, I meant how do you want to solve it

    I just meant to clarify what the input and output were
    johngrabner
    @johngrabner
    input is an image of shape (1024, 128, 1), output shape (128, 72) = (max string length, alphabet size).
    Lukasz Kaiser
    @lukaszkaiser
    @friesel : as Afroz says, Trax can use T2T Problem instances directly. Just set dataset_name in the gin config. Maybe start with some existing config that uses T2T data, like https://github.com/google/trax/blob/master/trax/configs/transformer_lm1b_8gb_testing.gin for LM or https://github.com/google/trax/blob/master/trax/configs/transformer_wmt_ende_16gb_adafactor_testing.gin for translation. Then, in the config file, import your Problem class and just change the inputs line.
    Please let us know if this helps; we really need to clarify how to run on T2T stuff!
    For decoding, just do as in the intro colab (last cell does inference): https://github.com/google/trax/blob/master/trax/intro.ipynb
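    For concreteness, the "change the inputs line" step might look like the following gin lines in a copy of one of the configs above. The module and dataset names here are hypothetical, and the exact binding names should be taken from the config you start from:

    import my_project.my_t2t_problem   # hypothetical module that registers your T2T Problem

    inputs.data_dir = '/path/to/t2t_data'        # placeholder
    inputs.dataset_name = 't2t_my_problem_name'  # hypothetical; mirror the dataset_name line in the existing config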
    Lukasz Kaiser
    @lukaszkaiser
    @dimeldo : yes - usually setting n_hashes to 8 suffices for Reformer to match Transformer (see the Reformer paper for details). We often run with 4 or even 2 hashes, as it's faster and sufficient for many problems. Reversibility (without LSH attention) has matched the standard Transformer every time we've tried.
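    For intuition, here is a small numpy sketch (not the Trax code) of the multi-round angular-LSH bucketing that n_hashes controls, following the scheme described in the Reformer paper; the shapes are illustrative:

    import numpy as np

    def lsh_buckets(x, n_hashes=4, n_buckets=64, seed=0):
      """Assign each row of x [seq_len, d] to one bucket per hash round."""
      rng = np.random.RandomState(seed)
      d = x.shape[-1]
      # One random rotation per hash round; bucket id = argmax over [xR; -xR].
      rotations = rng.normal(size=(d, n_hashes, n_buckets // 2))
      rotated = np.einsum('sd,dhb->hsb', x, rotations)               # [n_hashes, seq_len, n_buckets//2]
      return np.argmax(np.concatenate([rotated, -rotated], -1), -1)  # [n_hashes, seq_len]

    # More hash rounds make it more likely that similar query/key vectors share a bucket
    # in at least one round, which is why n_hashes=8 gets close to full attention.
    buckets = lsh_buckets(np.random.randn(1024, 64), n_hashes=8)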
    Phil Wang
    @lucidrains
    Hi! Thank you for your work on Transformers and Reformers. I had some questions about the default hyperparameters for Reformer. Going through the code, I noticed that the rotations for LSH can be sampled either randomly or based on the data (_data_rotation_farthest). Although this wasn't mentioned in the paper, I was wondering if choosing one or the other makes a difference.
    I also had the same question for self._allow_duplicate_attention and hard_k.
    The deduplicating-attention logic is especially memory-hungry in a port I am writing for PyTorch, so I am wondering how much of it really matters for learning, final accuracy, etc.
    Afroz Mohiuddin
    @afrozenator
    Hi @lucidrains - We tried _data_rotation_farthest after submitting the paper; check out the gin configs under trax/configs: some of the Reformer configs use _data_rotation_farthest (enwik8 doesn't, since _data_rotation_farthest works slightly worse than random there for some reason; on the other tasks it works better than random). Maybe @lukaszkaiser and @nikitakit can tell you more about the other settings.
    Phil Wang
    @lucidrains
    @afrozenator Thank you, Afroz, for sharing your results on _data_rotation_farthest.
    nkitaev
    @nkitaev
    Hi all, Nikita here (I think I'm signed in to my other github account at the moment)
    Data-dependent hashing (e.g. _data_rotation_farthest) seems to help a bit for imagenet64 but it performs a bit worse on enwik8. There are also some open questions about how one would sample from such a model, because inference starts with an empty sequence and no way to initialize the data-dependent hash. I would say this option is still in the research/experimental phase.
    nkitaev
    @nkitaev
    Restricting attention to the top-k values (hard_k) never yielded any demonstrable benefits in our experiments. At the start of the project we had this idea that if one could identify the top-k attention targets for each query, then doing sparse attention within the top-k only would be faster than dense attention. The problem is that modern hardware is designed for dense computation to such a large degree that the sparse version usually ends up being slower (sometimes substantially slower).
    We kept de-duplication enabled for the paper because it matches the motivation and mathematical derivations that we present, but I have no evidence that it actually makes a difference for accuracy. These days I tend to turn it off because it slows down training. Same thing for the option that restricts attention across adjacent buckets.
    Phil Wang
    @lucidrains
    thank you Nikita. I will turn off those settings and keep a watchful eye for any hard figures in the final paper
    thanks again for all your hard work
    Pranav Mahajan
    @PranavMahajan25

    Hi, I am trying to get acquainted with Trax through the quickstart notebooks and wanted to use trax/rl. How do I use ppo_trainer with a custom gym env? I looked at trax/rl/ppo_trainer_test.py for reference.
    (Issues similar to this: https://colab.research.google.com/drive/1TnYMIt7Zm-iCN-Az3jeO8QoIpQ7YvHiD)

    Also, I eventually want to build a simple DQN that uses transformer_decoder as the model. How should I go about it? Does the transformer always expect inputs as in trax/rl/supervised/inputs.py? How do I include both states and actions in the training stream? Any guidance/resources would be very helpful, TIA.

    Afroz Mohiuddin
    @afrozenator

    Hi @PranavMahajan25 - thanks for trying it out! We'd be very interested in taking the colab as an example of RL once it works.

    I ran the colab and it doesn't error out for me! So it looks like you got it working (maybe don't reuse the same object with a different net, since it looks like there may have been some caching of weights?).

    PS: I like the idea of starting with the test and modifying it in place to get what you want :)

    Re: states and actions both in the training stream - @koz4k added some code for doing similar things in the rl/ directory, mostly related to simple.py. That part of the code is under heavy development; we want to try something similar as well.

    PPS: The colab uses trax 1.0.0, maybe upgrade to the latest 1.2.2?

    Suraj Patil
    @patil-suraj
    Hello, is it possible to use Reformer for a question answering task? The input could be a whole chapter of a book. Is Reformer suited to this kind of task?
    Pranav Mahajan
    @PranavMahajan25
    Thanks for your reply, @afrozenator! I would love to contribute to such an example, if it works out well.
    You were right, the error was because I used the same object with a different net. I'll explore the functions related to training streams and mixing streams from SimPLe. Thanks again!
    Afroz Mohiuddin
    @afrozenator
    Hi @patil-suraj - you could probably concatenate <document text>, <query> and <answer> with special separator tokens, and mask the loss to only consider the answer tokens. So ultimately the target would look like <document text><sep1><query><sep2><answer>, with the loss mask set to operate only on the answer tokens (a tiny sketch of this follows below). Does that make sense? @nkitaev is making a Reformer encoder, and with that things will look like seq2seq.
    Also scroll up a little to where we discuss almost the same thing w.r.t. summarization, and see what Lukasz had to say as well.
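    A tiny sketch of that concatenation and loss mask; the token ids and separator values are made up for illustration:

    import numpy as np

    SEP1, SEP2 = 1, 2   # hypothetical separator token ids

    def make_qa_target(doc_ids, query_ids, answer_ids):
      """Build <document><sep1><query><sep2><answer> plus a mask that is 1 only on answer tokens."""
      target = np.concatenate([doc_ids, [SEP1], query_ids, [SEP2], answer_ids])
      mask = np.zeros(len(target), dtype=np.float32)
      mask[-len(answer_ids):] = 1.0   # the loss is computed only where the mask is 1
      return target, mask

    target, mask = make_qa_target(np.array([5, 6, 7]), np.array([8, 9]), np.array([10, 11, 12]))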
    Christopher Beitel
    @cwbeitel
    Just wanted to chime in to say we are really excited to be making use of Trax in the Project Clarify (https://github.com/projectclarify/clarify) codebase, and we would like to invite anyone with experience with Trax or JAX and an interest in mentoring to come give a tutorial session at one of our upcoming hackathon/training days (Jan 25th and Feb 29th): https://forms.gle/oFWkN7UuAxS7NUGJ9. Especially you, @afrozenator; I didn't have your email to send you an invite. Looking forward to adding value to Trax as we get up to speed with using it.
    NGamma
    @NGamma
    Hi! I'm running the Reformer example; incredible results on text generation. Could you point me in the right direction on how to modify the colab so it can take a dataset with many example .txt files, each of the same text length (0.5M tokens), on a single TPU?
    nkitaev
    @nkitaev
    Thanks! If you're running the colab example, you can modify my_inputs to yield a different example each time.
    Right now IDS and PAD_AMOUNT are global constants constructed by loading a single txt file of Crime and Punishment. You can instead load multiple txt files, apply the tokenizer, and then generate the corresponding token ids for each one.
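    A rough sketch of such a generator, in the shape of the colab's my_inputs; the /files directory, TOKENIZER, and MAX_LEN are placeholders, and the exact tuple yielded should be checked against what the colab's trainer expects:

    import os
    import random
    import numpy as np

    def my_inputs(n_devices):
      """Yield one tokenized, padded text file per device."""
      files = [os.path.join('/files', f) for f in os.listdir('/files')]
      while True:
        batch, mask = [], []
        for _ in range(n_devices):
          with open(random.choice(files)) as f:
            ids = np.asarray(TOKENIZER.EncodeAsIds(f.read()), dtype=np.int32)[:MAX_LEN]
          pad = MAX_LEN - len(ids)
          batch.append(np.pad(ids, (0, pad)))
          mask.append(np.pad(np.ones(len(ids), dtype=np.float32), (0, pad)))
        inputs = np.stack(batch)
        # For a language model the inputs double as targets; the mask keeps the loss off the padding.
        yield inputs, inputs, np.stack(mask)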
    NGamma
    @NGamma
    @nkitaev I think I got it. I'll try that! Thank you!
    Suraj Patil
    @patil-suraj
    @afrozenator Yes it makes sense. Looking forward to the Reformer Encoder. Will try this approach out and if successful will post a tutorial about it. Thank you!
    Also, I've recently come across Trax and it seems very interesting and clean. But I'm wondering why we need another DL library when we already have TF 2.0. What are the advantages of Trax over TF 2.0, and what are the design goals for Trax? @afrozenator @lukaszkaiser, I'd like to know your thoughts on this.
    AlDante
    @AlDante

    Hi, thank you for making Reformer and Trax available.

    I have a question regarding the TPU Crime and Punishment example. The language model obviously learns made-up words: scandlchedness, raggong, innatummed, quisten... Some great words there, but...

    Is this an artifact of the hashing, or what do you think causes it?

    nkitaev
    @nkitaev
    The hyperparameters in that demo are designed to run in about half an hour, not to yield optimal results
    It's very close to a character-level language model (only 320 basic tokens -- just enough to cover characters, character pairs, and maybe a few frequent words). It's also only training for 600 steps with minimal regularization.
    For comparison, BERT trains for 1 million steps, with a fair bit of dropout, on more diverse data
    AlDante
    @AlDante
    @nkitaev Thank you very much for clarifying.
    One more question, if I may - are there any examples using Reformer for question answering?
    Jérémie C. Wenger
    @jchwenger
    I'm also very interested in using the Reformer for text generation. What would be your advice for feeding in textual data that is longer than the example given in the Colab? So far I ran into errors trying to modify the architecture even slightly (to test the limits of TPU memory), and went for the option of selecting a random slice of the same length as the original input, even dropping the padding mechanism entirely. Is that how you would do it on a large corpus?
    I'm also curious to know how easy it would be to train on several TPUs: would it work out of the box?
    Suraj Patil
    @patil-suraj
    @AlDante Hey, nice to see that you are also interested in question answering. I am also trying to work out a demo with Reformer. Let me know if you come up with anything. I will post the colab here if I finish it successfully.
    AlDante
    @AlDante
    @patil-suraj Hi Suraj, sure, it would be great to share experiences.
    Mahesh Bhole
    @mahesh21aug_twitter
    @AlDante, @patil-suraj I am also looking for QA over a whole document rather than a paragraph :)
    AlDante
    @AlDante
    @mahesh21aug_twitter Join the club :-)
    Jérémie C. Wenger
    @jchwenger
    I'd also like to add that the combination of LSH and reversible layers is mind-blowing and holds huge promise! Especially given the amount of work I've seen people do in order to fit recent language models in memory.
    NGamma
    @NGamma
    def my_inputs(n_devices):
      while True:
        file = random.choice(os.listdir('/files'))  # pick one of the training .txt files
        with GFile('/files/' + file) as f:
          text = f.read()
        IDS = TOKENIZER.EncodeAsIds(text)
        # ...then pad/stack IDS and yield (inputs, targets, mask) batches as in the colab...
    @nkitaev I'm using this to feed in the multiple text files. Do you think I can tweak any of the hyperparameters in the parse_config to run the model for longer than half an hour without running into memory issues?
    Suraj Patil
    @patil-suraj
    @nikitakit @NGamma Also, how do I batchify my_inputs?
    Suraj Patil
    @patil-suraj
    Sorry, a better question is how to create batches for feeding the TPU. In the original Reformer example each TPU core runs one example; instead of that, can we run batches of different examples?