    Lukasz Kaiser
    @lukaszkaiser
    Welcome to the Trax gitter chat. We're here to help you use Trax!
    Phillip Bock
    @friesel
    Hello Lukasz, I am a fan of t2t and use it extensively. Now with the migration to trax I am a bit confused about where this is heading. Is there any documentation? I also asked this question on the github issues page, to no avail though.
    Am I missing the place where tutorials or guidance are published, or is it just still too early? Thx a lot
    Phillip
    Lukasz Kaiser
    @lukaszkaiser
    Hi Phillip! When TensorFlow moved to TF2 we spent some time porting T2T -- but both layers and models heavily rely on variable scope, so it became clear we needed a complete rewrite to make it really TF2-compatible (not just using tf.compat.v1). That's when Trax started: since we needed a rewrite, we re-did it all from scratch, focusing much more on making the code clear and documented this time.
    We do not have a proper migration doc yet, as not all T2T models are re-implemented. What model do you care for most? It should be possible to migrate Transformer use-cases now. What was hparams in T2T becomes a gin-config in Trax, the T2T Problem name just gets a "t2t_" prefix.
    Maybe let us know your concrete use-case and we'll help and base a doc on that?
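    To make that mapping a bit more concrete, here is a very rough sketch (the binding names below are illustrative and may differ between Trax versions; the configs under trax/configs have the real keys, and "my_problem" is a made-up Problem name):

    import gin
    import trax  # registers Trax models/functions as gin configurables

    # Roughly what "--problem=my_problem --hparams_set=... --hparams=..." in T2T
    # becomes in Trax: the Problem is referenced by name with a "t2t_" prefix,
    # and hparams overrides become gin bindings.
    gin.parse_config("""
    inputs.dataset_name = 't2t_my_problem'
    train.model = @trax.models.TransformerLM
    """)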
    Afroz Mohiuddin
    @afrozenator
    Hi @friesel - T2T is always going to be there, but for TF 1 only, as Lukasz said. Trax and T2T are obviously not at feature parity, but by and large (at least for now) things are much easier to do in Trax, partly because of JAX, but partly also because this is a new and small codebase -- we would obviously love advice on what to prioritize.
    Vladimir Prelovac
    @vprelovac_gitlab
    Hi all. Just checking out the text gen colab demo. Wondering how you would approach a summarization problem on Crime and Punishment with Reformer?
    Afroz Mohiuddin
    @afrozenator
    @vprelovac_gitlab - you could probably concatenate <text> and <summary> with a special token, and mask the loss to only consider the summary tokens. So the target would look like <text><summarize><summary>, with the loss mask set to only operate on the summary tokens. Does that make sense? Nikita, the first author, is making a Reformer encoder, and things will look like seq2seq with that.
    Vladimir Prelovac
    @vprelovac_gitlab
    @nikitakit would be curious to learn more about that work. Hope the code gets published. Excited about the possibilities of summarizing really long documents!
    Lukasz Kaiser
    @lukaszkaiser
    You can summarize already with Reformer in Trax. As Afroz says: just make inputs of the form <text>[doc]<summary>[summary] and run ReformerLM with loss mask that is 0 on [doc] and 1 on [summary]. A friend played with it on the TFDS scientific papers dataset and it does generate reasonable summaries (even if it was a little repetitive at first try).
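    As a minimal sketch of that setup (the separator id and helper below are made up for illustration; the real Trax input pipelines differ):

    import numpy as np

    SUMMARY_SEP = 2  # hypothetical id of the <summary> separator token

    def make_lm_example(doc_ids, summary_ids):
      """Concatenates [doc]<summary>[summary]; mask is 0 over the doc, 1 over the summary."""
      tokens = np.concatenate([doc_ids, [SUMMARY_SEP], summary_ids]).astype(np.int32)
      mask = np.concatenate([np.zeros(len(doc_ids) + 1), np.ones(len(summary_ids))])
      return tokens, mask

    # ReformerLM is then trained as a plain language model on `tokens`, with `mask`
    # multiplying the per-token cross-entropy so only summary positions contribute
    # to the loss; at inference time you feed [doc]<summary> and sample the rest.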
    Lukasz Kaiser
    @lukaszkaiser
    :)
    You're in the right place :). Reformer can handle context 500000 on 8GB. Have you checked out our colabs?
    Phillip Bock
    @friesel

    Thx for your replies @lukaszkaiser @afrozenator .

    What we currently do is use t2t-transformer to read technical documents (German text) and output generated summaries (same vocabulary). We pretrain on German webtext, domain-specific text and Wikipedia. We use the summary problem in t2t and have the whole suite running (t2t-datagen, trainer, export and serve).

    We do that in a generative way (not squad-style with start and end token, but really generated text given some text-input).

    I checked into trax and didn't find problem.py (to start with), and the structure of the library is different from what I know from t2t. So currently I am not even sure how to start, especially given Python 3 and TF2 (and that TPUs on GCP will need to replace our GPUs, as it seems).

    So any generative same-language-transformer(reformer) example under trax would be awesome.

    Here is what we do to run t2t currently, starting with a text_problems.Text2TextProblem:

    
    # Imports assumed by this snippet.
    import os
    import subprocess
    import sys


    def t2t(args, config):
      """Runs the T2T datagen/train/decode/export/serve pipeline from a config file."""
      model = config.get('Main', 'model')
      problem = config.get('Main', 'problem')
      hparams_set = config.get('Main', 'hparams_set')
      hparams = config.get('Main', 'hparams')
      decode_hparams = config.get('Main', 'decode_hparams')
      data_dir = config.get('Main', 'data_dir')
      tmp_dir = config.get('Main', 'data_dir')  # tmp_dir shares the data dir here
      output_dir = config.get('Main', 'output_dir')
      usr_dir = config.get('Main', 'usr_dir')
      train_steps = config.getint('Main', 'train_steps')
      eval_steps = config.getint('Main', 'eval_steps')
      local_eval_frequency = config.getint('Main', 'local_eval_frequency')
      eval_throttle_seconds = config.getint('Main', 'eval_throttle_seconds')

      if args.mode == "train":
        # Train the model via t2t-trainer.
        from tensor2tensor.bin.t2t_trainer import FLAGS, main
        FLAGS.t2t_usr_dir = usr_dir
        FLAGS.model = model
        FLAGS.problem = problem
        FLAGS.hparams_set = hparams_set
        FLAGS.hparams = hparams
        FLAGS.train_steps = train_steps
        FLAGS.eval_steps = eval_steps
        FLAGS.local_eval_frequency = local_eval_frequency
        FLAGS.eval_throttle_seconds = eval_throttle_seconds
        FLAGS.data_dir = data_dir
        FLAGS.output_dir = output_dir
        main(sys.argv)
      elif args.mode == "datagen":
        # Generate the TFRecords for the Problem via t2t-datagen.
        from tensor2tensor.bin.t2t_datagen import FLAGS, main
        FLAGS.t2t_usr_dir = usr_dir
        FLAGS.problem = problem
        FLAGS.data_dir = data_dir
        FLAGS.tmp_dir = tmp_dir
        main(sys.argv)
      elif args.mode == "decode":
        # Run inference via t2t-decoder.
        from tensor2tensor.bin.t2t_decoder import FLAGS, main
        FLAGS.t2t_usr_dir = usr_dir
        FLAGS.model = model
        FLAGS.problem = problem
        FLAGS.hparams_set = hparams_set
        FLAGS.hparams = hparams
        FLAGS.decode_hparams = decode_hparams
        FLAGS.data_dir = data_dir
        FLAGS.output_dir = output_dir
        main(sys.argv)
      elif args.mode == "export":
        # Export a SavedModel for serving.
        from tensor2tensor.serving.export import FLAGS, main
        FLAGS.t2t_usr_dir = usr_dir
        FLAGS.model = model
        FLAGS.problem = problem
        FLAGS.hparams_set = hparams_set
        FLAGS.hparams = hparams
        FLAGS.decode_hparams = decode_hparams
        FLAGS.data_dir = data_dir
        FLAGS.output_dir = output_dir
        main(sys.argv)
      elif args.mode == "serve":
        # Serve the exported model with TensorFlow Serving (GPU image) in Docker.
        serving_model_name = output_dir.split("/")[-1]
        cmd = [
          "/usr/bin/docker",
          "run",
          "--runtime=nvidia",
          "-p",
          "8500:8500",
          "--mount",
          "type=bind,source=%s/export,target=/models/%s" % (os.path.expanduser(output_dir), serving_model_name),
          "-e",
          "MODEL_NAME=%s" % (serving_model_name),
          "-it",
          "tensorflow/serving:latest-gpu",
          #"--max_batch_size=64"
        ]
        print(' '.join(cmd))
        subprocess.call(cmd)
      else:
        print("Unknown mode")

    Thx a lot
    Phillip

    Phillip Bock
    @friesel
    @lukaszkaiser Maybe the code that friend of yours produced to play around with the TFDS scientific papers dataset (or an example based on it) could really help, given that he did same-language summarization (generative, I guess) as well. That might save you a lot of work (as I am sure your workload given TF2 is pretty hefty) and still give me (and those with similar challenges) a good base to work through.
    Phúc Lê
    @lkhphuc
    Hi all. As far as I can see from the code, there's currently no layer for transposed convolution, right? Or is that a configuration of the Conv class that I don't know of yet?
    Would you be interested in a PR for TransposeConv and a small AutoEncoder if this is missing?
    Afroz Mohiuddin
    @afrozenator

    @dimeldo - The Reformer architecture, which is implemented in Trax, has a few experiments that you can check out - https://arxiv.org/pdf/2001.04451.pdf

    The novelty is being able to train over longer sequences, so the authors pushed that aspect. I'm sure making it deeper (the Reformer is already more memory-efficient than other Transformers) and doing MLM pre-training (coming, hopefully soon) will push toward/reach the SOTA on those tasks.

    @friesel - Let me ask the person who did this to maybe make a Colab and share.

    Re: problem.py
    Trax can (and does) consume T2T problems, so in a Trax gin config, just do inputs.dataset_name = 't2t_<whatever your problem name is>' and Trax should pick that up as the input.

    Afroz Mohiuddin
    @afrozenator
    @friesel - The things we don't have right now (but would like to) are decoding from the model (or do we have that, @lukaszkaiser?) and all the export/serve methods. Since these are JAX models, theoretically they can run on TF, and we can hopefully get these export/serve flows to work, but they aren't there yet.

    @nikitakit the first author can comment more here.

    johngrabner
    @johngrabner
    Can Trax be used for offline handwriting recognition? I have written a basic Keras data generator and a CNN-LSTM-CTC model (i.e. I'm a newbie) and it gets about 50% accuracy on my 50K-word ancient text sample. Excellent for grouping unlabeled words, but not good enough for transcription. I started reading about attention to improve accuracy and stumbled on Trax. So is Trax suitable for transcription?
    Afroz Mohiuddin
    @afrozenator

    @johngrabner - There aren't any examples of image-to-text right now (that is what your task is, right? We have image-to-label using Reformer/Transformer), but one way to proceed would be to add your dataset to TFDS (this should be easy; it has very nice documentation) and then pose it as a text generation problem with the image as input.

    But Trax is a library of deep learning models that lets you do these kinds of things.
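    If it helps, a bare-bones TFDS builder for that kind of image-to-text data might look roughly like this (the dataset name, fields, paths, and the read_my_index helper are all hypothetical; see the TFDS documentation for the full API):

    import tensorflow_datasets as tfds

    class HandwrittenWords(tfds.core.GeneratorBasedBuilder):
      """Hypothetical dataset of word images paired with their transcriptions."""
      VERSION = tfds.core.Version('1.0.0')

      def _info(self):
        return tfds.core.DatasetInfo(
            builder=self,
            description='Word images with transcriptions.',
            features=tfds.features.FeaturesDict({
                'image': tfds.features.Image(shape=(None, None, 1)),  # placeholder shape
                'text': tfds.features.Text(),
            }),
            supervised_keys=('image', 'text'),
        )

      def _split_generators(self, dl_manager):
        return [tfds.core.SplitGenerator(
            name=tfds.Split.TRAIN, gen_kwargs={'path': '/path/to/train'})]

      def _generate_examples(self, path):
        # Yield (key, example) pairs; read_my_index is a stand-in for your own loader.
        for key, (image_path, transcription) in enumerate(read_my_index(path)):
          yield key, {'image': image_path, 'text': transcription}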

    johngrabner
    @johngrabner
    Adding my data to TFDS is the idea. I just need to wait for the prof who did the original transcribing to publish. By "pose it as a text generation problem with the image as input" you mean pose it to the community using the data entered into TFDS?
    Are there any models of image+label to label? (i.e. one letter at a time)
    Afroz Mohiuddin
    @afrozenator

    By "pose it as a text generation problem with the image as input" you mean pose it to the community using the data entered into TFDS?

    No, I meant how do you want to solve it

    I just meant to clarify what the input and output were
    johngrabner
    @johngrabner
    Input is an image of shape (1024, 128, 1); output is of shape (128, 72) = (max string length, alphabet size).
    Lukasz Kaiser
    @lukaszkaiser
    @friesel : as Afroz says, Trax can use T2T Problem instances directly. Just set dataset_name in the gin config. Maybe start with some existing config that uses T2T data, like https://github.com/google/trax/blob/master/trax/configs/transformer_lm1b_8gb_testing.gin for LM or https://github.com/google/trax/blob/master/trax/configs/transformer_wmt_ende_16gb_adafactor_testing.gin for translation. Then, in the config file, import your Problem class and just change the inputs line.
    Let us know if this helps please, we really need to clarify how to run on T2T stuff!
    For decoding, just do as in the intro colab (last cell does inference): https://github.com/google/trax/blob/master/trax/intro.ipynb
    Lukasz Kaiser
    @lukaszkaiser
    @dimeldo : yes - usually setting n_hashes to 8 suffices for Reformer to match Transformer (see the Reformer paper for details). We often run with 4 or even 2 hashes, as it's faster and sufficient for many problems. Reversibility (without LSH attention) has matched the standard Transformer every time we've tried.
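    For example, as a gin override (the exact configurable name for the LSH attention layer depends on the Trax version, so treat this as a sketch and check the Reformer configs under trax/configs for the real binding):

    import gin
    import trax  # registers the attention layers as gin configurables

    gin.parse_config("""
    # Fewer hash rounds is faster; 8 usually matches full attention, 2-4 is often enough.
    LSHCausalAttention.n_hashes = 4
    """)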
    Phil Wang
    @lucidrains
    Hi! Thank you for your work on Transformers and Reformers. I had some questions about the default hyperparameters set for Reformer. Going through the code, I noticed that the rotations for LSH can be sampled either randomly or based on the data (_data_rotation_farthest). Although it was not mentioned in the paper, I was wondering if choosing either of the two makes a difference.
    I also had the same question for self._allow_duplicate_attention and hard_k.
    The de-duplicating attention logic is especially memory-hungry in a port I am writing for PyTorch, so I am wondering how much of it really matters for learning, final accuracy, etc.
    Afroz Mohiuddin
    @afrozenator
    Hi @lucidrains - We tried _data_rotation_farthest after submitting the paper; check out the gin configs under trax/configs -- some of the Reformer configs use _data_rotation_farthest (enwik8 doesn't, since _data_rotation_farthest works slightly worse than random there for some reason; on the other tasks it works better than random). Maybe @lukaszkaiser and @nikitakit can tell you more about the other settings.
    Phil Wang
    @lucidrains
    @afrozenator thank you Afroz, for sharing your results on _data_rotation_farthest
    nkitaev
    @nkitaev
    Hi all, Nikita here (I think I'm signed in to my other github account at the moment)
    Data-dependent hashing (e.g. _data_rotation_farthest) seems to help a bit for imagenet64 but it performs a bit worse on enwik8. There are also some open questions about how one would sample from such a model, because inference starts with an empty sequence and no way to initialize the data-dependent hash. I would say this option is still in the research/experimental phase.
    nkitaev
    @nkitaev
    Restricting attention to the top-k values (hard_k) never yielded any demonstrable benefits in our experiments. At the start of the project we had this idea that if one could identify the top-k attention targets for each query, then doing sparse attention within the top-k only would be faster than dense attention. The problem is that modern hardware is designed for dense computation to such a large degree that the sparse version usually ends up being slower (sometimes substantially slower).
    We kept de-duplication enabled for the paper because it matches the motivation and mathematical derivations that we present, but I have no evidence that it actually makes a difference for accuracy. These days I tend to turn it off because it slows down training. Same thing for the option that restricts attention across adjacent buckets.
    Phil Wang
    @lucidrains
    thank you Nikita. I will turn off those settings and keep a watchful eye for any hard figures in the final paper
    thanks again for all your hard work
    Pranav Mahajan
    @PranavMahajan25

    Hi, I am trying to get acquainted with Trax through the quickstart notebooks and wanted to use trax/rl. How do I use ppo_trainer with a custom gym env? I looked at trax/rl/ppo_trainer_test.py for reference.
    (Issues similar to this: https://colab.research.google.com/drive/1TnYMIt7Zm-iCN-Az3jeO8QoIpQ7YvHiD)

    Also, I eventually want to build a simple DQN which can use transformer_decoder as the model. How should I go about it? Does the transformer always expect inputs as in trax/rl/supervised/inputs.py? How do I include both states and actions in the training stream? Any guidance/resources would be very helpful, TIA.

    Afroz Mohiuddin
    @afrozenator

    Hi @PranavMahajan25 - thanks for trying it out! We'd be very interested in taking the colab as an example of RL once it works.

    I ran the colab and it doesn't error out for me! So it looks like you got it working (maybe don't use the same object with a different net, since it looks like there may have been some caching of weights?)

    PS: I like the idea of starting with the test and modifying it in place to get what you want :)

    Re: states and actions both in the training stream - @koz4k added some code for doing similar things in the rl/ directory, mostly related to simple.py. That part of the code is under heavy development; we want to try something similar as well.

    PPS: The colab uses trax 1.0.0, maybe upgrade to the latest 1.2.2?

    Suraj Patil
    @patil-suraj
    Hello, is it possible to use Reformer for a question answering task? The input could be a whole chapter of a book. Is Reformer suited to this kind of task?
    Pranav Mahajan
    @PranavMahajan25
    Thanks for your reply! @afrozenator. I would love to contribute to such an example, if it works out well.
    You were right, the error was because I used the same object with a different net. I'll explore the functions related to training streams and mixing streams from SimPLe. Thanks again!
    Afroz Mohiuddin
    @afrozenator
    Hi @patil-suraj - you could probably concatenate <document text>, <query> and <answer> with a special token, and mask the loss to only consider the answer tokens. So ultimately the target would look like <document text><sep1><query><sep2><answer>, with the loss mask set to only operate on the answer tokens. Does that make sense? @nkitaev is making a Reformer encoder, and things will look like seq2seq with that.
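    A small sketch of that layout, in the same spirit as the summarization example above (the separator ids and helper are made up for illustration):

    import numpy as np

    SEP1, SEP2 = 2, 3  # hypothetical separator token ids

    def make_qa_example(doc_ids, query_ids, answer_ids):
      """Target is <doc><sep1><query><sep2><answer>; loss mask is 1 only on answer tokens."""
      tokens = np.concatenate([doc_ids, [SEP1], query_ids, [SEP2], answer_ids]).astype(np.int32)
      mask = np.concatenate([np.zeros(len(doc_ids) + len(query_ids) + 2),
                             np.ones(len(answer_ids))])
      return tokens, mask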