    Vincent Nguyen
    @vince62s
    oh "not working" as "not working at all" ?
    srush
    @srush
    yes, that's what it means
    Vincent Nguyen
    @vince62s
    ok ok
    srush
    @srush
    haha, sorry, just give us a bit of patience
    it is non-trivial to get this working, and it seems to be quite sensitive to small details
    but if you want to try it out, the link I posted works on the parsing data
    it is a tad faster than RNN, but uses 3-4x more memory
    Vincent Nguyen
    @vince62s
    I'll wait, got lots of other stuff to train :)
    Yu-Hsiang Huang
    @jadore801120
    Hi @srush, you are right. There seems to be no need for convolution.
    And thanks for the PTB data!
    Vincent Nguyen
    @vince62s
    Sasha, any good results from your training?
    srush
    @srush
    it's still a bit behind the RNN on summarization
    but I stopped tweaking, trying to get my code checked in
    Vincent Nguyen
    @vince62s
    Sasha, do you have any good results on the PyTorch transformer model?
    srush
    @srush
    yeah... we are starting to get some
    it seems to get worse ppl, but produces relatively good results
    Vincent Nguyen
    @vince62s
    @srush is the py-onmt transformer "working"?
    srush
    @srush
    yes, but not yet documented
    Vincent Nguyen
    @vince62s
    did you run some known baselines?
    srush
    @srush
    yeah, we have been running on parsing
    (and some internal stuff)
    Vincent Nguyen
    @vince62s
    any WMT runs like in the paper?
    srush
    @srush
    I haven't yet; if you have the data though, happy to run it
    honestly I was just too lazy to get that set up.
    Vincent Nguyen
    @vince62s
    I sent you the link. This is the official WMT16 data (cleaned) + Rico's back-translations, so this is only for running DE to EN. With Lua onmt, I get 38.28 on newstest2016.
    srush
    @srush
    but this is different from the data they ran on, right?
    Vincent Nguyen
    @vince62s
    "they" ? this is the same as Rico's run and the other guy vhoang
    srush
    @srush
    in the Attention is all you need paper
    Vincent Nguyen
    @vince62s
    oh no
    then don't take the back-translations and it's OK
    srush
    @srush
    okay, I'll give it a try
    Vincent Nguyen
    @vince62s
    but then do it the other way, EN to DE
    I am running the TF version on EN to DE
    srush
    @srush
    oh great, we can compare
    Vincent Nguyen
    @vince62s
    but there are some weird things regarding the learning rate
    srush
    @srush
    yeah, it's quite strange
    but I implemented theirs exactly; if you tell me the hyperparams you use, we can compare
    Vincent Nguyen
    @vince62s
    the "noam" decay , you took the paper version I saw
    srush
    @srush
    yeah
    Vincent Nguyen
    @vince62s
    in the TF version, it is multiplied by 5000, go figure
    I asked why
    srush
    @srush
    it's even more complicated than that
    the default is to multiply by 5000, but they also have a problem-specific multiplier
    Vincent Nguyen
    @vince62s
    do you recall where you saw that?
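(For reference, a minimal sketch of the "noam" schedule as described in the "Attention is All You Need" paper. The `base_factor` argument is only a hypothetical stand-in for the extra constant multiplier the TF code is said to apply on top of the paper formula; it is not an actual option of either implementation discussed above.)

```python
# Minimal sketch of the "noam" learning-rate schedule from the paper:
#   lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
# base_factor is a hypothetical stand-in for the extra constant multiplier
# mentioned in the discussion above (e.g. 5000 in the TF defaults).
def noam_lr(step, d_model=512, warmup_steps=4000, base_factor=1.0):
    step = max(step, 1)  # avoid 0^-0.5 at the very first step
    return base_factor * d_model ** -0.5 * min(step ** -0.5,
                                               step * warmup_steps ** -1.5)

# e.g. print the rate at a few points of training
for s in (100, 4000, 20000):
    print(s, noam_lr(s))
```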
    Vincent Nguyen
    @vince62s
    Also, I don't know if you handle batches in number of sentences (as onmt does) or in number of tokens, but with the TF transformer I observe the opposite of onmt: the bigger the batch size, the better the results.
    srush
    @srush
    we do sentences but we can switch, do they do tokens?
    Vincent Nguyen
    @vince62s
    yes they do.
    srush
    @srush
    I'll put that on the roadmap
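(A rough sketch of what "batching by tokens" could look like, assuming it means accumulating sentences until the padded batch would exceed a token budget; the names below are illustrative and not OpenNMT or tensor2tensor options.)

```python
# Rough sketch of batching by token count rather than sentence count:
# keep adding sentences until the padded batch would exceed max_tokens.
# `sentences` is any iterable of token lists; names are illustrative only.
def batch_by_tokens(sentences, max_tokens=4096):
    batch, longest = [], 0
    for sent in sentences:
        longest = max(longest, len(sent))
        # padded size = (#sentences) * (longest sentence in the batch)
        if batch and (len(batch) + 1) * longest > max_tokens:
            yield batch
            batch, longest = [], len(sent)
        batch.append(sent)
    if batch:
        yield batch
```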
    cemyr
    @cemyr_gitlab
    Hey guys... great to be here