    Paul Tardy
    @pltrdy
    which, to my mind, are related to the "pointer" thing
    srush
    @srush
    I see, I will have to look closer
    Paul Tardy
    @pltrdy
    src: Nallapati, 2016
    (+) Nallapati is using LVT (the large-vocabulary trick), not sure if it's in ONMT atm
    srush
    @srush
    this is the dataset though not the model
    anyway we can use See's preprocessing now
    I forked a version here
    Paul Tardy
    @pltrdy
    It sounds ambiguous to me. Not sure how the "@entity" things are produced
    srush
    @srush
    either way, See ignores this
    "Both the dataset's published results (Nallapati et al., 2016, 2017) use the anonymized version of the data, which has been pre-processed to replace each named entity, e.g., The United Nations, with its own unique identifier for the example pair, e.g., @entity5. By contrast, we operate directly on the original text (or non-anonymized version of the data), which we believe is the favorable problem to solve because it requires no pre-processing."
    She also has a simple seq-to-seq + attention baseline (50k vocab)
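    For reference, a rough sketch of what that per-example anonymization looks like (illustrative only, not Nallapati's actual pipeline; the helper name is made up):

```python
def anonymize(text, entities):
    """Replace each named entity with a per-example @entityN id (illustrative)."""
    mapping = {ent: "@entity{}".format(i) for i, ent in enumerate(entities)}
    for ent, eid in mapping.items():
        text = text.replace(ent, eid)
    return text, mapping

anon, mapping = anonymize("The United Nations met in New York.",
                          ["The United Nations", "New York"])
# anon -> "@entity0 met in @entity1."
```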
    Paul Tardy
    @pltrdy
    Sure.
    Plus, her work is clearly explained, comes with discussions + blog post
    and implementation
    srush
    @srush
    okay, cool. so I am going to start a run with her params (but SGD training)
    srush
    @srush
    interesting they truncate source sentences (we should have that as an option)
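    (A sketch of what such an option could look like; See et al. truncate articles to 400 tokens, but the helper and limit here are illustrative, not an existing OpenNMT option:)

```python
def truncate_source(tokens, max_len=400):
    """Keep only the first max_len source tokens, as See et al. do with articles."""
    return tokens[:max_len]

article_tokens = open("article.txt").read().split()
article_tokens = truncate_source(article_tokens, max_len=400)
```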
    Paul Tardy
    @pltrdy
    did you get interesting results? I must say I'm puzzled
    srush
    @srush
    it's training now
    why are you puzzled?
    I'm training the 50k baseline, then I'll implement her features in opennmt-py
    Paul Tardy
    @pltrdy
    For some reason her training script does not use the GPU. It is recognized and the process allocates memory on it, but there is no load. Minor problem though.
    srush
    @srush
    oh I see, that is annoying
    tensorflow seems to magically know to use the gpu
    Paul Tardy
    @pltrdy
    In theory yes, it is straightforward. It looks like the device is hardcoded instead, which to my mind is against TensorFlow's principles (at least add a kind of -gpu flag, or just let the user set CUDA_VISIBLE_DEVICES)
    Strange. But solved.
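    (Something like the flag Paul suggests, sketched out; the option name is hypothetical and this is not abisee's actual script:)

```python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--gpu", default=None,
                    help="GPU id to use, e.g. '0'; by default CUDA_VISIBLE_DEVICES is respected")
args = parser.parse_args()

if args.gpu is not None:
    # Restrict the process to the requested device before any GPU initialization,
    # instead of hardcoding the device in the graph.
    os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
```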
    srush
    @srush
    okay, I'll send over my results this afternoon after 15 epochs
    srush
    @srush
    do they report perp?
    I'm down to ~16
    Paul Tardy
    @pltrdy
    you're talking about opennmt-py training?
    srush
    @srush
    yes
    Paul Tardy
    @pltrdy
    abisee's script doesn't log epochs or ppl
    srush
    @srush
    that's weird
    okay, well I'll just check ROUGE
    Paul Tardy
    @pltrdy
    the script just loops over the data infinitely and outputs the loss
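    (If the script only prints a loss, a comparable perplexity can still be recovered, assuming the logged loss is the mean per-token negative log-likelihood in nats; a minimal sketch:)

```python
import math

def perplexity(mean_nll):
    """ppl = exp(mean per-token cross-entropy), assuming the loss is in nats."""
    return math.exp(mean_nll)

print(perplexity(2.77))  # ~16, in the range srush mentions
```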
    srush
    @srush
    research code
    Paul Tardy
    @pltrdy
    Is your model using coverage attention?
    srush
    @srush
    no, I'm just doing the 50k baseline
    Paul Tardy
    @pltrdy
    ok
    srush
    @srush
    but I plan to implement all that stuff
    Paul Tardy
    @pltrdy
    great
    is the dev already on github?
    srush
    @srush
    I added some features, but I haven't started on coverage attention yet
    I want to do copying first
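    (A minimal sketch of the copy step in the spirit of See's pointer-generator, not the eventual OpenNMT-py implementation; names and shapes are illustrative:)

```python
import torch

def copy_distribution(p_gen, vocab_dist, attn_dist, src_ids, extended_vocab_size):
    """Mix the vocabulary softmax with the attention over source tokens.

    p_gen:      (batch,)          probability of generating from the fixed vocab
    vocab_dist: (batch, vocab)    softmax over the fixed vocabulary
    attn_dist:  (batch, src_len)  attention over source positions
    src_ids:    (batch, src_len)  source token ids in the extended vocabulary
    """
    out = torch.zeros(vocab_dist.size(0), extended_vocab_size)
    out[:, :vocab_dist.size(1)] = p_gen.unsqueeze(1) * vocab_dist
    # Copy mass goes onto the ids of the source tokens being attended to.
    out.scatter_add_(1, src_ids, (1.0 - p_gen).unsqueeze(1) * attn_dist)
    return out
```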
    srush
    @srush
    my outputs are pretty terrible at the moment
    Paul Tardy
    @pltrdy
    hey @srush , have you got interesting results?
    srush
    @srush
    yes, let's see
    we were able to replicate the baseline results (we think)
    and I have copying mostly implemented
    coverage should not be so bad
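    (And a rough sketch of the coverage part, per See et al.: keep a running sum of past attention and penalise re-attending to already-covered source positions; again illustrative, not the final code:)

```python
import torch

def coverage_step(attn_dist, coverage):
    """attn_dist, coverage: (batch, src_len). Returns (step coverage loss, new coverage)."""
    step_loss = torch.min(attn_dist, coverage).sum(dim=1)
    new_coverage = coverage + attn_dist
    return step_loss, new_coverage
```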