It sounds ambiguous to me. Not sure how the "@entity" things are produced
srush
@srush
either way, See ignores this
Both the dataset’s published results (Nallapati et al., 2016, 2017) use the anonymized version of the data, which has been pre-processed to replace each named entity, e.g., The United Nations, with its own unique identifier for the example pair, e.g., @entity5. By contrast, we operate directly on the original text (or non-anonymized version of the data), which we believe is the favorable problem to solve because it requires no pre-processing.
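for context, the anonymized preprocessing looks roughly like this per-example replacement (just an illustrative sketch, not the actual DeepMind pipeline; the ids and mapping format here are mine):
```python
# Illustrative sketch of entity anonymization (not the real preprocessing script).
# Each example pair gets its own entity -> id mapping, so "@entity5" in one
# article refers to a different string than "@entity5" in another.
def anonymize(text, entities):
    """Replace each named-entity string with a per-example @entityN id."""
    mapping = {}
    for i, ent in enumerate(entities):
        mapping[ent] = "@entity{}".format(i)
        text = text.replace(ent, mapping[ent])
    return text, mapping

article, mapping = anonymize(
    "The United Nations met in Geneva.",
    ["The United Nations", "Geneva"],
)
# article -> "@entity0 met in @entity1."
# mapping -> {"The United Nations": "@entity0", "Geneva": "@entity1"}
```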
She also has a simple seq-to-seq + attn baseline (50k vocab)
Paul Tardy
@pltrdy
Sure.
Plus, her work is clearly explained, comes with discussions + blog post
and implementation
srush
@srush
okay, cool. so I am going to start a run with her params (but sgd training)
srush
@srush
interesting they truncate source sentences (we should have that as an option)
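the option could be as simple as hard-truncating the source to a fixed token budget before building examples (sketch below; the helper name is mine, and I believe her setup truncates articles to roughly 400 tokens):
```python
# Sketch of a source-truncation option: keep only the first max_len
# whitespace tokens of each article before training.
def truncate_source(tokens, max_len=400):
    """Keep only the first max_len tokens of the source article."""
    return tokens[:max_len]

src = "the united nations met in geneva on tuesday ...".split()
src = truncate_source(src, max_len=400)
```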
Paul Tardy
@pltrdy
did you get interesting results? I must say I'm puzzled
srush
@srush
it's training now
why are you puzzled?
I'm training the 50k baseline then implemented her features in opennmt-py
Paul Tardy
@pltrdy
For some reason her training script does not use the GPU. It is recognized, and the process allocates memory on it, but there's no load. Minor problem though
srush
@srush
oh I see, that is annoying
tensorflow seems to magically know to use the gpu
Paul Tardy
@pltrdy
In theory yes, it is straightforward. It looks like the device is instead hardcoded, which to my mind is against tensorflow's principles (at least add a kind of -gpu flag, or even let the user set their CUDA_VISIBLE_DEVICES)
Strange. But solved.
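in case anyone else hits this: the usual TensorFlow-friendly pattern is to leave placement soft and let the user pick GPUs via CUDA_VISIBLE_DEVICES instead of pinning a device (a minimal TF 1.x-style sketch, not her actual script):
```python
import os
import tensorflow as tf

# Let the user decide which GPU(s) are visible instead of hardcoding a device.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

# Soft placement falls back to CPU for ops without a GPU kernel instead of
# erroring out, so no explicit tf.device(...) pinning is needed.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
```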
srush
@srush
okay, I'll send over my results this afternoon after 15 epochs
srush
@srush
do they report perp?
I'm down to ~16
Paul Tardy
@pltrdy
you're talking about opennmt-py training?
srush
@srush
yes
Paul Tardy
@pltrdy
abisee's script does not log epochs nor ppl
srush
@srush
that's weird
okay, well I'll just check rouge
Paul Tardy
@pltrdy
the script is just looping over the data infinitely and outputting the loss
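for comparing against the opennmt-py numbers, the printed loss can be turned into a perplexity directly, assuming it's the average negative log-likelihood per target token:
```python
import math

def perplexity(avg_nll_per_token):
    """ppl = exp(mean negative log-likelihood per target token)."""
    return math.exp(avg_nll_per_token)

# e.g. an average loss of ~2.77 nats/token corresponds to ppl of about 16
print(perplexity(2.77))  # ~15.96
```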
srush
@srush
research code
Paul Tardy
@pltrdy
Is your model using coverage attention?
srush
@srush
no, I'm just doing the 50k baseline
Paul Tardy
@pltrdy
ok
srush
@srush
but I plan to implement all that stuff
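for reference when I get to it, the coverage mechanism in See et al. (2017) keeps a running sum of past attention distributions (which also feeds back into the attention scores) and penalizes re-attending; a minimal PyTorch-style sketch of the bookkeeping and loss, with names that are mine rather than opennmt-py's:
```python
import torch

def coverage_step(attn, coverage):
    """One decoding step of the coverage bookkeeping from See et al. (2017).

    attn:     (batch, src_len) attention distribution at this step
    coverage: (batch, src_len) sum of attention distributions from prior steps
    """
    # Coverage loss penalizes attending again to already-covered source tokens.
    cov_loss = torch.sum(torch.min(attn, coverage), dim=1)
    # Update the coverage vector for the next step.
    new_coverage = coverage + attn
    return cov_loss, new_coverage
```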
Paul Tardy
@pltrdy
great
is the dev already on github?
srush
@srush
I added some features, but I haven't started on coverage attention yet
I want to do copying first
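the copy part mixes the vocab softmax with the attention distribution over source tokens via a generation probability p_gen; a rough sketch of the final distribution from See et al., with extended-vocab OOV handling omitted and the tensor names being mine:
```python
import torch

def final_distribution(p_vocab, attn, p_gen, src_map):
    """Simplified pointer-generator mixture from See et al. (2017).

    p_vocab: (batch, vocab)    generator softmax over the fixed vocab
    attn:    (batch, src_len)  attention over source positions
    p_gen:   (batch, 1)        probability of generating vs copying
    src_map: (batch, src_len)  vocab id (LongTensor) of each source token
    """
    copy_dist = torch.zeros_like(p_vocab)
    # Scatter the copy probability mass onto the vocab ids of the source tokens.
    copy_dist.scatter_add_(1, src_map, (1.0 - p_gen) * attn)
    return p_gen * p_vocab + copy_dist
```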
srush
@srush
my outputs are pretty terrible at the moment
Paul Tardy
@pltrdy
hey @srush , have you got interesting results?
srush
@srush
yes, let's see
we were able to replicate the baseline results (we think)