    srush
    @srush
    since the input is bigger
    Mattia Di Gangi
    @mattiadg
    ok
    Have you tried without context gate?
    srush
    @srush
    tried what?
    Mattia Di Gangi
    @mattiadg
    tried to train a model without context gate and then translating with that model?
    srush
    @srush
    oh, yes
    Mattia Di Gangi
    @mattiadg
    ok, perfect
    srush
    @srush
    so for language modeling
    Mattia Di Gangi
    @mattiadg
    Yes, I also need to use language models
    srush
    @srush
    why is it necessary to modify anything?
    Mattia Di Gangi
    @mattiadg
    what do you mean?
    srush
    @srush
    couldn't we just add "null attention"
    and then add sampling to translate.py?
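
A minimal sketch of what "add sampling" might mean here; the function and everything in it are illustrative, not OpenNMT-py's actual translate.py code. The idea is to draw the next token from the softmax over the logits instead of taking the greedy or beam choice:

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=1.0):
    # turn the final-step logits into a distribution and draw from it,
    # instead of taking the argmax as greedy decoding would
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)

# example: a fake vocabulary of 10 types
next_id = sample_next_token(torch.randn(1, 10))
```
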
    Mattia Di Gangi
    @mattiadg
    well, yes, that's a language model
    srush
    @srush
    sure, but that's a lot of code changes
    would love to just keep it simple
    (sorry, have to run, be back shortly)
    Mattia Di Gangi
    @mattiadg
    ok, I also need to leave, so I'll write up my impressions and then come back
    srush
    @srush
    sounds great
    Mattia Di Gangi
    @mattiadg
    The language model can be a model with a decoder with null attention and no encoder. Most of the code to modify would be in the data management part, because it should take monolingual data and then create batches where the source side has the BOS symbol and the target side has the EOS symbol. But the target is needed only after the forward step, for computing the loss. I should check the new code, because I don't know how the new Batch class works, i.e. whether it can be easily adapted for that.
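
A minimal sketch of the batching Mattia describes, assuming plain PyTorch and hypothetical vocabulary ids (this is not OpenNMT-py's Batch class): the decoder input gets BOS prepended and the loss target gets EOS appended, so the target is consumed only after the forward pass.

```python
import torch

BOS, EOS, PAD = 2, 3, 1  # hypothetical vocabulary ids

def make_lm_batch(sentences):
    """sentences: list of lists of token ids from monolingual data."""
    max_len = max(len(s) for s in sentences) + 1  # +1 for BOS/EOS
    inp = torch.full((len(sentences), max_len), PAD, dtype=torch.long)
    tgt = torch.full((len(sentences), max_len), PAD, dtype=torch.long)
    for i, s in enumerate(sentences):
        inp[i, 0] = BOS                           # decoder input: BOS + tokens
        inp[i, 1:len(s) + 1] = torch.tensor(s)
        tgt[i, :len(s)] = torch.tensor(s)         # loss target: tokens + EOS
        tgt[i, len(s)] = EOS
    return inp, tgt  # tgt is needed only after the forward step, for the loss

# example: two short "sentences"
inp, tgt = make_lm_batch([[5, 6, 7], [8, 9]])
```
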
    srush
    @srush
    hmm
    so now we have "truncated_decoding"
    so I think you could just put all the data into maybe #batch-size examples
    with <eos> sprinkled throughout
    and then have just one symbol <blank> on the source side
    I don't think you would need to change any of the preprocess code
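
A rough sketch of that data layout, with hypothetical file names: pack the corpus into a fixed number of long target lines separated by <eos>, each paired with a single <blank> source token, so the unmodified preprocess.py sees an ordinary parallel corpus.

```python
N_LINES = 256  # roughly the batch size

with open("mono.txt") as f:
    sentences = [line.strip() for line in f if line.strip()]

# split the corpus contiguously so each line is a long run of real text
per_line = (len(sentences) + N_LINES - 1) // N_LINES
with open("train.tgt", "w") as tgt, open("train.src", "w") as src:
    for i in range(N_LINES):
        chunk = sentences[i * per_line:(i + 1) * per_line]
        if not chunk:
            continue
        tgt.write(" <eos> ".join(chunk) + "\n")  # <eos> sprinkled throughout
        src.write("<blank>\n")                   # one dummy source token
```
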
    Mattia Di Gangi
    @mattiadg
    sorry, I went for dinner
    I live in Europe, so it's quite late here
    Mattia Di Gangi
    @mattiadg
    Then, the main thing needed to add the language model would be to create another Model class, instead of NMTModel, that ignores the source side of the batch.
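
A hypothetical sketch of that alternative (names and signatures are illustrative, not actual OpenNMT-py code): a decoder-only module that accepts the batch's source side but never reads it.

```python
import torch.nn as nn

class LanguageModel(nn.Module):
    def __init__(self, decoder, generator):
        super().__init__()
        self.decoder = decoder      # a decoder with no encoder attention
        self.generator = generator  # projects hidden states to vocab logits

    def forward(self, src, tgt):
        # src is accepted only to keep the training loop's interface
        # unchanged; it is never used
        hidden, _ = self.decoder(tgt)
        return self.generator(hidden)
```
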
    srush
    @srush
    no, I don't think you even need to do that
    just have an option "no_attention" that skips the attention step
    and then just structure your data as 256 lines of multi-sentence target, each aligned with a single source word
    and set truncated_decoder to 100 or so
    that will replicate an LM pretty much exactly
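
A minimal sketch of what a no_attention option could look like, assuming a generic RNN decoder step (the class and flag are illustrative, not OpenNMT-py's real decoder): skipping the attention mix-in leaves the output conditioned on the target history alone, i.e. a language model.

```python
import torch.nn as nn

class RNNDecoderStep(nn.Module):
    """Illustrative decoder step with a hypothetical no_attention switch."""
    def __init__(self, rnn, attention, no_attention=False):
        super().__init__()
        self.rnn = rnn
        self.attention = attention
        self.no_attention = no_attention

    def forward(self, emb, state, memory_bank):
        output, state = self.rnn(emb, state)
        if not self.no_attention:
            # normal NMT path: mix the encoder states into the output
            output, _ = self.attention(output, memory_bank)
        # with no_attention=True the output depends only on the target
        # history, so the model behaves as a language model
        return output, state
```
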
    Mattia Di Gangi
    @mattiadg
    well, this way it shouldn't require many modifications
    but something in preprocess.py would need to change, at least relaxing the constraint that both -src_train and -tgt_train are required
    srush
    @srush
    why? that just adds complications
    Mattia Di Gangi
    @mattiadg
    What data would you pass? It needs two files with the same number of lines. Should we create another file with one word per line?
    srush
    @srush
    yeah, that was what I was suggesting above
    just the symbol <blank> on each line
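
For completeness, a quick check under the same hypothetical file names, since preprocess.py insists the two files have the same number of lines:

```python
# the dummy source must have exactly one <blank> line per target line
with open("train.tgt") as t, open("train.src") as s:
    assert sum(1 for _ in t) == sum(1 for _ in s)
```
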
    Mattia Di Gangi
    @mattiadg
    ok, it seems fine
    I think I can work on it by tomorrow
    srush
    @srush
    no hurry, just want to keep the code simple when possible
    Mattia Di Gangi
    @mattiadg
    ok perfect
    thank you for your time
    Mattia Di Gangi
    @mattiadg
    Hi, I still haven't implemented it, both because I am working toward some deadlines right now and because of the large number of issues coming in. I think this is not the right time to add features.
    srush
    @srush
    okay, sounds good
    sathiyan7987
    @sathiyan7987
    guys, I need an AI-related project for my final project, can you please help me?
    srush
    @srush
    build a robot