chiting765
@chiting765
It is a pretty narrow domain. The vocabulary is not very small: about 40K-50K tokens in total without BPE,
and the BLEU score is for the whole test data
zeng
@xjtu-zeng
Hi everyone. I have a question about the StdRNNDecoder: why can the rnn and attn be separated? The next hidden state needs the context computed by attn. I am confused
@jsenellart
Vincent Nguyen
@vince62s
Just in case one did not notice: there is a huge performance difference between CUDA 8.0.61 and 8.0.61 patch 2, in favor of the latter (I saw about a 25% difference)
Jean Senellart
@jsenellart
Registration for the first OpenNMT workshop is open! Check here: http://workshop-paris-2018.opennmt.net :)...
ykasimov
@ykasimov
Hi. There is no support for copy attention yet in Python version, right?
Konstantin Glushak
@gsoul
Which of the Python versions did you mean?
ykasimov
@ykasimov
do you mean the Python version?
Konstantin Glushak
@gsoul
no, OpenNMT version: OpenNMT-py or OpenNMT-tf?
ykasimov
@ykasimov
ah, sorry. OpenNMT-py. I forgot that there is a tf version
Konstantin Glushak
@gsoul
I’m not sure, but perhaps it’s better to ask this question in the OpenNMT-py channel? https://gitter.im/OpenNMT/OpenNMT-py
ykasimov
@ykasimov
Thanks.
Konstantin Glushak
@gsoul
np
Ratish Puduppully
@ratishsp

In GlobalAttention.lua, we have the following lines of code:

local softmaxAttn = nn.SoftMax()
softmaxAttn.name = 'softmaxAttn'

Why don't we set softmaxAttn as an output of nn.gModule, like return nn.gModule(inputs, {contextOutput, softmaxAttn(attn)})?

Jean Senellart
@jsenellart
what for?
it is not used later
but we name it, so that we can find it by traversing the graph
Ratish Puduppully
@ratishsp
Ok. I was trying to understand the design difference between the two: when should we set it as an output of nn.gModule and when should we not.
Jean Senellart
@jsenellart
the gModule is very powerful but also very complicated - you cannot easily tweak it
Ratish Puduppully
@ratishsp
Ok.
I guess if it is an output of nn.gModule, then we have to manage its backpropagation gradients too.
Jean Senellart
@jsenellart
yes exactly - for the attention, we are just accessing the state for visualization
Ratish Puduppully
@ratishsp
Thanks @jsenellart for the details.
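For reference, a minimal nngraph sketch of the pattern discussed above, with illustrative sizes and names (this is not the actual OpenNMT code): the softmax is named when the graph is built, retrieved later by traversing the graph, and its output read directly for visualization, without being a gModule output.

```lua
require 'nngraph'

-- A tiny graph with a named softmax (sizes and names are illustrative).
local input = nn.Identity()()
local scores = nn.Linear(4, 3)(input)
local softmaxAttn = nn.SoftMax()
softmaxAttn.name = 'softmaxAttn'
local attn = softmaxAttn(scores)
local output = nn.Linear(3, 2)(attn)
local net = nn.gModule({input}, {output})

-- Find the named module by traversing the graph.
local found
net:apply(function(m)
  if m.name == 'softmaxAttn' then found = m end
end)

-- Read its output for visualization; since it is not a gModule output,
-- no extra gradient bookkeeping is needed.
net:forward(torch.randn(1, 4))
print(found.output)  -- the softmax (attention) weights
```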
sathiyan7987
@sathiyan7987
Guys, I need an AI-related project for my final project. Can you please help me?
stribizhev
@stribizhev
Hi, I am using the OpenNMT 0.9 Lua version on an AWS server with a Tesla K80 GPU and Ubuntu 16.04. I ran two trainings in the background and they seemed to be working fine (I used the -log_file option and checked that the log was growing), but once I ran a release model command, I first got a segmentation fault (core dump) message, and when I ran it a second time, the two trainings I mentioned exited. The error log says: THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-1460/cutorch/lib/THC/generic/THCStorage.c line=32 error=39 : uncorrectable ECC error encountered /torch/install/bin/luajit: cuda runtime error (39) : uncorrectable ECC error encountered at /tmp/luarocks_cutorch-scm-1-1460/cutorch/lib/THC/generic/THCStorage.c:32. Do I have to be cautious with GPU usage? Is there any documentation or known way to handle multiple processes using one GPU?
stribizhev
@stribizhev
Ok, I found the http://opennmt.net/OpenNMT/issues/ page, and it seems I can try disabling the caching CUDA memory allocator, but won't that make trainings last much longer (as on CPU)? As for reducing the network size: if my corpus is 1,760K segments, aren't the default values OK (word embedding size: 500; structure: cell = LSTM, layers = 2, rnn_size = 500, dropout = 0.3)?
Ayushi Dalmia
@ayushidalmia
What does input_feed = true do during training? I am not able to understand the architecture diagram when it is set to true.
Ayushi Dalmia
@ayushidalmia
Also, if my encoder_type is "rnn", does the value of the following parameter matter:
brnn_merge
Guillaume Klein
@guillaumekln
With input feeding, the output of the attention layer at t is concatenated to the word embedding input at t + 1. cf. https://arxiv.org/abs/1508.04025
The brnn_merge option only applies to bidirectional layers, which the rnn encoder type does not have.
@stribizhev Disabling the CUDA caching memory allocator will have a small impact on training time, at most 10% I would say (I did not measure it though). Regarding the network size, people frequently add these options: -encoder_type brnn -layers 4 -rnn_size 1000.
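For reference on the input feeding point above, a minimal Torch sketch of the concatenation it describes, with illustrative sizes and tensor names (this is not the actual OpenNMT code path):

```lua
require 'nn'

-- Illustrative sizes matching the default word_vec_size and rnn_size.
local batchSize, embSize, rnnSize = 2, 500, 500

-- Output of the attention layer at time step t.
local attnOutput = torch.randn(batchSize, rnnSize)

-- Word embedding of the target token fed at time step t + 1.
local emb = torch.randn(batchSize, embSize)

-- With input_feed = true, the decoder RNN input at t + 1 is the
-- concatenation of the embedding and the previous attention output.
local decoderInput = torch.cat(emb, attnOutput, 2)
print(decoderInput:size())  -- batchSize x (embSize + rnnSize)
```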
Ayushi Dalmia
@ayushidalmia
Thanks!
stribizhev
@stribizhev
@guillaumekln Hi Guillaume, thank you for the feedback. I kept the CUDA caching allocator enabled and the training completed in 1.5 days. The frequently used options are used frequently, I think, because they are the ones from the WMT16 training tutorial. The main problem for me with the network settings is that I can't find any guidelines on how to choose the sizes; I see that people mostly rely on gut feeling here. http://forum.opennmt.net/t/issues-when-running-the-english-german-wmt15-training/228/19?u=wiktor.stribizew and http://forum.opennmt.net/t/how-should-i-choose-parameters/994 show that it is not so evident. I used -layers 4 -rnn_size 1000 -encoder_type brnn -word_vec_size 600 for the next training (EN-JA) with 4M segments, and Epoch 1 has been training for ~2 days. Maybe that is OK, but it also takes a lot of GPU memory and constantly crashes with "out of memory" errors.
stribizhev
@stribizhev
Ah, as for the out of memory errors, it turned out that after the nohup background process crashed, the /torch/install/bin/luajit process for that training was still running. Strange.
Ayushi Dalmia
@ayushidalmia
@guillaumekln I have a torch model trained using OpenNMT. Can I load it in OpenNMT tensorflow?
Guillaume Klein
@guillaumekln
No, the two projects have very different internals.
Ayushi Dalmia
@ayushidalmia
Ok. Thanks @guillaumekln.
Jacker
@jackeri
Any ideas why I am getting
error loading module 'tools.restserver.restserver' from file 'tools/restserver/restserver': cannot read tools/restserver/restserver: Is a directory
when I try to start up the REST server?
Guillaume Klein
@guillaumekln
Hello! Which version of the code are you using?
SantoshSanas
@SantoshSanas
What will happen if I do not use tokenization in OpenNMT?
Jean Senellart
@jsenellart
Hi Santosh - nothing bad! You can use external tokenization, but you do need to have detokenization too
SantoshSanas
@SantoshSanas
Thanks Jean!!!
Shivani Poddar
@shivanipoddariiith
Hi, I was hoping to add context vectors for each source sentence to condition the decoder output in the baseline encoder-decoder architecture of ONMT. The task of integrating these into the TextDataset seems non-trivial. Does anyone have ideas on a simplified design choice to achieve this?
Thanks a lot
fgaim
@fgaim
@guillaumekln Regarding target-side word features (factored MT): the docs for OpenNMT (Lua) say the decoder predicts the features of the decoded sentence. So, is the prediction loss being backpropagated along with the translation loss, and hence eventually improving both the translation and feature prediction tasks? Or does the translation in any way benefit from target-side features in the current implementation? Thanks!
Guillaume Klein
@guillaumekln

is the prediction loss being backpropagated along with translation loss, and hence eventually improve both the translation and feature prediction tasks?

Yes, the losses are summed together before backpropagation.
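For reference, a minimal Torch sketch of the summed losses, with illustrative criteria and shapes (not the actual OpenNMT implementation); because the total loss is a sum, a single backward pass propagates gradients from both objectives:

```lua
require 'nn'

-- Illustrative criteria: word generation loss and feature prediction loss.
local wordCrit = nn.ClassNLLCriterion()
local featCrit = nn.ClassNLLCriterion()

-- Log-probabilities for 3 decoded tokens: a 10-word vocabulary and a
-- feature with 4 possible values.
local wordScores = nn.LogSoftMax():forward(torch.randn(3, 10))
local featScores = nn.LogSoftMax():forward(torch.randn(3, 4))
local wordTargets = torch.LongTensor{1, 5, 2}
local featTargets = torch.LongTensor{2, 2, 1}

-- The two losses are summed, so backpropagating the total updates both
-- the translation and the feature prediction tasks at once.
local loss = wordCrit:forward(wordScores, wordTargets)
           + featCrit:forward(featScores, featTargets)
print(loss)
```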

stribizhev
@stribizhev
@guillaumekln @jsenellart Is there a way to combine two BPE models? The reason is that I have a generic model with its BPE model; I then train an incremental model on new corpora, for which I create a new BPE model. When running the incremental training, I use -update_vocab merge. If the vocabulary is merged, I guess I should also merge the BPE models to use when translating, right?
Vincent Nguyen
@vince62s
Don't create a new BPE model; just tokenize with the first one.
Ayushi Dalmia
@ayushidalmia
Hi, is there a way to generate the attention map for a given Torch model?