Siddarth Sampangi
@ssampang
and there's no equivalent module to change the view on my target
elbamos
@elbamos
@ssampang are you using a convert module? or failing to set the correct view on the target when you load the batch?
Siddarth Sampangi
@ssampang
@elbamos when you ask about the convert module, are you referring to the output_module parameter I can set? And I believe I'm setting the views correctly; everything runs if I don't use feedback. Furthermore, I found this line in confusion.lua:
local tgt = batch:targets():forward('b')
so I don't think the view I set is being used, right?
elbamos
@elbamos
no, I mean that in the dpnn package there's a module nn.Convert. It takes two parameters, which are view definitions, like 'bhwc', 'bcwh'. You put one as the first layer of your network with parameters (ds:ioShapes(), whatever view you assumed when you made your net)
I remember having similar problems with Confusion; I think perhaps I was one-hot encoding when it was expecting class IDs, or I was using class IDs when it was expecting one-hot? I remember it took about 45 minutes to fix
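What @elbamos describes might look like this; a minimal sketch, assuming a dp DataSource named `ds` and a network that internally assumes the 'bchw' view (the view string and layer sizes are illustrative, not from the original conversation):

```lua
-- Sketch only: put nn.Convert first so the batch view dp delivers is
-- coerced into the view the rest of the network assumes ('bchw' is an assumption).
require 'dp'
require 'dpnn'

local net = nn.Sequential()
net:add(nn.Convert(ds:ioShapes(), 'bchw'))    -- ds is your dp DataSource
net:add(nn.SpatialConvolution(3, 16, 3, 3))   -- example layer; sizes are made up
-- ... rest of the network as before
```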
Siddarth Sampangi
@ssampang
sorry, I'm familiar with the nn.Convert module, but I guess I don't understand where exactly you're telling me to add it
to the network itself?
elbamos
@elbamos
yes, at the start. And take a close look at the examples in dp that use Confusion - are they assuming one-hot, or class IDs?
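For concreteness, the two target encodings being contrasted look like this (a toy illustration with 3 classes and a batch of 2, not code taken from dp itself):

```lua
-- Class-ID targets: one integer label per example.
local classIds = torch.LongTensor{2, 3}

-- One-hot targets: the same two labels as indicator vectors.
local oneHot = torch.Tensor{{0, 1, 0},
                            {0, 0, 1}}
```

A module or criterion expecting one format will usually error out or silently misbehave when given the other, which matches the Confusion symptoms described above.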
Siddarth Sampangi
@ssampang
hmm, but how would changing the network input help with the Confusion? Isn't the confusion calculated at the very end, after the criterions? Also, the example I'm basing my code off of uses class IDs - the only difference is that it doesn't operate on sequences
elbamos
@elbamos
you're right, but on the other hand, dp is weird. I'm running through in my head all the things I had to fix when I first started using dp and got Confusion to work
cjviper
@cjviper
what batch sizes and image sizes do people on here use when training CNNs? I'm finding I have to use batch sizes of 32 or less with image sizes of 128x128 for a training set of around 140k images, otherwise I get out-of-memory errors. Hardware: Tesla K80 with 64GB RAM
Siddarth Sampangi
@ssampang
@cjviper is your network really large?
and are you running out of cpu memory or gpu memory?
I'm guessing GPU?
cjmcmurtrie
@cjmcmurtrie
Is the implementation of nn.SoftMaxTree() in nnx confirmed and known to be correct? I don't quite understand what's happening there. Firstly, nn.TreeNLLCriterion() does not return meaningful gradients (it distributes -1 along the batch size). This makes sense so long as the gradients do not influence nn.SoftMaxTree() gradients. However, I see that they do greatly influence what comes out of nn.SoftMaxTree()...
What's more, how does nn.SoftMaxTree() actually make a prediction? The only way to use it currently (as far as I can tell) is to pass in a table of targets. However, when you are not training (i.e. when you are evaluating or actually using the model to predict), you obviously do not have targets.
Is it possible to produce a toy example that uses nn.SoftMaxTree() with the following characteristics: (1) it is wrapped in a sequencer, (2) training improvements are demonstrated over several epochs with batches of sequences, and (3) evaluation is demonstrated by generating a sequence after training?
Such an example would go a long way toward confirming that the implementation works, and would not take long to write for someone who is familiar with the module
cjviper
@cjviper
@ssampang the network has 4 convolution layers (3x3 kernels) with 64 channels and 2x2 max pooling after each. Yes, I'm running on GPU.
elbamos
@elbamos
Can anyone point me toward a code example of using dp where the targets are a table of tensors instead of a single tensor?
Wesam Sakla
@Supersak80
Hi all. I need your help. I've run some experiments using the experiment wrapper in dp. Now, I want to load a saved experiment, run my test data tensors through the trained CNN model, and capture the outputs (features) of the last layer of the CNN, prior to the softmax. I would like to save these CNN feature vector tensors so that I can visualize their t-SNE embedding coordinates. Can anyone post some quick snippets of pseudo-code that will allow me to generate these feature vectors?
cjviper
@cjviper
@Supersak80 which part are you stuck on? Load the experiment with torch.load, get the model, load an image, forward it through the model with model:forward, then inspect the .output of the appropriate layer with model:get(n).output, where n is the layer number.
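A rough sketch of those steps; the save path, input size, and layer index `n` are all assumptions, so adapt them to your own experiment and architecture:

```lua
require 'dp'
require 'cunn'

-- Hypothetical path to the serialized dp experiment.
local xp = torch.load('results/experiment.dat')
local model = xp:model()                       -- the trained network
model:evaluate()                               -- disable dropout etc.

-- Stand-in for a real test image (batch of 1, 3x128x128 assumed).
local img = torch.rand(1, 3, 128, 128):cuda()
model:forward(img)

-- n is the assumed index of the layer just before the softmax.
local n = 8
local features = model:get(n).output:clone()   -- feature vectors to feed into t-SNE
```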
Sheetal Giri
@sheetalgiri
hey everyone! Can anyone tell me how effective CTC is on a small dataset?
Rudra Murthy
@murthyrudra
Hi, any help on how to nest a Sequencer module inside another Sequencer module [nested recurrence]? The inner sequencer runs over characters, extracting character features for every word; the outer sequencer runs over words. I am trying to implement the paper Multi-Task Cross-Lingual Sequence Tagging from Scratch
I have tried this model, but I'm getting an error during backpropagation
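One possible shape for the nesting, sketched against the rnn package; all sizes and the SelectTable trick are assumptions for illustration, not the paper's exact model:

```lua
require 'rnn'

local charEmb, charFeat, wordFeat = 8, 16, 32   -- made-up sizes

-- Inner model: runs over the characters of one word and keeps the
-- last LSTM output as that word's character-level feature.
local charRNN = nn.Sequential()
   :add(nn.SplitTable(1))                        -- charSeqLen x batch x charEmb -> table
   :add(nn.Sequencer(nn.LSTM(charEmb, charFeat)))
   :add(nn.SelectTable(-1))                      -- last time step = word representation

-- Outer model: applies the inner model to every word in the sentence,
-- then runs a word-level recurrence over the resulting word features.
local tagger = nn.Sequential()
   :add(nn.Sequencer(charRNN))
   :add(nn.Sequencer(nn.LSTM(charFeat, wordFeat)))
```

Backpropagation errors in setups like this often come from the inner module not being a proper stateful unit from the outer Sequencer's point of view, so that is one place worth checking.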
elbamos
@elbamos
Has anyone tried to benchmark GPU ram consumption using DP vs. training an analogous model without the framework?
Wesam Sakla
@Supersak80
@cjviper thank you!
elbamos
@elbamos
I'm trying to continue an experiment. I got to 2000 epochs training with accUpdate, and now I'd like to continue but with accUpdate turned off so I can experiment with momentum and gradient clipping. When I do that, I get this error:
/usr/local/share/lua/5.1/nn/Linear.lua:99: invalid arguments: CudaTensor number CudaTensor CudaTensor expected arguments: *CudaTensor~2D* [CudaTensor~2D] [float] CudaTensor~2D CudaTensor~2D | *CudaTensor~2D* float [CudaTensor~2D] float CudaTensor~2D CudaTensor~2D stack traceback:
It comes out of module:backward() and, tracing back, seems to imply that gradInput is of the wrong type. I'm using the stock callback function (mostly). Can anyone suggest how to track this down? I'm trying to avoid spending a day diving into the dirty bits of where dp, dpnn, and nn intersect.
Soumith Chintala
@soumith
@elbamos ouch. You can just go into a debugger, see at which layer this occurs, and check what the buffers coming in before and going out after look like. I recommend mobdebug https://github.com/pkulchenko/MobDebug , really simple to use and set breakpoints.
elbamos
@elbamos
hey thanks! I'd given up on using torch with a real debugger
Lior Uzan
@ghostcow
@elbamos did you remove the nn.Convert() layer or something? I had similar issues when I removed mine by accident
it probably has nothing to do with it though, because my trouble was with the forward pass.
elbamos
@elbamos
@ghostcow I did not. I suspect it may relate to a change I made in using Serial layers. Either way, I was able to jumpstart the network into starting to learn again -- now at epoch 2660, the validation loss has dropped by 30%, and I have some pathways to keep it learning for another 1500 epochs or so if it starts to stall again. So the emergency has passed :)
I will say, though, I think it's dpnn, but something in the framework gobbles up a lot of GPU RAM if you turn off in-place updates.
I should not be blowing out RAM on a network with three inception layers on a 12GB Titan just because I turn momentum on.
cjviper
@cjviper
hi - on the subject of memory issues, when training CNNs using the convolutionneuralnetwork script from dp, I'm noticing big jumps in memory usage on the GPU at the end of an epoch. It starts out at ~350MB during the epoch; then, as soon as the 1st epoch completes, it jumps to over 2GB. Before I go digging, does anyone know what happens at the end of an epoch that could cause such a large spike in memory usage? I know the validation set is evaluated at the end of an epoch, but I tested on a much-reduced validation set with only a handful of images and still got the same memory jump.
elbamos
@elbamos
@cjviper that's what I'm seeing as well (not the same model though). I traced it to the callback. If I switch to accUpdate, the issue goes away.
cjviper
@cjviper
@elbamos what are you training? CNN?
elbamos
@elbamos
accUpdate should be more memory efficient, but I'm seeing about 4x the GPU RAM consumption with accUpdate off.
yeah, I have CNN modules, but I don't know that that's it. I'm using cudnn, so dp isn't providing the CNN code.
cjviper
@cjviper
I think my problem was different. It was something to do with the sampling method used for the validation set - it uses crops of the main image (i.e. TL, TR, BL, BR + center), which increases the dataset size.
https://github.com/nicholas-leonard/dp/blob/master/examples/convolutionneuralnetwork.lua#L213
I changed the above line to reduce the batch size in the validator by a factor of 10 and it solved my problem.
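The change amounts to something like this (a sketch; the exact constructor arguments at that line of the script may differ):

```lua
-- Before (roughly): the validator used the same batch size as training.
-- After: divide it by 10, so the 5-crop sampling of the validation set
-- doesn't blow up GPU memory.
valid = dp.Sampler{
   batch_size = math.floor(opt.batchSize / 10),
   ppf = ppf
}
```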
Justin Payan
@justinpayan
Could anyone suggest tutorials or resources for learning how to implement tree-structured recursive neural networks (i.e., not flat recurrent neural networks) in dp and torch? Thank you!
elbamos
@elbamos
Justin Payan
@justinpayan
@elbamos Wow, thank you!
Nilesh Kulkarni
@nileshkulkarni
Hey,
I am trying to run the example recurrentlanguagemodel.lua using the Penn Treebank dataset. When I train it with the cuda flag set to false, it runs perfectly fine, but when I try to run it with cuda, I get a segmentation fault.
Here is my log: http://pastebin.com/PH9R0QMF
Any debugging help would be great - how should I go about solving this?
Thanks,
Nilesh

arunpatala
@arunpatala
Hi, is there a way to do data augmentation (such as rotation, crop, etc.) with dpnn? Any example code would also be helpful. Thanks!