Siddarth Sampangi
@ssampang
@elbamos thanks for the tip. I think I found something like .output:size()
but haven't tried it yet
elbamos
@elbamos
@ssampang no i mean there's a function that does exactly what you want. It takes a minibatch as a parameter, runs the minibatch through your net, then calls size() on the output
Siddarth Sampangi
@ssampang
@elbamos hmm I googled around but can't find it in any of the well-known packages. would appreciate a link if you have the time
I just tried the trick I mentioned above and it didn't seem to work
th> a = nn.Linear(2,2)
th> a.output:size()
[torch.LongStorage of size 0]
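A likely explanation for the empty result: a module's .output field is only filled in after a forward pass, so running a dummy minibatch through the layer first should give a meaningful size (a minimal sketch; the batch size of 5 is arbitrary):
a = nn.Linear(2, 2)
a:forward(torch.rand(5, 2))  -- run a dummy minibatch of 5 through the layer first
print(a.output:size())       -- now reports 5 x 2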
elbamos
@elbamos
@ssampang this is starting to feel like help with a college project
Siddarth Sampangi
@ssampang
@elbamos haha sorry, I've just recently switched over to torch
so I'm still learning some of the basics
haven't had to debug much or reshape stuff until now
Siddarth Sampangi
@ssampang
anyone familiar with the confusion module (used for feedback)?
my output is a 3D tensor (batch size x sequence length x #classes), and my target is (batch size x sequence length)
and the module seems to read my output with the wrong view and cause an error
I can fix it using the output_module, but it also seems to narrow my target to only look at the first element in the sequence
Siddarth Sampangi
@ssampang
and there's no equivalent module to change the view on my target
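One general workaround for this kind of shape mismatch, independent of dp's Confusion internals, is to fold the sequence dimension into the batch dimension before feedback. A sketch, assuming contiguous tensors with the shapes described above; the variable names are illustrative:
local b, s, c = output:size(1), output:size(2), output:size(3)
local flatOutput = output:view(b * s, c)  -- (batch * seq) x classes
local flatTarget = target:view(b * s)     -- (batch * seq) class ids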
elbamos
@elbamos
@ssampang are you using a convert module? or failing to set the correct view on the target when you load the batch?
Siddarth Sampangi
@ssampang
@elbamos when you ask about the convert module, are you referring to the output_module parameter I can set? and I believe I'm setting the views alright. everything runs if I don't use feedback. furthermore, I found this line in confusion.lua:
local tgt = batch:targets():forward('b')
so I don't think the view I set is used right?
elbamos
@elbamos
no i mean, in the dpnn package, there's a module nn.Convert. it takes two parameters, which are view definitions, like 'bhwc', 'bcwh'. You put one as the first layer of your network with parameters (ds:ioShapes(), whatever view you assumed when you made your net)
i remember having similar problems with Confusion; I think perhaps i was one-hot encoding when it was expecting class id's, or i was using class id's when it was expecting one-hot? I remember it took about 45 minutes to fix
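For reference, a minimal sketch of what putting nn.Convert at the front of a net looks like; the shape strings and layer sizes below are illustrative, not taken from this conversation:
require 'dpnn'
net = nn.Sequential()
net:add(nn.Convert('bhwc', 'bchw'))          -- from the dataset's view to the view the net assumes
net:add(nn.SpatialConvolution(3, 16, 3, 3))  -- rest of the model as before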
Siddarth Sampangi
@ssampang
sorry, I'm familiar with the nn.Convert module, but I guess I don't understand where exactly you're telling me to add it
to the network itself?
elbamos
@elbamos
yes, at the start. and take a close look at the examples in dp that use Confusion - are they assuming one-hot, or class id's?
Siddarth Sampangi
@ssampang
hmm, but how would changing the network input help with the Confusion? Isn't the confusion calculated at the very end, after the criterions? also, the example I'm basing my code off of uses class id's - only difference is it doesn't operate on sequences
elbamos
@elbamos
you're right, but on the other hand, dp is weird. i'm running through in my head all the things i had to fix when i first started using dp and got confusion to work
cjviper
@cjviper
what batch sizes and image sizes do people on here use when training CNNs? I'm finding I have to use batch sizes of 32 or less with image sizes of 128x128 for a training set size of around 140k, otherwise I get out-of-memory errors. Hardware: Tesla K80 with 64GB RAM
Siddarth Sampangi
@ssampang
@cjviper is your network really large?
and are you running out of CPU memory or GPU memory?
I'm guessing GPU?
cjmcmurtrie
@cjmcmurtrie
Is the implementation of nn.SoftMaxTree() in nnx confirmed and known to be correct? I don't quite understand what's happening there. Firstly, nn.TreeNLLCriterion() does not return meaningful gradients (it distributes -1 along the batch size). This makes sense so long as the gradients do not influence nn.SoftMaxTree() gradients. However, I see that they do greatly influence what comes out of nn.SoftMaxTree()...
What's more, how does nn.SoftMaxTree() actually make a prediction? The only way to use it currently (as far as I can tell) is to pass in a table of targets. However, when you are not training (i.e. when you are evaluating or actually using the model to predict), you obviously do not have targets.
Is it possible to produce a toy example that uses nn.SoftMaxTree() with the following characteristics: (1) it is wrapped in a sequencer, (2) training improvements are demonstrated over several epochs with batches of sequences, and (3) evaluation is demonstrated by generating a sequence after training
Such an example would go a long way to confirming that the implementation works, and would not take long to write for someone who is familiar with the module
cjviper
@cjviper
@ssampang the network has 4 convolution layers (3x3 kernels) with 64 channels and 2x2 max pooling after each. Yes, I'm running on GPU.
elbamos
@elbamos
Can anyone point me toward a code example of using dp where the targets are a table of tensors instead of a single tensor?
Wesam Sakla
@Supersak80
Hi all. I need your help. I've run some experiments using the experiment wrapper in dp. Now, I want to load a saved experiment and run my test data tensors through the trained CNN model and capture the outputs (features) in the last layer of the CNN, prior to the softmax. I would like to save these CNN feature vector tensors so that I can visualize their t-SNE embedding coordinates. Can anyone post some quick snippets of pseudo-code that will allow me to generate these feature vectors?
cjviper
@cjviper
@Supersak80 which part are you stuck on? Load the experiment with torch.load, get the model, load an image, forward it through the model with model:forward, then inspect the .output of the appropriate layer with model:get(n).output, where n is the layer number.
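Roughly, those steps might look like the following; the file names, the layer index n, and the exact accessor for the model are placeholders and may differ between dp versions:
xp = torch.load('saved_experiment.t7')   -- the saved dp experiment file
model = xp:model()                       -- or xp._model, depending on the dp version
out = model:forward(testInput)           -- run a test tensor (or minibatch) through the net
feats = model:get(n).output:clone()      -- n = index of the layer just before the softmax
torch.save('features.t7', feats)         -- e.g. for t-SNE later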
Sheetal Giri
@sheetalgiri
hey everyone! can anyone tell me how effective CTC is on a small dataset?
Rudra Murthy
@murthyrudra
Hi, any help on how to nest a Sequencer module inside a Sequencer module [nested recurrence]? The inner sequencer runs over characters and extracts character features for every word; the outer sequencer runs over words. I am trying to implement the paper Multi-Task Cross-Lingual Sequence Tagging from Scratch.
I have tried this model, but I'm getting an error during backpropagation.
elbamos
@elbamos
Has anyone tried to benchmark GPU RAM consumption using dp vs. training an analogous model without the framework?
Wesam Sakla
@Supersak80
@cjviper thank you!
elbamos
@elbamos
I'm trying to continue an experiment. I got to 2000 epochs training with accUpdate, and now I'd like to continue but with accUpdate turned off so I can experiment with momentum and gradient clipping. When I do that, I get this error:
/usr/local/share/lua/5.1/nn/Linear.lua:99: invalid arguments: CudaTensor number CudaTensor CudaTensor expected arguments: *CudaTensor~2D* [CudaTensor~2D] [float] CudaTensor~2D CudaTensor~2D | *CudaTensor~2D* float [CudaTensor~2D] float CudaTensor~2D CudaTensor~2D stack traceback:
It comes out of module:backward() and, tracing back, seems to imply that gradInput is of the wrong type. I'm using the stock callback function (mostly). Can anyone suggest how to track this down? I'm trying to avoid spending a day diving into the dirty bits of where dp, dpnn, and nn intersect.
Soumith Chintala
@soumith
@elbamos ouch. You can just go into a debugger, see at which layer this occurs, and check what the buffers coming in before and after are. I recommend mobdebug https://github.com/pkulchenko/MobDebug , really simple to use and set breakpoints.
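For reference, hooking up MobDebug amounts to starting it inside the script and attaching from a ZeroBrane debugger server; the breakpoint placement below is just a suggestion:
-- at the top of the training script, before the failing backward call
require('mobdebug').start()  -- connects to a ZeroBrane debugger server (localhost:8172 by default)
-- then set a breakpoint near module:backward() and inspect gradInput:type() at each layer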
elbamos
@elbamos
hey thanks! I'd given up on using torch with a real debugger
Lior Uzan
@ghostcow
@elbamos did you remove the nn.Convert() layer or something? I had similar issues when I removed mine by accident
it probably has nothing to do with it though, because my trouble was with the forward pass.