elbamos
@elbamos
@ThorJonsson it sounds like something, somewhere, is expecting a table of tensors instead of a tensor, or a tensor instead of a table of tensors.
cjviper
@cjviper
anyone ever used dp with cudnn? (instead of cunn)
elbamos
@elbamos
@cjviper Yes.
There’s no effect on dp at all. You just use cudnn to define the model, and treat it like any other model.
cjviper
@cjviper
thanks @elbamos - what sort of speed improvement does it give over cunn?
elbamos
@elbamos
@cjviper - According to soumith's testing, a lot. I haven't timed it, but did notice an improvement.
elbamos
@elbamos
is there a simple way to use a dp.Preprocess() as the argument (effectively) to the dp.Sampler's ppf?
elbamos
@elbamos
@ghostcow to answer your question about ideas for dp, I had an exchange about that with NL a few weeks ago. He has a new package, dataload, for handling data. So one thing I'd say with dp is to merge the dataset/dataload concepts. That means eliminating the Batch and View classes, which NL pointed out don't do very much at present. I'd simplify the Observer/Feedback/Mediator system by hiding more of the internals. I think Preprocess and Sampler.ppf can be a single concept (especially after eliminating Views). I'd like to see a logger that outputs to disk in a standard format, like JSON, so it can be read back to support a training monitor similar to TensorBoard. Experiment could support distributed training to explore the hyperparameter space (a bunch of my own work, which involves reading data out of hdf5, would support that). etc. etc. Point is, I think dp is a good foundation, and I think we could build from it.
Siddarth Sampangi
@ssampang
is this room also intended for questions on the rnn library? or is it strictly for dpnn?
simopal6
@simopal6
is there any additional documentation for Optimizer other than the neural network tutorial? It's not clear to me how to make the switch from optim
Lior Uzan
@ghostcow
@elbamos eliminating Batch and View sounds great, I would love to help. Let's continue this talk offline?
Rudra Murthy
@murthyrudra
Hi everyone, I am currently implementing a sequence labeler using the rnn library. I have two modules: one a Sequencer module around FastLSTM, and the other a linear layer followed by a softmax. Both modules are enclosed in Sequential containers. How should I combine parameters from both modules so that I can use them with the optim package? Currently I am using the following code from Oxford CS ML. Is this the right way to go?
simopal6
@simopal6
@murthyrudra you can join the two networks through a nn.SelectTable(-1), which gets the last item in the table returned by the sequencer (ideally, the RNN state at the last time step) and feed it to the linear layer, so that you have everything in a single container and don't need to do any parameter combination
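A minimal sketch of that arrangement, with made-up sizes (100-dim inputs, 50 hidden units, 10 classes are my own assumptions):

```lua
require 'nn'
require 'rnn'

local model = nn.Sequential()
model:add(nn.Sequencer(nn.FastLSTM(100, 50)))  -- returns a table, one tensor per time step
model:add(nn.SelectTable(-1))                  -- keep only the last time step's output
model:add(nn.Linear(50, 10))
model:add(nn.LogSoftMax())

-- Because everything lives in one container, a single getParameters() call
-- returns one flattened parameter/gradient pair to hand to optim.
local params, gradParams = model:getParameters()
```

With this layout there is no need to merge parameters from two separate containers by hand.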
Rudra Murthy
@murthyrudra
@simopal6 thanks
Lior Uzan
@ghostcow

hey everyone, I'm getting two weird results when using the alexnet.lua script to fine-tune a model on some data (it's a simple classification experiment and I will gladly share my script):

  1. Even with nThreads=0 and manually setting torch.manualSeed(1) and cutorch.manualSeed(1), I'm getting different avgErr results each time I run the experiment. This shouldn't happen.
  2. Using nThreads=2, I get significantly worse results than with nThreads=0. This also shouldn't happen; AFAIK these threads are only used for data loading.

Does anyone have any ideas where to start looking for the bugs? I'm open to suggestions!
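For what it's worth, a minimal sketch of the seeding worth double-checking (assuming a single-process setup; the exact calls the script needs depend on what it uses):

```lua
require 'torch'
require 'cutorch'

torch.manualSeed(1)       -- CPU RNG
cutorch.manualSeedAll(1)  -- seeds every GPU; manualSeed(1) covers only the current device
math.randomseed(1)        -- plain Lua RNG, in case any helper uses math.random
```

If worker threads are in play (nThreads > 0), each thread has its own RNG state and would need seeding inside the thread's init function as well, which could explain both observations.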

elbamos
@elbamos
@ghostcow Is it possible that you aren't initializing a tensor somewhere, so randomness is sneaking in?
Lior Uzan
@ghostcow
I guess I would have to check properly with a debugger
elbamos
@elbamos
I'd start with the alexnet script from dp exactly as downloaded, confirm that it's deterministic, and then apply your changes incrementally, testing after each one to see when it stops being deterministic
Siddarth Sampangi
@ssampang
anyone know of an easy way to print out the input/output sizes of a module?

for example, I would like to change

asd = nn.ParallelTable()
asd:add(nn.Linear(2,2))
asd:add(nn.Linear(4,4))
asd:add(nn.JoinTable(1,2))

to something like

asd = nn.ParallelTable()
asd:add(nn.Linear(2,2))
asd:add(nn.Linear(4,4))
asd:printOutputShape()
asd:add(nn.JoinTable(1,2))

because I easily get confused about how my input's shape changes as it goes through my network
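There is no built-in printOutputShape, but a small hand-rolled helper along these lines (the function name is my own) can forward a sample input and print each child module's output size:

```lua
require 'nn'

-- Forward `input` through each child of `container` in turn,
-- printing the module type and its output shape.
local function printOutputShapes(container, input)
   local current = input
   for _, m in ipairs(container.modules) do
      current = m:forward(current)
      if torch.isTensor(current) then
         print(torch.type(m), table.concat(current:size():totable(), 'x'))
      else
         print(torch.type(m), '(table output)')
      end
   end
end

local net = nn.Sequential()
net:add(nn.Linear(2, 4))
net:add(nn.Reshape(2, 2))
printOutputShapes(net, torch.randn(2))
-- prints something like:
--   nn.Linear    4
--   nn.Reshape   2x2
```

This only walks the top level of one container; nested containers would need a recursive version.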

elbamos
@elbamos
@ssampang there's a getOutputSize() or something like that in the dpnn package. It's a decorator that modifies nn.Module, so it works on anything
Siddarth Sampangi
@ssampang
@elbamos thanks for the tip. I think I found something like .output:size()
but haven't tried it yet
elbamos
@elbamos
@ssampang no, I mean there's a function that does exactly what you want. It takes a minibatch as a parameter, runs the minibatch through your net, then calls size() on the output
Siddarth Sampangi
@ssampang
@elbamos hmm I googled around but can't find it in any of the well-known packages. would appreciate a link if you have the time
I just tried the trick I mentioned above and it didn't seem to work
th> a = nn.Linear(2,2)
th> a.output:size()
[torch.LongStorage of size 0]
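The empty LongStorage above is because .output is only populated after a forward pass; a quick sketch:

```lua
require 'nn'

local a = nn.Linear(2, 2)
print(a.output:size())     -- empty: no forward pass has happened yet

a:forward(torch.randn(2))  -- run a sample input through the module
print(a.output:size())     -- now reports the real output shape (size 2)
```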
elbamos
@elbamos
@ssampang this is starting to feel like help with a college project
Siddarth Sampangi
@ssampang
@elbamos haha sorry, I've just recently switched over to torch
so I'm still learning some of the basics
haven't had to debug much or reshape stuff until now
Siddarth Sampangi
@ssampang
anyone familiar with the confusion module (used for feedback)?
my output is a 3D tensor (batch size x sequence length x #classes), and my target is (batch size x sequence length)
and the module seems to read my output with the wrong view and cause an error
I can fix it using the output_module, but it also seems to narrow my target to only look at the first element in the sequence
Siddarth Sampangi
@ssampang
and theres no equivalent module to change the view on my target
elbamos
@elbamos
@ssampang are you using a convert module? or failing to set the correct view on the target when you load the batch?
Siddarth Sampangi
@ssampang
@elbamos when you ask about the convert module, are you referring to the output_module parameter I can set? and I believe I'm setting the views alright. everything runs if I don't use feedback. furthermore, I found this line in confusion.lua:
local tgt = batch:targets():forward('b')
so I don't think the view I set is used right?
elbamos
@elbamos
no, I mean: in the dpnn package there's a module, nn.Convert. It takes two parameters, which are view definitions like 'bhwc' or 'bcwh'. You put one as the first layer of your network, with parameters (ds:ioShapes(), whatever view you assumed when you made your net)
I remember having similar problems with Confusion; I think perhaps I was one-hot encoding when it was expecting class ids, or I was using class ids when it was expecting one-hot? I remember it took about 45 minutes to fix
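A sketch of the nn.Convert placement described above (the view strings and layer sizes are illustrative assumptions, not the questioner's actual shapes):

```lua
require 'dpnn'

local model = nn.Sequential()
-- Converts incoming batches from the dataset's view ('bhwc' here, an assumption)
-- to the 'bchw' layout the convolution layers below expect.
model:add(nn.Convert('bhwc', 'bchw'))
model:add(nn.SpatialConvolution(3, 16, 5, 5))
-- ... rest of the network
```

In a dp experiment the first argument would typically come from the DataSource (e.g. ds:ioShapes()) rather than a hard-coded string.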
Siddarth Sampangi
@ssampang
sorry, I'm familiar with the nn.Convert module, but I guess I don't understand where exactly you're telling me to add it
to the network itself?
elbamos
@elbamos
yes, at the start. and take a close look at the examples in dp that use Confusion - are they assuming one-hot, or class id's?
Siddarth Sampangi
@ssampang
hmm, but how would changing the network input help with the Confusion? Isn't the confusion calculated at the very end, after the criterions? also, the example I'm basing my code off of uses class id's - only difference is it doesn't operate on sequences
elbamos
@elbamos
you're right, but on the other hand, dp is weird. I'm running through in my head all the things I had to fix when I first started using dp and got Confusion to work
cjviper
@cjviper
what batch sizes and image sizes do people on here use when training CNNs? I'm finding I have to use batch sizes of 32 or less with 128x128 images for a training set of around 140k, otherwise I get out-of-memory errors. Hardware: Tesla K80 with 64GB RAM
Siddarth Sampangi
@ssampang
@cjviper is your network really large?
and are you running out of CPU memory or GPU memory?