cjmcmurtrie
@cjmcmurtrie
Such an example would go a long way toward confirming that the implementation works, and it would not take long to write for someone who is familiar with the module.
cjviper
@cjviper
@ssampang the network has 4 convolution layers (3x3 kernels) with 64 channels and 2x2 max pooling after each. Yes, I'm running on GPU.
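For reference, a minimal sketch of that architecture in nn (the ReLU nonlinearities, stride-1 padding and 3 input channels are assumptions on top of what's described above):

require 'nn'

-- 4 convolution layers, 3x3 kernels, 64 channels, 2x2 max pooling after each
local net = nn.Sequential()
local nIn = 3  -- assumed RGB input
for i = 1, 4 do
   net:add(nn.SpatialConvolution(nIn, 64, 3, 3, 1, 1, 1, 1))  -- 3x3 kernel, stride 1, pad 1
   net:add(nn.ReLU(true))                                      -- assumed nonlinearity
   net:add(nn.SpatialMaxPooling(2, 2, 2, 2))
   nIn = 64
end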
elbamos
@elbamos
Can anyone point me toward a code example of using dp where the targets are a table of tensors instead of a single tensor?
Wesam Sakla
@Supersak80
Hi all. I need your help. I've run some experiments using the experiment wrapper in dp. Now I want to load a saved experiment, run my test data tensors through the trained CNN model, and capture the outputs (features) of the last layer of the CNN, prior to the softmax. I would like to save these CNN feature vector tensors so that I can visualize their t-SNE embedding coordinates. Can anyone post some quick snippets of pseudo-code that will allow me to generate these feature vectors?
cjviper
@cjviper
@Supersak80 which part are you stuck on? Load the experiment with torch.load, get the model, load an image, forward it through the model with model:forward, then inspect the .output of the appropriate layer with model:get(n).output, where n is the layer number.
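Roughly like this (an untested sketch: the saved-experiment path, the xp:model() accessor, the testImages tensor and the layer index are all assumptions, so print the loaded object to see how your dp version stores the model):

require 'dp'
require 'cunn'   -- only if the model was trained on GPU

-- load the saved experiment and pull out the trained model
local xp = torch.load('results/experiment.dat')   -- path is hypothetical
local model = xp:model()      -- accessor name is an assumption; print(xp) to inspect
model:evaluate()

-- forward test images (a bchw tensor) and grab the layer just before the softmax
local output = model:forward(testImages:cuda())
local n = #model.modules - 1  -- assumed: last-but-one module of the Sequential
local features = model:get(n).output:float()

-- save the feature vectors for the t-SNE step
torch.save('cnn_features.t7', features)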
Sheetal Giri
@sheetalgiri
Hey everyone! Can anyone tell me how effective CTC is on a small dataset?
Rudra Murthy
@murthyrudra
Hi, can anyone help with nesting a Sequencer module inside another Sequencer module (nested recurrence)? The inner sequencer runs over characters and extracts character features for every word, and the outer sequencer runs over words. I am trying to implement the paper Multi-Task Cross-Lingual Sequence Tagging from Scratch.
I have tried building this model, but I get an error during backpropagation.
elbamos
@elbamos
Has anyone tried to benchmark GPU ram consumption using DP vs. training an analogous model without the framework?
Wesam Sakla
@Supersak80
@cjviper thank you!
elbamos
@elbamos
I'm trying to continue an experiment. I got to 2000 epochs training with accUpdate, and now I'd like to continue but with accUpdate turned off so I can experiment with momentum and gradient clipping. When I do that, I get this error:
/usr/local/share/lua/5.1/nn/Linear.lua:99: invalid arguments: CudaTensor number CudaTensor CudaTensor expected arguments: *CudaTensor~2D* [CudaTensor~2D] [float] CudaTensor~2D CudaTensor~2D | *CudaTensor~2D* float [CudaTensor~2D] float CudaTensor~2D CudaTensor~2D stack traceback:
It comes out of module:backward() and, tracing back, seems to imply that gradInput is of the wrong type. I'm using the stock callback function (mostly). Can anyone suggest how to track this down? I'm trying to avoid spending a day diving into the dirty bits of where dp, dpnn, and nn intersect.
Soumith Chintala
@soumith
@elbamos ouch. You can just go into a debugger, see at which layer this occurs, and inspect the buffers coming in before and after it. I recommend mobdebug https://github.com/pkulchenko/MobDebug, really simple to use and to set breakpoints with.
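Hooking it in is only a couple of lines, assuming you have a debug server listening (ZeroBrane Studio, or the command in the comment below):

-- start a listening server first, e.g.:  lua -e "require('mobdebug').listen()"
-- then, at the top of the training script, connect to it:
require('mobdebug').start()
-- breakpoints on the nn/dp/dpnn source files are then set from the server/IDE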
elbamos
@elbamos
hey thanks! I'd given up on using torch with a real debugger
Lior Uzan
@ghostcow
@elbamos did you remove the nn.Convert() layer or something? I had similar issues when I removed mine by accident
it probably has nothing to do with it though, because my trouble was with the forward pass.
elbamos
@elbamos
@ghostcow I did not. I suspect it may relate to a change I made in using Serial layers. Either way, I was able to jumpstart the network into starting to learn again -- now at epoch 2660, the validation loss has dropped by 30%, and I have some pathways to keep it learning for another 1500 epochs or so if it starts to stall again. So the emergency has passed :)
I will say, though, that I think it's dpnn: something in the framework gobbles up a lot of GPU RAM if you turn off in-place updates.
I should not be blowing out RAM on a network with three inception layers on a 12GB Titan just because I turn momentum on.
cjviper
@cjviper
Hi, on the subject of memory issues: when training CNNs using the convolutionneuralnetwork script from dp, I'm noticing big jumps in GPU memory usage at the end of an epoch. It starts out at ~350MB during the epoch, then as soon as the 1st epoch completes, it jumps to over 2GB. Before I go digging, does anyone know what happens at the end of an epoch that could cause such a large spike in memory usage? I know the validation set is evaluated at the end of an epoch, but I tested with a much reduced validation set containing only a handful of images and still got the same memory jump.
elbamos
@elbamos
@cjviper that's what I'm seeing as well (not the same model though). I traced it to the callback. If I switch to accUpdate, the issue goes away.
cjviper
@cjviper
@elbamos what are you training? CNN?
elbamos
@elbamos
accUpdate should be more memory efficient, but I'm seeing about 4x the GPU RAM consumption with accUpdate off.
yeah I have cnn modules, but I don't know that that's it. I'm using cudnn so dp isn't providing the cnn code.
cjviper
@cjviper
I think my problem was different. Something to do with the sampling method used for the validation set - it uses crops of the main image (i.e. TL, TR, BL, BR + center) which increases the data set size.
https://github.com/nicholas-leonard/dp/blob/master/examples/convolutionneuralnetwork.lua#L213
I changed the above line to reduce the batch size in the validator by a factor of 10 and it solved my problem.
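For anyone hitting the same thing, the change amounts to something like this (a sketch based on the example's Evaluator setup; exact field names can differ between dp versions):

-- validator from convolutionneuralnetwork.lua, with the sampler's batch size
-- cut by 10x to offset the 5 crops (TL, TR, BL, BR + center) taken per image
valid = dp.Evaluator{
   feedback = dp.Confusion(),
   sampler = dp.Sampler{batch_size = math.floor(opt.batchSize / 10)}
}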
Justin Payan
@justinpayan
Could anyone suggest tutorials or resources for learning how to implement tree-structured recursive neural networks (i.e., not flat recurrent neural networks) in dp and torch? Thank you!
elbamos
@elbamos
Justin Payan
@justinpayan
@elbamos Wow, thank you!
Nilesh Kulkarni
@nileshkulkarni

Hey,
I am trying to run the example recurrentlanguagemodel.lua using the Penn Treebank dataset.
When I train it with the cuda flag set to false it runs perfectly fine.
But when I try to run with cuda it gets a segmentation fault.

Following is my log:
http://pastebin.com/PH9R0QMF

Any debugging help would be great. How should I go about solving this?

Thanks,
Nilesh

arunpatala
@arunpatala
Hi, is there a way to do data augmentation (such as rotation, crop, etc.) with dpnn? Any example code would also be helpful. Thanks
elbamos
@elbamos
@arunpatala it's absolutely possible, I do it.
arunpatala
@arunpatala
Any pointers on how to approach that? @elbamos
elbamos
@elbamos
Be careful with allocating your buffers if you plan to multithread.
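If you want to do it on the fly in Torch rather than offline, here's a minimal sketch using the image package (dpnn itself doesn't ship augmentation transforms as far as I know, so this would live in your own preprocessing/batch code; the function and its parameters are illustrative):

require 'image'

-- randomly rotate, crop and flip a single chw image tensor
local function augment(img, cropW, cropH)
   -- small random rotation, in radians
   local theta = (math.random() - 0.5) * 0.2
   local out = image.rotate(img, theta, 'bilinear')
   -- random crop to cropW x cropH
   local x = math.random(0, out:size(3) - cropW)
   local y = math.random(0, out:size(2) - cropH)
   out = image.crop(out, x, y, x + cropW, y + cropH)
   -- random horizontal flip half the time
   if math.random() > 0.5 then
      out = image.hflip(out)
   end
   return out
end

(If you preallocate destination buffers for these calls to cut allocations, that's where the warning above about multithreading applies.)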
cjviper
@cjviper
@arunpatala I perform data augmentation explicitly in advance, using GraphicsMagick commands. I wrote simple shell scripts that perform the initial resizing along with the crops/rotations etc.
Sanuj Sharma
@sanuj
@elbamos do you know how to change when the model will be saved while training a neural net? Currently it happens when there is minimum validation error.
elbamos
@elbamos
Yes, I do. And no, it doesn't.
Sanuj Sharma
@sanuj
@elbamos can you point me to the code that controls the saving of the model? And what's the default criterion for saving the model, and how do I find out about it?
elbamos
@elbamos
@sanuj Read the documentation for the Observer class and for the ErrorMinima class and its superclasses and subclasses.
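Concretely, in the bundled dp examples it's the EarlyStopper observer that saves the experiment whenever a new best value of the monitored channel is reached, and stops after max_epochs without improvement; the setup looks roughly like this (the report path and opt fields are illustrative, taken from the example scripts):

-- observer setup as in the dp examples: EarlyStopper saves the experiment
-- on a new best validation score and stops after max_epochs without improvement
xp = dp.Experiment{
   model = model,
   optimizer = train,
   validator = valid,
   observer = {
      dp.FileLogger(),
      dp.EarlyStopper{
         error_report = {'validator', 'feedback', 'confusion', 'accuracy'},
         maximize = true,      -- accuracy: higher is better
         max_epochs = opt.maxTries
      }
   },
   random_seed = os.time(),
   max_epoch = opt.maxEpoch
}

Changing when the model is saved means pointing error_report at a different channel, or swapping in your own Observer/ErrorMinima subclass as suggested above.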
Sanuj Sharma
@sanuj
Thanks @elbamos . That was helpful
Sanuj Sharma
@sanuj
Due to limited RAM I have multiple datasets for training. I want to change the dataset after each epoch, but the Lua garbage collector is not able to clear the old dataset to make way for the new one.
The code looks like this:
require 'dp'

function loadData(train_data, validate_data)
    -- 1. drop references to the previous datasource so the GC can reclaim it
    ds = nil
    train = nil
    valid = nil
    train_target = nil
    valid_target = nil
    train_input = nil
    valid_input = nil
    n_valid = nil
    n_train = nil
    nuclei_train = nil
    nuclei_valid = nil

    -- 2. load the new tensors and wrap them in views
    nuclei_train = torch.load(train_data)
    nuclei_valid = torch.load(validate_data)
    nuclei_train.data = nuclei_train.data:double()
    nuclei_valid.data = nuclei_valid.data:double()
    n_valid = (#nuclei_valid.label)[1]
    n_train = (#nuclei_train.label)[1]

    train_input = dp.ImageView('bchw', nuclei_train.data:narrow(1, 1, n_train))
    train_target = dp.ClassView('b', nuclei_train.label:narrow(1, 1, n_train))
    valid_input = dp.ImageView('bchw', nuclei_valid.data:narrow(1, 1, n_valid))
    valid_target = dp.ClassView('b', nuclei_valid.label:narrow(1, 1, n_valid))

    train_target:setClasses({0, 1, 2})
    valid_target:setClasses({0, 1, 2})

    -- 3. wrap views into datasets
    train = dp.DataSet{inputs=train_input, targets=train_target, which_set='train'}
    valid = dp.DataSet{inputs=valid_input, targets=valid_target, which_set='valid'}

    -- 4. wrap datasets into datasource
    ds = dp.DataSource{train_set=train, valid_set=valid}
    ds:classes{0, 1, 2}
end
while true do
    train_data = '/home/sanuj/Projects/nuclei-net-data/fine-tune/1/train.t7'
    validate_data = '/home/sanuj/Projects/nuclei-net-data/fine-tune/1/validate.t7'
    loadData(train_data, validate_data)
    print 'Using data-set 1.'
    xp:run(ds)
    train_data = '/home/sanuj/Projects/nuclei-net-data/fine-tune/2/train.t7'
    validate_data = '/home/sanuj/Projects/nuclei-net-data/fine-tune/2/validate.t7'
    loadData(train_data, validate_data)
    print 'Using data-set 2.'
    xp:run(ds)
end
Sanuj Sharma
@sanuj
It is able to clear ds if I don't call xp:run(ds) between two loadData calls, but xp:run(ds) adds more references to ds, I guess, which stops the garbage collector from clearing it. I don't know how to fix it. @elbamos can you help me with this?
elbamos
@elbamos
Instead of changing datasets after each epoch, what you want to do is write a subclass of DataSet that produces the data you want after each epoch. Handling the memory considerations of moving training data in and out of RAM is a responsibility of the DataSet and DataSource objects.
Sanuj Sharma
@sanuj
@elbamos how will the subclass change the dataset after every epoch? Does it need to subscribe to some event to get notified?
elbamos
@elbamos
Are you asking me how it will know when one epoch ends and another begins?
It could subscribe if you wanted to do it that way, but there's an easier way: depending on the Sampler you use, the system will decide what rows to ask for and in what order. So you tell the dataset to tell dp there are however many rows there actually are, and when that number of rows has been processed, it won't ask for any more and the epoch will be over. If different parts of your dataset have different numbers of rows, then the simplest thing is to just ignore what is one epoch and what is another. You just produce batches in whatever order you want them processed, and when one source runs out it starts taking batches from someplace else. Then the length of an epoch is just however often you want to see feedback reports.
Sanuj Sharma
@sanuj
I think I would have to understand the internals of dp.
elbamos
@elbamos
No, just study the way imagedataset etc. are implemented.