raver119
@raver119
android is a linux in disguise
nothing really special there in terms of memory management
a bit more strict environment due to natural limitations, sure
but nothing major
Alex Black
@AlexDBlack
@davids91 skip the LocalResponseNormalization layer
it's an outdated technique, no modern CNN uses it anymore
if you need normalization, use batch norm instead
otherwise looks reasonable... other than kernel sizes, I still don't know what they are actually being set to (what are the values of your width and height variables?)
a bit of weight decay (like 1e-4) probably wouldn't hurt either, and may help
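For anyone following along, a minimal sketch of what that advice could look like in a DL4J configuration (imports omitted; channels, numClasses, height and width are placeholders, not values from this conversation):

    // Sketch: 3x3 convolution + batch norm instead of LocalResponseNormalization,
    // plus a small amount of L2 weight decay as suggested above.
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .weightInit(WeightInit.XAVIER)
            .updater(new Adam(1e-3))
            .l2(1e-4)   // weight decay
            .list()
            .layer(new ConvolutionLayer.Builder(3, 3)
                    .nIn(channels).nOut(32)
                    .activation(Activation.RELU).build())
            .layer(new BatchNormalization.Builder().build())
            .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                    .activation(Activation.SOFTMAX).nOut(numClasses).build())
            .setInputType(InputType.convolutional(height, width, channels))
            .build();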
davids91
@davids91
All right, I'll exclude it, I was starting to experiment with SELU and WeightInit.NORMAL :) Thanks for the tip!
As for the kernel size: width and height are currently 128 x 128. I am trying this rectangular kernel because 3D stereoscopy is a pixel-correspondence problem with stereoscopic images, and my theory is that the focus needs to be horizontal
a bit outdated, but it gets the essence right
davids91
@davids91
Hello! :) Is there a need to use BatchNormalization with SELUs?
Alex Black
@AlexDBlack
if the results of the SNN paper are to be believed, then no
https://arxiv.org/pdf/1706.02515.pdf
but, you can check the activations mean/std in the UI (model tab)
if it's working as expected, they should converge to stay around mean zero, std dev 1
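A rough sketch of what checking that looks like in practice (a guess at the setup, not code from this conversation; nIn, nOut and model are placeholders): a SELU dense layer, plus the StatsListener that feeds the UI's model tab.

    // SELU layer; the SNN paper pairs SELU with a normal ("LeCun") weight init.
    DenseLayer seluLayer = new DenseLayer.Builder()
            .nIn(nIn).nOut(nOut)
            .activation(Activation.SELU)
            .weightInit(WeightInit.NORMAL)
            .build();

    // Attach the training UI so the model tab shows activation mean/std per layer.
    UIServer uiServer = UIServer.getInstance();
    StatsStorage statsStorage = new InMemoryStatsStorage();
    uiServer.attach(statsStorage);
    model.setListeners(new StatsListener(statsStorage));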
davids91
@davids91
well they are not converging in my experience :D at least the performance dropped significantly without normalization
but actually worse initial performance is to be expected; I'm more worried that if the parameters are taking care of normalization, there won't be much capacity left for the feature computations
loleif
@loleif
hey
gitterBot
@raver120
@loleif Welcome! Here's a link to Deeplearning4j's Gitter Guidelines, our documentation and other DeepLearning resources online. Please explore these and enjoy! https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j/GITTER_GUIDELINES.md
loleif
@loleif
so what do i do here
davids91
@davids91
@loleif If you have any questions related to fine-tuning your deep learning model, this is the relevant channel to post them in :)
raguenets
@montardon
Hi, I'm trying to train a UNet model with images from the ISBI 2015 challenge.
https://gist.github.com/montardon/1dd63ad960dc7a4c201b5fdc6045c96b
Setting a very small learning rate of 0.0000005f, I can get a non-NaN score, but it does not converge at all.
UI looks weird.
Alex Black
@AlexDBlack

your learning rate is too high, activations (and probably gradients too) are exploding
you can see that simply from the update/parameter ratios chart
anything above about -3 there is bad (too high LR)
that score looks off too, even for a segmentation model
post the output of ComputationGraph.summary() too

            INDArray arr = loader.asMatrix(imageFile);
            arr = arr.divi(255.0f);
            imageBatch.putRow(index++, arr);

that looks wrong. the divi(255) is fine, but asMatrix should give you 4D IIRC whereas putRow is for rows
use INDArray.put(NDArrayIndex.point(index++), NDArrayIndex.all(), NDArrayIndex.all(), NDArrayIndex.all())
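Put together, the fix would look roughly like this (a sketch; variable names follow the snippet above, and depending on the ND4J version the single image may need a reshape to match the target view's shape):

    INDArray arr = loader.asMatrix(imageFile);   // 4D: [1, channels, height, width]
    arr = arr.divi(255.0f);
    // write the image into slot 'index' of the 4D batch instead of using putRow
    imageBatch.put(new INDArrayIndex[]{
            NDArrayIndex.point(index++),
            NDArrayIndex.all(),
            NDArrayIndex.all(),
            NDArrayIndex.all()}, arr);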

raguenets
@montardon
raguenets
@montardon
Should I try to set an updater on the model?
raguenets
@montardon
I'm using the zoo UNet model with a well-known image set. How come I cannot train it easily? I tried many combinations for normalization (the usual ones). My learning rate is small, but anything smaller is below float precision. I'm stuck and don't know how to proceed.
s1nned
@s1nned
Hi @AlexDBlack, here is the information about the strange cycles you asked for: no. of examples 20370
minibatchsize 70
[attached image: grafik.png]
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(320)
        .weightInit(WeightInit.XAVIER)
        .updater(new Adam(1e-2))
        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
        .gradientNormalizationThreshold(0.5)
        .list()
        .layer(new LSTM.Builder()
                .activation(Activation.TANH)
                .nIn(4)
                .nOut(8)
                .build())
        .layer(new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .nIn(8)
                .nOut(numLabelClasses)
                .build())
        .build();
s1nned
@s1nned
the data is sequential (that's why i used rnn config)
Alex Black
@AlexDBlack
hm... learning rate is a little high (average of around -3 on the bottom left chart would be ideal)
configuration looks fine (other than being a little small, you could go larger with that number of examples I think)
otherwise - my only guess is there's some sort of regularity in the examples? like every second minibatch is alternately longer/shorter than the previous
or non-shuffled data or something... that's really the only idea I have right now
s1nned
@s1nned
with "average of around -3 on the bottom left chart.." you mean to set the lr to 1e-3 right?
go larger with more layers or just no. of hidden nodes?
s1nned
@s1nned
is it absolutely necessary to shuffle the data? because i know that some of the examples are labeled over a certain period of time (start time - end time) and when i shuffle the data, it's scattered and the algo does not recognize any dependencies?
[attached image: grafik.png]
this pic shows 2500 epochs (previous was 200)
raguenets
@montardon
Hi, any hints for training my model? I think the question is specific to DL4J. I provided code and a link to image samples. I've read the book and trained on Coursera. Everything required to ask a question about this library, which is new to me.
raver119
@raver119
answer is trivial
DL is still an art
raguenets
@montardon
[attached image: image.png]
@raver119 Here is the artist.
siddadel
@siddadel
@AlexDBlack I have 15,000 images, each 104 x 132, that I am training with a neural network similar to the DL4J example org.deeplearning4j.examples.convolution.AnimalClassification. I tried both AlexNet and LeNet with batch size 128 and 100 epochs, but in both instances the score plateaus at 1.5. The accuracy is limited to 33-37%.
Can you provide some advice, and some literature I could read, on using the UIServer graphs to tune/re-architect my network?
siddadel
@siddadel
Forgot to add "please" ^^ :)
siddadel
@siddadel
Thanks @saudet.
Alex Black
@AlexDBlack
@siddadel hm... there's a few bad design choices there
like you first increase then decrease depth as you go through the net? it should increase as spatial dimensions decrease (so that activations h x w x depth is about constant through the net)
your architecture seems to be based on AlexNet... which is old at this point
cut the LRN (it's an obsolete technique), reduce the kernels to 3x3 (or 5x5 at most). If you need to stabilize activations, use batch norm instead
weight init - just use xavier, not manual distributions like that
cut the gradient norm completely
unless your dataset is huge, those fully connected layers are too large
I don't know how well you have tuned the learning rate either, definitely check that in the ui
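A rough sketch of a layer stack following that advice - depth growing as pooling shrinks the spatial dimensions, 3x3 kernels, batch norm instead of LRN, Xavier init, no gradient normalization, a modest dense layer (all sizes are illustrative placeholders, not siddadel's actual code):

    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .weightInit(WeightInit.XAVIER)
            .updater(new Adam(1e-3))   // still needs tuning against the UI charts
            .list()
            .layer(new ConvolutionLayer.Builder(3, 3).nOut(16).activation(Activation.RELU).build())
            .layer(new BatchNormalization.Builder().build())
            .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                    .kernelSize(2, 2).stride(2, 2).build())
            .layer(new ConvolutionLayer.Builder(3, 3).nOut(32).activation(Activation.RELU).build())
            .layer(new BatchNormalization.Builder().build())
            .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                    .kernelSize(2, 2).stride(2, 2).build())
            .layer(new ConvolutionLayer.Builder(3, 3).nOut(64).activation(Activation.RELU).build())
            .layer(new DenseLayer.Builder().nOut(256).activation(Activation.RELU).build())
            .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                    .activation(Activation.SOFTMAX).nOut(numClasses).build())
            .setInputType(InputType.convolutional(height, width, channels))
            .build();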
siddadel
@siddadel
@AlexDBlack - This is really helpful. I am new to neural networks so I haven't done much tuning. This will be a good start. Will go through the literature linked above and your comments and get back.
s1nned
@s1nned
with "average of around -3 on the bottom left chart.." you mean to set the lr to 1e-3 right?
go larger with more layers or just no. of hidden nodes?
is it absolutely necessary to shuffle the data? because i know that some of the examples are labeled over a certain period of time (start time - end time) and when i shuffle the data, it's scattered and the algo does not recognize any dependencies? @AlexDBlack
Alex Black
@AlexDBlack

with "average of around -3 on the bottom left chart.." you mean to set the lr to 1e-3 right?

no. I mean look at the chart. If it's higher than -3 on average, decrease the learning rate. If it's lower, increase it. That's usually a good starting point

go larger with more layers or just no. of hidden nodes?

both. For that number of examples, I'd probably bump it up to something like 2x LSTM layers of size 32 or 64

is it absolutely necessary to shuffle the data?

no. It can sometimes add a percent or two to accuracy vs. not shuffling, but it's not critical
obviously for time series you just want to change the order in which those series are presented to the net, not shuffle within the time steps or anything
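A sketch of the larger configuration being suggested above - the same setup s1nned posted earlier, but with two stacked LSTM layers of size 64 (the learning rate here is only a starting value, to be tuned against the update:parameter ratio chart):

    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .seed(320)
            .weightInit(WeightInit.XAVIER)
            .updater(new Adam(1e-3))   // adjust based on the ratio chart, not a fixed rule
            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
            .gradientNormalizationThreshold(0.5)
            .list()
            .layer(new LSTM.Builder().activation(Activation.TANH).nIn(4).nOut(64).build())
            .layer(new LSTM.Builder().activation(Activation.TANH).nIn(64).nOut(64).build())
            .layer(new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                    .activation(Activation.SOFTMAX).nIn(64).nOut(numLabelClasses).build())
            .build();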

gharick
@gharick
hello guys, please a simple question: I have a trained network which is ready and loaded to use for prediction
tried output, feedForward, predict and I get an exception that says expecting rank(2) array and received rank 1 array!
gharick
@gharick
figured it out
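For anyone who hits the same error: output(), feedForward() and predict() expect a rank-2 [minibatch, features] input, so a single example stored as a rank-1 vector needs reshaping first. A minimal sketch (assuming a plain feed-forward MultiLayerNetwork called 'network' and a hypothetical double[] 'features'):

    INDArray single = Nd4j.create(features);               // may come out rank 1: [numFeatures]
    INDArray input = single.reshape(1, features.length);   // rank 2: [1, numFeatures]
    INDArray prediction = network.output(input);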