Alex Black
@AlexDBlack
@bastienjalbert no custom loss import yet unfortunately, but your next best option may be to implement a SameDiff loss/output layer, see SameDiffMSELossLayer and SameDiffMSEOutputLayer (in unit tests) here
https://github.com/SkymindIO/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-core/src/test/java/org/deeplearning4j/nn/layers/samediff/testlayers
at least you don't have to define backprop manually that way
@Maleeha456 you don't want to depend on deeplearning4j-examples, we don't release that onto maven central or anything
instead, you want to depend on the things that are used in the examples - deeplearning4j-nlp for example
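As a hedged sketch of what Alex describes (artifact IDs are the real ones from Maven Central, but the exact modules you need depend on which example you're adapting, and the version shown is just the release mentioned later in this chat), depending on the libraries directly rather than on deeplearning4j-examples looks roughly like:

```xml
<!-- Sketch: depend on the libraries the examples use, not on the examples project -->
<dependencies>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>1.0.0-beta5</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-nlp</artifactId>
        <version>1.0.0-beta5</version>
    </dependency>
    <!-- an ND4J backend is always required; nd4j-native-platform is the CPU one -->
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native-platform</artifactId>
        <version>1.0.0-beta5</version>
    </dependency>
</dependencies>
```

Checking the pom.xml of the specific example you're copying is the safest way to see which modules it actually pulls in.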
marioarrigonineri
@marioarrigonineri

Hello, I cannot upgrade from beta3 to beta5. In my POM I have

    <dl4j.version>1.0.0-beta3</dl4j.version>
    <cuda.version>10.0</cuda.version>
    <cudnn.version>7.4</cudnn.version>
    <javacpp-presets.cuda.version>1.4.4</javacpp-presets.cuda.version>
    <nd4j.version>${dl4j.version}</nd4j.version>
    <nd4j.backend>nd4j-cuda-${cuda.version}</nd4j.backend>
    ...

    <dependency>
        <groupId>org.bytedeco.javacpp-presets</groupId>
        <artifactId>cuda</artifactId>
        <version>${cuda.version}-${cudnn.version}-${javacpp-presets.cuda.version}</version>
        <classifier>windows-x86_64-redist</classifier>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-cuda-${cuda.version}-platform</artifactId>
        <version>${nd4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-cuda-${cuda.version}</artifactId>
        <version>${dl4j.version}</version>
    </dependency>

and it works fine, although in JavaCPP cache folder I see both
cuda-10.0-7.4-1.4.4-windows-x86_64-redist.jar
and
cuda-10.0-7.3-1.4.3-windows-x86_64.jar
(strange)

when I move to
<dl4j.version>1.0.0-beta5</dl4j.version>
I get an error at runtime:
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
The backend should be the same, and I have only CUDA 9.0 installed on my PC (used by TensorFlow)

Alex Black
@AlexDBlack
@marioarrigonineri so, two things
(a) can you post the full stack trace/output in a !gist?
(b) are you using eclipse IDE by any chance?
gitterBot
@raver120
To use gist: paste your code/exception/large output log into https://gist.github.com, click 'Create Secret Gist' and paste URL link here
marioarrigonineri
@marioarrigonineri
Alex Black
@AlexDBlack
one common issue we see with eclipse IDE is when people change versions in pom.xml, their eclipse project doesn't update
i.e., the Eclipse project and pom.xml aren't properly linked; info from pom.xml is converted to Eclipse's internal project format only once, on initial project creation
usually recreating/reimporting the project into eclipse can fix that
beyond that, I can't see anything obviously wrong with your pom.xml
octocode686
@octocode686
@marioarrigonineri just right click on your project in project view, do Maven --> Update Project and it should work
javadev-berlin
@javadev-berlin

@AlexDBlack

@javadev-berlin that was a community contribution
but I believe the limitation there is the reshaping for the output layer
in a fully convolutional network (which vgg16 is not) different image sizes should work out of the box

thanks ... Is there any fully convolutional CNN in the zoo model package (1.0.0-beta2)? I see that the following models are there:

AlexNet
Darknet19
FaceNetNN4Small2
InceptionResNetV1
LeNet
NASNet
ResNet50
SimpleCNN
SqueezeNet
TextGenerationLSTM
TinyYOLO
UNet
VGG16
VGG19
Xception
YOLO2

I have tried almost all of them, setting height/width to 448 (or whatever is set as the default input shape in the class); they do not work either ... I get either a NullPointerException at "private INDArray flatten(INDArray x) {" or UnsupportedOperationException: Pretrained IMAGENET weights are not available for this model.

Alex Black
@AlexDBlack
@javadev-berlin you can tell a net is fully convolutional when it has a global pooling layer as one of the last layers before the output/loss layer
that global pooling layer handles the transition between 4d CNN activations and 2d activations in a size-independent manner, instead of reshaping which is size specific in most CNNs
IIRC darknet19, nasnet, squeezenet, and xception are fully convolutional. Not all will have pretrained models (that's what the error you got means)
I can't remember what fully convolutional nets will do with variable input size, they might (incorrectly) conclude no pretrained net is available for that size. Feel free to open an !issue about that and we can handle it
also FYI 1.0.0-beta5 is the most recent release, might be worth upgrading - https://deeplearning4j.org/release-notes.html
gitterBot
@raver120
To file an issue, please click 'New Issue' at https://github.com/deeplearning4j/deeplearning4j/issues and provide as much details on your problem, as possible
Polly13
@Polly1358941334_twitter
@AlexDBlack
Hello. I sent a message earlier and it might have been lost among other messages, so I would like to recall it.
https://gitter.im/deeplearning4j/deeplearning4j?at=5d931338086a72719e94b4f5
I'd appreciate it if you could take a look. Thanks in advance.
Oliver Rausch
@orausch
@AlexDBlack I'm getting some weird behavior on my GPU machine. I'm finding that the propagation time through samediff (i.e. calling sd.outputSingle) gets almost 10x slower if I load an array in a different way
I suspect this is because of some hidden lazy synchronization mechanism of Nd4j, and that my array is not actually loaded onto the GPU when I call Nd4j.create(data)
If this is true, is there any way I can force it to synchronize?
In particular, when I was loading up the batch arrays by calling Nd4j.create(dims..) and then filling it with putRow(Nd4j.create(float[]...)) propagation took ~300-700 ms
But when I load the batch array using Nd4j.create(float[]...) (functionally equivalent), the loading time is faster but the propagation time is now 1.5-3 s
Oliver Rausch
@orausch
In general, is there any way to debug the performance of the samediff sd.outputSingle call? I also see quite high variance even disregarding this issue, and I'd like to look into why
raguenets
@montardon
Hi, I'm trying to import a Tensorflow .pb model. I'm getting a RuntimeException.
Error at [/home/jenkins/workspace/deeplearning4j-deeplearning4j-1.0.0-beta5-linux-x86_64-centos6-cuda-10.1/libnd4j/include/ops/declarable/generic/nn/batchnorm.cpp:200:0]:
BATCHNORM_NEW op: wrong shape of mean array, expected is [32], but got [0] instead !
Exception in thread "main" java.lang.RuntimeException: Op [batchnorm_new] execution failed
I'm using this model with Python and it gives me results.
raguenets
@montardon
It looks like it is model dependent. :(
ChrisYohann
@ChrisYohann

Hello everyone, I'm currently facing an issue using Spark and DL4J.
I'm trying to fit a neural network using Spark and to debug my training, I've attached a StatsListener to my model to send training stats to the UI.
The Server UI is in a different JVM as advised in the tutorial when running Spark Applications. The problem is that when I use the SharedTrainingMaster, no information is sent to the UI. However, it works with the ParameterAveragingMaster implementation.
This is my code :

 val trainingMaster = new SharedTrainingMaster.Builder(voidConf, 1000)
    .batchSizePerWorker(100)   // batch size for training on each worker
    .workersPerNode(1)         // equal to the number of GPUs; for CPUs use 1 (or > 1 for large core counts)
    .collectTrainingStats(true)
    .storageLevel(StorageLevel.MEMORY_AND_DISK_SER)
    .build()

  //val trainingMaster = new ParameterAveragingTrainingMaster.Builder(1000).build()

  //Create the SparkDl4jMultiLayer instance
  val sparkNet: SparkDl4jMultiLayer = new SparkDl4jMultiLayer(sc, model, trainingMaster)
  sparkNet.setCollectTrainingStats(true)

  val remoteUIRouter: StatsStorageRouter = new RemoteUIStatsStorageRouter("http://192.168.1.11:9192")

  sparkNet.setListeners(remoteUIRouter, Collections.singletonList[TrainingListener](new StatsListener(null, 1)))

After some digging, it seems that my issue is the exact duplicate of this : eclipse/deeplearning4j#5835 which was solved a year ago.

To be clear, I don't have this problem when I run Spark in standalone mode, only in cluster/YARN mode.

Does anyone have a hint on this? Thanks in advance

ari62
@ari62
Hi Everyone, I created an end-to-end Kaggle competition project using dl4j. It is available here if anyone would like to use it as a reference: https://github.com/ari99/housing_prices_dl4j
Susan Eraly
@eraly
@Polly1358941334_twitter The getOutputType is for the output shape. So your lambda1 layer doesn't change the input shape. The input shape into lambda1 is 3d, as can be seen from your Keras model summary, and this is why you have to set it as recurrent with size 16. You could also just pass the input type through to the output, i.e. instead of "return InputType.recurrent(16);" do "return inputType"
Susan Eraly
@eraly
As for your second lambda layer - I believe there is a bug in the time distributed layer implementation which is related to the behaviour you are seeing.
Susan Eraly
@eraly
@Polly1358941334_twitter In fact I am not entirely sure time distributed is supported. According to https://deeplearning4j.org/docs/latest/keras-import-supported-features#layers it's not.
Susan Eraly
@eraly
@Polly1358941334_twitter Issue for time distributed dense eclipse/deeplearning4j#8278
marioarrigonineri
@marioarrigonineri
@octocode686 hi.. I tried exporting and reimporting the project and updating Maven.. I have the same error :(
cqiaoYc
@cqiaoYc
@AlexDBlack please take some time to check my code, thank you very much! eclipse/deeplearning4j#8253
Polly13
@Polly1358941334_twitter
@eraly Hello. Pleased to hear from you.
1) What does recurrent(16) mean in this case? (To be consistent it should be connected to recurrent layers, but it doesn't look like that.)
2) That's right. But the argument of TimeDistributedLayer is Dense, and the results are in agreement with the Keras model.
Alex Black
@AlexDBlack
@ChrisYohann what DL4J version are you on?
if it's not 1.0.0-beta4 or (better yet) 1.0.0-beta5, try upgrading
otherwise - open an !issue and we'll take a look
gitterBot
@raver120
To file an issue, please click 'New Issue' at https://github.com/deeplearning4j/deeplearning4j/issues and provide as much details on your problem, as possible
Alex Black
@AlexDBlack
@montardon hard to say without the model. But that doesn't look like any issue we know about... Are you able to share the model?
if so, open an issue and I'd be happy to take a look
Alex Black
@AlexDBlack
@orausch that sounds off... how you create the array shouldn't make much difference
I mean sure, you have host->device memory copy, but that should be very quick, not hundreds of milliseconds or seconds (unless it's a very huge array - many hundreds of MB or larger)
if you can provide a way to reproduce it in an !issue, we'll take a look
as for debugging performance - we don't yet have an op profiler built into samediff
but, a custom Listener (extending BaseListener) could easily be used to implement a basic profiler
in the preOpExecution method, start your timer; in the opExecution method, finish it... the time between those two is the op execution duration
a java profiler (like yourkit or visualvm) might also provide some insights if it's not due to op execution only
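Alex's timer idea can be sketched as a small standalone accumulator that a custom Listener would delegate to. This is a hedged sketch: the `start`/`stop` wiring from `preOpExecution`/`opExecution` is described only in comments, since the exact BaseListener method signatures vary by SameDiff version; the timer class itself is plain Java.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal per-op timer that a custom SameDiff Listener (extending BaseListener)
// could delegate to: call start(opName) from preOpExecution and stop(opName)
// from opExecution. The listener wiring is omitted here because the exact
// callback signatures differ between releases.
public class OpTimer {
    private final Map<String, Long> inFlight = new HashMap<>();   // op name -> start time
    private final Map<String, Long> totalNanos = new HashMap<>(); // op name -> accumulated time
    private final Map<String, Integer> counts = new HashMap<>();  // op name -> number of executions

    public void start(String opName) {
        inFlight.put(opName, System.nanoTime());
    }

    public void stop(String opName) {
        Long t0 = inFlight.remove(opName);
        if (t0 == null) return; // stop without a matching start: ignore
        long elapsed = System.nanoTime() - t0;
        totalNanos.merge(opName, elapsed, Long::sum);
        counts.merge(opName, 1, Integer::sum);
    }

    public long totalNanos(String opName) {
        return totalNanos.getOrDefault(opName, 0L);
    }

    public int count(String opName) {
        return counts.getOrDefault(opName, 0);
    }
}
```

Printing the accumulated totals after a few calls to sd.outputSingle would show which ops dominate, and whether the per-op times (rather than, say, host-to-device copies) explain the variance.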
gitterBot
@raver120
To file an issue, please click 'New Issue' at https://github.com/deeplearning4j/deeplearning4j/issues and provide as much details on your problem, as possible
javadev-berlin
@javadev-berlin

Hi guys,
I am trying to transfer the style of a room to another room (picture), using the NeuralStyleTransfer example. You can see the results in the images (1st image is the style, 2nd the target, 3rd the result after 900 iterations)

The problem is that I only want the furniture style to be transferred to the target room (furniture to be put on the ground), but the style is being transferred to the walls too ... Is there a way to avoid the style being transferred to the walls and ceiling?

Thanks ....

Alex Black
@AlexDBlack
@Polly1358941334_twitter 1) if the convolutional layer is CNN1d, then InputType.recurrent does make sense
it should really be called InputType.timeSeries or InputType.sequence or something instead
if it's a CNN2D layer, then yeah, not sure why it'd help or work with that...
2) getOutputType - it's the type of activations out, given the input. Most layers return the same type out as in, except for size (so an LSTM might get recurrent(nIn) in, and return recurrent(nOut) out)
3) Depends what's after. Also depends on if the nIn in set in the layer after. The way it works is that any manually set nIn won't be overridden by setInputType. If no nIn is set, then it's taken from the input type
@javadev-berlin not directly
my first guess is that you'd have to use some sort of a mask - either hand drawn, or from the output of a (separate) semantic segmentation net
that would tell you where the floor is (which pixels specifically) and then only those would be modified
I wouldn't be surprised if someone, somewhere, has already done/tried exactly that, might be worth searching for
strolling-coder
@strolling-coder
Apologies if my question is not related to DL4J. Until now I worked with Eclipse and Maven, not with Android Studio, so I might just have an issue with some settings.
I set up a fresh installation of Android Studio 3.5 on openSUSE 15.1 and created an empty Java project. I followed the instructions in the guide to add the dl4j dependencies (1.0.0-beta4). Then in a unit test I tried to create and init a very simple net, but I got a:
java.lang.UnsatisfiedLinkError: no ndicpu in java.library.path
I saw many comments online that pointed to library conflicts, but I thought that in my case the libraries were simply missing at runtime. I tried to change the scope of some dependencies from implementation to testImplementation. I tried to change the test runner from Platform to Gradle, but it didn't help. I still suspect that those libraries are missing, because I stopped at a breakpoint and checked with jconsole: the library path is just the basic one.
What direction should I go to investigate the issue? Can it be just some missing configuration?
Alex Black
@AlexDBlack
@strolling-coder have you seen this page?
https://deeplearning4j.org/docs/latest/deeplearning4j-android
strolling-coder
@strolling-coder
@AlexDBlack Of course, that's the guide I mentioned. I tried all the scope variations, now I am using implementation because compile was deprecated, but the result is the same
Alex Black
@AlexDBlack
is there more to the stack trace? (like a 'caused by')
maybe post the full output / stack trace in a !gist
gitterBot
@raver120
To use gist: paste your code/exception/large output log into https://gist.github.com, click 'Create Secret Gist' and paste URL link here
strolling-coder
@strolling-coder
Here is the build gradle I used (latest attempt) and the error trace:
BTW I also tried RunWith(Junit4.class) but I had to remove it, because it didn't compile
Alex Black
@AlexDBlack
you're trying to run this in IDE, right? if so, you'll need the nd4j-native binaries (can't run android/arm binaries on win/linux/mac and x86 cpu)
or within the emulator? (not sure how our native libraries will work with emulator tbh)
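A hedged sketch of what Alex suggests, for a Gradle build (the 1.0.0-beta4 version matches the user's setup; the android-arm64 and linux-x86_64 classifiers are the published nd4j-native variants, but swap linux-x86_64 for your desktop OS, and check the Android guide for the full artifact set):

```groovy
dependencies {
    // Android/ARM binaries used by the app on-device
    implementation "org.nd4j:nd4j-native:1.0.0-beta4"
    implementation "org.nd4j:nd4j-native:1.0.0-beta4:android-arm64"

    // Desktop x86_64 binaries so local (IDE/JVM) unit tests can load the backend
    testImplementation "org.nd4j:nd4j-native:1.0.0-beta4:linux-x86_64"
}
```

The idea is that plain JUnit tests run on the host JVM, which cannot load the android-arm binaries, so the matching desktop classifier has to be on the test classpath.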
Polly13
@Polly1358941334_twitter
@AlexDBlack Thanks.
1) I see. That layer is CNN1d.
2) Understand
3) "Depends what's after." - For example, you have an Embedding layer after: you can set any integer value in the argument of feedforward(2) and the model will work well.
It might be the bug that was noted here: https://gitter.im/deeplearning4j/deeplearning4j?at=5d9d1afa0e67130aae3b30dd
strolling-coder
@strolling-coder
@AlexDBlack Yes, I was trying to run the test in the IDE. Now I also tried to create an instrumented test and run it in the emulator; in that case I get the error "more than one file was found with OS independent path"
Susan Eraly
@eraly
@Polly1358941334_twitter Input type Recurrent(16) means you have a 3d array with arbitrary batch size and time steps with 16 features each.