These are chat archives for deeplearning4j/deeplearning4j/earlyadopters

6th
Jun 2016
rawhazard
@rawhazard
Jun 06 2016 09:11
hello
I get a fatal error with rc3.10 SNAPSHOT when training a CNN with spark on 10 vCPUs
I didn't get the error last week when I tried using spark with only 2 cpus
Paul Dubs
@treo
Jun 06 2016 09:13
are you actually on master?
i.e. when did you last rebuild everything? You are probably reporting the bad luck bug :)
rawhazard
@rawhazard
Jun 06 2016 09:14
last week if I'm not wrong
like on monday or tuesday something like that
Paul Dubs
@treo
Jun 06 2016 09:15
it was fixed 2 days ago
rawhazard
@rawhazard
Jun 06 2016 09:15
oh ok
so I have to build everything again from scratch?
Paul Dubs
@treo
Jun 06 2016 09:15
you could wait for some hours until the rc3.10 release lands
rawhazard
@rawhazard
Jun 06 2016 09:16
today?
Paul Dubs
@treo
Jun 06 2016 09:16
as far as I know it is being prepared for release at the moment
Alex Black
@AlexDBlack
Jun 06 2016 09:17
right, should be out today
rawhazard
@rawhazard
Jun 06 2016 09:17
ok
I'll wait a while
thanks
rawhazard
@rawhazard
Jun 06 2016 09:36
btw, I only get the fatal error when training with spark
Adam Gibson
@agibsonccc
Jun 06 2016 12:05
After this - we're going to work on the versioning a bit..suggestions?
I kinda wanted to wait till a 1.0 but it'd be good to get something a bit more sane in place
Paul Dubs
@treo
Jun 06 2016 12:09
I guess the next release should be 0.5 and the rc's should be used for actual rc's?
Adam Gibson
@agibsonccc
Jun 06 2016 12:09
that's part of my trouble yes
I'm thinking of migrating to just 0.4 maybe?
then just try to prevent the 0.5 from getting out of control?
lol
Paul Dubs
@treo
Jun 06 2016 12:10
just 0.4 would be fine for me as well
Adam Gibson
@agibsonccc
Jun 06 2016 12:10
yeah ok
Paul Dubs
@treo
Jun 06 2016 12:10
I don't care about numbers really, just their relationship :D
Adam Gibson
@agibsonccc
Jun 06 2016 12:10
ok so we'll let one more release through with debugging
I'll likely be getting some canova love in to this next release and some stuff related to unit tests then I say we cut it
now that the CI is up it should be easier to track changes
I'll hopefully get osx/windows taken care of this week
at least running some sort of a compiler
Paul Dubs
@treo
Jun 06 2016 12:15
An easily automated publishing would be great
Adam Gibson
@agibsonccc
Jun 06 2016 12:15
BY FAR the goal
I've been working on a mix of that and some streaming stuff
(real time learning :D)
Paul Dubs
@treo
Jun 06 2016 12:16
-SNAPSHOT builds could just track master, so people don't need to build from source
Adam Gibson
@agibsonccc
Jun 06 2016 12:16
ideally yes
Yeah that's just harder to setup for the different OSes
that's the main goal
Paul Dubs
@treo
Jun 06 2016 12:19
And on everything else I guess that following semver is pretty useful.
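
A minimal sketch of why the relationship between version numbers matters, using the version strings mentioned in this chat. Plain string comparison sorts `0.4-rc3.10` before `0.4-rc3.9`, while comparing the digit runs as integers gives the intended order. The class `VersionOrder` and its methods are hypothetical names for illustration; this is a simple natural-order comparison, not semver precedence and not Maven's own version ordering (for example, it does not treat `-SNAPSHOT` specially).

```java
import java.util.ArrayList;
import java.util.List;

final class VersionOrder {
    // Splits a version into alternating digit and non-digit runs,
    // e.g. "0.4-rc3.10" becomes ["0", ".", "4", "-rc", "3", ".", "10"].
    static List<String> tokens(String v) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < v.length()) {
            boolean digit = Character.isDigit(v.charAt(i));
            int j = i;
            while (j < v.length() && Character.isDigit(v.charAt(j)) == digit) {
                j++;
            }
            out.add(v.substring(i, j));
            i = j;
        }
        return out;
    }

    // Compares two versions run by run: digit runs as numbers, everything else as text.
    static int compare(String a, String b) {
        List<String> x = tokens(a);
        List<String> y = tokens(b);
        int n = Math.min(x.size(), y.size());
        for (int k = 0; k < n; k++) {
            String s = x.get(k);
            String t = y.get(k);
            boolean numeric = Character.isDigit(s.charAt(0)) && Character.isDigit(t.charAt(0));
            int c = numeric
                    ? Long.compare(Long.parseLong(s), Long.parseLong(t))
                    : s.compareTo(t);
            if (c != 0) {
                return c;
            }
        }
        // All shared runs are equal: the version with extra runs sorts later.
        return Integer.compare(x.size(), y.size());
    }

    public static void main(String[] args) {
        // Plain string order: '9' > '1', so rc3.9 sorts after rc3.10.
        System.out.println("0.4-rc3.9".compareTo("0.4-rc3.10") > 0);
        // Numeric-aware order: 9 < 10, so rc3.9 sorts before rc3.10.
        System.out.println(compare("0.4-rc3.9", "0.4-rc3.10") < 0);
    }
}
```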
rawhazard
@rawhazard
Jun 06 2016 14:14
sorry guys I have a question
the last time i compiled from source
now I want to use the brand new rc3.10, how should I do that? apparently, just changing the pom from rc3.10-SNAPSHOT to rc3.10 is not enough
probably is not correct at all
Paul Dubs
@treo
Jun 06 2016 14:16
actually, it should be enough
rawhazard
@rawhazard
Jun 06 2016 14:16
but my IDE says this: Missing artifact org.deeplearning4j:deeplearning4j-core:jar:0.4-rc3.10
and the same for nd4j and dl4j-ui
Paul Dubs
@treo
Jun 06 2016 14:19
make sure you aren't using some kind of local or caching repository
rawhazard
@rawhazard
Jun 06 2016 14:20
yes dependencies should be correct. (this is my pom https://gist.github.com/rawhazard/7bd92d03d88f40acbecda892e5dd18a4)
not sure about the local or caching repo though
uhm
Paul Dubs
@treo
Jun 06 2016 14:26
checkout the examples and try to see if they run for you
rawhazard
@rawhazard
Jun 06 2016 14:37
I get the same error with the examples
looks like eclipse is looking locally for the dependencies...
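
One way to see what is actually cached locally is to look for the jar in the local Maven repository. The sketch below builds the expected path for the artifact named in the error message, assuming the default repository location `~/.m2/repository` (a `settings.xml` can point elsewhere) and the standard layout of group directories, then artifactId, then version. `LocalRepoCheck` and `jarPath` are hypothetical names for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class LocalRepoCheck {
    // Standard local repository layout: dots in the groupId become directories,
    // followed by artifactId/version/artifactId-version.jar.
    static Path jarPath(Path repoRoot, String groupId, String artifactId, String version) {
        return repoRoot
                .resolve(groupId.replace('.', '/'))
                .resolve(artifactId)
                .resolve(version)
                .resolve(artifactId + "-" + version + ".jar");
    }

    public static void main(String[] args) {
        // Assumption: the default local repository under the user's home directory.
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        Path jar = jarPath(repo, "org.deeplearning4j", "deeplearning4j-core", "0.4-rc3.10");
        System.out.println(jar + " exists: " + Files.exists(jar));
    }
}
```

If the jar is present and `mvn clean install` succeeds from the command line, the remaining problem is more likely in the IDE's project configuration than in Maven itself.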
Alex Black
@AlexDBlack
Jun 06 2016 14:39
re-run mvn eclipse:eclipse after updating your pom
Adam Gibson
@agibsonccc
Jun 06 2016 14:40
Try running mvn clean install on your project from the command line
rawhazard
@rawhazard
Jun 06 2016 14:50
mvn clean install gave me a build success
Paul Dubs
@treo
Jun 06 2016 14:51
great, that means your problem is somewhere within eclipse, try what @AlexDBlack said
rawhazard
@rawhazard
Jun 06 2016 14:54
where should I run that? in my project or?
Paul Dubs
@treo
Jun 06 2016 14:55
in your project
rawhazard
@rawhazard
Jun 06 2016 14:56
didn't help
and something weird happened in the build path
tons of jars are there now
rawhazard
@rawhazard
Jun 06 2016 15:13
I built a new project in eclipse, it's okay now
Dror370
@Dror370
Jun 06 2016 16:22
Hi all, sorry for interrupting. I wonder whether I need OpenBlast installed in order to run the samples on the GPU
Adam Gibson
@agibsonccc
Jun 06 2016 16:22
@Dror370 I'm a bit confused
I think we've already told you you don't need it
you don't need that OR jcuda
Please read what we tell you carefully
and don't try to read in between the lines
There's literally nothing else you need but cuda installed
We have everything else
raver119
@raver119
Jun 06 2016 16:27
GPU doesn't use OpenBLAS. cuBLAS is used for gpu :)
Dror370
@Dror370
Jun 06 2016 16:34
Thanks, I was just going through the full install and it came up again
Adam Gibson
@agibsonccc
Jun 06 2016 16:34
Right so I was wondering what openblast was...
that makes sense
I'm not sure what's hard here though
For building you could try using the dev tools I guess
from red hat
failing all else
Paul Dubs
@treo
Jun 06 2016 16:35
@Dror370 still not sure what you are actually trying to achieve?
Dror370
@Dror370
Jun 06 2016 16:38
trying to run on NVidia titanZ, the examples run well on cpu but not on GPU, even after updating the pom.xml and updating to cuda 7.5
Paul Dubs
@treo
Jun 06 2016 16:39
if you are still seeing things about openblas, you are not using rc3.10
Dror370
@Dror370
Jun 06 2016 16:39
GPU usage doesn't go up when I check nvidia-smi, and the java process still doesn't show up
either it will crash and tell you that there is no backend available OR it will use your GPU
there is no fallback to CPU in nd4j-cuda-7.5
Dror370
@Dror370
Jun 06 2016 16:41
Thanks, I am checking
Dror370
@Dror370
Jun 06 2016 16:56

I still don't see the process in nvidia-smi, and all 7 CPU cores are at 100%. This is my POM:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<prerequisites>
<maven>3.3.9</maven>
</prerequisites>

<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-examples</artifactId>
<version>0.4-rc0-SNAPSHOT</version>

<name>DeepLearning4j Examples</name>
<description>Examples of training different data sets</description>
<properties>
    <!--<nd4j.backend>nd4j-native</nd4j.backend>-->
    <nd4j.backend> nd4j-cuda-7.5</nd4j.backend>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <shadedClassifier>bin</shadedClassifier>
    <java.version>1.8</java.version>
    <nd4j.version>0.4-rc3.9</nd4j.version>
    <dl4j.version>0.4-rc3.9</dl4j.version>
    <canova.version>0.0.0.15</canova.version>
    <guava.version>19.0</guava.version>
    <jfreechart.version>1.0.13</jfreechart.version>
    <maven-shade-plugin.version>2.4.3</maven-shade-plugin.version>
    <exec-maven-plugin.version>1.4.0</exec-maven-plugin.version>
</properties>

<repositories>
    <repository>
        <id>snapshots-repo</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <releases>
            <enabled>false</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>

<distributionManagement>
    <snapshotRepository>
        <id>sonatype-nexus-snapshots</id>
        <name>Sonatype Nexus snapshot repository</name>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    </snapshotRepository>
    <repository>
        <id>nexus-releases</id>
        <name>Nexus Release Repository</name>
        <url>http://oss.sonatype.org/service/local/staging/deploy/maven2/</url>
    </repository>
</distributionManagement>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-native</artifactId>
            <version>${nd4j.version}</version>
        </dependency>
        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-cuda-7.5</artifactId>
            <version>${nd4j.version}</version>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-nlp</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-ui</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>${guava.version}</version>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>${nd4j.backend}</artifactId>
    </dependency>
    <dependency>
        <artifactId>canova-nd4j-image</artifactId>
        <groupId>org.nd4j</groupId>
        <version>${canova.version}</version>
    </dependency>
    <dependency>
        <artifactId>canova-nd4j-codec</artifactId>
        <groupId>org.nd4j</groupId>
        <version>${canova.version}</version>
    </dependency>
    <!-- Used in the RegressionMath
Dror370
@Dror370
Jun 06 2016 17:06
IDEA did an automatic dependency update after the pom.xml change, and only after that did the GPU start working
Paul Dubs
@treo
Jun 06 2016 17:07
so it is working now, great :)
Dror370
@Dror370
Jun 06 2016 17:07
Sorry all, and thanks for the patience, Adam and Paul. Sorry for taking my frustration out on you all.
raver119
@raver119
Jun 06 2016 17:08
don't forget to update to 3.10
it's faster and expected to be less buggy :)
Dror370
@Dror370
Jun 06 2016 17:11
Sure, thanks. A couple of remarks for IDEA: after changing pom.xml, synchronize the dependencies, and remember that a JDK change must also include configuring the module sources, otherwise the diamond operator <> does not work, especially with the update to 1.8.
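
For reference, a small sketch of the kind of code that fails to compile when the module's language level is too low. The diamond operator `<>` needs source level 7 or higher, and lambdas together with `Map.computeIfAbsent` need Java 8. `DiamondDemo` and `byLength` are hypothetical names for illustration and are not taken from the examples project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DiamondDemo {
    // Groups words by their length.
    static Map<Integer, List<String>> byLength(List<String> words) {
        // Diamond operator: the compiler infers <Integer, List<String>> (source level 7+).
        Map<Integer, List<String>> groups = new HashMap<>();
        for (String w : words) {
            // Lambda and computeIfAbsent: both require Java 8.
            groups.computeIfAbsent(w.length(), k -> new ArrayList<>()).add(w);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(byLength(Arrays.asList("nd4j", "dl4j", "canova")));
    }
}
```

If this compiles with Maven on the command line but shows errors in the IDE, the IDE's module language level is probably lower than the `java.version` set in the POM.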
Paul Dubs
@treo
Jun 06 2016 19:04
Now that rc3.10 is out, is there any reason not to merge deeplearning4j/libnd4j#229 ?
Adam Gibson
@agibsonccc
Jun 06 2016 19:05
No do it
Adam Gibson
@agibsonccc
Jun 06 2016 19:12
So I have a canova/kafka/spark streaming setup now
Not sure what to do with this yet
We can pipe/stream stuff in to dl4j now
Paul Dubs
@treo
Jun 06 2016 19:15
now integrate it with riemann :P
Adam Gibson
@agibsonccc
Jun 06 2016 19:16
*google search, sees .rb extension, closes tab*
Paul Dubs
@treo
Jun 06 2016 19:16
try again, you should see .clj extension :D
Adam Gibson
@agibsonccc
Jun 06 2016 19:16
*sees .clj tab, looks for java interface, closes tab*
Paul Dubs
@treo
Jun 06 2016 19:16
the talk directly on the frontpage of riemann.io is pretty entertaining
Adam Gibson
@agibsonccc
Jun 06 2016 19:16
:P
Paul Dubs
@treo
Jun 06 2016 19:17
I know, given that camel integrates with pretty much everything, you can get people to use it instead :D
Adam Gibson
@agibsonccc
Jun 06 2016 19:21
yesssss
Ben Wellner
@wellner
Jun 06 2016 22:39
I have a simple convnet example slightly modified from the examples that works fine with nd4j-native backend, but when I run using a GPU (Tesla K80) I get errors:
[error] (UniGC thread 3) java.lang.RuntimeException: java.lang.IllegalStateException: Can't allocate [DEVICE] special buffer memory!
18:36:38.678 [UniGC thread 2] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new context...
18:36:38.679 [UniGC thread 2] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for device [0]...
[error] (UniGC thread 2) java.lang.RuntimeException: java.lang.IllegalStateException: Can't allocate [DEVICE] allocation buffer memory!
java.lang.RuntimeException: java.lang.IllegalStateException: Can't allocate [DEVICE] special buffer memory!
    at org.nd4j.jita.allocator.context.impl.BasicContextPool.acquireContextForDevice(BasicContextPool.java:137)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.getCudaContext(CudaZeroHandler.java:1064)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.getDeviceContext(CudaZeroHandler.java:1054)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.getDeviceContext(AtomicAllocator.java:165)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:76)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:104)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.purgeDeviceObject(CudaZeroHandler.java:941)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.purgeDeviceObject(AtomicAllocator.java:408)
    at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:581)
Adam Gibson
@agibsonccc
Jun 06 2016 22:44
Looks like it's running out of memory
java.lang.IllegalStateException: Can't allocate [DEVICE] allocation buffer memory!
Have you looked at nvidia-smi to verify this?