These are chat archives for deeplearning4j/deeplearning4j/earlyadopters

5th
Jun 2016
Patrick Skjennum
@Habitats
Jun 05 2016 01:56
[screenshot]
what's with the pointer allocations?
Adam Gibson
@agibsonccc
Jun 05 2016 01:57
hmm the only thing we did was change the internal pointer references from longs to pointers
Before we just mark this as bad I'd like to actually see if this affects the numbers or not
We're down to the wire in terms of optimizations being made at this point
Patrick Skjennum
@Habitats
Jun 05 2016 03:44
@agibsonccc @saudet dl4j now uses 60gb memory, with the heap capped at only 20gb, of which just 6gb is in use... and this is with the CMS GC
and forcing gc does not reduce memory usage
Adam Gibson
@agibsonccc
Jun 05 2016 03:45
up from what?
Patrick Skjennum
@Habitats
Jun 05 2016 03:45
and the program eventually crashes with oom/segfault
Adam Gibson
@agibsonccc
Jun 05 2016 03:45
up from what?
Patrick Skjennum
@Habitats
Jun 05 2016 03:47
it never goes above 40gb total on my windows machine, but the unix instances are going crazy
Adam Gibson
@agibsonccc
Jun 05 2016 03:47
Well what I'm trying to figure out here is if our recent changes were the cause?
Answer my first question here first
What was the baseline a few days ago?
Patrick Skjennum
@Habitats
Jun 05 2016 03:47
read what i just wrote?
Adam Gibson
@agibsonccc
Jun 05 2016 03:48
Well no
I meant a few days ago
Like is this relative to something?
Or is this just the first time you're checking?
Patrick Skjennum
@Habitats
Jun 05 2016 03:48
no it never crashed with oom running this job, i don't have exact numbers
but it didn't crash and burn
Adam Gibson
@agibsonccc
Jun 05 2016 03:49
You still haven't given me an actual answer
I'm specifically asking for what it was a few days ago
Patrick Skjennum
@Habitats
Jun 05 2016 03:54
and i'm specifically telling you i don't have exact numbers
Adam Gibson
@agibsonccc
Jun 05 2016 03:54
Right but why wasn't this discussed a few days ago?
If there were memory problems you should have said something earlier
Patrick Skjennum
@Habitats
Jun 05 2016 03:55
yeah i didn't fucking know until now
Adam Gibson
@agibsonccc
Jun 05 2016 03:55
You were doing all this benchmarking o_0
It just surprises me because you had the profiler open the whole time
I mean we can dig into it but we'd need to really zoom in on this
Patrick Skjennum
@Habitats
Jun 05 2016 03:56
yeah on my local machine, where this problem doesn't exist
it only happens on my unix vms
Adam Gibson
@agibsonccc
Jun 05 2016 03:57
the joyent ones?
Patrick Skjennum
@Habitats
Jun 05 2016 03:57
yes
Adam Gibson
@agibsonccc
Jun 05 2016 03:57
not sure what to tell you there
Patrick Skjennum
@Habitats
Jun 05 2016 04:03
is there no way to force a gc on libnd4j?
Adam Gibson
@agibsonccc
Jun 05 2016 04:03
it's all offheap
Look at it from our perspective here a bit - how can we know that or even test for it?
You're the only one using joyent machines
We can try to speculate what's wrong but we'd have to do profiling on our side somehow
How do you even get access to those? I'm serious
Patrick Skjennum
@Habitats
Jun 05 2016 04:06
i gave treo access to one of my joyents
Adam Gibson
@agibsonccc
Jun 05 2016 04:06
I'd say file an issue and let him dig in then
He can probably distill the issue down to something we can actually fix
This is likely going to be a one-off glibc thing or something crazy
or the alloc/dealloc behavior specific to those machines
we've encountered that on windows for instance
Keep in mind I'm not saying you're wrong or anything
I'm just trying to understand how we can debug this
Patrick Skjennum
@Habitats
Jun 05 2016 04:07
but what's the estimated off-heap usage anyway?
is there any?
Adam Gibson
@agibsonccc
Jun 05 2016 04:07
All I can say is file an issue
You just asked me: "Can you predict the usage of an arbitrary algorithm for an arbitrary problem?"
Patrick Skjennum
@Habitats
Jun 05 2016 04:08
atm i don't know how to identify if there's a problem other than when it crashes
Adam Gibson
@agibsonccc
Jun 05 2016 04:08
It's the same as watching the speed
You'd watch the memory usage over time and plot the allocations
We'd also need relative numbers for a known working environment
eg: linux/windows
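For reference, a minimal sketch of that kind of monitoring using only standard JMX beans; nothing here is dl4j-specific, and the class name and sampling interval are arbitrary:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Periodically log heap vs. JVM-tracked non-heap usage so the numbers can be
// plotted over time and compared against a known-good machine.
public class MemoryWatcher {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println("timestampMs,heapMb,nonHeapMb");
        while (true) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
            System.out.printf("%d,%d,%d%n",
                    System.currentTimeMillis(),
                    heap.getUsed() / (1024 * 1024),
                    nonHeap.getUsed() / (1024 * 1024));
            Thread.sleep(10_000); // sample every 10 seconds
        }
    }
}
```

Note that buffers allocated natively by libnd4j won't show up in the JVM's non-heap counter, so the process RSS from top/ps is the number worth plotting alongside these.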
Patrick Skjennum
@Habitats
Jun 05 2016 04:09
yeah well, is it that far-fetched to estimate it within an order of magnitude?
Adam Gibson
@agibsonccc
Jun 05 2016 04:10
exact numbers are what matter here
You just asked me to pick a discrete point on a continuous distribution
Eg: can't be done
Magnitudes aren't useful
only relative numbers
I'm being a hard ass about this so we can actually try to FIX the problem
not just randomly guess
Patrick Skjennum
@Habitats
Jun 05 2016 04:11
no i asked for an estimate. it's a 1000-input network with two hidden layers (500, 300) and 800k training examples
is that supposed to use 60gb offheap mem?
prob not?
Adam Gibson
@agibsonccc
Jun 05 2016 04:11
ok so you're getting better now
So that's going to be relative to the number of partitions, a batch size, and your data type (float/double)
Patrick Skjennum
@Habitats
Jun 05 2016 04:12
i'm using floats
Adam Gibson
@agibsonccc
Jun 05 2016 04:12
Then fill in the other variables
You can compute rough memory usage per core if you know the batch size
Patrick Skjennum
@Habitats
Jun 05 2016 04:12
250 atm
Adam Gibson
@agibsonccc
Jun 05 2016 04:12
It also depends how much of your data is in memory
Right I'm not going to do the calculation for you
I'm telling you how to do it
Eg: calculate the ram usage from the matrices floating around on each core
You'd know all those variables
Eg: on each core there's a job that has 1 neural net
that neural net has k parameters
and the batches have a certain length * the data type size
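As a rough back-of-the-envelope sketch of that arithmetic for the network described above (1000 inputs, hidden layers of 500 and 300, batch size 250, float data; the output layer size isn't stated in the chat, so nOut below is a placeholder, and gradients, updater state and extra copies are ignored):

```java
// Back-of-the-envelope memory estimate for one copy of the network above.
public class RoughMemoryEstimate {
    public static void main(String[] args) {
        final int bytesPerElement = 4;   // float
        final int batchSize = 250;
        final int nIn = 1000, h1 = 500, h2 = 300, nOut = 2; // nOut is a guess

        // weights + biases per layer
        long params = (long) nIn * h1 + h1
                    + (long) h1 * h2 + h2
                    + (long) h2 * nOut + nOut;

        // activations for a single minibatch at each layer (input included)
        long activations = (long) batchSize * (nIn + h1 + h2 + nOut);

        long bytes = (params + activations) * bytesPerElement;
        System.out.printf("params=%d, ~%.1f MB per copy%n",
                params, bytes / (1024.0 * 1024.0));
    }
}
```

Even allowing for gradients and updater state, that comes out on the order of megabytes, not gigabytes, so a 60gb footprint would have to come from somewhere else - e.g. how much of the 800k-example dataset is held in memory at once, or native buffers that never get released.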
Patrick Skjennum
@Habitats
Jun 05 2016 04:14
yeah there isn't even any parallelization here, the job that crashed trained a single net on a single core
Adam Gibson
@agibsonccc
Jun 05 2016 04:14
You can at least get magnitude estimates from that
well even better then
You should be able to get the ram usage based on the data type and neural net/batch size
Start calculating those numbers and hit @treo up
I have a plane ride to get ready for here in a bit
or file a github issue
Alex Black
@AlexDBlack
Jun 05 2016 05:12
@treo you had libnd4j (cuda) compiling on Windows? seeing an issue there currently (cc @raver119)
https://gist.github.com/AlexDBlack/ec14ef827b41713ca615519554adae40
raver119
@raver119
Jun 05 2016 05:42
@AlexDBlack the last time i compiled master, 6 hours ago, everything was fine
@AlexDBlack current libnd4j master still compiles for me, just checked both debug and release
Adam Gibson
@agibsonccc
Jun 05 2016 05:57
JFYI for you guys - I'm working on an integrated streaming lib now
spark/kafka/canova/camel
Hopefully we can see some online learning happening once this is done :D
Alex Black
@AlexDBlack
Jun 05 2016 06:22
@raver119 thanks. probably something on my system then. not that I've changed anything recently :/
Paul Dubs
@treo
Jun 05 2016 08:44
@AlexDBlack try deleting the checkout and starting with a fresh clone. And make sure your storage isn't full.
Alex Black
@AlexDBlack
Jun 05 2016 09:37
hm, deleted and re-cloned, all good now
Paul Dubs
@treo
Jun 05 2016 10:11
CMake has some cache files that can sometimes make life harder. And as they are ignored by git, the repository looks clean even though it isn't
Ruben Fiszel
@rubenfiszel
Jun 05 2016 10:58
Hello, will you be able to update libnd4j and the deps on maven-central today?
Alex Black
@AlexDBlack
Jun 05 2016 11:05
@treo good to know, thanks
@atollFP release is going out tomorrow, I believe
Alex Black
@AlexDBlack
Jun 05 2016 14:45
just a heads up in case anyone is interested:
I've added a basic hyperparameter optimization example (random search over neural network hyperparameters) using Arbiter to the dl4j examples repo
deeplearning4j/dl4j-0.4-examples#154
I'll probably get a more advanced example up there in addition to that, at some point. Plus we need docs on Arbiter at some point too...
Dror370
@Dror370
Jun 05 2016 16:50
Hi All, I wonder if someone can upload a full step-by-step Deeplearning4j/nd4j installation guide for GPU running on Ubuntu 14.04 64-bit.
I was trying to go over the full installation, but I am still unable to run on GPU. It seems that I have a problem with the jcuda loading/jblas??
Adam Gibson
@agibsonccc
Jun 05 2016 18:33
@Dror370 not sure where we say to install jcuda
Is there some out of date tutorial on the internet I need to complain about?
If not chances are it's your cuda installation
Dl4j doesn't take any setup unless you are compiling from source
You are
Dror370
@Dror370
Jun 05 2016 20:06
Thanks in advance. I am using Caffe with cuDNN on an Nvidia Titan Z, and everything is OK,
Adam Gibson
@agibsonccc
Jun 05 2016 20:10
@Dror370 Not sure what the difference would be - do you have a concrete error of some kind if you just use rc3.9 and nd4j-cuda-7.5?
Is nvcc on your path?
Adam Gibson
@agibsonccc
Jun 05 2016 20:12
Could you just answer my question?
Your reply makes no sense
I'm asking if you have an error and if nvcc is on your path
Dror370
@Dror370
Jun 05 2016 20:14
Sorry, I am using rc3.9 and my NVidia cuda version is 7.0.28
Adam Gibson
@agibsonccc
Jun 05 2016 20:14
k so upgrade to 7.5
"but why?!" : we didn't get around to backporting stuff yet
Just upgrade
if you can't, file an issue
Dror370
@Dror370
Jun 05 2016 20:15
It seems that my backend is running on my CPU instead of the GPU, even when I change the pom.xml
Adam Gibson
@agibsonccc
Jun 05 2016 20:15
You aren't changing your pom right AND your cuda version is wrong
Are you modifying the examples?
Look for nd4j-native
If so, you need to change the nd4j backend in the properties
Change that to nd4j-cuda-7.5
Dror370
@Dror370
Jun 05 2016 20:17
Not at all, I am just trying to make it run on the GPU
Adam Gibson
@agibsonccc
Jun 05 2016 20:17
k
so you are starting from scratch?
Can I see your pom maybe?
Dror370
@Dror370
Jun 05 2016 20:18
This is my POM:
<dependencyManagement>
  <dependencies>
    <!--
    <dependency>
      <groupId>org.nd4j</groupId>
      <artifactId>nd4j-native</artifactId>
      <version>${nd4j.version}</version>
    </dependency>
    -->
    <dependency>
      <groupId>org.nd4j</groupId>
      <artifactId>nd4j-cuda-7.0</artifactId>
      <version>${nd4j.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
Adam Gibson
@agibsonccc
Jun 05 2016 20:18
So again
I've said twice
7.0 is not supported at all
I've also said for the 3rd time now you need to upgrade
nd4j-cuda-7.0 isn't in there
I've also told you why
Scroll up
We also say what the available versions are right on that page
one of: 7.5
Dror370
@Dror370
Jun 05 2016 20:20
Thanks for your patience, I am new here, coming from Caffe and Torch
Adam Gibson
@agibsonccc
Jun 05 2016 20:20
That's all we support atm
Right sure
I'm just trying to get you to read a bit closer
Dror370
@Dror370
Jun 05 2016 20:21
Thanks a lot, I appreciate this
Adam Gibson
@agibsonccc
Jun 05 2016 20:21
yup
The docs could use a lot of work - so if you have anything that's not readable feel free to open up issues on the deeplearning4j/deeplearning4j or deeplearning4j/nd4j repos
Dror370
@Dror370
Jun 05 2016 20:22
I promise to remember this, thanks. There are still problems with cuda 7.5 and Caffe, so I was avoiding the upgrade
Adam Gibson
@agibsonccc
Jun 05 2016 20:23
really?! huh
We're working on supporting 8 right now o_0
good to know
Maybe a docker container might work?
The normal workflow for the docker containers is to mount a volume
You mount a volume on the docker container that's persistent on your host os
and you modify the code in there - the docker container picks it up
Dror370
@Dror370
Jun 05 2016 20:25
I came to deeplearning4j because of the DBN example; I was trying to implement it using Caffe and it was very hard to get working.
Adam Gibson
@agibsonccc
Jun 05 2016 20:25
It's getting there
Our neural nets are made with a numpy for java
so it's fairly easy to add layers and the like
not as easy as tensorflow/theano yet but not terrible
hence the backend concept
it may be a bit disorienting at first
it's better than recompiling though
k I got a flight in a few here - keep posting if you have other questions
Dror370
@Dror370
Jun 05 2016 20:28
The examples work fine on CPU, but it is not practical for me to train without GPU capabilities. Take care and have a good flight
Thanks, you are doing a great job, maybe we will meet at ILSVRC 2016
Adam Gibson
@agibsonccc
Jun 05 2016 20:39
@Dror370 you just need to upgrade
And I gave you a way to run cuda
I'm not sure what else you are missing
It's not like we are just flat out missing gpu capabilities
Also.. doubtful - dl4j and the company behind it are enterprise-focused
You are more likely to see me at a fintech conference