These are chat archives for deeplearning4j/deeplearning4j/earlyadopters

23rd
May 2016
Paul Dubs
@treo
May 23 2016 07:36
@crockpotveggies even if you don't want to go the extra mile of building a more current libiomp5, you should install it and set LD_PRELOAD=/usr/lib/libiomp5.so (or whatever the path is for you :D)
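One way to confirm that the preload took effect is to look for libiomp5 in the memory map of the running JVM. From a shell, `grep libiomp5 /proc/<pid>/maps` does the same thing. The following is a minimal C sketch that assumes a Linux-style `/proc`. `maps_contain` is a hypothetical helper and is not part of dl4j or nd4j:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if any line of the given maps file (e.g. "/proc/12345/maps"
 * for a running JVM) contains `needle`, 0 if none does, and -1 if the
 * file cannot be opened. Lines longer than the buffer are read in chunks;
 * real maps lines are far shorter than 4096 bytes. */
static int maps_contain(const char *maps_path, const char *needle) {
    char line[4096];
    FILE *f = fopen(maps_path, "r");
    if (f == NULL) {
        return -1;
    }
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, needle) != NULL) {
            found = 1;
        }
    }
    fclose(f);
    return found;
}
```

For example, start the JVM with `LD_PRELOAD=/usr/lib/libiomp5.so` and call `maps_contain("/proc/12345/maps", "libiomp5")`, where 12345 stands in for the JVM's pid. The call should return 1.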
Andreas Eberle
@andreas-eberle
May 23 2016 08:10
@raver119: Are the cuda fixes out in a new / updated release? Or do I still have to compile from source?
Adam Gibson
@agibsonccc
May 23 2016 08:11
Compile atm
Andreas Eberle
@andreas-eberle
May 23 2016 08:21
k, thx
Patrick Skjennum
@Habitats
May 23 2016 11:43
is it not possible to load models that i have trained on one computer on another ... or on a build that differs by 1 git commit?
Paul Dubs
@treo
May 23 2016 11:44
@Habitats it should be possible
what problem are you seeing?
Patrick Skjennum
@Habitats
May 23 2016 11:44
getting weird deserialization errors ... but if it should be possible, the problem is probably on my end
Paul Dubs
@treo
May 23 2016 11:45
You may have the problem that you can't load the updater, but the model and parameters should load fine
Patrick Skjennum
@Habitats
May 23 2016 11:47
btw i have very limited time to do testing atm:\ i need to get my own shit done
David Kolb
@Treiblesschorle
May 23 2016 11:48
Hey guys, is there a way to swap between CPU and GPU in java if i have both dependencies?
Paul Dubs
@treo
May 23 2016 11:48
@Treiblesschorle not yet
having both dependencies in your pom.xml is currently a recipe for problems
David Kolb
@Treiblesschorle
May 23 2016 11:50
ok thanks, do you plan to add this?
Paul Dubs
@treo
May 23 2016 11:51
As far as I know it is planned, but I don't know when to expect it
David Kolb
@Treiblesschorle
May 23 2016 11:51
ok thanks for the info
Adam Gibson
@agibsonccc
May 23 2016 13:41
@treo #pragma omp parallel for simd schedule(guided)
^
In file included from /opt/libnd4j/blas/cpu/../NativeOpExcutioner.h:12:0,
from /opt/libnd4j/blas/cpu/NativeOps.cpp:6:
/opt/libnd4j/include/reduce3.h: In member function 'T functions::reduce3::Reduce3<T>::execScalar(T, int, T, T, int*)':
/opt/libnd4j/include/reduce3.h:610:26: error: expected '#pragma omp' clause before 'simd'

#pragma omp parallel for simd

You see this?
Paul Dubs
@treo
May 23 2016 13:42
nope
Adam Gibson
@agibsonccc
May 23 2016 13:42
this came up on one of my vms :P
Paul Dubs
@treo
May 23 2016 13:42
gcc version?
Adam Gibson
@agibsonccc
May 23 2016 13:42
I might have to remove those
older
it's centos
Is there a version of gcc we require now?
I'm assuming you're abusing newer tricks?
Paul Dubs
@treo
May 23 2016 13:43
Not really abusing :D
It just tells it to also use simd when parallelizing
Adam Gibson
@agibsonccc
May 23 2016 13:44
right
Could you try running JUST the cpu compilation on centos or something?
Paul Dubs
@treo
May 23 2016 13:44
according to http://openmp.org/mp-documents/OpenMP-4.0-C.pdf it is valid openmp 4.0
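Below is a minimal sketch of the construct under discussion, with a fallback for compilers that only support OpenMP 3.1. The dot-product function is a hypothetical stand-in and is not the actual loop in libnd4j's `reduce3.h`:

```c
/* Dot product parallelized with OpenMP. With OpenMP 4.0 or newer
 * (_OPENMP >= 201307, e.g. GCC 4.9+) the combined "parallel for simd"
 * construct splits the loop across threads AND asks the compiler to
 * vectorize each thread's chunk. Older OpenMP (GCC 4.8 reports 3.1)
 * rejects that construct, so fall back to plain "parallel for".
 * Without -fopenmp, _OPENMP is undefined and the loop runs serially. */
double dot(const double *a, const double *b, long n) {
    double sum = 0.0;
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp parallel for simd schedule(guided) reduction(+:sum)
#elif defined(_OPENMP)
#pragma omp parallel for schedule(guided) reduction(+:sum)
#endif
    for (long i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```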
Adam Gibson
@agibsonccc
May 23 2016 13:45
I'm curious to see if there's something I'm missing here
sec will try locally as well
Paul Dubs
@treo
May 23 2016 13:45
which gcc version is it there?
I'm using 5.3 on windows and linux currently
Adam Gibson
@agibsonccc
May 23 2016 13:45
honestly not sure
it's in a docker container with centos 7
sec
Paul Dubs
@treo
May 23 2016 13:46

GCC 4.9 supports OpenMP 4.0 for C/C++

from the GCC Wiki

Adam Gibson
@agibsonccc
May 23 2016 13:47
gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
:P
yeah that's the latest one
Paul Dubs
@treo
May 23 2016 13:49
so on centos the best we can have is openmp 3.1
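You can read the OpenMP level a compiler provides from the `_OPENMP` macro. It holds the yyyymm date of the spec version the compiler implements: 201107 (3.1) for GCC 4.7/4.8, and 201307 (4.0) from GCC 4.9 on. Here is a small sketch for decoding it. Both helper names are hypothetical:

```c
#include <stddef.h>

/* Maps the value of the _OPENMP macro (a yyyymm date) to the spec
 * version it denotes. Returns NULL for dates not in this table. */
static const char *openmp_spec_version(long date) {
    switch (date) {
    case 200505: return "2.5";
    case 200805: return "3.0";
    case 201107: return "3.1";
    case 201307: return "4.0";
    case 201511: return "4.5";
    case 201811: return "5.0";
    default:     return NULL;
    }
}

/* The _OPENMP value this translation unit was compiled with,
 * or 0 when OpenMP is not enabled (no -fopenmp). */
static long compiled_openmp_date(void) {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```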
Adam Gibson
@agibsonccc
May 23 2016 13:50
nah
I'm going to upgrade our docker containers
they fixed this a long time ago
was just verifying
Paul Dubs
@treo
May 23 2016 13:51
so the minimal GCC version is now 4.9 (as documented :D)
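One way to make the 4.9 minimum explicit is a preprocessor guard in a shared header. An older GCC then fails with a readable message instead of the cryptic `expected '#pragma omp' clause before 'simd'` error. This sketch assumes such a header exists and is not taken from libnd4j. A CMake check against `CMAKE_C_COMPILER_VERSION` would be an alternative:

```c
/* Fail the build early, with a clear message, when an older GCC is used.
 * Clang and ICC also define __GNUC__ (with their own values), so they are
 * excluded from this particular check. */
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
#  if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 9)
#    error "GCC 4.9 or newer is required (OpenMP 4.0 'parallel for simd')"
#  endif
#endif

/* Runtime helper: 1 if version major.minor is at least req_major.req_minor. */
static int version_at_least(int major, int minor, int req_major, int req_minor) {
    return major > req_major || (major == req_major && minor >= req_minor);
}
```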
Melanie Warrick
@nyghtowl
May 23 2016 17:42
@treo I've updated the aws instance to GCC 5.1 but we're still getting compile errors: https://gist.github.com/nyghtowl/3ab547e898efca45c2e7079a619bbab1 any recommendations?
Paul Dubs
@treo
May 23 2016 18:17

@nyghtowl

-- The C compiler identification is GNU 4.8.3

I guess it is an ubuntu based ami?
In that case you will have to update the alternatives
Susan Eraly
@eraly
May 23 2016 18:18
@treo So I got it to build after moving stuff around in /usr/bin
and @nyghtowl
Paul Dubs
@treo
May 23 2016 18:18
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5 --slave /usr/bin/gfortran gfortran /usr/bin/gfortran-5
something along the lines of that should be it
Susan Eraly
@eraly
May 23 2016 18:18
But now the issue is that nvcc complains because it doesn't accept versions newer than 4.9
So cpu will build with the newer version of gcc but that will not work for cuda
Paul Dubs
@treo
May 23 2016 18:19
Now that's unfortunate... haven't built cuda on linux yet
the first answer indicates that it should work
Susan Eraly
@eraly
May 23 2016 18:20
Hmm
Let me go take a look
Eh. It's the stuff for simd for the cpu cores.
Paul Dubs
@treo
May 23 2016 18:24
When building cuda it shouldn't care about that anyway
Susan Eraly
@eraly
May 23 2016 18:25
I guess you could build cpu first and then exclude it and build cuda
But that is so so so very hacky
Paul Dubs
@treo
May 23 2016 18:25
I'm not quite sure I can follow... what do you mean by exclude it?
Susan Eraly
@eraly
May 23 2016 18:27
From what I understand the post says you just exclude the header for the MMX registers from the gcc compiler (/usr/lib/gcc/x86_64-redhat-linux/5.3.1/include/mwaitxintrin.h) which I think means you don't get SIMD support.
So this is not a problem on windows?
Paul Dubs
@treo
May 23 2016 18:28
not at all... on windows the visual studio compiler is used for cuda
I can build a cuda executable without having a cuda capable card, right?
Susan Eraly
@eraly
May 23 2016 18:29
I think so.
At least you can hack your way to it
Paul Dubs
@treo
May 23 2016 18:29
great, then I'll try it on my linux vm
Susan Eraly
@eraly
May 23 2016 18:29
Yeah. That would be good. Thanks.
Paul Dubs
@treo
May 23 2016 18:31
Given the documentation here: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#axzz49VNTcH1C GCC 4.9 is shown as supported
let's see how we can force it to support 5.x :D
Susan Eraly
@eraly
May 23 2016 18:35
@treo So Melanie made the same point about how 4.9 is supported. What about downgrading to 4.9? Will cpu build with that? Sorry, I am missing the background on why we had to switch
Paul Dubs
@treo
May 23 2016 18:35
Yes, 4.9 is the minimal version
But I'm not sure everything will be as fast as possible with that
Susan Eraly
@eraly
May 23 2016 18:37
Is there an issue I can go read on this? It seems like we are caught between a rock and a hard place here.
Paul Dubs
@treo
May 23 2016 18:37
I'm not aware of one
You can have both 4.9 and 5.x (and 6.x) at the same time on the system
and building with one over the other is just an update-alternatives away
I think you can even just pass environment variables if you want
Susan Eraly
@eraly
May 23 2016 18:39
I can. :grimacing: but do I want to? I am not the one who packages this stuff for release, so should I care?
Paul Dubs
@treo
May 23 2016 18:40
The binaries only care about the glibc version anyway
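For reference, C can print both the glibc a binary was built against and the glibc it is running on. This sketch assumes a glibc system, since `__GLIBC__` and `gnu_get_libc_version()` are glibc-specific. On other C libraries the helper just returns -1. `describe_glibc` is hypothetical:

```c
#include <stdio.h>   /* also pulls in <features.h>, which defines __GLIBC__ */
#ifdef __GLIBC__
#include <gnu/libc-version.h>
#endif

/* Writes "built against glibc X.Y, running on Z" into buf.
 * Returns 0 on success, -1 if this is not a glibc system. */
static int describe_glibc(char *buf, size_t len) {
#ifdef __GLIBC__
    snprintf(buf, len, "built against glibc %d.%d, running on %s",
             __GLIBC__, __GLIBC_MINOR__, gnu_get_libc_version());
    return 0;
#else
    if (len > 0) {
        buf[0] = '\0';
    }
    return -1;
#endif
}
```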
Susan Eraly
@eraly
May 23 2016 18:41
We'll try :)
Patrick Skjennum
@Habitats
May 23 2016 18:44
i don't know if this is dl4j-related, but one of my machines (joyent) suddenly started never giving up its memory
it just increases until it runs out, and then it segfaults
the jobs run fine on my desktop and on google
Paul Dubs
@treo
May 23 2016 18:45
the joyent thing is getting weirder by the day
Patrick Skjennum
@Habitats
May 23 2016 18:45
doesn't matter which jobs i run ... it always happens
yeah idno dude
Paul Dubs
@treo
May 23 2016 18:45
Do you know which SmartOS version it is running?
Patrick Skjennum
@Habitats
May 23 2016 18:45
it's pretty rad when it works, but it started doing this crazy shit yesterday
no, how do i check that?
actually, it crashes because it increases swap beyond its limit
(i've even turned swap off, but it's using it anyway...)
Paul Dubs
@treo
May 23 2016 18:47
don't know if you can from inside one of the lx zones
you'll probably have to poke around in /sys and /proc and maybe check what dmesg says
and then there are also bugs like this: https://smartos.org/bugview/OS-3985
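On the swap question, `/proc/meminfo` shows the system's own view through its `SwapTotal`, `SwapFree` and `MemTotal` fields. Inside an lx zone this file is presumably produced by SmartOS's emulated `/proc`, so it is worth comparing against what the JVM reports. This sketch assumes the standard Linux `Field:  value kB` format. `meminfo_kb` is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Returns the value (in kB) of a /proc/meminfo field such as "SwapTotal",
 * or -1 if the file cannot be read or the field is absent. */
static long meminfo_kb(const char *field) {
    char line[256];
    size_t flen = strlen(field);
    long value = -1;
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL) {
        return -1;
    }
    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines look like "SwapTotal:       2097148 kB". */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            if (sscanf(line + flen + 1, "%ld", &value) != 1) {
                value = -1;
            }
            break;
        }
    }
    fclose(f);
    return value;
}
```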
Patrick Skjennum
@Habitats
May 23 2016 18:54
dmesg didn't do anything
no output
Paul Dubs
@treo
May 23 2016 18:55
not unexpected...
Patrick Skjennum
@Habitats
May 23 2016 18:55
look at this nonsense
[image]
it's running out of swap for absolutely no reason
even when swap is turned off
Paul Dubs
@treo
May 23 2016 18:56
what's the result of uname -a ?
Patrick Skjennum
@Habitats
May 23 2016 18:56
Linux sparktest 3.13.0 BrandZ virtual linux x86_64 x86_64 x86_64 GNU/Linux
Paul Dubs
@treo
May 23 2016 18:58
ok... so, the actual problem seems to be that you are running on top of the linux emulation layer of smartos, which itself is solaris
or illumos or whatever they are calling the non oracle fork nowadays
As we can see this emulation layer has all kinds of weird behavior
and even their own compatibility documentation claims only partial java compatibility for it
I'd love to see how this behaves if you were given a solaris based zone, but you have already enough on your plate with your thesis deadline looming over your head
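The `BrandZ virtual linux` string in the `uname -a` output is the tell-tale sign of the lx emulation layer. Here is a sketch of detecting it programmatically. It assumes the marker sits in the `version` field of `struct utsname`, which is where `uname -a` prints it. The function name is hypothetical:

```c
#include <string.h>
#include <sys/utsname.h>

/* Returns 1 if the kernel version string contains "BrandZ" (as reported
 * inside a SmartOS lx zone), 0 if it does not, and -1 if uname() fails. */
static int running_in_brandz_zone(void) {
    struct utsname u;
    if (uname(&u) != 0) {
        return -1;
    }
    return strstr(u.version, "BrandZ") != NULL;
}
```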
Patrick Skjennum
@Habitats
May 23 2016 19:01
;|
yeah
and i just figured out dl4j's fscore didn't match my own implemented evals, so all of my results up until now need to be scrapped lol
so yeah, suddenly got a lot more to do
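On the F-score mismatch: two implementations can disagree without either being buggy, because multi-class F1 can be averaged in different ways. The sketch below computes three common variants from one confusion matrix. It makes no claim about which one DL4J's evaluation used at the time. `f1_scores` and `f1_summary` are hypothetical names:

```c
/* Averaged F-scores from a k x k confusion matrix stored row-major:
 * m[actual * k + predicted]. */
typedef struct {
    double macro_f1;        /* mean of per-class F1 scores */
    double f1_of_macro_pr;  /* F1 computed from mean precision and mean recall */
    double micro_f1;        /* pooled counts; equals accuracy for single-label data */
} f1_summary;

static double safe_div(double num, double den) {
    return den == 0.0 ? 0.0 : num / den;
}

static f1_summary f1_scores(const long *m, int k) {
    double sum_p = 0.0, sum_r = 0.0, sum_f1 = 0.0;
    double total = 0.0, correct = 0.0;
    for (int c = 0; c < k; c++) {
        double tp = (double)m[c * k + c];
        double predicted = 0.0, actual = 0.0;
        for (int j = 0; j < k; j++) {
            predicted += (double)m[j * k + c]; /* column sum */
            actual += (double)m[c * k + j];    /* row sum */
        }
        double p = safe_div(tp, predicted);
        double r = safe_div(tp, actual);
        sum_p += p;
        sum_r += r;
        sum_f1 += safe_div(2.0 * p * r, p + r);
        total += actual;
        correct += tp;
    }
    double mp = sum_p / k, mr = sum_r / k;
    f1_summary s;
    s.macro_f1 = sum_f1 / k;
    s.f1_of_macro_pr = safe_div(2.0 * mp * mr, mp + mr);
    s.micro_f1 = safe_div(correct, total);
    return s;
}
```

Take the 2-class matrix {{8, 2}, {4, 6}}, where rows are actual classes. The mean of the per-class F1 scores is 23/33 ≈ 0.697, F1 of the averaged precision and recall is 119/169 ≈ 0.704, and micro F1 is 0.7. Comparing results therefore only makes sense once the averaging is pinned down.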
Paul Dubs
@treo
May 23 2016 19:02
at least it doesn't take 45 days for a single run anymore :)
Patrick Skjennum
@Habitats
May 23 2016 19:03
sigh
yeah
Paul Dubs
@treo
May 23 2016 19:03
you can use the joyent based vm for free, right?
Patrick Skjennum
@Habitats
May 23 2016 19:03
yeah
and i can get 10 of em
but the sysop is a lazy ass
atm i'm stuck with google, which is $25 a day ..
Paul Dubs
@treo
May 23 2016 19:05
If you want you can ask them to give you a SmartOS based zone instead, and I'll take a look if I can get dl4j going on it
Patrick Skjennum
@Habitats
May 23 2016 19:10
i gotta talk to him this week
he stopped responding to emails:P
Paul Dubs
@treo
May 23 2016 19:15
@eraly I can trick NVCC into thinking it uses an older compiler, and it looks like everything builds just fine
@agibsonccc If GCC 4.9 is the minimum required, shouldn't we maybe check that that is what is used?
Melanie Warrick
@nyghtowl
May 23 2016 19:25
@treo how do you trick nvcc?
Paul Dubs
@treo
May 23 2016 19:26
by redefining the GCC version number that is available for the preprocessor
Melanie Warrick
@nyghtowl
May 23 2016 19:26
Where do you redefine the version number?
Paul Dubs
@treo
May 23 2016 19:27
in the CMakeLists file for blas, I'll post a pull request shortly, so we can discuss how bad of an idea that really is
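Redefining the version number is enough because nvcc's host-compiler check is just a preprocessor comparison on `__GNUC__` and `__GNUC_MINOR__`. It sees only the reported numbers, not the compiler actually running. The sketch below models that comparison as a function. It assumes the CUDA 7.5 condition rejects anything newer than 4.9. The actual change is in deeplearning4j/libnd4j#210 and may work differently:

```c
/* Hypothetical model of the host-compiler gate in CUDA 7.5's headers,
 * which stops compilation with an #error when __GNUC__/__GNUC_MINOR__
 * report a GCC newer than 4.9. The gate only sees the macro values,
 * so reporting 4.9 lets a newer GCC through. */
static int cuda75_gate_accepts(int gnuc, int gnuc_minor) {
    return !(gnuc > 4 || (gnuc == 4 && gnuc_minor > 9));
}
```

The flip side is that the gate no longer protects against real incompatibilities, such as the `mwaitxintrin.h` header problem with GCC 5.3.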
Melanie Warrick
@nyghtowl
May 23 2016 19:27
Sounds good
Paul Dubs
@treo
May 23 2016 19:39
deeplearning4j/libnd4j#210 there it is
raver119
@raver119
May 23 2016 20:03
@andreas-eberle fixes are available at master, and improvements are available at my private branches.
Paul Dubs
@treo
May 23 2016 20:04
A right, I should take a look at those as well :D
raver119
@raver119
May 23 2016 20:05
lol
yes, this time we need more tests before release :) i’m not going to fall into the same pitfall twice lol
@AlexDBlack and i tested it, but it would be just awesome if you’d take a look there too
Paul Dubs
@treo
May 23 2016 20:08
so that is libnd4j/large_reduce and nd4j/r119_next_step right?
raver119
@raver119
May 23 2016 20:12
sec
let me check history
yes
those two branches
tests & gradients are passing there
also it has better speed
i’ve also checked some examples, nothing is bad so far
alex checked it there too
Adam Gibson
@agibsonccc
May 23 2016 21:00
So did this gcc thing get figured out?
I bumped into that on my docker images last night..
Paul Dubs
@treo
May 23 2016 21:01
deeplearning4j/libnd4j#210
Paul Dubs
@treo
May 23 2016 21:11
So you don't think that tricking NVCC into supporting GCC > 4.9 is a bad idea :D
Adam Gibson
@agibsonccc
May 23 2016 21:16
It's fine
NVCC I've had to hack anyways
Paul Dubs
@treo
May 23 2016 21:16
great :D
Arthur Rehm
@rehm
May 23 2016 21:36
an Amazon Machine Image (AMI) for deeplearning4j (with GPU support) would be nice :D
Patrick Skjennum
@Habitats
May 23 2016 21:40
@treo getting a kvm-virtualized instance tomorrow, if that doesn't work he'll just give me a native one
Adam Gibson
@agibsonccc
May 23 2016 21:41
@rehm I'm working on some internal stuff for that
Paul Dubs
@treo
May 23 2016 21:41
the kvm one should work, albeit slower than a native one. But on the other hand you will not have to try and get it working on solaris :D
Adam Gibson
@agibsonccc
May 23 2016 21:41
Not sure when the OSS one will come yet
Paul Dubs
@treo
May 23 2016 21:42
Probably when I have to use AWS again :D
but... I would include MKL, so maybe that's not going to happen then
Adam Gibson
@agibsonccc
May 23 2016 21:43
Well I'd like to see what else we can get besides "gpu image"
Right now I have a lot of docker containers
for things like the gui
bundled command line stuff
Paul Dubs
@treo
May 23 2016 21:43
I think it's going to be more interesting to use AWS as soon as both CPU and GPU can be used together
Henry Saputra
@hsaputra
May 23 2016 21:43
Why can't we go with just Docker containers?
Adam Gibson
@agibsonccc
May 23 2016 21:44
@hsaputra I have a base AMI I use to build docker containers
For our distro I plan on using docker/k8s
Henry Saputra
@hsaputra
May 23 2016 21:44
Ah I see, ok!
Patrick Skjennum
@Habitats
May 23 2016 21:44
@treo yeah i hope so
Adam Gibson
@agibsonccc
May 23 2016 21:44
amazon container registry etc allows you to basically launch vms
there's no reason I can't repurpose that stuff for oss as well
I'm prototyping what that stuff looks like on the distro
I want the OSS version to be a gutted version of whatever I produce for the core business
eg openblas etc
Paul Dubs
@treo
May 23 2016 21:45
should be pretty simple using stacked dockerfiles
Adam Gibson
@agibsonccc
May 23 2016 21:46
Right
which is what I have now
Right now the distro is basically:
./skil test
that launches a container which will run debug + release tests
and gradient checks + examples
./skil notebook
that will bring up a spark notebook with cuda and everything installed
./skil build-spark will build a spark submit jar
you can give it any hadoop/scala/spark/cuda version
and it outputs a jar
1 command (+main class basically)
Another one I'm going to do is add talking to spark-submit directly
not quite sure how that will work yet
I also want to see what I can do on the oss side with that
So far from what I learned a base ami was needed
then you use that to build docker containers
I want to give a template so people can use amazon to build their own docker containers they could use on other clouds
Arthur Rehm
@rehm
May 23 2016 21:50
@agibsonccc "I want to give a template so people can use amazon to build their own docker containers they could use on other clouds" exactly what i need :D
Adam Gibson
@agibsonccc
May 23 2016 21:51
right
so again this is in our proprietary stuff
an open source version of that would include different components
I'm basically prototyping on our internal stuff now
since I haven't done this before
I'm learning what it looks like by building it
Right now I have an AMI with cuda 7.5 on it but it's pretty hacky
Arthur Rehm
@rehm
May 23 2016 22:21
do you recommend using cuda 7 or 7.5?
Adam Gibson
@agibsonccc
May 23 2016 22:21
There's no option atm
7.5 is it
We have an AMI with 7.5 that's what I'm using
Arthur Rehm
@rehm
May 23 2016 22:21
good to know :D