These are chat archives for deeplearning4j/deeplearning4j/earlyadopters

9th
May 2016
Jeroen Steggink
@jsteggink
May 09 2016 05:25
@AlexDBlack Thanks I will
Arthur Rehm
@rehm
May 09 2016 05:26
@agibsonccc compiling the libnd4j sources and setting LIBND4J_HOME worked - thanks
Andreas Eberle
@andreas-eberle
May 09 2016 08:08
hey guys, is the current master of libnd4j and nd4j stable?
I mean, should it work...
Paul Dubs
@treo
May 09 2016 08:09
probably should
Adam Gibson
@agibsonccc
May 09 2016 08:09
@andreas-eberle yeah it's been fine we've mainly been benchmarking
all tests are passing on cuda/cpu now
doesn't mean cuda is fast yet though
(keep that in mind)
Andreas Eberle
@andreas-eberle
May 09 2016 08:11
ok, I have two problems...
Adam Gibson
@agibsonccc
May 09 2016 08:12
@treo btw - I just noticed something nice
Paul Dubs
@treo
May 09 2016 08:12
that is? :D
Andreas Eberle
@andreas-eberle
May 09 2016 08:12
when I run an adapted LenetMnistExample with cuda, which ran last week without an issue, I get this: https://gist.github.com/andreas-eberle/a94bbae8bc51204ac1a481dd1971a3cd
Adam Gibson
@agibsonccc
May 09 2016 08:13
@treo you can auto install an open mp version
which means no having to compile from source
Andreas Eberle
@andreas-eberle
May 09 2016 08:14
sorry, messed up the gist, give me a sec
Adam Gibson
@agibsonccc
May 09 2016 08:14
I've been investigating what kind of automation you can do
Worth noting if you use amazon linux and don't want to install mkl
(or can't)
Paul Dubs
@treo
May 09 2016 08:16
that openblas isn't optimized for the given processor then
Andreas Eberle
@andreas-eberle
May 09 2016 08:16
(at the bottom)
Adam Gibson
@agibsonccc
May 09 2016 08:16
@treo be less vague though
what other kinds of optimizations does it do?
do you mean for haswell vs sandy bridge or something?
Paul Dubs
@treo
May 09 2016 08:17
It detects the instructions the processor supports
avx vs avx2
and stuff like that
Andreas Eberle
@andreas-eberle
May 09 2016 08:18
any idea why cuda is crashing here?
Adam Gibson
@agibsonccc
May 09 2016 08:20
@andreas-eberle file an issue and try to give us something that reproduces it
Exception in thread "main" java.lang.IllegalStateException: MemcpyAsync failed
For any error messages in java ever
you can always scroll to the bottom
that's always the root cause
did your gpu run oom?
try running nvidia-smi throughout your app
and see what it does
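The monitoring suggestion above can be done with nvidia-smi's built-in loop; if memory.used climbs to the card's limit right before the MemcpyAsync failure, it was an OOM. (Flags assume a reasonably recent driver; check `nvidia-smi --help-query-gpu` for your version.)

```shell
# poll GPU memory and utilization once a second while the job runs
nvidia-smi --query-gpu=timestamp,memory.used,memory.total,utilization.gpu \
           --format=csv -l 1
```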
raver119
@raver119
May 09 2016 08:22
@andreas-eberle idea is simple
pull masters
for both nd4j and libnd4j
your versions are old
Adam Gibson
@agibsonccc
May 09 2016 08:23
@treo so what would be a sufficient level of optimization here?
3 different versions of openblas?
with openmp?
I'm a company not a grad student with 1 server
I have automation to think about here :P
Andreas Eberle
@andreas-eberle
May 09 2016 08:24
my versions are old? I just pulled master half an hour ago and built everything...
Adam Gibson
@agibsonccc
May 09 2016 08:24
Do you know if it does anything else?
raver119
@raver119
May 09 2016 08:24
impossible...
let me check once again
Patrick Skjennum
@Habitats
May 09 2016 08:25
@agibsonccc didn't know i couldn't just .add()? why is it there if addi is faster?
is it the same for div, mul etc?
Paul Dubs
@treo
May 09 2016 08:25
@agibsonccc as far as I know it only does cpu detection, but that is all from its readme file
Adam Gibson
@agibsonccc
May 09 2016 08:25
addi
inplace
Patrick Skjennum
@Habitats
May 09 2016 08:25
i know what inplace is, but that doesn't answer my question
Adam Gibson
@agibsonccc
May 09 2016 08:25
so say you have a string of array operations
yeah it does answer your question
Patrick Skjennum
@Habitats
May 09 2016 08:26
it doesn't allocate new stuff, i get it
Adam Gibson
@agibsonccc
May 09 2016 08:26
you can use in place operations rather than anything that requires a dup
no it applies it on the same array with no data copy
that means no allocation or data copy happens
you're creating excess objects
say you have a set of ops you need to do
you'd do
Patrick Skjennum
@Habitats
May 09 2016 08:26
yes, i know what inplace is. i'm asking if there's a use case where you wouldn't want that
Adam Gibson
@agibsonccc
May 09 2016 08:26
arr.add(...).addi(...).addi(...)
not really
if you need a new array
you can create the array once with add
then use in place ops after that
so it only creates 1 array
add(...) is just: dup().addi()
fwiw
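For anyone reading along, the add vs addi distinction maps onto numpy like this (numpy is used here only as a stand-in for nd4j's semantics, and assumes numpy is installed):

```python
import numpy as np

a = np.ones(4, dtype=np.float32)
b = np.full(4, 2.0, dtype=np.float32)

c = a + b   # like add(): allocates a new array, 'a' is untouched
a += b      # like addi(): mutates a's own buffer, no allocation

# chaining, mirroring arr.add(x).addi(y).addi(z):
out = a + b  # allocate once up front...
out += b     # ...then stay in place from here on
```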
Patrick Skjennum
@Habitats
May 09 2016 08:28
right
Adam Gibson
@agibsonccc
May 09 2016 08:28
I'd try that
Patrick Skjennum
@Habitats
May 09 2016 08:28
didn't know this. treo gave me the code. i blame him:p
raver119
@raver119
May 09 2016 08:28
funny.
Adam Gibson
@agibsonccc
May 09 2016 08:28
hahahaha
raver119
@raver119
May 09 2016 08:28
that's really a memcpy-caused error
Paul Dubs
@treo
May 09 2016 08:28
you mean the one-line pseudo code? :P
Patrick Skjennum
@Habitats
May 09 2016 08:28
yeah!
Adam Gibson
@agibsonccc
May 09 2016 08:28
I'm closing that issue
that's not us
Patrick Skjennum
@Habitats
May 09 2016 08:29
ANYWAY though, it still used 100% cpu last week
with this code
Adam Gibson
@agibsonccc
May 09 2016 08:29
not much I can do about that
Patrick Skjennum
@Habitats
May 09 2016 08:29
that part of the code is old
Adam Gibson
@agibsonccc
May 09 2016 08:29
Unless it's us there's not much I can do here
try updating it first
Patrick Skjennum
@Habitats
May 09 2016 08:29
i am atm
Adam Gibson
@agibsonccc
May 09 2016 08:29
cool
Andreas Eberle
@andreas-eberle
May 09 2016 08:29
@raver119: That means?
raver119
@raver119
May 09 2016 08:29
don't know yet
Adam Gibson
@agibsonccc
May 09 2016 08:29
@treo you're our optimization wizard - even your pseudocode should be perfect
Paul Dubs
@treo
May 09 2016 08:30
haha :D
Adam Gibson
@agibsonccc
May 09 2016 08:30
don't give people code that compiles
that's asking for trouble
they blame you for them not doing due diligence
heh
@andreas-eberle what's the spec on your gpu?
just curious
Andreas Eberle
@andreas-eberle
May 09 2016 08:31
GTX 970M with 6 GB VRAM
raver119
@raver119
May 09 2016 08:32
@andreas-eberle show me your config, if you’re using any for cuda
i mean CudaConfiguration
Paul Dubs
@treo
May 09 2016 08:33
@agibsonccc Anyway, on the installing openblas with openmp front: Providing just versions that are optimized for ec2 c4.* and c3.* should probably be enough
Andreas Eberle
@andreas-eberle
May 09 2016 08:33
where would I find it? I didn't create something like that
Patrick Skjennum
@Habitats
May 09 2016 08:33
@agibsonccc so now it's like this, but still maxing at 50-60% CPU
[screenshot]
raver119
@raver119
May 09 2016 08:34
@andreas-eberle ok
Adam Gibson
@agibsonccc
May 09 2016 08:34
@Habitats expand that please?
Paul Dubs
@treo
May 09 2016 08:34
@Habitats add the backtrace
Adam Gibson
@agibsonccc
May 09 2016 08:34
pair wise transform is beginning to get on my nerves
Paul Dubs
@treo
May 09 2016 08:35
I haven't profiled it in some time, but ind2sub was like 50% of it last time I looked
raver119
@raver119
May 09 2016 08:36
:)))
Patrick Skjennum
@Habitats
May 09 2016 08:36
it's coming from the same place
[screenshot]
maybe i should just cache my doc vectors to file
Andreas Eberle
@andreas-eberle
May 09 2016 08:36
btw: is there still the memory limit for GPUs?
Adam Gibson
@agibsonccc
May 09 2016 08:36
they have infinite ram
nvidia is hardware jesus
raver119
@raver119
May 09 2016 08:37
hahaha
Adam Gibson
@agibsonccc
May 09 2016 08:37
they can do whatever they please
Andreas Eberle
@andreas-eberle
May 09 2016 08:37
I mean the artifical limit set by libnd4j some weeks ago
Paul Dubs
@treo
May 09 2016 08:37
@Habitats that is actually a decent idea
raver119
@raver119
May 09 2016 08:37
@andreas-eberle there's a CudaConfiguration bean
Andreas Eberle
@andreas-eberle
May 09 2016 08:37
is there some documentation, I can read about that?
raver119
@raver119
May 09 2016 08:37
not yet
Adam Gibson
@agibsonccc
May 09 2016 08:37
raver doesn't document code
you have to read his mind
Andreas Eberle
@andreas-eberle
May 09 2016 08:38
:D
Adam Gibson
@agibsonccc
May 09 2016 08:38
and give him an offering
only then are you good enough to use his code
raver119
@raver119
May 09 2016 08:38
sec, will find example in tests
Andreas Eberle
@andreas-eberle
May 09 2016 08:38
thx
Adam Gibson
@agibsonccc
May 09 2016 08:38
I just bribed him
you owe me now
Andreas Eberle
@andreas-eberle
May 09 2016 08:38
:D
Adam Gibson
@agibsonccc
May 09 2016 08:39
@Habitats can you give me the shapes of the arrays it's adding?
raver119
@raver119
May 09 2016 08:39
CudaEnvironment.getInstance().getConfiguration()
.setExecutionModel(Configuration.ExecutionModel.ASYNCHRONOUS)
.setFirstMemory(AllocationStatus.DEVICE)
.setMaximumBlockSize(256)
.enableDebug(true);
read comments here, it’s documented
Andreas Eberle
@andreas-eberle
May 09 2016 08:41
thx
Adam Gibson
@agibsonccc
May 09 2016 08:43
@treo could you profile again?
on that?
This block should never be executing afaik..
Patrick Skjennum
@Habitats
May 09 2016 08:46
@agibsonccc it's only 1000d vectors, stacking them into a matrix then dividing by length.
Adam Gibson
@agibsonccc
May 09 2016 08:46
so you're adding vectors?
hmm
Patrick Skjennum
@Habitats
May 09 2016 08:47
creating document vector from n word vectors
but adding and squashing
Adam Gibson
@agibsonccc
May 09 2016 08:47
hmm
Patrick Skjennum
@Habitats
May 09 2016 08:47
vectors.reduce(_.addi(_)).divi(vectors.size)
Paul Dubs
@treo
May 09 2016 08:48
he is basically creating a mean vector
Adam Gibson
@agibsonccc
May 09 2016 08:48
@Habitats couldn't you just do mean(0) o_0
Sadat Anwar
@SadatAnwar
May 09 2016 08:48
hello guys
Patrick Skjennum
@Habitats
May 09 2016 08:48
@agibsonccc you tell me. @treo gave me the code:p
Adam Gibson
@agibsonccc
May 09 2016 08:48
or do you have a list of vectors or something?
Patrick Skjennum
@Habitats
May 09 2016 08:48
i have a list of vectors
it's creating a new vector, with 1000d
Sadat Anwar
@SadatAnwar
May 09 2016 08:49
I just did a maven reimport and now I am getting this error
java.lang.RuntimeException: java.lang.NullPointerException
    at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:4788)
    at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:4716)
    at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:148)
    ... 31 more
Caused by: java.lang.NullPointerException
    at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:4749)
    ... 33 more
Adam Gibson
@agibsonccc
May 09 2016 08:49
I wonder if an hstack + mean(0) would work?
Paul Dubs
@treo
May 09 2016 08:49
it is literally the same thing
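In numpy terms (again just a stand-in for the nd4j ops), the two approaches really are the same thing:

```python
import numpy as np

# vectors.reduce(_.addi(_)).divi(vectors.size) in Scala is just a mean:
vectors = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]

acc = vectors[0].copy()
for v in vectors[1:]:
    acc += v                    # in-place accumulate, like addi
mean_by_reduce = acc / len(vectors)

# the vstack + mean(0) alternative suggested in the chat:
mean_by_stack = np.vstack(vectors).mean(axis=0)
```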
Adam Gibson
@agibsonccc
May 09 2016 08:49
@raver119 oh yeah what was that?
right
@treo well what I'm thinking, I wonder if it could be faster
the reduce etc happens on the jvm
that's asking for trouble
I wonder what's faster
raver119
@raver119
May 09 2016 08:50
@andreas-eberle what exactly are you launching?
you shouldn't have that exception under any circumstances
Paul Dubs
@treo
May 09 2016 08:50
@agibsonccc oh, didn't notice that mean is an Op now
Adam Gibson
@agibsonccc
May 09 2016 08:51
@treo it's been an op before you even showed up in this gitter channel
:P
Paul Dubs
@treo
May 09 2016 08:51
still didn't notice it :P
Adam Gibson
@agibsonccc
May 09 2016 08:51
Here are all the ops
@Habitats do a vstack + mean(0)
see if that's faster
@treo it'd also be great if you could check if ind2sub is still being called
it shouldn't be
Patrick Skjennum
@Habitats
May 09 2016 08:53
vstack, alright
Adam Gibson
@agibsonccc
May 09 2016 08:53
should*
Hmm
I wonder if TADs could be faster
for the pairwise ops
we could leverage multi-threading with that
but problem is: it's 1k
so we'd be hitting the 1k thing here
I could try adding a vectors only case
Paul Dubs
@treo
May 09 2016 08:55
@agibsonccc running it with the msvc profiler is a bit involved, so it takes some time to set up
Adam Gibson
@agibsonccc
May 09 2016 08:55
it's worth it
I keep seeing pairwise come up
I want to know what it's hitting
Patrick Skjennum
@Habitats
May 09 2016 08:56
@agibsonccc same performance, pretty much exactly; both in training time and CPU util
Paul Dubs
@treo
May 09 2016 08:56
@Habitats screenshots :D
Adam Gibson
@agibsonccc
May 09 2016 08:56
mind showing the new trace?
Patrick Skjennum
@Habitats
May 09 2016 08:57
ah shit i fucked up. redoing it
it's a LOT slower, actually
cpu util at like 40%
raver119
@raver119
May 09 2016 09:00
??
now even i'm surprised
i’d like to see adam’s face now
Patrick Skjennum
@Habitats
May 09 2016 09:00
[screenshot]
Andreas Eberle
@andreas-eberle
May 09 2016 09:00
@raver119: Sorry, my computer just froze to death... I'm running the LenetMnistExample slightly adapted to accept 40x40 images.
Adam Gibson
@agibsonccc
May 09 2016 09:00
haha
hmm
raver119
@raver119
May 09 2016 09:01
hm
Adam Gibson
@agibsonccc
May 09 2016 09:01
so it's not reduce
or sorry: not pairwise
Andreas Eberle
@andreas-eberle
May 09 2016 09:01
However, I wasn't able to reproduce it with the Lenet example yet. I have a larger batch size... that's the only difference I was able to see...
Patrick Skjennum
@Habitats
May 09 2016 09:01
this is even 6 times slower than my original idea, without inplace
Adam Gibson
@agibsonccc
May 09 2016 09:01
You're on windows right?
Andreas Eberle
@andreas-eberle
May 09 2016 09:01
yes, Windows
Adam Gibson
@agibsonccc
May 09 2016 09:01
@Habitats sorry
Patrick Skjennum
@Habitats
May 09 2016 09:01
i'm still on windows
Paul Dubs
@treo
May 09 2016 09:02
[screenshot]
raver119
@raver119
May 09 2016 09:02
@andreas-eberle i'm going to have a new merge today
Adam Gibson
@agibsonccc
May 09 2016 09:02
this will sound weird but does openmp work on windows
Andreas Eberle
@andreas-eberle
May 09 2016 09:02
@raver119: I'm rerunning it with your Configuration stuff...
Adam Gibson
@agibsonccc
May 09 2016 09:02
I imagine it does with msys2
Paul Dubs
@treo
May 09 2016 09:02
@agibsonccc that 10% is 100% of the execPairwiseTransform calls
Andreas Eberle
@andreas-eberle
May 09 2016 09:02
ok, tell me when I should test it.
raver119
@raver119
May 09 2016 09:02
till then - please try to reproduce your error
but make sure you’re 100% on master for everything
dl4j, nd4j and libnd4j
i don't want to spend a day hunting a non-existent bug :)
Adam Gibson
@agibsonccc
May 09 2016 09:03
@treo what do you make of this?
simd not working?
maybe it's not vectorizing
raver119
@raver119
May 09 2016 09:03
i just don’t have that day :(
Paul Dubs
@treo
May 09 2016 09:03
simd not working + maybe small size optimisation not worth it any more
Adam Gibson
@agibsonccc
May 09 2016 09:03
@treo this was your idea :P
0/2 today
Paul Dubs
@treo
May 09 2016 09:04
@agibsonccc in places with sync :P
Adam Gibson
@agibsonccc
May 09 2016 09:04
0/2
hmm
Patrick Skjennum
@Habitats
May 09 2016 09:04
so yeah, i guess i'll cache it and not worry about it:P
Adam Gibson
@agibsonccc
May 09 2016 09:04
yeah
I'll do a perf pass in the next few days
I need to do some automation work first
I'm now devops guy :(
@treo docker for this maybe?
re: openblas
Paul Dubs
@treo
May 09 2016 09:06
what exactly do you mean?
Adam Gibson
@agibsonccc
May 09 2016 09:06
compilation optimized for each core
or processor
I'm thinking of just setting up a docker container with stuff installed on it
Paul Dubs
@treo
May 09 2016 09:06
ah, that could work fine
Adam Gibson
@agibsonccc
May 09 2016 09:06
then it's just docker run ...
make
k
Andreas Eberle
@andreas-eberle
May 09 2016 09:06
@raver119: I'll try to reproduce it ;)
raver119
@raver119
May 09 2016 09:06
@andreas-eberle thanks!
Adam Gibson
@agibsonccc
May 09 2016 09:07
I've been using docker a lot these last few days
I think I "get" containers now
Paul Dubs
@treo
May 09 2016 09:07
I've been using it a lot at my last job :)
Introduced it there actually
Adam Gibson
@agibsonccc
May 09 2016 09:07
we're doing k8s too
Paul Dubs
@treo
May 09 2016 09:08
that's great :)
Adam Gibson
@agibsonccc
May 09 2016 09:08
still learning this
If you want easy k8s orchestration
some friends of mine
Paul Dubs
@treo
May 09 2016 09:08
I wonder how well the fabric8 guys are doing nowadays
Adam Gibson
@agibsonccc
May 09 2016 09:09
the python framework for automating server setup?
Paul Dubs
@treo
May 09 2016 09:09
no, not fabric
fabric8
fabric8 is an opinionated open source microservices platform based on Docker, Kubernetes and Jenkins
Andreas Eberle
@andreas-eberle
May 09 2016 09:09
@raver119: You might be right... I just found that my IDEA reset the maven it's using... :-(
Adam Gibson
@agibsonccc
May 09 2016 09:09
huh
Andreas Eberle
@andreas-eberle
May 09 2016 09:09
sorry for that...
raver119
@raver119
May 09 2016 09:10
just check if that's the root of your issue
i'll do the merge later today, it'll bring approx. 3x speedup for cuda
it was confirmed yesterday, but i still need to find mistakes in the flow controller
Andreas Eberle
@andreas-eberle
May 09 2016 09:12
WOW :+1:
raver119
@raver119
May 09 2016 09:12
it’s not wow
Adam Gibson
@agibsonccc
May 09 2016 09:12
3x speed up means still slower than cpu
afaik?
raver119
@raver119
May 09 2016 09:12
cuda is still slower than cpu, just the difference got smaller
Adam Gibson
@agibsonccc
May 09 2016 09:12
yeah
raver119
@raver119
May 09 2016 09:12
yes, 1.6 s on mkl, 2.2 on cuda
Paul Dubs
@treo
May 09 2016 09:13
I wonder if cuda is now faster than openblas...
raver119
@raver119
May 09 2016 09:13
should be
openblas is slow
also - that’s rnn
Sadat Anwar
@SadatAnwar
May 09 2016 09:13
what happens if I build an MSI installer for libnd4j? And where is the installer placed?
raver119
@raver119
May 09 2016 09:13
not cnn
i just haven't checked cnn perf
rnns are harder for gpus
Alex Black
@AlexDBlack
May 09 2016 09:15
yeah, cnn perf isn't worth checking right now
I'm working on that currently
raver119
@raver119
May 09 2016 09:15
cool
@AlexDBlack have you checked pm?
i wrote you yesterday regarding that test
Alex Black
@AlexDBlack
May 09 2016 09:16
right, saw that, looked good to me
raver119
@raver119
May 09 2016 09:16
cool, thanks
Adam Gibson
@agibsonccc
May 09 2016 09:20
@SadatAnwar not much yet - ideally it'd set the env variable and whatnot for you
I'm still cleaning up the packaging there
it can build rpms, debs, and msis
I'm investigating osx as well
Sadat Anwar
@SadatAnwar
May 09 2016 09:20
cool!!
@agibsonccc so once you are done, ideally I would build an MSI and then just copy it and run it on all systems that need it?
Adam Gibson
@agibsonccc
May 09 2016 09:21
no you'd download an MSI
and hit run
raver119
@raver119
May 09 2016 09:21
@treo mind running syntheticRNN with openblas? :)
Sadat Anwar
@SadatAnwar
May 09 2016 09:21
Cool! Even better :D
oh and the runtime exception I got earlier - i just rebuilt the whole thing and it's gone... so, no worries there
Thanks
Paul Dubs
@treo
May 09 2016 09:22
@raver119 not really, it's a bit of a hassle to force it to use a different blas
raver119
@raver119
May 09 2016 09:22
ok
Arthur Rehm
@rehm
May 09 2016 09:57
I've tried to compile the libnd4j sources on ubuntu 14.04 but have some problems on the "Linking CXX shared library libnd4j.so" step. https://gist.github.com/rehm/420be1174d7c69cfe7fc1f6f9df41066
Adam Gibson
@agibsonccc
May 09 2016 09:58
Can you give me how you ran it?
it should just be ./buildnativeoperations.sh now
Paul Dubs
@treo
May 09 2016 09:59
Also what is the problem? The log looks like everything went well
Adam Gibson
@agibsonccc
May 09 2016 10:00
"-- A library with BLAS API not found. Please specify library location."
that def doesn't help
I'm guessing openblas wasn't setup
Paul Dubs
@treo
May 09 2016 10:00
-- Found OpenBLAS libraries: /usr/local/lib/libopenblas.so
that is also there
Adam Gibson
@agibsonccc
May 09 2016 10:00
ohh
cd /root/stuff/libnd4j/blasbuild/cpu/blas && /usr/bin/cmake -E cmake_link_script CMakeFiles/nd4j.dir/link.txt --verbose=1
/usr/bin/c++ -fPIC -march=native -fopenmp -Wall -g -Wall -fopenmp -std=c++11 -fassociative-math -funsafe-math-optimizations -shared -Wl,-soname,libnd4j.so -o libnd4j.so CMakeFiles/nd4j.dir/cpu/NativeBlas.cpp.o CMakeFiles/nd4j.dir/cpu/NativeOps.cpp.o /usr/local/lib/libopenblas.so -Wl,-rpath,/usr/local/lib
make[2]: Leaving directory `/root/stuff/libnd4j/blasbuild/cpu'
/usr/bin/cmake -E cmake_progress_report /root/stuff/libnd4j/blasbuild/cpu/CMakeFiles 1 2
[100%] Built target nd4j
make[1]: Leaving directory `/root/stuff/libnd4j/blasbuild/cpu'
/usr/bin/cmake -E cmake_progress_start /root
nvm
it's fine
if you see "built" it's fine
do you mean when you built nd4j?
Arthur Rehm
@rehm
May 09 2016 10:12
nope i mean libnd4j - My Tomcat Log says "no jnind4j in java.library.path" - so i am currently checking all dependencies. (OpenBLAS,libnd4j,nd4j) https://gist.github.com/rehm/e462453d17ce43675f52af418facf753
Adam Gibson
@agibsonccc
May 09 2016 10:12
oh c++ and tomcat
this is going to be fun
hmm
Paul Dubs
@treo
May 09 2016 10:27
is your tomcat running in a 32 bit jvm?
Adam Gibson
@agibsonccc
May 09 2016 10:27
oh that's interesting
??
Arthur Rehm
@rehm
May 09 2016 10:28
x86_64 x86_64 x86_64 GNU/Linux
Paul Dubs
@treo
May 09 2016 10:28
still, is your tomcat running in a 32 bit jvm?
Arthur Rehm
@rehm
May 09 2016 10:31
Server version: Apache Tomcat/8.0.33
Server built: Mar 18 2016 20:31:49 UTC
Server number: 8.0.33.0
OS Name: Linux
OS Version: 3.13.0-042stab113.11
Architecture: amd64
JVM Version: 1.8.0_91-b14
JVM Vendor: Oracle Corporation
amd64 - so 64bit!?
raver119
@raver119
May 09 2016 10:32
yes
Paul Dubs
@treo
May 09 2016 10:34
Still not quite satisfied, because as far as I remember the jvm never says amd64
can you simply run a java -version with the jvm your tomcat is using?
it isn't necessarily using the system jvm
Adam Gibson
@agibsonccc
May 09 2016 10:35
right
raver119
@raver119
May 09 2016 10:35
amd64, that's from the kernel
the linux kernel, i mean
Adam Gibson
@agibsonccc
May 09 2016 10:36
I WOULD actually make sure that you configure tomcat with the proper LD_LIBRARY_PATH
Paul Dubs
@treo
May 09 2016 10:36
it should get the lib out of the jar
Adam Gibson
@agibsonccc
May 09 2016 10:36
also make sure your environment variables with tomcat are consistent
ahh yeah..hmm
weird
class loader issue?
that's another common one
in fact I bet it's the tomcat class loader
I wonder if @saudet has tested javacpp with tomcat
raver119
@raver119
May 09 2016 10:38
yay! i've managed to break special transforms!
Adam Gibson
@agibsonccc
May 09 2016 10:38
on cuda I hope?
:P
raver119
@raver119
May 09 2016 10:38
yea
that in-kernel special case
just found that it still uses new
now it's not using new
but also not working lol
Arthur Rehm
@rehm
May 09 2016 10:40
@treo /usr/lib/jvm/java-8-oracle/jre/bin/java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
Paul Dubs
@treo
May 09 2016 10:41
Great :) And you are sure that it is also the jvm that is used by tomcat?
Adam Gibson
@agibsonccc
May 09 2016 10:41
it uses java home I believe
unless that's changed
Arthur Rehm
@rehm
May 09 2016 10:41
yeah, in /etc/init/tomcat.conf i set JAVA_HOME
Paul Dubs
@treo
May 09 2016 10:42
great, then it looks like a classloader problem
you might want to put the appropriate jar in tomcat's lib folder, it should be picked up from there, if I recall correctly... haven't poked tomcat in quite a while
Arthur Rehm
@rehm
May 09 2016 11:01
@agibsonccc the LD_LIBRARY_PATH approach doesn't work
where can i find the "no jnind4j in java.library.path"?
Samuel Audet
@saudet
May 09 2016 11:05
@rehm Never tested personally, but there's nothing special about JavaCPP. With JNI in general we have to be careful: https://wiki.apache.org/tomcat/HowTo#I.27m_encountering_classloader_problems_when_using_JNI_under_Tomcat
Samuel Audet
@saudet
May 09 2016 11:13
Like @treo says, it should work fine if you put nd4j in the shared/lib directory...
Arthur Rehm
@rehm
May 09 2016 12:00
okay i got it :D i deployed with maven from my macbook -- so i deployed nd4j-native-0.4-rc3.9-SNAPSHOT-macosx-x86_64.jar but the linux system requires nd4j-native-0.4-rc3.9-SNAPSHOT-linux-x86_64.jar -- thank you =)
Paul Dubs
@treo
May 09 2016 12:01
This raises a valid concern though
If I create an uberjar for deployment on my dev machine and deploy it on something running a different os/platform, how will that work out?
Adam Gibson
@agibsonccc
May 09 2016 12:02
the classifier would have to be different
Paul Dubs
@treo
May 09 2016 12:03
So I would simply add the classified version to the dependencies?
Adam Gibson
@agibsonccc
May 09 2016 12:03
I mean you'd need to
that's what docker is supposed to solve :D
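For what it's worth, this is roughly what declaring both platform classifiers might look like in a pom. The coordinates and version are taken from the jar names mentioned above; treat this fragment as an illustration, not an official recipe:

```xml
<!-- hypothetical pom fragment: pull in both platform classifiers so an
     uberjar built on macOS still carries the Linux native binaries -->
<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-native</artifactId>
  <version>0.4-rc3.9-SNAPSHOT</version>
  <classifier>linux-x86_64</classifier>
</dependency>
<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-native</artifactId>
  <version>0.4-rc3.9-SNAPSHOT</version>
  <classifier>macosx-x86_64</classifier>
</dependency>
```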
Paul Dubs
@treo
May 09 2016 12:04
:D
filed an issue for that
Patrick Skjennum
@Habitats
May 09 2016 13:03
@agibsonccc @treo alright, cached the doc vecs to file and it's kickin it now. back at 100%
closing the issue
neat. speed up my ffn training by like 10x today
:P
Adam Gibson
@agibsonccc
May 09 2016 13:05
Lol
Patrick Skjennum
@Habitats
May 09 2016 13:07
memoize all the things
there are so many things i have to take care of that i would normally disregard, when working with shit tons of data
Paul Dubs
@treo
May 09 2016 13:09
memoize all the things, but don't forget to reload them, as dl4j likes to mess with the inputs :P
e.g. dropout
Patrick Skjennum
@Habitats
May 09 2016 13:10
i'm only pulling from cache when creating the iterator
Paul Dubs
@treo
May 09 2016 13:10
if you aren't using dropout, it should be ok
Patrick Skjennum
@Habitats
May 09 2016 13:11
and if i used dropout, how would that affect this?
i don't see the connection
Paul Dubs
@treo
May 09 2016 13:11
dropout modifies the ndarray you use as input
Patrick Skjennum
@Habitats
May 09 2016 13:12
ugh
Paul Dubs
@treo
May 09 2016 13:12
deeplearning4j/deeplearning4j#1510
Patrick Skjennum
@Habitats
May 09 2016 13:13
right, well i'm not using dropout, but i thought of checking it out
i suppose i'd dig myself deep in if i asked why it's implemented like that
Paul Dubs
@treo
May 09 2016 13:14
probably efficiency, but you'll have to ask @AlexDBlack
Patrick Skjennum
@Habitats
May 09 2016 13:17
yeah alright
my doc vector cache is 2x the size of my entire dataset.
Alex Black
@AlexDBlack
May 09 2016 13:19
yeah, 99% of the time it doesn't matter if the input is modified by dropout (like, when you are using a RecordReaderDataSetIterator), and dups are expensive
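A minimal numpy sketch of the caching gotcha being described here, with a hypothetical dropout_inplace standing in for dl4j's in-place dropout (the function name and implementation are made up for illustration):

```python
import numpy as np

def dropout_inplace(x, rng, p=0.5):
    # zeroes entries of the caller's buffer itself instead of a dup
    x *= (rng.random(x.shape) >= p)
    return x

rng = np.random.default_rng(0)
cached = np.ones(8, dtype=np.float32)  # pretend this is a cached input
dropout_inplace(cached, rng)
# the cached array is now corrupted: the next epoch that reads it
# gets the already-dropped values, hence "don't forget to reload"
```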
Patrick Skjennum
@Habitats
May 09 2016 13:21
i created my own iterators
Patrick Skjennum
@Habitats
May 09 2016 13:29
but yeah, regardless of caching etc, doing addi etc seems to be super slow on my computer. any ideas @treo ?
Paul Dubs
@treo
May 09 2016 13:31
addi and a lot of other pairwise stuff is branched: if the size of the problem is small (i.e. less than 8k elements) it is done using a single core, because that was faster. I'll have to revisit it this evening to see if this has changed somehow due to all the recent changes
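A rough Python sketch of that size-based branch (the 8192 cutoff mirrors the "less than 8k elements" figure; the chunking scheme is illustrative only, the real code lives in C++ with OpenMP):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

CUTOFF = 8192  # below this, threading overhead outweighs the gain

def pairwise_addi(x, y, chunks=4):
    if x.size < CUTOFF:
        x += y                       # small problem: stay on one core
        return x
    bounds = np.linspace(0, x.size, chunks + 1, dtype=int)
    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        x[lo:hi] += y[lo:hi]         # disjoint slices, no locking needed
    with ThreadPoolExecutor(chunks) as pool:
        list(pool.map(work, range(chunks)))
    return x
```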
Patrick Skjennum
@Habitats
May 09 2016 13:33
hmm yeah alright
Patrick Skjennum
@Habitats
May 09 2016 13:55
@treo so keeping 1.4M 1000d INDArrays in memory was quite intractable. maybe i could store it as a matrix? but how would i index it? think it would save a lot of space?
atm i'm just using a map
could use an id -> index mapping, i assume getting a row is O(1)
Paul Dubs
@treo
May 09 2016 13:59
1.4m*100d = should be just 534mb?
Patrick Skjennum
@Habitats
May 09 2016 13:59
yeah well it's not, it's like 17gb
Paul Dubs
@treo
May 09 2016 13:59
on the disk?
because that should be about 550mb in memory
Patrick Skjennum
@Habitats
May 09 2016 14:01
i stored only the floats in a txt file and it was 17gb
i assumed wrapping those in objects would give more overhead
also it's 1000d, sorry
:P
Paul Dubs
@treo
May 09 2016 14:02
that should be around 5.5gb in ram
the floats in a txt file take a lot more space than binary floats
that's the reason why I have my own w2v format
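The arithmetic behind those figures, for what it's worth (the ~13 bytes per text float is a rough estimate):

```python
# 1.4M vectors x 1000 float32 dims, raw binary vs. decimal text
n_vectors, dims, float_bytes = 1_400_000, 1000, 4

raw = n_vectors * dims * float_bytes
# ~5.2 GiB: the "should be around 5.5gb in ram" figure

# a float printed as e.g. "-0.12345678" plus a separator is ~13 chars,
# which is how the same data balloons to a ~17 GB text file
text = n_vectors * dims * 13
```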
Patrick Skjennum
@Habitats
May 09 2016 14:03
well the application ate all of my ram in like 5 seconds
i guess i could profile it
Paul Dubs
@treo
May 09 2016 14:04
I can load it (5.3gb on disk) in just 27 seconds - the dl4j text format not only ate all my ram, it also took a long while to load
and was 27gb on disk
Patrick Skjennum
@Habitats
May 09 2016 14:04
dl4j text format?
Paul Dubs
@treo
May 09 2016 14:04
for the word vectors
Patrick Skjennum
@Habitats
May 09 2016 14:05
oh you mean storing the model in binary format
i don't know how to do that:s
also spark doesn't like that i think
Paul Dubs
@treo
May 09 2016 14:05
How are you storing it right now?
gist the code, and I'll take a look, or link it in your repository
Patrick Skjennum
@Habitats
May 09 2016 14:06
ugh, gotta push it
Paul Dubs
@treo
May 09 2016 14:06
you can gist it as well :P
Patrick Skjennum
@Habitats
May 09 2016 14:06
code is not in one place
i could work it out though
Paul Dubs
@treo
May 09 2016 14:08
serializing and deserializing should be in one place
Patrick Skjennum
@Habitats
May 09 2016 14:12
ugh i can't even run the profiler to check this stuff, because my computer just hangs due to swap
Paul Dubs
@treo
May 09 2016 14:13
Then post your serialization and deserialization code
there is also an old state of my binary de/serializer up as a gist: https://gist.github.com/treo/2947b21a55c1b175ac5ed24a8673924d
I'm still not done with it, as I want it to load as fast as the SSD can read... but right now it is stuck at about 250mb/s
that is obviously for word vectors but you can see how it deals with a lot of arrays
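A toy version of the same idea in Python, with a made-up record layout (u16 word length, utf-8 word, then float32 values - not the gist's actual format):

```python
import io
import struct

def write_vec(buf, word, vec):
    data = word.encode("utf-8")
    buf.write(struct.pack("<H", len(data)))       # word length
    buf.write(data)                               # the word itself
    buf.write(struct.pack(f"<{len(vec)}f", *vec)) # raw float32 payload

def read_vec(buf, dims):
    (wlen,) = struct.unpack("<H", buf.read(2))
    word = buf.read(wlen).decode("utf-8")
    return word, list(struct.unpack(f"<{dims}f", buf.read(4 * dims)))

buf = io.BytesIO()
write_vec(buf, "hello", [0.25, -1.5, 3.0])
buf.seek(0)
word, vec = read_vec(buf, 3)
```

Binary float32 takes a fixed 4 bytes per value, which is where the text-vs-binary size gap above comes from.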
Patrick Skjennum
@Habitats
May 09 2016 14:19
code is superfast, but yeah, resulting map is superduperhuge
Paul Dubs
@treo
May 09 2016 14:21
sc.textFile reads the whole file, right?
and then you split it on "," thus creating 1.4 mio strings, effectively doubling your memory requirement
Patrick Skjennum
@Habitats
May 09 2016 14:22
it's quite complex, but it doesn't just grab the whole file no. it reads it as a stream, possibly with multiple workers at the same time
that shouldn't matter as it would be garbage collected before spark has finished
Paul Dubs
@treo
May 09 2016 14:23
I'd guess with the way you are loading it, you need at least 34 GB of free ram
Patrick Skjennum
@Habitats
May 09 2016 14:23
there's hardly any overhead with sc.textfile as long as you cut away the excess before you collect
it's only whatevers left at the collect that is stored in RAM
also, it reads everything just fine. the Map that func returns is just stupid huge
Paul Dubs
@treo
May 09 2016 14:25
didn't you say you run out of memory?
Patrick Skjennum
@Habitats
May 09 2016 14:25
yeah when i try to train my network afterwards
using this cache
i don't have enough memory to do both
Paul Dubs
@treo
May 09 2016 14:26
so take a look at how big your map is, you should be able to have a profiler run with that
Patrick Skjennum
@Habitats
May 09 2016 14:26
this cache needs to be in RAM during training in order to be useful:p
yeah that's what i'm doing atm
Paul Dubs
@treo
May 09 2016 14:26
Because it shouldn't be larger than 5.5gb
Patrick Skjennum
@Habitats
May 09 2016 14:26
starting the profiler when RAM is thrashing like crazy isn't easy though
Adam Gibson
@agibsonccc
May 09 2016 14:26
@treo fwiw sc.textFile is lazy it shouldn't be quite reading it all at once
Patrick Skjennum
@Habitats
May 09 2016 14:27
everything with spark is lazy
that's why it's awesome
Adam Gibson
@agibsonccc
May 09 2016 14:27
right
Patrick Skjennum
@Habitats
May 09 2016 14:28
you can have 20 maps and filters doing all kinds of crazy shit and spark will just combine them all in one giant optimized way:D
and it reads that 17gb file into ram and creates objects from it in about 60s
would've taken literally days with standard io
Paul Dubs
@treo
May 09 2016 14:31
anyway something is eating up your ram, and you should find out what it is
Patrick Skjennum
@Habitats
May 09 2016 14:32
:P you don't say
Patrick Skjennum
@Habitats
May 09 2016 14:39
ah btw @treo i just remembered a bug with that piece of code i showed you
it doesn't work if i put the collect after Nd4j.create, i have to put it in front, so my computer has to put all that crap i'm using to create the INDArrays into ram before creating them
because creating INDArrays on a spark worker doesn't work for some reason. it loses all of its info when it's serialized and transferred to the driver
i believe this was the reason Spark never worked with dl4j for me
but i thought alex fixed it
Patrick Skjennum
@Habitats
May 09 2016 15:18
@treo only happens when using Kryo to serialize, even though i added INDArray to the Kryo config
Paul Dubs
@treo
May 09 2016 15:19
yeah, tried kryo and it didn't work at all...
Patrick Skjennum
@Habitats
May 09 2016 15:19
that's a pity, since kryo is so much faster
Paul Dubs
@treo
May 09 2016 15:20
it probably needs its own kryo serializer
Patrick Skjennum
@Habitats
May 09 2016 15:20
so i'm out of luck for a while?
Paul Dubs
@treo
May 09 2016 15:21
You could write one yourself :P
Patrick Skjennum
@Habitats
May 09 2016 15:21
don't get it though. all of my own classes are serializable out of the box
all of your own classes don't muck with off-heap memory
Patrick Skjennum
@Habitats
May 09 2016 15:43
yeah can't get it to work. i dunno how to register custom serializers with spark
i don't have access to the kryo instance
Patrick Skjennum
@Habitats
May 09 2016 15:45
yeah that's how i've done it with all of my classes
that says nothing about custom serializer
all it says is "go see kryo docs for custom serializers"
Patrick Skjennum
@Habitats
May 09 2016 15:57
yeah i tried that but then all of my other default classes weren't serializable anymore
Paul Dubs
@treo
May 09 2016 15:57
sure, you have to register them with this registrator now
Patrick Skjennum
@Habitats
May 09 2016 15:57
ah shit wrong setting
@treo yeah but i get like Class is not registered: scala.Tuple2[]
i never had to add that stuff manually
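For what it's worth, Spark lets you plug in a custom registrator through its spark.kryo.registrator setting; a rough Java sketch of one (the class and package names here are hypothetical, and a real INDArray serializer is only stubbed out as a comment):

```java
import com.esotericsoftware.kryo.Kryo;
import org.apache.spark.serializer.KryoRegistrator;

// Hypothetical registrator: registers the classes the job actually ships
// between driver and workers.
public class MyKryoRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        // Covers the "Class is not registered: scala.Tuple2[]" error above:
        kryo.register(scala.Tuple2[].class);
        // INDArray keeps its data off-heap, so plain registration isn't enough;
        // it would need a custom com.esotericsoftware.kryo.Serializer that
        // copies the buffer contents explicitly.
    }
}
```

It would then be wired in on the SparkConf with `.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer").set("spark.kryo.registrator", "com.example.MyKryoRegistrator")`.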
Paul Dubs
@treo
May 09 2016 16:01
hmm... don't know about that, as I said, I'm not using spark :D
Patrick Skjennum
@Habitats
May 09 2016 16:02
yeah, so this kind of forces me to use something else than kryo:(
Paul Dubs
@treo
May 09 2016 16:02
beg @agibsonccc to do something about it :)
Patrick Skjennum
@Habitats
May 09 2016 16:39
@treo another thing, you said indarrays do funny things with memory
i have xmx at 30g, and my application is using 40g, what's up?
no wonder i'm suffering from swapping
visualvm shows 18g in use, and resource monitor shows 60gb in use
Patrick Skjennum
@Habitats
May 09 2016 16:47
blob
Paul Dubs
@treo
May 09 2016 16:54
nd4j uses arrays that are not on the java heap
but usually it respects -Xmx
Patrick Skjennum
@Habitats
May 09 2016 16:54
how do i control this behavior?
it's definitely not doing that
Paul Dubs
@treo
May 09 2016 16:55
don't know how it behaves with spark
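As an aside on where the extra memory lives: ND4J allocates its array buffers off-heap through JavaCPP, which reads its own limit from a system property rather than from -Xmx. A minimal stdlib-only sketch (the class name is hypothetical; the property name is the one JavaCPP reads):

```java
// ND4J allocates INDArray buffers off-heap (via JavaCPP), so -Xmx only bounds
// the Java heap; the off-heap side is limited separately, e.g.:
//
//   java -Xmx4g -Dorg.bytedeco.javacpp.maxbytes=8G MyTrainingApp
//
// Sketch that prints both bounds for the current JVM:
public class OffHeapLimitCheck {
    // The limit JavaCPP reads, or "(not set)" if the JVM wasn't started with it.
    static String javacppMaxBytes() {
        return System.getProperty("org.bytedeco.javacpp.maxbytes", "(not set)");
    }

    public static void main(String[] args) {
        System.out.println("javacpp maxbytes = " + javacppMaxBytes());
        System.out.println("heap -Xmx bound  = "
                + (Runtime.getRuntime().maxMemory() >> 20) + " MB");
    }
}
```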
Patrick Skjennum
@Habitats
May 09 2016 17:00
spark shouldn't have anything to do with this
i'm not using spark for anything other than loading the data from file
i used xmx60g earlier, and java.exe ended up allocating almost 100g before my computer went completely nuts and froze down:S
Paul Dubs
@treo
May 09 2016 17:14
Don't really know how it works, it could be that it goes to 2x Xmx, @agibsonccc or @raver119 will have to take a look at it
raver119
@raver119
May 09 2016 17:14
that's @saudet
i know how it works in the cuda backend
-Xmx is honored there, but best results are achieved when -Xmx is fixed somewhere around 1-2gb and real limits are set using CudaEnvironment
so Xmx just provides gc activity :)
also, in cuda there's no real sense allowing it to use more memory than your gpu has
Paul Dubs
@treo
May 09 2016 17:16
you can have that with the concurrent mark sweep :)
Justin Long
@crockpotveggies
May 09 2016 17:32
regarding @Habitats problem I am experiencing the same thing. Although my local is 8GB, my Spark cluster has 32GB of memory per node
and I can easily see this inflating into a problem
AkshitaT
@AkshitaT
May 09 2016 18:02
Hi! I think there is an inconsistency between TfidfVectorizer and TfidfRecordReader.
  1. I converted my corpus to tf-idf vectors, and wrote the vocabulary using WordVectorSerializer.writeVocabCache to a text file (as suggested by @raver119 ). When I tried reading the file using TfidfRecordReader, it gave me a NullPointerException, as it could not parse the records and corresponding labels.
  2. In the second approach, I converted the corpus to tf-idf vectors using vec.transform() and wrote the vectors of INDArray type to a text file using writeNumpy. When I tried parsing that file using TfidfRecordReader, it gave me a NumberFormatException.
    How can I write my feature vectors to a text file which can be parsed by TfidfRecordReader?
raver119
@raver119
May 09 2016 18:03
nono
the tfidf i was speaking about is the dl4j one
not the canova one
canova has its own as far as i know
technically you don't have your tf-idf in dl4j
it's just a few frequencies
for each word
AkshitaT
@AkshitaT
May 09 2016 18:05
Yes, Canova has its own. I am confused. Which one am I supposed to use?
raver119
@raver119
May 09 2016 18:06
the one that suits your needs
any one
original dl4j tf-idf/bow were written ages ago, and got overhaul only to get rid of excessive dependencies
canova tf-idf should work fine too
AkshitaT
@AkshitaT
May 09 2016 18:07
I am using the one in dl4j. But I now need to provide those vectors as an input to another neural net. How do I do that?
raver119
@raver119
May 09 2016 18:08
there's transform() method, which converts string into tf-idf vector
AkshitaT
@AkshitaT
May 09 2016 18:09
Yes, I did that.
And wrote those tf-idf vectors to a text file using writeNumpy.
I tried parsing that file using TfidfRecordReader- and it gives me number format exception!
raver119
@raver119
May 09 2016 18:11
sry, i don't know what TfidfRecordReader expects to get
obviously, if you see an nfe - it's a format issue
AkshitaT
@AkshitaT
May 09 2016 18:11
Yes, this is precisely my question- what does TfidfRecordReader expect?
raver119
@raver119
May 09 2016 18:13
check its source code?
AkshitaT
@AkshitaT
May 09 2016 18:15
I did. It asks for a file, and a split separator.
Patrick Skjennum
@Habitats
May 09 2016 18:17
@treo should i create an issue or? it's kind of a big issue.
Justin Long
@crockpotveggies
May 09 2016 19:16
@Habitats @treo do you have the stack trace? I'd like to look at the code real quick
I think this is happening during weights averaging?
I wonder if there's some runaway partitioning process, or if this is easily fixed by tuning spark parameters
raver119
@raver119
May 09 2016 19:20
yep, averaging happens there, in fit() method
Justin Long
@crockpotveggies
May 09 2016 19:21
is it possible that every time points.cache() is called it creates new data in memory vs. replacing the existing cache?
line 182
I almost wonder if it's a garbage collection issue
Patrick Skjennum
@Habitats
May 09 2016 19:23
@crockpotveggies i'm not even using spark for training. i'm just using it to load data, then i collect it and train
Justin Long
@crockpotveggies
May 09 2016 19:24
@Habitats So you're not using any RDDs or Spark trainers?
Patrick Skjennum
@Habitats
May 09 2016 19:24
no
Justin Long
@crockpotveggies
May 09 2016 19:26
hmm that doesn't support my theory then, even though it may be true
Patrick Skjennum
@Habitats
May 09 2016 19:26
and i'm not caching rdd's or anything funny like that. dl4j literally has no communication with spark whatsoever
Paul Dubs
@treo
May 09 2016 19:31
when in doubt file an issue :)
raver119
@raver119
May 09 2016 19:31
golden rule!
Patrick Skjennum
@Habitats
May 09 2016 19:41
but i'm always in doubt:(
Justin Long
@crockpotveggies
May 09 2016 19:45
tried to debug it in IntelliJ but somehow the IDE cached all the SLF4J connector JARs so it's refusing to start Spark ugh :fire:
Gradle standalone works fine though, I'll look at this later and try to provide some insight
Paul Dubs
@treo
May 09 2016 19:45
File -> Invalidate Caches & Restart
Justin Long
@crockpotveggies
May 09 2016 19:45
already did that
must be another cache somewhere, I've run into this before
from the Spark logs: WARN [2016-05-09 19:50:08,983] org.apache.spark.scheduler.TaskSetManager: Stage 0 contains a task of very large size (184396 KB). The maximum recommended task size is 100 KB.
Justin Long
@crockpotveggies
May 09 2016 19:51
why would that task be so huge?
Patrick Skjennum
@Habitats
May 09 2016 19:52
that only happens for me if i don't read data distributed, or collect and sc.parallelize(..)
but i guess i'll leave this as an exercise for tomorrow-me. it's pizza night!
considering this is coming from ND4J inside Spark I wonder if I'm seeing the same issue as @Habitats
I'm going to run this without Spark and see if it reproduces itself. however, I'm suspect of the interaction with the RDD
Paul Dubs
@treo
May 09 2016 20:23
Now I had the time to look at the small size single core optimisation and compare it again with the multi core version... and the result is that for small sized arrays, the single core version is 10x faster than the multi core version
Justin Long
@crockpotveggies
May 09 2016 20:31
@treo for my own knowledge, what qualifies a small sized array?
Paul Dubs
@treo
May 09 2016 20:31
less than 8000 entries
Justin Long
@crockpotveggies
May 09 2016 20:32
so in practice, if the training dataset uses images that are 100x100 then performance is actually sacrificed?
Paul Dubs
@treo
May 09 2016 20:33
no
Justin Long
@crockpotveggies
May 09 2016 20:33
okay good to know :)
Paul Dubs
@treo
May 09 2016 20:33
then you have 10k entries, and you are going to be using multiple cores to work with them :)
Adam Gibson
@agibsonccc
May 09 2016 20:33
@crockpotveggies it means depending on the number of examples more cores are used
Paul Dubs
@treo
May 09 2016 20:33
but when you have less than 8k, it is usually not worth the overhead
Justin Long
@crockpotveggies
May 09 2016 20:33
ugh misread I thought you said 80,000
@agibsonccc thanks! that helps my understanding
Paul Dubs
@treo
May 09 2016 20:36
so now I guess the only way to really improve that performance is to force simd to work
Paul Dubs
@treo
May 09 2016 20:46
some cursory googling also shows that virtual method calls can be very expensive... So there may be some reorganization needed for rc3.10
raver119
@raver119
May 09 2016 20:49
i doubt it's that high, that's actually pointer to pointer to a function
in rough c we had something like that for ages
and it wasn't too expensive
it has some overhead, but it can't be ground breaking
Paul Dubs
@treo
May 09 2016 20:51
I'm not really sure how I'd have to measure it myself, but some googling brought up 10% to 50% overhead just over a simple function call
Justin Long
@crockpotveggies
May 09 2016 20:52
Screen Shot 2016-05-09 at 1.52.14 PM.png
because I know everyone here loves curves
Paul Dubs
@treo
May 09 2016 20:53
That is a pretty nice one :)
Justin Long
@crockpotveggies
May 09 2016 20:53
note that particular score is happening on a Standalone implementation. I don't have memory issues
while on the other hand, memory in Spark seems to explode, I'm trying to gather more information on that
Justin Long
@crockpotveggies
May 09 2016 21:28
hey guys pulled the latest on libnd4j and nd4j and getting this error, did the build instructions change?
Load training data...
Exception in thread "main" java.lang.UnsatisfiedLinkError: no jnind4j in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
    at java.lang.System.loadLibrary(System.java:1122)
    at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:654)
    at org.bytedeco.javacpp.Loader.load(Loader.java:492)
    at org.nd4j.nativeblas.NativeOps.<clinit>(NativeOps.java:26)
    at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.<init>(NativeOpExecutioner.java:27)
........
Caused by: java.lang.UnsatisfiedLinkError: no nd4j in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
been following exactly the same build steps as before
raver119
@raver119
May 09 2016 21:28
what was your output for libnd4j?
Adam Gibson
@agibsonccc
May 09 2016 21:29
@crockpotveggies ./buildnativeoperations.sh
raver119
@raver119
May 09 2016 21:29
have you checked if it actually built?
Adam Gibson
@agibsonccc
May 09 2016 21:29
that's all you should need to do for cpu now
Paul Dubs
@treo
May 09 2016 21:29
another one bites the dust :P
@crockpotveggies is that on spark or standalone?
Justin Long
@crockpotveggies
May 09 2016 21:29
standalone, just grabbing the output
here it is ./buildnativeoperations.sh blas cpu Release
Scanning dependencies of target nd4j
make -f blas/CMakeFiles/nd4j.dir/build.make blas/CMakeFiles/nd4j.dir/build
[ 33%] Building CXX object blas/CMakeFiles/nd4j.dir/cpu/NativeBlas.cpp.o
cd /Users/justin/Projects/libnd4j/blasbuild/cpu/blas && clang-omp++   -D__CPUBLAS__=true -Dnd4j_EXPORTS -I/Users/justin/Projects/libnd4j/include  -march=native -Wall -g -Wall -fopenmp -std=c++11 -fassociative-math -funsafe-math-optimizations -fPIC   -o CMakeFiles/nd4j.dir/cpu/NativeBlas.cpp.o -c /Users/justin/Projects/libnd4j/blas/cpu/NativeBlas.cpp
[ 66%] Building CXX object blas/CMakeFiles/nd4j.dir/cpu/NativeOps.cpp.o
cd /Users/justin/Projects/libnd4j/blasbuild/cpu/blas && clang-omp++   -D__CPUBLAS__=true -Dnd4j_EXPORTS -I/Users/justin/Projects/libnd4j/include  -march=native -Wall -g -Wall -fopenmp -std=c++11 -fassociative-math -funsafe-math-optimizations -fPIC   -o CMakeFiles/nd4j.dir/cpu/NativeOps.cpp.o -c /Users/justin/Projects/libnd4j/blas/cpu/NativeOps.cpp
[100%] Linking CXX shared library libnd4j.dylib
cd /Users/justin/Projects/libnd4j/blasbuild/cpu/blas && /usr/local/Cellar/cmake/3.4.1/bin/cmake -E cmake_link_script CMakeFiles/nd4j.dir/link.txt --verbose=1
clang-omp++   -march=native -Wall -g -Wall -fopenmp -std=c++11 -fassociative-math -funsafe-math-optimizations -dynamiclib -Wl,-headerpad_max_install_names  -o libnd4j.dylib -install_name @rpath/libnd4j.dylib CMakeFiles/nd4j.dir/cpu/NativeBlas.cpp.o CMakeFiles/nd4j.dir/cpu/NativeOps.cpp.o -framework Accelerate -framework Accelerate
[100%] Built target nd4j
/usr/local/Cellar/cmake/3.4.1/bin/cmake -E cmake_progress_start /Users/justin/Projects/libnd4j/blasbuild/cpu/CMakeFiles 0
Adam Gibson
@agibsonccc
May 09 2016 21:30
yeah that's it
Paul Dubs
@treo
May 09 2016 21:30

okay i got it :D i deployed with maven from my macbook -- soo i deploy nd4j-native-0.4-rc3.9-SNAPSHOT-macosx-x86_64.jar but the linux system requires nd4j-native-0.4-rc3.9-SNAPSHOT-linux-x86_64.jar -- thank you =)

from earlier today... anything related to that?

Adam Gibson
@agibsonccc
May 09 2016 21:31
ohh
yeah that's right
the new artifact
we split out the binaries
for different platforms
sorry about that
Justin Long
@crockpotveggies
May 09 2016 21:31
I see...
Adam Gibson
@agibsonccc
May 09 2016 21:31
we did that so we can deploy this to maven central
Paul Dubs
@treo
May 09 2016 21:31
@crockpotveggies you are on sbt, or gradle, right?
Adam Gibson
@agibsonccc
May 09 2016 21:31
and maintain our sanity :D
Justin Long
@crockpotveggies
May 09 2016 21:31
Gradle
Adam Gibson
@agibsonccc
May 09 2016 21:32
gradle is gimped and can't resolve the classifier
Paul Dubs
@treo
May 09 2016 21:32
then I think you have to add it to your dependencies manually
Adam Gibson
@agibsonccc
May 09 2016 21:32
right
Paul Dubs
@treo
May 09 2016 21:32
maven and leiningen do it automatically :D
Justin Long
@crockpotveggies
May 09 2016 21:32
I don't need to modify LIBND4J_HOME at all? it's Gradle conf?
Paul Dubs
@treo
May 09 2016 21:33
add compile 'org.nd4j:nd4j-native:0.4-rc3.9-SNAPSHOT:macosx-x86_64'
LIBND4J_HOME is only ever used at compile time of nd4j
Justin Long
@crockpotveggies
May 09 2016 21:34
okay let's see if that does the trick
am I going to have to change my conf when I package this for Spark (on ubuntu)?
Paul Dubs
@treo
May 09 2016 21:35
deeplearning4j/nd4j#917
Justin Long
@crockpotveggies
May 09 2016 21:36
@treo hehe gotcha :+1:
Paul Dubs
@treo
May 09 2016 21:36
so far I think you can simply add the linux dependencies
at runtime it only tries to load what is actually needed for the platform
Justin Long
@crockpotveggies
May 09 2016 21:38

how about this?

String jarName;
switch(System.getProperty('os.name').toLowerCase().split()[0]) {
  case 'windows':
    jarName = 'swt_win_32.jar' 
    break
  case 'linux':
    jarName = 'swt_linux_x86.jar' 
    break
  default:
    throw new Exception('Unknown OS')
}

dependencies {
  runtime fileTree(dir: 'swt', include: jarName)
}

via http://stackoverflow.com/questions/8796615/modify-gradles-runtime-dependencies-according-to-operation-system

if that works I'll create a PR and add it to the docs
Paul Dubs
@treo
May 09 2016 21:38
that depends on the system you are building on
usually you are building on your dev machine and then move it for deployment
Adam Gibson
@agibsonccc
May 09 2016 21:39
What you COULD do here is a parameter?
maybe a part of a build.properties or command line value?
or if gradle has profiles
you could do prod vs dev
Paul Dubs
@treo
May 09 2016 21:40
that would be all in your config though... But a pr for documentation can be helpful
Justin Long
@crockpotveggies
May 09 2016 21:40
ah you both are right here, let me see if I can get a command line value working
and if it does...PR
Adam Gibson
@agibsonccc
May 09 2016 21:40
great!
yeah I'm just thinking in general
it's def app specific
I was mentioning it FOR the docs
we can't do anything about that on our side
Justin Long
@crockpotveggies
May 09 2016 21:43
on the same page, it's for the docs
Justin Long
@crockpotveggies
May 09 2016 21:50
what are the platform-specific string values? such as macosx-x86_64
Justin Long
@crockpotveggies
May 09 2016 21:57
I see profiles in here but ones such as macosx don't seem to be matching up https://github.com/deeplearning4j/nd4j/blob/master/pom.xml
Screen Shot 2016-05-09 at 2.58.29 PM.png
found 'em
switch(libnd4jOS) {
  case 'windows':
    libnd4jOS = 'windows-x86_64'
    break
  case 'linux':
    libnd4jOS = 'linux-x86_64'
    break
  case 'linux-ppc64':
    libnd4jOS = 'linux-ppc64'
    break
  case 'linux-ppc64le':
    libnd4jOS = 'linux-ppc64le'
    break
  case 'macosx':
    libnd4jOS = 'macosx-x86_64'
    break
  default:
    throw new Exception('Unknown OS defined for -Plibnd4jOS parameter. ND4J will be unable to find platform-specific binaries and thus unable to run.')
}
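That switch can also be sketched as plain Java keyed off os.name the same way (a hypothetical helper, x86_64 only; the real build also has linux-ppc64/ppc64le variants, which depend on os.arch as well):

```java
// Hypothetical helper mirroring the nd4j-native classifier scheme above.
public class Nd4jClassifier {
    static String classifierFor(String osName) {
        String os = osName.toLowerCase();
        // Check "mac" before "win" so e.g. "Darwin" never matches "win".
        if (os.contains("mac"))   return "macosx-x86_64";
        if (os.contains("win"))   return "windows-x86_64";
        if (os.contains("linux")) return "linux-x86_64";
        throw new IllegalArgumentException("Unknown OS: " + osName);
    }

    public static void main(String[] args) {
        // Prints the classifier for the machine this runs on.
        System.out.println(classifierFor(System.getProperty("os.name")));
    }
}
```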
Justin Long
@crockpotveggies
May 09 2016 22:14
where do the docs live for ND4J? Can't find them using search or under gh-pages branch
ChrisN
@chrisvnicholson
May 09 2016 22:14
ND4J has a separate gh-pages
Justin Long
@crockpotveggies
May 09 2016 22:15
ah perfect thanks!
Justin Long
@crockpotveggies
May 09 2016 22:48
here ya go! deeplearning4j/nd4j#920
Justin Long
@crockpotveggies
May 09 2016 23:26
@chrisvnicholson are there any DL4J design changes coming soon? may I whip up something for fun while I'm waiting for neural net training?