These are chat archives for deeplearning4j/deeplearning4j/earlyadopters

20th
Apr 2016
Adam Gibson
@agibsonccc
Apr 20 2016 00:02
yeah that's an old version
Could you try reinstalling nd4j?
Also make sure you get rid of the snapshot repo in your pom
Mikhail Zyatin
@Sitin
Apr 20 2016 00:03
Hmmm. I’ve just built it. Maybe something is wrong with the caches.
Mikhail Zyatin
@Sitin
Apr 20 2016 00:25

Caches fixed but now I have:

Exception in thread "main" java.lang.NoClassDefFoundError: org/nd4j/nativeblas/NativeOps
    at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.<init>(NativeOpExecutioner.java:27)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:4708)
    at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:4655)
    at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:146)
    at org.canova.image.loader.ImageLoader.toINDArrayBGR(ImageLoader.java:420)
    at org.canova.image.loader.ImageLoader.asRowVector(ImageLoader.java:119)
    at org.canova.image.recordreader.BaseImageRecordReader.next(BaseImageRecordReader.java:212)
    at org.deeplearning4j.datasets.canova.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:160)
    at org.deeplearning4j.datasets.canova.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:313)
    at org.neuromatriarchy.models.RBMModel$$anon$1.next(RBMModel.scala:190)
    at org.neuromatriarchy.models.RBMModel$.main(RBMModel.scala:204)
    at org.neuromatriarchy.models.RBMModel.main(RBMModel.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.ClassNotFoundException: org.nd4j.nativeblas.NativeOps
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 22 more

I only have nd4j-native in my dependencies for nd4j. Is that enough?

Adam Gibson
@agibsonccc
Apr 20 2016 00:25
Something on your end is still insanely screwed up ;/
nd4j-native is enough, and NativeOps is in nd4j-native-api
it should be pulling that in
Can you purge your .m2/repository/org/nd4j/?
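A quick way to confirm the rebuilt artifacts actually end up on the classpath is a plain class lookup; this is only an illustrative sketch (the class name is the one from the stack trace above), not part of any nd4j tooling:

    // Illustrative check: prints the class name if nd4j-native-api is on the
    // classpath, otherwise throws ClassNotFoundException like the error above.
    public class NativeOpsCheck {
        public static void main(String[] args) throws Exception {
            Class<?> c = Class.forName("org.nd4j.nativeblas.NativeOps");
            System.out.println("Found: " + c.getName());
        }
    }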
Mikhail Zyatin
@Sitin
Apr 20 2016 00:26
And rebuild it again?
Adam Gibson
@agibsonccc
Apr 20 2016 00:26
Then rerun the install
yeah
Mikhail Zyatin
@Sitin
Apr 20 2016 00:27
will do
Mikhail Zyatin
@Sitin
Apr 20 2016 00:47

@agibsonccc, you were right, that was a strange behavior of SBT with

resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"

For some reason it prefers the older online versions.

Adam Gibson
@agibsonccc
Apr 20 2016 00:48
figured
ok cool
Mikhail Zyatin
@Sitin
Apr 20 2016 00:49
BTW, should I also build Canova 0.0.0.15-SNAPSHOT just to be sure that everything will work fine?
Adam Gibson
@agibsonccc
Apr 20 2016 00:49
yes
Mikhail Zyatin
@Sitin
Apr 20 2016 00:58
thank you
Adam Gibson
@agibsonccc
Apr 20 2016 00:59
glad this is getting built now :D
Patrick Skjennum
@Habitats
Apr 20 2016 07:23
training still doesn't use more than 60% CPU at most -- does this mean my configuration could be better? a different minibatch size or something?
Adam Gibson
@agibsonccc
Apr 20 2016 07:23
The only things that matter right now are numbers from JVisualVM etc
It really depends on the methods being called
raver119
@raver119
Apr 20 2016 07:24
yeah, profiling helps us a lot
Patrick Skjennum
@Habitats
Apr 20 2016 07:24
i can do some profiling, np
raver119
@raver119
Apr 20 2016 07:24
so, don't hesitate to send us screenshots of hot spots
Adam Gibson
@agibsonccc
Apr 20 2016 07:24
thanks!
right
raver119
@raver119
Apr 20 2016 07:24
that goes for both cpu and cuda
Adam Gibson
@agibsonccc
Apr 20 2016 07:24
esp with say: spark vs local/single
Patrick Skjennum
@Habitats
Apr 20 2016 07:25
ya, only dealing with pure dl4j/nd4j now
Adam Gibson
@agibsonccc
Apr 20 2016 07:25
we really can't make any recommendations right now because we haven't looked at it
whatever works for you
you just using it is great
Paul Dubs
@treo
Apr 20 2016 07:48
60% CPU utilization may be a sign that you have a hyperthreading-capable CPU and your BLAS lib has decided to only use as many threads as you have physical cores
Patrick Skjennum
@Habitats
Apr 20 2016 07:48
i have an i7-3820 with 4 physical and 8 logical cores
Paul Dubs
@treo
Apr 20 2016 07:49
as I guessed :D
If you are using openblas, you may have to set some environment variables to make it use all cores
Patrick Skjennum
@Habitats
Apr 20 2016 07:49
i'm using openblas yes
Paul Dubs
@treo
Apr 20 2016 07:51
then set OPENBLAS_NUM_THREADS=8 and OMP_DYNAMIC=FALSE (you can do this just for the project in IntelliJ, by editing the run configuration settings)
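A minimal sketch (plain JDK calls only, not part of nd4j) to confirm the run configuration actually passes those variables to the JVM, and to see the logical core count that the runtime reports:

    // Prints the logical core count (8 on an i7-3820) and the two environment
    // variables; null means the run configuration did not set them.
    public class BlasThreadCheck {
        public static void main(String[] args) {
            System.out.println("Logical cores: " + Runtime.getRuntime().availableProcessors());
            System.out.println("OPENBLAS_NUM_THREADS=" + System.getenv("OPENBLAS_NUM_THREADS"));
            System.out.println("OMP_DYNAMIC=" + System.getenv("OMP_DYNAMIC"));
        }
    }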
Patrick Skjennum
@Habitats
Apr 20 2016 07:52
holy moly, i never realized i could actually set env variables in intellij
even though it's been right in front of me this whole time
Paul Dubs
@treo
Apr 20 2016 07:52
This may give you a boost of up to 15%... so don't expect any wonders there
Patrick Skjennum
@Habitats
Apr 20 2016 07:53
yeah, but that's a lot if it works. at the current rate my job is going to take 1080 hours (or 45 days...)
raver119
@raver119
Apr 20 2016 07:56
@Habitats what's your training dataset size and input features dimension?
Patrick Skjennum
@Habitats
Apr 20 2016 07:57
1.8M entries with 4-10 w2v vectors (1000 dimensions) each, and 18 classes, which i've divided into 18 binary classification problems
the dataset is 17 GB in size, but that's including everything.
raver119
@raver119
Apr 20 2016 08:00
and what's batch size?
Patrick Skjennum
@Habitats
Apr 20 2016 08:01
been trying 50 and 100
didn't notice much of a difference
raver119
@raver119
Apr 20 2016 08:01
great
do you have a few GPUs by any chance?
Patrick Skjennum
@Habitats
Apr 20 2016 08:01
none from nvidia
raver119
@raver119
Apr 20 2016 08:02
ok :(
Patrick Skjennum
@Habitats
Apr 20 2016 08:02
got some kickass amd cards though:p
Adam Gibson
@agibsonccc
Apr 20 2016 08:04
@Habitats oh so you're the one writing our opencl support
cool
heh
Patrick Skjennum
@Habitats
Apr 20 2016 08:05
haha
over my dead body:D
Adam Gibson
@agibsonccc
Apr 20 2016 08:05
you're one of 5 people on the planet
raver119
@raver119
Apr 20 2016 08:05
lies. 6.
i know one more
Adam Gibson
@agibsonccc
Apr 20 2016 08:05
oh man
AMD is gaining traction now!
raver119
@raver119
Apr 20 2016 08:05
:)
Patrick Skjennum
@Habitats
Apr 20 2016 08:07
for the profiling, what attribute is more interesting? self time (CPU)?
Adam Gibson
@agibsonccc
Apr 20 2016 08:07
right
also just methods that take up cpu time
eg: the hotspots
raver119
@raver119
Apr 20 2016 08:08
throw us full trees
Adam Gibson
@agibsonccc
Apr 20 2016 08:08
right
raver119
@raver119
Apr 20 2016 08:08
and we'll do the rest
Adam Gibson
@agibsonccc
Apr 20 2016 08:08
:D
Patrick Skjennum
@Habitats
Apr 20 2016 08:08
[screenshot]
raver119
@raver119
Apr 20 2016 08:08
hm
what's that select? i thought you were training standalone
Patrick Skjennum
@Habitats
Apr 20 2016 08:09
you tell me
i have spark running in the background, but training isn't done through spark
raver119
@raver119
Apr 20 2016 08:09
show me your full source code for configuration
is there some listener attached?
Adam Gibson
@agibsonccc
Apr 20 2016 08:10
@raver119 that actually looks REALLY good
Patrick Skjennum
@Habitats
Apr 20 2016 08:10
yeah, an iteration listener, nothing else
Adam Gibson
@agibsonccc
Apr 20 2016 08:10
fwiw
raver119
@raver119
Apr 20 2016 08:10
yeah, that's good. but select shouldn't be there
Adam Gibson
@agibsonccc
Apr 20 2016 08:10
oh that's typically jetty
I see that a lot
Patrick Skjennum
@Habitats
Apr 20 2016 08:10
still only 60% utilization, though
raver119
@raver119
Apr 20 2016 08:10
i guess that's dropwizard-related
Adam Gibson
@agibsonccc
Apr 20 2016 08:11
yeah
raver119
@raver119
Apr 20 2016 08:11
and i guess it's detached from main thread
so probably we don't care
Adam Gibson
@agibsonccc
Apr 20 2016 08:11
right
Patrick Skjennum
@Habitats
Apr 20 2016 08:12
my RAM utilization is at 70-80% btw. way below xmx
and all data is in mem
[screenshot]
raver119
@raver119
Apr 20 2016 08:15
+- fine
Patrick Skjennum
@Habitats
Apr 20 2016 08:15
hm?
raver119
@raver119
Apr 20 2016 08:15
you're actually training on a single thread
don't forget that :)
on two threads via spark that'll be flat 100% load
Patrick Skjennum
@Habitats
Apr 20 2016 08:16
yeah it is
however, it's also 6-8 times slower:P
raver119
@raver119
Apr 20 2016 08:16
it will be faster
after native stuff is done we'll get to parameter server
that'll help.
Patrick Skjennum
@Habitats
Apr 20 2016 08:17
time frame?
raver119
@raver119
Apr 20 2016 08:17
that's up to adam
i'm just a minor guy here, hammering out code
Patrick Skjennum
@Habitats
Apr 20 2016 08:18
got to finish my experiments in a month~ for my thesis. got to know where i should put my efforts:p
if i should focus on just putting this on google cloud
Paul Dubs
@treo
Apr 20 2016 08:42
@Habitats try using mkl first
for me it is about 2x faster than openblas
and with 3.9 you don't have to jump through all the hoops that 3.8 required
Patrick Skjennum
@Habitats
Apr 20 2016 08:44
didn't even know there was an alternative
Paul Dubs
@treo
Apr 20 2016 08:44
The environment variables you want for it are MKL_DYNAMIC=FALSE and MKL_NUM_THREADS=8 :)
@Habitats I can't remember, are you on windows?
Patrick Skjennum
@Habitats
Apr 20 2016 08:47
indeed i am
Paul Dubs
@treo
Apr 20 2016 08:47
ok, then you may be in for another speed treat, as you are using RNNs
that may work, or it may kill kittens, but in my experience it boosts performance a lot for RNNs (currently)
Patrick Skjennum
@Habitats
Apr 20 2016 08:49
i'm listening!
Paul Dubs
@treo
Apr 20 2016 08:50
so:
  1. Install jemalloc on your msys console (pacman -S mingw64/mingw-w64-x86_64-jemalloc)
  2. Rebuild Nd4j with mvn clean install -DskipTests -Dmaven.javadoc.skip=true -pl '!:nd4j-cuda-7.5,!:nd4j-tests' -Djavacpp.compilerOptions='-DNATIVE_ALLOCATOR=je_malloc,-DNATIVE_DEALLOCATOR=je_free,-ljemalloc,-include,c:\msys64\mingw64\include\jemalloc\jemalloc.h'
This tells nd4j to use jemalloc as its allocator, that is highly experimental and may crash if you look at it sideways, but it is also worth trying :) But install mkl first, as this will give you a more immediate speedup
Patrick Skjennum
@Habitats
Apr 20 2016 08:53
i see, yeah the thing didn't even build. i suppose i've messed up my libnd4j paths:p
Paul Dubs
@treo
Apr 20 2016 08:53
If it tells you to set up your libnd4j paths, then you did
if not, I'd like to know what it tells you
Patrick Skjennum
@Habitats
Apr 20 2016 08:55
yeah, i'll just try mkl first. i spent forever getting libnd4j to work to begin with. don't want to touch it unless i have to:p
Patrick Skjennum
@Habitats
Apr 20 2016 09:05
@treo mkl is complaining about Visual Studio requirements
but i suppose i can ignore those
Paul Dubs
@treo
Apr 20 2016 09:05
you can :)
Patrick Skjennum
@Habitats
Apr 20 2016 09:06
so, is mkl a superset of the features of openblas?
it was like 3gb
Paul Dubs
@treo
Apr 20 2016 09:07
it has a lot of other stuff as well, the actually interesting files are like 40mb
Patrick Skjennum
@Habitats
Apr 20 2016 09:07
it never gave me the option to only install the interesting parts, hehe
so, how do i tell nd4j to use mkl? openblas worked by just putting it on the path
Paul Dubs
@treo
Apr 20 2016 09:10
you have to have it on your path as well and then you have to rebuild libnd4j
Patrick Skjennum
@Habitats
Apr 20 2016 09:11
add what to the path exactly? C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\bin ?
Paul Dubs
@treo
Apr 20 2016 09:12
restart your console and take a look at your path (echo $PATH); it usually sets itself up
Patrick Skjennum
@Habitats
Apr 20 2016 09:14
ah
Paul Dubs
@treo
Apr 20 2016 09:15
So if it is on your path, you can just rebuild libnd4j now
It worked if you see mkl_rt.dll in the output after Linking CXX shared library libnd4j.dll
Patrick Skjennum
@Habitats
Apr 20 2016 09:18
yeah, i got that output
does that imply mkl?
oh wait
Paul Dubs
@treo
Apr 20 2016 09:18
that means that it linked against mkl_rt.dll
Patrick Skjennum
@Habitats
Apr 20 2016 09:18
misread what you wrote
Paul Dubs
@treo
Apr 20 2016 09:18
which is the mkl runtime
Patrick Skjennum
@Habitats
Apr 20 2016 09:19
i don't see mkl_rt.dll anywhere
[100%] Linking CXX shared library libnd4j.dll
cd /D/Dropbox/code/misc/libnd4j/blasbuild/cpu/blas && /C/msys64/mingw64/bin/cmake.exe -E remove -f CMakeFiles/nd4j.dir/objects.a
cd /D/Dropbox/code/misc/libnd4j/blasbuild/cpu/blas && /C/msys64/mingw64/bin/ar.exe cr CMakeFiles/nd4j.dir/objects.a "CMakeFiles/nd4j.dir/cpu/NativeBlas.cpp.obj" "CMakeFiles/nd4j.dir/cpu/NativeOps.cpp.obj"
cd /D/Dropbox/code/misc/libnd4j/blasbuild/cpu/blas && /C/msys64/mingw64/bin/g++.exe -Wall -fopenmp -std=c++11 -fassociative-math -funsafe-math-optimizations -march=native -fopenmp -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=2 -fopt-info-vec -fopt-info-vec-missed -shared -o libnd4j.dll -Wl,--out-implib,libnd4j.dll.a -Wl,--major-image-version,0,--minor-image-version,0 -Wl,--whole-archive CMakeFiles/nd4j.dir/objects.a -Wl,--no-whole-archive -lopenblas -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32
make[2]: Leaving directory '/d/Dropbox/code/misc/libnd4j/blasbuild/cpu'
[100%] Built target nd4j
make[1]: Leaving directory '/d/Dropbox/code/misc/libnd4j/blasbuild/cpu'
/C/msys64/mingw64/bin/cmake.exe -E cmake_progress_start /D/Dropbox/code/misc/libnd4j/blasbuild/cpu/CMakeFiles 0
FINISHING BUILD
Paul Dubs
@treo
Apr 20 2016 09:20
ok, show me what your path looks like
Patrick Skjennum
@Habitats
Apr 20 2016 09:22
whole thing? it's like this %INTEL_DEV_REDIST%redist\intel64_win\mpirt;%INTEL_DEV_REDIST%redist\ia32_win\mpirt;%INTEL_DEV_REDIST%redist\intel64_win\compiler;%INTEL_DEV_REDIST%redist\ia32_win\compiler;C:\ProgramData\Oracle\Java\javapath;C:\spark\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\AMD\ATI.ACE\Core-Static;C:\Program Files\OpenVPN\bin;C:\Program Files (x86)\ATI Technologies\ATI.ACE\Core-Static;C:\Program Files\apache-maven-3.3.3\bin;C:\Program Files\nodejs;C:\Program Files\Sublime Text 3;C:\Program Files (x86)\sbt\bin;C:\Program Files (x86)\scala\bin;C:\gradle-2.10\bin;C:\Program Files\Java\jdk1.8.0_60\bin;C:\cygwin64\bin;C:\msys64\mingw64\bin;C:\msys64\usr\bin;"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\bin"
Paul Dubs
@treo
Apr 20 2016 09:23
what is your output in MSYS?
when you use echo $PATH there?
Just want to make sure it resolves everything correctly there
Patrick Skjennum
@Habitats
Apr 20 2016 09:23
/d/Dropbox/code/misc/libnd4j$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/opt/bin:/c/Program Files/ConEmu:/c/Program Files/ConEmu/ConEmu:/c/ProgramData/Oracle/Java/javapath:/c/spark/bin:/c/WINDOWS/system32:/c/WINDOWS:/c/WINDOWS/System32/Wbem:/c/WINDOWS/System32/WindowsPowerShell/v1.0:/c/Program Files (x86)/AMD/ATI.ACE/Core-Static:/c/Program Files/OpenVPN/bin:/c/Program Files (x86)/ATI Technologies/ATI.ACE/Core-Static:/c/Program Files/apache-maven-3.3.3/bin:/c/Program Files/nodejs:/c/Program Files/Sublime Text 3:/c/Program Files (x86)/sbt/bin:/c/Program Files (x86)/scala/bin:/c/gradle-2.10/bin:/c/Program Files/Java/jdk1.8.0_60/bin:/c/cygwin64/bin:/c/OpenBLAS:/mingw64/bin:/usr/bin:/c/Users/mail/AppData/Local/Google/Cloud SDK/google-cloud-sdk/bin:/c/Users/mail/AppData/Roaming/npm
Paul Dubs
@treo
Apr 20 2016 09:25
have you restarted it after installing mkl?
Patrick Skjennum
@Habitats
Apr 20 2016 09:25
ya
Paul Dubs
@treo
Apr 20 2016 09:26
that's odd, because it is missing all of the mkl paths, and it doesn't look like a cleaned path
Patrick Skjennum
@Habitats
Apr 20 2016 09:26
i have this evn variable: MIC_LD_LIBRARY_PATH=%INTEL_DEV_REDIST%compiler\lib\intel64_win_mic
might be unrelated, but never seen it before:p
Paul Dubs
@treo
Apr 20 2016 09:27
add this to your path: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\redist\intel64\compiler and C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\redist\intel64\mkl
make sure that you have that on your path in msys
Patrick Skjennum
@Habitats
Apr 20 2016 09:28
yeah
made no difference
Paul Dubs
@treo
Apr 20 2016 09:31
what does your path look like now?
Patrick Skjennum
@Habitats
Apr 20 2016 09:31
/d/Dropbox/code/misc/libnd4j$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/opt/bin:/c/Program Files/ConEmu:/c/Program Files/ConEmu/ConEmu:/c/Program Files (x86)/IntelSWTools/compilers_and_libraries/windows/redist/intel64/compiler and C:/Program Files (x86)/IntelSWTools/compilers_and_libraries/windows/redist/intel64/mkl:/c/Program Files (x86)/Common Files/Intel/Shared Libraries/redist/intel64_win/mpirt:/c/Program Files (x86)/Common Files/Intel/Shared Libraries/redist/ia32_win/mpirt:/c/Program Files (x86)/Common Files/Intel/Shared Libraries/redist/intel64_win/compiler:/c/Program Files (x86)/Common Files/Intel/Shared Libraries/redist/ia32_win/compiler:/c/ProgramData/Oracle/Java/javapath:/c/spark/bin:/c/WINDOWS/system32:/c/WINDOWS:/c/WINDOWS/System32/Wbem:/c/WINDOWS/System32/WindowsPowerShell/v1.0:/c/Program Files (x86)/AMD/ATI.ACE/Core-Static:/c/Program Files/OpenVPN/bin:/c/Program Files (x86)/ATI Technologies/ATI.ACE/Core-Static:/c/Program Files/apache-maven-3.3.3/bin:/c/Program Files/nodejs:/c/Program Files/Sublime Text 3:/c/Program Files (x86)/sbt/bin:/c/Program Files (x86)/scala/bin:/c/gradle-2.10/bin:/c/Program Files/Java/jdk1.8.0_60/bin:/c/cygwin64/bin:/mingw64/bin:/usr/bin:/c/Users/mail/AppData/Local/Google/Cloud SDK/google-cloud-sdk/bin:/c/Users/mail/AppData/Roaming/npm
ah lol
i copied your "and"
Paul Dubs
@treo
Apr 20 2016 09:32
I see :D Just wanted to mock you for it :P
Patrick Skjennum
@Habitats
Apr 20 2016 09:33
hah
impossible to read a single line path
yeah there's some mkl stuff in the output now
Paul Dubs
@treo
Apr 20 2016 09:35
great :)
Patrick Skjennum
@Habitats
Apr 20 2016 09:35
[100%] Linking CXX shared library libnd4j.dll
cd /D/Dropbox/code/misc/libnd4j/blasbuild/cpu/blas && /C/msys64/mingw64/bin/cmake.exe -E remove -f CMakeFiles/nd4j.dir/objects.a
cd /D/Dropbox/code/misc/libnd4j/blasbuild/cpu/blas && /C/msys64/mingw64/bin/ar.exe cr CMakeFiles/nd4j.dir/objects.a "CMakeFiles/nd4j.dir/cpu/NativeBlas.cpp.obj" "CMakeFiles/nd4j.dir/cpu/NativeOps.cpp.obj"
cd /D/Dropbox/code/misc/libnd4j/blasbuild/cpu/blas && /C/msys64/mingw64/bin/g++.exe -Wall -fopenmp -std=c++11 -fassociative-math -funsafe-math-optimizations -march=native -fopenmp -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=2 -fopt-info-vec -fopt-info-vec-missed -shared -o libnd4j.dll -Wl,--out-implib,libnd4j.dll.a -Wl,--major-image-version,0,--minor-image-version,0 -Wl,--whole-archive CMakeFiles/nd4j.dir/objects.a -Wl,--no-whole-archive "/C/Program Files (x86)/IntelSWTools/compilers_and_libraries/windows/redist/intel64/mkl/mkl_rt.dll" "/C/Program Files (x86)/IntelSWTools/compilers_and_libraries/windows/redist/intel64/mkl/mkl_rt.dll" -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32
make[2]: Leaving directory '/d/Dropbox/code/misc/libnd4j/blasbuild/cpu'
[100%] Built target nd4j
make[1]: Leaving directory '/d/Dropbox/code/misc/libnd4j/blasbuild/cpu'
/C/msys64/mingw64/bin/cmake.exe -E cmake_progress_start /D/Dropbox/code/misc/libnd4j/blasbuild/cpu/CMakeFiles 0
FINISHING BUILD
Paul Dubs
@treo
Apr 20 2016 09:35
just as it should :)
now build nd4j
Patrick Skjennum
@Habitats
Apr 20 2016 09:36
yeah about that, i don't know how to fix my path
i mean, libnd4j env
atm it's: LIBND4J_HOME=/Dropbox/code/misc/libnd4j
Paul Dubs
@treo
Apr 20 2016 09:37
it is easy :)
Patrick Skjennum
@Habitats
Apr 20 2016 09:37
if i include d/ which is the drive, it doesn't work
and that doesn't work either
Paul Dubs
@treo
Apr 20 2016 09:37
in the libnd4j folder type: export LIBND4J_HOME=pwd
argh...
Patrick Skjennum
@Habitats
Apr 20 2016 09:37
hm
yeah that works too:P
Paul Dubs
@treo
Apr 20 2016 09:37
export LIBND4J_HOME=`pwd`
Patrick Skjennum
@Habitats
Apr 20 2016 09:38
still mvn clean install -DskipTests -Dmaven.javadoc.skip=true -pl '!org.nd4j:nd4j-cuda-7.5' right
raver119
@raver119
Apr 20 2016 09:38
@treo i need your help
can you please build libnd4j master and nd4j master
Paul Dubs
@treo
Apr 20 2016 09:39
if you want to skip cuda, yes :)
Patrick Skjennum
@Habitats
Apr 20 2016 09:39
amd GPU ;_;
raver119
@raver119
Apr 20 2016 09:39
and check a few of my tests in the nd4j-cuda package?
Paul Dubs
@treo
Apr 20 2016 09:39
@raver119 currently using it
raver119
@raver119
Apr 20 2016 09:39
please, run test class CudaScalarsTests
Paul Dubs
@treo
Apr 20 2016 09:40
using the native backend, or the cuda backend?
raver119
@raver119
Apr 20 2016 09:40
it's in the nd4j-cuda package
so only the cuda backend is possible there
Paul Dubs
@treo
Apr 20 2016 09:40
right
Patrick Skjennum
@Habitats
Apr 20 2016 09:41
!!! LIBND4J_HOME must be a valid unix path!
[WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireFilesExist failed with message:
!!! You have to compile libnd4j with cpu support first! Some required files are missing:
D:\Dropbox\code\misc\nd4j\nd4j-backends\nd4j-backend-impls\nd4j-native\pwd\blas\NativeBlas.h
D:\Dropbox\code\misc\nd4j\nd4j-backends\nd4j-backend-impls\nd4j-native\pwd\blasbuild\cpu\blas
so much for that export
raver119
@raver119
Apr 20 2016 09:41
but i need exactly libnd4j master for it
not my async branch
Paul Dubs
@treo
Apr 20 2016 09:41
ok
@Habitats damn msys for changing how it works every 2 seconds
Patrick Skjennum
@Habitats
Apr 20 2016 09:42
ah no, i'm retarded
i copied your first command
not the fixed one
obviously "pwd" is no real path:PP
worked now!
Paul Dubs
@treo
Apr 20 2016 09:44
then you can also try the jemalloc thing :D
Patrick Skjennum
@Habitats
Apr 20 2016 09:44
sounded dangerous
Paul Dubs
@treo
Apr 20 2016 09:45
@raver119 so the test should run with the master version of the backend, but your version of the test?
raver119
@raver119
Apr 20 2016 09:45
no, master for everything should be fine
Paul Dubs
@treo
Apr 20 2016 09:45
@Habitats just try it out, if it doesn't work, simply rebuild it without
raver119
@raver119
Apr 20 2016 09:45
i'm trying to understand if that's libnd4j being broken
or my JCuda eradication went wrong somewhere
i see some tests passing, and some tests crashing
Patrick Skjennum
@Habitats
Apr 20 2016 09:46
@treo yeah just want to check if it works with mkl. doesn't seem like it did. lol
raver119
@raver119
Apr 20 2016 09:46
but if that's me, why do i see passing tests then...
Paul Dubs
@treo
Apr 20 2016 09:47
@Habitats why do you think that it didn't?
Patrick Skjennum
@Habitats
Apr 20 2016 09:47
do i need to rebuild anything else?
other than nd4j
Paul Dubs
@treo
Apr 20 2016 09:48
no, just nd4j, and use nd4j-native as the backend
Patrick Skjennum
@Habitats
Apr 20 2016 09:48
yeah well, now my jvm crashes on launch

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00005044c0057351, pid=5256, tid=2772
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C  0x00005044c0057351
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# D:\Dropbox\code\projects\corpus\hs_err_pid5256.log
#
Compiled method (c2)  3460  1231  4  org.nd4j.linalg.api.ndarray.BaseNDArray::putScalar (103 bytes)
 total in heap  [0x000000000296ca90,0x0000000002974fe0] = 34128
 relocation     [0x000000000296cbb0,0x000000000296cde8] = 568
 main code      [0x000000000296ce00,0x000000000296f8e0] = 10976
 stub code      [0x000000000296f8e0,0x000000000296f9a0] = 192
 oops           [0x000000000296f9a0,0x000000000296f9b8] = 24
 metadata       [0x000000000296f9b8,0x000000000296fad0] = 280
 scopes data    [0x000000000296fad0,0x0000000002974340] = 18544
 scopes pcs     [0x0000000002974340,0x0000000002974d10] = 2512
 dependencies   [0x0000000002974d10,0x0000000002974d40] = 48
 handler table  [0x0000000002974d40,0x0000000002974f38] = 504
 nul chk table  [0x0000000002974f38,0x0000000002974fe0] = 168
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

oh shit that didn't work
Ruben Fiszel
@rubenfiszel
Apr 20 2016 09:49
I'm still developing, so it might be my fault, but using the latest snapshot, is it possible that I get an infinite score for the wrong reasons?
raver119
@raver119
Apr 20 2016 09:49
@Habitats http://gist.github.com :)
Patrick Skjennum
@Habitats
Apr 20 2016 09:49
yeah, sorry:p
Paul Dubs
@treo
Apr 20 2016 09:49
that is odd... But breakfast is ready, so I'll take a look at it after
Ruben Fiszel
@rubenfiszel
Apr 20 2016 09:50
Usually it is because the learning rate is too high, but now even on the first iteration I get infinite score
Patrick Skjennum
@Habitats
Apr 20 2016 09:50
alright
Paul Dubs
@treo
Apr 20 2016 09:50
but that doesn't look like it is mkl related
Ruben Fiszel
@rubenfiszel
Apr 20 2016 09:50
And I also get jvm crashes sometimes
Patrick Skjennum
@Habitats
Apr 20 2016 09:50
everything built fine, and i've cleaned gradle and refreshed the project
# Problematic frame: # C [cygwin1.dll+0x31297] doesn't sound promising
raver119
@raver119
Apr 20 2016 09:53
forget that
show full log
thats hs_xxxxxx.log
hs_err_pidXXXXX.log
raver119
@raver119
Apr 20 2016 09:54
RBX={method} {0x00000000511ff550} 'putScalar' '(ID)Lorg/nd4j/linalg/api/ndarray/INDArray;' in 'org/nd4j/linalg/api/ndarray/BaseNDArray'
RCX=0x00000001c0000028 is an oop
that's what matters there
Patrick Skjennum
@Habitats
Apr 20 2016 09:55
and oop?
an*
oh that was a part of the message
Paul Dubs
@treo
Apr 20 2016 10:14
@Habitats are you on current master?
Patrick Skjennum
@Habitats
Apr 20 2016 10:15
good point
now i am:P
Paul Dubs
@treo
Apr 20 2016 10:17
@raver119 on Master all tests of CudaScalarTests pass
raver119
@raver119
Apr 20 2016 10:18
yea :/
i've failed somewhere there...
Patrick Skjennum
@Habitats
Apr 20 2016 10:26
@treo pulled from master and redid everything
problem remains
Paul Dubs
@treo
Apr 20 2016 10:26
can you isolate it, so you have only the single putScalar call in a test program? Maybe you found a bug :)
Even better: remove mkl from your path, to make sure it isn't the culprit
Patrick Skjennum
@Habitats
Apr 20 2016 10:32
ah jesus christ that doesn't work either
Ruben Fiszel
@rubenfiszel
Apr 20 2016 10:34
I feel for you @Habitats
Paul Dubs
@treo
Apr 20 2016 10:34
because you have a smashed stack, I guess you have some mismatched shapes somewhere
so, try to isolate the location
Patrick Skjennum
@Habitats
Apr 20 2016 10:34
everything was working earlier though. so it's not the code?
or what do you mean
Paul Dubs
@treo
Apr 20 2016 10:37
it working earlier doesn't necessarily mean that it worked correctly
Patrick Skjennum
@Habitats
Apr 20 2016 10:37
:(
Paul Dubs
@treo
Apr 20 2016 10:38
But it can also be just a bug in libnd4j or nd4j
Patrick Skjennum
@Habitats
Apr 20 2016 10:38
well, my final classification model made sense at least
Paul Dubs
@treo
Apr 20 2016 10:39
just to make sure, you are on 8f506d0afe67e316c6e70d55e576455a328d323b for nd4j, and have no modified files there?
Patrick Skjennum
@Habitats
Apr 20 2016 10:39
git reset --hard HEAD && git pull
master branch
anything i can check for in the source in intellij?
maybe my gradle screwed me again
it likes to do that
Paul Dubs
@treo
Apr 20 2016 10:41
Then you'll have to try and isolate the bug, master works fine for me (with and without mkl)
Patrick Skjennum
@Habitats
Apr 20 2016 10:43
after i removed mkl i suddenly get jvm crashes related to spark
Paul Dubs
@treo
Apr 20 2016 10:43
gist log?
never seen this before
Paul Dubs
@treo
Apr 20 2016 10:45
Didn't you say earlier that you don't use spark?
Patrick Skjennum
@Habitats
Apr 20 2016 10:45
i use spark to load the data, but not for training
so there's really no spark integration with any dl4j stuff
and my spark tests runs fine, so it's not a spark issue per se
Paul Dubs
@treo
Apr 20 2016 10:47
try to add some diagnostic prints to your data loading code to narrow in on where the problem originates
Ruben Fiszel
@rubenfiszel
Apr 20 2016 10:49
I watch the training, and sometimes the evaluation of the same data gives very different results between iterations. Could it be that the training keeps multiple sets of weights in parallel and, when the score deviates too much, switches to another set of weights?
Paul Dubs
@treo
Apr 20 2016 10:51
not that i'd know of, but if your data is ordered, and you have only one type of class for some batches and then the other for the next, it will behave like you described
Ruben Fiszel
@rubenfiszel
Apr 20 2016 10:51
Well no the exact same data in the batches
with different label
Patrick Skjennum
@Habitats
Apr 20 2016 10:53
@treo it crashes on net.init()
Paul Dubs
@treo
Apr 20 2016 10:53
I mean, for each label you should have about the same number of examples per batch
@Habitats great, can you share your config?
Ruben Fiszel
@rubenfiszel
Apr 20 2016 10:53
@treo It's reinforcement learning so there is no garantee of that
guaranty*
guarantee*
:D
Paul Dubs
@treo
Apr 20 2016 10:54
@atollFP then I can't help you with that
Ruben Fiszel
@rubenfiszel
Apr 20 2016 10:54
Only thing I can say is that Q-learning with deeplearning4j diverges so far
Patrick Skjennum
@Habitats
Apr 20 2016 10:55
@treo like, as soon as i enter the init() method the jvm crashes. it doesn't even get to the first line. it's a little confusing:s
Patrick Skjennum
@Habitats
Apr 20 2016 10:56
ya
you're quite the wizard:p
Paul Dubs
@treo
Apr 20 2016 10:57
I just know how not to wait for information :D
Patrick Skjennum
@Habitats
Apr 20 2016 10:58
i was going to give you the conf.json but i realize this is probably easier
for what it's worth i'm using the createBinary()
Paul Dubs
@treo
Apr 20 2016 11:00
So, create a clean java project, and see if just https://gist.github.com/treo/d1a7474fd5eec0d94a6691cd655d09b4 crashes
because it works fine for me
Patrick Skjennum
@Habitats
Apr 20 2016 11:01
yeah that crashes too
Paul Dubs
@treo
Apr 20 2016 11:02
gist log
this is without mkl btw
i've pulled and rebuilt deeplearning4j as well btw
Paul Dubs
@treo
Apr 20 2016 11:07
Ok.... Let's try something simpler: Nd4j.create(1)
Patrick Skjennum
@Habitats
Apr 20 2016 11:08
that worked
Paul Dubs
@treo
Apr 20 2016 11:08
also: cafebabecafebabe <--WTF?
Patrick Skjennum
@Habitats
Apr 20 2016 11:08
lol i have no idea
Ruben Fiszel
@rubenfiszel
Apr 20 2016 11:09
I remember it's an easter egg of java
I don't remember why
Paul Dubs
@treo
Apr 20 2016 11:10
ok, now lets try something more complex: Nd4j.rand(1000, 1000)
Patrick Skjennum
@Habitats
Apr 20 2016 11:10
boom
total apocalypse with that one
Adam Gibson
@agibsonccc
Apr 20 2016 11:11
@atollFP have you seen our gradient check stuff?
Paul Dubs
@treo
Apr 20 2016 11:11
And what do you get from Nd4j.create(100, 100)?
eh, 1000,1000 I meant
Ruben Fiszel
@rubenfiszel
Apr 20 2016 11:11
@agibsonccc no not really but I check by hand most of the time
Patrick Skjennum
@Habitats
Apr 20 2016 11:12
INDArray v = Nd4j.create(1000, 1000) works
INDArray v = Nd4j.rand(1000, 1000) does not
Paul Dubs
@treo
Apr 20 2016 11:13
great, now try Nd4j.rand(1)
Patrick Skjennum
@Habitats
Apr 20 2016 11:13
that's not a legal command it seems

13:13:37.933 DEBUG - Number of threads used for linear algebra 1
13:13:37.938 DEBUG - Number of threads used for linear algebra 1
Exception in thread "main" java.lang.IllegalArgumentException: Length must be >= 1
at org.nd4j.linalg.api.buffer.BaseDataBuffer.<init>(BaseDataBuffer.java:538)
at org.nd4j.linalg.api.buffer.FloatBuffer.<init>(FloatBuffer.java:40)
at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.createFloat(DefaultDataBufferFactory.java:227)
at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1157)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:215)
at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:107)
at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.create(CpuNDArrayFactory.java:239)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3969)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3229)
at org.nd4j.linalg.api.rng.DefaultRandom.nextDouble(DefaultRandom.java:123)
at org.nd4j.linalg.factory.BaseNDArrayFactory.rand(BaseNDArrayFactory.java:623)
at org.nd4j.linalg.factory.BaseNDArrayFactory.rand(BaseNDArrayFactory.java:637)
at org.nd4j.linalg.factory.Nd4j.rand(Nd4j.java:2346)
at org.deeplearning4j.examples.word2vec.sentiment.RNNEX.main(RNNEX.java:14)
Disconnected from the target VM, address: '127.0.0.1:57781', transport: 'socket'

Process finished with exit code 1

ah god damnit that never works
Patrick Skjennum
@Habitats
Apr 20 2016 11:15
Nd4j.rand(1,1) worked btw
Paul Dubs
@treo
Apr 20 2016 11:15
ok, then try to increase it until it crashes again
Patrick Skjennum
@Habitats
Apr 20 2016 11:16
Nd4j.rand(10,10) works too, but Nd4j.rand(100,100) does not
Paul Dubs
@treo
Apr 20 2016 11:18
try that Nd4j.getRandom().nextGaussian(new int[]{100, 100});
Patrick Skjennum
@Habitats
Apr 20 2016 11:18
yeah, it crashes on 54, 54
53, 53 works fine:P
Nd4j.getRandom().nextGaussian(new int[]{100, 100}); crashes
Paul Dubs
@treo
Apr 20 2016 11:19
ok... we are homing in on the problem
Patrick Skjennum
@Habitats
Apr 20 2016 11:19
gaussians didn't work with 53 though:(
Paul Dubs
@treo
Apr 20 2016 11:20
when you run it again, maybe it works again
Patrick Skjennum
@Habitats
Apr 20 2016 11:21
the output i get isn't 100% consistent though
ah yeah, after like 20 tries, 54, 54 worked with the gaussians
what is this sorcery?!
Paul Dubs
@treo
Apr 20 2016 11:22
run this:
INDArray ret = Nd4j.create(2500);
for (int i = 0; i < 2500; i++){ 
    System.out.println(i);
    ret.putScalar(i, 42);
}
Patrick Skjennum
@Habitats
Apr 20 2016 11:22
works
Paul Dubs
@treo
Apr 20 2016 11:22
then resize it to something larger, like 3000 or 5000
Patrick Skjennum
@Habitats
Apr 20 2016 11:26
it doesn't crash
Adam Gibson
@agibsonccc
Apr 20 2016 11:26
so basically we need an illegal shape validation at the java level etc
huh
Patrick Skjennum
@Habitats
Apr 20 2016 11:27
10000000 works just fine
Paul Dubs
@treo
Apr 20 2016 11:27
ok, so run the following
        Random random = Nd4j.getRandom();
        INDArray ret = Nd4j.create(55, 55);
        INDArray linear = ret.linearView();
        System.out.println("Length: "+linear.length());
        for (int i = 0; i < linear.length(); i++) {
            System.out.println("i: "+i);
            linear.putScalar(i, random.nextGaussian());
        }
Patrick Skjennum
@Habitats
Apr 20 2016 11:31
no issues
Paul Dubs
@treo
Apr 20 2016 11:31
even with larger sizes?
Patrick Skjennum
@Habitats
Apr 20 2016 11:32
doesn't look like it
Paul Dubs
@treo
Apr 20 2016 11:32
because that is literally what Nd4j.getRandom().nextGaussian(new int[]{55, 55}); does
Patrick Skjennum
@Habitats
Apr 20 2016 11:32
10000 works
although it takes forever
Paul Dubs
@treo
Apr 20 2016 11:32
the only difference is added prints
Patrick Skjennum
@Habitats
Apr 20 2016 11:33
ah shit, when i use my logger and not just sysouts
it breaks
lol
or no, nvm
1000, 1000 works, 10000 does not
printing the iterations works, but printing the "ret" crashes @treo
Paul Dubs
@treo
Apr 20 2016 11:36
Hm... Something seems to be thoroughly messed up. As it helped yesterday with a similar case where I was out of ideas: completely delete your libnd4j and nd4j repositories and check them out fresh, and then rebuild everything
Adam Gibson
@agibsonccc
Apr 20 2016 11:36
@treo valgrind?
Paul Dubs
@treo
Apr 20 2016 11:36
on windows?
Adam Gibson
@agibsonccc
Apr 20 2016 11:36
msys2?
not sure if it works or not:P
Might be worth updating that wiki for windows (remember you're a part of the dl4j org now)
you'd know better than me
I just assume unix stuff is available under msys
Paul Dubs
@treo
Apr 20 2016 11:38
the problem is that it seems to work on my machine, and the jvm crashes in cygwin.dll
Adam Gibson
@agibsonccc
Apr 20 2016 11:38
huh
Paul Dubs
@treo
Apr 20 2016 11:38
so, I'm trying to single out the reason why it does so
Adam Gibson
@agibsonccc
Apr 20 2016 11:39
/cc @saudet ?
He tends to know how to find stuff like this
Samuel Audet
@saudet
Apr 20 2016 11:39
cygwin? It's not supposed to link to anything related to cygwin
Paul Dubs
@treo
Apr 20 2016 11:39
@Habitats you could also try to update your msys while you are at it :) (see https://msys2.github.io/)
Patrick Skjennum
@Habitats
Apr 20 2016 11:40
i installed it like less than a week ago
Adam Gibson
@agibsonccc
Apr 20 2016 11:40
msys2 doesn't work for you either?
it shouldn't matter technically..
Patrick Skjennum
@Habitats
Apr 20 2016 11:40
msys2 works fine
Adam Gibson
@agibsonccc
Apr 20 2016 11:40
huh
so what crashes?
libnd4j on cygwin?
Patrick Skjennum
@Habitats
Apr 20 2016 11:40
net.init()
yeah apparently
Adam Gibson
@agibsonccc
Apr 20 2016 11:41
do you have a dump?
@saudet it's apparently the weight initialization crashing
Paul Dubs
@treo
Apr 20 2016 11:41
@Habitats also your jvm seems to be a bit old, the current one is u91
Patrick Skjennum
@Habitats
Apr 20 2016 11:41
atm i'm rebuilding everything; i pasted error log earlier
Adam Gibson
@agibsonccc
Apr 20 2016 11:41
only under cygwin though
Patrick Skjennum
@Habitats
Apr 20 2016 11:41
yeah i could update java, but that seems like a shot in the dark:P
Samuel Audet
@saudet
Apr 20 2016 11:41
cygwin isn't compatible with JNI... cafebabe, interesting, maybe that's to give the user a clue :)
Adam Gibson
@agibsonccc
Apr 20 2016 11:42
wait how does that work?
huh
what's the difference between cygwin and msys2 as far as compat goes there?
Patrick Skjennum
@Habitats
Apr 20 2016 11:42
thought cygwin was just a /more/
but yeah, i can't build libnd4j in cygwin
it only works with msys2, so i dunno
Paul Dubs
@treo
Apr 20 2016 11:43
@Habitats you do have cygwin on your path before msys2
Paul Dubs
@treo
Apr 20 2016 11:44
can you try to remove it from your path before building nd4j and libnd4j?
Adam Gibson
@agibsonccc
Apr 20 2016 11:44
this is a thing
interesting
Patrick Skjennum
@Habitats
Apr 20 2016 11:44
yes, msys2 is last on my path
Samuel Audet
@saudet
Apr 20 2016 11:44
cygwin is a full port of POSIX; it does whatever is required to get that compatibility, whereas MinGW does the minimum required to get builds with GCC while retaining compatibility with other native frameworks like Java
Paul Dubs
@treo
Apr 20 2016 11:44
just go out there and remove everything cygwin related from your path before building (you can do that temporarily, simply by exporting a new path without it)
Patrick Skjennum
@Habitats
Apr 20 2016 11:45
clever. never thought of that
raver119
@raver119
Apr 20 2016 11:45
@all i've just merged libnd4j & nd4j working without jcuda
so if you're using cuda for profiling - please pull both masters
Paul Dubs
@treo
Apr 20 2016 11:45
ooo, nice, will try it after we have @Habitats' problem fixed
Patrick Skjennum
@Habitats
Apr 20 2016 11:46
i appreciate the help guys. i'm super confused.
raver119
@raver119
Apr 20 2016 11:46
@treo that won't change too much for you, that's mostly for that powerpc issue
also, @treo i'll have another async branch up for you and me soon :)
master is still synchronized
Patrick Skjennum
@Habitats
Apr 20 2016 11:47
ya, purged local repo, removed sonar from maven to ensure it's using my local stuff, rebuilt --> same error
Ruben Fiszel
@rubenfiszel
Apr 20 2016 11:47
Quick question. If I set the learning rate to 1/n of what it was before for a specific layer, should I also set the biasLearningRate to 1/n of its previous value?
Paul Dubs
@treo
Apr 20 2016 11:48
ok, now rebuild it without cygwin on the path
Patrick Skjennum
@Habitats
Apr 20 2016 11:48
do i need to rebuild deeplearning4j?
Paul Dubs
@treo
Apr 20 2016 11:48
no
just libnd4j and nd4j
Adam Gibson
@agibsonccc
Apr 20 2016 11:51
@Habitats that's mainly for relinking the binaries
Patrick Skjennum
@Habitats
Apr 20 2016 11:53
lol not crashing now
fml
Adam Gibson
@agibsonccc
Apr 20 2016 11:53
so it was cygwin
Paul Dubs
@treo
Apr 20 2016 11:53
right
Adam Gibson
@agibsonccc
Apr 20 2016 11:53
we should at least note that somewhere ;/
Paul Dubs
@treo
Apr 20 2016 11:53
I'll write it up in the windows troubleshooting section
Adam Gibson
@agibsonccc
Apr 20 2016 11:53
great!
Patrick Skjennum
@Habitats
Apr 20 2016 11:54
cygwin + msys2 installed simultaneously should've been an obvious red flag, i guess
maybe provide a minimal PATH that is required for msys2 to compile?
and export that to temporarily override whatever the user might have installed
Paul Dubs
@treo
Apr 20 2016 11:55
in newer versions msys doesn't even use the system path without additional trickery
Patrick Skjennum
@Habitats
Apr 20 2016 11:55
i don't understand
i'm using an old version?
Ruben Fiszel
@rubenfiszel
Apr 20 2016 11:57
@agibsonccc By construction, I think the way learningRate and biasLearningRate are set in the builder configuration is error prone
Paul Dubs
@treo
Apr 20 2016 11:57
That's what I meant when I said they change what they are doing every 2 seconds
Patrick Skjennum
@Habitats
Apr 20 2016 11:57
there's no new version since i downloaded mine:s
Ruben Fiszel
@rubenfiszel
Apr 20 2016 11:57
So for example, if I set the learningRate in a specific layer, I most likely want to set the same for the biasLearningRate
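A minimal sketch of what is being described, assuming the per-layer builder exposes learningRate and biasLearningRate as discussed here; exact builder methods and availability vary between dl4j versions, so treat the names as assumptions:

    import org.deeplearning4j.nn.conf.layers.DenseLayer;

    // If the weight learning rate for a layer is scaled down, the bias learning
    // rate has to be scaled explicitly as well, otherwise it silently keeps its
    // previous value -- which is the error-prone part being discussed.
    public class LayerLearningRates {
        public static DenseLayer scaledLayer(double baseLr, int n) {
            return new DenseLayer.Builder()
                    .nIn(1000).nOut(200)
                    .learningRate(baseLr / n)       // scaled weight learning rate
                    .biasLearningRate(baseLr / n)   // keep the bias rate in step
                    .build();
        }
    }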
Paul Dubs
@treo
Apr 20 2016 11:57
Depending on when you got it, you can have different PATH behavior (like @raver119 seems to get every time :D)
Patrick Skjennum
@Habitats
Apr 20 2016 11:57
i'm using msys2-x86_64-20160205
Adam Gibson
@agibsonccc
Apr 20 2016 11:57
@atollFP you'll be an intern this summer
fix it then ;)
raver119
@raver119
Apr 20 2016 11:57
haha true
Ruben Fiszel
@rubenfiszel
Apr 20 2016 11:58
Yes, yes I can
Adam Gibson
@agibsonccc
Apr 20 2016 11:58
hhahaa
raver119
@raver119
Apr 20 2016 11:58
@treo two attempts - two different PATH setups at msys
Ruben Fiszel
@rubenfiszel
Apr 20 2016 11:58
I'm just asking if you believe it makes sense to change it the way I see it
Paul Dubs
@treo
Apr 20 2016 11:58
anyway, you know the problem now, so now you can try to rebuild it with mkl :)
Adam Gibson
@agibsonccc
Apr 20 2016 11:58
if you make the change I'll probably click the merge button
Patrick Skjennum
@Habitats
Apr 20 2016 11:58
yeah, just going to make a script for this, because i feel like i'll be doing this a lot:p
Am I fired if it breaks the library? :>
My score doesn't diverge anymore! but my q-learning still doesn't learn properly :<
Adam Gibson
@agibsonccc
Apr 20 2016 12:10
merged :D
see that wasn't hard
Ruben Fiszel
@rubenfiszel
Apr 20 2016 12:11
yee
Patrick Skjennum
@Habitats
Apr 20 2016 12:15
btw @treo do i need to keep openblas on path now?
Paul Dubs
@treo
Apr 20 2016 12:15
no, If you are using mkl, you don't need it anymore
Adam Gibson
@agibsonccc
Apr 20 2016 12:15
@Habitats fwiw, openblas is compiled into the .so file
we link against a blas impl when we compile
rather than using netlib-java
(we got rid of netlib-java for this version)
the PATH variable thing was mainly for netlib-java
Patrick Skjennum
@Habitats
Apr 20 2016 12:16
i see
Adam Gibson
@agibsonccc
Apr 20 2016 12:16
I'd keep it on the path if you are going to switch between the 2
eg for testing etc
it doesn't hurt the other version since we compile
Patrick Skjennum
@Habitats
Apr 20 2016 12:16
well as long as it doesn't interfere with anything
right
Adam Gibson
@agibsonccc
Apr 20 2016 12:17
right
for a while I'd rather that you keep it for testing purposes
this way it's easy to revert
Patrick Skjennum
@Habitats
Apr 20 2016 12:18
holy shit using mkl my cpu utilization went up to 85-90%
not sure if training is actually faster but
Adam Gibson
@agibsonccc
Apr 20 2016 12:18
whoa what
damn ok
good to know
Patrick Skjennum
@Habitats
Apr 20 2016 12:19
[screenshot]
raver119
@raver119
Apr 20 2016 12:19
:)
Patrick Skjennum
@Habitats
Apr 20 2016 12:19
compare that to the one i posted earlier
Ruben Fiszel
@rubenfiszel
Apr 20 2016 12:19
I think my ai is drunk, those are the scores
Patrick Skjennum
@Habitats
Apr 20 2016 12:20
btw; while we're at it. i have this other issue with the histogramlistener. there are dependency conflicts, and i cannot for the life of me figure out what's causing it
it's related to spark
Paul Dubs
@treo
Apr 20 2016 12:21
mkl is just WAY faster than openblas :)
especially faster than a generic openblas
Patrick Skjennum
@Habitats
Apr 20 2016 12:21
yeah god damn didn't expect such an increase
should put this on the wiki
Paul Dubs
@treo
Apr 20 2016 12:22
Nah, not before 3.9 is done :) And it only works for intel cpus
Patrick Skjennum
@Habitats
Apr 20 2016 12:22
oh
yeah of course; it would've been stupid if there was a universal fix, right:p
raver119
@raver119
Apr 20 2016 12:22
do we actually have any other cpus? :)
i haven't seen a decent amd cpu for ages
Paul Dubs
@treo
Apr 20 2016 12:23
I'd love to know how well it works with a xeon phi card, according to some cursory googling it can make use of it out of the box (for at least some problems)
raver119
@raver119
Apr 20 2016 12:23
yes
Patrick Skjennum
@Habitats
Apr 20 2016 12:24
should i still try that malloc whatever it was?
Paul Dubs
@treo
Apr 20 2016 12:24
sure, try it :)
I don't know if you have the problem that it fixes though :D
I had the problem that, with RNNs, each iteration became slower than the one before
Patrick Skjennum
@Habitats
Apr 20 2016 12:25
atm my only issue is the histogram really
there're some issues with javax
conflicting with spark
Paul Dubs
@treo
Apr 20 2016 12:27
I'm not using spark, so I can't help you with that
Patrick Skjennum
@Habitats
Apr 20 2016 12:27
i think maybe @AlexDBlack ran into something similar
Paul Dubs
@treo
Apr 20 2016 12:27
but if you notice that your training seems to be getting slower, you can try the jemalloc thing
raver119
@raver119
Apr 20 2016 12:28
@treo i've merged async branch into master
looks +- stable
Paul Dubs
@treo
Apr 20 2016 12:29
great, is this what we had tested?
raver119
@raver119
Apr 20 2016 12:29
ye
for last few days
so on both masters cuda is async now
Patrick Skjennum
@Habitats
Apr 20 2016 12:29
@treo yeah i'll have a look at it
raver119
@raver119
Apr 20 2016 12:30
and now its profiling time \o/
Paul Dubs
@treo
Apr 20 2016 12:30
yay :D try to get it faster than cpu :P
raver119
@raver119
Apr 20 2016 12:31
heh
Patrick Skjennum
@Habitats
Apr 20 2016 12:31
you're going to end up making me buy an nvidia card
Paul Dubs
@treo
Apr 20 2016 12:31
@Habitats don't :)
raver119
@raver119
Apr 20 2016 12:32
i wonder how that uber performance will scale on 4 threads using cpu :)
Paul Dubs
@treo
Apr 20 2016 12:32
I actually do wonder how well MKL would work on one of the 36 core aws instances
Patrick Skjennum
@Habitats
Apr 20 2016 12:32
@treo using jemalloc i got Caused by: java.lang.UnsatisfiedLinkError: C:\Users\mail\AppData\Local\Temp\javacpp535855947280370\jniNativeOps.dll: A dynamic link library (DLL) initialization routine failed all of a sudden
Paul Dubs
@treo
Apr 20 2016 12:33
that is ok, so you ran into one of the kitten-eating possibilities
Patrick Skjennum
@Habitats
Apr 20 2016 12:33
who would've thought
Paul Dubs
@treo
Apr 20 2016 12:33
:D
If you really run into the problem of iterations getting ever slower, we can take a closer look at it
Patrick Skjennum
@Habitats
Apr 20 2016 12:34
didn't even know that was an issue?
Paul Dubs
@treo
Apr 20 2016 12:34
It may be that it simply vanished with all the memory allocation changes that were going on
Patrick Skjennum
@Habitats
Apr 20 2016 12:34
does it reset every epoch?
Paul Dubs
@treo
Apr 20 2016 12:34
Oh it does, but the windows memory allocator keeps getting slower
Patrick Skjennum
@Habitats
Apr 20 2016 12:34
i see
Paul Dubs
@treo
Apr 20 2016 12:35
But that was over a week ago, and I didn't have the time to see if it still is a problem
Patrick Skjennum
@Habitats
Apr 20 2016 12:35
alright, yeah i'll stick to what i have for now then.
try to sort out the damn histogram
Alex Black
@AlexDBlack
Apr 20 2016 13:00
@Habitats dropwizard + spark can be a bit painful due to many shared dependencies of different versions
Patrick Skjennum
@Habitats
Apr 20 2016 13:01
yeah i'm realizing that
Alex Black
@AlexDBlack
Apr 20 2016 13:01
this is from a project that I've got both running in: https://gist.github.com/AlexDBlack/3c9b1c9f43696a739e825daad9d00271
Patrick Skjennum
@Habitats
Apr 20 2016 13:01
resolving it one dependency at a time
Alex Black
@AlexDBlack
Apr 20 2016 13:01
a few notes there that might help
it's not pretty, but it works :)
Patrick Skjennum
@Habitats
Apr 20 2016 13:06
didn't solve my issue:\ @AlexDBlack
also you've excluded <exclusion> <groupId>com.sun.jersey</groupId> <artifactId>jersey-json</artifactId> </exclusion> twice:p
Alex Black
@AlexDBlack
Apr 20 2016 13:08
heh, didn't notice that. as I said, not pretty :)
my issue is jersey related though
Alex Black
@AlexDBlack
Apr 20 2016 13:11
could be that you've got both 1.0 and 2.0 jersey versions on your classpath... from what I've seen they don't work well together
raver119
@raver119
Apr 20 2016 13:11
that's definitely it
com.sun.jersey.* is jersey 1.x
and dropwizard is jersey 2.x
jersey changed its package name @ 2.0
org.glassfish.jersey. <— jersey 2.0
Patrick Skjennum
@Habitats
Apr 20 2016 13:12
love it when they do that
raver119
@raver119
Apr 20 2016 13:12
com.sun.jersey.core. < jersey 1.0
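A small diagnostic sketch for the conflict raver119 describes; the two class names are just representative members of each package family (an assumption for illustration), not something the chat specifies:

    // If both lines print "present", jersey 1.x and 2.x are on the classpath
    // together, which is exactly the conflict being described.
    public class JerseyCheck {
        private static String probe(String className) {
            try {
                Class.forName(className);
                return "present";
            } catch (ClassNotFoundException e) {
                return "absent";
            }
        }

        public static void main(String[] args) {
            System.out.println("jersey 1.x (com.sun.jersey.*): "
                    + probe("com.sun.jersey.api.client.Client"));
            System.out.println("jersey 2.x (org.glassfish.jersey.*): "
                    + probe("org.glassfish.jersey.client.JerseyClient"));
        }
    }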
Patrick Skjennum
@Habitats
Apr 20 2016 13:13
i have this for my spark dep:
exclude group: 'com.sun.jersey.jersey-test-framework', module: 'jersey-test-framework-grizzly2'
exclude group: 'com.sun.jersey', module: 'jersey-json'
exclude group: 'com.sun.jersey', module: 'jersey-core'
shouldn't that do the trick?
raver119
@raver119
Apr 20 2016 13:14
check UiServer repo
there’s a pom.xml there that works with spark
Patrick Skjennum
@Habitats
Apr 20 2016 13:14
yeah @AlexDBlack just showed me a pom that works
Patrick Skjennum
@Habitats
Apr 20 2016 13:15
but i tried to replicate it with gradle but yea
no spark there?
Patrick Skjennum
@Habitats
Apr 20 2016 13:25
alright got it to work. god damn i hate gradle.
Ruben Fiszel
@rubenfiszel
Apr 20 2016 18:09
I think my nn is learning!
That's the score at self play
I should call google to challenge their alphago :P
Melanie Warrick
@nyghtowl
Apr 20 2016 18:14
It will crush alphago
Ruben Fiszel
@rubenfiszel
Apr 20 2016 18:14
and the large chart
Patrick Skjennum
@Habitats
Apr 20 2016 18:21
training is like 20% slower with histogramlistener running. is that normal?
raver119
@raver119
Apr 20 2016 18:26
no, that was already addressed, but my commits were lost somehow
i'll write that thing once again later
raver119
@raver119
Apr 20 2016 20:05
@treo what's your time on syntheticrnn with cpu & default settings? time between score reports
Paul Dubs
@treo
Apr 20 2016 20:14
wait a moment, have to move everything over to CPU first
about 9 seconds
raver119
@raver119
Apr 20 2016 20:19
and memory use?
Paul Dubs
@treo
Apr 20 2016 20:20
at peak 6gb
raver119
@raver119
Apr 20 2016 20:20
7.5 for me, and still growing...
Paul Dubs
@treo
Apr 20 2016 20:21
on cpu?
raver119
@raver119
Apr 20 2016 20:21
yea
8.2
23092 raver119 20 0 19,008g 8,216g 30672 S 486,2 26,2 21:32.98 java
Paul Dubs
@treo
Apr 20 2016 20:22
it even works fine with -Xmx2g for me
speed is essentially the same
raver119
@raver119
Apr 20 2016 20:22
hm, let me check
Paul Dubs
@treo
Apr 20 2016 20:42
@raver119 I've pushed a version with an IterationListener that is a bit more useful in our context :)
raver119
@raver119
Apr 20 2016 20:42
sup there?
damn, i'm definitely not happy.
Paul Dubs
@treo
Apr 20 2016 20:43
At iteration 5 a single iteration takes 1781 MILLISECONDS
another jvm crash, randomly happening after training for a while
happened after ~90 mins of training
raver119
@raver119
Apr 20 2016 20:44
Screenshot from 2016-04-20 23:44:30.png
too many native calls in top
Paul Dubs
@treo
Apr 20 2016 20:46
The lstm stuff seems to choke mostly on assign calls
raver119
@raver119
Apr 20 2016 20:47
yep, many assigns and many sums
however, still few create() in top
tomorrow is going to be a long day...
for gemm there's something like transpose used
that's where the assign comes from
reshape c -> f
Paul Dubs
@treo
Apr 20 2016 20:50
yep 75% of the time is spent on this
raver119
@raver119
Apr 20 2016 20:51
i'm heading off now
cu tomorrow :)
Paul Dubs
@treo
Apr 20 2016 20:51
bye :)
raver119
@raver119
Apr 20 2016 20:51
it's going to be a really long day tomorrow...
Patrick Skjennum
@Habitats
Apr 20 2016 20:53
@treo any idea of where i should start digging?
i don't understand much of these error logs
Paul Dubs
@treo
Apr 20 2016 20:56
@Habitats https://gist.github.com/Habitats/ef4118648990949465e2c02c985123e4#file-gistfile1-txt-L83 that is where your stacktrace is, and the error at the top tells you that it tries to access memory that it may not access
Patrick Skjennum
@Habitats
Apr 20 2016 20:57
that doesn't sound good
could it be that my computer ran out of memory?
there are no notable events in the event log
Paul Dubs
@treo
Apr 20 2016 20:58
most probably
Patrick Skjennum
@Habitats
Apr 20 2016 20:59
so is it true that the native lib doesn't care about xmx?
i cannot think of any other way to prevent this
Paul Dubs
@treo
Apr 20 2016 21:00
really don't know, for the test I just had running it seems to have cared
you can try to create some large arrays and see if it runs out of memory
Patrick Skjennum
@Habitats
Apr 20 2016 21:04
yeah
btw; i might be a bit dense in the evening, but i don't really understand all of the graphs in the UI. i've been reading this: http://deeplearning4j.org/visualization
but what exactly are the params?
Paul Dubs
@treo
Apr 20 2016 21:12
that's something that @AlexDBlack can probably explain
Ruben Fiszel
@rubenfiszel
Apr 20 2016 21:27
Someone mentioned mkl before, is it hard to install?
I installed intel-mkl and I already built everything from source. I'm on archlinux
Patrick Skjennum
@Habitats
Apr 20 2016 21:28
not hard to install unless you're using cygwin
:P
Paul Dubs
@treo
Apr 20 2016 21:28
If you are using 3.9, it is fairly easy
Ruben Fiszel
@rubenfiszel
Apr 20 2016 21:28
I am
Paul Dubs
@treo
Apr 20 2016 21:28
it should work the same on windows as on linux...
can you post a gist of your output when building libnd4j?
Ruben Fiszel
@rubenfiszel
Apr 20 2016 21:29
Sure when my package manager is done with intel-mkl
4 GB at 40 kB/s ...
Paul Dubs
@treo
Apr 20 2016 21:29
that may take a while
I may be in bed by then :)
Ruben Fiszel
@rubenfiszel
Apr 20 2016 21:30
328 kB/s!
Otherwise, tomorrow
Patrick Skjennum
@Habitats
Apr 20 2016 21:30
you need to sign up and get a serial etc though
Ruben Fiszel
@rubenfiszel
Apr 20 2016 21:31
oh really ?
Paul Dubs
@treo
Apr 20 2016 21:31
at least when you are getting it from https://software.intel.com/sites/campaigns/nest/
anyway, when building libnd4j, you just have to make sure that you have the mkl libraries on your path, and you can see whether it actually found them in the linking step of the libnd4j compilation
Romeo Kienzler
@romeokienzler
Apr 20 2016 21:41
@eraly just suggested I join this channel; I'd like to volunteer as a tester for the cuda support in ND4J....
....or in case you want me to test anything else please let me know...
Adam Gibson
@agibsonccc
Apr 20 2016 22:19
That's the c++
You'll need to install from source atm:
Sadat Anwar
@SadatAnwar
Apr 20 2016 23:13
for all those who were following #79, my issue about the OS X 11 build: it just passed
Adam Gibson
@agibsonccc
Apr 20 2016 23:14
cool
Sadat Anwar
@SadatAnwar
Apr 20 2016 23:14
and I figured out what was wrong! :D the bad exports!
Yes! I am so happy! :D
Adam Gibson
@agibsonccc
Apr 20 2016 23:14
so osx has been beat to death now
perfect
@SadatAnwar could you create something like the windows.md?
I'd love an end-to-end guide for osx setup
Sadat Anwar
@SadatAnwar
Apr 20 2016 23:15
I was thinking of it
Adam Gibson
@agibsonccc
Apr 20 2016 23:15
It doesn't have to be perfect
A rough "these are things you would run into" would be enough
Sadat Anwar
@SadatAnwar
Apr 20 2016 23:15
I might not get the format right
Adam Gibson
@agibsonccc
Apr 20 2016 23:15
Nah
I don't care
seriously
Melanie Warrick
@nyghtowl
Apr 20 2016 23:15
Good to hear that it's working now
Sadat Anwar
@SadatAnwar
Apr 20 2016 23:15
but ill do something for a start
Adam Gibson
@agibsonccc
Apr 20 2016 23:15
Yeah a start is huge
Sadat Anwar
@SadatAnwar
Apr 20 2016 23:20
on it!
Its so relaxing to know this is finally working!
Adam Gibson
@agibsonccc
Apr 20 2016 23:20
:D
Melanie Warrick
@nyghtowl
Apr 20 2016 23:43
@Habitats params are the weights and biases used to fit the model to the problem you are solving
so it's listing the params per layer based on the layer number, and it splits out w (weight) vs b (bias)
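A rough sketch of how to inspect those per-layer params yourself; it assumes an already-initialized MultiLayerNetwork and that paramTable() behaves as in the dl4j versions of this period:

    import java.util.Arrays;
    import java.util.Map;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.nd4j.linalg.api.ndarray.INDArray;

    public class ParamInspection {
        // Keys look like "0_W", "0_b", "1_W", ... i.e. layer index plus
        // W (weights) or b (bias), matching the per-layer split in the UI.
        static void printParams(MultiLayerNetwork net) {
            for (Map.Entry<String, INDArray> e : net.paramTable().entrySet()) {
                System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue().shape()));
            }
        }
    }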