These are chat archives for deeplearning4j/deeplearning4j/earlyadopters

12th
May 2016
Paul Dubs
@treo
May 12 2016 08:22
a single ndarray doesn't change its rank once it's created, right?
Alex Black
@AlexDBlack
May 12 2016 08:24
nope, can't think of any situation for which that occurs
Paul Dubs
@treo
May 12 2016 08:25
I've replaced all rank calculation calls in BaseNDArray with access to the rank variable that is set on init
just wanted to make sure that I wasn't missing something obvious
Alex Black
@AlexDBlack
May 12 2016 08:25
sounds good to me
more generally, shape info shouldn't change (only exception that I can think of is permutei)
Paul Dubs
@treo
May 12 2016 08:27
so most of the stuff based on that should be cacheable (and recalculated only on a permutei call?)
looks like it doesn't help in my case, as all my calls to rank go through Shape.getOffset
raver119
@raver119
May 12 2016 08:40
@treo shapeBuffers are immutable now
however, please note that one of latest changes for that immutability guarantee is still in my branch, and not merged yet
permutei is the case that's not merged
in all cases where shape might be changed (like permute) there's always new shapeInfo "created"
so, technically it's still possible to change them, but it'll cause really undesired consequences, so later put calls on IntBuffers will probably throw a warning, or something like that
raver119
@raver119
May 12 2016 08:49
also
Shape.getOffset makes no real sense
by recent agreement, all pointers are 0-offset
so, any op is guaranteed to get pointer with 0 offset
so the offset field in the shapebuffer might even be -19050341234, nobody should care about that
because offset is always 0
offset value is initialized during array/buffer creation
ME
@enache2004
May 12 2016 09:05
mvn.png
raver119
@raver119
May 12 2016 09:06
you don't have to run mvn from mingw
and you also need to install maven
ME
@enache2004
May 12 2016 09:06
good
I have 3.3.9
Alex Black
@AlexDBlack
May 12 2016 09:06
it's not on your path, at least not in mingw
ME
@enache2004
May 12 2016 09:06
I remember that Paul Dubs said that I need to run any command from msys2
Paul Dubs
@treo
May 12 2016 09:06
I told him to do everything in the same shell
raver119
@raver119
May 12 2016 09:07
well, i'm running mvn from ide terminal, and i'm fine
Paul Dubs
@treo
May 12 2016 09:07
you also have everything setup correctly :P
ME
@enache2004
May 12 2016 09:07
then I will run from cmd
Alex Black
@AlexDBlack
May 12 2016 09:08
one option is to just provide the direct path... /path/to/mvn clean install etc
ME
@enache2004
May 12 2016 09:08
does nd4j include openblas automatically?
raver119
@raver119
May 12 2016 09:08
no
you've installed it with msys
ME
@enache2004
May 12 2016 09:09
in the configuration step probably
raver119
@raver119
May 12 2016 09:10
yes, during installation
@treo that's bad
what you did
really bad
rollback shape() -> shape change
and just change method shape()
in this way you'll still have control over cache validity
and still have cache
in 1 place
instead of 100500 different places
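raver119's suggestion can be sketched roughly like this (class and field names are hypothetical, not the actual BaseNDArray code, and the shapeInfo layout here is an assumption): keep the cache behind the getter, so there is exactly one place that controls cache validity.

```java
// Hypothetical sketch of caching shape metadata behind a getter, so cache
// validity is controlled in one place. Not the real BaseNDArray code.
public class CachedArray {
    private int[] shapeInfo;   // source of truth, normally immutable
    private int rank = -1;     // -1 means "not cached yet"

    public CachedArray(int[] shapeInfo) {
        this.shapeInfo = shapeInfo;
    }

    public int rank() {
        if (rank < 0)
            rank = shapeInfo[0];  // assumption: rank stored at index 0
        return rank;
    }

    // permutei is the one mutating call, so it is the one invalidation point
    public void permutei(int[] newShapeInfo) {
        this.shapeInfo = newShapeInfo;
        this.rank = -1;
    }
}
```

If something goes wrong with the caching, it fails in this one getter instead of in a hundred call sites.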
Paul Dubs
@treo
May 12 2016 09:13
sounds reasonable :)
raver119
@raver119
May 12 2016 09:13
so if something goes wrong - we'll have 1 failure point instead of 10
Paul Dubs
@treo
May 12 2016 09:15
I'll do the same with rank and stride then
ME
@enache2004
May 12 2016 09:16
sorry but I can't figure out how to fix that fast

D:\RESEARCH\nd4j>mvn -version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T18:41:47+02:00)
Maven home: D:\SOFTWARE\Apache\apache-maven-3.3.9-bin\apache-maven-3.3.9\bin..
Java version: 1.8.0_91, vendor: Oracle Corporation
Java home: D:\SOFTWARE\Oracle\Java\jdk1.8.0_91\jre
Default locale: en_US, platform encoding: Cp1252
OS name: "windows 7", version: "6.1", arch: "amd64", family: "dos"

D:\RESEARCH\nd4j>mvn clean install -DskipTests -Dmaven.javadoc.skip=true -pl '!org.nd4j:nd4j-cuda-7.5,!org.nd4j:nd4j-tests'
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.nd4j:nd4j-perf:jar:0.4-rc3.9-SNAPSHOT
[WARNING] The expression ${version} is deprecated. Please use ${project.version} instead.
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.nd4j:nd4j-cuda-7.5:jar:0.4-rc3.9-SNAPSHOT
[WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: junit:junit:jar -> version ${junit.version} vs (?) @ org.nd4j:nd4j-cuda-7.5:[unknown-version], D:\RESEARCH\nd4j\nd4j-backends\nd4j-backend-impls\nd4j-cuda-7.5\pom.xml, line 188, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[ERROR] [ERROR] Could not find the selected project in the reactor: '!org.nd4j:nd4j-cuda-7.5 @
[ERROR] Could not find the selected project in the reactor: '!org.nd4j:nd4j-cuda-7.5 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MavenExecutionException

D:\RESEARCH\nd4j>

if mvn is the latest version.. do I need to modify the nd4j pom file?
Paul Dubs
@treo
May 12 2016 09:19
that is pretty weird, it works fine with that call on my machine
raver119
@raver119
May 12 2016 09:19
@treo right, keep getters, and you'll be fine
just do magic inside them :)
Paul Dubs
@treo
May 12 2016 09:19
or no magic in that case :D
Alex Black
@AlexDBlack
May 12 2016 09:21
@enache2004 and yeah, I too can run that command on my windows machines without issue...
Paul Dubs
@treo
May 12 2016 09:22
@enache2004 delete your nd4j folder, and start with a fresh clone
ME
@enache2004
May 12 2016 09:23
i did that twice
give me a good url for git
ME
@enache2004
May 12 2016 09:26
D:\RESEARCH>git clone https://github.com/deeplearning4j/nd4j.git
Cloning into 'nd4j'...
remote: Counting objects: 81629, done.
remote: Compressing objects: 100% (217/217), done.
remote: Total 81629 (delta 81), reused 0 (delta 0), pack-reused 81328
Receiving objects: 100% (81629/81629), 206.17 MiB | 3.65 MiB/s, done.
Resolving deltas: 100% (40138/40138), done.
Checking connectivity... done.
Paul Dubs
@treo
May 12 2016 09:27
@raver119 will have to start this fresh, looks like I've even introduced some subtle bugs...
raver119
@raver119
May 12 2016 09:27
100%
it was the same for me with shapeInfo immutability
ME
@enache2004
May 12 2016 09:27
it's not working..
raver119
@raver119
May 12 2016 09:27
so, don't forget to run all tests :)
ME
@enache2004
May 12 2016 09:28
I will try with IntelliJ to compile it
Paul Dubs
@treo
May 12 2016 09:28
@enache2004 don't
ME
@enache2004
May 12 2016 09:28
:(
raver119
@raver119
May 12 2016 09:28
intelliJ won't do that
you need maven compilation
ME
@enache2004
May 12 2016 09:28
but it has support
netbeans can also do it
Paul Dubs
@treo
May 12 2016 09:29
try to compile it without the -pl first, it should fail on cuda and tests
ME
@enache2004
May 12 2016 09:31
too many lines but I kept the relevant part
mvn clean install -DskipTests -Dmaven.javadoc.skip=true

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireProperty failed with message:
!!! LIBND4J_HOME must be a valid unix path!
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] nd4j ............................................... SUCCESS [ 10.431 s]
[INFO] nd4j-common ........................................ SUCCESS [ 6.445 s]
[INFO] nd4j-context ....................................... SUCCESS [ 0.285 s]
[INFO] nd4j-buffer ........................................ SUCCESS [ 4.234 s]
[INFO] nd4j-backends ...................................... SUCCESS [ 0.027 s]
[INFO] nd4j-api-parent .................................... SUCCESS [ 0.018 s]
[INFO] nd4j-api ........................................... SUCCESS [ 7.607 s]
[INFO] nd4j-jdbc .......................................... SUCCESS [ 0.050 s]
[INFO] nd4j-jdbc-api ...................................... SUCCESS [ 1.390 s]
[INFO] nd4j-jdbc-mysql .................................... SUCCESS [ 0.779 s]
[INFO] nd4j-native-api .................................... SUCCESS [ 0.394 s]
[INFO] nd4j-backend-impls ................................. SUCCESS [ 0.218 s]
[INFO] nd4j-native ........................................ FAILURE [ 1.498 s]
[INFO] nd4j-instrumentation ............................... SKIPPED
[INFO] nd4j-perf .......................................... SKIPPED
[INFO] nd4j-serde ......................................... SKIPPED
[INFO] nd4j-jackson ....................................... SKIPPED
[INFO] nd4j-bytebuddy ..................................... SKIPPED
[INFO] nd4j-cuda-7.5 ...................................... SKIPPED
[INFO] nd4j-tests ......................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 33.694 s
[INFO] Finished at: 2016-05-12T12:30:12+03:00
[INFO] Final Memory: 43M/581M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce (libnd4j-checks) on project nd4j-native: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :nd4j-native

raver119
@raver119
May 12 2016 09:32
@enache2004 please, use http://gist.github.com
also, you didn't copy the most relevant part
Paul Dubs
@treo
May 12 2016 09:33
he did
raver119
@raver119
May 12 2016 09:33
the most relevant part should be above
Paul Dubs
@treo
May 12 2016 09:33
!!! LIBND4J_HOME must be a valid unix path!
ME
@enache2004
May 12 2016 09:33
I saw that.. it's strange
raver119
@raver119
May 12 2016 09:33
oh
ME
@enache2004
May 12 2016 09:33
I
raver119
@raver119
May 12 2016 09:33
set path like that
ME
@enache2004
May 12 2016 09:33
I'm using Windows 7
raver119
@raver119
May 12 2016 09:33
/c/User/raver/bla-bla
not c:\
that's mentioned in windows.md
ME
@enache2004
May 12 2016 09:34
hmm
Paul Dubs
@treo
May 12 2016 09:34
that is why I wanted him to do everything in msys, because he can just export the correct path there
ME
@enache2004
May 12 2016 09:35
I've tried to export it from msys2
Paul Dubs
@treo
May 12 2016 09:35
that stays confined to the session
ME
@enache2004
May 12 2016 09:35
but nothing appears in the environment variables
raver119
@raver119
May 12 2016 09:35
yes, but maven will require the same path anyway, even from msys
and in the same unix format
Paul Dubs
@treo
May 12 2016 09:36
right, but export LIBND4J_HOME=`pwd` works without knowing anything about the format
anyway
@enache2004 just set LIBND4J_HOME to /d/RESEARCH/libnd4j
ME
@enache2004
May 12 2016 09:40
I also have a variable PATH = %JAVA_HOME%\bin;%MAVEN_HOME%\bin;%MSYS_HOME%\bin;%LIBND4J_HOME%
in my current session.. and usually the system PATH is also modified to include %LIBND4J_HOME%
Paul Dubs
@treo
May 12 2016 09:41
you don't need it on your Path, you just need the variable set correctly, so when you build nd4j it can find libnd4j
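The enforcer failure above ("LIBND4J_HOME must be a valid unix path") is effectively doing a check like this — a guess at the kind of rule involved, not the actual enforcer code:

```java
// Rough sketch of a check that rejects "D:\RESEARCH\libnd4j" but accepts
// "/d/RESEARCH/libnd4j". Not the actual maven-enforcer rule's code.
public class Nd4jHomeCheck {
    public static boolean isUnixPath(String p) {
        return p != null
                && p.startsWith("/")   // unix-style root, e.g. /d/RESEARCH/libnd4j
                && !p.contains("\\")   // no Windows separators
                && !p.contains(":");   // no drive letters like D:
    }
}
```

Which is why the msys-style /d/... form works where the native Windows D:\... form fails.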
ME
@enache2004
May 12 2016 09:56
ok
I've tried multiple ways
so the one that partially works is D:/RESEARCH/etc.
Paul Dubs
@treo
May 12 2016 09:58
partially works, means what?
the cuda issue..but this is not important for now
Paul Dubs
@treo
May 12 2016 09:59
you could try with the -pl option again, and see if it is any better, but it should have installed the other things nevertheless
raver119
@raver119
May 12 2016 09:59
you still should skip cuda
like @treo said before
ME
@enache2004
May 12 2016 10:04
nope.. it doesn't want to
Paul Dubs
@treo
May 12 2016 10:06
very weird, but for now inconsequential - clone the dl4j examples, replace the version with 3.9-SNAPSHOT, and replace all occurrences of nd4j-x86 with nd4j-native in the pom.xml
ME
@enache2004
May 12 2016 10:24
so by doing that, all the previous steps to compile nd4j etc. are useless, right?
Paul Dubs
@treo
May 12 2016 10:25
nope
you have built the snapshot version on your machine
it will use the one you have just built instead of getting a (broken) one from sonatype
ME
@enache2004
May 12 2016 10:27
if I modify the examples pom file to use the 3.9 nd4j-native, then when the project is compiled it seems that all these libraries are downloaded
I don't see the connection between what I've done earlier and the downloaded dl4j examples
I didn't place any file from the new nd4j (like a *.dll) inside the dl4j examples folder
Paul Dubs
@treo
May 12 2016 10:29
that is because you don't understand maven
ME
@enache2004
May 12 2016 10:30
maybe.. I haven't used it much
Paul Dubs
@treo
May 12 2016 10:30
when you run mvn install it puts the results in your local repository, and when it looks up something, it first looks in the local repository and if it finds it there it doesn't download it
ME
@enache2004
May 12 2016 10:31
well..I think I know this part
Paul Dubs
@treo
May 12 2016 10:32
so... why don't you see the connection there?
libnd4j provides the dlls needed by nd4j, which packages them in its own jar files, and maven installs that to your local repository. dl4j-examples then gets them from there
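The lookup order Paul describes can be sketched like this (the paths and method names are illustrative, not Maven's actual resolver API):

```java
import java.io.File;

// Illustrative sketch of Maven's resolution order as described above:
// check the local repository first, download only on a miss.
public class RepoLookup {
    public static String artifactPath(String home, String group,
                                      String artifact, String version) {
        // e.g. ~/.m2/repository/org/nd4j/nd4j-native/0.4-rc3.9-SNAPSHOT
        return home + "/.m2/repository/" + group.replace('.', '/')
                + "/" + artifact + "/" + version;
    }

    public static String resolve(String home, String group,
                                 String artifact, String version) {
        File local = new File(artifactPath(home, group, artifact, version));
        if (local.exists())
            return "local: " + local.getPath();  // mvn install put it here
        return "remote: would be downloaded from a remote repository";
    }
}
```

So after `mvn install` of nd4j, the examples build resolves the freshly built snapshot from the local repository instead of pulling a (broken) one from sonatype.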
ME
@enache2004
May 12 2016 10:36
ok..I feel much better with this explanation
so I've seen that it's located in users/my_user/.m2 by default
Paul Dubs
@treo
May 12 2016 10:37
yes
ME
@enache2004
May 12 2016 10:41
can you take a look at this ?
Paul Dubs
@treo
May 12 2016 10:42
Now that's a first
ME
@enache2004
May 12 2016 10:42
oh..I should run it as admin
Paul Dubs
@treo
May 12 2016 10:43
you shouldn't
ME
@enache2004
May 12 2016 10:43
it begins to score the iteration
Paul Dubs
@treo
May 12 2016 10:43
but for some reason your tmp dir is admin writable only
make it user writable and it should work
ME
@enache2004
May 12 2016 10:46
I've seen that when you train a network, especially in the supervised case, you need to have the whole data set.. for example for detecting people's faces you need their profiles from many angles
If you want to add a new person you need to repeat the entire training process, which is time consuming if you already have a big collection of data
is there any way to avoid that for a new data set that you want to be recognized?
Paul Dubs
@treo
May 12 2016 10:54
it depends on how you approach the problem, if you model it as classification with one-hot output, you will probably have to retrain every time you add someone new
as this adds a new class each time
ME
@enache2004
May 12 2016 10:55
yeah.. and the other way? :D
Paul Dubs
@treo
May 12 2016 10:56
now, if you train an autoencoder that produces some kind of eigenface representation and do the detection based on a similarity metric, it may work pretty well without retraining
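The similarity-metric idea can be sketched like this — plain Java over raw embedding vectors, nothing dl4j-specific, and the threshold value is made up:

```java
// Sketch of matching a new face embedding against stored ones by cosine
// similarity: adding a person means storing a vector, not retraining.
public class FaceMatch {
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // index of the best match among known embeddings, or -1 if none is
    // closer than the (arbitrary) threshold
    public static int bestMatch(double[] query, double[][] known, double threshold) {
        int best = -1;
        double bestSim = threshold;
        for (int i = 0; i < known.length; i++) {
            double sim = cosine(query, known[i]);
            if (sim > bestSim) { bestSim = sim; best = i; }
        }
        return best;
    }
}
```

The encoder runs once per image; enrollment is just appending a vector to `known`.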
ME
@enache2004
May 12 2016 11:00
but that representation is somehow linked to a single face ...
training the autoencoder with multiple faces will extract some common values that describe a face
Paul Dubs
@treo
May 12 2016 11:01
right
ME
@enache2004
May 12 2016 11:02
but it's like computing a hash code for a face, but not really
there is a reason why I called it an eigenface representation
ME
@enache2004
May 12 2016 11:02
based on that you will search the database for the values that match the computed one
Paul Dubs
@treo
May 12 2016 11:03
or are at least near it
ME
@enache2004
May 12 2016 11:21
MNIST with 2 layers has been completed in 12:18 minutes..
it seems a good time
ME
@enache2004
May 12 2016 11:26
in case I use Hadoop/Spark the time will decrease significantly
Alex Black
@AlexDBlack
May 12 2016 11:27
@enache2004 still some bottlenecks in CNNs that I'm working on now
soon they will be a lot faster
ME
@enache2004
May 12 2016 11:28
ok
raver119
@raver119
May 12 2016 12:54
@treo i've just finished the arch changes within cuda, all tests are passing; now i'll start on the caching stuff. so i hope we'll have a testable version later today, and i hope for a nice boost here
Paul Dubs
@treo
May 12 2016 12:54
great :)
raver119
@raver119
May 12 2016 12:55
we'll get rid of many minor allocations now
really many
Paul Dubs
@treo
May 12 2016 12:55
less allocations -> less fragmentation :) (also less allocations :D)
raver119
@raver119
May 12 2016 12:55
all extraArgs, dimensions, TADs
everything will be pushed away
less allocations = less operations. that's the main goal
Paul Dubs
@treo
May 12 2016 12:56
I hope you will bring these improvements to the cpu side later on?
raver119
@raver119
May 12 2016 12:56
each allocation for, say, dimension is: memory allocation (even cached), then memcpy, then destruction some time later
the same for TAD or for anything
it's cheaper to hold their references in constant memory
yep, sure
almost everything is designed as interfaces here
Paul Dubs
@treo
May 12 2016 12:57
great :)
raver119
@raver119
May 12 2016 12:57
so when i'm done with cuda, i'll just add proper implementations for cpu
good thing that cuda constant memory is equal on all devices lol
64kb everywhere
Paul Dubs
@treo
May 12 2016 13:00
until nvidia changes it :P
raver119
@raver119
May 12 2016 13:00
nah
it won't be changed anytime soon
just no need for it
cache working set might change
but 64kb constant memory is so far beyond most needs
that there's no sense improving it
to make it simpler
syntheticRNN example
it uses 55 different shapeInfos
average of 40 bytes each
2.1kb in other words
Paul Dubs
@treo
May 12 2016 13:02
right
raver119
@raver119
May 12 2016 13:02
even if you throw tads there
it will be +3kb approx
5.1kb in total
for serious linear algebra impl
5.1kb out of 64
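raver119's back-of-envelope, using the counts he quotes for the syntheticRNN example, as a tiny calculation:

```java
// Back-of-envelope for the constant-memory budget quoted above:
// 55 shapeInfos at ~40 bytes each, plus roughly 3kb of TADs.
public class ConstMemBudget {
    public static double usedKb(int shapeInfos, int avgBytes, double tadKb) {
        return shapeInfos * avgBytes / 1024.0 + tadKb;
    }

    public static final double TOTAL_KB = 64.0; // constant memory per device
}
```

55 * 40 bytes is about 2.1kb, plus ~3kb of TADs gives the ~5.1kb out of 64 quoted above.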
Paul Dubs
@treo
May 12 2016 13:03
Ok, I get it :)
raver119
@raver119
May 12 2016 13:03
:)
for cnns it will be a bit larger though
matrix ranks are higher there
but even there it just can't get above 10kb
Paul Dubs
@treo
May 12 2016 13:04
so given that, as long as the industry as a whole doesn't start wasting that memory, nvidia will probably keep it as is
raver119
@raver119
May 12 2016 13:05
that's too specific memory
the industry just doesn't have that many constant variables :)
we only have them due to the shapeInfo architecture
i.e.: immutable array, describing other array
and it maps pretty nicely onto the cuda design
you know, those half-warp broadcasts and caches
Patrick Skjennum
@Habitats
May 12 2016 14:50
@crockpotveggies so you got dl4j running with spark then i assume
i'm trying to set it up on google cloud now
Justin Long
@crockpotveggies
May 12 2016 14:53
yep Spark is up and running @Habitats
Patrick Skjennum
@Habitats
May 12 2016 14:53
how's performance?
Justin Long
@crockpotveggies
May 12 2016 14:53
I made one mistake which is you need to open chatter ports
Patrick Skjennum
@Habitats
May 12 2016 14:53
all ports are open here afaik
Justin Long
@crockpotveggies
May 12 2016 14:53
so I'm going to update the README in a couple hours
Patrick Skjennum
@Habitats
May 12 2016 14:54
my gcloud is really insecure #yolo
Justin Long
@crockpotveggies
May 12 2016 14:54
okay you can check if it's actually chatting by running SparkPi and checking the application logs
Patrick Skjennum
@Habitats
May 12 2016 14:54
what you mean by chatting anyway
messaging between workers?
Justin Long
@crockpotveggies
May 12 2016 14:54
as long as you don't see this: ipc.Client: Retrying connect to server you are okay
yea the ResourceManager sends wakeups to slaves to send them tasks
Patrick Skjennum
@Habitats
May 12 2016 14:55
yeah
all that is already configured on gcloud
it comes with spark pre-configured
just dial in the number of workers and you're good to go
Justin Long
@crockpotveggies
May 12 2016 14:56
oh I see, thought you downloaded my Docker image
Patrick Skjennum
@Habitats
May 12 2016 14:56
hard part is getting dl4j to not cause havoc in the meantime
ah no i didn't no
Justin Long
@crockpotveggies
May 12 2016 14:56
what do you mean by cause havoc?
OpenBLAS installed on all nodes, etc.?
Patrick Skjennum
@Habitats
May 12 2016 14:57
well we're early adopters after all
no i haven't gotten that far yet
so i should go for openblas? mkl doesn't work on unix?
Justin Long
@crockpotveggies
May 12 2016 14:58
honestly I don't know which one will eke out better performance
Adam Gibson
@agibsonccc
May 12 2016 14:58
mkl
Patrick Skjennum
@Habitats
May 12 2016 14:58
mkl is waaaaaaaaay faster
Justin Long
@crockpotveggies
May 12 2016 14:58
ah I see
Patrick Skjennum
@Habitats
May 12 2016 14:58
like 2x
Justin Long
@crockpotveggies
May 12 2016 14:58
I'll have to update my image ;)
Paul Dubs
@treo
May 12 2016 14:58
only on intel processors though
Patrick Skjennum
@Habitats
May 12 2016 14:58
well yeah
but who doesn't have intel cpu's
Paul Dubs
@treo
May 12 2016 14:58
google maybe? :D
Patrick Skjennum
@Habitats
May 12 2016 14:58
xeon afaik
Adam Gibson
@agibsonccc
May 12 2016 14:59
google's starting to do power
Justin Long
@crockpotveggies
May 12 2016 14:59
all of my servers have Xeon's
Paul Dubs
@treo
May 12 2016 14:59
then totally go for mkl
Patrick Skjennum
@Habitats
May 12 2016 14:59
try mkl then:3
Justin Long
@crockpotveggies
May 12 2016 14:59
is there an apt-get package for it?
Paul Dubs
@treo
May 12 2016 14:59
nope
Patrick Skjennum
@Habitats
May 12 2016 14:59
it's kind of a pain in the ass to install it
Adam Gibson
@agibsonccc
May 12 2016 14:59
mkl isn't automatable
that's the only problem
Adam Gibson
@agibsonccc
May 12 2016 14:59
(I did in our enterprise version)
Patrick Skjennum
@Habitats
May 12 2016 14:59
but it's worth it:P
Adam Gibson
@agibsonccc
May 12 2016 14:59
can't distribute it though
Justin Long
@crockpotveggies
May 12 2016 15:00
I can probably automate it with a script
Paul Dubs
@treo
May 12 2016 15:00
it is automatable... but not redistributable
Justin Long
@crockpotveggies
May 12 2016 15:00
so I can't include a script in an open source repo?
Paul Dubs
@treo
May 12 2016 15:00
exactly
Adam Gibson
@agibsonccc
May 12 2016 15:00
right
Patrick Skjennum
@Habitats
May 12 2016 15:00
yeah you have to register etc to download it
Justin Long
@crockpotveggies
May 12 2016 15:01
sheesh
Paul Dubs
@treo
May 12 2016 15:01
on the other hand, there is this: https://aur.archlinux.org/packages/intel-mkl/
Justin Long
@crockpotveggies
May 12 2016 15:02
wait I'm confused, why can archlinux distribute a script?
Adam Gibson
@agibsonccc
May 12 2016 15:02
yeah that confuses me too actually..
Paul Dubs
@treo
May 12 2016 15:03
it is on aur, which is community content
Justin Long
@crockpotveggies
May 12 2016 15:03
Docker image is in Github w/ Apache 2.0, does that count as community content?
Adam Gibson
@agibsonccc
May 12 2016 15:03
how does that compare to a gh repo though?
Paul Dubs
@treo
May 12 2016 15:03
no idea
either intel doesn't know, or they don't care
the package has been there for 4 years
Adam Gibson
@agibsonccc
May 12 2016 15:03
I'm working directly with their licensing guys
huh
I'll figure this out
Justin Long
@crockpotveggies
May 12 2016 15:04
perhaps a binary is non-distributable
and the reason is because it has to be optimized for that machine
Adam Gibson
@agibsonccc
May 12 2016 15:04
mkl isn't oss
Justin Long
@crockpotveggies
May 12 2016 15:04
hmm
Adam Gibson
@agibsonccc
May 12 2016 15:04
no you don't compile from source
it's all dlls/.sos
Paul Dubs
@treo
May 12 2016 15:04
I'm pretty tempted to build an arch vm, just to compare libnd4j built by icc vs gcc
that may cover dl4j/nd4j
Patrick Skjennum
@Habitats
May 12 2016 15:07
i guess i'll start with openblas and see if i can get that working first
Adam Gibson
@agibsonccc
May 12 2016 15:07
we're kind of a dual prong..
Justin Long
@crockpotveggies
May 12 2016 15:07
funny, I can find a Japanese page for Ubuntu trusty MKL but nothing else http://d.hatena.ne.jp/cmphys/20140501/1398925683
Paul Dubs
@treo
May 12 2016 15:07
the offer there is for the whole Parallel Studio XE Professional Edition... so it covers icc, mkl and a lot of other stuff
Patrick Skjennum
@Habitats
May 12 2016 15:07
i expect the dl4j native install to be a lot easier on unix:P?
Paul Dubs
@treo
May 12 2016 15:08
it is pretty easy
@agibsonccc IANAL, it looks like it should at least cover the community edition
Patrick Skjennum
@Habitats
May 12 2016 15:08
are there any pre-existing scripts to do everything?
Paul Dubs
@treo
May 12 2016 15:08
but only on linux
Justin Long
@crockpotveggies
May 12 2016 15:08
does parallel studio require desktop installer? http://tzutalin.blogspot.ca/2015/06/blas-atlas-openblas-and-mkl.html
all I'm running is ubuntu-server
Paul Dubs
@treo
May 12 2016 15:09
nope, it works on the cli just fine (mkl that is, haven't tried anything else)
Justin Long
@crockpotveggies
May 12 2016 15:09
okay that install_GUI script must have a install_CLI companion
I'll have this scripted by the time I get to the office ;)
Justin Long
@crockpotveggies
May 12 2016 15:11
how does DL4J look for BLAS/MKL support? what if both are installed on the same machine?
Adam Gibson
@agibsonccc
May 12 2016 15:11
well..right now it's compile libnd4j
We need to setup the pre cooked version
Patrick Skjennum
@Habitats
May 12 2016 15:11
copy pasting that to my build script and hoping for the best @treo
Paul Dubs
@treo
May 12 2016 15:11
@Habitats reload the page, I forgot the export at first
Adam Gibson
@agibsonccc
May 12 2016 15:12
it's not statically linked
@saudet is going to have to fill me in on how javacpp is going to do dynamic linking
so it should be specifiable
Patrick Skjennum
@Habitats
May 12 2016 15:12
@treo openblas etc is that apt-get stuff?
maybe i should just read the readme
Justin Long
@crockpotveggies
May 12 2016 15:13
@agibsonccc you mean that libnd4j needs to be compiled locally with MKL support?
Adam Gibson
@agibsonccc
May 12 2016 15:13
likely
we're going to bundle openblas
Justin Long
@crockpotveggies
May 12 2016 15:13
so basically when I deploy a node, I'll need to compile libnd4j with MKL
Adam Gibson
@agibsonccc
May 12 2016 15:13
then for the enterprise version do mkl
same with cudnn
it's a bit better with cudnn since you can DL it for free
but again we can't just redistribute it
I can build say: a docker image with mkl though
licensing is weird like that
I'm not going to leave you out in the woods :P (hence the dynamic linking)
I just don't know how javacpp works
the intention is to have it be like netlib-java
Justin Long
@crockpotveggies
May 12 2016 15:15
if I package together an uber jar on spark, am I going to need to specify a LIBND4J_HOME?
on all the worker nodes
Adam Gibson
@agibsonccc
May 12 2016 15:15
maybe? I haven't worked this out yet
depends on how the linking ends up working
the LIBND4J_HOME is def a script thing though
I don't think that'd be how it works with dynamic linking
Justin Long
@crockpotveggies
May 12 2016 15:16
for safety sake, I'm going to include it in my docker image
Adam Gibson
@agibsonccc
May 12 2016 15:16
I think it depends on how we handle the blas stuff
yeah
@agibsonccc since you're making a distro for commercial reasons that requires payment, I believe that's why you can't make a public Docker
but on the other hand, I can, because I'm an open-source contributor
Adam Gibson
@agibsonccc
May 12 2016 15:19
Well - I'm serious about the mkl licensing
We are going to get special terms
We'd be in a bit of a gray area
if an open source mkl thing can happen then great
I honestly just don't know
I'm telling you what I know right now
I need to read how it's going to work first
Justin Long
@crockpotveggies
May 12 2016 15:19
cool, let me trailblaze a bit on this public Docker
Adam Gibson
@agibsonccc
May 12 2016 15:19
cool
I also just don't know all of the stuff with redistro of mkl
Justin Long
@crockpotveggies
May 12 2016 15:20
alright stepping out, be back in 1.5 hours
Adam Gibson
@agibsonccc
May 12 2016 15:20
cool
Paul Dubs
@treo
May 12 2016 15:26
I'm also pretty sure you can create a Dockerfile, that builds everything given that you give it cudnn and mkl
so, you can't just start with the fully optimized docker image, but you can build it yourself
Adam Gibson
@agibsonccc
May 12 2016 15:28
shouldn't be impossible
we're just doing that + some other stuff
Patrick Skjennum
@Habitats
May 12 2016 15:48
does dl4j require special mvn or can i just use some old one
i remember spark being super picky about mvn version
i'll just try mvn 3.0.5 and see what happens
Paul Dubs
@treo
May 12 2016 15:52
if you want to exclude something you need a newer one
raver119
@raver119
May 12 2016 15:52
@treo
At iteration 10 a single iteration takes 1454 MILLISECONDS
Patrick Skjennum
@Habitats
May 12 2016 15:53
yeah i realized that @treo
it went horribly wrong
annoying that apt-get gives you an old version
Paul Dubs
@treo
May 12 2016 15:59
@raver119 faster than cpu :D
raver119
@raver119
May 12 2016 16:01
yes
+-15%
Patrick Skjennum
@Habitats
May 12 2016 16:01
https://github.com/deeplearning4j/libnd4j says sudo apt-get install install cmake
:P
raver119
@raver119
May 12 2016 16:01
but that's still not the end
Paul Dubs
@treo
May 12 2016 16:01
awesome!
raver119
@raver119
May 12 2016 16:01
i'll be doing shmem/register pass
so occupancy should get closer to 100%
and on proper hardware (aka, not 970) - it will be faster :)
Paul Dubs
@treo
May 12 2016 16:05
my wife has already given me her blessing on getting a 1080 :D (as she will be getting the 970 :D)
raver119
@raver119
May 12 2016 16:06
i'm not sure... i'll probably wait for Ti
hadn't decided yet
AWS still a concern for me...
cc3.0 isn't the thing of my dreams...
it just has lower parallelism limits
regardless of what i'll do there
it's just lower.
x2 at very least
Paul Dubs
@treo
May 12 2016 16:08
cc3.0?
raver119
@raver119
May 12 2016 16:08
yes
Paul Dubs
@treo
May 12 2016 16:08
that is?
raver119
@raver119
May 12 2016 16:08
aws uses grid
compute capability
nvidia grid is special cuda
special for clouds
and compute capability describes what device can do, and can not
so, cc 3.0 has max of 8 blocks per mp
instead of 32 for our 970
and a lower number of registers
plus
gtx 970 is cc5.2
here's aws gpu info
so, lower shared memory, lower parallelism, lower number of registers, lower memory bus, lower everything.
Paul Dubs
@treo
May 12 2016 16:15
i.e. atm there is no advantage to be gained from getting the gpu instances yet
It would be interesting to see how price efficient cuda is here, because the 36 core c4.8xlarge that I used for w2v is pretty cheap compared to the g2.8xlarge
Patrick Skjennum
@Habitats
May 12 2016 16:23
there's a lot of stuff that isn't working in the libnd4j readme for ubuntu
apt-get cmake gives you the wrong version
Paul Dubs
@treo
May 12 2016 16:26
depends on the version of your ubuntu
raver119
@raver119
May 12 2016 16:29
@treo i'm 100% sure we'll make something viable for aws, but i doubt it will happen today or tomorrow.
Paul Dubs
@treo
May 12 2016 16:29
will probably need more of your magic :)
raver119
@raver119
May 12 2016 16:30
yea, bloody magic will be required there. something like multigpu use for single task etc
imaginary computations
etc
but even then, someone with private cloud full of modern tesla cards will beat aws :)
because it will be multigpu + not cc3.0 :)
Paul Dubs
@treo
May 12 2016 16:34
:D
new profiler
looks like the reduce kernel should be visited next
your profiler report from a week ago
Paul Dubs
@treo
May 12 2016 16:50
nice, max is now what was average then, and avg is now less than 1/10 of what min was
great work :+1:
ChrisN
@chrisvnicholson
May 12 2016 16:54
go raver!
Patrick Skjennum
@Habitats
May 12 2016 17:10
for spark, do i need to install native stuff manually on all my workers?
Paul Dubs
@treo
May 12 2016 17:13
not necessarily
if you linked against openblas, then you only have to install openblas
if you linked against mkl you will have to install mkl
Patrick Skjennum
@Habitats
May 12 2016 17:14
yeah
Paul Dubs
@treo
May 12 2016 17:14
everything else should be self-contained
Patrick Skjennum
@Habitats
May 12 2016 17:14
good
wutzebaer
@wutzebaer
May 12 2016 17:56
cool 4 days till release =)
raver119
@raver119
May 12 2016 18:06
don't worry
after release, everyone here will start counting days till next release
and most of ppl will stick to snapshots...
Justin Long
@crockpotveggies
May 12 2016 18:20
This message was deleted
oh wow did not see the most recent conversation history :P
Justin Long
@crockpotveggies
May 12 2016 19:29
I'm having a problem with Docker and ephemeral ports with YARN
@Habitats you run into that, too?
basically, YARN is reaching out to non-standard ports in a huge range
I tried exposing that range - to my demise. Such a huge range thread-locked docker
Another app is currently holding the xtables lock
Patrick Skjennum
@Habitats
May 12 2016 19:35
haven't gotten that far. atm i can't even get spark to work on google cloud
spark doesn't deploy correctly with their automatic tools :|
Justin Long
@crockpotveggies
May 12 2016 19:37
thanks Google!
I narrowed down my issue to what I think is ephemeral ports
going to give that a shot
basically locked the entire cluster with my last fuck up, hard resets across the board
Patrick Skjennum
@Habitats
May 12 2016 19:39
fuck ups </3
Patrick Skjennum
@Habitats
May 12 2016 21:51
@crockpotveggies how's your performance with spark?
Justin Long
@crockpotveggies
May 12 2016 21:52
not quite there yet, dealing with networking issues
I'm switching to Weave virtual network since ports are an issue with YARN
config is such a pain
Patrick Skjennum
@Habitats
May 12 2016 21:52
second that
Justin Long
@crockpotveggies
May 12 2016 21:55
almost there
cluster was talking to itself yesterday, just a few more steps so it can finally start handing off tasks
Patrick Skjennum
@Habitats
May 12 2016 21:56
it'd be very interesting to see what you get in terms of performance
i've only just gotten spark to work locally
but it's veeeeeeeeeeeeeeeeeeeeeeery slow
Justin Long
@crockpotveggies
May 12 2016 21:58
yea lots of optimizations required
Patrick Skjennum
@Habitats
May 12 2016 21:58
my thesis depends on running my rnn on my entire dataset, but that'll take 45 days on my home comp
and i got a week to get it working on google cloud
Justin Long
@crockpotveggies
May 12 2016 21:58
and Google cloud isn't working?
ah
Patrick Skjennum
@Habitats
May 12 2016 21:58
yeah not yet no
spark works
Ruben Fiszel
@rubenfiszel
May 12 2016 21:58
building from source
/home/atoll/nd4j/nd4j-backends/nd4j-backend-impls/nd4j-native/target/classes/org/nd4j/nativeblas/linux-x86_64/jnind4j.cpp:4395:14: error: ‘class NativeOps’ has no member named ‘enableVerboseMode’
nd4j
raver119
@raver119
May 12 2016 22:00
pull master
for both libnd4j and nd4j
you're using outdated libnd4j at least
Ruben Fiszel
@rubenfiszel
May 12 2016 22:01
oh right I forgot I had to start with libnd4j
sorry
Do we have to change something about javacpp ?
I remember seeing an annoucement
Paul Dubs
@treo
May 12 2016 22:03
yes, you have to use javacpp master
Ruben Fiszel
@rubenfiszel
May 12 2016 22:04
javacpp master ?
clone it, mvn clean install it, and you should be good to go
Ruben Fiszel
@rubenfiszel
May 12 2016 22:04
Oh ok
Paul Dubs
@treo
May 12 2016 22:05
anyway, I'm heading to bed now, good night
Ruben Fiszel
@rubenfiszel
May 12 2016 22:08
good night
I think my build is broken now
seg fault on Compiled method (nm) 7198230 158 n 0 java.util.zip.ZipFile::getEntry (native)
Patrick Skjennum
@Habitats
May 12 2016 22:08
everything on current master should work fine at least
Ruben Fiszel
@rubenfiszel
May 12 2016 22:09
actually no, my current error is [error] (run-main-0) java.lang.UnsatisfiedLinkError: no jnind4j in java.library.path
java.lang.UnsatisfiedLinkError: no jnind4j in java.library.path
I recompiled everything just now
pulled everything
Patrick Skjennum
@Habitats
May 12 2016 22:10
got libnd4j on path?
and openblas?
Ruben Fiszel
@rubenfiszel
May 12 2016 22:11
Normally yes
atoll@rub440s  ~  echo $LIBND4J_HOME
/home/atoll/libnd4j
was working well before
Should I have to reinstall openblas ?
Patrick Skjennum
@Habitats
May 12 2016 22:14
no
Ruben Fiszel
@rubenfiszel
May 12 2016 22:14
Then I don't know what's happening :(
Patrick Skjennum
@Habitats
May 12 2016 22:16
git clone https://github.com/deeplearning4j/libnd4j.git
cd libnd4j
bash buildnativeoperations.sh cpu 
echo "export LIBND4J_HOME=`pwd`" >> ~/.profile
source ~/.profile
cd ..
git clone https://github.com/deeplearning4j/nd4j.git
cd nd4j
mvn clean install -DskipTests -Dmaven.javadoc.skip=true -pl '!:nd4j-cuda-7.5,!org.nd4j:nd4j-tests'
cd ..
git clone https://github.com/deeplearning4j/deeplearning4j.git
cd deeplearning4j
mvn clean install -DskipTests -Dmaven.javadoc.skip=true
this is what i do to build everything
all that works if openblas is installed from before
and yeah javacpp 1.2
raver119
@raver119
May 12 2016 22:21
anyone here has any idea how 4 (four) pointers can use 896 bytes of memory?
Ruben Fiszel
@rubenfiszel
May 12 2016 22:27
well still Caused by: java.lang.UnsatisfiedLinkError: no nd4j in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
Patrick Skjennum
@Habitats
May 12 2016 22:35
what are you trying to run?
i had that problem, but for me i needed to include org.nd4j:nd4j-native:0.4-rc3.9-SNAPSHOT:windows-x86_64 as a dep, but that doesn't make sense if you're on unix
swap windows for unix and cross your fingers?:P
they split up the native lib last week, so you need another dep
Ruben Fiszel
@rubenfiszel
May 12 2016 22:38
oh
what do I need now ?
Patrick Skjennum
@Habitats
May 12 2016 22:39
well you need what i pasted, but the unix version of it
Ruben Fiszel
@rubenfiszel
May 12 2016 22:39
is it written somewhere ?
Patrick Skjennum
@Habitats
May 12 2016 22:39
no idea
Ruben Fiszel
@rubenfiszel
May 12 2016 22:40
what you've pasted, is it a pom dependency?
Patrick Skjennum
@Habitats
May 12 2016 22:40
oh yeah, i pasted gradle version
Ruben Fiszel
@rubenfiszel
May 12 2016 22:40
I need a sbt version in fact but I'm not sure where to put the last part :D
, "org.nd4j" % "nd4j-native" % "0.4-rc3.9-SNAPSHOT"
Patrick Skjennum
@Habitats
May 12 2016 22:41
add another %? idno
Ruben Fiszel
@rubenfiszel
May 12 2016 22:41
will try
Patrick Skjennum
@Habitats
May 12 2016 22:41
you need both btw
both normal and the funny one
Ruben Fiszel
@rubenfiszel
May 12 2016 22:42
oh
well adding % isn't what I need
Patrick Skjennum
@Habitats
May 12 2016 22:42
xD
Ruben Fiszel
@rubenfiszel
May 12 2016 22:42
what is the name of the last part ?
the part before it is the version
Patrick Skjennum
@Habitats
May 12 2016 22:43
hm?
Ruben Fiszel
@rubenfiszel
May 12 2016 22:47
well ... the joy of being an early adopter :D
no one using sbt ?
even just a pom
Patrick Skjennum
@Habitats
May 12 2016 22:49
i haven't seen anyone using sbt here ever
but yeah, early adopters<3
you don't realize what you've done until it's too late
@atollFP i think they are called "classifiers"
@atollFP "classifier is defined by Maven as the fifth element of a project coordinate, after groupId, artifactId, version and packaging." https://stackoverflow.com/questions/18571534/what-is-a-classifier-in-sbt
so you just write "classifier" basically
Ruben Fiszel
@rubenfiszel
May 12 2016 22:54
saw that, tried, didn't work
this worked , "org.nd4j" % "nd4j-native" % "0.4-rc3.9-SNAPSHOT-linux_x86-64"
but I still have the same error
...
I guess I will wait a few days that the version is officially released
or I will become crazy
Patrick Skjennum
@Habitats
May 12 2016 22:56
wish i could help you out. unfortunately i'm having my share of issues getting it to work on windows
it's pretty stable now though
Ruben Fiszel
@rubenfiszel
May 12 2016 22:56
I can understand
Well it probably is. But it doesn't run
:D
Patrick Skjennum
@Habitats
May 12 2016 22:57
have you tried running the examples with the native code? i used that to sanity check
download the examples repo and swap nd4jx86 with native, and it should work
Ruben Fiszel
@rubenfiszel
May 12 2016 22:58
I believe I would still have the same error
Patrick Skjennum
@Habitats
May 12 2016 22:58
i'm not saying you're wrong:P
Adam Gibson
@agibsonccc
May 12 2016 22:58
What os are you running on?
Ruben Fiszel
@rubenfiszel
May 12 2016 22:58
linux
Patrick Skjennum
@Habitats
May 12 2016 22:58
ah, @agibsonccc to the rescue:D
Ruben Fiszel
@rubenfiszel
May 12 2016 22:58
archlinux
Adam Gibson
@agibsonccc
May 12 2016 22:58
You shouldn't be having this many problems
Sbt?
Ruben Fiszel
@rubenfiszel
May 12 2016 22:59
yes
the .1 is because I added the IDENTITY locally
Patrick Skjennum
@Habitats
May 12 2016 23:00
you want to remove anything with sonatype from your build config though
Adam Gibson
@agibsonccc
May 12 2016 23:00
How did you end up with a dash?
No
Stop Patrick is wrong
Patrick Skjennum
@Habitats
May 12 2016 23:01
i had to do that with gradle:s
Adam Gibson
@agibsonccc
May 12 2016 23:01
So add a fourth declaration
Not a dash
Ruben Fiszel
@rubenfiszel
May 12 2016 23:01
Doesn't work with sbt, doesn't work with classifier
Adam Gibson
@agibsonccc
May 12 2016 23:01
So separate the dash with Linux with another semicolon
Sorry percent
Ruben Fiszel
@rubenfiszel
May 12 2016 23:02
That's what I did
It's not the syntax of sbt
Adam Gibson
@agibsonccc
May 12 2016 23:02
Sec meeting will do this later
Ruben Fiszel
@rubenfiszel
May 12 2016 23:02
correct syntax is with "classifier"
but doesn't work. So I inspected the .m2/repository manually
found the 0.4-rc3.9-SNAPSHOT-linux_x86-64.jar
Ruben Fiszel
@rubenfiszel
May 12 2016 23:10
I made it work
yeaaaah
, "org.nd4j" % "nd4j-native" % "0.4-rc3.9-SNAPSHOT" classifier "linux-x86_64"
Patrick Skjennum
@Habitats
May 12 2016 23:11
but that's what i told you:P
Ruben Fiszel
@rubenfiszel
May 12 2016 23:11
Yes
I confused - and _
Patrick Skjennum
@Habitats
May 12 2016 23:12
oh boy:D well neat that you got it working
Ruben Fiszel
@rubenfiszel
May 12 2016 23:12
ye ty :)
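For reference, here is a pom.xml equivalent of the sbt line that finally worked; the coordinates, version, and classifier are taken directly from this thread, so treat it as a sketch for Maven users rather than a verified snippet:

```xml
<!-- Maven equivalent of the working sbt dependency above -->
<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-native</artifactId>
  <version>0.4-rc3.9-SNAPSHOT</version>
  <classifier>linux-x86_64</classifier>
</dependency>
```

As discussed above, the classifier is Maven's fifth coordinate (after groupId, artifactId, version, and packaging), and since the native split it selects the platform-specific binary jar.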