These are chat archives for deeplearning4j/deeplearning4j/earlyadopters

27th
Jun 2016
Kshitij Jain
@jain98
Jun 27 2016 01:12
I'm currently building nd4j and when I'm running the given maven command I'm getting this-
Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce (libnd4j-checks) on project nd4j-native: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
Would anyone know why?
Alex Black
@AlexDBlack
Jun 27 2016 01:13
yep
need LIBND4J_HOME env variable
should be in the guide
Kshitij Jain
@jain98
Jun 27 2016 01:13
oh boy...
I ran the export command
So i set LIBND4J_HOME as the libnd4j location
Alex Black
@AlexDBlack
Jun 27 2016 01:15
right

this is the process I use on my windows machines:
In msys2/mingw64 shell:

  • ./buildnativeoperations.sh
  • ./buildnativeoperations.sh -c cuda

In cmd:

  • cd c:/dl4j/git/nd4j
  • vcvars64
  • SET LIBND4J_HOME=C:/DL4J/Git/libnd4j/
  • mvn clean install -DskipTests
paths will be different for you obviously
it's also possible to skip cuda build if you don't need that
but then need extra args for last mvn step
Kshitij Jain
@jain98
Jun 27 2016 01:25
ok I set the LIBND4J_HOME env var
but I still couldn't get rid of the error I was getting
Alex Black
@AlexDBlack
Jun 27 2016 01:26
did the libnd4j build complete successfully?
Kshitij Jain
@jain98
Jun 27 2016 01:27
is that when I had to run the vcvars64
file
Alex Black
@AlexDBlack
Jun 27 2016 01:27
./buildnativeoperations.sh
no, that
Kshitij Jain
@jain98
Jun 27 2016 01:31
$ ./buildnativeoperations.sh
eval cmake
./buildnativeoperations.sh: line 25: VCINSTALLDIR: unbound variable
Got this when I ran it
Samuel Audet
@saudet
Jun 27 2016 01:41
Looks like I didn't test my changes properly... Will fix that later today, but older versions of the script should work
Kshitij Jain
@jain98
Jun 27 2016 02:01
Ok, so I finished building nd4j
But i got this little thing at the end-
LINK : fatal error LNK1181: cannot open input file 'nd4j.lib'
Does anyone know what it means?
Adam Gibson
@agibsonccc
Jun 27 2016 02:02
uh..huh
Is this after mvn clean install or clicking run on an example?
Need more context
Kshitij Jain
@jain98
Jun 27 2016 02:03
maven clean install
Adam Gibson
@agibsonccc
Jun 27 2016 02:03
Did you set LIBND4J_HOME ?
Kshitij Jain
@jain98
Jun 27 2016 02:03
yeah
Adam Gibson
@agibsonccc
Jun 27 2016 02:04
Can you do a https://gist.github.com of a libnd4j build?
both cpu and gpu
Kshitij Jain
@jain98
Jun 27 2016 02:05
you want the whole log after I run maven clean
Adam Gibson
@agibsonccc
Jun 27 2016 02:05
LIB nd4j
not java
It sounds like your libnd4j build is screwed up
Kshitij Jain
@jain98
Jun 27 2016 02:07
so I should run buildnativeoperations.sh again and upload the log?
Adam Gibson
@agibsonccc
Jun 27 2016 02:07
well again
for both cpu and gpu
yes
Kshitij Jain
@jain98
Jun 27 2016 02:08
I didn't build the gpu backend
Adam Gibson
@agibsonccc
Jun 27 2016 02:08
Did you make sure to follow build instructions for ignoring the gpu backend?
Still want the cpu one too anyways
Kshitij Jain
@jain98
Jun 27 2016 02:09
wait...I have been building the gpu backend
the CUDA backend
I've created the gist
@agibsonccc , you should see it now
Adam Gibson
@agibsonccc
Jun 27 2016 02:14
link?
Adam Gibson
@agibsonccc
Jun 27 2016 02:15
right ok
LIBND4J_HOME ?
Kshitij Jain
@jain98
Jun 27 2016 02:15
I set it
Adam Gibson
@agibsonccc
Jun 27 2016 02:15
to what though?
Kshitij Jain
@jain98
Jun 27 2016 02:16
the libnd4j location
Adam Gibson
@agibsonccc
Jun 27 2016 02:16
specifics please?
considering you use msys2 on windows there could be all sorts of screw ups there
Kshitij Jain
@jain98
Jun 27 2016 02:16
C:\Users\User2\Desktop\git\github\libnd4j
raver119
@raver119
Jun 27 2016 06:44
@jain98 at least for CUDA your issue is obvious: you don't have the CUDA Toolkit installed on your system, or you haven't rebooted since installing it
s103451
@s103451
Jun 27 2016 12:58
I'm trying to read my 300x300 images into a DataSetIterator, however when I call iterator.next(), it comes back with the following error:
Exception in thread "main" java.lang.IllegalStateException: Indeterminant state: record must not be null, or a file iterator must exist
at org.canova.image.recordreader.BaseImageRecordReader.hasNext(BaseImageRecordReader.java:257)
at org.deeplearning4j.datasets.canova.RecordReaderDataSetIterator.hasNext(RecordReaderDataSetIterator.java:332)
at dtu.thesis.Driver.main(Driver.java:48)
Do you have an idea of what I have done wrong?
raver119
@raver119
Jun 27 2016 13:27
damn, the 1070 launch is a mess..
Paul Dubs
@treo
Jun 27 2016 13:27
why?
raver119
@raver119
Jun 27 2016 13:27
looks like the 367 driver does not support the 1070
for windows the 1070 driver is 368.39
Paul Dubs
@treo
Jun 27 2016 13:28
should the 367 have supported it?
raver119
@raver119
Jun 27 2016 13:28
so ubuntu just dies with the 1070, since no driver is available
Paul Dubs
@treo
Jun 27 2016 13:28
oh... I see
raver119
@raver119
Jun 27 2016 13:28
yea i've thought that too
had to install windows
windows needed to be fully updated too
otherwise nvidia installer says "os isn't compatible"
so pretty soon i'll have 1070 timings + hopefully will finally get multi-gpu stuff working properly
Paul Dubs
@treo
Jun 27 2016 13:31
multi gpu with two different speeds even, so that's an even more interesting setup
raver119
@raver119
Jun 27 2016 13:34
nope
dual 1070
Paul Dubs
@treo
Jun 27 2016 13:35
oh, I thought you were keeping the 970
raver119
@raver119
Jun 27 2016 13:35
nah, that's a bad idea for cuda
it's bad for sli too though
you'd have to hold back the fastest gpu all the time
Paul Dubs
@treo
Jun 27 2016 13:36
ok, I already wondered how well a setup like that would have worked
raver119
@raver119
Jun 27 2016 13:36
1070 + 970?
Paul Dubs
@treo
Jun 27 2016 13:37
unequal cards in general
raver119
@raver119
Jun 27 2016 13:37
it'll be bad even for data parallelism
and for other scaling approaches - it'll become a nightmare
Samuel Audet
@saudet
Jun 27 2016 13:42
@s103451 Try to use a PathLabelGenerator instead of a list of labels
s103451
@s103451
Jun 27 2016 14:08
Samuel, do you have a short code example somewhere showing what I should return in the inner methods of the interface?
@saudet
Adam Gibson
@agibsonccc
Jun 27 2016 15:31
@s103451 just search the Canova repo on github
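For reference, a minimal sketch of the PathLabelGenerator route Samuel suggests, using the ready-made parent-directory label generator rather than hand-writing the interface. Class and package names here follow the later DataVec-style API, so the Canova-era equivalents may differ slightly, and the "data/" directory, batch size, and class count are placeholders, not values from this conversation:

import java.io.File;
import java.util.Arrays;

import org.datavec.api.io.labels.ParentPathLabelGenerator;
import org.datavec.api.split.FileSplit;
import org.datavec.image.recordreader.ImageRecordReader;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class ImageIteratorSketch {
    public static void main(String[] args) throws Exception {
        int height = 300, width = 300, channels = 3;
        int batchSize = 32, numClasses = 2;   // adjust to your data

        // Labels come from each image's parent directory name,
        // e.g. data/cats/img1.png -> label "cats"
        ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator();

        ImageRecordReader recordReader = new ImageRecordReader(height, width, channels, labelMaker);
        recordReader.initialize(new FileSplit(new File("data/")));

        // labelIndex 1 is the usual value for image record readers
        DataSetIterator iterator = new RecordReaderDataSetIterator(recordReader, batchSize, 1, numClasses);
        while (iterator.hasNext()) {
            DataSet batch = iterator.next();
            System.out.println(Arrays.toString(batch.getFeatures().shape()));
        }
    }
}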
raver119
@raver119
Jun 27 2016 16:04
@treo still here?
Paul Dubs
@treo
Jun 27 2016 16:04
yep, still here
raver119
@raver119
Jun 27 2016 16:04
could you please time your copy of my branch with LenetMnistExample again, without pulling my branch?
i need 970 time on windows
but don't pull my branch please, there's way too much debug spam, just use one you have from last check
Paul Dubs
@treo
Jun 27 2016 16:05
so, you want me to just run the LenetMnistExample again, without pulling anything?
raver119
@raver119
Jun 27 2016 16:06
yep
need time per epoch
your windows time was close to my time on linux
Paul Dubs
@treo
Jun 27 2016 16:07
still on cuda, right?
raver119
@raver119
Jun 27 2016 16:07
yea
i see strange behavior here, wonder what's on your side
Paul Dubs
@treo
Jun 27 2016 16:07
and with the configuration you had set up
raver119
@raver119
Jun 27 2016 16:07
i'm running with defaults atm
Paul Dubs
@treo
Jun 27 2016 16:08
ok, I'll run it with defaults then as well
28228, 27423, 27587
first three epochs
raver119
@raver119
Jun 27 2016 16:09
great
now please increase batchSize
to 256
Paul Dubs
@treo
Jun 27 2016 16:11
I'm seeing a better gpu utilization
raver119
@raver119
Jun 27 2016 16:11
yea, me too
that's pretty obvious
larger dimensions
more tads to process
etc
with default settings and batchSize = 64 gpu util is 60%
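For context on the knob being tweaked here: in LenetMnistExample the batch size is just the value handed to the MNIST iterator, so bumping it means larger matrices per operation and better GPU occupancy. A rough sketch, assuming the example still builds its iterator this way (variable names are illustrative, not the example's exact code):

import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class BatchSizeSketch {
    public static void main(String[] args) throws Exception {
        int batchSize = 256;   // the example defaults to a smaller value such as 64
        int rngSeed = 12345;   // any fixed seed

        // Larger minibatches -> larger NDArrays per op -> the GPU spends more
        // time computing relative to launch/transfer overhead
        DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);

        long start = System.currentTimeMillis();
        while (mnistTrain.hasNext()) {
            mnistTrain.next();   // in the real example this loop is replaced by model.fit(mnistTrain)
        }
        System.out.println("one pass over MNIST: " + (System.currentTimeMillis() - start) + " ms");
    }
}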
Paul Dubs
@treo
Jun 27 2016 16:11
21934, 21409, 21288
raver119
@raver119
Jun 27 2016 16:12
great.
so your time on the gtx 970 is pretty much equal to my time on the gtx 1070
Paul Dubs
@treo
Jun 27 2016 16:12
65 to 75 for me with default and 80 to 90 with 256
now that's pretty odd
raver119
@raver119
Jun 27 2016 16:13
55-60% for me with default
75-90% with 256
however, for me it's pcie 3.0 x8
instead of x16 :(
Paul Dubs
@treo
Jun 27 2016 16:13
oh, why?
but even with pcie 3.0 x8 you should be seeing better speeds than that
raver119
@raver119
Jun 27 2016 16:14
hm
it's even worse
Paul Dubs
@treo
Jun 27 2016 16:15
even less than 8 lanes?
raver119
@raver119
Jun 27 2016 16:15
pcie 1.1 x8
[image attachment: 1111.gif]
Paul Dubs
@treo
Jun 27 2016 16:16
ok, you know pretty well why you are seeing the speeds that you see now
raver119
@raver119
Jun 27 2016 16:16
see that 1.1?
it changes over time
either x8 3.0 or x8 1.1
probably energy saving..
but anyway, it's not as fast as expected.
Paul Dubs
@treo
Jun 27 2016 16:18
[image attachment]
you can click the ? to force the card into its highest power state
I also see the 1.1 when idling
raver119
@raver119
Jun 27 2016 16:19
aha
but i still have the pcie lanes split :(
Paul Dubs
@treo
Jun 27 2016 16:19
but it also goes back to 3.0 when I start the test
raver119
@raver119
Jun 27 2016 16:19
between two devices
Paul Dubs
@treo
Jun 27 2016 16:22
hmm... you still should be faster
raver119
@raver119
Jun 27 2016 16:22
yep. but i'm not, and that's pretty surprising
Paul Dubs
@treo
Jun 27 2016 16:23
so, what does the profiler have to say about that?
raver119
@raver119
Jun 27 2016 16:23
just installed it
but you're right, that's clearly a sign of an incoming profiling pass...
Paul Dubs
@treo
Jun 27 2016 16:27
By the way, I heard back from the current Yeppp! developer on how to build from master... and it looks like it doesn't support linux in its current state.
Adam Gibson
@agibsonccc
Jun 27 2016 16:27
o_0
Don't c code bases usually ONLY support linux?
that's a weird turn around
Paul Dubs
@treo
Jun 27 2016 16:28
he's refactoring to a new code generator there, and works on an osx machine
Adam Gibson
@agibsonccc
Jun 27 2016 16:28
ah
Paul Dubs
@treo
Jun 27 2016 17:10
and building from the old version also doesn't seem to work... maybe there is a reason why there was no binary release in 3 years
10bitomaroof
@10bitomaroof
Jun 27 2016 18:42
can anyone suggest where to download a training dataset for word2vec??
raver119
@raver119
Jun 27 2016 18:43
hm
that's an unusual question
usually people build w2v models specifically for their own needs
Paul Dubs
@treo
Jun 27 2016 18:44
I'm using wikipedia as the base
raver119
@raver119
Jun 27 2016 18:44
if they need some "general" models - they just grab precomputed ones
10bitomaroof
@10bitomaroof
Jun 27 2016 18:44
I need a general one, can you recommend where to download it from?
some link?
thanks a lot
10bitomaroof
@10bitomaroof
Jun 27 2016 18:48
thanks, i have those links but they don't have links to data sets
Paul Dubs
@treo
Jun 27 2016 18:49
they certainly do have links to the pretrained models and glove links to at least 2 sources
raver119
@raver119
Jun 27 2016 18:55
sec
we had a nice set of links here a few days ago
scroll down
there's a collection of precomputed models
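As a rough illustration of the "grab a precomputed model" route: loading the pretrained Google News vectors with WordVectorSerializer. The file name is a placeholder for wherever you downloaded the model, and the exact loader method may differ between dl4j versions:

import java.io.File;

import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;

public class PretrainedWord2VecSketch {
    public static void main(String[] args) throws Exception {
        // Precomputed Google News model, downloaded separately (~3.5 GB)
        File modelFile = new File("GoogleNews-vectors-negative300.bin.gz");

        // 'true' -> the file is in the binary word2vec format
        WordVectors vectors = WordVectorSerializer.loadGoogleModel(modelFile, true);

        // Query it like any other dl4j word vector model
        System.out.println(vectors.wordsNearest("day", 10));
        System.out.println(vectors.similarity("day", "night"));
    }
}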
DesmondYuan
@DesmondYuan
Jun 27 2016 21:20
May I ask how dl4j organizes the parameters of a network? Using model.params() only gives a 1*n array. Does it go through w11,w12,w13 or w11,w21,w31?
Adam Gibson
@agibsonccc
Jun 27 2016 21:28
it gathers them yes
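A way to answer the ordering question empirically: model.params() is the flattened 1 x n view, while paramTable() exposes each layer's weight and bias arrays separately (keys are conventionally "<layerIndex>_W" and "<layerIndex>_b"), so you can compare the per-layer arrays against slices of the flat vector instead of guessing. A minimal sketch, assuming a MultiLayerNetwork:

import java.util.Arrays;
import java.util.Map;

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;

public class ParamInspectionSketch {
    // Print the flattened parameter vector alongside each layer's own arrays
    static void inspect(MultiLayerNetwork model) {
        INDArray flat = model.params();
        System.out.println("flattened params: 1 x " + flat.length());

        for (Map.Entry<String, INDArray> entry : model.paramTable().entrySet()) {
            System.out.println(entry.getKey() + " shape: " + Arrays.toString(entry.getValue().shape()));
        }
    }
}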
Adam Gibson
@agibsonccc
Jun 27 2016 22:15
Has anyone seen this before?
symbol lookup error: /usr/local/mkl/lib/intel64/libmkl_intel_thread.so: undefined symbol: omp_get_num_procs
Weird thing is: I have my ldconfig, LD_LIBRARY_PATH and mkl set up
This is happening to me out of nowhere in a docker container
I'm using spark standalone
nothing fancy here
it's 1 jvm o_0
Eduardo Gonzalez
@wmeddie
Jun 27 2016 22:16
Spark always seems to give me problems.
Adam Gibson
@agibsonccc
Jun 27 2016 22:16
yeah
runs fine on my laptop though
same setup
mkl etc
Eduardo Gonzalez
@wmeddie
Jun 27 2016 22:17
same setup including the site-config.xml and stuff?
raver119
@raver119
Jun 27 2016 22:17
i'm pretty sure i've seen such issues in the general chat before
Adam Gibson
@agibsonccc
Jun 27 2016 22:18
right so again just using spark standalone
not even distributed
so spark.extra libs or w/e shouldn't be needed
raver119
@raver119
Jun 27 2016 22:18
and as far as i remember the solution was libnd4j recompilation.
but that was a month+ ago, before the native dust settled
Adam Gibson
@agibsonccc
Jun 27 2016 22:18
this was a fresh clone o_0
meh will try again
Eduardo Gonzalez
@wmeddie
Jun 27 2016 22:19
Sometimes that works.
raver119
@raver119
Jun 27 2016 22:19

@treo
add

/opt/intel/lib/intel64
/opt/intel/mkl/lib/intel64

to your /etc/ld.so.conf
@crockpotveggies
you'll need to rebuild libnd4j
@Habitats
ooooh
@treo
and run ldconfig

that's from history
Adam Gibson
@agibsonccc
Jun 27 2016 22:19
huh
Eduardo Gonzalez
@wmeddie
Jun 27 2016 22:20
Spark support is awesome though. If torchnet is "Torch on steroids" DL4J comes with steroids built in because of Spark.
Adam Gibson
@agibsonccc
Jun 27 2016 22:20
haha
well @raver119 needs to finish multi gpu support with spark first ;)
raver119
@raver119
Jun 27 2016 22:21
i'm almost there
Adam Gibson
@agibsonccc
Jun 27 2016 22:21
we merged cudnn
so we'll have cudnn on spark next release
Eduardo Gonzalez
@wmeddie
Jun 27 2016 22:21
:heart:
Adam Gibson
@agibsonccc
Jun 27 2016 22:21
not all of it (still need to do batch norm)
it's getting there :smile:
Alex is still profiling some of our spark stuff
raver119
@raver119
Jun 27 2016 22:22
yea, that's a way more important thing... spark has too many issues, even without the multi-gpu issues :)
Adam Gibson
@agibsonccc
Jun 27 2016 22:22
yeah
I added streaming btw
@wmeddie
Eduardo Gonzalez
@wmeddie
Jun 27 2016 22:23
Spark Streaming?
Adam Gibson
@agibsonccc
Jun 27 2016 22:23
lotta support for kafka
yeah we have spark streaming too
Rebuilding still didn't work
fml
Eduardo Gonzalez
@wmeddie
Jun 27 2016 22:26
Yes awesome work. Thinking about using it in an upcoming project that needs to parse legal documents out of peoplesoft. (I'll go in to details when I'm in SF)
Adam Gibson
@agibsonccc
Jun 27 2016 22:26
oh awesome
Samuel Audet
@saudet
Jun 27 2016 23:58
@agibsonccc Figured out the MKL issue? We often need to manually preload an OpenMP implementation: https://github.com/deeplearning4j/libnd4j#linking-with-mkl that's libgomp.so there, but libiomp5.so should also work if available, according to @treo