These are chat archives for deeplearning4j/deeplearning4j/earlyadopters

2nd May 2016
Alex Black
@AlexDBlack
May 02 2016 00:40
@raver119 @treo re: LSTMs, still some more to do there on the dl4j side (eliminate some unnecessary copies, etc). I should be able to get that done today
@treo it'd be great to get benchmarks (and, hotspots) up for CNNs if you have the time. things should be better than a couple of weeks ago, but I suspect there's still a lot of room for improvement
Alex Black
@AlexDBlack
May 02 2016 01:43
@atollFP "...run 10 different Neural Network with the same input or one computation graph that branch 10 times on the input layer ?"
current compgraph implementation does the forward pass sequentially in topological sort order. So if only the input is shared (not any other layers), you are probably better off with separate networks - especially if you aren't doing batching (i.e., doing one example at a time)
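for reference, a branching graph with one shared input looks roughly like this (just a sketch - layer sizes and names are made up):

import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.graph.ComputationGraph;

// one input fanning out to two branches; vertices execute one at a time in
// topological order, so with 10 branches sharing nothing but the input,
// 10 separate networks may well be faster
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .graphBuilder()
        .addInputs("input")
        .addLayer("branchA", new DenseLayer.Builder().nIn(100).nOut(50).build(), "input")
        .addLayer("branchB", new DenseLayer.Builder().nIn(100).nOut(50).build(), "input")
        .addLayer("outA", new OutputLayer.Builder().nIn(50).nOut(10).build(), "branchA")
        .addLayer("outB", new OutputLayer.Builder().nIn(50).nOut(10).build(), "branchB")
        .setOutputs("outA", "outB")
        .build();
ComputationGraph net = new ComputationGraph(conf);
net.init();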
Romeo Kienzler
@romeokienzler
May 02 2016 03:08
Hi, just not giving up... here's some output on Ubuntu 15.10, gcc 4.8, any ideas?

gcc: error trying to exec 'cc1plus': execvp: No such file or directory
CMake Error at nd4j_generated_NativeOps.cu.o.cmake:198 (message):
Error generating
/libnd4j/blasbuild/cuda/blas/CMakeFiles/nd4j.dir/cuda/./nd4j_generated_NativeOps.cu.o

blas/CMakeFiles/nd4j.dir/build.make:66: recipe for target 'blas/CMakeFiles/nd4j.dir/cuda/nd4j_generated_NativeOps.cu.o' failed
make[2]: *** [blas/CMakeFiles/nd4j.dir/cuda/nd4j_generated_NativeOps.cu.o] Error 1
make[2]: Leaving directory '/libnd4j/blasbuild/cuda'
CMakeFiles/Makefile2:108: recipe for target 'blas/CMakeFiles/nd4j.dir/all' failed
make[1]: *** [blas/CMakeFiles/nd4j.dir/all] Error 2
make[1]: Leaving directory '/libnd4j/blasbuild/cuda'
Makefile:78: recipe for target 'all' failed
make: *** [all] Error 2

Adam Gibson
@agibsonccc
May 02 2016 03:09
You need gcc 4.9 I believe
That doesn't make any sense
Everything builds on all 3 devs linux machines
Do you have nvcc etc?
Either way: do ./buildnativeoperations.sh blas cuda release/debug (depending on what you want to run)
linux is where we test the most
this is a bit weird.
@romeokienzler btw are you using this on power boxes?
Romeo Kienzler
@romeokienzler
May 02 2016 03:12
@agibsonccc apt-get install nvcc
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'cuda-core-7-5' instead of 'nvcc'
cuda-core-7-5 is already the newest version.
cuda-core-7-5 set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Adam Gibson
@agibsonccc
May 02 2016 03:13
you've already lost if you're using the ubuntu version
I'd strongly suggest using the .bin from nvidia
Either way - according to the forums, I'm guessing your problem is related to not having g++ installed?
Romeo Kienzler
@romeokienzler
May 02 2016 03:13
@agibsonccc ok, thanks, I'll reinstall and try again - I'm on Docker, any problem with that?
Adam Gibson
@agibsonccc
May 02 2016 03:13
hmm not sure
We haven't really played with it in docker
Romeo Kienzler
@romeokienzler
May 02 2016 03:14
@agibsonccc g++
g++: fatal error: no input files
compilation terminated.
Adam Gibson
@agibsonccc
May 02 2016 03:14
yeah..not sure
Romeo Kienzler
@romeokienzler
May 02 2016 03:14
@agibsonccc ok, I'll try and let you know
Adam Gibson
@agibsonccc
May 02 2016 03:14
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
That's the error
I'd google search that
we'd love feedback
I haven't seen that
Romeo Kienzler
@romeokienzler
May 02 2016 03:14
of course!
Adam Gibson
@agibsonccc
May 02 2016 03:14
thanks
this is helpful
Romeo Kienzler
@romeokienzler
May 02 2016 03:21

ah, the problem was that I was using gcc 4.8 and g++ 5.X; updated the link, and now I get:

/libnd4j/blas/cuda/NativeOps.cu(3312): warning: conversion from a string literal to "char *" is deprecated

Killed
-- Removing /libnd4j/blasbuild/cuda/blas/CMakeFiles/nd4j.dir/cuda/./nd4j_generated_NativeOps.cu.o
/usr/bin/cmake -E remove /libnd4j/blasbuild/cuda/blas/CMakeFiles/nd4j.dir/cuda/./nd4j_generated_NativeOps.cu.o
CMake Error at nd4j_generated_NativeOps.cu.o.cmake:257 (message):
Error generating file
/libnd4j/blasbuild/cuda/blas/CMakeFiles/nd4j.dir/cuda/./nd4j_generated_NativeOps.cu.o

I'll go for the .bin file instead of .deb ok?

Adam Gibson
@agibsonccc
May 02 2016 03:22
That's weird
yeah
try the bin
That usually "just works"
That's weird though
"killed?"
it just had a warning
that's strange
we see that all the time
(granted, warnings aren't good, but they aren't showstoppers)
Romeo Kienzler
@romeokienzler
May 02 2016 03:23
:)
ok, will try the bin and ttyl, thanks!!
Romeo Kienzler
@romeokienzler
May 02 2016 03:44
Ok, same error now with the .run (.bin doesn't exist on nvidia download page) - now also on gcc/g++ 4.9

Killed
-- Removing /libnd4j/blasbuild/cuda/blas/CMakeFiles/nd4j.dir/cuda/./nd4j_generated_NativeOps.cu.o
/usr/bin/cmake -E remove /libnd4j/blasbuild/cuda/blas/CMakeFiles/nd4j.dir/cuda/./nd4j_generated_NativeOps.cu.o
CMake Error at nd4j_generated_NativeOps.cu.o.cmake:257 (message):
Error generating file
/libnd4j/blasbuild/cuda/blas/CMakeFiles/nd4j.dir/cuda/./nd4j_generated_NativeOps.cu.o

blas/CMakeFiles/nd4j.dir/build.make:66: recipe for target 'blas/CMakeFiles/nd4j.dir/cuda/nd4j_generated_NativeOps.cu.o' failed
make[2]: *** [blas/CMakeFiles/nd4j.dir/cuda/nd4j_generated_NativeOps.cu.o] Error 1
make[2]: Leaving directory '/libnd4j/blasbuild/cuda'
CMakeFiles/Makefile2:108: recipe for target 'blas/CMakeFiles/nd4j.dir/all' failed
make[1]: *** [blas/CMakeFiles/nd4j.dir/all] Error 2
make[1]: Leaving directory '/libnd4j/blasbuild/cuda'
Makefile:78: recipe for target 'all' failed
make: *** [all] Error 2

Adam Gibson
@agibsonccc
May 02 2016 03:46
Mind giving the full run log in a gist?
Something is crazy off with your env
Romeo Kienzler
@romeokienzler
May 02 2016 03:48
nope, one sec
mhotrx
@mhotrx
May 02 2016 06:37
I have two questions on importing time series via CSV: 1) Are the rows in chronological order, i.e. is row 1 of the csv file the earliest ts observation and the last row of the csv the latest (most recent) observation? 2) If I have 243 time series of one feature in my feature set, does that mean I should use 243 separate csv files, each of one column? Thanks
Alex Black
@AlexDBlack
May 02 2016 06:38
1) yes, 2) if you are using CSVSequenceRecordReader, that's the format it assumes, so yes
mhotrx
@mhotrx
May 02 2016 06:40
thank you. and if I have five time series features, then would the CSV file have five columns?
Alex Black
@AlexDBlack
May 02 2016 06:41
yep. number of columns equal to number of features, number of rows equal to number of time steps
mhotrx
@mhotrx
May 02 2016 06:42
awesome. Now I can get cracking! BTW Your website is very informative.
Adam Gibson
@agibsonccc
May 02 2016 06:43
@mhotrx from the sounds of it we need to add some more info on how to set up a time series problem though
Mind filing an issue?
Detailing what you found missing would help A LOT
mhotrx
@mhotrx
May 02 2016 06:44
Of course. Doing it now. I'll try to write it so you can copy the text into the section.
Adam Gibson
@agibsonccc
May 02 2016 06:44
thank you sir
very helpful
mhotrx
@mhotrx
May 02 2016 06:49
You're welcome. thank you for the fast response
mhotrx
@mhotrx
May 02 2016 07:04
Just finishing the issue: Is this correct: " For example if you have five features in time series, each with 120 observations, and a training & test set of size 53, then there will be 106 csv files (53 input, 53 labels). The 53 input csv files will each have five columns and 120 rows. The label csv files will have one column (the label) and one row. "
Alex Black
@AlexDBlack
May 02 2016 07:09
yeah, sounds correct, specifically for sequence classification (i.e., many to one, with the class label at the last time step)
@mhotrx thanks
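to make that concrete, reading it in looks roughly like this (a sketch - paths and the class count are made up, and the package names here are the later DataVec ones, they differed in the canova era):

import org.datavec.api.records.reader.SequenceRecordReader;
import org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader;
import org.datavec.api.split.NumberedFileInputSplit;
import org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// 53 examples: input_0.csv .. input_52.csv (5 columns x 120 rows each),
// labels_0.csv .. labels_52.csv (a single value each)
SequenceRecordReader features = new CSVSequenceRecordReader(0, ",");
features.initialize(new NumberedFileInputSplit("/path/to/input_%d.csv", 0, 52));
SequenceRecordReader labels = new CSVSequenceRecordReader(0, ",");
labels.initialize(new NumberedFileInputSplit("/path/to/labels_%d.csv", 0, 52));

// ALIGN_END puts the single label at the last time step (many to one)
DataSetIterator iter = new SequenceRecordReaderDataSetIterator(
        features, labels, 10 /*miniBatch*/, 4 /*numClasses - made up*/, false /*regression*/,
        SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);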
Paul Dubs
@treo
May 02 2016 07:17
@AlexDBlack I'm currently not using CNN's so I'm not naturally running into issues with them
Adam Gibson
@agibsonccc
May 02 2016 07:18
@treo could you maybe just run the Lenet example?
Paul Dubs
@treo
May 02 2016 07:18
@romeokienzler how much free memory do you have when you are building? And are you building from within a docker container? If so, what's the docker file?
Adam Gibson
@agibsonccc
May 02 2016 07:18
It'd be awesome to give that the same treatment as you gave the other stuff
since you're poking around with raver
Paul Dubs
@treo
May 02 2016 07:19
sure, if the lenet example is representative, I can give it a go
Adam Gibson
@agibsonccc
May 02 2016 07:19
it's the main one we're profiling for now yes
It's more just: "Do cnns actually run?"
We mainly just want the low hanging fruit right now
They're representative "enough"
Paul Dubs
@treo
May 02 2016 07:33
ok, filed an issue on it :D Most of the time is spent on .assign calls
Adam Gibson
@agibsonccc
May 02 2016 07:34
Awesome thanks!
When you run the cuda stuff with raver, poke around with those benchmarks as well
kinda like what you've been doing with syntheticrnn
interesting
so lots of duplications in the tensor mmul reshape
wth?
Alex Black
@AlexDBlack
May 02 2016 07:37
@treo thanks
Adam Gibson
@agibsonccc
May 02 2016 07:37
it's weird that that still happens
Alex Black
@AlexDBlack
May 02 2016 07:37
I recall there was an unnecessary assign on reshape happening previously with tensor mmul
thought you fixed that :/
Adam Gibson
@agibsonccc
May 02 2016 07:38
I did too?
there WERE dups in there
Andreas Eberle
@andreas-eberle
May 02 2016 07:40
hey guys, is the cuda build still broken? I thought I read that it is working again...
Paul Dubs
@treo
May 02 2016 07:40
@andreas-eberle should be working
Adam Gibson
@agibsonccc
May 02 2016 07:42
So I'm wondering if we need 'c' in there now for the reshapes?
@AlexDBlack
What were those in there for do you remember?
My gut instinct is those SHOULDN'T need to be there
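(for anyone following along, the difference in question is roughly this - shapes are illustrative:)

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray a = Nd4j.rand(new int[]{2, 3, 4});

// explicit 'c' guarantees row-major layout for the following gemm, but can
// force a copy (an assign) when the array isn't already 'c'-contiguous:
INDArray flatCopy = a.reshape('c', 2, 12);

// a plain reshape can hand back a view (no copy) when strides allow:
INDArray flatView = a.reshape(2, 12);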
Alex Black
@AlexDBlack
May 02 2016 07:43
yeah, that might be what's causing it
Adam Gibson
@agibsonccc
May 02 2016 07:43
GemmParams can handle all the cases now
@treo could you try building it on yours?
Andreas Eberle
@andreas-eberle
May 02 2016 07:43
I just pulled the master and followed the windows.md... (called vcvars64.bat)
Adam Gibson
@agibsonccc
May 02 2016 07:44
You're the windows guru
@andreas-eberle that windows.md is fairly complete..
meh..
Andreas Eberle
@andreas-eberle
May 02 2016 07:44
Furthermore, the log is really long... with tons of warnings...
Paul Dubs
@treo
May 02 2016 07:44
@andreas-eberle ah, you are right, I forgot to push the fix
Adam Gibson
@agibsonccc
May 02 2016 07:44
oh thank god
lol
Andreas Eberle
@andreas-eberle
May 02 2016 07:45
tell me when I can pull it
Adam Gibson
@agibsonccc
May 02 2016 07:45
I was going to say we aren't using the cuda heavily yet aside from benchmarking
raver and I are both on linux
Paul Dubs
@treo
May 02 2016 07:45
now :)
@agibsonccc you keep reintroducing the same build error on windows :P
Adam Gibson
@agibsonccc
May 02 2016 07:46
Which one?
I'm not the one modifying the kernels though ;)
Paul Dubs
@treo
May 02 2016 07:46
error: expression must have a constant value
all of the last windows build errors were that one :)
Andreas Eberle
@andreas-eberle
May 02 2016 07:47
:D at least it's easier to find than an all-new error
Adam Gibson
@agibsonccc
May 02 2016 07:47
wait..
we HAVE to use heap allocation on windows?
uh..huh
@treo do you know what's going on there?
or does T[variableLengthVariable] get compiled down to that with linux?
Why would that not throw an error on linux?
Paul Dubs
@treo
May 02 2016 07:48
@raver119 can probably tell you more, but that is only on cuda; for cpu it works as it did before
Adam Gibson
@agibsonccc
May 02 2016 07:49
Weird
@treo I'm curious - in your benchmarking, have you ever run into concat?
Andreas Eberle
@andreas-eberle
May 02 2016 07:50
@treo: thanks, that fixed it. Still, there are tons of warnings... but I guess you already know that...
Adam Gibson
@agibsonccc
May 02 2016 07:50
Nd4j.concat I mean
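(i.e. joining arrays along a dimension - quick sketch:)

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray arr1 = Nd4j.rand(3, 4);
INDArray arr2 = Nd4j.rand(3, 4);
INDArray joined = Nd4j.concat(0, arr1, arr2);  // stacked along dim 0 -> shape [6, 4]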
@andreas-eberle unless you're going to help us fix them don't worry about it right now :P
Andreas Eberle
@andreas-eberle
May 02 2016 07:51
ok, I would like to help you, but my C is tooo rusty...
Adam Gibson
@agibsonccc
May 02 2016 07:51
A lot of it is actually nvidia's code
string literals
Andreas Eberle
@andreas-eberle
May 02 2016 07:51
k
Adam Gibson
@agibsonccc
May 02 2016 07:52
Mainly just need to do some cleanup ;/
Paul Dubs
@treo
May 02 2016 07:52
don't remember running into concat
https://gist.github.com/treo/e447c8ab881c0dfa82415477f81dd90d I've updated my microbenchmark results
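(the benchmarks in that gist are JMH-based; the pattern, stripped down - class name and sizes here are illustrative:)

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class AssignBenchmark {
    @Param({"65536", "16777216"})  // 2^16 and 2^24 elements
    int length;

    INDArray src, dst;

    @Setup
    public void setup() {
        src = Nd4j.rand(1, length);
        dst = Nd4j.zeros(1, length);
    }

    @Benchmark
    public INDArray assign() {
        return dst.assign(src);  // the hotspot showing up in the profiles
    }
}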
Adam Gibson
@agibsonccc
May 02 2016 07:54
oh thanks
Romeo Kienzler
@romeokienzler
May 02 2016 07:56
@treo It works now (I think), nothing changed, only my laptop and I slept for 5h... weird, can you please have a look and confirm that it worked? https://gist.github.com/e53a8192e0f84e45cebdc4df64940653
@agibsonccc
Andreas Eberle
@andreas-eberle
May 02 2016 07:56
is the GPU memory used by libnd4j still limited to a certain value?
Paul Dubs
@treo
May 02 2016 07:57
@romeokienzler the "Killed" in your log looks pretty much like the OOM killer got to it, so I guess something also freed up some ram
Romeo Kienzler
@romeokienzler
May 02 2016 07:58
@treo cool, so this means I have working documentation now for Ubuntu 15.10 native + docker; will update the documentation in git
@treo but this one still intimidates me...
-- Found OpenMP: -fopenmp
-- Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY)
Paul Dubs
@treo
May 02 2016 07:59
That pretty much looks like you are missing environment variables
Romeo Kienzler
@romeokienzler
May 02 2016 08:00
@treo ok, will check, thanks a lot!
Adam Gibson
@agibsonccc
May 02 2016 08:02
@romeokienzler I would suggest taking a look at the CMakeLists.txt under the root cmake directory
There is a FindCUDA.cmake there
That will have the env variables you need
Paul Dubs
@treo
May 02 2016 08:13
@agibsonccc the new micro benchmark results show that CPU has gotten slower compared to the previous rc3.9 benchmarks
Adam Gibson
@agibsonccc
May 02 2016 08:14
Not sure what to tell you
I'll find the problems later
It actually works
That's my first priority
I'm working on concatenation first
I'm not ready to do a cpu perf pass yet
Paul Dubs
@treo
May 02 2016 08:15
:) I know, just sharing my observations - also that RC3.8 is holding up pretty well
Adam Gibson
@agibsonccc
May 02 2016 08:15
That's why I told you to just give me a burn down list later
One thing I hate hearing about is "in general"
What exactly?
Even a vague summary is useful info
Like blas ops?
Transforms?
Reductions?
Paul Dubs
@treo
May 02 2016 08:20
@raver119 I also added the cuda results to https://gist.github.com/treo/e447c8ab881c0dfa82415477f81dd90d - it is now within 2x of current CPU for medium arrays (2^16 elements), and within 1.25x for large arrays (2^28 elements). The small sizes are as expected
Adam Gibson
@agibsonccc
May 02 2016 08:20
Is that with sync on?
Paul Dubs
@treo
May 02 2016 08:20
oh right!
Adam Gibson
@agibsonccc
May 02 2016 08:20
Yeah..
Going to say
Alex Black
@AlexDBlack
May 02 2016 08:21
@treo also: any chance of getting that info in a shared google doc (spreadsheet)? might make it easier to review/compare
Adam Gibson
@agibsonccc
May 02 2016 08:22
Yeah there we go
Paul Dubs
@treo
May 02 2016 08:22
yeah, will have to take a look at how to export that into csv or something
Andreas Eberle
@andreas-eberle
May 02 2016 08:32
@treo: @raver119: With the current master, I don't have to call GLProfile.initSingleton(); any more. Cuda works without any JOGL hack
Did you change something regarding this or was it just a nice side effect of a change?
Paul Dubs
@treo
May 02 2016 08:37
probably just a side effect, or maybe you did get a driver update? :D
@AlexDBlack I'll recollect the data in the evening, with some longer runs
Adam Gibson
@agibsonccc
May 02 2016 08:51
@andreas-eberle are you using jogl as well?
Andreas Eberle
@andreas-eberle
May 02 2016 08:57
No, I didn't use jogl, but when I tried to use CUDA, Optimus didn't give my java application access to the GPU. A dirty fix for that was to use GLProfile.initSingleton()
that caused Optimus to give my Java application access to the real GPU
However, now I don't need that dirty fix any more.
:+1:
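(for reference, the whole hack was a single call before touching CUDA - the import path varies by JOGL version:)

import com.jogamp.opengl.GLProfile;  // javax.media.opengl.GLProfile in older JOGL

// side effect: Optimus spins up the discrete GPU for this process,
// so the subsequent CUDA init sees the real device
GLProfile.initSingleton();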
Adam Gibson
@agibsonccc
May 02 2016 08:58
Not sure why that was ever a problem o_0
Alex Black
@AlexDBlack
May 02 2016 09:00
@treo thanks. having columns with this data side by side would be great
Andreas Eberle
@andreas-eberle
May 02 2016 09:01
@treo: Driver version is still the same (just checked)
Andreas Eberle
@andreas-eberle
May 02 2016 09:17
@treo @raver119: Too bad, the cuda stuff is only working in the examples project... not my own... I will compare the other dependencies and test if they somehow enable cuda...
Paul Dubs
@treo
May 02 2016 09:18
@agibsonccc raver and I also couldn't find any reason why that should ever be a problem; raver even tried to figure it out using a remote session, and in the end only the jogl init helped
Adam Gibson
@agibsonccc
May 02 2016 09:22
Shouldn't that be logged somewhere?
Paul Dubs
@treo
May 02 2016 09:32
it is
Valerio Zamboni
@vzamboni
May 02 2016 09:34
I'm still getting a NullPointerException at (SparkDl4jMultiLayer.java:453): aggregator.getUpdater() when doing sparkModel.fitDataSet(sparkDataTrain). I just compiled the latest master but no way to get rid of this. Any ideas?
Alex Black
@AlexDBlack
May 02 2016 09:35
@vzamboni there was an issue a long time ago with that, but I thought it was pretty solid now
open an issue with something I can run to reproduce that
Alex Black
@AlexDBlack
May 02 2016 11:03
just pushed up some more LSTM optimizations, stuff I missed in the last pass. That gives us another 30% or so reduction
Andreas Eberle
@andreas-eberle
May 02 2016 15:02
@treo: Is it bad to get a lot of NaN scores during the first epoch(s)? (I didn't get further yet, so I can't tell if it will be the same in the second epoch)
When I ran the LenetMnistExample with nd4j-native, it did not show any NaNs... with cuda it only gets NaNs after iteration 53
Paul Dubs
@treo
May 02 2016 15:36
NaN scores are never good, and with the same random seed you should see about the same scores
Andreas Eberle
@andreas-eberle
May 02 2016 15:37
I get different scores between cpu and cuda with the same seed and the same input data order.
Paul Dubs
@treo
May 02 2016 15:40
Then it may be necessary to run some gradient checks on cuda
Andreas Eberle
@andreas-eberle
May 02 2016 15:42
what should I do?
Paul Dubs
@treo
May 02 2016 15:45
I don't really know how to run gradient checks myself
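(for the record: DL4J has a GradientCheckUtil that its own tests use; a rough sketch - exact overloads vary by version, and net, input, and labels are assumed to be in scope, with the network built in double precision:)

import org.deeplearning4j.gradientcheck.GradientCheckUtil;

// numerically compares the analytic gradients against finite differences
boolean ok = GradientCheckUtil.checkGradients(
        net,     // MultiLayerNetwork under test
        1e-6,    // epsilon for the finite-difference approximation
        1e-3,    // max relative error allowed per parameter
        1e-8,    // min absolute error, to ignore tiny gradients
        true,    // print results
        false,   // don't stop at the first failure
        input, labels);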