These are chat archives for deeplearning4j/deeplearning4j/earlyadopters

13th May 2016
Justin Long
@crockpotveggies
May 13 2016 00:24
@Habitats finally got the cluster working
had to use http://weave.works/
I can see Spark jobs being deployed across the cluster
Samuel Audet
@saudet
May 13 2016 00:26
@crockpotveggies @agibsonccc About dynamic linking with MKL, we need to figure out how to make libnd4j link with libblas.so.3 on Linux and libblas3.dll on Windows. Then we can use the same trick as netlib-java, that is rename MKL's library to that name and put it somewhere in java.library.path, and JavaCPP will pick it up before the bundled OpenBLAS libraries. But right now libnd4j links hard with libopenblas.so.0 and libopenblas.dll...
Justin Long
@crockpotveggies
May 13 2016 00:27
when you say it links hard, does that mean I need to change my cluster configuration? @saudet
right now I'm building a fatjar for linux locally, then deploying via spark-submit
Patrick Skjennum
@Habitats
May 13 2016 00:28
@crockpotveggies awesome! how's the performance?
Samuel Audet
@saudet
May 13 2016 00:28
No, I mean it just links with libopenblas.so.0 instead of libblas.so.3. Actually, if we renamed MKL's library to libopenblas.so.0 it might work
Justin Long
@crockpotveggies
May 13 2016 00:28
ah I see
@Habitats looks like YARN isn't distributing the tasks across the entire cluster
@Habitats so I'm going to see if I can change that in the yarn-site.xml conf
anyone know any Hadoop experts?
Patrick Skjennum
@Habitats
May 13 2016 00:30
type$afe does
Justin Long
@crockpotveggies
May 13 2016 00:31
actually a buddy of mine I'm having lunch with tomorrow works at Lightbend (previously Typesafe)
though he is more versed in Akka than Hadoop
Patrick Skjennum
@Habitats
May 13 2016 00:31
:P
well good luck with that
i'm trying to understand how to tune the net.fitDataSet(rdd, examples, partitions) method
some configs absolutely mutilate my ram
graph is kind of funny though
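A minimal sketch of the call being tuned here, assuming the SparkDl4jMultiLayer wrapper and the four-argument fitDataSet overload of this DL4J era; the batch and partition values are illustrative assumptions, not recommendations:

import org.apache.spark.api.java.{JavaRDD, JavaSparkContext}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer
import org.nd4j.linalg.dataset.DataSet

def fitOnCluster(sc: JavaSparkContext, net: MultiLayerNetwork, rdd: JavaRDD[DataSet],
                 numExecutors: Int, totalExamples: Int): Unit = {
  val sparkNet = new SparkDl4jMultiLayer(sc, net)
  // each averaging round pulls examplesPerFit examples onto the workers, fits
  // locally, then averages parameters; oversized rounds are what eat the RAM
  val examplesPerFit = 32 * numExecutors // illustrative assumption
  val numPartitions = numExecutors * 4   // a few partitions per executor
  sparkNet.fitDataSet(rdd, examplesPerFit, totalExamples, numPartitions)
}

Keeping examplesPerFit a small multiple of the executor count is one way to bound the per-round memory footprint.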
Justin Long
@crockpotveggies
May 13 2016 00:38
Screen Shot 2016-05-12 at 5.38.33 PM.png
if you want to trade and figure out why only 3/5 nodes are being utilized I'll do it gladly
actually I set the number of partitions to 100000 and it finished very quickly
hmmm
Justin Long
@crockpotveggies
May 13 2016 00:46

figured it out:

<configuration>

  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
    <description>
      Maximum percent of resources in the cluster which can be used to run 
      application masters i.e. controls number of concurrent running
      applications.
    </description>
  </property>

</configuration>

that 0.1 needs to be 1

capacity-scheduler.xml
Patrick Skjennum
@Habitats
May 13 2016 00:47
aha!"
100000 partitions!?
what kind of cluster is this lol
Justin Long
@crockpotveggies
May 13 2016 00:48
hahaha
just SparkPi example so I think it's OK?
probably all consist of nothingness
Patrick Skjennum
@Habitats
May 13 2016 00:49
wait i'm confused
Adam Gibson
@agibsonccc
May 13 2016 00:49
@crockpotveggies you have your executors setup with libnd4j?
Patrick Skjennum
@Habitats
May 13 2016 00:49
i'm talking about spark training
Justin Long
@crockpotveggies
May 13 2016 00:49
I'm using the SparkPi (calculates Pi) example to tune the cluster
@agibsonccc I haven't installed libnd4j on any nodes
Patrick Skjennum
@Habitats
May 13 2016 00:50
you don't have to afaik
as long as openblas is installed just passing the fat jar around should work
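A minimal build.sbt sketch of such a fat jar, assuming sbt-assembly is available; the artifact names match this era of the stack and the versions are illustrative:

name := "dl4j-spark-job"

// Spark itself is provided by the cluster; everything else rides in the fat jar
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "org.deeplearning4j" % "dl4j-spark" % "0.4-rc3.9-SNAPSHOT",
  "org.nd4j" % "nd4j-native" % "0.4-rc3.9-SNAPSHOT"
)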
Adam Gibson
@agibsonccc
May 13 2016 00:50
Shouldn't need to afaik
It'd be openblas
Justin Long
@crockpotveggies
May 13 2016 00:51
yea openblas is installed on all nodes
I just got the open source license for MKL so will redeploy all my docker containers once I'm convinced the parameters are tuned
Adam Gibson
@agibsonccc
May 13 2016 00:51
Interesting
Justin Long
@crockpotveggies
May 13 2016 00:52
I might throw ImageNet at this thing before I start processing my own stuff...I'll have more baseline comparisons
there's some strange issues with YARN, for some reason it doesn't discover all nodes in the cluster
Adam Gibson
@agibsonccc
May 13 2016 00:56
We also have imagenet in an s3 bucket
Can't redistro though
:/
I feel like you are re replicating stuff we have. I appreciate the help this is great
I feel bad we can't help in some way
Justin Long
@crockpotveggies
May 13 2016 00:57
did you use a virtual network like weave?
Adam Gibson
@agibsonccc
May 13 2016 00:58
@nyghtowl did it
Justin Long
@crockpotveggies
May 13 2016 00:58
I found it fixed all my strange port problems. made it easier. although you mentioned kubernetes so I think that uses weave or a similar virtual net
Adam Gibson
@agibsonccc
May 13 2016 00:58
I think we used a normal external
Oh for ports?
Justin Long
@crockpotveggies
May 13 2016 00:58
yea I had a bunch of issues where YARN was trying to contact nodes on ephemeral ports
Adam Gibson
@agibsonccc
May 13 2016 00:58
Ah
Justin Long
@crockpotveggies
May 13 2016 00:59
but because I was trying to publish those ports using -p it sent Docker into a thread lock (more than 20,000 ports)
Adam Gibson
@agibsonccc
May 13 2016 00:59
We'd love a write-up of some of the problems you faced
Justin Long
@crockpotveggies
May 13 2016 00:59
so instead I just used EXPOSE 35000-49000 and put it in a virtual network
Adam Gibson
@agibsonccc
May 13 2016 00:59
I wasn't sure how docker interfaced with yarn
Justin Long
@crockpotveggies
May 13 2016 00:59
yea I've endured a ton of pain
happy to share
Adam Gibson
@agibsonccc
May 13 2016 00:59
:smile:
Justin Long
@crockpotveggies
May 13 2016 01:00
also why I created the docker image https://hub.docker.com/r/bernieai/docker-spark/
Adam Gibson
@agibsonccc
May 13 2016 01:00
Right
Justin Long
@crockpotveggies
May 13 2016 01:00
so it looks like this cluster will be able to handle itself no problem, I threw 1,000,000 slices at it this time and so far: 382256/1000000
performance aint too shabby
and that's on just a few nodes
Patrick Skjennum
@Habitats
May 13 2016 01:03
wait @crockpotveggies you made tinderbox? haha
Justin Long
@crockpotveggies
May 13 2016 01:03
hahaha yea that was me :santa:
Patrick Skjennum
@Habitats
May 13 2016 01:03
dear god
so you're the one responsible for all the tinderspam:D
Justin Long
@crockpotveggies
May 13 2016 01:04
HA
actually I did catch an Italian company use tinderbox for spammy reasons
I tried to shut it down but it looks like they went and evolved and developed something smarter
Patrick Skjennum
@Habitats
May 13 2016 01:05
yeah i didn't even know it was a thing until today, when some "hot model" msg me and tried to get me to do weird shit
Justin Long
@crockpotveggies
May 13 2016 01:06
we're developing filters for that in Bernie http://www.bernie.ai/
Patrick Skjennum
@Habitats
May 13 2016 01:07
quick google search brought up tech crunch
Adam Gibson
@agibsonccc
May 13 2016 01:07
Lol
Patrick Skjennum
@Habitats
May 13 2016 01:07
haha this is awesome
bet this was exactly what adam had in mind when making dl4j;p tinderbots.
the future is now
Justin Long
@crockpotveggies
May 13 2016 01:12
in all honesty Bernie found my current girlfriend so thus far the future is okay for me
Adam Gibson
@agibsonccc
May 13 2016 01:12
Lol
Patrick Skjennum
@Habitats
May 13 2016 01:13
that's awesome:D here's to me hopin' for the same
Justin Long
@crockpotveggies
May 13 2016 01:13
the end goal is to make him deeply understand photos (what does a selfie mean? narcissistic? extroverted?) so that the people he finds fit your personality
Adam Gibson
@agibsonccc
May 13 2016 01:13
Huh
Patrick Skjennum
@Habitats
May 13 2016 01:14
bernie or adam?:P
but yeah, that's a brilliant idea though. research on this kind of stuff.
Adam Gibson
@agibsonccc
May 13 2016 01:18
I'm an ai
I'm actually just a sarcastic chatbot that pushes code
Cv would be too hard
Patrick Skjennum
@Habitats
May 13 2016 01:19
that wouldn't even surprise me at this point. you whole bunch are crazy:D
Adam Gibson
@agibsonccc
May 13 2016 01:19
Lol
Patrick Skjennum
@Habitats
May 13 2016 01:19
@crockpotveggies bernie sounds ridiculously creepy *hits signup*
Adam Gibson
@agibsonccc
May 13 2016 01:20
Yeah
Justin Long
@crockpotveggies
May 13 2016 01:20
ha he does have a way of pulling you in :fire:
Patrick Skjennum
@Habitats
May 13 2016 01:21
literally reducing your future to regression
but hey, how's that really different from any other future anyway
Adam Gibson
@agibsonccc
May 13 2016 01:21
Invent the inevitable
You profit rather than watch
Patrick Skjennum
@Habitats
May 13 2016 01:22
yeah, musk may build as many hyperloops as he wants; this is where the real money's at
this is going to make great stories for my grand kids
Justin Long
@crockpotveggies
May 13 2016 01:24
personally I was hoping to spend this generation exploring the stars but I think we're going to have to settle for A.I.
I actually wanted to eventually start using either deep learning or genetic algorithms to explore resolving General Relativity with Quantum Mechanics
Adam Gibson
@agibsonccc
May 13 2016 01:28
Yeah
Justin Long
@crockpotveggies
May 13 2016 01:28
There's a lot of interesting stuff such as negative energy (which now has data available in QM) that I think will start providing enough data to resolve via programmatic methods
the other really cool thing is that gravitational wave production on a tabletop is now theoretically possible, so you could set up experiments that could generate interesting data that's usable for it
I was trying to finish a paper on the topic but Bernie became priority, it's still a draft https://www.authorea.com/users/58765/articles/77812/_show_article
Patrick Skjennum
@Habitats
May 13 2016 01:36
this is so over my head i don't even know where to begin
Justin Long
@crockpotveggies
May 13 2016 01:38
yea sorry I kind of verbal diarrhea'd
are you aware of Einstein's General Relativity and the other physics theory Quantum Mechanics?
Patrick Skjennum
@Habitats
May 13 2016 01:39
i know nothing of quantum mechanics beyond the grand design
Justin Long
@crockpotveggies
May 13 2016 01:40
ha fair enough
Patrick Skjennum
@Habitats
May 13 2016 01:41
yeah, i'm by no means a physicist:p
Justin Long
@crockpotveggies
May 13 2016 01:42
long story short, two big physics theories which explain so much of how the "everything" operates, but they can't be resolved. there's a huge effort to unify them together but physicists are struggling with it
it's mostly because the idea of gravity in Einstein's doesn't make sense in Quantum and vice-versa
Patrick Skjennum
@Habitats
May 13 2016 01:43
ah yeah you meant that
yeah i'm aware of that issue, however nothing in depth
Justin Long
@crockpotveggies
May 13 2016 01:45
yea the idea of my paper was to set a precedent of gravitational wave experiments that are relatively cheap, so the research data becomes accessible
because once you resolve the two theories you very easily open up advanced propulsion AKA warp drive type stuff
Patrick Skjennum
@Habitats
May 13 2016 01:46
oh
Justin Long
@crockpotveggies
May 13 2016 01:47
in other news this spark cluster is nearly done processing 1,000,000 partitions which is exciting
Patrick Skjennum
@Habitats
May 13 2016 01:49
yeah, this does sound pretty neat i agree!
know what else is neat, i just got spark training running on google:D!
(until i crashed at least)
Justin Long
@crockpotveggies
May 13 2016 01:51
NICE
you know what crashed it?
Patrick Skjennum
@Habitats
May 13 2016 01:51
yeah, my horrible code
i'm trying to save to local file system with a spark worker
go figure:P
i've just been using spark standalone for so long that i kind of disregarded the whole clustered environment part
Justin Long
@crockpotveggies
May 13 2016 01:55
you don't have something like HDFS on it?
Patrick Skjennum
@Habitats
May 13 2016 01:55
sure i do, but i haven't used it up until now
also my dataset is like 40gb so it's not superfast to load from hdfs
Justin Long
@crockpotveggies
May 13 2016 01:56
same here, I realized once I went into cluster mode I'd have to use HDFS
Patrick Skjennum
@Habitats
May 13 2016 01:56
(huge by my standards:P)
i need to pull that stuff into ram before my training can even begin
but yeah, i need to figure out how to store and log stuff
Justin Long
@crockpotveggies
May 13 2016 01:57
are you just serializing a model somewhere?
Patrick Skjennum
@Habitats
May 13 2016 01:58
well atm i'm just storing the dl4j models
with standard java.io
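A sketch of that kind of plain java.io save, assuming no dedicated serializer is used: Nd4j.write for the parameters plus the JSON configuration beside them (the helper name is hypothetical):

import java.io.{DataOutputStream, FileOutputStream, PrintWriter}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.factory.Nd4j

def saveModel(net: MultiLayerNetwork, path: String): Unit = {
  // parameters as a raw ND4J binary, the architecture as JSON next to it
  val dos = new DataOutputStream(new FileOutputStream(path + ".params.bin"))
  try Nd4j.write(net.params(), dos) finally dos.close()
  val pw = new PrintWriter(path + ".conf.json")
  try pw.write(net.getLayerWiseConfigurations.toJson) finally pw.close()
}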
Justin Long
@crockpotveggies
May 13 2016 01:58
regarding logs, if you use YARN it handles that for you
I'll show you....
Screen Shot 2016-05-12 at 6.59.03 PM.png
Patrick Skjennum
@Habitats
May 13 2016 02:00
yeah i have no direct interaction with yarn or zookeeper or anything
all that's behind the scenes on gcloud
maybe i'll just write up a single REST endpoint for my stupid logging and saving:P
would be nice to use the same code on both my standalone and cluster
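A minimal sketch of that endpoint idea, using only the JDK's built-in HTTP server so the same code runs standalone and on the cluster; the port and file name are hypothetical:

import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import java.io.FileOutputStream
import java.net.InetSocketAddress

object ModelSink extends App {
  val server = HttpServer.create(new InetSocketAddress(8090), 0)
  // workers POST a serialized model here; the body is written to disk verbatim
  server.createContext("/model", new HttpHandler {
    override def handle(ex: HttpExchange): Unit = {
      val out = new FileOutputStream("saved-model.bin")
      val buf = new Array[Byte](8192)
      var n = ex.getRequestBody.read(buf)
      while (n != -1) { out.write(buf, 0, n); n = ex.getRequestBody.read(buf) }
      out.close()
      ex.sendResponseHeaders(200, -1) // -1: no response body
      ex.close()
    }
  })
  server.start()
}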
Justin Long
@crockpotveggies
May 13 2016 02:05
that could work well, just thinking you may not have to re-invent the wheel on this one
Patrick Skjennum
@Habitats
May 13 2016 02:06
ah wait it was working all along, i was just being a moron
also it's 4am maybe that has something to do with it
SPARK+DL4J=WORKING
god damn didn't expect this
haha
@crockpotveggies yeah but i feel like re-inventing the wheel is a lot easier than actually understanding all of these crazy interactions:p
Justin Long
@crockpotveggies
May 13 2016 02:07
:thumbsup:
so true
Patrick Skjennum
@Habitats
May 13 2016 02:07
seeing as i have to deliver my thesis in a month and just want things to work
Justin Long
@crockpotveggies
May 13 2016 02:07
I love having "my own" interface
mostly because it always has what I need and I can only blame myself
on that note Weave is a hog on this spark cluster
Patrick Skjennum
@Habitats
May 13 2016 02:08
yeah, what you need is key here
logging is working and everything:D
the stars aligned
Justin Long
@crockpotveggies
May 13 2016 02:10
tomorrow I'll be throwing neural nets on this cluster
Patrick Skjennum
@Habitats
May 13 2016 02:10
sounds good!
Justin Long
@crockpotveggies
May 13 2016 02:10
looking forward to (finally) getting it going at full speed
alright done for the day, chat later :)
Patrick Skjennum
@Habitats
May 13 2016 02:10
same, later dude:>
Justin Long
@crockpotveggies
May 13 2016 02:53
P.s. Heard from a Spark guy that apparently we have to set as few executors as possible to maximize resources
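In SparkConf terms that advice amounts to fewer, fatter executors; a sketch, with purely illustrative values:

import org.apache.spark.SparkConf

// one executor per node, handing each all the cores and RAM it can get
val conf = new SparkConf()
  .set("spark.executor.instances", "5")
  .set("spark.executor.cores", "16")
  .set("spark.executor.memory", "24g")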
Valerio Zamboni
@vzamboni
May 13 2016 07:56
Hello, I am trying to use javacv-1.1 as a maven dependency in the same project as nd4j 0.4-rc3.9-SNAPSHOT, but I can see that it already depends on javacpp-1.2-SNAPSHOT, so I have incompatibility problems using javacv-1.1. Do you think there's a way to downgrade to javacpp-1.1? I tried and got some incompatibility problems in nd4j. Is the only way to compile javacv-1.2-SNAPSHOT (which depends on javacpp-1.2-SNAPSHOT) myself?
Samuel Audet
@saudet
May 13 2016 07:58
@vzamboni Yes, you'll have to build it, but we should be making a release next week, so you could also wait a little bit :)
Valerio Zamboni
@vzamboni
May 13 2016 07:59
Oh that's great! :)
Paul Dubs
@treo
May 13 2016 08:51
http://www.overclock3d.net/articles/gpu_displays/gtx_1080_3dmark_performance_leaked/1 that isn't compute performance, but looks pretty interesting nevertheless
raver119
@raver119
May 13 2016 09:36
hm
that’s a clear 2x boost for the 970 -> 1080 shift
even for overclocked gpu that’s awesome
Paul Dubs
@treo
May 13 2016 09:39
How's the performance difference between the 970 under linux and on windows?
raver119
@raver119
May 13 2016 09:39
no idea
:)
btw
right now it works for me, without TDR disabled
Paul Dubs
@treo
May 13 2016 09:40
oh, that's great!
raver119
@raver119
May 13 2016 09:40
yep
Paul Dubs
@treo
May 13 2016 09:40
less people asking why their driver is crashing :D
raver119
@raver119
May 13 2016 09:40
but damn
i cant decrease memory use
whatever i do - memory footprint stays the same
ridiculous
Paul Dubs
@treo
May 13 2016 09:41
that seems odd...
raver119
@raver119
May 13 2016 09:41
odd??
Paul Dubs
@treo
May 13 2016 09:41
weird :)
raver119
@raver119
May 13 2016 09:42
i’ve compressed test kernel twice!
reduce kernel got split
and still 54 registers, 896 bytes of shared
but i personally use 7 (seven) static variables in shared memory
if i remove 3 of them
Paul Dubs
@treo
May 13 2016 09:42
inefficient compilation?
raver119
@raver119
May 13 2016 09:42
guess what changes?
nothing!
Paul Dubs
@treo
May 13 2016 09:42
nothing
raver119
@raver119
May 13 2016 09:43
still 54 registers, and 896 bytes static shared memory
Paul Dubs
@treo
May 13 2016 09:44
maybe a slowdown loop introduced by nvidia? :P
raver119
@raver119
May 13 2016 09:44
whoever introduces that
something should use that shared memory
and 896 bytes is something i can’t afford
cmon, that’s almost 1 kb
Patrick Skjennum
@Habitats
May 13 2016 09:46
@crockpotveggies define "few"?
btw does anyone have experience with spark training? i'm getting extremely bad results, however it's working
Paul Dubs
@treo
May 13 2016 09:47
I'm going afk for a bit
raver119
@raver119
May 13 2016 09:47
@Habitats increase LR.
Patrick Skjennum
@Habitats
May 13 2016 09:48
yeah?
i've used 0.05 with my normal net
tried 0.005, 0.05, 0.5, 1 with spark
0.5 and 1 couldn't learn anything and just assumed everything was false
raver119
@raver119
May 13 2016 09:49
try 0.1
Patrick Skjennum
@Habitats
May 13 2016 09:50
what about minibatch and the numexamples thingy?
i'm using minibatch size * numexecutors atm
saw that somewhere in the examples
raver119
@raver119
May 13 2016 09:51
better ask that in tuninghelp. i know that LR should be higher in a spark env
maybe there’s something else
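For reference, the knob in question as it appears in the net config; a sketch, with 0.1 simply being the value suggested above:

import org.deeplearning4j.nn.conf.NeuralNetConfiguration

// parameter averaging dampens each update, so the Spark learning rate
// sits above the 0.05 that worked single-node
val builder = new NeuralNetConfiguration.Builder()
  .learningRate(0.1)
// ... rest of the configuration unchanged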
Patrick Skjennum
@Habitats
May 13 2016 09:52
yeah i wasn't sure if the spark i'm using applies to older ver
alright
Adam Gibson
@agibsonccc
May 13 2016 10:17
@Habitats yes parameter averaging is the same no matter what :D
Paul Dubs
@treo
May 13 2016 12:08
@raver119 have you found the culprit for the memory usage?
raver119
@raver119
May 13 2016 12:09
check my reports in my cuda issue
spoiler: everything is bad, and everyone will die
like in GoT
Adam Gibson
@agibsonccc
May 13 2016 12:10
Hahaha
Paul Dubs
@treo
May 13 2016 12:10
in between that there is some good story telling :P
Adam Gibson
@agibsonccc
May 13 2016 12:10
How do you even begin to diagnose that?
Paul Dubs
@treo
May 13 2016 12:11
probably by Rubber-hose cryptanalysis with an nvidia engineer
raver119
@raver119
May 13 2016 12:14
main problem is time right now
i don't see any chance to rewrite multiple required things in next 2 days
Paul Dubs
@treo
May 13 2016 12:15
what would need to be rewritten?
raver119
@raver119
May 13 2016 12:16
compare those two ptxas outputs
bounded and unbounded
answer is diff between them
ptxas info : Function properties for _ZN5shape3TAD7tad2SubEiPv
80 bytes stack frame, 108 bytes spill stores, 244 bytes spill loads
Paul Dubs
@treo
May 13 2016 12:17
unbounded has less in spill stores
raver119
@raver119
May 13 2016 12:17
yes
because compiler uses registers
to cache intermediate stuff
but more registers used = less parallel threads running on the same sm
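A worked example of that tradeoff, assuming a Maxwell-class SM like the 970's (65,536 registers and at most 2,048 resident threads per SM): 65536 / 54 registers per thread ≈ 1213 threads, well under the 2,048 ceiling, so occupancy is capped near 59% before shared memory is even counted.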
Paul Dubs
@treo
May 13 2016 12:18
oh, I see
raver119
@raver119
May 13 2016 12:18
there's nothing uncommon in this situation
everyone out there passes through it
first you write code
and only when it works - you optimize
and optimization goes iteratively
so, right now i'm at the point, where best result will be retrieved from general code optimization
i'm looking on reduceFloat1D, and i don't see anything i really can improve right now
it's primitive like a stone lol
Paul Dubs
@treo
May 13 2016 12:21
it's not pushed yet? Want to see the stone :D
raver119
@raver119
May 13 2016 12:21
yea, i don't want to commit it because i'm not sure it works better than the previous commit
at previous commit i was able to get 1.3s
and that kinda surprises me
there's the stone
reduce kernel was split into 4 parts
dedicated reduceScalar, dedicated reduce along 1 dimension, along multiple dimensions below 6, and all other along multiple dimensions
however, that 6 will probably be reduced to 3
Adam Gibson
@agibsonccc
May 13 2016 12:25
So where do you think this is going to end?
Faster than mkl overall?
raver119
@raver119
May 13 2016 12:25
it's already faster
Adam Gibson
@agibsonccc
May 13 2016 12:25
or more work ahead?
raver119
@raver119
May 13 2016 12:25
15-17%
at least for rnn i'm testing with
Adam Gibson
@agibsonccc
May 13 2016 12:25
is every part of this faster?
raver119
@raver119
May 13 2016 12:26
what do you mean by "part"? i'm just measuring time for full rnn iteration
Adam Gibson
@agibsonccc
May 13 2016 12:26
each of the ops?
I guess is the way to put this?
raver119
@raver119
May 13 2016 12:26
no, i'm measuring real time required for nn training
Adam Gibson
@agibsonccc
May 13 2016 12:26
ah
raver119
@raver119
May 13 2016 12:27
so, cuda backend is already faster than mkl
but i'm still NOT happy with performance
Adam Gibson
@agibsonccc
May 13 2016 12:27
right
well I don't think you will ever be happy :P
raver119
@raver119
May 13 2016 12:27
i will
Adam Gibson
@agibsonccc
May 13 2016 12:27
keep going I"m not complaining
raver119
@raver119
May 13 2016 12:27
i even know how i'll get there
Adam Gibson
@agibsonccc
May 13 2016 12:27
:D
raver119
@raver119
May 13 2016 12:28
its all there, in ptxas output
if i need to rewrite some code to reduce spills - that's not a big deal. only a matter of time
also, alex is still working on cnn improvements, so i haven't yet checked cnn perf there
but with proper code on java side - cnn should show better speed margins
obviously, UnifiedSharedMemory sucks, it was like an ad-hoc fix... time to make it inlined
i'm 99% sure some spills are coming out from ABI
raver119
@raver119
May 13 2016 12:39
tbh, i know that being a perfectionist is bad. but i can't imagine how to compete against other libs in speed without being a perfectionist :)
in cuda every single byte matters
Alex Black
@AlexDBlack
May 13 2016 12:40
right. also keep in mind we'll be doing releases a lot more frequently after this next one
raver119
@raver119
May 13 2016 12:40
hope so :)
3.8 -> 3.9 was a looooong road
Alex Black
@AlexDBlack
May 13 2016 12:40
yeah, way longer than anyone expected, but that's planning fallacy for you :)
raver119
@raver119
May 13 2016 12:41
well, if you wanna make god laugh - tell him about your plans
100% guarantee
Justin Long
@crockpotveggies
May 13 2016 14:31
Regarding spark optimization I posted this question here and got this response, though originally I think the responder thought I was running a production system (not a training system) @Habitats http://stackoverflow.com/questions/37199791/force-yarn-to-deploy-spark-tasks-across-all-slaves/37199950?noredirect=1#comment61932475_37199950
@raver119 a long road but worthy one. I picked up on 3.8 back in October (?) and immediately saw the value
Adam Gibson
@agibsonccc
May 13 2016 14:33
:D
Patrick Skjennum
@Habitats
May 13 2016 15:01
@crockpotveggies yeah it makes sense though
@crockpotveggies after spending the entire day yelling at spark i think i'll just stick to single node training though:| i don't have time to tune another 10 hyperparameters
Justin Long
@crockpotveggies
May 13 2016 15:04
@Habitats perhaps you can just launch a beefy EC2 instance with GPU support?
Patrick Skjennum
@Habitats
May 13 2016 15:05
i've only been given a google cloud account
also cuda isn't really worth it atm?
Justin Long
@crockpotveggies
May 13 2016 15:06
They have something of equal value?
Patrick Skjennum
@Habitats
May 13 2016 15:06
maybe on their ml platform, but that was launched after i picked my frameworks
i don't have time to do fundamental changes atm
Justin Long
@crockpotveggies
May 13 2016 15:07
I get that but instead of toasting your laptop for 45 days why not just launch a 16 core instance?
Patrick Skjennum
@Habitats
May 13 2016 15:07
yeah i have 32 cores available on google cloud
i'm benching whether those can do better than my desktop i7
testing a 16c master and 3 x4c workers atm
btw; how much experience you have with spark/rdd's in general @crockpotveggies ?
Justin Long
@crockpotveggies
May 13 2016 15:20
I was experimenting with Spark a year ago for some graph processing. This is the first time I'm fully utilizing the distributed computing part of it
Never touched YARN until now
Patrick Skjennum
@Habitats
May 13 2016 15:21
i see
how many cores do you have in your cluster?
Patrick Skjennum
@Habitats
May 13 2016 15:29
my i7 was faster than a 16c google instance at least ><
but that's with openblas
Paul Dubs
@treo
May 13 2016 15:33
mkl scales better on more cores
Patrick Skjennum
@Habitats
May 13 2016 15:34
it's using 100% on all cores with openblas
Paul Dubs
@treo
May 13 2016 15:35
have you compiled openblas for that machine? If not, mkl will be faster; if yes, mkl will probably still be faster... cat /proc/cpuinfo just to know what you are working with
Patrick Skjennum
@Habitats
May 13 2016 15:36
google instances come preconfigured with openblas
Paul Dubs
@treo
May 13 2016 15:37
but is it actually optimized for the machines it runs on?
Patrick Skjennum
@Habitats
May 13 2016 15:38
i dunno? i guess they didn't just put it there randomly, as these instances are production ready
Justin Long
@crockpotveggies
May 13 2016 15:40
Late response....I have 64 cores in my cluster
Patrick Skjennum
@Habitats
May 13 2016 15:41
alright
Justin Long
@crockpotveggies
May 13 2016 15:41
I'll be retrofitting it to 32GB per node and MKL in about two weeks, just waiting for a shipment
Paul Dubs
@treo
May 13 2016 15:41
Me neither, but apt-get install is so close, that I wouldn't necessarily put it past them to simply install whatever is available precompiled
Patrick Skjennum
@Habitats
May 13 2016 15:41
did you install mkl?
well yeah i can't seem to figure out how to install mkl on linux
can't find the free link:S
Justin Long
@crockpotveggies
May 13 2016 15:44
No MKL yet I'll wait to retrofit
I'll fork my docker container with MKL support and just get it all done at once
Paul Dubs
@treo
May 13 2016 15:45
if you have an account already, simply login on https://registrationcenter.intel.com/en/ and download from there
Justin Long
@crockpotveggies
May 13 2016 15:45
Already got the download 😎
Paul Dubs
@treo
May 13 2016 15:45
that's for @Habitats
Patrick Skjennum
@Habitats
May 13 2016 15:48
is it trivial to install with cli?
Paul Dubs
@treo
May 13 2016 15:48
every chicken can do it if you place enough corn on the enter key
Patrick Skjennum
@Habitats
May 13 2016 15:49
neat
Paul Dubs
@treo
May 13 2016 15:51
the only thing you may need to do after that is the setup step for mkl from netlib
Patrick Skjennum
@Habitats
May 13 2016 15:52
may
Justin Long
@crockpotveggies
May 13 2016 16:07
Screen Shot 2016-05-13 at 9.07.31 AM.png
all nodes reporting in
can't wait to fire DL4J at it
just uploading my dataset to HDFS
by the way if I run Spark in cluster mode, how am I going to be able to access the histogram UI?
raver119
@raver119
May 13 2016 16:11
in rc3.9 you can setup remote ui server
or you can just use it on master node, if it allows external connections
Justin Long
@crockpotveggies
May 13 2016 16:23
what's the port number it uses? I can expose it on all nodes
it's all a private network so I'm not worried about security at this point
Justin Long
@crockpotveggies
May 13 2016 16:29
oh crap it's a random port isn't it?
any chance that's configurable?
raver119
@raver119
May 13 2016 16:49
just use your own dropwizard conf
and you'll have your own specified port
Justin Long
@crockpotveggies
it's up-to-date?
will it read labels by directory similar to ImageRecordReader?
to be honest, it's unclear to me what the best way is to load the dataset in a cluster using a distributed file system
raver119
@raver119
May 13 2016 17:07
most probably that's two different tasks
task A) distributed dataset preparation and saving into hdfs
task B) distributed training
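A sketch of task A under those assumptions, serializing DataSets into HDFS from the workers; the directory layout and naming are hypothetical:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.rdd.RDD
import org.nd4j.linalg.dataset.DataSet

def saveToHdfs(data: RDD[DataSet], dir: String): Unit =
  data.zipWithIndex().foreach { case (ds, i) =>
    // each worker resolves the cluster filesystem from its local Hadoop conf
    val fs = FileSystem.get(new Configuration())
    val out = fs.create(new Path(s"$dir/dataset_$i.bin"))
    try ds.save(out) finally out.close()
  }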
Justin Long
@crockpotveggies
May 13 2016 17:08
the dataset is already in HDFS, so task B seems a bit incomplete?
basically, does Canova address reading from HDFS file system in the exact same way ImageRecordReader does?
raver119
@raver119
May 13 2016 17:10
yea, i've heard we have something there, but i'm not sure if that's oss or not
i bet @agibsonccc knows
Justin Long
@crockpotveggies
May 13 2016 17:11
yea it seems simple enough and I think it's all there

quite basic I just need to convert this:

trainRecordReader.initialize(new FileSplit(new java.io.File("./cnn_dataset")))

into "this":

trainRecordReader.initialize(new FileSplit(new java.io.File("hdfs://master.cluster:9000/datasets/cnn_dataset")))
hmmm....
new InputStreamInputSplit(new PortableDataStream(org.apache.hadoop.mapreduce.lib.input.CombineFileSplit isplit, org.apache.hadoop.mapreduce.TaskAttemptContext context, java.lang.Integer index))
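A sketch of how those pieces more likely fit together, assuming sc is the SparkContext: binaryFiles already yields PortableDataStreams, so none need constructing by hand (the (stream, path) constructor and the record-reader wiring are assumptions):

import org.canova.api.split.InputStreamInputSplit

// one (path, stream) pair per file under the HDFS directory
val files = sc.binaryFiles("hdfs://master.cluster:9000/datasets/cnn_dataset")
files.foreach { case (path, pds) =>
  val split = new InputStreamInputSplit(pds.open(), path)
  // ... initialize the record reader from `split` on the worker
}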
Justin Long
@crockpotveggies
May 13 2016 17:17
I'm going to file an issue and see if anyone picks up on it
Justin Long
@crockpotveggies
May 13 2016 17:52
is this a consequence of platform-specific binaries?
Could not resolve all dependencies for configuration ':runtime'.
> Could not find nd4j-native-linux-x86_64.jar (org.nd4j:nd4j-native:0.4-rc3.9-SNAPSHOT).
  Searched in the following locations:
      file:/Users/justin/.m2/repository/org/nd4j/nd4j-native/0.4-rc3.9-SNAPSHOT/nd4j-native-0.4-rc3.9-SNAPSHOT-linux-x86_64.jar
Paul Dubs
@treo
May 13 2016 17:52
right
Justin Long
@crockpotveggies
May 13 2016 17:52
hmmmm
so because I built ND4J on OS X, the linux binary is not available?
Paul Dubs
@treo
May 13 2016 17:53
right
Justin Long
@crockpotveggies
May 13 2016 17:53
I'm guessing there's no workaround for this?
Paul Dubs
@treo
May 13 2016 17:54
build on linux is the only thing you can do
Justin Long
@crockpotveggies
May 13 2016 17:54
except to move everything to a linux machine?
yep
Paul Dubs
@treo
May 13 2016 17:54
not necessarily everything
it is pretty easy to build on linux, takes about 30 minutes to setup everything, if you don't count the download times of mkl and a current gcc
Justin Long
@crockpotveggies
May 13 2016 17:57
you have a GIST of the latest build script?
just setting it up now, I think I saw one from @Habitats
Paul Dubs
@treo
May 13 2016 17:58
It just assumes that all of the dependencies are already setup
Justin Long
@crockpotveggies
May 13 2016 17:58
perfect thanks
sudo apt-get install maven gcc openblas anything else? I'll leave out MKL for now until I retrofit the cluster
Paul Dubs
@treo
May 13 2016 17:59
depending on your release you may get old versions that way
Justin Long
@crockpotveggies
May 13 2016 18:00
good point
Paul Dubs
@treo
May 13 2016 18:00
wait a second, I'll get everything out of my bash history
Paul Dubs
@treo
May 13 2016 18:08
ok, I've updated that gist
that should set up everything that you need to build
it uses the precompiled openblas though
Justin Long
@crockpotveggies
May 13 2016 18:25
perfect thanks!
I noticed you have oracle 8 in there, but 7 is okay?
Paul Dubs
@treo
May 13 2016 18:25
7 should be ok as well
Justin Long
@crockpotveggies
May 13 2016 18:26
solid thanks!
Paul Dubs
@treo
May 13 2016 18:26
I simply prefer to use the latest version :D
raver119
@raver119
May 13 2016 18:43
At iteration 10 a single iteration takes 1287 MILLISECONDS
Paul Dubs
@treo
May 13 2016 18:43
about 20% faster than mkl :)
raver119
@raver119
May 13 2016 18:44
so that's 200ms improvement on top of yesterday
i think i'm going to merge right now
Paul Dubs
@treo
May 13 2016 18:45
great, I'll run my microbenchmark on it then
raver119
@raver119
May 13 2016 18:48
i think in 2-3 weeks we'll have pretty stable and performant codebase there
Paul Dubs
@treo
May 13 2016 18:48
i.e. for 3.10 :D
raver119
@raver119
May 13 2016 18:49
well
for 3.10-SNAPSHOT i guess
for 3.9 there will be only sequential executor
async is disabled right now
etc
Paul Dubs
@treo
May 13 2016 18:49
so the default config is going to be slower?
raver119
@raver119
May 13 2016 18:49
no
this is default config
this is sequential
            .setExecutionModel(Configuration.ExecutionModel.SEQUENTIAL)
            .setFirstMemory(AllocationStatus.DEVICE)
            .setAllocationModel(Configuration.AllocationModel.CACHE_ALL)
            .setMaximumBlockSize(64)
            .setMaximumGridSize(128)
            .enableDebug(false)
            .setVerbose(false);
that's the config used right now
Paul Dubs
@treo
May 13 2016 18:50
how fast is it async?
raver119
@raver119
May 13 2016 18:50
i'm not sure if it will run now
latest changes to TAD caching etc were not tested with it
Paul Dubs
@treo
May 13 2016 18:51
ok, then I'll pretend that it doesn't exist :)
raver119
@raver119
May 13 2016 18:51
yep, i'll comment it out in config for release
and will bring it back after stuff is shipped
i still have 2 more days till release
so we'll see
Paul Dubs
@treo
May 13 2016 18:54
hehe :clap:
raver119
@raver119
May 13 2016 18:54
i want the async thing back, at least for shady copies
aka dup etc
Justin Long
@crockpotveggies
May 13 2016 19:06
@treo this is new?
[INFO] Scanning for projects...
[ERROR] [ERROR] Could not find the selected project in the reactor: :nd4j-cuda-7.5 @
[ERROR] Could not find the selected project in the reactor: :nd4j-cuda-7.5 -> [Help 1]
used that script verbatim
ah figured it out, there's a typo
git clone is typed twice
for nd4j
Paul Dubs
@treo
May 13 2016 19:14
that's what I get for copy-pasting it line by line :D
raver119
@raver119
May 13 2016 19:58
@/all i've just merged my branches to masters, tests on cuda are more then welcome
Paul Dubs
@treo
May 13 2016 20:08
If I'm going to benchmark it, do I need any special config?
raver119
@raver119
May 13 2016 20:15
no
default config is fine for now
sure, you're free to play with grid/block sizes, but since you have a 970 too
:)
Justin Long
@crockpotveggies
May 13 2016 20:17
back from lunch, still dealing with this though:
No CMAKE_CXX_COMPILER could be found.
I installed cmake but I assume it's not on the PATH?
raver119
@raver119
May 13 2016 20:17
@treo there’s only 1 pending improvement left, @eraly works on it now
if we have enough time - it should greatly improve all reduce calls performance
if not - we’ll see it in 3.10-SNAPSHOT
together with other planned things
Justin Long
@crockpotveggies
May 13 2016 20:22
Google-fu says this fixes C++ issue:
sudo apt-get install build-essential
so here's a possible strategy for those peeps like me who are building on OS X but are deploying to linux
Paul Dubs
@treo
May 13 2016 20:23
@crockpotveggies that should have been taken care of with the install of gcc
Justin Long
@crockpotveggies
May 13 2016 20:23
@treo didn't strangely
I can create a fork of my docker-spark Docker image with an ND4J linux specific build on it
Paul Dubs
@treo
May 13 2016 20:24
you did run the update alternatives thing?
Justin Long
@crockpotveggies
May 13 2016 20:24
probably slap MKL on that container as well, and all you'd need to do when using spark-submit is include the ND4J JAR file like --jars /path/to/nd4j.jar
@treo yea ran the update alternatives
@treo OH you know what's missing from those build steps?
javacpp
Paul Dubs
@treo
May 13 2016 20:27
lol, right, I took it from my bash history on a vm that isn't updated to the newest build
Justin Long
@crockpotveggies
May 13 2016 20:27
I'll paste an updated GIST
Paul Dubs
@treo
May 13 2016 20:27
great do it :)
Paul Dubs
@treo
May 13 2016 20:34
@raver119 you've got some log spam still in there
raver119
@raver119
May 13 2016 20:34
oh
give me some
i’ll turn that off
i mean: show me that spam :)
Paul Dubs
@treo
May 13 2016 20:35
I"d love to.... but the cmd terminal in intellij is kind of stupid, didn't quite catch it and when I scroll up it is something else
raver119
@raver119
May 13 2016 20:36
oh
i guess it’s «new traced 11»
7, 11, 19
nah 7 11 29
raver119
@raver119
May 13 2016 20:37
yea
and that one too
i’ll make a pass there in 30 minutes. need to write up some stuff on that...
raver119
@raver119
May 13 2016 21:42
@treo pushed
Justin Long
@crockpotveggies
May 13 2016 21:43
in the meantime it would appear that as long as I am running YARN in client mode, I can just load the dataset from the local filesystem
until we get HDFS sorted out
UGH
Caused by: java.lang.ClassCastException: org.slf4j.impl.Log4jLoggerAdapter cannot be cast to ch.qos.logback.classic.Logger
Paul Dubs
@treo
May 13 2016 21:44
different loggers on the classpath?
Justin Long
@crockpotveggies
May 13 2016 21:44
yea, I thought I fixed this
it's bare bones build, let's see
is DL4J using jersey?
Paul Dubs
@treo
May 13 2016 21:47
@raver119 https://gist.github.com/treo/bfb7ff408fc9bdd4191a44e72fc57a0e I guess I should mix up the benchmark a bit so it covers more of the underlying operations
Justin Long
@crockpotveggies
May 13 2016 22:00
I think it is indeed deeplearning4j-ui causing these logger headaches
org.slf4j:log4j-over-slf4j:1.7.10 is being pulled in by it via dropwizard
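One common shape of the fix, assuming an sbt build: leave exactly one slf4j backend on the classpath, e.g. by evicting the log4j binding (Log4jLoggerAdapter lives in slf4j-log4j12) so dropwizard's logback can bind; which artifact to evict depends on which logger you keep:

// build.sbt: evict the log4j binding so logback is the lone slf4j backend
libraryDependencies += ("org.apache.spark" %% "spark-core" % "1.6.1" % "provided")
  .exclude("org.slf4j", "slf4j-log4j12")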
Adam Gibson
@agibsonccc
May 13 2016 22:02
@crockpotveggies re: hdfs look at ml lib utils
We can use a record reader as a lambda :smile:
Justin Long
@crockpotveggies
May 13 2016 22:02
spark MLLib?
ah
Justin Long
@crockpotveggies
May 13 2016 22:14
regarding SparkEarlyStoppingTrainer, what are the parameters examplesPerFit and totalExamples based on?
in other words, is that arbitrary per Spark node?
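A sketch of how those two parameters are usually read; the constructor shape below is an assumption and may differ by version:

// hypothetical wiring: examplesPerFit is consumed per parameter-averaging
// round on the executors, totalExamples bounds how much of the RDD one call uses
val trainer = new SparkEarlyStoppingTrainer(sc, esConf, netConf, trainData,
  10000 /* examplesPerFit */, 1000000 /* totalExamples */)
trainer.fit()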
Justin Long
@crockpotveggies
May 13 2016 22:31
SparkEarlyStoppingTrainer doesn't seem to be doing much? deeplearning4j/deeplearning4j#1535
number of partitions != partitions in the logs
Justin Long
@crockpotveggies
May 13 2016 23:46
Screen Shot 2016-05-13 at 4.45.22 PM.png
really wish I had a UI, but with some additional tweaking I can fully pump everything out of this cluster
retrofit to 32GB per node will happen soon
Adam Gibson
@agibsonccc
May 13 2016 23:48
Should be awesome to see