These are chat archives for deeplearning4j/deeplearning4j/earlyadopters

27th Apr 2016
Cryddit
@Cryddit
Apr 27 2016 00:46
Are there any examples using the LBFGS optimization algorithm which actually work? I can't get anything to work with it.
Adam Gibson
@agibsonccc
Apr 27 2016 00:46
What do you mean?
You specify LBFGS as the optimization algo
That's it
Then you combine it with whatever updater you want
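(For context, a minimal sketch of what that looks like on the 0.4-rc3.x builder API; the updater choice here is purely illustrative — the point is that LBFGS is just one enum on the config:)

import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.deeplearning4j.nn.conf.{NeuralNetConfiguration, Updater}

// LBFGS picks the search direction ("way to search"),
// the updater shapes the step ("way to step")
val conf = new NeuralNetConfiguration.Builder()
  .optimizationAlgo(OptimizationAlgorithm.LBFGS)
  .updater(Updater.ADAGRAD) // illustrative; pair with different ones
  .iterations(5)
  .build()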
Cryddit
@Cryddit
Apr 27 2016 00:47
Well yes but when I do that the example stops working completely.
Adam Gibson
@agibsonccc
Apr 27 2016 00:47
it's just an enum
You have to change the updater and the like as well
It's like any of the other 600000 knobs DL has
different combos work well
pair it with the different updaters
Cryddit
@Cryddit
Apr 27 2016 00:48
Hmm. Okay, I'll start trying other combos. Are the incompatibilities documented anywhere?
Adam Gibson
@agibsonccc
Apr 27 2016 00:48
There's no incompatibilities
Cryddit
@Cryddit
Apr 27 2016 00:48
But certain combinations don't work together.
Adam Gibson
@agibsonccc
Apr 27 2016 00:48
We NEVER state absolutes on this stuff
It seriously depends on the problem
LBFGS works well on smaller problems
Cryddit
@Cryddit
Apr 27 2016 00:49
Heh. All that means is that people conclude that certain things don't work at all.
Adam Gibson
@agibsonccc
Apr 27 2016 00:49
CG works well on larger ones
Your mileage varies
Cryddit
@Cryddit
Apr 27 2016 00:49
If you don't state anything about incompatibilities and they try something and it doesn't work, they blame the new thing.
Whichever thing it is.
Adam Gibson
@agibsonccc
Apr 27 2016 00:50
yup
I mean seriously, most people just recco doing momentum + SGD
combining different optimization algos with updaters isn't usually a hyperparameter people try too much
we just allow it
Cryddit
@Cryddit
Apr 27 2016 00:51
Right. I usually do momentum and Steepest (as opposed to Stochastic) gradient descent. But I'm trying to figure out what all the parts are.
Adam Gibson
@agibsonccc
Apr 27 2016 00:51
yeah so we separate the update step from the optimization algo
"way to step" vs "way to search"
Cryddit
@Cryddit
Apr 27 2016 00:52
Momentum has most of the same benefits as second-order methods, and most second-order methods don't scale - so I haven't gone there yet myself. But it was implemented here so I thought I'd try it.
I will state categorically that nothing in the examples folder makes any progress whatsoever when its optimization method is changed to LBFGS.
Adam Gibson
@agibsonccc
Apr 27 2016 00:54
You can't just say "I changed 1 thing and tried it for 5 minutes nothing works"
You have to change the updater in combination with the optimization algo
Experiment with different ones
Cryddit
@Cryddit
Apr 27 2016 00:55
No, that's not what I'm saying. I'm on a 2-cpu machine here with 16 cores total and 64 Gbytes of memory, and I've been working through ALL of the examples iteratively. I mean that NOTHING in the examples folder works when its optimization method is changed to LBFGS.
I have 2 good graphics cards too but CUDA's not doing anything beneficial now either.
Adam Gibson
@agibsonccc
Apr 27 2016 00:56
it's not supposed to
cuda thinks 2 + 2 = 5 still
There are still failing tests
and yes we are working on those this nanosecond
We are mainly profiling cuda and finishing up other tests yet
Cuda is HARD to get right
You don't write numpy for cuda and cpu from scratch and expect it to be easy ;/
Cryddit
@Cryddit
Apr 27 2016 00:57
I know. I've been going over it. CUDA's one of the known issues.
Adam Gibson
@agibsonccc
Apr 27 2016 00:57
cool :D
I mean there's not much left
@raver119 and I have been working on reduce etc
we're about there with it now
Cryddit
@Cryddit
Apr 27 2016 00:59
Well, that'll be awesome.
Seriously, I'm really looking forward to CUDA support.
Adam Gibson
@agibsonccc
Apr 27 2016 01:02
we all are
Cryddit
@Cryddit
Apr 27 2016 01:06
Yeah, I didn't include an emote, and on rereading I was afraid it would just look sarcastic. It wasn't. That's why I added the second line.
sangav
@sangav
Apr 27 2016 03:55
trying to build nd4j from source .. getting error .. I'm sure I'm missing something basic? .. Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce (libnd4j-checks) on project nd4j-native: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed
Adam Gibson
@agibsonccc
Apr 27 2016 03:56
You didn't even read the docs
You need to build libnd4j: github.com/deeplearning4j/libnd4j
If you're not up for building c++ code I'd wait, a lot of things have changed
We'll have a release out on the 16th if you can wait
you will also need javacpp
Paul Dubs
@treo
Apr 27 2016 09:29
@agibsonccc do you have some minutes to go through the static analyzer results?
Adam Gibson
@agibsonccc
Apr 27 2016 09:29
in a bit, working with raver on something atm
Paul Dubs
@treo
Apr 27 2016 09:29
ok, ping me when you are ready :)
Adam Gibson
@agibsonccc
Apr 27 2016 10:21
@treo k
Paul Dubs
@treo
Apr 27 2016 10:23
does this mean you are ready, or that you are going to ping me then? :D
Adam Gibson
@agibsonccc
Apr 27 2016 10:23
yes I'm ready
Patrick Skjennum
@Habitats
Apr 27 2016 17:26
my RNN suddenly outputs "NaN" on score()
not really descriptive, i know, just wondering if that's something that got introduced yesterday, or if i screwed up something
(training seems to work, but there's no graph in the histogram, and NaN values in the iteration listener)
Susan Eraly
@eraly
Apr 27 2016 17:29
@Habitats How do you know training works?
Paul Dubs
@treo
Apr 27 2016 17:30
do you use regularization?
because there was a bug, and apparently it is fixed by now: deeplearning4j/deeplearning4j#1438
Patrick Skjennum
@Habitats
Apr 27 2016 17:33
@eraly F-score and the other graphs
neat @treo yeah, that fits my problem.
finally something that wasn't my fault:D
Susan Eraly
@eraly
Apr 27 2016 17:33
@treo :thumbsup:
Patrick Skjennum
@Habitats
Apr 27 2016 17:35
@treo i need to rebuild everything?
been fixing one bug, and getting two new ones every time i do that lately:D
Paul Dubs
@treo
Apr 27 2016 17:37
you can work around it by setting both l1 and l2 regularisation (if you don't want one of them, set it to 0)
but, I would rather update
That's what you get for being an early adopter :)
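(In builder terms the workaround @treo describes would look roughly like this, assuming the 0.4-rc3.x API; the values are illustrative:)

import org.deeplearning4j.nn.conf.NeuralNetConfiguration

// set BOTH regularisation terms explicitly; the unwanted one goes to 0
val conf = new NeuralNetConfiguration.Builder()
  .regularization(true)
  .l1(0.0)   // unused term pinned to 0 to dodge the bug
  .l2(1e-4)  // the term actually wanted
  .build()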
Patrick Skjennum
@Habitats
Apr 27 2016 17:39
yeah some days i wonder what you guys talked me into
cd ..
wrong window, lol
Paul Dubs
@treo
Apr 27 2016 17:40
:D
Patrick Skjennum
@Habitats
Apr 27 2016 17:42
everyone at my uni is wondering why on earth i'm not just using theano
what should i tell them:P i never used it, and also i hate python. but those are invalid arguments.
Susan Eraly
@eraly
Apr 27 2016 17:43
Tell them you are helping usher in the future
And lie and tell them it's fun
Justin Long
@crockpotveggies
Apr 27 2016 17:43
@Habitats in my own situation, I chose DL4J for the specific reasons that we can productionize it in an environment that we know well - the JVM - and we can keep all of our backend engineering in a single language - Scala
Patrick Skjennum
@Habitats
Apr 27 2016 17:44
yeah, i'm in the same boat, but these guys don't even know scala is a language
researchers are fun people
@eraly haha, yeah, that's basically what i've been saying
Paul Dubs
@treo
Apr 27 2016 17:45
They tend to write horrible scala code, when they do know it :D
(at least all of whom I've been working with)
Justin Long
@crockpotveggies
Apr 27 2016 17:45
@treo let's be honest, implicits haven't been useful for Scala ;)
(and begin the language wars)
Patrick Skjennum
@Habitats
Apr 27 2016 17:46
and there's an increasing number of people in the DL network in my town that're saying keras is going to be the new cool kid on the block
one of them (amund tveit) was supposedly going to meet up with you folks in san f. last year, but something got in the way
Paul Dubs
@treo
Apr 27 2016 17:47
Nah, I've written a decent amount of Scala, Haskell, Clojure, Java, Python, Ruby, PHP, Javascript... I don't join in to such things anymore :P
Justin Long
@crockpotveggies
Apr 27 2016 17:47
DL4PHP
Patrick Skjennum
@Habitats
Apr 27 2016 17:47
please don't
Paul Dubs
@treo
Apr 27 2016 17:48
php has a java bridge after all :D
Justin Long
@crockpotveggies
Apr 27 2016 17:48
oh the possibilities!
Susan Eraly
@eraly
Apr 27 2016 17:48
:joy:
Patrick Skjennum
@Habitats
Apr 27 2016 17:49
hooray, i got scores now @treo
Justin Long
@crockpotveggies
Apr 27 2016 17:56

anyone seen this issue when starting up a Spark local?

Caused by: java.lang.IllegalStateException: Detected both log4j-over-slf4j.jar AND slf4j-log4j12.jar on the class path, preempting StackOverflowError. See also http://www.slf4j.org/codes.html#log4jDelegationLoop for more details.
    at org.apache.log4j.Log4jLoggerFactory.<clinit>(Log4jLoggerFactory.java:51)
    at org.apache.log4j.Logger.getLogger(Logger.java:41)
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:75)

for this project I pretty much cleaned out all dependencies in my build.sbt with the exception of DL4J, ND4J, and Apache Spark


libraryDependencies ++= Seq(
  "commons-io" % "commons-io" % "2.4",
  "org.deeplearning4j" % "deeplearning4j-core" % "0.4-rc3.9-SNAPSHOT",
  "org.deeplearning4j" % "deeplearning4j-ui"  % "0.4-rc3.9-SNAPSHOT",
  "org.deeplearning4j" % "dl4j-spark" % "0.4-rc3.9-SNAPSHOT",
  "org.deeplearning4j" % "deeplearning4j-scaleout-api" % "0.4-rc3.9-SNAPSHOT",
  "org.nd4j"          % "nd4j-x86"            % "0.4-rc3.9-SNAPSHOT",
  "org.nd4j" % "canova-nd4j-codec" % "0.0.0.15-SNAPSHOT",
  "org.nd4j" % "canova-nd4j-image" % "0.0.0.15-SNAPSHOT",
  "com.github.fommil.netlib" % "netlib-native_ref-osx-x86_64" % "1.1",
  "org.apache.spark" % "spark-core_2.10" % "1.6.1"
)
Paul Dubs
@treo
Apr 27 2016 17:57
I'd guess dl4j includes a logger, and spark includes a logger
take a look at your dependency graph
and then exclude one of them
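(A sketch of that exclusion in build.sbt, assuming the conflict is the usual slf4j-log4j12 vs log4j-over-slf4j pair from the stack trace above; which binding to drop depends on what your dependency graph actually shows:)

// keep one logging bridge: here spark's slf4j-log4j12 binding is dropped
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.1" exclude("org.slf4j", "slf4j-log4j12")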
Justin Long
@crockpotveggies
Apr 27 2016 17:58
Yea thinking the same thing
Patrick Skjennum
@Habitats
Apr 27 2016 17:58
yeah, i had the same issue
dl4j+spark = dependency hell
Justin Long
@crockpotveggies
Apr 27 2016 17:58
@Habitats any quick fixes?
this might give you some hints
it's ugly, but it works:P
Justin Long
@crockpotveggies
Apr 27 2016 18:01
on that note I might dump SBT in favor of Gradle. Already switched our API to it
yea not pretty
was jersey responsible for a ton of the headaches?
Patrick Skjennum
@Habitats
Apr 27 2016 18:01
yes
don't ask me why excluding it twice works
nothing else did
but jersey only became a problem when using the UiServer
Justin Long
@crockpotveggies
Apr 27 2016 18:04
does DL4J also pull in Guava? I wonder if it's worth some extra time refactoring DL4J to be on par with the latest Spark version? (willing to do this)
ah yup, found it in a couple of POMs
let's see what Spark requires...
oh boy DL4J is already pulling in two different versions of guava
yea Spark uses 0.14 while DL4J uses 0.11 and 0.19
ask me if you have questions regarding where and why:P
Justin Long
@crockpotveggies
Apr 27 2016 18:07
thanks was just about to do that :)
Patrick Skjennum
@Habitats
Apr 27 2016 18:08
spent the better part of a week trying to get all of these things to talk together
i fear that upping the spark version will break everything for me:D
i don't dare touching a thing atm
Cryddit
@Cryddit
Apr 27 2016 18:14
@Habitats That dependency tree is amazing. We'd never have even accepted a project that has more than about a dozen dependencies. Policy from on high is that building out a dependency tree with more than a dozen leaves means too many potential sources of bugs and incompatibility failures.
That and we're required to have every line of source code plus build environments locally so we can opt out of any project-breaking change. If I can't go to the boss with a whole build environment on the system to build everything with no network connection I don't even get to pitch for it.
Which is damned annoying because I still haven't got dl4j to build with under a dozen dependencies. :-(
Patrick Skjennum
@Habitats
Apr 27 2016 18:17
yeah that sounds pretty corporate!
Cryddit
@Cryddit
Apr 27 2016 18:18
They got burned several times by "upgrades" that broke things that other deps depended on, and eventually made policy.
Patrick Skjennum
@Habitats
Apr 27 2016 18:18
and yes, that dep tree is ridiculous, but luckily i don't care as long as it works:P
things get scary when you include transitive deps
Cryddit
@Cryddit
Apr 27 2016 18:20
Yup. I get to include everything I can get from a single linux distro before I have to start counting - so whatever I can install from Debian Testing is "free" against the 12 limit.
Paul Dubs
@treo
Apr 27 2016 18:20
Java tends to have a lot of deps, but it isn't quite as bad as nodejs
Cryddit
@Cryddit
Apr 27 2016 18:20
After that? 12 sources or more of build artifacts from the network, and it's disqualified.
Patrick Skjennum
@Habitats
Apr 27 2016 18:21
sounds like a lot of re-inventing the wheel
Paul Dubs
@treo
Apr 27 2016 18:21
That is a good policy as long as your language environment of choice relies on os package management
for C/C++ and even python and ruby that can work quite well...
Cryddit
@Cryddit
Apr 27 2016 18:22
Mostly it's because we produce things our CUSTOMERS have to be able to build. The cost of support goes up exponentially with deps.
Yah, we've been mostly a C++ shop up til now.
Paul Dubs
@treo
Apr 27 2016 18:22
Usually java with maven is quite good in that regard
as long as you don't try to be an early adopter of dl4j at least :P
Cryddit
@Cryddit
Apr 27 2016 18:23
No. No it really isn't. Maven pulls in stuff from ALL OVER. Any single bit of it going down and we've got a hundred pissed-off clients the next day.
Paul Dubs
@treo
Apr 27 2016 18:23
All over = Maven Central
if you rely on stable releases
Cryddit
@Cryddit
Apr 27 2016 18:24
As I read it though, maven central is mostly just a collection of redirects to other projects isn't it?
Paul Dubs
@treo
Apr 27 2016 18:24
not really
you can have it like that
i.e. you can install something like sonatype nexus, and make that your main repository
that can just redirect, cache or completely mirror stuff for you
but I've yet to see a day when maven central wasn't available
and after you have fetched your dependencies, they are cached locally
Cryddit
@Cryddit
Apr 27 2016 18:26
Mirrors are policy. But it's a damned hard sell when the deps include stuff from dozens of teams that aren't working together.
Paul Dubs
@treo
Apr 27 2016 18:27
That is what you get from open source
Cryddit
@Cryddit
Apr 27 2016 18:27
i.e. if it's more important to us that they work together than it is to them, my people won't trust it.
If somebody can make a change THEY don't think is a problem because to them working with foo is minor, but we absolutely depend on their stuff working with foo.... They pass on it.
Paul Dubs
@treo
Apr 27 2016 18:29
If you need that kind of assurance from your upstream libraries, you will get that probably only from commercial ones
Cryddit
@Cryddit
Apr 27 2016 18:29
Sad but true. That or we'll have to do wheel reinvention so we have ONE dependency.
Patrick Skjennum
@Habitats
Apr 27 2016 18:31
place i worked at last summer solved this issue by literally stealing the source code:P
Cryddit
@Cryddit
Apr 27 2016 18:31
SSH changed their signature scheme a while ago to rule out a particular form of signature that was vulnerable to an obscure attack. Not a problem for libs that use SSH to do communications (data in flight). But we had an immutable back-store of data at rest with a proof of integrity that depended on recorded SSH signatures. Their upgrade broke the whole system.
We wound up having to write a library based on a previous version of SSH to do integrity checks on old data - verifying that it hadn't changed meant verifying SIGNATURES that hadn't changed, which were under other signatures down to a merkle tree root.
Cryddit
@Cryddit
Apr 27 2016 18:37
It just came down to we were using their code in a way where what was important to us, wasn't important to them. They made a "minor" change and our world caved in.
So they don't trust more than a dozen different sources to not make "minor" changes that will screw us over - if we do, we can't be reliable.
Paul Dubs
@treo
Apr 27 2016 18:41
That is a hard requirement for you, but not for DL4J (as far as I know). I get that it probably has more dependencies than it really needs, and those can probably go pretty easily. But I don't think @agibsonccc (currently) thinks it is a good use of development time to remove as many dependencies as possible.
In Java land, if you want to stay on an older version: just do it. Don't change the version in your pom.xml, and you are done.
The only things that are allowed to change are -SNAPSHOT releases
Justin Long
@crockpotveggies
Apr 27 2016 20:37
@Habitats tried your exclusions in SBT but still having problems. looks like I'll need to switch to Gradle or Maven for this project
the SBT resolver mechanism is very tough to handle
Patrick Skjennum
@Habitats
Apr 27 2016 21:03
@crockpotveggies hope you get it figured out!
Justin Long
@crockpotveggies
Apr 27 2016 21:05
@Habitats by the way are you using shadow to eventually compile a fatjar for spark-submit?
Patrick Skjennum
@Habitats
Apr 27 2016 21:05
indeed i am
Justin Long
@crockpotveggies
Apr 27 2016 21:06
nice. is it still necessary to include spark-core in that fatjar? I was thinking if the versions match then surely spark should already be available in scope
Patrick Skjennum
@Habitats
Apr 27 2016 21:06
nope, it's not
but versions can be a trouble
Justin Long
@crockpotveggies
Apr 27 2016 21:07
I've just been updating to latest versions of everything: java 1.8 and spark 1.6.1
I'll post it up if/when it's done
Patrick Skjennum
@Habitats
Apr 27 2016 21:07
everything i used broke into pieces when i tried to update to java 1.8/scala 11
Justin Long
@crockpotveggies
Apr 27 2016 21:08
you might also need this if you have any Lightbend/Typesafe libraries in there:
shadowJar {
  transform(com.github.jengelman.gradle.plugins.shadow.transformers.AppendingTransformer) {
    resource = 'reference.conf'
  }
}
yikes
Patrick Skjennum
@Habitats
Apr 27 2016 21:08
wut
Justin Long
@crockpotveggies
Apr 27 2016 21:08
for instance if you're using Akka, it requires a default reference.conf that shadow doesn't pull by default
Patrick Skjennum
@Habitats
Apr 27 2016 21:09
oh like that
Justin Long
@crockpotveggies
Apr 27 2016 21:09
yea for some reason it doesn't transit with shadow
you may need this, though I used it because I have some mixed Scala/Java projects:
jar {
  from sourceSets.main.allScala
}
Patrick Skjennum
@Habitats
Apr 27 2016 21:10
i haven't had any issues with my current config, though
my main issue is getting dl4j to actually run on spark atm:P
Justin Long
@crockpotveggies
Apr 27 2016 21:21
are you trying with Spark local?
Patrick Skjennum
@Habitats
Apr 27 2016 21:23
atm i'm just running spark local, yes
i used a cluster when i still used mllib for machine learning
but after switching to dl4j i haven't really gotten spark to work that well:P
for the actual training that is. i still use spark for data processing
but i've gotten the impression spark training isn't really worth it atm anyway, so i haven't tried that hard to fix it either
Justin Long
@crockpotveggies
Apr 27 2016 21:29
when you say "isn't worth it" do you mean performance-wise? are you using your own hardware or spinning up AWS instances?
Patrick Skjennum
@Habitats
Apr 27 2016 21:30
i've been using google cloud
Justin Long
@crockpotveggies
Apr 27 2016 21:30
how many instances?
Patrick Skjennum
@Habitats
Apr 27 2016 21:30
32 cores
Justin Long
@crockpotveggies
Apr 27 2016 21:30
I'm asking because I've got my own hardware, 8 machines each with 8 cores
Patrick Skjennum
@Habitats
Apr 27 2016 21:30
tried a bunch of configurations
Justin Long
@crockpotveggies
Apr 27 2016 21:31
and that's with the CPU improvements?
I'm also wondering if it's affected at all by the virtualization in Google Cloud
Patrick Skjennum
@Habitats
Apr 27 2016 21:31
like i said, haven't gotten around to getting dl4j to even work with spark:p
Justin Long
@crockpotveggies
Apr 27 2016 21:31
I'd be very curious to compare it to a physical setup
ah gotcha
okay we're seriously pioneering this shit
"earlyadopters"
Adam Gibson
@agibsonccc
Apr 27 2016 21:32
@Habitats bit confused here
we've had it working the last few years now
At least a basic version
What's not working exactly?
Patrick Skjennum
@Habitats
Apr 27 2016 21:32
yeah but everytime i mention performance people say "wait for the parameter server"
ah it's not working because i didn't do it right, obviously:P
Adam Gibson
@agibsonccc
Apr 27 2016 21:33
oh for spark?
Patrick Skjennum
@Habitats
Apr 27 2016 21:33
spark yes
Adam Gibson
@agibsonccc
Apr 27 2016 21:33
I mean you can do basic param averaging
All I can say is file an issue with what you're running into
Patrick Skjennum
@Habitats
Apr 27 2016 21:33
but afaik the current spark+dl4j+training stuff is very slow compared to running training without spark, on the same hardware?
Adam Gibson
@agibsonccc
Apr 27 2016 21:33
We can improve the spark docs if something's not working
there's more tuning involved, that's why
even something as simple as the number of threads matters
Justin Long
@crockpotveggies
Apr 27 2016 21:34
I'm very curious if it's related to Google Cloud
as well
Adam Gibson
@agibsonccc
Apr 27 2016 21:34
eg: openmp threads with spark can cause resource thrashing
There's also GC with the executors vs the driver
ETL is a good part of it as well
You usually can't extrapolate from, say, spark local (which doesn't read from hdfs)
You have to decompose the timing
"What" is slow is what we need
Patrick Skjennum
@Habitats
Apr 27 2016 21:35
yeah, and seeing as i can barely get stuff to work without spark, i don't think i have the patience to go through all of that:P
Adam Gibson
@agibsonccc
Apr 27 2016 21:35
We're just spark-submit though
What exactly is the problem?
Package uber jar
spark submit
configure as normal
Patrick Skjennum
@Habitats
Apr 27 2016 21:36
yeah i have spark working on google cloud for all my other stuff
it's working great for mllib etc
Adam Gibson
@agibsonccc
Apr 27 2016 21:36
I'm shaking you down for "what"
All I hear is "it's not working"
not an actual concrete reason though
That doesn't help any of us
Patrick Skjennum
@Habitats
Apr 27 2016 21:36
i know why it's not working, it has nothing to do with dl4j
it's irrelevant
for this discussion:P
Adam Gibson
@agibsonccc
Apr 27 2016 21:37
if you say so
Justin Long
@crockpotveggies
Apr 27 2016 21:37
Bernie rack.jpg
Adam Gibson
@agibsonccc
Apr 27 2016 21:37
Anyways, for tuning
Look closely at the number of open mp threads being used
that will be huge for spark
Justin Long
@crockpotveggies
Apr 27 2016 21:37
Adam I've got this hardware here that's all dedicated, so if you need independent testing then I'm happy to jump in
it's dedicated for Spark processing so we have plenty to play with
Patrick Skjennum
@Habitats
Apr 27 2016 21:38
whoa, wish i had that:D
Justin Long
@crockpotveggies
Apr 27 2016 21:39
found a steal through eBay, thankfully it was good timing and didn't have to drop a huge budget
a lot of companies go out of business and dump their stuff
in case you're ever looking

okay found an issue when deploying in Spark

Exception in thread "main" com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
 at [Source: {"id":"0","name":"parallelize"}; line: 1, column: 1]
    at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
[bunch of crap]
at ai.bernie.researchtests.TrainNet$.main(TrainNet.scala:78)

and that line points to:
val trainRDD = sc.parallelize(list)

for whatever reason it can't parallelize the dataset
I'm specifically passing it like this:
val list = new util.ArrayList[DataSet](numSamples)
    while(trainingSetIterator.hasNext) list.add(trainingSetIterator.next)
Justin Long
@crockpotveggies
Apr 27 2016 21:44
I have a feeling this is Scala/Java incompatibility issues
I imported import scala.collection.JavaConversions._
Patrick Skjennum
@Habitats
Apr 27 2016 21:49
easier to understand your problem if you just post gist with full logs
Justin Long
@crockpotveggies
Apr 27 2016 21:49
one sec
Patrick Skjennum
@Habitats
Apr 27 2016 21:49
not sure how json serialization is spark's fault though:P
ah nevermind looks like dependency conflicts again @Habitats
Patrick Skjennum
@Habitats
Apr 27 2016 21:56
if you're just trying to get spark to work i'd recommend getting the examples to work first, btw
Justin Long
@crockpotveggies
Apr 27 2016 21:56
see earlier comment
Patrick Skjennum
@Habitats
Apr 27 2016 21:57
i have a bunch of json4s related deps in the gradle i showed you
i had similar issues
Justin Long
@crockpotveggies
Apr 27 2016 21:59
yea I thought that was just to get json4s to work, but since I'm not using it I removed it from my build.gradle
so here I am adding it back ;)
Adam Gibson
@agibsonccc
Apr 27 2016 22:06
@crockpotveggies I'd compare the jackson version the spark you're using pulls in vs yours
You should either do an exclude or a dependencyManagement to force the version
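(In sbt that forcing would look roughly like this; 2.6.3 is used here only because it's the pin @Habitats mentions working later in the chat, so treat the exact version as an assumption:)

// force a single jackson version across the whole dependency graph
dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.6.3",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.3",
  "com.fasterxml.jackson.module" % "jackson-module-scala_2.10" % "2.6.3"
)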
Patrick Skjennum
@Habitats
Apr 27 2016 22:10
@agibsonccc ran into an actual spark problem now: https://gist.github.com/Habitats/9f86775c13dbfe958e1817142398dd9a
i'm using the same datasetiterators that i've been using with my non-spark stuff
Adam Gibson
@agibsonccc
Apr 27 2016 22:11
Are you using cnns?
Patrick Skjennum
@Habitats
Apr 27 2016 22:11
nope, feedforward and rnns
same error for both
Adam Gibson
@agibsonccc
Apr 27 2016 22:12
"rnns"
hmm
at org.nd4j.linalg.api.ndarray.BaseNDArray.columns(BaseNDArray.java:3443) ~[nd4j-api-0.4-rc3.9-SNAPSHOT.jar:na]
Patrick Skjennum
@Habitats
Apr 27 2016 22:13
well i have a simple lstm network, but as it crashes with my feedforward i guess it's not related?
Adam Gibson
@agibsonccc
Apr 27 2016 22:13
I mean it just doesn't matter
It's because it's not a matrix
Patrick Skjennum
@Habitats
Apr 27 2016 22:13
yeah
Adam Gibson
@agibsonccc
Apr 27 2016 22:13
sec, it's an easy fix
Patrick Skjennum
@Habitats
Apr 27 2016 22:14
this is the code i'm using
Adam Gibson
@agibsonccc
Apr 27 2016 22:14
@Habitats file an issue on nd4j
Mention the fact that datasets should generalize to any shape
It's DataSet.merge causing the issue
dump that exception in there
Patrick Skjennum
@Habitats
Apr 27 2016 22:20
done
Adam Gibson
@agibsonccc
Apr 27 2016 22:20
thanks
Patrick Skjennum
@Habitats
Apr 27 2016 22:21
i assume my network config isn't relevant?
can post it if you want
Justin Long
@crockpotveggies
Apr 27 2016 22:38
@Habitats I've got everything going in what I think is scala 2.11 and your upgrade to Jackson 2.6.3 was indeed helpful. I tried using the Spark version but it pooped on the same error you had
@agibsonccc here's the gist and I'll be adding this to the same issue as @Habitats https://gist.github.com/crockpotveggies/f4eed87aa73be622d84f488128f6e045
Adam Gibson
@agibsonccc
Apr 27 2016 22:40
cool
Patrick Skjennum
@Habitats
Apr 27 2016 22:41
@agibsonccc is this 3.9 related, or just a general problem? all of the examples run fine on 3.8
Adam Gibson
@agibsonccc
Apr 27 2016 22:41
Not sure what's going on there
DataSet.merge is the problem
I'm just going to fix the root cause
rather than worry about specifics
DataSet.merge shouldn't assume matrices
Justin Long
@crockpotveggies
Apr 27 2016 22:42
@Habitats curious why are you creating a Scala collection and feeding it into a JavaRDD?
I did the opposite:
val list = new util.ArrayList[DataSet](numSamples)
    while(trainingSetIterator.hasNext) list.add(trainingSetIterator.next)
    val trainRDD = sc.parallelize(list)
Patrick Skjennum
@Habitats
Apr 27 2016 22:43
i just tried to reproduce the example code as closely as possible in order to avoid introducing subtle mistakes
Adam Gibson
@agibsonccc
Apr 27 2016 22:43
This SHOULD be using Nd4j.concat
since it's just arrays underneath
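(Roughly the idea, sketched against Nd4j.concat's (dimension, arrays...) signature: concatenating along dimension 0 works for tensors of any rank, which is what a shape-agnostic DataSet.merge would lean on:)

import org.nd4j.linalg.factory.Nd4j

val a = Nd4j.ones(2, 3)   // two examples...
val b = Nd4j.zeros(2, 3)  // ...and two more
// stacks along dimension 0 regardless of rank: shape becomes [4, 3]
val merged = Nd4j.concat(0, a, b)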
Justin Long
@crockpotveggies
Apr 27 2016 22:44
@agibsonccc is that referring to the ND4J .merge fix or my implementation?
Adam Gibson
@agibsonccc
Apr 27 2016 22:44
first one
Justin Long
@crockpotveggies
Apr 27 2016 22:45
ah gotcha. mind poking us when you think it's fixed so I can test?
Patrick Skjennum
@Habitats
Apr 27 2016 22:46
you can always subscribe to the issue:p
but yeah, what did you mean exactly @crockpotveggies?
fitDataSet takes a JavaRDD, so i was just being explicit
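(For reference, the shape of that call sketched against the era's dl4j-spark module; treat the exact signatures as assumptions, and `netConf` (a MultiLayerConfiguration) plus `list` (the util.ArrayList[DataSet] built in the snippet above) are assumed to exist:)

import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer

val sparkConf = new SparkConf().setMaster("local[*]").setAppName("dl4j-sketch")
val jsc = new JavaSparkContext(sparkConf)
// parallelize on a JavaSparkContext yields a JavaRDD[DataSet] directly,
// with no scala collection conversions involved
val trainRDD = jsc.parallelize(list)
val sparkNet = new SparkDl4jMultiLayer(jsc.sc, netConf)
val trainedNet = sparkNet.fitDataSet(trainRDD)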
Justin Long
@crockpotveggies
Apr 27 2016 22:48
earlier I was thinking that this error: java.lang.IllegalStateException: Unable to get number of of columns for a non 2d matrix
was perhaps caused by a Scala collection being converted somehow, which I wouldn't put past scala
I'm going to go back and use a JavaRDD without scala conversions of collections
just curious if anything
Patrick Skjennum
@Habitats
Apr 27 2016 22:49
but you can't parallelize a java-collection with a scala spark context
Justin Long
@crockpotveggies
Apr 27 2016 22:51
yea I'm going to turn it into a java spark context
Patrick Skjennum
@Habitats
Apr 27 2016 22:52
i had issues parallelizing INDArrays with the scala spark context as well. information was lost in the arrays
maybe it's a scala-related issue somehow?
Justin Long
@crockpotveggies
Apr 27 2016 22:52
exactly what I'm thinking
Patrick Skjennum
@Habitats
Apr 27 2016 22:53
forcing java context is not the solution i'm looking for though:|
Justin Long
@crockpotveggies
Apr 27 2016 22:53
because Scala is built around this paradigm of immutability, the collections don't always do what's expected when interacting with Java libs
agreed I don't want to force JavaSparkContext either, but DL4J is very heavily a Java thing
yea same error
okay well that rules it out
Patrick Skjennum
@Habitats
Apr 27 2016 22:55
phew? hehe
Justin Long
@crockpotveggies
Apr 27 2016 22:56
not conclusive yet since you mentioned you were losing information in the arrays?
Patrick Skjennum
@Habitats
Apr 27 2016 22:56
right, it's 1 am, so gotta look into it more in the morning. nice to see more scala people though @crockpotveggies :D
Justin Long
@crockpotveggies
Apr 27 2016 22:56
cheers :)
Patrick Skjennum
@Habitats
Apr 27 2016 22:56
yeah i can't reproduce that stuff atm
Adam Gibson
@agibsonccc
Apr 27 2016 23:10
the conversion isn't too bad though?
there's toJava and back etc
Justin Long
@crockpotveggies
Apr 27 2016 23:11
I imagine it depends on the size of the dataset. I'm curious to look at the internals and see what happens when .toJava is called
sometimes in Scala when something like .toList is called, it fully creates a whole new list and adds some operational overhead
so it might be more performant to simply instantiate JavaSparkContext from the get-go
Adam Gibson
@agibsonccc
Apr 27 2016 23:14
huh
Justin Long
@crockpotveggies
Apr 27 2016 23:14
one sec..

I might be over-thinking this. If you look at toJavaRDD here: https://github.com/apache/spark/blob/3c91afec20607e0d853433a904105ee22df73c73/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1757

def toJavaRDD() : JavaRDD[T] = {
    new JavaRDD(this)(elementClassTag)
  }

it's instantiating an entirely new RDD

Adam Gibson
@agibsonccc
Apr 27 2016 23:18
You'd need to look at the constructor
I was going to say
spark is lazy eval
I didn't think it would do anything like copy the data
spark isn't an ideal distributed system but it's not that bad

however, looking at @Habitats implementation on this line:

val trainIter: List[DataSet] = new FeedForwardIterator(...).asScala.toList

I do wonder what happens when it's being converted like that. There must be some overhead associated there?

Adam Gibson
@agibsonccc
Apr 27 2016 23:25
Yeah a lot
Anytime you're not using nd4j internals really
those are large buffers you're copying
(This is why I hate scala)
It's easy to have a lot of overhead
It's an elegant language
but there's an insane amount of object creation that happens
Justin Long
@crockpotveggies
Apr 27 2016 23:49
yea it all comes from immutability :P
Adam Gibson
@agibsonccc
Apr 27 2016 23:49
== overhead
Justin Long
@crockpotveggies
Apr 27 2016 23:49
there's always Kotlin
but I digress
Adam Gibson
@agibsonccc
Apr 27 2016 23:50
meh
straight c