These are chat archives for deeplearning4j/deeplearning4j/earlyadopters

28th
Apr 2016
Justin Long
@crockpotveggies
Apr 28 2016 00:56
@agibsonccc what timeline are you looking at for the ND4J .merge issue? I'll tackle it if the timeline is > 2 days since my main focus right now is to get our DL4J project up and running, and I'm willing to contribute here
Referencing this: deeplearning4j/nd4j#861
Adam Gibson
@agibsonccc
Apr 28 2016 00:57
line 120
already being worked on
it's seriously an easy fix
I'm in tokyo atm
so my day is just starting
I'll be able to poke at this
Justin Long
@crockpotveggies
Apr 28 2016 00:58
ah gotcha. holy crap did you just pull an all-nighter?
Adam Gibson
@agibsonccc
Apr 28 2016 00:58
oh no
I got up at 7am
I"m always up early
Justin Long
@crockpotveggies
Apr 28 2016 00:58
yea same here that's my usual start time
Adam Gibson
@agibsonccc
Apr 28 2016 00:58
right
Justin Long
@crockpotveggies
Apr 28 2016 00:58
okay cool thanks for the update :)
Adam Gibson
@agibsonccc
Apr 28 2016 00:58
yep
Justin Long
@crockpotveggies
Apr 28 2016 00:59
Vancouver, Canada here
6pm and my day is half-over :P
wobu
@wobu
Apr 28 2016 06:19
Hey guys, i am currently using your snapshot libraries with manually built nd4j and libnd4j. Our target platform is spark so we are using the dl4j-spark package as well. I found some strange bug which didn't exist in 0.4-rc3.8. As soon as a DataSet or NDArray is used within an RDD, the rank() information gets lost. Maybe some serialization problem.
i could post an issue with sample scala code, but i don't know if it belongs in dl4j or nd4j
Alex Black
@AlexDBlack
Apr 28 2016 06:21
@wobu probably an nd4j issue. code to allow us to reproduce this would be great
Justin Long
@crockpotveggies
Apr 28 2016 06:27
@wobu is that the same issue that @Habitats and I have been having? deeplearning4j/nd4j#861
wobu
@wobu
Apr 28 2016 06:27
yea it seems so, just discovered this issue
Alex Black
@AlexDBlack
Apr 28 2016 06:28
hm, can you post the full stack trace, just to be sure?
I'm working on that issue right now, fyi
wobu
@wobu
Apr 28 2016 06:29
is there a way to post the stacktrace without spamming this chat? sry i am kinda new to gitter
Justin Long
@crockpotveggies
Apr 28 2016 06:29
Gist
Alex Black
@AlexDBlack
Apr 28 2016 06:30
yep, probably the same issue I think
should have something up within an hour or two... I'll post something here when it's done
wobu
@wobu
Apr 28 2016 06:31
spark is trying to merge the datasets, and this method checks for rank() == 3 but gets 0
Alex Black
@AlexDBlack
Apr 28 2016 06:32
hm, so you checked with a break point I assume?
if so, it's the same location as the other issue, but probably not the same issue...
wobu
@wobu
Apr 28 2016 06:32
yeah i have. Also since rc3.8 the shapeInformation was marked with a transient annotation
and this isn't preserved in spark when serializing
and the method rank() is using this field
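(For context, a minimal generic-Java sketch of what a transient field does across a serialization round trip; Holder is a made-up class, not the actual nd4j one:)
    import java.io.*;

    class Holder implements Serializable {
        transient int[] shapeInfo = {2, 3, 4}; // transient: skipped by Java serialization
        double[] data = {1.0, 2.0};            // non-transient: survives the round trip
    }

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(bos);
    oos.writeObject(new Holder());
    oos.close();
    Holder back = (Holder) new ObjectInputStream(
            new ByteArrayInputStream(bos.toByteArray())).readObject();
    // back.shapeInfo is now null, so anything derived from it (like rank()) is lost;
    // back.data is intact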
Alex Black
@AlexDBlack
Apr 28 2016 06:33
ok, got it. I'll have to look into that separately
could you open an issue with that stack trace and your observations there?
thanks
wobu
@wobu
Apr 28 2016 06:34
i will do. Thank you
wobu
@wobu
Apr 28 2016 06:40
deeplearning4j/nd4j#863 done
Alex Black
@AlexDBlack
Apr 28 2016 06:42
great
Patrick Skjennum
@Habitats
Apr 28 2016 07:24
I posted some more specific shape info, but I'm not exactly sure if it was what you're after @AlexDBlack
Alex Black
@AlexDBlack
Apr 28 2016 07:27
thanks, that's fine
so fyi there's two separate issues here, one with merging generally (for 4d/cnn data) and one with serialization I guess
wobu
@wobu
Apr 28 2016 07:40
but it could be that the serialization problem is also the cause for the merge issue
Alex Black
@AlexDBlack
Apr 28 2016 07:43
yes. that's one of the two issues. the other is that CNN data (in 4d non-flattened format) isn't handled by the merge method
Patrick Skjennum
@Habitats
Apr 28 2016 07:47
@AlexDBlack alright, let me know if you want more stuff
Patrick Skjennum
@Habitats
Apr 28 2016 08:23
is there any built in conversion between spark vectors and indarrays?
(not that it's hard to do, just wondering)
wobu
@wobu
Apr 28 2016 08:25
org.deeplearning4j.spark.util.MLLibUtil.toVector
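(If you'd rather do it by hand: the conversion is just a copy of the underlying doubles. A minimal sketch; MLLibUtil.toVector wraps essentially this:)
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    Vector v = Vectors.dense(1.0, 2.0, 3.0);
    INDArray arr = Nd4j.create(v.toArray());            // spark vector -> 1 x 3 row INDArray
    Vector back = Vectors.dense(arr.data().asDouble()); // INDArray -> spark vector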
Patrick Skjennum
@Habitats
Apr 28 2016 08:31
ah neat! didn't know about MLLibUtil
Patrick Skjennum
@Habitats
Apr 28 2016 09:09
recommended way to take element wise absolute value of an INDArray?
Adam Gibson
@agibsonccc
Apr 28 2016 09:09
Transforms.abs
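(A minimal usage sketch:)
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;
    import org.nd4j.linalg.ops.transforms.Transforms;

    INDArray x = Nd4j.create(new double[] {-1.5, 2.0, -0.25});
    INDArray y = Transforms.abs(x);   // returns a copy: [1.5, 2.0, 0.25]
    Transforms.abs(x, false);         // in-place variant, if your version has the dup overload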
Patrick Skjennum
@Habitats
Apr 28 2016 09:09
ah! another utility class i never knew about
awesome:D
it'd be helpful to include a note about that in the INDArray docs
Adam Gibson
@agibsonccc
Apr 28 2016 09:13
file an issue
The docs are in my head :P
Patrick Skjennum
@Habitats
Apr 28 2016 09:16
done:p
Adam Gibson
@agibsonccc
Apr 28 2016 09:17
thanks
Adam Gibson
@agibsonccc
Apr 28 2016 09:22
@Habitats could you take a crack at writing a test for the merge?
for your multi dimensional data?
Add that to the issue, I think I fixed it but that'd make life a bit easier
Abdullah-Al-Nahid
@lasin02_twitter
Apr 28 2016 09:23
Hi, I am a very new user of DL4J. I am using eclipse. How can I read a CSV file? How can I run the existing IRIS problem?
Adam Gibson
@agibsonccc
Apr 28 2016 09:24
Type t
Start searching csv
You'll see it right in there
You're in the wrong channel btw
This isn't for beginners
Abdullah-Al-Nahid
@lasin02_twitter
Apr 28 2016 09:25
thanks
Patrick Skjennum
@Habitats
Apr 28 2016 09:25
not entirely sure how i'd go about doing that though @agibsonccc
Adam Gibson
@agibsonccc
Apr 28 2016 09:26
Take part of your sample data and add some asserts to it
Patrick Skjennum
@Habitats
Apr 28 2016 09:32
yeah but what exactly should i test? i haven't dug into the mechanics of this beyond net.fitDataSet
you have a branch i can pull and test?
Adam Gibson
@agibsonccc
Apr 28 2016 09:35
so no - model concatenating 2 datasets
that's what you'd be testing here
in this case build a dataset object and call merge
take a sample of your existing data
That's really all I want here
Real world is always helpful :D
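(A sketch of such a test; the shapes are made up, and it just asserts that merge concatenates along the example dimension:)
    import org.junit.Test;
    import java.util.Arrays;
    import org.nd4j.linalg.dataset.DataSet;
    import org.nd4j.linalg.factory.Nd4j;
    import static org.junit.Assert.assertArrayEquals;

    @Test
    public void testMerge4d() {
        // two small datasets with 4d cnn-style features: [examples, channels, h, w]
        DataSet d1 = new DataSet(Nd4j.rand(new int[] {2, 3, 4, 4}), Nd4j.rand(2, 5));
        DataSet d2 = new DataSet(Nd4j.rand(new int[] {2, 3, 4, 4}), Nd4j.rand(2, 5));
        DataSet merged = DataSet.merge(Arrays.asList(d1, d2));
        // merged along dimension 0: 2 + 2 = 4 examples
        assertArrayEquals(new int[] {4, 3, 4, 4}, merged.getFeatureMatrix().shape());
    }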
Alex Black
@AlexDBlack
Apr 28 2016 09:36
you mean the merge stuff? I'm working an that now
Adam Gibson
@agibsonccc
Apr 28 2016 09:36
Oh I already have it done
sec
deeplearning4j/nd4j#867
If anything - the ND4j.concat would be another great one to do
Test it on the same branch
Nd4j.concat could probably use set maybe?
idk
I'm going to go back to the c++ stuff
Alex Black
@AlexDBlack
Apr 28 2016 09:40
though I did hit this when I was doing it: deeplearning4j/nd4j#864
which is why I haven't merged it already
Adam Gibson
@agibsonccc
Apr 28 2016 09:41
right ok
Could you consolidate some of that to my branch?
I'll let you own this from here
I need to poke at sam's memory corruption stuff
Alex Black
@AlexDBlack
Apr 28 2016 09:41
yeah, can do
Adam Gibson
@agibsonccc
Apr 28 2016 09:44
thanks!
Patrick Skjennum
@Habitats
Apr 28 2016 09:47
i tried with the mergefix branch and got a new exception
Alex Black
@AlexDBlack
Apr 28 2016 09:55
my (new) tests aren't passing on that branch, will push up something soon
also there's a serialization issue I think there too
Patrick Skjennum
@Habitats
Apr 28 2016 09:56
i've assumed there was some serialization issue for a while now
indarrays + spark = bad stuff happening
Alex Black
@AlexDBlack
Apr 28 2016 09:56
btw, spark local? or some other master?
Patrick Skjennum
@Habitats
Apr 28 2016 09:56
local atm
Alex Black
@AlexDBlack
Apr 28 2016 09:57
hm, odd. I've run some local stuff pretty recently
Patrick Skjennum
@Habitats
Apr 28 2016 09:57
yeah examples work fine, but this doesn't
so maybe it's scala?
Alex Black
@AlexDBlack
Apr 28 2016 09:57
and afaik there's a lot less serialization in spark local
i.e., it's all one JVM, so no need to
Patrick Skjennum
@Habitats
Apr 28 2016 09:57
yeah but my dataset is huge, so there is always some
Alex Black
@AlexDBlack
Apr 28 2016 09:58
that could explain it
anyway, I'll look into it
Patrick Skjennum
@Habitats
Apr 28 2016 09:58
there's a lot of shuffling, and it caches like crazy to disk
Paul Dubs
@treo
Apr 28 2016 10:08
@raver119 if you want me to test anything, just ping me :)
raver119
@raver119
Apr 28 2016 10:08
yea, will poke you later today
blockwise stuff should greatly improve two major bottlenecks
Paul Dubs
@treo
Apr 28 2016 10:09
I hope I'm done training my word vectors by then :D
raver119
@raver119
Apr 28 2016 10:10
well, at least tests are passing on the prototype, and it works on blocks instead of threads. and it does not use a single byte of global memory for anything but data
so that definitely should help
Paul Dubs
@treo
Apr 28 2016 10:10
:)
Patrick Skjennum
@Habitats
Apr 28 2016 10:11
btw @treo any idea of how i can use that squashed document vector of mine as features for Naive Bayes? it only accepts positive values, but w2v has values all over the place
tried taking the abs and adding 1 and dividing by 2, but that didn't work at all
Paul Dubs
@treo
Apr 28 2016 10:12
How exactly have you squashed it?
Patrick Skjennum
@Habitats
Apr 28 2016 10:12
v.reduce(.add()).div(length)
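(In nd4j terms that squash is just a mean over the word vectors; a sketch, with the words stacked as rows of a matrix:)
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    INDArray words = Nd4j.rand(42, 1000);  // 42 words x 1000 features (dummy data)
    INDArray doc = words.mean(0);          // column-wise mean -> 1 x 1000 document vector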
Paul Dubs
@treo
Apr 28 2016 10:13
what are your most extreme values?
Patrick Skjennum
@Habitats
Apr 28 2016 10:15
not really sure how to interpret the vector values of my w2v vectors
Paul Dubs
@treo
Apr 28 2016 10:16
they are just vectors after all, so you can do whatever you want with them
Patrick Skjennum
@Habitats
Apr 28 2016 10:16
most of them are super tiny, like 0.e-4, but then there's suddenly -4
Paul Dubs
@treo
Apr 28 2016 10:16
you could add 100 to all of them, and they would mean the same thing
that wouldn't work well for nns, but for your bayes, I think it shouldn't be too bad
Patrick Skjennum
@Habitats
Apr 28 2016 10:18
yeah i dunno, atm my bayes is just saying everything is "not anything" :P
regardless of how i manipulate these vectors
Paul Dubs
@treo
Apr 28 2016 10:19
how have you fed it before, for your baseline?
Patrick Skjennum
@Habitats
Apr 28 2016 10:19
for bayes i've been using bag of words and one-hot vectors, based on the underlying entities
but i thought i'd test it with the 1000d vectors as well
but yeah, not working out
that is; i've been using TF-IDF based on entities
so the values have been normalized between 0 and 1
but how do i normalize a -10/+10 range
where i don't really know if -10 is super unimportant, or if 0 is the most unimportant
Paul Dubs
@treo
Apr 28 2016 10:21
neither of the two
on the word vector, its elements symbolize meaning, not importance to the document
so when you sum them, you get a document meaning
Patrick Skjennum
@Habitats
Apr 28 2016 10:22
yeah but with 1000d vector, that implies 1000 features, right?
thought the values indicated which features are important for this document
i guess what i'm saying is that i interpret w2v vectors as i would interpret PCA vectors
but that might be completely wrong
Paul Dubs
@treo
Apr 28 2016 10:24
that may not be too bad an idea: take the pca of them
Patrick Skjennum
@Habitats
Apr 28 2016 10:24
but that also kind of implies my naive bayes won't understand this anyway, as all it cares about is frequencies
Adam Gibson
@agibsonccc
Apr 28 2016 10:24
RBM/AutoEncoder?
If you're going to use neural nets you might as well go whole hog
:P
Alex Black
@AlexDBlack
Apr 28 2016 10:24
@Habitats new merge stuff is up... I'll look into serialization stuff now
Patrick Skjennum
@Habitats
Apr 28 2016 10:25
@agibsonccc i'm using two different baselines for my thesis; naive bayes and a simple feedforward net, and i'm benching them against an LSTM
but i'm having trouble making all of the methods understand the same representations of my data:P
@AlexDBlack neat, i'll check right away
@AlexDBlack which branch exactly?
Alex Black
@AlexDBlack
Apr 28 2016 10:27
same one
mergetest
Patrick Skjennum
@Habitats
Apr 28 2016 10:27
i don't see any new commits
Paul Dubs
@treo
Apr 28 2016 10:27
you can see the last pushes on the right here :)
Patrick Skjennum
@Habitats
Apr 28 2016 10:27
nvm i'm retarded
Paul Dubs
@treo
Apr 28 2016 10:27
:P
Patrick Skjennum
@Habitats
Apr 28 2016 10:27
was looking at pull requests, not commits
Patrick Skjennum
@Habitats
Apr 28 2016 10:37
btw i'm experiencing something really weird with the dl4j repos on git
it constantly says i changed a bunch of files but i changed nothing, and reset --hard HEAD and git stash don't work
and it's the same for all dl4j repos, but never had this issue with any other repo
Alex Black
@AlexDBlack
Apr 28 2016 10:38
could be intellij perhaps?
or are they .java files?
Patrick Skjennum
@Habitats
Apr 28 2016 10:38
java and .so
Alex Black
@AlexDBlack
Apr 28 2016 10:39
hm... only thoughts are file permissions or maybe file/character encoding?
Patrick Skjennum
@Habitats
Apr 28 2016 10:40
yeah, probably something like that
and git diff shows no changes
but it does show that all files changed
so now the only way i can pull is to clone the repo:P
Paul Dubs
@treo
Apr 28 2016 10:49
Are you putting this on a fat32 formatted drive?
Patrick Skjennum
@Habitats
Apr 28 2016 10:49
no, are you crazy:D
NTFS
Paul Dubs
@treo
Apr 28 2016 10:50
then I don't really know what might change your permissions like that
Patrick Skjennum
@Habitats
Apr 28 2016 10:50
never had this issue, and i've cloned a lot of repos
Paul Dubs
@treo
Apr 28 2016 10:51
Sequences checked: [201427296], Current vocabulary size: [4329542]
looks like this will take some time... :D (training w2v on the extracted sentences from the german wikipedia)
Patrick Skjennum
@Habitats
Apr 28 2016 10:51
neat!
raver119
@raver119
Apr 28 2016 10:52
yea, that definitely will take some time :)
Paul Dubs
@treo
Apr 28 2016 10:52
it took me two hours to just extract them from the dump
Patrick Skjennum
@Habitats
Apr 28 2016 10:52
i updated git and the problem seems to be gone:s
@AlexDBlack still crashing, but another new exception:P
i'll clean and rebuild to make sure
Alex Black
@AlexDBlack
Apr 28 2016 11:00
just pushed up some serialization tests... 3 of 4 failing currently :(
Patrick Skjennum
@Habitats
Apr 28 2016 11:01
yeah, definitely failing
this seems like my original error
Alex Black
@AlexDBlack
Apr 28 2016 11:01
yeah, that's a new check/error message I added
Patrick Skjennum
@Habitats
Apr 28 2016 11:01
INDArrays are simply messed up
Alex Black
@AlexDBlack
Apr 28 2016 11:01
same underlying cause
anyway, I'll let you know when I have something there
Patrick Skjennum
@Habitats
Apr 28 2016 11:05
awesome
why are the dl4j and nd4j repos so crazy huge btw? 400 mb for a repo is pretty overkill
Paul Dubs
@treo
Apr 28 2016 11:10
history
Patrick Skjennum
@Habitats
Apr 28 2016 11:10
of course
that makes sense
Paul Dubs
@treo
Apr 28 2016 11:15
you mean in order to transform it to something that your bayes takes?
Patrick Skjennum
@Habitats
Apr 28 2016 11:15
yeah
Paul Dubs
@treo
Apr 28 2016 11:16
may work, may not work :D
Patrick Skjennum
@Habitats
Apr 28 2016 11:16
:P
yeah i don't know
Patrick Skjennum
@Habitats
Apr 28 2016 11:23
i have a feeling the values follow a kind of logarithmic scale though
which might be the reason it just sucks
Patrick Skjennum
@Habitats
Apr 28 2016 12:51
@treo yeah that didn't work at all either. Fscore of 0.001 :P
Paul Dubs
@treo
Apr 28 2016 12:52
put a sigmoid on them (will also map it from 0 to 1) next :D (it probably will not work as well, but it may be fun to do anyway)
Patrick Skjennum
@Habitats
Apr 28 2016 12:53
i'm not entirely sure if your definition of fun matches mine
Patrick Skjennum
@Habitats
Apr 28 2016 12:54
oh dear, df
Paul Dubs
@treo
Apr 28 2016 12:54
blob
Patrick Skjennum
@Habitats
Apr 28 2016 12:55
hahahaha
Paul Dubs
@treo
Apr 28 2016 12:56
But really, it is simple to put a sigmoid on them
Transforms.sigmoid
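(i.e., a one-liner sketch, reusing the doc vector from above:)
    INDArray squashed = Transforms.sigmoid(doc);  // maps every element into (0, 1)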
Patrick Skjennum
@Habitats
Apr 28 2016 12:56
it can't be worse than this
blob
good example of why accuracy is a worthless metric most of the time:P
@treo sigmoid made no difference
Paul Dubs
@treo
Apr 28 2016 13:00
as expected :D
Patrick Skjennum
@Habitats
Apr 28 2016 13:01
blob
all of my document vectors look like this
basically
after that normalization i mentioned
holy shit, i did Transforms.round() on it
and that worked
haha wtf
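(For reference, that rounding step is just the following sketch, where normalized is the rescaled-to-[0,1] doc vector:)
    INDArray binary = Transforms.round(normalized);  // < 0.5 -> 0.0, >= 0.5 -> 1.0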
Paul Dubs
@treo
Apr 28 2016 13:02
wait a second... You reduce your word vectors to a single number?
Patrick Skjennum
@Habitats
Apr 28 2016 13:02
no, that's 1 vector
with 1000 values
just a scatter of all of the values in one vector
Paul Dubs
@treo
Apr 28 2016 13:02
the picture is one vector?
Patrick Skjennum
@Habitats
Apr 28 2016 13:03
yes
so i rounded it
Paul Dubs
@treo
Apr 28 2016 13:03
and how does it look now? (and your results?)
Patrick Skjennum
@Habitats
Apr 28 2016 13:03
making that vector turn into a one-hot encoded one
Paul Dubs
@treo
Apr 28 2016 13:04
i.e. features > 0.5 are now 1 and everything else is 0?
Patrick Skjennum
@Habitats
Apr 28 2016 13:04
basically
i mean, the results aren't awesome by any measure
blob
but comparing that to the other one
Paul Dubs
@treo
Apr 28 2016 13:05
Does your naive bayes expect one hot encoded vectors?
Patrick Skjennum
@Habitats
Apr 28 2016 13:06
well it's not like i hardcoded it
but no, it shouldn't expect that
it expects frequencies, but anything should work
i've been giving it TF-IDF vectors before
Paul Dubs
@treo
Apr 28 2016 13:08
that may explain why it doesn't work with non-rounded input: everything seems to be more or less equally likely
You maybe should create a t-sne visualisation of your document vectors
maybe k-nn will work way better for you
Patrick Skjennum
@Habitats
Apr 28 2016 13:10
doesn't really matter if anything else would work better. this is the baseline i chose, so i have to stick to it:P
Paul Dubs
@treo
Apr 28 2016 13:10
anyway, create the t-sne visualisation, it is a fine thing to include in your thesis
Patrick Skjennum
@Habitats
Apr 28 2016 13:11
hmm
yeah, is that super easy or what
i tried it once and everything crashed, but that was 6 months ago
Paul Dubs
@treo
Apr 28 2016 13:11
it should be very easy, there is an example for it
Patrick Skjennum
@Habitats
Apr 28 2016 13:11
i'll look into it
not sure how to interpret it though
i know what it does, but uncertain about what it would do in this context
Paul Dubs
@treo
Apr 28 2016 13:12
if my intuition is correct, you should have clusters for your news categories
and the result (as in the examples) is simply a csv file, with x, y, and label
Alex Black
@AlexDBlack
Apr 28 2016 13:17
just merged some fixes for serialization
Paul Dubs
@treo
Apr 28 2016 13:18
great :+1:
Alex Black
@AlexDBlack
Apr 28 2016 13:18
@Habitats could you pull nd4j master and let me know if that's fixed things?
Patrick Skjennum
@Habitats
Apr 28 2016 13:24
oh master
right, checked wrong thing
@treo is there a gui for this thing, or do i have to do it myself
Paul Dubs
@treo
Apr 28 2016 13:26
@Habitats just do a scatter plot with the color being determined by the label
Patrick Skjennum
@Habitats
Apr 28 2016 13:26
yeah i just wondered if i had to do it myself:P
Patrick Skjennum
@Habitats
Apr 28 2016 13:40
@treo yeah i don't understand the tsne API apparently
Paul Dubs
@treo
Apr 28 2016 13:41
There is an example how to use it in the examples
Patrick Skjennum
@Habitats
Apr 28 2016 13:41
yeah i know, i'm looking at it
ah i get it, the matrix is just a concatenated set of vectors
and i use doc vector, label pairs
Paul Dubs
@treo
Apr 28 2016 13:45
all you have to do, is to just vstack your doc vectors and collect your labels into a list
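(A sketch of that prep; Document/documents and the accessors are placeholders for your own types:)
    import java.util.ArrayList;
    import java.util.List;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    List<String> labels = new ArrayList<>();
    List<INDArray> rows = new ArrayList<>();
    for (Document d : documents) {
        rows.add(d.vector());   // each a 1 x 1000 row
        labels.add(d.label());
    }
    INDArray matrix = Nd4j.vstack(rows.toArray(new INDArray[0])); // n x 1000
    // then feed matrix + labels to the tsne, as in the dl4j tsne example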
Patrick Skjennum
@Habitats
Apr 28 2016 13:46
yeah i realized that
Paul Dubs
@treo
Apr 28 2016 13:47
And I'm wondering if there is a way to speed up my word vector learning (using only my single machine), at the current speed it will take about 9 days to go through all of the german wikipedia
raver119
@raver119
Apr 28 2016 13:49
well
the short answer is no, which is something you don't want to hear, right?
show me your w2v config please
Paul Dubs
@treo
Apr 28 2016 13:50
It is the expected answer though
Word2Vec vec = new Word2Vec.Builder()
                .minWordFrequency(5)   // drop words seen fewer than 5 times
                .iterations(1)
                .layerSize(300)        // 300-dimensional vectors
                .seed(42)
                .windowSize(8)
                .iterate(iter)         // iter: the SentenceIterator over the corpus
                .tokenizerFactory(t)   // t: the TokenizerFactory
                .build();
raver119
@raver119
Apr 28 2016 13:51
and you have 4m words in vocab, right?
Paul Dubs
@treo
Apr 28 2016 13:52
right
raver119
@raver119
Apr 28 2016 13:52
well, if you had something like a tesla with 20gb ram onboard, you might be able to go for an all-in-one kernel for w2v
but for consumer gpu that's useless - just not enough ram to fit syn0/syn1
other option is spark.
on a high-end gpu you should probably be able to train w2v blockwise
the only problem there is memory requirements
Paul Dubs
@treo
Apr 28 2016 13:58
You gave me an idea, it is probably bad and a waste of time, but I can't just sit around and wait for days :D
raver119
@raver119
Apr 28 2016 13:59
aaaaand? :)
what's the idea? :)
Paul Dubs
@treo
Apr 28 2016 14:00
An all-in-one kernel that runs on the cpu and fully utilizes the vector units
raver119
@raver119
Apr 28 2016 14:01
heh
waste of time
you have 4 million vectors
each one has a length of 300
plus syn1 with the same params
that's 9.6gb worth of data (4m × 300 floats × 4 bytes ≈ 4.8gb, doubled for syn1)
Paul Dubs
@treo
Apr 28 2016 14:05
so?
raver119
@raver119
Apr 28 2016 14:07
too much sparsity to hope for L1/L2/L3
even knowledge of frequency won't help - too much sparsity :(
wobu
@wobu
Apr 28 2016 14:10
@AlexDBlack deeplearning4j/nd4j#863 seems to be fixed now, thanks :)
Abdullah-Al-Nahid
@lasin02_twitter
Apr 28 2016 14:11
hi, where can I get the "IRIS Classified With a DBN" code?
Patrick Skjennum
@Habitats
Apr 28 2016 14:12
@treo i can't get it to work
it's spamming INFO - Error at iteration 18 is NaN
raver119
@raver119
Apr 28 2016 14:13
hm
treo, please try original TSNE
not the Barnes-Hut one
before getting to nd4j i've checked that, and it was producing +- sensible results
@Habitats sry
Patrick Skjennum
@Habitats
Apr 28 2016 14:16
in the Tsne docs it says DECOMPOSED VERSION, DO NOT USE IT EVER @raver119
raver119
@raver119
Apr 28 2016 14:16
oops
that is to be removed :)
Patrick Skjennum
@Habitats
Apr 28 2016 14:16
also the example using barnes runs fine
raver119
@raver119
Apr 28 2016 14:17
i've rebuilt TSNE from scratch
hm.
anyway, please just give it a try, and tell me what you'll get there
Alex Black
@AlexDBlack
Apr 28 2016 14:23
@wobu cool. thanks
Paul Dubs
@treo
Apr 28 2016 14:24
@raver119 Would Glove work faster than word2vec?
wobu
@wobu
Apr 28 2016 14:41
@AlexDBlack nice, the bug also doesn't happen on my mesos cluster anymore. Thanks a lot!
raver119
@raver119
Apr 28 2016 14:42
@treo definitely no.
glove is slower
mainly due to precalculation phase
Patrick Skjennum
@Habitats
Apr 28 2016 14:43
my tsne looks like an absolute mess
lol
@treo
blob
@raver119 using plain Tsne ran, but i have no idea whether it actually worked
should i run it on all of my data, or just some, i don't know how to do this really
and how low should the error be? it drops to like 0.6 pretty quick, but then takes another 900 iterations to get to 0.5, but still decreasing
raver119
@raver119
Apr 28 2016 14:48
at least i see some "sport" cluster at the bottom
Patrick Skjennum
@Habitats
Apr 28 2016 14:48
yeah, but everything else is a mess
raver119
@raver119
Apr 28 2016 14:49
and the other overlapping things look like mini-clusters too
Patrick Skjennum
@Habitats
Apr 28 2016 14:49
yeah, overlaps of different things:P
wish there was some jitter option to the labels
could jitter the points i guess
Paul Dubs
@treo
Apr 28 2016 14:51
you could simply not draw the labels, and instead draw colors (and add the meaning of them as a legend)
Patrick Skjennum
@Habitats
Apr 28 2016 14:52
i could, if i knew how
i'm using excel atm:P
Paul Dubs
@treo
Apr 28 2016 15:04
Don't know how to do that in excel :D
Patrick Skjennum
@Habitats
Apr 28 2016 15:08
yeah i'll have to create a different format than the csv stuff
Justin Long
@crockpotveggies
Apr 28 2016 15:35
Causing: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.OutOfMemoryError: Cannot allocate 2227282722 bytes
I assume this is easily resolved using Spark settings on spark local using spark.executor.memory=4g?
it originated from ND4J, but that seems to be irrelevant here
raver119
@raver119
Apr 28 2016 15:36
hm, pretty strange, for 3.9 all allocations are bypassing java
or at least should bypass
Paul Dubs
@treo
Apr 28 2016 15:37
what is the full stack trace?
Justin Long
@crockpotveggies
Apr 28 2016 15:39
pasting it into a Gist one sec
Paul Dubs
@treo
Apr 28 2016 15:41
so it is thrown by javacpp, not the jvm
Justin Long
@crockpotveggies
Apr 28 2016 15:41
ah ya just got it again. I added sparkConf.set("spark.executor.memory","4g") with no dice
raver119
@raver119
Apr 28 2016 15:41
how much memory does your node have?
Justin Long
@crockpotveggies
Apr 28 2016 15:42
my local laptop, 8GB
raver119
@raver119
Apr 28 2016 15:42
i mean, are there any chances there's really oom?
Patrick Skjennum
@Habitats
Apr 28 2016 15:42
@treo so apparently i'm an excel wizard now
blob
it's still a giant mess though:D
Justin Long
@crockpotveggies
Apr 28 2016 15:43
raver119
@raver119
Apr 28 2016 15:43
oom = out of memory
Paul Dubs
@treo
Apr 28 2016 15:43
haha :D
I vote for calling memory "mana" from now on :D
raver119
@raver119
Apr 28 2016 15:43
however, pretty similar to everquest/wow definition
Justin Long
@crockpotveggies
Apr 28 2016 15:43
haha doubtful on memory, I've run this neural net before in 3.8 no issues
p.s. I have a feeling the devs of those games called it oom for a reason :P
Paul Dubs
@treo
Apr 28 2016 15:46
@Habitats you are using the pretrained google word vectors, right? And you usually have 5 to 15 entities per article?
Patrick Skjennum
@Habitats
Apr 28 2016 15:46
yeah
i have no idea how to use this tsne stuff though
should i use it on all of my training data? (that would take hours)
just doing samples atm
Paul Dubs
@treo
Apr 28 2016 15:48
try something that will take about half an hour to an hour :)
Patrick Skjennum
@Habitats
Apr 28 2016 15:48
trying 2k articles now, the image i showed was only trained on 300
but the plot is amazing though. my supervisor is going to love me.
Paul Dubs
@treo
Apr 28 2016 15:48
:D
Justin Long
@crockpotveggies
Apr 28 2016 15:52
Screen Shot 2016-04-28 at 8.52.31 AM.png
Screen Shot 2016-04-28 at 8.51.50 AM.png
Screen Shot 2016-04-28 at 8.51.58 AM.png
@raver119 here's my memory profile for:
1) before gradle run
2) after gradle run
3) at exact moment of memory allocation error
Paul Dubs
@treo
Apr 28 2016 15:54
can you tell us what the result of Runtime.getRuntime().maxMemory() is?
raver119
@raver119
Apr 28 2016 15:54
so, that definitely looks like oom
however still a question: where exactly all those gigabytes were used
Justin Long
@crockpotveggies
Apr 28 2016 15:55
@treo going to run that now
Paul Dubs
@treo
Apr 28 2016 15:56
it is thrown because DeallocatorReference.totalBytes + r.bytes > maxBytes, and maxBytes is Runtime.getRuntime().maxMemory(). And DeallocatorReference.totalBytes + r.bytes = 2227282722 (from the exception)
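(So the first thing to check is the JVM heap cap that javacpp inherits; a sketch, the flag values are just examples:)
    System.out.println(Runtime.getRuntime().maxMemory()); // javacpp's allocation cap derives from this
    // raise it at launch, e.g. java -Xmx6g ...
    // (for gradle's application plugin: applicationDefaultJvmArgs = ['-Xmx6g'])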
raver119
@raver119
Apr 28 2016 15:56
@treo yea but he physically had 7gb used out of 8
Justin Long
@crockpotveggies
Apr 28 2016 15:57
actually @treo is on to something here, output from .maxMemory() is 1908932608
weird
I've had little issue running this before, why now?
so what I've done is closed everything in the background and reduced used memory to 5GB
Screen Shot 2016-04-28 at 8.59.39 AM.png
that should've been enough to run without issue, so it appears to be a config problem, likely Gradle
Justin Long
@crockpotveggies
Apr 28 2016 16:14
@raver119 okay got it working for a much longer period until the greedy thing came back and asked for 4575158770 bytes...is it so common for ND4J to constantly come back asking for more?
it's like feeding the stray cat in the neighborhood
Paul Dubs
@treo
Apr 28 2016 16:17
it really depends on what you are doing
it tries to allocate an array of 4.5gb in size, that's quite a lot
also 8 gb is quite little ram :D
Patrick Skjennum
@Habitats
Apr 28 2016 16:18
@raver119 i still get NaN if i train with many examples. no problems when i use like 5k
Justin Long
@crockpotveggies
Apr 28 2016 16:20
@treo yea I get it's not exactly the biggest of RAM, I'm a little bewildered that I was able to run it before in the same environment but now it complains. my only guess is that Spark is adding a lot of overhead and I'll need to shrink my dataset for testing purposes
Paul Dubs
@treo
Apr 28 2016 16:21
the strange thing is that you say that it worked fine with 3.8, which in general used a lot more memory
Justin Long
@crockpotveggies
Apr 28 2016 16:21
must be Spark related
which is fine, I have the capability to 1) shrink the dataset or 2) run it on bigger hardware
by the way got this, but I think this is because I allocated too much memory:
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGFPE (0x8) at pc=0x000000012a82c8ba, pid=32002, tid=76291
#
# JRE version: Java(TM) SE Runtime Environment (8.0_65-b17) (build 1.8.0_65-b17)
no stacktrace
Paul Dubs
@treo
Apr 28 2016 16:23
The SIGFPE signal is sent to a process when it executes an erroneous arithmetic operation, such as division by zero (the name "FPE", standing for floating-point exception, is a misnomer as the signal covers integer-arithmetic errors as well).
Justin Long
@crockpotveggies
Apr 28 2016 16:25
not sure if that's my doing or a bug?
raver119
@raver119
Apr 28 2016 16:26
@crockpotveggies file an issue please.
Justin Long
@crockpotveggies
Apr 28 2016 16:26
will do
raver119
@raver119
Apr 28 2016 16:26
there's 0 excuse for a serious java framework to dump cores here and there
regardless who's responsible for that
Justin Long
@crockpotveggies
Apr 28 2016 16:31
done deeplearning4j/nd4j#869
Patrick Skjennum
@Habitats
Apr 28 2016 16:43
@treo should i care about altering the learning rate for tsne?
or anything else for that matter
Paul Dubs
@treo
Apr 28 2016 16:44
Not sure really, I haven't used it for much
Patrick Skjennum
@Habitats
Apr 28 2016 16:45
ended up with this
blob
with a pretty low error
Paul Dubs
@treo
Apr 28 2016 16:46
so sport does cluster pretty well
Patrick Skjennum
@Habitats
Apr 28 2016 16:46
it's pretty much in sync with my results though:P the values that are all over the place are the worst performing ones
but there's an issue with overlapping dots though
as you can see there're no yellow ones visible
Paul Dubs
@treo
Apr 28 2016 16:48
you could try to draw them up front, but it looks pretty much like they are all overlaid by something else
Patrick Skjennum
@Habitats
Apr 28 2016 16:48
ah wait no, it's me being retarded again
Paul Dubs
@treo
Apr 28 2016 16:48
:D
Patrick Skjennum
@Habitats
Apr 28 2016 16:51
i had messed up the labels
blob
this one's neater
Paul Dubs
@treo
Apr 28 2016 16:51
quite pretty :)
Patrick Skjennum
@Habitats
Apr 28 2016 16:51
probably the coolest thing i created this week, at least
Paul Dubs
@treo
Apr 28 2016 16:52
can you remove society from it?
Patrick Skjennum
@Habitats
Apr 28 2016 16:52
that was interesting
blob
Paul Dubs
@treo
Apr 28 2016 16:53
so you have some stacking
Patrick Skjennum
@Habitats
Apr 28 2016 16:53
yeah i do
so, @treo what kind of insight does such a plot actually yield?
i know it's topology based
but how does it relate to deep learning?
Valerio Zamboni
@vzamboni
Apr 28 2016 16:56
I just built the latest version of the nd4j master branch to overcome the issue 'Unable to get number of of columns for a non 2d matrix' when using dl4j-spark-ml, but now I'm facing another exception when I try to do next() on a DataSetIterator:
Exception in thread "main" java.lang.UnsupportedOperationException
at org.canova.api.writable.ArrayWritable.toInt(ArrayWritable.java:47)
at org.deeplearning4j.datasets.canova.RecordReaderDataSetIterator.getDataSet(RecordReaderDataSetIterator.java:236)
at org.deeplearning4j.datasets.canova.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:170)
at org.deeplearning4j.datasets.canova.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:335)
at org.deeplearning4j.datasets.canova.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:47)
Paul Dubs
@treo
Apr 28 2016 16:56
it yields the insight that my intuition was right for sport and arts - and that simply squashing your word vectors together may not be good enough
also that you might have a fun problem :)
Patrick Skjennum
@Habitats
Apr 28 2016 16:58
hmm
right:P
well i'm getting good results with my squashed approach for the feedforward net
Paul Dubs
@treo
Apr 28 2016 16:58
good = ?
Patrick Skjennum
@Habitats
Apr 28 2016 16:59
comparable to those i got with the rnn
but of course, with a ton of more data
@treo how could i create a similar plot with word vectors instead?
Patrick Skjennum
@Habitats
Apr 28 2016 17:04
just concat all word vectors instead of squashing first:P?
Paul Dubs
@treo
Apr 28 2016 17:08
you could hstack them
oh, you mean just word vectors?
Patrick Skjennum
@Habitats
Apr 28 2016 17:08
yeah
word vector -> label
pairs?
i'm trying it
Paul Dubs
@treo
Apr 28 2016 17:09
get your lookup table, and tell it to plotVocab :)
Patrick Skjennum
@Habitats
Apr 28 2016 17:10
huh
Patrick Skjennum
@Habitats
Apr 28 2016 17:25
@treo just stacked them like i stacked doc vectors and i got this
blob
i assume that was a bad idea
Paul Dubs
@treo
Apr 28 2016 17:26
How did you decide which label to apply to a word?
Patrick Skjennum
@Habitats
Apr 28 2016 17:28
well i have articles and vectors, so i just used the article's label(s)
there are probably better ways to do it
Paul Dubs
@treo
Apr 28 2016 17:28
so each word is in there with several labels?
Patrick Skjennum
@Habitats
Apr 28 2016 17:28
like taking the majority or something
ye
i could max by label, though
max by label count, that is
Paul Dubs
@treo
Apr 28 2016 17:29
you could try that
you could also wordVectors.lookupTable().plotVocab(wordVectors.vocab().numWords(), Paths.get("c:/words.csv").toFile());
(which also crashes the jvm for me :D)
Patrick Skjennum
@Habitats
Apr 28 2016 17:31
i don't know what that does
Paul Dubs
@treo
Apr 28 2016 17:32
that creates the same tsne file, but for all of your words
so you can see how your words cluster
Patrick Skjennum
@Habitats
Apr 28 2016 17:33
i still don't get it
how is it different from what i'm doing
Paul Dubs
@treo
Apr 28 2016 17:33
instead of word vector -> label, you will get word vector -> word
Patrick Skjennum
@Habitats
Apr 28 2016 17:34
oh
Paul Dubs
@treo
Apr 28 2016 17:34
the difference is that each word vector is given to tsne only once
and that you don't have labels but actual words
Patrick Skjennum
@Habitats
Apr 28 2016 17:34
that would make an ugly scatter
Paul Dubs
@treo
Apr 28 2016 17:35
you could also try a somewhat different thing...
hstack your sequences, so you get one really long vector (pad it with zeros, if needed) and then try to tsne it again, that way you have something that even respects the order
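(A sketch of that pad-and-hstack; maxWords, dim, words and lookup() are assumptions:)
    import java.util.ArrayList;
    import java.util.List;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    int maxWords = 50, dim = 300;
    List<INDArray> parts = new ArrayList<>();
    for (int i = 0; i < maxWords; i++) {
        // real word vector while words remain, zero-padding afterwards
        parts.add(i < words.size() ? lookup(words.get(i)) : Nd4j.zeros(1, dim));
    }
    INDArray flat = Nd4j.hstack(parts.toArray(new INDArray[0])); // 1 x (maxWords * dim)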
Patrick Skjennum
@Habitats
Apr 28 2016 17:39
yeah
but it would disregard similarity in the word vectors wouldn't it
but yeah, i suppose it would represent the features i'm actually using in the RNN though
Paul Dubs
@treo
Apr 28 2016 17:40
doesn't hurt to try
Paul Dubs
@treo
Apr 28 2016 17:51
hmm.. performance isn't as good on cpu as it used to be...:
blob
Justin Long
@crockpotveggies
Apr 28 2016 17:56
Regarding deeplearning4j/nd4j#869 I've set up a standalone early stopping trainer without spark to see if I still encounter that fatal java error
Screen Shot 2016-04-28 at 10.56.12 AM.png
it's "doing stuff" but score vs. iteration is NaN
Paul Dubs
@treo
Apr 28 2016 18:00
do you have regularisation on it?
Justin Long
@crockpotveggies
Apr 28 2016 18:02
new NeuralNetConfiguration.Builder()
      .seed(seed)
      .iterations(iterations)
      .activation("relu")
      .weightInit(WeightInit.XAVIER)
      .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
      .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
      .learningRate(0.00005)
      .momentum(0.9)
      .regularization(true)
looks like I do, yes
Paul Dubs
@treo
Apr 28 2016 18:02
disable it, and see if it is still NaN, if it isn't you should update :)
raver119
@raver119
Apr 28 2016 18:03
@treo can i have hot spots on that thing using yourkit?
there’s probably some optimizations possible for new arch
Paul Dubs
@treo
Apr 28 2016 18:06
blob
Justin Long
@crockpotveggies
Apr 28 2016 18:06
@treo DL4J is already latest on master, and I also rebuilt ND4J as well
raver119
@raver119
Apr 28 2016 18:06
can i have source code of that DocumentVector thing?
Justin Long
@crockpotveggies
Apr 28 2016 18:07
@treo does EarlyStoppingTrainer and specifically DataSetLossCalculator take a while to throw out a score?
Paul Dubs
@treo
Apr 28 2016 18:08
it is still the same old thing, as before... https://gist.github.com/treo/63a406e80e489ec907ace47a17577d1a
@crockpotveggies the DataSetLossCalculator reports a score only after each epoch
raver119
@raver119
Apr 28 2016 18:10
@treo that’s weird
and looks really bad
Paul Dubs
@treo
Apr 28 2016 18:10
also, the NaN with regularisation is supposed to be fixed, so make sure you are really on the most current version
raver119
@raver119
Apr 28 2016 18:11
@treo what’s overall percentage for vector() call? 60%?
Paul Dubs
@treo
Apr 28 2016 18:11
right
60%
Patrick Skjennum
@Habitats
Apr 28 2016 18:11
i still get NaN with big INDarrays + tsne
maybe it's related
Justin Long
@crockpotveggies
Apr 28 2016 18:11
@treo gotcha, let me run this for a while and see if it changes anything
raver119
@raver119
Apr 28 2016 18:12
@treo post issue please.
that’s definitely wrong.
however, i guess i don’t see it on gpu only due to cache
Patrick Skjennum
@Habitats
Apr 28 2016 18:14
should i post issue?
@raver119
Justin Long
@crockpotveggies
Apr 28 2016 18:19
@Habitats how many epochs is your training running for?
@Habitats @treo I also wonder if it's related to that SIGFPE fatal error I was getting while training in Spark
Paul Dubs
@treo
Apr 28 2016 18:23
you mean the NaN thing? may be
Justin Long
@crockpotveggies
Apr 28 2016 18:25
yes indeed. what's different in the Spark environment vs. standalone training?
raver119
@raver119
Apr 28 2016 18:30
@crockpotveggies averaging step
in spark you do stuff in multiple threads for data parallelism
so that multiplies required memory
and adds averaging step
Justin Long
@crockpotveggies
Apr 28 2016 18:32
@raver119 I assume the averaging step has been tested on a single local node? where in the source code can I find where it's executed?
raver119
@raver119
Apr 28 2016 18:32
SparkDl4jMultilayer etc
right in fit method
Justin Long
@crockpotveggies
Apr 28 2016 18:33
thanks going to have a look
raver119
@raver119
Apr 28 2016 18:34
it’s not optimal tbh, and after we finish work on native code - we’ll add a parameter server to optimize massively parallel training
Justin Long
@crockpotveggies
Apr 28 2016 18:38
@raver119 looking at my spark config, I'm averaging each iteration so I'm a little doubtful that the averaging step is ultimately the cause of the SIGFPE error
sparkConf.set(SparkDl4jMultiLayer.AVERAGE_EACH_ITERATION, String.valueOf(true))
Screen Shot 2016-04-28 at 11.40.43 AM.png
yea I ran training for a considerable amount of time, still seeing NaN
regularization was commented out, and I'm running a second training session with regularization(false) and still seeing the same thing
Justin Long
@crockpotveggies
Apr 28 2016 18:44
let me do a hard GIT reset and see if that fixes the problem before I file an issue
Justin Long
@crockpotveggies
Apr 28 2016 18:56
even after a hard GIT reset that appeared to pick up on a couple changes, I'm still getting that NaN when regularization is set to both true and false
I will file an issue
Justin Long
@crockpotveggies
Apr 28 2016 19:03
@Habitats is this the same thing you were getting? deeplearning4j/deeplearning4j#1462
Paul Dubs
@treo
Apr 28 2016 19:26
if you are still getting NaNs even without regularisation, it isn't the issue that was fixed that you are seeing
Justin Long
@crockpotveggies
Apr 28 2016 19:41
gotcha, I'm going to put a basic ScoreListener in there and see if I get anything back that's meaningful
it could be a UI issue, which in that case is easily solvable (I hope) and I can jump on it
Justin Long
@crockpotveggies
Apr 28 2016 19:53
yea everything coming out of the ScoreIterationListener is the same...
o.d.o.l.ScoreIterationListener - Score at iteration 0 is NaN
o.d.o.l.ScoreIterationListener - Score at iteration 1 is NaN
Cryddit
@Cryddit
Apr 28 2016 19:55
Well, crap. Discovered we won't be allowed to use dl4j for real projects. Too many dependencies mostly in support of features we wouldn't be using == too much instability risk for not enough value as seen by SQA. We're allowed to use it for proofs-of-concept, but not in production. So it's back to wheel reinvention for us.
ChrisN
@chrisvnicholson
Apr 28 2016 19:58
sorry to hear that!
raver119
@raver119
Apr 28 2016 19:59
@Cryddit just for curiosity: what kind of dependencies are excessive for you?
Cryddit
@Cryddit
Apr 28 2016 20:05
Mostly the requirement to go to bleeding-edge newest versions for everything starting with Maven and dev environments; they are seen as least stable, and their commitment to supporting each other is in question because "importing" Maven projects with the absolute newest kind of pom.xml doesn't yet work even in the newest versions - you have to let mvn do it instead of trusting the other tools. Also the requirement of JRE/JDK that are not yet distributed with stable distros.
Cryddit
@Cryddit
Apr 28 2016 20:11
Also lots of push-back on importing from sixteen different development groups via Maven Central.
Most of which they know nothing about.
Cryddit
@Cryddit
Apr 28 2016 20:21
I'll try to contribute personally, but... I'm going to spend most of my hours working on a C++ framework whose dependencies are basically the language-standard libraries.
Dipendra K. Misra
@dkmisra
Apr 28 2016 20:43
Have people ever seen GC issues due to use of .getDouble(new int[]{i, j})
I am calling it several times (100,000s) and after profiling, I found that GC is spending lots of time getting rid of these temporary int arrays. There is an overloaded .getDouble that does not require creating an array; it's just one of those things that you think is innocent.
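(i.e. the difference between the two calls, as a sketch:)
    double a = arr.getDouble(new int[] {i, j}); // allocates a temp int[2] every call -> GC pressure in hot loops
    double b = arr.getDouble(i, j);             // overload for the 2d case: no allocation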
Adam Gibson
@agibsonccc
Apr 28 2016 20:44
@Cryddit to be fair that's why my company exists ;) Hadoop looks exactly the same. Most companies don't use one-off open source projects on github either. They either roll their own or buy support for production. Banks (about the most reserved tech thing in the country) buy support for a "distro" for a reason. If you guys are mostly c++ I'd get it. Most jvm based stacks usually have support associated with them (which is why it's mostly java out there in enterprise land). C++ shops tend to roll their "everything" with a one-off stack for "everything". It'd make sense there
@dkmisra file an issue
Dipendra K. Misra
@dkmisra
Apr 28 2016 20:45
I don't think it's an issue tbh
cause you can have high order tensors
Adam Gibson
@agibsonccc
Apr 28 2016 20:45
it's still worth noting though..
Dipendra K. Misra
@dkmisra
Apr 28 2016 20:45
but just wanted to bring to light that these little things can slow one down
yeah
okay, if you say so. I will post it as an issue.
Adam Gibson
@agibsonccc
Apr 28 2016 20:45
file an issue in the docs part?
I mean it can be anything really
Dipendra K. Misra
@dkmisra
Apr 28 2016 20:46
Sure, that makes more sense.
Patrick Skjennum
@Habitats
Apr 28 2016 21:48
@crockpotveggies i'm getting NaN when training tsne with more than 5k examples (1000d features)
spark still doesn't work for me. i get no exceptions, but it just hands after a while
Justin Long
@crockpotveggies
Apr 28 2016 21:54
@Habitats "hands"?
hangs?
are you deploying it to Google Cloud?
Patrick Skjennum
@Habitats
Apr 28 2016 21:56
hangs, yes
no i'm using local
Justin Long
@crockpotveggies
Apr 28 2016 22:01
hmm interesting, I thought maybe it would hang on a Google node if it ran into a fatal error but I doubt it
which JVM? 1.7 or 8?
Patrick Skjennum
@Habitats
Apr 28 2016 22:02
7
Justin Long
@crockpotveggies
Apr 28 2016 22:02
on that thought let me try downgrading my JVM and see if it makes a difference, one sec
Patrick Skjennum
@Habitats
Apr 28 2016 22:03
it's no rush for me though. i don't need spark to work atm
it works for you?
Drew Shaw
@ds923y
Apr 28 2016 22:22
I have a problem building on MacOSX for 0.4-rc3.9-SNAPSHOT and I have followed the MacOSX specific instructions for libnd4j. I followed this step by building nd4j as told in the instructions for libnd4j. I then rebuilt deeplearning4j for good measure. I cloned a fresh copy of the dl4j-examples, did a 'mvn package' and ran GravesLSTMCharModellingExample on the shaded jar. I ended up with the following exception referencing something that sounds like intel's Math Kernel Library. Caused by: java.lang.UnsatisfiedLinkError: /private/var/folders/tv/klm99r2s62x6kh28p956g6xw0000gn/T/javacpp180325733169764/libjniNativeOps.dylib: dlopen(/private/var/folders/tv/klm99r2s62x6kh28p956g6xw0000gn/T/javacpp180325733169764/libjniNativeOps.dylib, 1): Library not loaded: libmkl_rt.dylib . Does anyone have a clue about a step I have missed?
I have Apache Maven 3.3.9, Java version: 1.8.0_45, vendor: Oracle Corporation, OS name: "mac os x", version: "10.11.4", arch: "x86_64", family: "mac".
Patrick Skjennum
@Habitats
Apr 28 2016 22:34
echo $LIBND4J_HOME prints out the correct path for libnd4j?
Drew Shaw
@ds923y
Apr 28 2016 22:36
The instructions have the j in lower case
let me try your way
Drew Shaw
@ds923y
Apr 28 2016 23:01
I ran setup.sh to build deeplearning4j. Is this bad?
Patrick Skjennum
@Habitats
Apr 28 2016 23:03
you really only need to mvn clean install -DskipTests -Dmaven.javadoc.skip=true
Drew Shaw
@ds923y
Apr 28 2016 23:07
setup.sh definitely overrides mvn clean install -X -DskipTests -Dmaven.javadoc.skip=true -pl '!org.nd4j:nd4j-cuda-7.5'
Justin Long
@crockpotveggies
Apr 28 2016 23:07
well this is interesting. I downgraded to target JVM 1.7 and I am no longer seeing that SIGFPE, just more out of memory errors
@Habitats although it's too early to say if JVM downgrade helped, it did keep the Spark training going longer until an eventual OOM
Drew Shaw
@ds923y
Apr 28 2016 23:14
Isn't it recommended that spark have 32gb of ram
Justin Long
@crockpotveggies
Apr 28 2016 23:17
yea I set up a Spark local just so I can test all the basics of getting things up and running
Drew Shaw
@ds923y
Apr 28 2016 23:19
I don't know of any scala compatible with 1.8
Are you using the right scala version for the distribution of spark you have.
Drew Shaw
@ds923y
Apr 28 2016 23:26
Scala version 2.10.x is compatible with spark 1.6.1
Justin Long
@crockpotveggies
Apr 28 2016 23:28
there's a distro for Spark cross compiled for scala 2.11
I moved all scala versions to 2.10 and set JVM target to 1.7 and the SIGFPE error went away so it looks fine now
just need to get the Score fixed so it's no longer NaN
and I'll throw this on my spark cluster
Drew Shaw
@ds923y
Apr 28 2016 23:34
I have not heard if it was confirmed that deeplearning4j works with cuda at the moment. Although I have not used spark with deeplearning4j, this usually happens to me when my hyperparameters are bad.
Does anyone know if the deeplearning4j examples are working 0.4-rc3.9-SNAPSHOT on MacOSX at the moment?
Justin Long
@crockpotveggies
Apr 28 2016 23:39
I had the same net model working on 3.8 and scores weren't an issue
I just tested it to be sure. I can't imagine changing learning rate and other parameters would make a difference
Drew Shaw
@ds923y
Apr 28 2016 23:43
I can't even get 0.4-rc3.9-SNAPSHOT working. Since it is not a release it is probably more for making contributions to deeplearning4j or experimenting with bleeding edge features.
Adam Gibson
@agibsonccc
Apr 28 2016 23:44
@ds923y I would just wait till the release on the 16th
If you actually want to test you have to compile from source
Drew Shaw
@ds923y
Apr 28 2016 23:48
It looks like there are some simple bugs out there to fix. I am interested in contributing. 0.4-rc3.8 is fine for my purposes. I am compiling rc3.9 from source. libnd4j, nd4j, canova, deeplearning4j
Adam Gibson
@agibsonccc
Apr 28 2016 23:50
@ds923y then start by compiling libnd4j from source
The examples work fine with cpu atm
And cuda more or less is there
We are just profiling at this point
You also need javacpp
Drew Shaw
@ds923y
Apr 28 2016 23:51
There is a dependency problem where libmkl_rt.dylib is not found.
You can use openblas then
Actually - which OS is this?
That doesn't make sense
Drew Shaw
@ds923y
Apr 28 2016 23:52
OSX
Adam Gibson
@agibsonccc
Apr 28 2016 23:52
You're using mkl on osx?
Drew Shaw
@ds923y
Apr 28 2016 23:52
Not that I know of.
Adam Gibson
@agibsonccc
Apr 28 2016 23:52
I'll need more info than that..usually it finds veclib
Yeah we have 2 osx users
Neither of them have bumped in to that
It's trying to find mkl when osx has veclib
File an issue with a full build log but no one has run into that
Drew Shaw
@ds923y
Apr 28 2016 23:54
If I build javacpp as well and I get the issue I will file it.
I put a more detailed description above.
The only thing I know of on my system that has these sets of libraries is my anaconda python distribution. I don't know how the build system would find it though in my home directory where anaconda installs.
Adam Gibson
@agibsonccc
Apr 28 2016 23:58
uh..huh
so anaconda messes with it
that doesn't surprise me..
Drew Shaw
@ds923y
Apr 28 2016 23:58
I don't know
Adam Gibson
@agibsonccc
Apr 28 2016 23:58
no it does
File an issue mentioning that and we can try to accommodate that being installed