Data Mining Machine Learning AI Artificial Intelligence
Keith Aumiller
@keithaumiller
CNNs are useful when you have a giant dataset and you need to automate the classification.
i.e. a hand-maintained mapping system
dog = picture A of dog
breaks down if someone posts a dog walking in the park next to a black cat
A CNN, in theory, could learn any number of mappings automatically for any number of images, and would grow and change along with the content and inputs that people use.
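A minimal sketch of that idea in Keras (assuming TensorFlow 2.x; the image size, layer sizes, and number of classes are illustrative, not from the discussion):

```python
# Sketch only: a tiny image classifier in Keras (TensorFlow 2.x assumed).
# Image size, layer sizes, and num_classes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 10  # hypothetical number of labels in the mapping

model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(num_classes, activation="softmax"),  # one probability per label
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(images, label_ids, epochs=5)  # retrain as new content/labels arrive
```

The image-to-label mapping is learned from examples and can simply be retrained as new content and labels appear, instead of being maintained by hand.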
For the vast majority of systems you still really just need a mapping system
CMS systems aren't going anywhere
They are good for when there is only 1 answer to a question
rawan la
@rawan_la_twitter
thank you @keithaumiller :smile:
Aleem Mohamed Firnas
@AMFIRNAS
Hi guys
Can anyone tell me which tutorial you prefer for quickly doing some practicals related to data mining and ML? I just need to play with some data science related stuff.
This will be my first hands-on experience. I'd like to do this until I'm familiar with the project I'm going to do on campus.
Pavel Surmenok
@surmenok
@AMFIRNAS This course should be good for getting hands on experience with deep learning: http://www.fast.ai/
Hi All! Can someone please help me with an issue? I'm training a recurrent neural network (with GRU) for a classification problem using rmsprop as an optimizer.
Training loss goes down for the first ~1 million examples, but then starts going up again
Why could that be?
Petru-Daniel Tudosiu
@danieltudosiu
The only reason I can think of for something like that is misclassification.
How sure are you that your dataset is right?
And also the model might be too small.
(I am a student, so please take this with a grain of salt.)
Pavel Surmenok
@surmenok
The dataset is probably noisy. If I reduce the dataset size to a few hundred thousand examples, I get training accuracy above 90%. But even if the dataset is noisy, can it lead to training error increasing over time? I thought that if model capacity is large enough, training error should decrease to near 0 (memorizing the training set), and if capacity is too small, it should plateau at some point.
"Understanding Deep Learning Requires Rethinking Generalization" paper shows how neural networks can memorize even random labels
Petru-Daniel Tudosiu
@danieltudosiu
Assuming you reduce it in a random manner, I can only assume the noise is made by a model to throw off other models (GANs, by Ian Goodfellow).
Just dropping in a hello in case I sleep and miss the chat again.
Keith Aumiller
@keithaumiller
Sorry I couldn't make it Friday night guys, I was at the Machine Learning in Finance conference
Feel free to read through my notes and if you have any questions, let me know.
The Goldman Sachs senior data scientist I talked to was a really cool guy
Keith Aumiller
@keithaumiller
Great story about how he went from sleeping in his car, to winning data hackathons in San Fran, to working at GS
rawan la
@rawan_la_twitter
Hi, I want to classify multi-labeled data using deep learning techniques like CNNs without building a separate classifier for each label. When I read about it, they say I should use multiple sigmoid units on the last layer with a binary cross-entropy loss function. I don't actually understand why this would work, and is there a better way to do this?
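For background on why that recipe works: softmax forces the outputs to compete, so exactly one label "wins", whereas a sigmoid per label gives an independent yes/no probability for each label, and binary cross-entropy scores each output separately. A minimal sketch of such a head (assuming TensorFlow/Keras; the input shape, layer sizes, and num_labels are illustrative):

```python
# Sketch only: a multi-label head in Keras (TensorFlow 2.x assumed).
# Input shape, layer sizes, and num_labels are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

num_labels = 5  # hypothetical number of possible labels

model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    # One sigmoid per label: each output is an independent probability
    # that its label applies, so several can be "on" at once.
    layers.Dense(num_labels, activation="sigmoid"),
])
# binary_crossentropy is applied per output, so this effectively trains
# num_labels binary classifiers that share the same features.
model.compile(optimizer="adam", loss="binary_crossentropy")
# Targets are multi-hot vectors, e.g. [1, 0, 1, 0, 0] for labels 0 and 2.
```

So one network covers all labels at once, with the convolutional features shared across them.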
Keith Aumiller
@keithaumiller
Hey Guys
I'm available for at least the next hour to help out with whatever.
And if any of you know an easy way to parallelize my R scripts I'd love to hear it. ;)
Keith Aumiller
@keithaumiller
@rawan_la_twitter I haven't done multi-label classification with a CNN, but I have done it with neural nets in general.
The first step is to change your label data into a binary set.
Once you get the data out of one field with multiple values and into multiple fields with binary values, it's much easier.
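One way to do that reshaping step, sketched with scikit-learn (the label values here are invented for illustration):

```python
# Sketch of that reshaping with scikit-learn; label values are invented.
from sklearn.preprocessing import MultiLabelBinarizer

# One field holding multiple values per example...
raw_labels = [["dog", "park"], ["cat"], ["dog", "cat", "park"]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(raw_labels)
print(mlb.classes_)  # ['cat' 'dog' 'park']
print(y)             # ...becomes one binary column per label:
# [[0 1 1]
#  [1 0 0]
#  [1 1 1]]
```

Each binary column then feeds one sigmoid output in a multi-label model like the one sketched earlier.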
If you are doing image recognition, that isn't really my bag
Yogesh Narayan Singh
@yogids
@keithaumiller hey buddy... so sorry I could not make it the last 2 weeks. I have been travelling and weekends are so hectic now... hope I will be able to make it from next week.
Also, for going from n categorical values to n binary columns... are we trying to make dummy variables here?
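Essentially yes for the single-label case; the multi-label case is the multi-hot generalization, where a row can have several 1s. A small sketch of the distinction in pandas (values are illustrative):

```python
# Dummy variables (one-hot): exactly one 1 per row for a single-valued
# categorical column. Values are illustrative.
import pandas as pd

single = pd.Series(["dog", "cat", "dog"])
print(pd.get_dummies(single, dtype=int))
#    cat  dog
# 0    0    1
# 1    1    0
# 2    0    1
# The multi-label binarization above is the multi-hot analogue,
# where a row can be 1 in several columns at once.
```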
skklogw7
@skklogw7
Hey all!
Keith Aumiller
@keithaumiller
No worries.
I ended up just using Fork instead of trying to do some complicated multithreading stuff.
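For anyone hitting the same R parallelization question: the fork approach in R is typically parallel::mclapply. An analogous fork-based sketch in Python's multiprocessing (slow_task is a hypothetical stand-in for one unit of work):

```python
# Fork-based parallelism analogous to R's parallel::mclapply, sketched in
# Python; slow_task is a hypothetical stand-in for one unit of work.
from multiprocessing import Pool

def slow_task(x):
    return x * x  # placeholder for e.g. fitting one model

if __name__ == "__main__":
    # On Linux, Pool forks the workers, much like mclapply does in R.
    with Pool(processes=4) as pool:
        results = pool.map(slow_task, range(100))
    print(results[:5])  # [0, 1, 4, 9, 16]
```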