
The dataset is probably noisy. If I reduce the dataset size to a few hundred thousand examples, I get training accuracy above 90%. But even if the dataset is noisy, can it cause training error to increase over time? I thought that if model capacity is large enough, training error should decrease to near 0 (memorizing the training set), and if capacity is too small, it should plateau at some point.

The "Understanding Deep Learning Requires Rethinking Generalization" paper shows how neural networks can memorize even random labels.

hey, I just read about the machine learning course we need. There were discounts on Udemy, so I registered: https://www.udemy.com/machinelearning/learn/v4/overview

it's for a limited time

Feel free to read through my notes and if you have any questions, let me know.

The Goldman Sachs senior data scientist I talked to was a really cool guy.

Hi, I want to classify multi-labeled data using deep learning techniques like CNNs without building a separate classifier for each label. When I read about it, people say I should use multiple sigmoid units on the last layer with a binary cross-entropy loss function. I don't really understand why that works. Is there a better way to do this?
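The sigmoid + binary cross-entropy setup can be sketched in plain numpy: each output unit is squashed independently, and the loss sums one binary term per label, so a single network effectively runs one binary classifier per label at once. The logits and targets below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_pred):
    # Summed over labels: each label contributes an independent binary term,
    # which is why one network can predict all labels simultaneously.
    eps = 1e-12
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Final-layer logits for one example, 4 possible labels (hypothetical values)
logits = np.array([2.0, -1.0, 0.5, -3.0])
probs = sigmoid(logits)            # each in (0, 1), independently of the others

y = np.array([1, 0, 1, 0])         # multi-label target: labels 0 and 2 present
loss = binary_cross_entropy(y, probs)
predicted = (probs > 0.5).astype(int)
print(predicted)                   # -> [1 0 1 0]
```

Unlike softmax, the sigmoids don't compete for probability mass, so any number of labels can be "on" at the same time.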

I'm available for at least the next hour to help out with whatever.

And if any of you know an easy way to parallelize my R scripts I'd love to hear it. ;)

The first step is to change your label data into a binary set.

Once you get the data out of one field with multiple values and into multiple fields with binary values, it's much easier.
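A minimal sketch of that conversion in plain Python. The `rows` data and the `|` separator are made up for illustration; the idea is one binary column per distinct label:

```python
# One field holding multiple values per row (hypothetical example data)
rows = ["sports|news", "news", "sports|tech", ""]

# Collect the distinct labels, then emit one 0/1 column per label
labels = sorted({lab for row in rows for lab in row.split("|") if lab})
binary = [[1 if lab in row.split("|") else 0 for lab in labels] for row in rows]

print(labels)   # -> ['news', 'sports', 'tech']
print(binary)   # -> [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 0]]
```

In practice a library helper (e.g. scikit-learn's `MultiLabelBinarizer`) does the same thing.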

one sec let me see if I can find an example.

Like this:

That's the R way.

Also for n categorical to n binary columns... are we trying to make dummy variables here?

I ended up just using Fork instead of trying to do some complicated multithreading stuff.

;)

I'm going to fire up a cyclops.io video chat

if anybody cares to join me

like what are the steps

or so

?

just saw this.

The field it's in is natural language processing.

I haven't built one myself, but this is a good place to start:

I was trying to make a program for image compression using k means clustering

Can someone tell me what's wrong with this code?

```
from scipy import misc            # scipy.misc.imread/toimage need SciPy < 1.2 (and PIL)
from scipy.misc import toimage
import numpy as np

img = misc.imread('bird_small.png')          # 128 x 128 x 3 image
img = img.reshape((16384, 3)).astype(float)  # cast: uint8 subtraction would wrap around

def findc(X, incd):
    """Assign each pixel to the index of its nearest centroid."""
    c = []
    for j in range(16384):
        k1 = []
        for i in range(16):
            k = X[j] - incd[i]
            k1.append(k.dot(k))              # squared Euclidean distance
        c.append(np.argmin(k1))              # nearest centroid: argmin, not argmax
    return c

def findu(X, c):
    """Recompute each centroid as the mean of its assigned pixels."""
    u = np.zeros((16, 3))
    a = np.zeros(16)
    for j in range(16384):
        u[c[j]] += X[j]
        a[c[j]] += 1
    return [u[i] / a[i] for i in range(16)]

# Initialise the 16 centroids from 16 randomly chosen pixels
incd = np.random.randint(np.size(img, axis=0), size=16)
incd = img[incd, :].reshape((16, 3))

for _ in range(10):
    c = findc(img, incd)
    incd = findu(img, c)                     # pass the assignments c, not a global

for j in range(16384):
    img[j] = incd[c[j]]                      # replace each pixel with its centroid

img = img.reshape((128, 128, 3))             # reshape returns a new array; reassign it
toimage(img.astype(np.uint8)).show()
```
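For what it's worth, the assignment and update steps can be vectorized with numpy broadcasting, which removes the per-pixel Python loops entirely. This is a sketch on random stand-in data, assuming the same setup as above: a 128x128 RGB image flattened to 16384 pixels and 16 clusters:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((16384, 3))                 # stand-in for the flattened image
centroids = X[rng.choice(len(X), 16, replace=False)]

for _ in range(10):
    # Pairwise squared distances via broadcasting: shape (16384, 16)
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    c = d.argmin(axis=1)                   # nearest centroid per pixel
    centroids = np.array([X[c == i].mean(axis=0) for i in range(16)])

compressed = centroids[c]                  # every pixel replaced by its centroid
print(compressed.shape)                    # -> (16384, 3)
```

The broadcasted distance matrix costs more memory (16384 x 16 floats) but runs orders of magnitude faster than nested Python loops.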