These are chat archives for FreeCodeCamp/DataScience

13th
Feb 2017
mohit16
@mohit16
Feb 13 2017 13:47
I am totally new to this field and want to make my career in it. Which MOOC should I take to learn the basics?
Xavier Sumba
@cuent
Feb 13 2017 21:17
Hi,
I have information that students fill in during the matriculation process. It describes the socio-economic status of each student. I want to classify the students and determine groups.
Do you know any technique/tool/some related work to accomplish this?
Amelia
@apottr
Feb 13 2017 21:20
@cuent look into k-means clustering
Xavier Sumba
@cuent
Feb 13 2017 21:21
Yes, but I don't know how many groups there are
@apottr I don't have a k
Also, how should I treat the variables?
Amelia
@apottr
Feb 13 2017 21:24
a good way to find that "k" is to take the factors you want to cluster by and find all unique values
alternatively you can just keep incrementing k until you find a suitable fit
but then your visualization is biased
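The "keep incrementing k" idea can be sketched roughly as below: run k-means for k = 1, 2, 3, ... and watch where the within-cluster sum of squares stops dropping sharply (often called the elbow). This is only a minimal sketch. The 2-D points are made up, the helper names are hypothetical, and the farthest-point initialisation is an assumption chosen so the run is reproducible (libraries usually pick random or k-means++ starts).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))

def init_centroids(points, k):
    # Assumption: deterministic farthest-point start, used here only
    # so the example gives the same result on every run.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, within-cluster sum of squares)."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia

# Made-up toy data: three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

inertias = {k: kmeans(points, k)[1] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

On this toy data the sum of squares falls steeply up to k = 3 and barely changes afterwards, which is the kind of bend one would look for. Real data is rarely this clean, so the choice of k usually stays a judgment call.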
Eric Leung
@erictleung
Feb 13 2017 22:21
@cuent I agree with @apottr. k-means clustering is a well-known clustering algorithm for determining groups. Instead of making iterative "guesses" without knowing k, another method is to use principal component analysis (PCA) to see structure in your data. This can also inform which k to use in k-means. You might run into issues if you have a lot of categorical data though, because k-means and PCA don't really work on categorical data, fyi.
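To make the PCA suggestion concrete, here is a rough sketch that finds the first principal component of numeric data by power iteration on the covariance matrix, then projects the data onto it. The data is made up and the function names are hypothetical. The start vector is also an assumption: power iteration needs a start that is not orthogonal to the leading component. In practice a library routine would normally be used instead.

```python
import math

def covariance(rows):
    """Return the sample covariance matrix and the mean-centred rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centred

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def first_component(cov, iters=200):
    """Power iteration: returns (unit eigenvector, eigenvalue) of cov."""
    v = [1.0] + [0.0] * (len(cov) - 1)  # assumed start vector
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, eigenvalue

# Made-up numeric data that varies mostly along the diagonal.
rows = [(2, 1), (1, 2), (-2, -1), (-1, -2), (3, 3), (-3, -3)]

cov, centred = covariance(rows)
pc1, var1 = first_component(cov)
# Share of total variance captured by the first component.
explained = var1 / sum(cov[j][j] for j in range(len(cov)))
# Coordinates of each row along the first component.
scores = [sum(a * b for a, b in zip(c, pc1)) for c in centred]

print("first component:", [round(x, 3) for x in pc1])
print("share of variance explained:", round(explained, 3))
```

Plotting the scores (or the first two components for wider data) is one way to eyeball whether the points fall into visible groups before settling on a k.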
Xavier Sumba
@cuent
Feb 13 2017 22:56
@apottr @erictleung thanks. I ran some experiments with k-means, but I'm lost. First, I would like to graph some data to see how the variables behave. Do you know any tool to analyze data? I need something for multivariate data analysis, or a tool to run PCA.
Xavier Sumba
@cuent
Feb 13 2017 22:56
Most of my data is categorical
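For mostly categorical data, two common starting points are one-hot encoding (so each category becomes a 0/1 column) and a simple matching dissimilarity that counts the fields on which two records disagree, which is the distance used by k-modes style clustering. The sketch below shows both. The field names and values are hypothetical, not taken from the actual student data.

```python
def one_hot(records):
    """Turn a list of dicts with categorical values into 0/1 vectors."""
    # One column per (field, value) pair that occurs in the data.
    columns = sorted({(key, value) for rec in records for key, value in rec.items()})
    vectors = [
        [1 if rec.get(key) == value else 0 for key, value in columns]
        for rec in records
    ]
    return columns, vectors

def matching_distance(a, b):
    """Number of fields on which two records disagree."""
    return sum(a[key] != b[key] for key in a)

# Hypothetical socio-economic records.
students = [
    {"housing": "rented", "employment": "part-time", "internet": "yes"},
    {"housing": "rented", "employment": "none", "internet": "yes"},
    {"housing": "owned", "employment": "none", "internet": "no"},
]

columns, vectors = one_hot(students)
for column in columns:
    print(column)
for vector in vectors:
    print(vector)

print(matching_distance(students[0], students[1]))
print(matching_distance(students[0], students[2]))
```

One-hot vectors can be fed to numeric methods, though distances between them are harder to interpret. A matching-based distance keeps the categories as they are, at the cost of needing an algorithm that accepts a custom dissimilarity.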