These are chat archives for FreeCodeCamp/DataScience

17th
Mar 2018
Bigyan Karki
@bigyankarki
Mar 17 2018 00:11
@evaristoc Thanks for your detailed response. I will try to take in everything you said. Well, I have been interested in AI/ML lately, so I started with regression.
CamperBot
@camperbot
Mar 17 2018 00:11
bigyankarki sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:
:cookie: 408 | @evaristoc |http://www.freecodecamp.org/evaristoc
evaristoc
@evaristoc
Mar 17 2018 11:40

I see, @bigyankarki...

Yea... regression is the basics.

There are excellent online courses about ML, @bigyankarki. I know less about AI courses.

It is my impression you are just starting. It might take you a long time, so be prepared. Just keep going.

For the time being, I would suggest you review a course on linear algebra. Although the problems are usually solved using heuristics, the basis of the whole approach to ML/AI problem solving is fundamentally linear algebra in the majority of cases.

If you are into CS and want to uncover the tricks behind the heuristics, you will find relevant clues in courses on Algorithms and on Discrete Optimization.

evaristoc
@evaristoc
Mar 17 2018 19:43

PEOPLE

Progress update on the project about learners who found jobs without previous coding experience:

What is the median time (in days) people said they spent from starting to learn until finding a job?

320 days. The high quarter spent 661 days (about double) or more, while those who did it quickly spent only around 90 days.

Notes:
  • In some cases the time covers more than the period spent with fCC: some people reported they started to study before they found fCC.
  • The majority of those who found jobs were pursuing careers as front-end web developers.
  • The time also includes delays in finding jobs, or periods in which the students stopped studying for a while.
  • There are other factors, like contextual ones, that helped to reduce or increase the time people spent learning before finding jobs.
evaristoc
@evaristoc
Mar 17 2018 19:55
(Errata: Above I said quarter, but the correct term I wanted to use was quartile, sorry for that. Moreover, the calculation I made is a quick approximation to the quartile. However, trust me when I say a proper implementation won't change the results dramatically for the data I am using: 60 people with full data, after excluding some outliers.)
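
To illustrate the median and quartile calculation discussed above, here is a minimal sketch using Python's standard library. The `days_to_job` values are made up for illustration and are not the project data. `method="inclusive"` is just one of several quartile conventions, which is why a quick approximation and a "proper" implementation can differ slightly.

```python
import statistics

# Hypothetical values (days from starting to learn until finding a job).
# These are NOT the survey data; they only illustrate the calculation.
days_to_job = [45, 90, 150, 210, 300, 365, 480, 700, 900]

median_days = statistics.median(days_to_job)

# quantiles(n=4) returns the three cut points: Q1, Q2 (the median), Q3.
q1, q2, q3 = statistics.quantiles(days_to_job, n=4, method="inclusive")

print(f"Q1={q1}, median={median_days}, Q3={q3}")
```

The "high quarter" in the message corresponds to values at or above Q3, and the quick finishers to values at or below Q1.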
glitz20
@glitz20
Mar 17 2018 20:20
Anyone here familiar with logistic regression in Python?
evaristoc
@evaristoc
Mar 17 2018 20:38

@sabin20 I can help. I warn you there is a lot of material about that.

What do you want to know?

glitz20
@glitz20
Mar 17 2018 20:59
@evaristoc So, first I want to convert the data into binary success or failure
evaristoc
@evaristoc
Mar 17 2018 21:00
@sabin20 Are you using a specific library to help you? Which one?
glitz20
@glitz20
Mar 17 2018 21:00
evaristoc
@evaristoc
Mar 17 2018 21:01
Interesting...
glitz20
@glitz20
Mar 17 2018 21:01
So, for the case status column I want to use certified as 1 and rest as 0
data["SUCCESS"] = np.where(data["CASE_STATUS"].str.contains("CERTIFIED"), 1, other=0)
I did this but the console said where() takes no keyword arguments
is there any other way to do this?
evaristoc
@evaristoc
Mar 17 2018 21:03
First, why numpy? Depending on what you are after, that could be ok or not. If you want to learn an implementation from scratch, numpy is ok. Otherwise you might want to use other libraries.
glitz20
@glitz20
Mar 17 2018 21:04
which one should I use then?
I am pretty new to this stuff
Here is the link to my kernel
evaristoc
@evaristoc
Mar 17 2018 21:05
First, what is your goal: learning from scratch, or using an existing implementation without really knowing what it does?
glitz20
@glitz20
Mar 17 2018 21:05
Right now, I need to get this project done
evaristoc
@evaristoc
Mar 17 2018 21:06
From scratch will take more code, meaning you will have to implement gradient descent and possibly regularization yourself.
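
To give an idea of what "from scratch" involves, here is a minimal sketch of logistic regression trained with batch gradient descent and an L2 penalty, in plain Python. The function names, hyperparameters (`lr`, `epochs`, `l2`) and the toy data are all assumptions chosen for illustration, not anything from the project discussed here.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.01):
    """Batch gradient descent on the L2-regularized log loss.

    X is a list of feature rows, y a list of 0/1 labels.
    Returns (weights, bias). The bias is not regularized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one row: p(y=1 | x) minus the label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average gradient plus the derivative of the L2 penalty.
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    # Classify as 1 when the predicted probability is at least 0.5.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Toy, made-up data: one already-centered feature, label 1 when it is positive.
X = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
print(predict(X, w, b))
```

Libraries such as scikit-learn or statsmodels wrap this kind of optimization (with better solvers), which is why they are the quicker route when results matter more than the implementation.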
glitz20
@glitz20
Mar 17 2018 21:06
It's for my economics project
So results are more important than the implementation as of now
glitz20
@glitz20
Mar 17 2018 21:14
Right now I am stuck with converting the strings into int for logit regressions
@evaristoc
evaristoc
@evaristoc
Mar 17 2018 21:18

Bare numpy will require a bit more coding work, I am afraid.

The usual choice is pandas for data handling. I would advise you to use it.

For the analytics, the most popular library is scikit-learn. For this particular problem sklearn is overkill, but it is where the most material exists:

Although not very popular among users, a more useful way to go in your case is the statsmodels library:

It resembles R programming, and although it doesn't have many features it does a lot with what it has.

For your specific problem, I think you are asking how to vectorize the variables (examples in pandas and scikit-learn):
http://fastml.com/converting-categorical-data-into-numbers-with-pandas-and-scikit-learn/

More properly, it is called encoding.

Let me know if that helps.
If you need more advice about deploying the LR in either scikit-learn or statsmodels, let me know.
@sabin20 ^^^
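
As a small illustration of the encoding idea mentioned above, here is a minimal sketch of one-hot encoding with pandas. The `STATE` column and its values are hypothetical and only stand in for any text-valued column.

```python
import pandas as pd

# Hypothetical data; the column name and values are made up for illustration.
df = pd.DataFrame({"STATE": ["CA", "NY", "CA", "TX"]})

# One-hot encoding: one indicator column per distinct value of STATE.
one_hot = pd.get_dummies(df, columns=["STATE"])

print(one_hot)
```

Each row ends up with exactly one "on" indicator, so the text column becomes purely numeric input for a regression.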
evaristoc
@evaristoc
Mar 17 2018 21:24
@sabin20 ^^^
glitz20
@glitz20
Mar 17 2018 21:25
@evaristoc I was looking at the first link you mentioned but it does not state how to convert the string into numbers.
I am confused about that part
evaristoc
@evaristoc
Mar 17 2018 21:28

True, @sabin20. It is about LR. I am sorry to say that you won't have the perfect example at hand all the time. I am afraid this usually works by combining solutions from different sources.

I also handed you another link above about converting data into numbers with pandas and scikit-learn.

It is stated that way in the link itself.
glitz20
@glitz20
Mar 17 2018 21:33
@evaristoc yeah I get that.
evaristoc
@evaristoc
Mar 17 2018 21:33

With pandas, @sabin20, the manipulation will be easier to manage by simply defining that variable as a categorical one. For that I invite you to check the pandas documentation.

I know it might look overwhelming, I mean learning so many things at once, but it is what it is. If you choose to learn more it will become easier.

You might need support with the implementation, so please come and ask questions whenever you want.

glitz20
@glitz20
Mar 17 2018 21:34
So, I was going through this link http://pbpython.com/categorical-encoding.html
for converting data into binary
data["SUCCESS"] = np.where(data["CASE_STATUS"].str.contains("CERTIFIED"), 1, other=0)
I tried this as per the site, but the console says where() takes no keyword argument
evaristoc
@evaristoc
Mar 17 2018 21:37

Check what this is:
data["CASE_STATUS"].str.contains("CERTIFIED")

It has to be an array. Otherwise numpy might complain.

numpy's where is a conditional function.
Josh Goldberg
@GoldbergData
Mar 17 2018 21:38
Just use R 😄
evaristoc
@evaristoc
Mar 17 2018 21:38
:) :) :)
@sabin20 as you see, @GoldbergData is an R fan...
glitz20
@glitz20
Mar 17 2018 21:39
yuo
yup
evaristoc
@evaristoc
Mar 17 2018 21:58

I saw the source you mentioned, @sabin20. It is ok for what you are planning for sure.

It seems that the suggested approach will work. However, this won't be the last time you have something to debug. It will catch you even with simple things like this one.

My quick advice is to break it into pieces, check that you are passing the right data to each method, and print the results at each step.

Success! Come if you need more help.
Or use R... :) then you will have to ask @GoldbergData .
glitz20
@glitz20
Mar 17 2018 22:22
@evaristoc did you find what's the problem?
data["CASE_STATUS"].str.contains("CERTIFIED")
return the correct data but it again says where() takes no argument
found it. thanks @evaristoc
CamperBot
@camperbot
Mar 17 2018 22:27
sabin20 sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:
:cookie: 409 | @evaristoc |http://www.freecodecamp.org/evaristoc
evaristoc
@evaristoc
Mar 17 2018 22:27

I can't figure that out without doing it myself, sorry. It will be up to you to find out ;) .

return the correct data but it again says where() takes no argument

Not sure about your argumentation here...

Ok. What was the error?
@sabin20 ?
glitz20
@glitz20
Mar 17 2018 22:29
data["SUCCESS"] = np.where(data["CASE_STATUS"].str.contains("CERTIFIED"), 1, 0) and i converted the object to string
@evaristoc
evaristoc
@evaristoc
Mar 17 2018 22:30
Ok. :+1:! (I was suspecting the str but I wouldn't know without doing it... :) ). Good! Let us know if you need additional help! Success, @sabin20 !
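
For reference, a minimal sketch of the working version described above. Only the `CASE_STATUS` and `SUCCESS` column names come from the chat; the rows are made up. The key points are that `np.where` takes its three arguments positionally (the `other=` keyword belongs to pandas' `Series.where`, not to numpy), and that casting to string and passing `na=False` is one way, assumed here, to keep missing values from interfering with the match.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the CASE_STATUS column name comes from the chat.
data = pd.DataFrame({"CASE_STATUS": ["CERTIFIED", "DENIED", "WITHDRAWN", None]})

# Cast to string first, and treat missing values as "no match".
is_certified = data["CASE_STATUS"].astype(str).str.contains("CERTIFIED", na=False)

# np.where(condition, value_if_true, value_if_false): positional arguments only.
data["SUCCESS"] = np.where(is_certified, 1, 0)

print(data)
```

Note that `str.contains` is a substring match, so any status that merely contains the word (a hypothetical "CERTIFIED-WITHDRAWN", for example) would also be coded as 1. Comparing with `==` would be stricter.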
glitz20
@glitz20
Mar 17 2018 22:31
sure
glitz20
@glitz20
Mar 17 2018 23:09
@evaristoc how can we convert strings into unique codes?
For instance, my data has numerous locations. I want to convert them into ints by giving each location a unique numerical code
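
One possible way to do this, as a minimal sketch with pandas: `pd.factorize` assigns an integer to each distinct value. The `LOCATION` column and the city names below are hypothetical.

```python
import pandas as pd

# Hypothetical data; the LOCATION column and its values are made up.
data = pd.DataFrame({"LOCATION": ["Boston", "Austin", "Boston", "Denver"]})

# factorize numbers the distinct values in order of first appearance
# and also returns the lookup of code -> original value.
codes, uniques = pd.factorize(data["LOCATION"])
data["LOCATION_CODE"] = codes

print(data)
print(list(uniques))
```

Keep in mind that integer codes suggest an ordering between locations that does not really exist, so for a regression an indicator (one-hot) encoding is often the safer choice for this kind of variable.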
Bigyan Karki
@bigyankarki
Mar 17 2018 23:42
@evaristoc Thanks for your advice. Really appreciate it.
CamperBot
@camperbot
Mar 17 2018 23:42
bigyankarki sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:
:cookie: 410 | @evaristoc |http://www.freecodecamp.org/evaristoc