@evaristoc Thanks for your detailed response. I will go and try to take in all you have said. Well, I have been interested in AI/ML lately, so I started with regression.

bigyankarki sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:

:cookie: 408 | @evaristoc |http://www.freecodecamp.org/evaristoc

I see, @bigyankarki...

Yea... regression is the basics.

There are excellent online courses about ML, @bigyankarki. I know less about AI courses.

It is my impression you are just starting. It might be a long road, so be prepared. Just keep going.

For the time being, I would suggest you revise a course on linear algebra. Although the problems are usually solved using heuristics, the base of the whole approach to ML/AI problem solving is fundamentally linear algebra in the majority of cases.

If you are into CS and looking to reveal the tricks behind the heuristics, you will find relevant clues when studying courses on Algorithms and also Discrete Optimization.

A preview from the project about learners who found jobs without previous coding experience:

What is the median time (in days) people said they spent from when they started learning until they found a job?

**320 days**. The high quarter spent 661 days (about double) or more, while those who did it quickly spent only around 90 days.

Notes:

- In some cases the time covers more than the period spent with fCC - some people reported they started studying before they found fCC.
- The majority of those who found jobs were trying careers as front-end web developers.
- The time includes also delays in finding jobs, or periods in which the students stopped studying for a while.
- There are other factors, like contextual ones, that helped to reduce or increase the time people spent learning before finding jobs.

(Errata: Above I said quarter, but the correct term I wanted to use was *quartile* - sorry for that. Moreover, the calculation I made is a quick approximation to the quartile. However trust me when I say a proper implementation won't change the results dramatically for the data I am using; 60 people with full data, and after excluding some outliers)
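As a rough sketch of how a median and quartiles like the ones above can be computed with `numpy`: the values below are made-up placeholders, not the survey data, and `days_to_job` is a hypothetical name.

```python
import numpy as np

# Hypothetical "days until first job" values, for illustration only
# (these are NOT the survey data discussed above).
days_to_job = np.array([45, 90, 150, 210, 300, 380, 470, 650, 900])

# 25th, 50th (median) and 75th percentiles.
# numpy interpolates linearly between data points by default.
q1, median, q3 = np.percentile(days_to_job, [25, 50, 75])

print(f"lower quartile: {q1}, median: {median}, upper quartile: {q3}")
```

With only a few dozen observations, different interpolation rules can shift the quartiles slightly, which is the kind of approximation the errata refers to.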

@sabin20 I can help. I warn you there is a lot of material about that.

What do you want to know?

@evaristoc So, first I want to convert the data into a binary success or failure

@sabin20 Are you using a specific library to help you? Which one?

So, for the case status column I want to use certified as 1 and rest as 0

`data["SUCCESS"] = np.where(data["CASE_STATUS"].str.contains("CERTIFIED"), 1, other=0)`

I did this but the console said where() takes no keyword arguments

is there any other way to do this?

First, why numpy? I have to say that, depending on what you are after, that could be ok or not. If you want to learn an implementation from scratch, numpy is ok. Otherwise you might want to use other libraries.

I am pretty new to this stuff

Here is the link to my kernel

First: what is your goal: learning from scratch or learning an implementation without really knowing what that does?

From scratch will take more code, meaning you will have to implement a gradient descent and possibly a regularization yourself.
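To give an idea of what "from scratch" involves, here is a minimal sketch of logistic regression trained with batch gradient descent and an L2 penalty in plain `numpy`. The function names, the learning rate, the penalty strength and the toy data are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping real values to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, l2=0.01, n_iter=2000):
    """Batch gradient descent on the mean log-loss plus an L2 penalty.

    The intercept b is left unpenalized.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)                      # predicted probabilities
        error = p - y                               # gradient of log-loss w.r.t. the logit
        grad_w = X.T @ error / n_samples + l2 * w   # add the L2 term
        grad_b = error.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data (made up): one feature, label is 1 when the feature is large.
raw = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
X = ((raw - raw.mean()) / raw.std()).reshape(-1, 1)  # standardize the feature
y = np.array([0, 0, 0, 1, 1, 1])

w, b = fit_logistic(X, y)
pred = (sigmoid(X @ w + b) >= 0.5).astype(int)
print(w, b, pred)
```

Standardizing the feature first tends to make plain gradient descent converge faster; libraries hide this kind of detail behind more robust solvers.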

So results are more important than the implementation as of now

Right now I am stuck with converting the strings into int for logit regressions

@evaristoc

Bare `numpy` will require a bit more coding work, I am afraid.

The usual choice for data handling is `pandas`. I would advise you to use it.

For the analytics, the most popular one is `scikit-learn`. For this particular problem sklearn is overkill, but it is where most of the material exists:

- https://towardsdatascience.com/building-a-logistic-regression-in-python-step-by-step-becd4d56c9c8
- http://nbviewer.jupyter.org/gist/justmarkham/6d5c061ca5aee67c4316471f8c2ae976
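For orientation, a minimal sketch of fitting a logistic regression with scikit-learn might look like the following. The toy data is made up, and the default settings are used only to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data (made up): two numeric features and a binary target.
X = np.array([[0.0, 1.0], [1.0, 1.5], [2.0, 0.5],
              [3.0, 3.0], [4.0, 3.5], [5.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# The default estimator applies L2 regularization.
model = LogisticRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)   # fitted weights and intercept
print(model.predict(X))                # predicted classes
print(model.predict_proba(X)[:, 1])    # predicted probability of class 1
```

On real data the features would first have to be numeric, which is where the encoding step comes in.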

Although not very popular among users, a more useful way to go in your case is the `statsmodels` library.

It resembles R programming, and although it doesn't have many features it does a lot with what it has.

For your specific problem, I think you are asking how to vectorize the variables (examples in pandas and scikit-learn):

http://fastml.com/converting-categorical-data-into-numbers-with-pandas-and-scikit-learn/

It is more properly called **encoding**.
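One common form of encoding is one-hot (dummy) encoding. A minimal sketch with pandas, using a made-up column name `JOB_TYPE` and made-up values:

```python
import pandas as pd

# Made-up example frame; the column name and values are placeholders.
df = pd.DataFrame({"JOB_TYPE": ["full", "part", "full", "contract"]})

# One-hot encoding: one 0/1 indicator column per category.
dummies = pd.get_dummies(df["JOB_TYPE"], prefix="JOB_TYPE", dtype=int)

# Attach the indicator columns to the original frame.
encoded = pd.concat([df, dummies], axis=1)
print(encoded)
```

For a regression with an intercept, one indicator column is usually dropped (for example with `drop_first=True`) to avoid perfectly collinear columns.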

Let me know if that helps?

If you need more advice about deploying the LR either in scikit-learn or statsmodels let me know.

@sabin20 ^^^

For your specific problem using `numpy`, maybe this can help? https://stackoverflow.com/questions/3172509/numpy-convert-categorical-string-arrays-to-an-integer-array
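One common way to do this in bare `numpy` is `np.unique` with `return_inverse=True`. This is a sketch on made-up values, not necessarily the exact approach in the linked answer.

```python
import numpy as np

# Made-up categorical strings.
locations = np.array(["NYC", "LA", "NYC", "SF", "LA"])

# labels: sorted unique values; codes: index of each element within labels.
labels, codes = np.unique(locations, return_inverse=True)

print(labels)
print(codes)

# The original strings can be recovered by indexing labels with codes.
restored = labels[codes]
```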

@sabin20 ^^^

@evaristoc I was looking at the first link you mentioned but it does not state how to convert the strings into numbers.

I am confused about that part

True, @sabin20. It is about LR. I am sorry to say that you won't have the perfect example at hand all the time - I am afraid this usually works by combining solutions from different sources.

I also handed you another link above about *converting data into numbers with pandas and scikit-learn*.

It is stated that way in the link itself.

With pandas, @sabin20, the manipulation will be easier to manage by simply defining that variable as a categorical one. There I invite you to check the pandas documentation.

I know it might look overwhelming, I mean learning so many things at once, but it is what it is. If you choose to learn more it will become easier.

You might need support with the implementation, so please come and ask questions whenever you want.

for converting data into binary

`data["SUCCESS"] = np.where(data["CASE_STATUS"].str.contains("CERTIFIED"), 1, other=0)`

I tried this as per the site, but the console says where() takes no keyword argument

Check what this is: `data["CASE_STATUS"].str.contains("CERTIFIED")`

It has to be an array. Otherwise `numpy` might complain.

numpy's `where` is a conditional function.

@sabin20 as you see, @GoldbergData is an R fan...

yup

I saw the source you mentioned, @sabin20. It is ok for what you are planning for sure.

It seems that the suggested approach will work. However, it won't be the last time you have something to debug. It will catch you even with simple things like this one.

My quick advice is to break it all down, check you are feeding the methods the right data, and print the results at each step.
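A sketch of that step-by-step check, on a made-up stand-in for the real data (the values below, including the missing one, are assumptions):

```python
import pandas as pd

# Made-up stand-in for the real data. It includes a missing value and a
# hypothetical status that merely contains the substring "CERTIFIED".
data = pd.DataFrame(
    {"CASE_STATUS": ["CERTIFIED", "DENIED", None, "CERTIFIED-WITHDRAWN"]}
)

# Step 1: inspect the column itself.
print(data["CASE_STATUS"].dtype)
print(data["CASE_STATUS"].unique())

# Step 2: inspect the condition on its own before passing it anywhere.
mask = data["CASE_STATUS"].str.contains("CERTIFIED")
print(type(mask), mask.dtype)
print(mask.tolist())

# Step 3: make the handling of missing values explicit with na=False.
clean_mask = data["CASE_STATUS"].str.contains("CERTIFIED", na=False)
print(clean_mask.tolist())
```

Note that `str.contains` matches substrings, so any status that merely contains the text will also count as a match; an exact comparison with `==` behaves differently.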

Success! Come back if you need more help.

Or use R... :) then you will have to ask @GoldbergData .

`data["CASE_STATUS"].str.contains("CERTIFIED")`

returns the correct data but it again says where() takes no argument

found it. thanks @evaristoc

sabin20 sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:

:cookie: 409 | @evaristoc |http://www.freecodecamp.org/evaristoc

I can't figure that out without doing it myself, sorry. It will be up to you to find out ;) .

return the correct data but it again says where() takes no argument

Not sure about your argumentation here...

Ok. What was the error?

@sabin20 ?

`data["SUCCESS"] = np.where(data["CASE_STATUS"].str.contains("CERTIFIED"), 1, 0)` and I converted the object to string

@evaristoc

Ok. :+1:! (I was suspecting the str but I wouldn't know without doing it... :) ). Good! Let us know if you need additional help! Success, @sabin20 !
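For reference, a sketch of the working version on made-up data. `np.where` accepts its arguments positionally only, which is why `other=0` raised the error; `other` is a keyword of the pandas `where` methods, which may be where it came from (an assumption).

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real data.
data = pd.DataFrame(
    {"CASE_STATUS": ["CERTIFIED", "DENIED", "WITHDRAWN", "CERTIFIED"]}
)

# Make sure the column holds strings before using the .str accessor.
status = data["CASE_STATUS"].astype(str)

# np.where(condition, value_if_true, value_if_false): positional arguments only.
data["SUCCESS"] = np.where(status.str.contains("CERTIFIED"), 1, 0)

# A pandas-only alternative: cast the boolean mask to integers.
data["SUCCESS_ALT"] = status.str.contains("CERTIFIED").astype(int)

print(data)
```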

for instance my data has numerous locations. I want to convert that into int by giving each location a unique numerical code
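A minimal sketch of two pandas ways to give each location an integer code; the `LOCATION` name and the values are made up.

```python
import pandas as pd

# Made-up location values.
raw = pd.Series(["NEW YORK", "AUSTIN", "NEW YORK", "SEATTLE"], name="LOCATION")

# Option 1: categorical dtype. Codes follow the sorted order of the categories.
as_category = raw.astype("category")
cat_codes = as_category.cat.codes
print(dict(enumerate(as_category.cat.categories)))  # code -> location

# Option 2: factorize. Codes follow the order of first appearance.
codes, uniques = pd.factorize(raw)

print(cat_codes.tolist())
print(codes.tolist(), list(uniques))
```

Integer codes suggest an ordering among locations that a linear model will take literally, so for a logistic regression indicator (one-hot) columns are often the safer choice.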

@evaristoc Thanks for your advice. Really appreciate it.

bigyankarki sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:

:cookie: 410 | @evaristoc |http://www.freecodecamp.org/evaristoc