These are chat archives for FreeCodeCamp/DataScience

20th
Sep 2017
Eric Leung
@erictleung
Sep 20 2017 03:51
@evaristoc that http://www.algomation.com site reminds me of this similar site https://visualgo.net/en, which shows how algorithms work. I think Algomation has an advantage in that it also shows the code for each algorithm.
@evaristoc interesting conversation on deep learning being easy. I think I came across that opinion somewhere else as well. I think that opinion is good for two reasons. First, it lowers the barrier for people wanting to get into machine learning and gives them the opportunity to apply some very powerful techniques very easily. Second, it pushes other people to possibly focus on more nuanced or different methods (as you've listed).
@evaristoc I've enrolled in the deep learning course, but I haven't gone through it. I am very interested in learning from it. Good to see someone is benefiting from it :+1:
Eric Leung
@erictleung
Sep 20 2017 04:01
@bharath93m sorry, I'm not familiar with pyspark. It looks like you might be able to transform your Spark object into a pandas one in order to get its distinct values https://stackoverflow.com/a/39384987/6873133
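For reference, once the data is in pandas (PySpark's `DataFrame.toPandas()` collects it to the driver), distinct values can be pulled out as in this sketch. The DataFrame below is a made-up stand-in for the converted Spark data, and the column name `city` is hypothetical. Note that a PySpark DataFrame also has its own `distinct()` method, so the conversion is not strictly required.

```python
import pandas as pd

# Hypothetical stand-in for the result of spark_df.toPandas().
pdf = pd.DataFrame({"city": ["Paris", "Lyon", "Paris", "Nice", "Lyon"]})

# Distinct values of one column, in order of first appearance.
distinct_cities = pdf["city"].unique().tolist()

# Number of distinct values in that column.
n_distinct = pdf["city"].nunique()

# Distinct rows across all columns (like SQL SELECT DISTINCT *).
distinct_rows = pdf.drop_duplicates()

print(distinct_cities)
print(n_distinct)
```

Keep in mind that `toPandas()` pulls everything into local memory, so it only makes sense for data that fits on one machine.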
Eric Leung
@erictleung
Sep 20 2017 05:53

Cheatsheets for AI: only a handful are really about AI. A lot of it is relevant to data analysis. I printed off the pandas ones myself.

I use R primarily and have been wondering how to do similar data manipulation tasks in pandas, so I printed out one of the pandas cheatsheets just for that :smile: It looks veryyy similar to the dplyr (R package used for data manipulation) cheatsheet.
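To illustrate how closely the two line up, here is a rough sketch of a typical dplyr pipeline (`filter` → `mutate` → `select` → `group_by` → `summarise` → `arrange`) written as a pandas method chain. The toy data and column names are made up; the comments give the dplyr verb each pandas step roughly corresponds to.

```python
import pandas as pd

# Made-up toy data for illustration.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "c"],
    "length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "width": [0.5, 0.7, 1.5, 1.9, 2.1],
})

result = (
    df.query("length > 1.5")                              # filter(length > 1.5)
    .assign(ratio=lambda d: d["length"] / d["width"])     # mutate(ratio = length / width)
    .loc[:, ["species", "length", "ratio"]]               # select(species, length, ratio)
    .groupby("species", as_index=False)                   # group_by(species)
    .agg(mean_length=("length", "mean"),                  # summarise(mean_length = mean(length),
         mean_ratio=("ratio", "mean"),                    #           mean_ratio = mean(ratio),
         n=("length", "size"))                            #           n = n())
    .sort_values("mean_length", ascending=False)          # arrange(desc(mean_length))
)
print(result)
```

Named aggregation inside `.agg()` (the `name=(column, func)` form) is what makes the `summarise` step read almost the same as in dplyr.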

evaristoc
@evaristoc
Sep 20 2017 10:17

@bharath93m Your question is not entirely clear, and I haven't used PySpark for some time, but I still decided to look into your request because I found it interesting.

I found this link useful. The definition of window functions is:

At its core, a window function calculates a return value for every input row of a table based on a group of rows, called the Frame. Every input row can have a unique frame associated with it.

From what I can see, and complementing the recommendation by @erictleung, window functions resemble the pandas groupby + apply methods. So I am sure you can implement a count. What is not clear from your question is what you want to do with the count.

Anyway, check the link I am suggesting to see if it helps. There is an example in Spark SQL, but anything you can do in SQL can probably be done in Python too.
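Because a window function returns one value per input row, the closest pandas analogue is probably `groupby(...).transform(...)` rather than an aggregation that collapses each group. The sketch below uses made-up data and column names to mimic `COUNT(*) OVER (PARTITION BY user)` and a running `SUM(clicks) OVER (PARTITION BY user ORDER BY day)`. In PySpark itself the native route would be `pyspark.sql.Window.partitionBy(...)` combined with a column expression's `.over(...)`.

```python
import pandas as pd

# Made-up event data for illustration.
df = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u1", "u2"],
    "day": [1, 2, 1, 3, 2],
    "clicks": [3, 5, 2, 4, 6],
})

# COUNT(*) OVER (PARTITION BY user): every row keeps its place and gets
# the size of its user's group (groupby().count() would collapse groups).
df["rows_per_user"] = df.groupby("user")["clicks"].transform("count")

# SUM(clicks) OVER (PARTITION BY user ORDER BY day): a running total whose
# frame is every row of the same user up to and including the current day
# (this matches SQL's default frame here because days are unique per user).
df = df.sort_values(["user", "day"])
df["running_clicks"] = df.groupby("user")["clicks"].cumsum()
print(df)
```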

evaristoc
@evaristoc
Sep 20 2017 15:36
@mstellaluna thanks for the link! I will check it. I am not into SAP and never have been, but I think it is important to follow the advances of the many companies working around DS. SAP's main focus is on ERP and some banking, so churn is a good example.
CamperBot
@camperbot
Sep 20 2017 15:36
evaristoc sends brownie points to @mstellaluna :sparkles: :thumbsup: :sparkles:
:cookie: 821 | @mstellaluna |http://www.freecodecamp.com/mstellaluna
evaristoc
@evaristoc
Sep 20 2017 15:39

People...

Sorry for this, but I am DELIGHTED with the Deep Learning training by Andrew Ng... Just look at this exercise to implement regularization from scratch (!!!!):

Problem Statement: You have just been hired as an AI expert by the French Football Corporation. They would like you to recommend positions where France's goal keeper should kick the ball so that the French team's players can then hit it with their head.

And then a soccer field figure as an illustration...

Wow....
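For anyone curious what "regularization from scratch" involves, here is a minimal NumPy sketch of an L2-regularized cross-entropy cost, J = -(1/m) Σ [y log a + (1 − y) log(1 − a)] + (λ / 2m) Σ_l ‖W^[l]‖², together with the extra term λW/m that the penalty adds to each weight gradient. This is the standard textbook form, assumed (not confirmed) to match the course exercise; the function names and arguments are hypothetical.

```python
import numpy as np

def l2_regularized_cost(A_out, Y, weights, lambd):
    """Cross-entropy cost plus an L2 penalty on all weight matrices.

    A_out:   predicted probabilities, shape (1, m)
    Y:       true labels in {0, 1}, shape (1, m)
    weights: list of weight matrices [W1, W2, ...]
    lambd:   regularization strength (lambda)
    """
    m = Y.shape[1]
    A = np.clip(A_out, 1e-12, 1 - 1e-12)  # keep log() away from 0
    cross_entropy = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    l2_penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy + l2_penalty

def l2_grad_term(W, lambd, m):
    """Extra term the penalty adds to dJ/dW during backpropagation."""
    return (lambd / m) * W

# Tiny usage example with made-up numbers.
cost = l2_regularized_cost(np.array([[0.5, 0.5]]), np.array([[1, 0]]),
                           [np.ones((2, 2))], lambd=0.4)
extra_grad = l2_grad_term(np.ones((2, 2)), lambd=0.4, m=2)
print(cost)
```

The penalty pushes weights toward smaller values ("weight decay"), which is what makes the decision boundary less likely to overfit the training points.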

mstellaluna
@mstellaluna
Sep 20 2017 15:40
@evaristoc yeah, true. openSAP focuses on DS and IoT, mostly with their own products. I wasn't sure; I am signed up for the SAP HANA (their in-memory database) courses I want to take, and I figured I would share anyway
evaristoc
@evaristoc
Sep 20 2017 15:45

I am miles ahead of where I was when I started practicing DS and Python, so I can finish these exercises rather quickly (three weekly sections of 2 different courses in just 3-4 days is nice...).

But the training might catch you out if you don't have some math. A lot of people are complaining. Some of them complain not because there is too little math in the course, but because they need more math to understand the math of the course. :) :) :) :)

Anyway, this course already promises to be as influential as his ML course for years to come, IMO.

mstellaluna
@mstellaluna
Sep 20 2017 15:48
Cool
evaristoc
@evaristoc
Sep 20 2017 15:49

@mstellaluna I don't see why not. It depends... SAP has fewer clients than Oracle, IBM and MS, its more visible competitors, but it does have a very loyal niche market. SAP's market share is mostly European, if I am not wrong. But it is still a very important market, with Fortune 500 clients.

So it depends on your aspirations. If you go with SAP, you would be securing a place within SAP's market for sure.

Matthew Barlowe
@mcbarlowe
Sep 20 2017 17:29
@evaristoc that's funny because sports analytics is what originally got me into data science
evaristoc
@evaristoc
Sep 20 2017 17:43
Screenshot from 2017-09-20 19-43-04.png
@mcbarlowe this exercise was simple and could have been solved even with linear methods:
The dataset covers plays in which the goalkeeper left the goal, some of which ended in a goal against. The positions are spread over a simple x-y plane, with each point labeled "goal-against" vs "no-goal-against". The figure above shows the best result of the analysis using a NN.
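To make the "linear methods" point concrete, here is a sketch of logistic regression (a linear classifier) trained from scratch with batch gradient descent on 2-D points. The data is synthetic and made up for illustration, labelled by which side of the line x + y = 0 a point falls on; it is not the course's dataset, and the learning rate and epoch count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, made-up data: 200 points in an x-y plane, labelled 1 if they
# fall on one side of the straight line x + y = 0, else 0.
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    m = X.shape[0]
    for _ in range(epochs):
        p = sigmoid(X @ w + b)        # predicted probabilities
        grad_w = X.T @ (p - y) / m    # dJ/dw
        grad_b = np.mean(p - y)       # dJ/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train_logistic_regression(X, y)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"training accuracy: {accuracy:.3f}")
```

The learned boundary is the straight line w·x + b = 0; a NN earns its keep only when the labelled regions cannot be separated by such a line.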
evaristoc
@evaristoc
Sep 20 2017 17:49

But the nice thing is learning MANY things about NN implementations that are rarely covered in other courses.

And I haven't mentioned the guests yet: Andrew Ng interviews several people at the forefront of NN development, and the insights are excellent. That also means you can more easily associate those names with specific advances and specialties in the field.

Value for money if you are interested in NN. The funny thing is that I AM NOT that interested in it. I am not really using it right now! But still, it is a good review of ML in general: many things done in this course for NN (regularization, the maths, gradient descent, etc.) are revisits of general ML methods. So it is a sort of knowledge reinforcement. No regrets.
Matthew Barlowe
@mcbarlowe
Sep 20 2017 17:56
Yeah, I want to take it; I'm just finishing up the classes I'm taking now first
evaristoc
@evaristoc
Sep 20 2017 17:59

I am taking the Specialization, which has 5 courses. If you are not yet very confident, take just the first two.

Anyway: you are starting, so take it slowly. I was discussing with a friend here in fCC how long it might take, on average, for someone to build a solid base in Data Analytics with an ML/data-mining orientation, and we concluded about 2 years. Of course, it depends on how talented you are and when you started studying the basics.

So, @mcbarlowe, consider it a mid-term plan.
IMO.
Matthew Barlowe
@mcbarlowe
Sep 20 2017 18:14
I already know some basic machine learning, like SVMs and naive Bayes, and how to train and validate models, so I don't think I'm too far off. I just want to be able to focus on it because I'm sure it's intense
Bharath
@bharath93m
Sep 20 2017 20:38
@erictleung Thanks
CamperBot
@camperbot
Sep 20 2017 20:38
bharath93m sends brownie points to @erictleung :sparkles: :thumbsup: :sparkles:
:cookie: 548 | @erictleung |http://www.freecodecamp.com/erictleung