These are chat archives for FreeCodeCamp/DataScience

19th
May 2017
satenndrra
@satenndrra
May 19 2017 06:53
@evaristoc sure anytime!
evaristoc
@evaristoc
May 19 2017 10:14

@jahala :

  • "Is something like that forecasting tool the right one for the job?... What I want to try to get some insight into is seasonality and trying to forecast about a half year in advance " - I think you are in the right direction. I think that time series analysis will be the first option applicable to your project.
  • "something like data about how many, and which cash-registers have been open in a store... When dealing with data that is just integers - and which has a lower "bounds(?)" (never negative value) .." - Counts + categorical variables: have you checked your data already (missing values, etc.)? For counts you might want to try some transformations, to see if they produce different results. Question: did you find that "cash-register" was a factor? Interesting!
  • "very often forecasted a negative trend" - I am sure this is a model issue that you should be able to deal with, probably making some imputations accordingly? It depends on the assumptions of the model. Those abnormalities are usually treated based on assumptions and transformations. Expect abnormalities (I am referring to accuracy in this case) with EVERY single model you are planning to try, even the most precise one.
  • "If at some point it is combinable with number of visitors, try to estimate how many positions should be open (2-4 hours in advance)" - That sounds like Operations Research territory to me, combined with forecasting. Traditionally this is tackled using model-based techniques (i.e. probabilistic models). Note: because they are model-based, they rely on imposed model assumptions, not on patterns emerging from the data (i.e. ML). The trick is to find the best model.
  • "I'm guessing this is more "ML" territory?" - Why not? Whatever works for you. I once tried Random Forest for a forecasting exercise similar to yours and it was good, but it was slower than a simpler made-up time series regression that was only slightly less precise.
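
A minimal sketch of the transformation idea for counts, assuming made-up numbers and a plain linear trend (neither comes from this chat). It fits the same trend twice: once on the raw counts, where the forecast can drop below zero, and once on log(count + 1), where the back-transformed forecast is clamped at zero.

```python
import math

def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns (intercept, slope)."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope

def forecast_raw(counts, horizon):
    """Extend a linear trend fitted directly on the counts."""
    intercept, slope = fit_line(counts)
    n = len(counts)
    return [intercept + slope * (n + h) for h in range(horizon)]

def forecast_log(counts, horizon):
    """Fit the trend on log(count + 1), then transform back to the count scale."""
    intercept, slope = fit_line([math.log1p(c) for c in counts])
    n = len(counts)
    # expm1 can still return values between -1 and 0, so clamp at zero.
    return [max(0.0, math.expm1(intercept + slope * (n + h))) for h in range(horizon)]

# Hypothetical data: open cash registers per period, trending down.
counts = [9, 8, 6, 5, 3, 2, 1, 1]
raw = forecast_raw(counts, 6)
logged = forecast_log(counts, 6)
print("raw trend:   ", [round(v, 2) for v in raw])
print("log1p trend: ", [round(v, 2) for v in logged])
```

Whether a log transform is the right assumption depends on the data; it is only one of several options for counts.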

A note...
Notice that I am not saying that you shouldn't try other implementations. But when you ask whether you should use ML because of the fine-grained precision you are looking for ("2-4 hours in advance"), it makes me wonder about the growing myth around ML techniques and the way people think about Data Science in general. I think many people are overrating the ability of the emerging techniques to predict things, and giving up on well-known methods. This is a mistake!!! ML techniques are HEURISTIC methods that might or might not be applicable to your data. Sometimes they are overkill, either because they reproduce the same values as a simpler procedure or do even worse. Even if the results are more precise, the cost of implementation could be higher than the cost of the error you get from a simpler but less computationally expensive procedure.

Occam's Razor: if you have two models predicting the same thing with similar results, always pick the simplest one...
Einstein's Corollary: ... although NOT TOO simple: rather the simplest one that best explains your phenomenon...
Industry Corollary: ... unless it is too expensive; then pick the cheapest one unless the other one gives me a SERIOUS competitive advantage in my specific market. Even worse: if needed, buy both patents and kill the more expensive one whenever possible.

Leonardo Raduy Lemos
@PunkDado
May 19 2017 18:46
@evaristoc @becausealice2 @satenndrra Hi Guys!
@evaristoc asked me two questions. The first one I can already answer.
The questions were: (1) how many people who said they were using FCC as a resource for learning to code had previous programming experience and jobs in the sector
(2) demographics of the above
In 2017, of the 13803 who used FCC as a resource, 2720 (about 20%) already work as software developers
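
For reference, the share can be checked from the two counts quoted above; the variable names are only for illustration. The 20% figure is a rounding of the exact ratio.

```python
# Counts quoted in the message above.
fcc_respondents = 13803
already_developers = 2720

share = already_developers / fcc_respondents
print(f"{share:.1%}")  # prints 19.7%
```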
Leonardo Raduy Lemos
@PunkDado
May 19 2017 18:52
Next step is to work on the demographics. Anyway, I'm pushing the dataset on the Survey repository https://github.com/PunkDado/2017-new-coder-survey/tree/master/analysis
Leonardo Raduy Lemos
@PunkDado
May 19 2017 18:58
Sorry guys, wrong repo. The dataset is in my Repo: https://github.com/PunkDado/2017-new-coder-survey/tree/master/analysis
Alice Jiang
@becausealice2
May 19 2017 20:28
Hey guys! I have a lot going on right now that I need to catch up on, so I can't step in until the explanatory visualizations are ready to be built, but I am keeping an eye on the progress so I'm not very clueless ;)
evaristoc
@evaristoc
May 19 2017 22:53

People

I am taking a pair of Specializations on Coursera. Good courses, but... NO-ONE is taking them. In one it is apparently only me...

I would say, with all my sorrow, that Coursera is going through a VERY bad moment. I wouldn't be surprised if they close...

Pity because the courses are of very good quality, but no-one is paying attention to the students. NO-ONE, not even Coursera staff. Apparently they went on holiday and never came back...
@becausealice2 :+1: !
Alice Jiang
@becausealice2
May 19 2017 22:56
Poor coursera :(
Leonardo Raduy Lemos
@PunkDado
May 19 2017 22:58
@evaristoc, what are u taking? How come "only you"?
I learned R at Coursera with Johns Hopkins
evaristoc
@evaristoc
May 19 2017 23:04

@becausealice2
we could work from the data that @PunkDado will make available in his repo.

@PunkDado :
One in Recommender Systems by Minnesota. EXCELLENT. But it is just me in the Recommender Evaluation section. The problem comes when you have to get peer reviewers, as in this specific course about Evaluation. Then, after finishing the courses, you should enter the Capstones, which are peer-reviewed. The Recommender course is paid per month, so I will skip the Capstone with all the pain of my soul. But the other one, Data Mining and Information Retrieval, had to be paid per course. I paid in advance and have already reached the Capstone. I am just looking for others who are also taking it. It is a STRUGGLE waiting for someone to review your work.
Then if you have questions or problems, you are on your own. Horrible experience.

evaristoc
@evaristoc
May 19 2017 23:14
@PunkDado
Can you make a simple distribution of no-experience/experience by country? I think region (e.g. sub-continent) makes more sense for the amount of data. Not sure if you have it available? Can you please make a file for both, not only in terms of having a previous job but also in ranges of years programming? It should be JSON. R allows you to produce a JSON file.
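
A rough sketch of the kind of JSON file described here, assuming hypothetical field names (`region`, `is_developer`, `months_programming`), made-up rows and arbitrary range cut-offs. None of these come from the actual survey dataset, and the same thing can be done in R.

```python
import json
from collections import defaultdict

def years_bucket(months):
    """Map months of programming to a coarse range label (cut-offs are arbitrary)."""
    if months < 12:
        return "under_1_year"
    if months < 36:
        return "1_to_3_years"
    return "3_plus_years"

def distribution_by_region(rows):
    """Count respondents per region by job experience and by programming range."""
    out = defaultdict(lambda: {
        "experience": 0,
        "no_experience": 0,
        "years_programming": defaultdict(int),
    })
    for row in rows:
        region = out[row["region"]]
        region["experience" if row["is_developer"] else "no_experience"] += 1
        region["years_programming"][years_bucket(row["months_programming"])] += 1
    return out

# Made-up rows, only to show the shape of the output.
rows = [
    {"region": "South America", "is_developer": True, "months_programming": 40},
    {"region": "South America", "is_developer": False, "months_programming": 6},
    {"region": "Western Europe", "is_developer": False, "months_programming": 18},
]

summary = distribution_by_region(rows)
as_json = json.dumps(summary, indent=2, sort_keys=True)
print(as_json)
```

A nested object keyed by region is one possible layout; a flat list of records may be easier to bind in d3.js.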
@becausealice2
We are probably preparing 1-2 maps to show world distribution of respondents based on programming and job experience. That's the plan. @PunkDado is preparing the data and not sure if he wants to spend some time trying some d3.js or using R libraries for a similar purpose ;) .
Alice Jiang
@becausealice2
May 19 2017 23:15
I'm playing one of my games to help flush all the garbage out of my brain before I get back to work, and someone found out I do data science and then asked me to help them figure out how to clean a dataset. They didn't like the advice I gave and I'm getting into trouble because the guild leader thinks I'm picking fights :unamused:
"You can use a programming language or else do it by hand, I guess" is apparently bad advice PFFFFFFF
Maps I can do
evaristoc
@evaristoc
May 19 2017 23:16
Hahahahaha!
Very pedagogic...
Alice Jiang
@becausealice2
May 19 2017 23:17
I didn't hold hands while I was a mentor in the FCC channels, why should I do it in game where I only go to get away from work? xD
It still follows me there X.X
evaristoc
@evaristoc
May 19 2017 23:17
Harassment!?
Probably the same people you helped at the same channel...
These people don't learn...
Alice Jiang
@becausealice2
May 19 2017 23:18
Probably lol
evaristoc
@evaristoc
May 19 2017 23:18
:) :) :)
Alice Jiang
@becausealice2
May 19 2017 23:18
They said it's not what they're going into professionally, it's just for a class that's required to graduate
and I'm just like "okay so what part of that makes it so you can be upset when you're given advice instead of the solution?"
Am I crazy for thinking 30k observations of 25 features wouldn't be that bad to hand-clean?
keeping in mind it's a dataset for an introductory level analysis course
evaristoc
@evaristoc
May 19 2017 23:35
hahahaha!
Next time you should recommend a laundry...
@becausealice2
Going to bed! Take care all of you!
Alice Jiang
@becausealice2
May 19 2017 23:36
:joy: I should have. Sleep well!