These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
Northwest University? Didn't you have an interview with someone there?
My first course in Data Science was with Cloudera. They originally offered services in infrastructure and software, mainly Hadoop (Map/Reduce). They probably still do. They were pioneers in the sector back in 2012, when people were starting to think about Big Data infrastructure.
However, I suspect their business might have shrunk a bit due to tough competition. If so, they are likely working in partnership with what are now the biggest players (first Amazon, then IBM / Google / MS, then others), who offer cloud services instead of in-house infrastructure.
The courses are good but expensive. I think for the certs you must know the Hadoop suite?
A reminder that I AM LOADING DATASETS ABOUT FreeCodeCamp INTO KAGGLE.
This is part of the Open Data Initiative, so you will have the chance to apply h2o.io on those datasets too if you like!!
Be aware that you can practice big data manipulation with small files. One of our projects will reach 5 GB of data (not loaded yet).
The maximum file size for "Datasets" in Kaggle is 10 GB. So our data is relatively big for the current standard.
For "Competitions", Kaggle might allow datasets reaching the terabyte range.
Also, to let you know, someone has been TEACHING REGRESSION TECHNIQUES using THE NEW CODER SURVEY on KAGGLE.
She belongs to the Kaggle Team by the way:
Resuming the simple analysis about how people were using different resources, this is what they reported in the 2016 survey.
Percentage of users that might have been using the following resources to learn coding in 2016:
Blogs : 0.2%
Books : 0.9%
CodeWars : 10.0%
Codecademy : 61.4%
Coursera : 31.0%
DevTips : 6.2%
EdX : 22.2%
EggHead : 0.2%
FCC : 70.0%
Google : 0.4%
HackerRank : 0.2%
KhanAcademy : 24.0%
Lynda : 1.0%
MDN : 0.2%
OdinProj : 10.8%
Other : 13.4%
PluralSight : 22.8%
Reddit : 0.2%
SkillCrush : 0.2%
SoloLearn : 0.2%
StackOverflow : 1.2%
Treehouse : 2.7%
Udacity : 21.2%
Udemy : 26.4%
W3Schools : 0.8%
YouTube : 0.8%
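For anyone who wants to reproduce this kind of tally, here is a minimal Python sketch of how the percentages could be computed. It assumes each respondent is a row whose resource columns share a `Resource` prefix and are left empty when the resource was not reported; that layout, the function name `resource_percentages`, and the toy rows are all illustrative assumptions, not the actual survey file or the code used for the numbers above.

```python
from collections import Counter


def resource_percentages(rows, prefix="Resource"):
    """Return {resource: percent of respondents who reported using it}.

    Assumption: a resource counts as "reported" when its column holds
    any value other than None, an empty string, or "NA".
    """
    total = len(rows)
    if total == 0:
        return {}
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if column.startswith(prefix) and value not in (None, "", "NA"):
                # Strip the prefix so "ResourceFCC" is reported as "FCC".
                counts[column[len(prefix):]] += 1
    return {
        name: round(100.0 * n / total, 1)
        for name, n in sorted(counts.items())
    }


# Toy rows for illustration only, NOT real survey responses.
toy_rows = [
    {"ResourceFCC": "1", "ResourceCodecademy": "1", "ResourceEdX": ""},
    {"ResourceFCC": "1", "ResourceCodecademy": "", "ResourceEdX": ""},
    {"ResourceFCC": "", "ResourceCodecademy": "1", "ResourceEdX": "1"},
    {"ResourceFCC": "1", "ResourceCodecademy": "", "ResourceEdX": ""},
]

percentages = resource_percentages(toy_rows)
for name, pct in percentages.items():
    print(f"{name:<12}: {pct:.1f}%")
```

Each respondent can report several resources, so the percentages are computed per resource against the total number of respondents and are not expected to add up to 100%.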
Statistics and probability are baked into much of the software we use, but we no more need to think about them daily than a pilot needs to think about the equations of aerodynamics.