These are chat archives for FreeCodeCamp/DataScience

13th
Dec 2017
Alice Jiang
@becausealice2
Dec 13 2017 00:07
Northwest what? @evaristoc
evaristoc
@evaristoc
Dec 13 2017 09:54

@becausealice2
Northwest University? Didn't you have an interview with someone there?

@becausealice2
My first course in Data Science was with Cloudera. They originally offered services in infrastructure and software, mainly Hadoop (Map/Reduce). They probably still do. They were pioneers in the sector back in 2012 when people were starting to think about Big Data infrastructure.

However, I suspect their business might have shrunk a bit due to tough competition. If so, they are likely working in partnership with what are now the biggest players (first Amazon, then IBM / Google / MS, then others), who offer cloud services instead of in-house.

Courses are good but expensive. I think for the certs you must know Hadoop suite?

evaristoc
@evaristoc
Dec 13 2017 09:59

PEOPLE

For those interested in Big Data / Data Science, I recommend learning more about the h2o.io platform. Also the Google one. Databricks is also fine.
At the moment I am not really dealing with Big Data, as I am able to handle my projects from my desktop, but if you are interested in manipulating Big Data files, try those.
Observation: Kaggle is apparently using h2o.io.
evaristoc
@evaristoc
Dec 13 2017 10:05

PEOPLE

I remind you that I AM LOADING DATASETS IN KAGGLE ABOUT FreeCodeCamp.

This is part of the Open Data Initiative, so you will have the chance to apply h2o.io on those datasets too if you like!!

Be aware that you can practice big data manipulation with small files. One of our projects will be reaching 5 GB of data (still not loaded).

The maximum file size for "Datasets" in Kaggle is 10 GB. So our data is relatively big for the current standard.

For "Competitions", Kaggle might allow datasets reaching the TB range.

evaristoc
@evaristoc
Dec 13 2017 12:37

PEOPLE

For a project I had in mind, just an overview of the Survey 2017:
Percentage of users that might have been using the following resources to learn coding in 2017:
  • Codecademy : 51.7%
  • CodeWars : 10.2%
  • Coursera : 24.2%
  • CSS : 25.7%
  • EdX : 17.8%
  • Egghead : 7.4%
  • FCC : 75.9%
  • HackerRank : 11.3%
  • KA : 20.9%
  • Lynda : 14.1%
  • MDN : 35.3%
  • OdinProj : 5.4%
  • Other : 5.6%
  • PluralSight : 13.2%
  • Skillcrush : 2.5%
  • SO : 61.7%
  • Treehouse : 12.4%
  • Udacity : 21.1%
  • Udemy : 28.2%
  • W3S : 53.7%
What is SO??
evaristoc
@evaristoc
Dec 13 2017 12:53
evaristoc
@evaristoc
Dec 13 2017 12:59

PEOPLE:

Also to let you know that there has been someone TEACHING REGRESSION TECHNIQUES using THE NEW CODER SURVEY in KAGGLE.

She belongs to the Kaggle Team by the way:
https://www.kaggle.com/rtatman/regression-challenge-day-5

@erictleung @QuincyLarson ^^^
evaristoc
@evaristoc
Dec 13 2017 14:13

Resuming the simple analysis about how people were using different resources, this is what they reported in the 2016 survey.

Percentage of users that might have been using the following resources to learn coding in 2016:
Blogs : 0.2%
Books : 0.9%
CodeWars : 10.0%
Codecademy : 61.4%
Coursera : 31.0%
DevTips : 6.2%
EdX : 22.2%
EggHead : 0.2%
FCC : 70.0%
Google : 0.4%
HackerRank : 0.2%
KhanAcademy : 24.0%
Lynda : 1.0%
MDN : 0.2%
OdinProj : 10.8%
Other : 13.4%
PluralSight : 22.8%
Reddit : 0.2%
SkillCrush : 0.2%
SoloLearn : 0.2%
StackOverflow : 1.2%
Treehouse : 2.7%
Udacity : 21.2%
Udemy : 26.4%
W3Schools : 0.8%
YouTube : 0.8%
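
A minimal sketch of how percentages like the ones above could be computed, assuming each respondent's answers are stored as a mapping from resource name to 1 (reported using it) or 0 / missing (did not). The function name, the field names, and the sample rows are hypothetical and are not taken from the actual survey files; the choice to divide by all respondents (rather than only those who answered the question) is also an assumption. Because a respondent can report several resources, the percentages do not need to add up to 100.

```python
# Hypothetical sketch: the data layout and names below are assumptions,
# not the real New Coder Survey schema.
def resource_percentages(responses, resources):
    """Return {resource: percentage of respondents who reported using it}."""
    total = len(responses)
    if total == 0:
        # No respondents: report 0.0 for every resource instead of dividing by zero.
        return {name: 0.0 for name in resources}
    shares = {}
    for name in resources:
        # Count only explicit 1 values; 0, None, and missing keys are "not used".
        users = sum(1 for row in responses if row.get(name) == 1)
        shares[name] = round(100.0 * users / total, 1)
    return shares


# Made-up sample rows, for illustration only.
sample = [
    {"FCC": 1, "Udemy": 1},
    {"FCC": 1, "Udemy": 0},
    {"FCC": 0, "Udemy": None},
    {"FCC": 1},
]

for name, pct in sorted(resource_percentages(sample, ["FCC", "Udemy"]).items()):
    print(f"{name} : {pct}%")
```

With the made-up rows above, 3 of 4 respondents report FCC and 1 of 4 reports Udemy. Dividing by only the respondents who answered a given question would give different numbers, so the denominator is worth stating whenever such a table is shared.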

syamkumar
@syam3526
Dec 13 2017 14:57
Does anyone know how to use machine learning models in Android without cloud support?
syamkumar
@syam3526
Dec 13 2017 15:09
@/all anyone know this
Alice Jiang
@becausealice2
Dec 13 2017 18:20
@evaristoc Northeastern, I've already dumped all of my information about that visit here
Eric Leung
@erictleung
Dec 13 2017 19:39
@evaristoc awesome update! Keep it up with the data initiative. Wish I could make more time to help out :frowning: And wow, really solid regression lesson there :+1:
Interesting 2-part article on being a data scientist. The titles sum it up: you are a software engineer first, and data analyst roles are poison. I think there's truth in both of those claims, in case you're looking into new jobs.
Alice Jiang
@becausealice2
Dec 13 2017 20:30
statistics and probability are baked into much of the software we use but we no more need to think about them daily than a pilot needs to think about the equations of aerodynamics.
+100
Matthew Barlowe
@mcbarlowe
Dec 13 2017 21:01
There is some truth to that, but I think it verges on being dangerously flippant, especially if you are expected to interpret the results as well as build the models
Alice Jiang
@becausealice2
Dec 13 2017 21:13
I wouldn't let anyone on my DS team who didn't have a solid foundation of statistics and probability, even if it's not a very big one. The point I took away from that comment is that you only need to know "enough"
Was it Mary Poppins who said "enough is as good as a feast"?
But, especially with big companies, this is a team sport. There's no reason the people making the software should also be the ones interpreting the results it produces
Josh Goldberg
@GoldbergData
Dec 13 2017 21:18
Wouldn’t it be difficult to build and refine a model if you aren’t also interpreting the results?
At least in the development phase? Though the development phase probably never ends.
Alice Jiang
@becausealice2
Dec 13 2017 21:22
you don't need to interpret results to refine a model.
It's super helpful and useful, but as long as you know what the results should (or in some cases, shouldn't) look like, you can fine tune things and get it going where you need it to go
Josh Goldberg
@GoldbergData
Dec 13 2017 22:19
How would you refine it if you don’t know what the result is?
Alice Jiang
@becausealice2
Dec 13 2017 22:19
I think you and I have two different definitions for "results"
Josh Goldberg
@GoldbergData
Dec 13 2017 22:19
And by interpreting results, I don’t mean frequently as in consuming the results like an end user would
Alice Jiang
@becausealice2
Dec 13 2017 22:20
and I also get the sense that you didn't really read through the two articles...
By results I
Josh Goldberg
@GoldbergData
Dec 13 2017 22:20
I caught the tail end of the conversation. Sorry. I didn’t see a link. Wasn’t aware this was related to an article
Alice Jiang
@becausealice2
Dec 13 2017 22:20
I'm referring to what the end user is consuming, not all the data spit out in the creation process....
Two articles