These are chat archives for FreeCodeCamp/DataScience

14th
Dec 2017
Josh Goldberg
@GoldbergData
Dec 14 2017 00:29
Thanks. Read both articles. Yeah we were talking two different definitions.
What are your thoughts on the article?
@becausealice2
Alice Jiang
@becausealice2
Dec 14 2017 00:34
Generally speaking I agree with him. I like his distinction of analysts vs scientists, though I think he's not giving enough credit to the technical knowledge of analysts as a foundation to build off of.
The theories behind their behaviour, maybe, but I disagree with his point that everything should be forgotten and relearned
Josh Goldberg
@GoldbergData
Dec 14 2017 00:43

I thought the author was a little too “I figured this out” - ish. I do see where he is coming from as I work in the field and deal with some of the issues he’s brought up. Especially with reproducibility. But I’m not sure I’m on board with “software developers can do anything, including data science. They’re just not interested.” That statement is somewhat careless to me and it makes it seem as if true foundational knowledge and expertise in statistics is not necessary. I’ve been to talks where the presenter worked for a data science consulting firm. He had no clue how the algorithms worked (only the arguments specified in the documentation). He couldn’t answer moderate-depth questions that seemed reasonable and something he should know, especially when he’s charging for these models he’s deploying.

I personally know a software developer who’s looking to get into data science. And I wouldn’t say he’s “one online class away” from displacing seasoned statisticians with less coding experience. While functions and packages are built out in popular software domains to execute these tasks with little effort, it is quite dangerous to treat these tools as black boxes or to understand them only thinly. Taking a shotgun approach to model selection and hoping something sticks is also troubling, I think, since you may not know why one model outperforms the other.

I will note that best practices of software development (all the things mentioned in the article) are in the process of being carried over to data science. So I don’t think it’s necessary to become a junior developer first to gain the skills needed to be a modern data scientist. The author kind of mentioned this but I felt a little too softly. Anyway, it’s a good dialogue as I don’t think it’s wise to be either extreme (only software with vague stats or only statisticians with vague software); somewhere in the middle in this case is probably the best.

Alice Jiang
@becausealice2
Dec 14 2017 00:48
I forgot that he had said that software devs were super soldiers. I agree with you on that. I wouldn't want to work with anyone under a title containing "data" who didn't have at least basic knowledge of stats, depending on what exactly their job entails
But at the same time, he's not entirely wrong. It's entirely possible to build models without that expertise using modern technologies. It's foolish and puts reliability on shaky ground, but it's still possible
Timothy Javins
@timjavins
Dec 14 2017 01:28
@becausealice2 You're awesome. Thanks for discussing that.
CamperBot
@camperbot
Dec 14 2017 01:28
timjavins sends brownie points to @becausealice2 :sparkles: :thumbsup: :sparkles:
api offline
Timothy Javins
@timjavins
Dec 14 2017 01:34
@GoldbergData & @mcbarlowe , Thanks for adding to the discussion. It was valuable!
CamperBot
@camperbot
Dec 14 2017 01:34
timjavins sends brownie points to @goldbergdata and @mcbarlowe :sparkles: :thumbsup: :sparkles:
:cookie: 124 | @goldbergdata |http://www.freecodecamp.org/goldbergdata
:cookie: 140 | @mcbarlowe |http://www.freecodecamp.org/mcbarlowe
Alice Jiang
@becausealice2
Dec 14 2017 01:40
Also thanks to @erictleung for sharing the articles. I really like them
CamperBot
@camperbot
Dec 14 2017 01:40
becausealice2 sends brownie points to @erictleung :sparkles: :thumbsup: :sparkles:
:cookie: 554 | @erictleung |http://www.freecodecamp.org/erictleung
Timothy Javins
@timjavins
Dec 14 2017 01:50
thanks @erictleung
CamperBot
@camperbot
Dec 14 2017 01:50
timjavins sends brownie points to @erictleung :sparkles: :thumbsup: :sparkles:
:cookie: 555 | @erictleung |http://www.freecodecamp.org/erictleung
Josh Goldberg
@GoldbergData
Dec 14 2017 02:24
@erictleung yes thank you for the articles.
CamperBot
@camperbot
Dec 14 2017 02:24
goldbergdata sends brownie points to @erictleung :sparkles: :thumbsup: :sparkles:
:cookie: 556 | @erictleung |http://www.freecodecamp.org/erictleung
Josh Goldberg
@GoldbergData
Dec 14 2017 02:24
@becausealice2 appreciate your response.
@timjavins you’re welcome! Thanks for reading my long rant. Lol
CamperBot
@camperbot
Dec 14 2017 02:25
goldbergdata sends brownie points to @timjavins :sparkles: :thumbsup: :sparkles:
:cookie: 140 | @timjavins |http://www.freecodecamp.org/timjavins
Alice Jiang
@becausealice2
Dec 14 2017 02:25
:+1:
Eric Leung
@erictleung
Dec 14 2017 02:44
@becausealice2 @GoldbergData thanks for the lively discussion! I agree with the points you've both brought up. It can read as a bit extreme in its claims, but I think the core message of it is still good. The definition of "data scientists" is quite amorphous haha.
Alice Jiang
@becausealice2
Dec 14 2017 03:00
OH! I just remembered something
When I finished the deep learning nanodegree by Udacity, I went through their feedback slack channel to see if anything I had to say hadn't yet been said, and I found this...
Screenshot (8).png
and my eyes rolled so hard I gave myself whiplash. I'm still mad that someone who hoped to get into DL was so careless with basic statistics as to claim 99% of students--who hadn't actually been surveyed on this topic--agreed on something
I think it's the point you were trying to make @GoldbergData that just because we can make models blindly because technology makes it possible doesn't mean that we should, right?
Josh Goldberg
@GoldbergData
Dec 14 2017 03:37
Yes. But even more so, what can happen is that people with expertise in a field make silly mistakes once they're outside their discipline's state of mind, if you will. Check out the Linda effect. It hits home on this as an example. Behavioral economics research has basically found that we have two brains: one for doing our domain work, and the other for everything else. Basically, we are prone to these misjudgments and flaws even when we have offsetting knowledge.
@becausealice2
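For anyone who hasn't met the Linda problem mentioned above: it is the classic illustration of the conjunction fallacy, where "bank teller and feminist" gets judged more likely than "bank teller" alone, even though a conjunction can never be more probable than either of its parts. Here is a minimal sketch of that rule using a simulated population. The population size and the two probabilities are made-up values chosen purely for illustration, and the two traits are assumed independent only to keep the simulation short.

```python
import random


def simulate_population(size, p_teller, p_feminist, seed=0):
    """Return a list of (is_teller, is_feminist) pairs.

    The probabilities are hypothetical, and the traits are drawn
    independently only to keep the sketch simple.
    """
    rng = random.Random(seed)
    return [
        (rng.random() < p_teller, rng.random() < p_feminist)
        for _ in range(size)
    ]


population = simulate_population(10_000, p_teller=0.02, p_feminist=0.6)

# Count people who are bank tellers at all.
tellers = sum(1 for is_teller, _ in population if is_teller)

# Count people who are bank tellers AND feminists (a subset of the above).
feminist_tellers = sum(
    1 for is_teller, is_feminist in population if is_teller and is_feminist
)

print(f"bank tellers:          {tellers}")
print(f"feminist bank tellers: {feminist_tellers}")
```

Whatever probabilities you plug in, the second count can never exceed the first, because every feminist bank teller is also a bank teller.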
Josh Goldberg
@GoldbergData
Dec 14 2017 03:51
@becausealice2 moreover, I hate the platitude of “there is a balance,” but here I agree with it. You don’t need to have DEEP knowledge to deploy reasonable models (in my lightly experienced opinion), but what makes this notion dangerous is people’s idea of what DEEP knowledge means to them. It’s a moving target in both directions, depending on who you’re speaking to. Data science is such a new field. People are trying to figure out where they belong, best practices, and what skill sets can predict future success as an employee (not sure they’ve even figured this out in most jobs, let alone software development, which has been around much longer than data science). The author’s analogy to airplanes and aerodynamics is shaky, since the concept of building and flying a plane has been basically figured out. Theoretical statistics, well, is still heavily theory (I know, tautology). Practical statistics (business use-case statistics, which I’ll consider an umbrella for everything considered statistical learning) is nowhere near the standardization of building and flying an airplane. Once we reach (if we ever reach) that level of standardization, then we can maybe make analogies like this one. But we are far from it. Practical statistics, data science, business case statistics, is too nascent, idiosyncratic, and undeveloped to put people at the wheel of production models that are relied upon in any meaningful way. Could I read up on some documentation and quick tutorials on some crazy algorithm with some data for my job? Yes. But I wouldn’t feel comfortable deploying something I don’t understand. What if the inputs change? The dynamics? How would I know how to react? Eh, anyways, #rant.
Btw. How is Udacity’s DL ND? @becausealice2
Alice Jiang
@becausealice2
Dec 14 2017 04:11
I didn't like it
There was a strange balance in micromanagement
Josh Goldberg
@GoldbergData
Dec 14 2017 04:12
I’m doing the data analyst ND to brush up on python since I’m more of an R user @becausealice2
@becausealice2 really? that sucks.
Alice Jiang
@becausealice2
Dec 14 2017 04:12
They taught lessons with specific code examples, and then the projects were essentially the same as the lessons, just different variable names
Josh Goldberg
@GoldbergData
Dec 14 2017 04:13
@becausealice2 that seems largely useless
Alice Jiang
@becausealice2
Dec 14 2017 04:13
But then on the flip side, if you didn't come in with the mathematical knowledge of what goes on under the hood, the explanation was "don't worry, it just works"
It was time and money spent on material that, for all but one week, I could have learned faster and for free with the TensorFlow docs and Google searches
Josh Goldberg
@GoldbergData
Dec 14 2017 04:14
eh….some people are comfortable with that. I am not
@becausealice2 good to know. I appreciate this feedback.
Alice Jiang
@becausealice2
Dec 14 2017 04:15
I never have been. I had to drop calculus and get a waiver from the state to graduate high school with insufficient math credits because the teacher took that approach to explaining what appeared to be vanishing variables.
When I asked where they went she said don't worry about it, so I told her she was an awful teacher, and she stopped grading my assignments.
Josh Goldberg
@GoldbergData
Dec 14 2017 04:15
can I make a random recommendation? Has anyone heard of brain.fm?
Alice Jiang
@becausealice2
Dec 14 2017 04:15
Then my options were going back to math for 7th graders or appealing to the state lmao
Josh Goldberg
@GoldbergData
Dec 14 2017 04:16
@becausealice2 wow. that’s horrible. Some people don’t have the depth of knowledge to teach
@becausealice2 that’s hilarious. The way I see it, I’d rather understand something once and then decide that it’s useless, not decide that something I don’t understand is useless….
Alice Jiang
@becausealice2
Dec 14 2017 04:18
The first week ... Maybe I'm thinking of the first module... idk.... of the dlnd was okay. We built mini TF from scratch. And their instructions for installing and setting everything up were on my level. But again, if you didn't come in with enough mathematical understanding it was explained away by "don't worry, just copy this code"
Quincy Larson
@QuincyLarson
Dec 14 2017 04:57
@evaristoc this analysis is great! I'm going to reach out to the author. Thanks for discovering this and sharing it with us!
CamperBot
@camperbot
Dec 14 2017 04:57
quincylarson sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:
:cookie: 384 | @evaristoc |http://www.freecodecamp.org/evaristoc
Quincy Larson
@QuincyLarson
Dec 14 2017 05:40
@erictleung These DSLore articles are gold! Thanks for sharing them with me!
CamperBot
@camperbot
Dec 14 2017 05:40
quincylarson sends brownie points to @erictleung :sparkles: :thumbsup: :sparkles:
:cookie: 557 | @erictleung |http://www.freecodecamp.org/erictleung
Alice Jiang
@becausealice2
Dec 14 2017 05:55
Took two days but I FINALLY got JDK, JRE, SBT, Scala, and Intellij installed and running error free (so far)
Finally gonna achieve those Scala/Spark dreams :sparkles:
Eric Leung
@erictleung
Dec 14 2017 07:36
@becausealice2 @GoldbergData after @QuincyLarson mentioned the "DSLore articles are gold!" I realized I haven't read the other articles on there. Turns out they are truly gems! Here's one on the debate on how much statistics you need for a data scientist with some fun small quizzes :smile:
@becausealice2 nice! :+1: Those are a pain to set up....
Alice Jiang
@becausealice2
Dec 14 2017 07:38
Tyty :)
:laughing:
Quincy Larson
@QuincyLarson
Dec 14 2017 20:19
@becausealice2 These are gold :)

@erictleung I'm personally fond of:

What do you call a statistician who lives in San Francisco?

A Data Scientist

Alice Jiang
@becausealice2
Dec 14 2017 20:21
I only got about half right, and the only one I didn't have to guess was Hadoop :(
Quincy Larson
@QuincyLarson
Dec 14 2017 20:25
@erictleung I get the impression - like most engineering fields - it's less about pioneering breakthroughs and more about using off-the-shelf tools to get things done. So in some cases, I wonder if "data engineering" would be a better term, but that itself is an even more engineering-focused discipline.
Alice Jiang
@becausealice2
Dec 14 2017 20:32
My most favoritest camper from FCC is a data engineer now :D
and that's about as much as I can contribute to this discussion right now.
Eric Leung
@erictleung
Dec 14 2017 21:17
@QuincyLarson yeah, it appears there is a spectrum of "data science". And thanks for sharing that article :+1:
CamperBot
@camperbot
Dec 14 2017 21:17
erictleung sends brownie points to @quincylarson :sparkles: :thumbsup: :sparkles:
:star2: 1369 | @quincylarson |http://www.freecodecamp.org/quincylarson
Eric Leung
@erictleung
Dec 14 2017 21:18
To anyone interested, here's a Machine Learning 101 slidedeck by Google. It's pretty comprehensive and claims to be "the culmination of almost 2 years of head banging, so you don't have to." :laughing:
I skimmed through it and it does highlight quite the spectrum of machine learning: its past, the various methods, and applications.
Is this what I think it is? Machine Learning as a Service (MLAAS)?! https://algorithmia.com/ :smile:
evaristoc
@evaristoc
Dec 14 2017 22:40

@erictleung @becausealice2 @QuincyLarson and rest. My opinion about Data Scientists?

I try to pay attention to the term "science".

I already mentioned here some time ago what I heard from the Data Science Leader of Amazon Europe when asked a similar question by the audience. Instead of explaining, he posed a real problem Amazon was facing and asked the audience to think of a solution. The problem was more interesting because at the time they tackled it, no data was available.

He asked people to keep their answers to themselves (we were about 200), continued his talk, and left the answer for later in the talk.

Then he gave the solution to the problem: they had to design an experiment to collect data. The purpose was to formulate and test a model. Details aside, some people showed surprise.

He mentioned that many applicants are mainly trained in applying "recipes" to those problems but are unable to think "out of the box".

If you ask what my solution was, I at least passed the test of recognizing that an experiment was required :). I didn't work out the exact details of the experiment though, sorry.

The conclusion of the talk was that to fulfill the role of a data scientist, ideally you should be able to "design" experiments that bring out valid information from large amounts of data, and even to produce relevant data when it doesn't exist.

In a simple, common scenario, those experiments could be for example model comparisons, and for bringing relevant data where it doesn't exist, feature selection and feature engineering.

Of course, you should be able to discard results that make no sense, e.g. spurious correlations (some funny examples here).
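
To make the spurious-correlation point concrete, here is a minimal sketch (not taken from the talk or the linked examples) that generates pairs of completely independent random walks and reports the strongest correlation found among them. The number of pairs, the walk length, and the seed are arbitrary assumptions; the helper names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(rng, steps):
    """Cumulative sum of independent Gaussian steps."""
    position = 0.0
    path = []
    for _ in range(steps):
        position += rng.gauss(0, 1)
        path.append(position)
    return path


def strongest_spurious_correlation(pairs=200, steps=100, seed=0):
    """Return the largest-magnitude correlation among unrelated walk pairs."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(pairs):
        r = pearson(random_walk(rng, steps), random_walk(rng, steps))
        if abs(r) > abs(best):
            best = r
    return best


strongest = strongest_spurious_correlation()
print(f"strongest correlation between unrelated series: {strongest:.2f}")
```

The two series in each pair share nothing by construction, yet trending data like this routinely produces large correlations, which is why a strong coefficient on its own is not evidence of a real relationship.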

In real terms though how much ideal data science is done might be decided by the knowledge of the practitioner, business goals and budget, and if you buy or hire. IMO those are key factors determining the spectrum (@erictleung's words). I can certainly confirm the existence of that spectrum based on what I have seen and heard.