These are chat archives for FreeCodeCamp/DataScience

1st
Oct 2016
Caleb Martin
@caleb272
Oct 01 2016 04:52
hey
aw shit, it's been a while since someone was on here
does anyone know where I can get the world data for the last D3 challenge?
evaristoc
@evaristoc
Oct 01 2016 08:44

@caleb272 Hi and welcome!

This is a topic room about data science; it works asynchronously.

Have you tried the corresponding Help room? Or the forum?

You may be able to find the link to the map in the prototype example on CodePen.

Good luck!

Caleb Martin
@caleb272
Oct 01 2016 08:45
@evaristoc yup already found the link in the prototype thanks
CamperBot
@camperbot
Oct 01 2016 08:45
caleb272 sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:
:cookie: 315 | @evaristoc |http://www.freecodecamp.com/evaristoc
evaristoc
@evaristoc
Oct 01 2016 08:45
:+1:
Caleb Martin
@caleb272
Oct 01 2016 08:46
yea, I have been eavesdropping on this chat for a while now, so I'm not really new, but yea
evaristoc
@evaristoc
Oct 01 2016 08:46
Hope you find something of interest here then! Stay connected!
evaristoc
@evaristoc
Oct 01 2016 10:49

People,

An MS Data Science Summit

Some presentations are now available online:
https://channel9.msdn.com/Events/Machine-Learning-and-Data-Sciences-Conference/Data-Science-Summit-2016#fbid=VPGk5_sIEiE
evaristoc
@evaristoc
Oct 01 2016 13:15

People

For those interested in information retrieval: Perplexity

I knew about entropy measures, but to be honest this is the first time I have heard of this one. It is very well explained on Wikipedia:
https://en.wikipedia.org/wiki/Perplexity
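
A rough sketch of the idea, for anyone who prefers code: the perplexity of a discrete probability distribution is 2 raised to its Shannon entropy in bits. The function name and the two example dice are made up for illustration; this only covers the "perplexity of a distribution" case, not the evaluation of a language model on a test set.

```python
import math


def perplexity(probs):
    """Perplexity of a discrete distribution: 2 ** (Shannon entropy in bits)."""
    # Zero-probability outcomes contribute nothing to the entropy.
    entropy_bits = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy_bits


# Hypothetical examples: a fair die and a heavily loaded one.
fair_die = [1 / 6] * 6
loaded_die = [0.9, 0.02, 0.02, 0.02, 0.02, 0.02]

print(perplexity(fair_die))    # close to 6, up to floating-point rounding
print(perplexity(loaded_die))  # well below 6: the outcome is easier to guess
```

A uniform distribution over k outcomes has perplexity k, so the value can be read as "how many equally likely choices this is as uncertain as".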

@erictleung: there are LOTS of posts about ML in biotech, but maybe this one might interest you?
http://machinelearningmastery.com/classification-and-regression-trees-for-machine-learning/

Sorry: wrong link:
http://www.montefiore.ulg.ac.be/~geurts/Papers/geurts09-molecularbiosystems.pdf

Hèlen Grives
@mesmoiron
Oct 01 2016 13:24
@razerh0 the future of IT; hmm, they will introduce the next big thing. As always, after some time it will be mainstream. Open source will continue to lower starting points faster. We will arrive at faster PCs because they are still pouring out old models. It will take some time to become less elitist, that is, once schools know how to update faster with less budget and girls see the point of having fun doing it at school. It thrives a lot on bubbles before it settles into reality. Predictions made 30 years ago are now becoming mainstream. So the fancy things of today will by then be much more comfortable to work with, as they will have reached critical mass. The threat to IT is still the steep learning curves, as a churning market needs tools that people can use quickly to transition. For too many people the investment in knowledge is still too cumbersome. When women enter the field, pay tends to go down. That can be seen as a lowered entry barrier, or as a positive, as sharing becomes the reciprocal collaborative rule. Shoot me some arguments ;-)
evaristoc
@evaristoc
Oct 01 2016 13:37
@mesmoiron agree: mainstreaming the use of tools and lowering average wages for what is now highly skilled labour. Computers also allow for centralisation (1 man instead of many) and relocation (cheaper locations). It won't be the first time.
@Lightwaves the article you posted: :+1:
Hèlen Grives
@mesmoiron
Oct 01 2016 17:54
A bit off topic, but it deals with bias in teachers and preschoolers. I think it is an important takeaway when making predictions based on biased/faulty data. The error is subtle because the data is not an objective account of reality. ML must deal with inherited faults as opposed to calculated errors: http://www.npr.org/sections/ed/2016/09/28/495488716/bias-isnt-just-a-police-problem-its-a-preschool-problem
So if you take the data set and blindly believe it to be true, then what you predict is its untruth, which in this case is the reality, but actually not the reality, because what was collectively seen was not correct. A bit of an "I know that you know that I know" case. A very nested form of reality.
Lightwaves
@Lightwaves
Oct 01 2016 19:57
I think it goes back to what the author of Weapons of Math Destruction said about models: "Models are opinions embedded in mathematics."
We can either create a biased model without realizing it or our data can be biased
So we have to be hyper-aware of that
evaristoc
@evaristoc
Oct 01 2016 20:50

@Lightwaves @mesmoiron: yea... it is not really about trying to discredit stats: it is about the conclusions that people reach from them.

Data is as good as it can be: you have to understand it and try to use it correctly. Otherwise, the quote "There are lies... and statistics" would fit completely.

This applies to stats, as to many other sectors:

Garbage in, garbage out.

Wrong analyses, wrong conclusions.

In general there is a lot of misleading information out there. Unfortunately, misinformation is a widespread bad practice from which people are even becoming richer or more powerful. Why? The general public doesn't understand statistics. So it is more a question of the researcher's responsibility to provide true information, respecting people's right not to have to know statistics, which would be unrealistic to expect anyway.

My personal opinion? Outside the scientific sector, I am ready to tolerate some license for people to softly manipulate information in order to engage an audience if no one is hurt, particularly if it is not for a bad cause, and up to a certain level of falsity. However, there is a threshold I am not ready to bear. I hate very bad use of stats even when it is used to support things I am in favour of. Supporting a cause is one thing; lying is another.

One of the problems the sector is currently facing is in fact the large number of people who have got a Data Science degree with poor knowledge of statistics, for example. There are a LOT of data scientists out there who are more "tool-driven", without a proper understanding of the analyses. Unfortunately, old-school as it sounds, there are no good conclusions without a good analyst: tools won't do that for you all the time. And the more complicated the case, the more you need a strong statistics background and knowledge of the sector you are analysing. Otherwise, the chance that you reach an incorrect conclusion is tremendously high.

@Lightwaves just review the video from the Spark course about the conclusions from Google about the decline of Facebook using Google Trends, and how Facebook replied.

evaristoc
@evaristoc
Oct 01 2016 22:52

For those interested in Bayesian approaches for the analysis of latent variables with multinomial distribution:

Here is what seems to be the original paper on Latent Dirichlet Allocation:
https://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf

A very enjoyable read (Advanced!!!)
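
To give a feel for what the model assumes before diving into the paper, here is a minimal sketch of the LDA generative story for a single document: draw topic proportions from a Dirichlet, then for each word draw a topic and then a word from that topic's multinomial. The vocabulary, the two topics, and the function names are invented for illustration, and the document length is fixed here rather than drawn from a Poisson as in the paper. This only generates data; it does not do inference.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet by normalising independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word_probs, vocab, alpha, n_words, rng):
    """Generate one document following the LDA generative process."""
    n_topics = len(topic_word_probs)
    # Per-document topic proportions (theta) from a symmetric Dirichlet.
    theta = sample_dirichlet([alpha] * n_topics, rng)
    words = []
    for _ in range(n_words):
        # Pick a topic for this word, then a word from that topic.
        z = rng.choices(range(n_topics), weights=theta)[0]
        w = rng.choices(vocab, weights=topic_word_probs[z])[0]
        words.append(w)
    return theta, words


# Hypothetical toy setup: 4-word vocabulary, 2 topics.
vocab = ["gene", "protein", "model", "data"]
topics = [
    [0.45, 0.45, 0.05, 0.05],  # a "biology" topic
    [0.05, 0.05, 0.45, 0.45],  # a "statistics" topic
]

rng = random.Random(0)
theta, words = generate_document(topics, vocab, alpha=0.5, n_words=8, rng=rng)
print(theta)
print(words)
```

The hard part, which the paper covers, is going the other way: recovering the topics and the per-document proportions from the observed words alone.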

Also to check at some point (for me...): Dickey's work on censored data.