These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
@caleb272 Hi and welcome!
This is a topic room about data science; it works asynchronously.
Have you tried the corresponding Help room? Or the forum?
You might find the link to the map in the prototype example on CodePen.
caleb272 sends brownie points to @evaristoc :sparkles: :thumbsup: :sparkles:
@erictleung: there are LOTS of posts about ML in biotech, but this might be something that interests you?
@Lightwaves @mesmoiron: yea... it is not really about trying to discredit stats: it is about the conclusions that people reach from them.
Data is only as good as it can be: you have to understand it and try to use it correctly. Otherwise the quote "There are lies... and statistics" would fit completely.
This applies to stats, as to many other fields:
Garbage in, garbage out.
Wrong analyses, wrong conclusions.
In general there is a lot of misleading information out there. Unfortunately it is a widespread bad practice from which some people are even becoming richer or more powerful: misinformation. Why? The general public doesn't understand statistics. So it is more a question of responsibility on the part of the researcher to provide true information, respecting people's right not to have to know statistics, which would be unrealistic anyway.
My personal opinion? Outside the scientific sector, I am ready to tolerate some license from some people to softly manipulate information in order to engage an audience, if no one is hurt, particularly if it is not for a bad cause and stays below a certain level of falsity. However, there is a threshold I am not ready to bear. I hate very bad use of stats even when it is used to support things I am in favour of. Supporting a cause is one thing; lying is another.
One of the problems the sector is currently facing is in fact the large number of people who have a Data Science degree but poor knowledge of statistics, for example. There are a LOT of data scientists out there who are more "tool-driven", without a proper understanding of the analyses. Unfortunately, old-school as it sounds, there are no good conclusions without a good analyst: tools won't do that for you all the time. And the more complicated the case, the more you need a strong statistics background and knowledge of the sector you are analysing. Otherwise, the chance that you reach an incorrect conclusion is tremendously high.
@Lightwaves just review the video from the Spark course about Google's conclusions on the decline of Facebook based on Google Trends, and how Facebook replied.
Here is what seems to be the original paper on Latent Dirichlet Allocation:
A very enjoyable read (advanced!!!)
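Since LDA came up: here is a minimal sketch of topic modeling with scikit-learn's `LatentDirichletAllocation`, just to show the mechanics. The toy documents are my own invention; the linked paper describes the original generative model, not this library.

```python
# Minimal, illustrative LDA example (assumes scikit-learn is installed).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy corpus: two rough themes (data/stats vs. cooking).
docs = [
    "statistics and data analysis for science",
    "machine learning models learn from data",
    "cooking recipes with fresh vegetables",
    "baking bread and cooking pasta at home",
]

# LDA expects bag-of-words counts, not raw text.
counts = CountVectorizer().fit_transform(docs)

# Fit two latent topics; each document gets a distribution over them.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# One row per document, one column per topic; each row sums to 1.
print(doc_topics.shape)
```

Each row of `doc_topics` is a probability distribution over the two topics, which is the document-level output the paper's model defines.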
Also something to check eventually (for me...): Dickey's work on censored data.