These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
@becausealice2 yes, I saw the news too. Things are changing. Did you know that in order to comply with the rules, all those platforms must hire more people to double-check? They haven't created the right algorithm yet that can detect those "outliers".
It might be a statistical issue:
In the pharmaceutical sector, every new medicine must go through a long series of clinical trials to show that it is safe. The analysis is based on statistics over a sample. If everything goes well, any side effect appears in only a very few of the test subjects.
However, those samples are small relative to the population. Once the trial is completed and the medicine goes on sale, it reaches a much larger section of the population, and the side effects show up more clearly.
It is like this: suppose the medicine's side effect affects 0.1% of the population, and they found that out by testing a sample of size 1,000. So during the trial the medicine affected just 1 person. Now suppose the medicine sells to 1,000,000 people. How many people are likely to suffer from side effects?
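The scaling in that example is easy to sketch in a few lines of Python (the rate and the sample/market sizes are the hypothetical numbers from the example, not real trial data):

```python
# Hypothetical figures from the example above, not real trial data.
side_effect_rate = 0.001   # 0.1% of the population

trial_size = 1_000
market_size = 1_000_000

# Expected number of affected people at each scale
affected_in_trial = side_effect_rate * trial_size    # 1 person
affected_in_market = side_effect_rate * market_size  # 1,000 people

print(int(affected_in_trial), int(affected_in_market))
```

So the same 0.1% rate that showed up as a single person in the trial becomes about 1,000 people once the medicine reaches the full market.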
It is possible that it is the same for Machine Learning systems. You test your model over a small subset of the total number of events. Even if it is highly accurate (let's say 99.99%), there are always a few false results that slip through. If you process a substantial volume in production, those few false results can add up to a large number, depending on the size of the total target population. You need people to help you out with those.
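The same arithmetic applies to the ML case. A quick sketch, using the 99.99% accuracy from the message and a made-up production volume (the test-set size and daily volume are assumptions for illustration):

```python
# Hypothetical accuracy and volumes for illustration only.
accuracy = 0.9999          # model is right 99.99% of the time
error_rate = 1 - accuracy  # ~0.01% of decisions are wrong

test_set_size = 10_000
production_volume = 100_000_000  # assumed daily volume on a large platform

# Expected number of wrong decisions at each scale
errors_in_test = error_rate * test_set_size            # ~1 case
errors_in_production = error_rate * production_volume  # ~10,000 cases per day

print(round(errors_in_test), round(errors_in_production))
```

A single miss in a 10,000-item test set becomes tens of thousands of misses at platform scale, which is why those platforms still need humans reviewing the model's output.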