These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
I recently attended a talk by the DS manager of Amazon Berlin. Very interesting; he mentioned a couple of key things that no one had mentioned before as key skills for a data scientist. He called one of them "metric linkage". He said that a data scientist should be able to create them. But what is that?
Here is an example, by way of a challenge:
As you know, Amazon also sells perishable products. In order to satisfy quality standards, they have hired a group of QA employees whose main role is to verify the quality of those products before they are sent to buyers.
However, the company is pursuing automation of almost everything. One of the targets is to find ways to introduce a system that could also automate the QA process for perishable products.
The example was for strawberries.
If you were asked to implement a system to evaluate the quality of the fruit, what would you do?
The QA activity is very manual and based on experience. It also involves evaluating not only visual cues but also tactile and organoleptic attributes of the fruit.
Again, this idea of measuring a distance hasn't occurred to me before, but from what I have been discussing with other people, I think there are some who would refuse to accept that such a measurement should be taken seriously. Why? The positioning of the groups will rely A LOT on your corpus, and words are actually NOMINAL variables, so in theory they lack any ordering. In theory, you can always re-group them according to the meaning you want to give them.
However, some sort of distance is always implemented in practice. One simple example is Sentiment Analysis. An even clearer example, in my opinion, is the following:
If you can check what they are doing to create those groupings, you might get a good starting point, I think...
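To make the "distance over words" point concrete: in practice, nominal tokens get mapped to vectors (word embeddings), and then a distance or similarity is computed between those vectors. Here is a minimal sketch of that idea; the three-dimensional "embeddings" below are made-up illustrative values, not taken from any real trained model.

```python
import math

# Toy 3-dimensional "embeddings" -- illustrative values only,
# NOT from a real model. A real system would load trained vectors.
embeddings = {
    "good":  [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.4],
    "bad":   [-0.7, 0.1, 0.2],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1.0 means
    the vectors point in nearly the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# "good" should sit closer to "great" than to "bad":
print(cosine_similarity(embeddings["good"], embeddings["great"]))  # high
print(cosine_similarity(embeddings["good"], embeddings["bad"]))    # much lower
```

The point is exactly the one raised above: the distances you get depend entirely on the corpus the vectors were trained on, so the "ordering" is induced by the data, not inherent to the words themselves.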