These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
email alert data
What does the message look like? Can you show an example? Is it just that (alerts)? It seems to be a relatively easy problem, although I need to know more.
I don't understand why you are saying "data is not a constraint". You are in fact asking to solve a problem where data is a constraint.
Ok. My aim was to find out whether you were indeed dealing with a relatively simple problem. I will assume I know the sort of messages (positive and negative) you have.
One common approach is to use semi-supervised learning. That means that you might have some additional work to do, I am afraid.
Among those techniques, there is one that I have seen called "self-training" in some reference books: an iterative process in which you label data with the current model and add it to your existing labeled dataset in order to label more data. The thing is, you or a group of you should still check that the labeling was correct.
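To make the self-training loop concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the tiny naive Bayes classifier, the names `TinyNaiveBayes` and `self_train`, the 0.8 confidence threshold, and the example messages are all made up, not taken from your data.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TinyNaiveBayes:
    """Hypothetical, deliberately simple bag-of-words classifier."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict_proba(self, text):
        total = sum(self.class_counts.values())
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in self.vocab]
        log_scores = {}
        for c in self.classes:
            n_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total)
            for w in words:
                # Laplace (add-one) smoothing
                score += math.log((self.word_counts[c][w] + 1) / (n_words + len(self.vocab)))
            log_scores[c] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Iteratively move confidently predicted examples into the labeled set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = TinyNaiveBayes().fit([t for t, _ in labeled], [y for _, y in labeled])
        confident, remaining = [], []
        for text in pool:
            proba = model.predict_proba(text)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to add, stop early
        labeled.extend(confident)
        pool = remaining
    # Refit on everything labeled so far (hand-labeled plus pseudo-labeled).
    model = TinyNaiveBayes().fit([t for t, _ in labeled], [y for _, y in labeled])
    return model, labeled, pool


# Made-up example messages, only to show the call.
seed_labeled = [
    ("disk failure critical error", "negative"),
    ("backup completed successfully", "positive"),
]
seed_unlabeled = ["critical error on disk", "backup completed", "meeting at noon"]
model, labeled_out, still_unlabeled = self_train(seed_labeled, seed_unlabeled)
```

Whatever ends up in `labeled_out` beyond the original seed examples is what you or your group would still review by hand, and whatever stays in `still_unlabeled` is what the model was not confident about.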
If you follow that approach, I would suggest keeping it simple at the beginning. If you use bagging, I would suggest using only a few models, and those should be relatively simple. You should expect poor performance at the beginning. Why? You still have to train your classifier on the initial small dataset, so don't get fancy.
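As a rough idea of what "a few simple models" with bagging could look like, here is a small sketch in plain Python. The keyword-count base model, the function names, the choice of 5 models, and the example messages are all hypothetical; treat it as an illustration under those assumptions, not as a recommendation for your data.

```python
import random
from collections import Counter


def train_keyword_model(examples):
    """Very simple base model: per-class word counts."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def predict_keyword_model(model, text):
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    # Iterating labels in sorted order makes tie-breaking deterministic.
    return max(sorted(scores), key=scores.get)


def train_bagging(examples, n_models=5, seed=0):
    """Train each base model on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(examples) for _ in examples]
        models.append(train_keyword_model(sample))
    return models


def predict_bagging(models, text):
    """Majority vote over the base models."""
    votes = Counter(predict_keyword_model(m, text) for m in models)
    return max(sorted(votes), key=votes.get)


# Made-up example messages, only to show the call.
data = [
    ("disk failure critical error", "negative"),
    ("service down critical", "negative"),
    ("backup completed successfully", "positive"),
    ("deployment completed", "positive"),
]
models = train_bagging(data, n_models=5, seed=0)
prediction = predict_bagging(models, "disk error")
```

With so few examples, some bootstrap samples will contain only one class, which is one more reason to expect poor performance at the start.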
There are other techniques applicable to these situations, and which one fits might depend on the kind of data you have. In particular, I have found that the "self-training" technique let me focus better on feature engineering during the process. Having a small dataset could help you find the relevant features that might distinguish your model in later steps. However, keep an open mind and be ready for changes: having little data means that you might not have a representative set, and therefore the model might seriously underfit once a larger number of examples is added.
Let me know if this helps.