These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
I have done a few Relation Extraction projects myself, nothing fancy but enough for what they were meant to do. A simple Apriori algorithm was OK for what I did. If you are on a more advanced assignment, you are probably asked to work with graphs and ontologies. In that case you already have a classified set of rules to follow, so you can assign terms to objects in the text.
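To make the Apriori idea concrete, here is a minimal sketch in plain Python. It is not the code from those projects: the `apriori` function, the `sentences` toy data, and the `min_support` threshold are all made up for illustration, and each "transaction" is assumed to be the set of terms found in one sentence.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset seen in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that appears anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy data: terms co-occurring in four sentences.
sentences = [
    {"aspirin", "treats", "headache"},
    {"aspirin", "treats", "fever"},
    {"ibuprofen", "treats", "headache"},
    {"aspirin", "causes", "nausea"},
]

frequent = apriori(sentences, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

Frequent pairs such as a term together with a relation word are only candidates for relations; turning them into rules would still need a confidence measure or the kind of rule set mentioned above.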
Sorry, R is not my main tool, but there are people here who might help. If you insist on using supervised methods, is it because you have some data? If you describe your approach and what your train/test dataset is about, there are people here who could help, me included. Could you share a link to your code and datasets? (Please try to keep this chat thread as clean as possible.)
@erictleung already mentioned the applicability of some methods. You might have to compare several of them. NB (Naive Bayes) tends to perform poorly with unbalanced classes, which is usually the case in text classification, where some entities are under-represented.
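One way to see the class imbalance problem with NB is the sketch below. It is a tiny multinomial Naive Bayes written from scratch with add-one smoothing, not a library implementation, and the `train_nb` / `predict_nb` helpers, the `use_prior` switch, and the toy documents are all hypothetical. It only illustrates one mechanism (the class prior pulling predictions toward the majority class), under the assumption of a 9-to-1 split.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Collect the counts needed by a multinomial Naive Bayes classifier."""
    vocab = {w for d in docs for w in d}
    class_docs = Counter(labels)                  # documents per class
    word_counts = {c: Counter() for c in class_docs}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)            # word frequencies per class
    return vocab, class_docs, word_counts, alpha


def predict_nb(model, doc, use_prior=True):
    """Return the class with the highest log-score for a tokenized document."""
    vocab, class_docs, word_counts, alpha = model
    n_docs = sum(class_docs.values())
    scores = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        # The log prior is where class imbalance enters the score.
        score = math.log(class_docs[c] / n_docs) if use_prior else 0.0
        for w in doc:
            if w in vocab:
                # Smoothed likelihood of the word given the class.
                score += math.log(
                    (word_counts[c][w] + alpha) / (total + alpha * len(vocab))
                )
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical unbalanced data: nine "no relation" sentences, one relation.
docs = [["met", "at"]] * 9 + [["works", "for"]]
labels = ["none"] * 9 + ["rel"]
model = train_nb(docs, labels)

query = ["works", "at"]
with_prior = predict_nb(model, query)                      # majority class wins
without_prior = predict_nb(model, query, use_prior=False)  # rare class wins
print(with_prior, without_prior)
```

Dropping the prior is just a way to expose the effect here; in practice, resampling or class weighting would be the more usual things to compare.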
I searched quickly and found an article that might interest you:
(I REALLY REALLY LOVE doing this, but many people come here with questions that are already answered on the Internet, so please do your research.)
It seems LR is OK? I like random forest too for your case. SVM is excellent but won't really work if the data can't be normalized well, which is usually the case with text data. Normalizing text can be really tough and data intensive.
If you have good datasets I would think of AdaBoost, but IMO it will add value if you have 1 or 2 strong classifiers and a few other weak ones. I invite you to look for a boosting procedure that works better with some sort of fuzzy rules.
@erictleung: do you have any experience with relationship extraction techniques applied to your field?
I hope this can help.
@SauravDeb Here are some very quick examples of what I was suspecting regarding deep learning (convolutional, though I would say recurrent could be more effective but harder to implement). I haven't really read them:
Here is something using word2vec that you could also read:
I hope this helps. Good luck!!