These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
@mesmoiron Thanks. I don't necessarily need a confidence metric, but I need to know whether the 1st group is related to the 2nd.
@evaristoc Thanks. I've used WordNet and WordNet Domains a little, but it is cumbersome because I have groups of words from a variety of domains. I've also used the cortical API (SimService seems similar). It's really good that they use fingerprints to find relationships, but I wouldn't like to depend on an API. The problem is that I want to disambiguate authors, and there are cases where I can't say they are different just by comparing their names; I need another attribute. I'm using some subjects, which also vary in language. So I have two groups of words, and I want to determine whether those words are similar or not.
So far, I have a small implementation of Google Distance, but it takes a great amount of time. I have also tried some syntactic similarities and the cortical API.
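For readers following along: "Google Distance" here presumably refers to the Normalized Google Distance (NGD), which scores two terms by how often they appear together relative to how often each appears alone. The sketch below is not @cuent's implementation. It assumes the hit counts were already collected into a dictionary (the names `toy_hits`, `ngd`, and `mean_group_ngd` and all numbers are hypothetical), so that each count is looked up once instead of being re-queried for every pair, which is one plausible way to cut the running time mentioned above.

```python
import math
from itertools import product


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from raw counts.

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed larger than every count)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # terms never co-occur: treat as maximally distant
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


def mean_group_ngd(group_a, group_b, hits, n):
    """Average NGD over every cross-group pair of terms.

    hits maps a frozenset of one or two terms to a precomputed count.
    """
    distances = [
        ngd(
            hits.get(frozenset([a]), 0),
            hits.get(frozenset([b]), 0),
            hits.get(frozenset([a, b]), 0),
            n,
        )
        for a, b in product(group_a, group_b)
    ]
    return sum(distances) / len(distances)


# Toy counts, invented for illustration only.
toy_hits = {
    frozenset(["genomics"]): 1000,
    frozenset(["bioinformatics"]): 1000,
    frozenset(["genomics", "bioinformatics"]): 10,
}
toy_index_size = 10**6

print(mean_group_ngd(["genomics"], ["bioinformatics"], toy_hits, toy_index_size))
```

Lower values mean the two groups are more closely related; a pair that never co-occurs makes the average infinite, so a real implementation might cap or skip such pairs.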
What I'm going to do now is see exactly what the SimService API and the SEMILAR Project offer, and review this paper.
Thanks for your help. If anyone has other ideas, I'd appreciate them.
@cuent No worries! Nice project!
@cuent just out of curiosity:
"The problem is that I want to disambiguate authors, and there are cases where I can't say they are different just by comparing their names; I need another attribute."
And the other attribute is...? Text written by the author? Are they scientific authors? Are you trying to work with citations?
"I'm using some subjects, which also vary in language."
Does the project involve translations? Or do you mean the writing style? Or rather the content, assuming that the content is usually specific to the author?
Is Google Distance giving you good accuracy?
@cuent: I assume "subject (categories)" of the article ~= "keywords"
@koustuvsinha and I did something similar for another problem, without the translations. @koustuvsinha can you contact @cuent?
I think you are already heading in the right direction. Yes: you probably need better attributes. Institution is definitely one. If you can get additional data about the Main Research Area per institution, that could help.
Another problem you are facing is that some authors are invited to participate in articles not related to their field. In that case, I suggest taking the author's position in the author list into account (if I am not wrong, the convention for which positions matter most varies by field).
I have the impression that you might need other methods for a better evaluation, but I guess you have to keep it simple? :)
Before getting fancier, try simpler solutions?
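As one possible "simpler solution" to start from, two groups of subject words can be compared by plain set overlap (Jaccard similarity). This is only a baseline sketch, not something proposed in the chat: the function name `jaccard` and the example subjects are made up, and it assumes both groups have already been brought into the same language, which the subjects described above are not.

```python
def jaccard(words_a, words_b):
    """Share of distinct words that two groups have in common (0.0 to 1.0)."""
    # Normalize case and surrounding whitespace so "NLP" and "nlp " match.
    a = {w.strip().lower() for w in words_a if w.strip()}
    b = {w.strip().lower() for w in words_b if w.strip()}
    if not a and not b:
        return 0.0  # no evidence either way when both groups are empty
    return len(a & b) / len(a | b)


# Hypothetical subject lists for two author records with the same name.
record_1 = ["Machine Learning", "NLP"]
record_2 = ["nlp", "Databases"]

print(jaccard(record_1, record_2))
```

A score near 0 for two records with the same name hints that they may be different authors, but exact matching misses synonyms and translations, which is where the semantic measures discussed earlier come in.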
@mesmoiron forensic linguistics: interesting topic!