These are chat archives for FreeCodeCamp/DataScience

4th
Oct 2015
evaristoc
@evaristoc
Oct 04 2015 09:54

Hi, @qmikew1! The text mining is currently progressing.

I have been trying very simple techniques with verbose Python to check whether translating from Python to JS makes sense.

At the moment the text mining project does not cover sentiment analysis, only "speech acts": particularly finding questions and requests for help, but also the beginning and end of conversations. In the last part of the project (this week) I tried a lazy, instance-based analysis comparing "distances": a methodology similar to k-NN where k is the whole training dataset. The training dataset I have been using is a popular corpus. I used it as is to compare composition, and a modified version of the same corpus to compare sentence structure, which gives two measures. Then I quickly checked whether there were regions in the "scatter plot" of those distances that better grouped the targeted sentences.
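
A minimal sketch of the two-distance idea described above, in Python. Everything specific here is an assumption made for illustration: the four hand-written training questions stand in for the real corpus, Jaccard distance stands in for whatever distance was actually used, and the crude `structure` mapping stands in for the modified corpus. It only shows the shape of the approach: every sentence is compared against the whole training set and ends up as one (composition, structure) point.

```python
import re

# Hypothetical stand-in for the "popular corpus": a few hand-written questions.
TRAINING_QUESTIONS = [
    "how do i install node on windows?",
    "can someone help me with this challenge?",
    "what is wrong with my for loop?",
    "does anyone know why this function returns undefined?",
]

# Assumed word lists, used only to build a rough structural view of a sentence.
WH_WORDS = {"how", "what", "why", "where", "when", "who", "which"}
AUXILIARIES = {"can", "could", "do", "does", "did", "is", "are", "will", "would", "should"}


def tokenize(sentence):
    """Lowercase the sentence and keep word tokens and question marks."""
    return re.findall(r"[a-z']+|\?", sentence.lower())


def structure(tokens):
    """Map each token to a coarse class, tagged with a capped position (0, 1, 2, 3+)."""
    shape = []
    for position, token in enumerate(tokens):
        if token in WH_WORDS:
            label = "WH"
        elif token in AUXILIARIES:
            label = "AUX"
        elif token == "?":
            label = "QMARK"
        else:
            label = "W"
        shape.append((min(position, 3), label))
    return shape


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the two collections viewed as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def distances(sentence, training=TRAINING_QUESTIONS):
    """Return (composition, structure): mean distances to the whole training set."""
    tokens = tokenize(sentence)
    composition = [jaccard_distance(tokens, tokenize(t)) for t in training]
    shape = [jaccard_distance(structure(tokens), structure(tokenize(t))) for t in training]
    return sum(composition) / len(composition), sum(shape) / len(shape)


if __name__ == "__main__":
    for message in ["how can i fix this error?", "thanks, that worked great"]:
        print(message, distances(message))
```

Each message becomes one point in the "scatter plot"; with this toy data the question lands closer to the training questions than the statement does on both axes, which is the kind of region one would look for.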

I was using a small sample with data from one of the campers mentioned in the DataScience list above and the results were promising, although still with some caveats.

The results haven't been published yet. I would like to find time to compare them with other techniques first. I am afraid I will have to use more Python capabilities for that...

@qmikew1 The general idea of the room is also to invite people to be proactive by presenting or joining small projects. Projects can be a week long, and the only requirement would be that JS should be used at some point. It is not compulsory: you can simply check the room's progress. You have been in the list for some time, so feel free to do as you want; "advisoring" is also fine.

evaristoc
@evaristoc
Oct 04 2015 11:06
This message was deleted
Michael Krebs
@michael-krebs
Oct 04 2015 11:38
Is there a built-in gitter chat messages export thing, or are you guys just scraping?
evaristoc
@evaristoc
Oct 04 2015 14:06
Welcome @michael-krebs! "Scraping", if that is the word. GitHub and Gitter are open source and public. Everyone registered at Gitter has access to Gitter's APIs. We are using them following some "etiquette" though. White Hats, always.
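
For readers wondering what reading a room through the API might look like, here is a minimal Python sketch. It is not the script used in this room. The endpoint path (`/v1/rooms/{roomId}/chatMessages`), the `limit` and `beforeId` parameters, and the bearer-token header are given as recalled from Gitter's REST documentation and should be treated as assumptions; `ROOM_ID` and `TOKEN` are placeholders. The function only builds the request and sends nothing.

```python
import urllib.parse
import urllib.request

# Assumed base URL of the Gitter REST API.
API_ROOT = "https://api.gitter.im/v1"


def build_messages_request(room_id, token, limit=50, before_id=None):
    """Build (but do not send) a GET request for one page of room messages.

    before_id is the id of the oldest message already fetched; passing it
    asks for the page of messages that came before it.
    """
    params = {"limit": limit}
    if before_id is not None:
        params["beforeId"] = before_id
    query = urllib.parse.urlencode(params)
    url = f"{API_ROOT}/rooms/{room_id}/chatMessages?{query}"
    headers = {
        "Authorization": f"Bearer {token}",  # personal access token (placeholder)
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    request = build_messages_request("ROOM_ID", "TOKEN", limit=10)
    print(request.get_method(), request.full_url)
```

In the spirit of the "etiquette" mentioned above, a real script would send these requests one page at a time with a pause between them rather than in a tight loop.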