These are chat archives for FreeCodeCamp/DataScience

Sep 2015
Sep 14 2015 18:25

Hi people:
@BerkeleyTrue, @QuincyLarson, @dcsan, @benmcmahon100
@abhisekp, @biancamihai, @Lightwaves, @cdikibo, @AdventureBear, @SaintPeter, @mildused, @ArielLeslie, @qmikew1, @dting

Did you know that…? If we define a person as inactive in the Gitter Help room (ie. showing no activity) when they send no message for 15 days after their last participation, then 721 of the 1726 total participants in the Gitter FCC/Help room were considered active over the observation period from 2014-12-30 to 2015-08-16. Similarly, if we define a frequent participant as someone with messages on at least 5 different days, the number is drastically lower: only 308 people counted as frequent visitors. They visited the Gitter Help room on an average of 42% of all the days between their first and last message, and the longest streak was 83 days.

The data above should be handled with caution: there are surely many invisible participants, for example people who visit Gitter passively (eg. those who come just to read). Other factors also affect the presence of users in Gitter rooms. That doesn't make the information useless. Data of this kind is useful if you can show that it is "stable", or a "signature" of a particular room, and you want to see whether some change "disturbs" that stability.
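
For anyone who wants to reproduce these numbers, here is a minimal sketch of the per-participant measures described above. It is not the script used for the report. The function name, the parameter names, and the reading of "active" (last message no more than 15 days before the end of the observation window) are assumptions made for illustration.

```python
from datetime import date

def activity_summary(message_days, observed_until,
                     inactive_after_days=15, frequent_min_days=5):
    """Summarise one participant's activity from the dates of their messages.

    Assumption: "active" means the last message is at most
    `inactive_after_days` days before `observed_until`.
    """
    days = sorted(set(message_days))  # distinct days with at least one message
    if not days:
        raise ValueError("participant has no messages")

    first, last = days[0], days[-1]
    span = (last - first).days + 1    # days between first and last message, inclusive

    # Longest run of consecutive calendar days with a message.
    longest = current = 1
    for prev, cur in zip(days, days[1:]):
        current = current + 1 if (cur - prev).days == 1 else 1
        longest = max(longest, current)

    return {
        "active": (observed_until - last).days <= inactive_after_days,
        "frequent": len(days) >= frequent_min_days,
        "visit_rate": len(days) / span,
        "longest_streak": longest,
    }

# Made-up example: five distinct days inside a ten-day span.
example = activity_summary(
    [date(2015, 8, 1), date(2015, 8, 2), date(2015, 8, 3),
     date(2015, 8, 7), date(2015, 8, 10)],
    observed_until=date(2015, 8, 16),
)
```

With the made-up dates above, the participant counts as active and frequent, with a visit rate of 0.5 and a longest streak of 3 days. Room-level figures would come from running this over every participant and counting or averaging the results.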

Quick Report:
@cdikibo hasn't been involved much in the project because of problems with her computer, which have kept her away for about 2 weeks.

DA app:
-- We found some small issues and spent our time on them: we discussed the best way to organise files, and today I worked on normalising the date representations in the data, to allow for future comparisons. I still plan to present a first demo through C9 in the next few days. I will let you know.
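
As an illustration of the date normalisation step, here is a rough sketch, not the code in the app. The list of input formats is hypothetical (the actual formats in our data are not listed here), and it assumes every timestamp is already in UTC.

```python
from datetime import datetime, timezone

# Hypothetical input formats, tried in order; adjust to what the data really contains.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%fZ",  # e.g. 2015-09-14T18:25:00.000Z
    "%Y-%m-%d %H:%M:%S",      # e.g. 2015-09-14 18:25:00
    "%b %d %Y %H:%M",         # e.g. Sep 14 2015 18:25 (English month names)
    "%Y-%m-%d",               # e.g. 2015-08-16
)

def normalise_date(raw):
    """Return the timestamp as an ISO 8601 string, assuming the input is UTC."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # not this format, try the next one
        return parsed.replace(tzinfo=timezone.utc).isoformat()
    raise ValueError(f"unrecognised date format: {raw!r}")

same_moment = {normalise_date("Sep 14 2015 18:25"),
               normalise_date("2015-09-14T18:25:00.000Z")}
```

Once everything is in one representation, the two differently written timestamps above collapse to a single value, which is what makes comparisons between rooms and periods possible.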

Text Mining:
-- In the next few hours, between today and tomorrow, I will upload the results of a (very small) classification test on randomly selected sentences from the Help room, using nltk and the nps chat corpus.
-- Additionally, I am working on a modification of the ubuntu chat corpus, in particular the bot-answerable questions (BAQ) corpus, to see whether making its content more generic improves its precision. I will also be using the factoids list that comes with the corpus to modify the content of what we are considering as utterances (visit wikipedia). The modified file is ready but not yet available.
-- The plan is to combine both corpora, the nps chat and the modified BAQ corpus, to check if that approach is good enough to capture tech questions in Gitter rooms.
-- There are some relevant challenges and "discoveries" from my latest analysis that I would happily share with you if you want.

-- There is a proposal for carrying out a simple survey in the Gitter channels to collect demographic data. The proposal was mentioned to @BerkeleyTrue.

-- Starting this week, we are posting links every week in the FCC News on topics about JavaScript and its role in data analysis and data science. This week: "Big Data and Nodejs". Help yourself.

This Week…:

DA app:
-- Load first demo in C9, with stacked views for at least two rooms.
-- Evaluate an implementation of redis.

Text Mining:
-- Publish the modified files and scripts, likely presenting some python notebook.
-- Combine corpora, run the modifications and do simple tests.

-- Waiting for revision from FCC.

Room Promotion:
-- We will promote the room among other participants for the first time.

-- Load scripts on repos.
-- Propose porting the python scripts to JavaScript, so that tools like nodejs can be used to process the data.