These are chat archives for FreeCodeCamp/DataScience

Sep 2015
Sep 21 2015 16:46

Hi people:
@BerkeleyTrue, @QuincyLarson, @dcsan, @benmcmahon100
@andela-bfowotade, @SaintPeter
@abhisekp, @biancamihai, @Lightwaves, @cdikibo, @AdventureBear, @mildused, @ArielLeslie, @qmikew1, @dting, @coding-choi, @techstonia

Did you know that…? Suppose we define a user as OFF from the chatroom when they wait more than 5 minutes to send their next message, and as ON for the stretch of messages they sent before being considered OFF, and assume that any first message adds 1 minute. Then we could say that in the Help room (again, Jan/Jul 2015):

  • There were around 19000 ON actions, including those that consisted of just 1 message.
  • There were about 8500 users who sent a message and went OFF after sending it.
  • The maximum continuous time being ON was about 3 hours.
  • The average ON time was 4.6 minutes (OBS: likely affected by the definition)

OBS: This calculation is not about actual communication with someone: the data is per user. The data for one room won't necessarily be the same for others. But it helps to characterize the communication in the different chats according to the topic, importance and utility of the chat rooms, and adds more information about their dynamics.
Check the quick and dirty code with a graph at its current repo, "ON and OFF times (minutes) in the FCC Help Chatroom".
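The ON/OFF definition above can be sketched roughly as follows. This is not the code from the repo, just a minimal illustration under my reading of the definition: the function name `on_periods`, the constants, and the sample timestamps are all hypothetical, and the ON time is taken as the span between the first and last message of a burst plus the 1 minute credited to the first message.

```python
from datetime import datetime, timedelta

OFF_GAP = timedelta(minutes=5)   # silence longer than this ends an ON period
FIRST_MESSAGE_MINUTES = 1.0      # minute credited to the first message (assumption from the definition)


def on_periods(timestamps):
    """Return the ON durations, in minutes, for one user's message timestamps."""
    durations = []
    start = previous = None
    for ts in sorted(timestamps):
        if start is None:
            # First message seen: open an ON period.
            start = previous = ts
        elif ts - previous > OFF_GAP:
            # More than 5 minutes of silence: close the period, open a new one.
            minutes = (previous - start).total_seconds() / 60
            durations.append(minutes + FIRST_MESSAGE_MINUTES)
            start = previous = ts
        else:
            # Still within the same ON period.
            previous = ts
    if start is not None:
        # Close the last open period.
        minutes = (previous - start).total_seconds() / 60
        durations.append(minutes + FIRST_MESSAGE_MINUTES)
    return durations


# Hypothetical timestamps for a single user (not real chat data).
sample = [
    datetime(2015, 1, 1, 10, 0),
    datetime(2015, 1, 1, 10, 3),
    datetime(2015, 1, 1, 10, 7),
    datetime(2015, 1, 1, 10, 30),
]
periods = on_periods(sample)
print(periods)
```

With this reading, a user who sends a single message and goes OFF gets an ON time of exactly 1 minute, which is one way the definition can pull the average down.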

Quick Report:

DA app:

  • the project continues to advance; last week we concentrated on the front end; so far we are building graphs in pure d3.js, which could take some time to prepare
  • the project is still on GitHub, not rendered

Text Mining:

  • having looked at the progress of the chat room analyses, I see text mining not as an activity in itself but as part of a broader project category which I am calling "Chat Room Analysis"; anything done with text mining will be commented there from now on.

Chat Room Analysis:

  • see Did you know that…? section
  • as you might have noticed we are elaborating metrics every week, using fixed data as sandbox; we will eventually combine those metrics to elaborate more complex analyses
  • there are also some ideas of applying basic social network analysis to see what pops up…
  • still interested in translating Python into JavaScript; for the moment the Python code is kept "raw" to facilitate readability


Survey:

  • there is a small group who would like to take some time to carry out a short survey; the project got the green light from FCC
  • currently the team is just 2 people: we have agreed to venture into the project only if we get a team of 3-4

At News:

  • A short article with links to currently popular modules for machine learning topics written in JavaScript; you can find a lot more on npm

This Week…:

  • DA app: polishing the front end; start working on the redis idea; heroku the current demo?
  • Chat Room Analysis: test the reliability of modified ubuntu baq corpus to capture requests and questions; relate the data; suggest machine learning to detect adequate periods of response per HOUR/DAY (obs: some data available already at repo)
  • Survey: find someone else
  • A to-all invitation is pending (a bit busy myself these days...)
Sep 21 2015 19:06
Sep 21 2015 19:11
trying to understand the graph:
  • there's a bit more chat traffic for ziplines than bonfires
  • traffic bumped after 23 august but started declining a bit after
  • ~19k active chat users = who said something (on/off definition i didn't fully parse yet)
Rex Schrader
Sep 21 2015 19:17
@evaristoc Are you removing @camperbot messages? I notice that a lot of messages in the /HelpBonfires room are people interacting with the Bot.
Sep 21 2015 19:17
type bonfire name to get some info on that bonfire. And check HelpBonfires chatroom
Sep 21 2015 19:20
Hi, @dcsan:
  • It is likely that I should check the settings, but for now yes: ziplines showed more traffic (I was particularly surprised...)
  • Yes: I think the declining trend could be taken as correct... more than "traffic" (ie visitors), it is actually the volume of messages per visitor...
  • ON: The "length" of time one user keeps sending messages before we count 5 minutes of "total silence". If the user sends the next message after 5.00001 minutes, it is counted as if the user was OFF and then ON again, starting at 5.00001.
@SaintPeter At the moment I am not making any distinction. Be aware that the data we are collecting with the "app" is different for the time being. In the app we are mostly working out the technicalities; in what is now the Chat Analysis project I am more into trying out some possible metrics using static data.
But we are all interested in the impact of the bot, absolutely.
I am currently not counting that, but my idea was to do that too, for sure.
Rex Schrader
Sep 21 2015 19:24
@evaristoc I'm just thinking that you should filter out the Bot from your counts. In some ways the bot may be the most verbose of us all.
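A minimal sketch of that kind of filtering, assuming each message is a dict with a `username` field (a hypothetical layout, not necessarily how the archived data is stored) and that the bot posts as `camperbot`:

```python
from collections import Counter

# Accounts treated as bots; "camperbot" comes from the discussion above.
BOT_USERNAMES = {"camperbot"}


def without_bots(messages, bots=BOT_USERNAMES):
    """Drop messages whose sender is a known bot account."""
    return [m for m in messages if m["username"].lower() not in bots]


# Hypothetical messages, for illustration only.
messages = [
    {"username": "camperbot", "text": "type bonfire name to get some info"},
    {"username": "alice", "text": "bonfire reverse a string"},
    {"username": "CamperBot", "text": "Here is some info on that bonfire"},
    {"username": "bob", "text": "thanks!"},
]

human_messages = without_bots(messages)
counts = Counter(m["username"] for m in human_messages)
print(counts)
```

Note that this only removes what the bot itself says; messages that users address to the bot would still be counted unless they are filtered separately.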
Sep 21 2015 19:34
@SaintPeter Good point... when working on Chat Analysis I am not working with bot data. I was planning to do some analyses on recent data tackling the bot issue after developing some metrics to work on users.
@SaintPeter Actually the analysis will get more complex in the future: I am analysing data before the bot and the splitting of the rooms. Now, all that data is more complex.
In Gitter there are more rooms, and for a nice evaluation at least all the key rooms should be checked...
It is in fact likely that the split was among the reasons why people are focusing a bit less on those rooms: there are other places to look... Assuming the same number of people subscribe to FCC per month...
There was also: change of program, other new tools, wiki, etc... a lot changed...