These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
already_working. That data will also be in the first part. However, columns in the 2nd part that are in the form of a question e.g.
"What's your gender?" are exclusively in the 2nd dataset. @evaristoc let me know if my thinking is correct. If so, I can stop focusing on the 1st dataset and just work on the 2nd one.
@erictleung The only headers that appear in both .csv files are as follows:
#, Other, None, Start Date (UTC), Submit Date (UTC), Network ID
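For anyone who wants to reproduce the header comparison, here is a minimal sketch using only the Python standard library. The function name `common_headers` and the tiny in-memory files are hypothetical stand-ins; the real survey files have many more columns.

```python
import csv
import io


def common_headers(file_a, file_b):
    """Return the header names found in both CSV files, in file_a's order."""
    headers_a = next(csv.reader(file_a))
    headers_b = set(next(csv.reader(file_b)))
    return [name for name in headers_a if name in headers_b]


# Hypothetical miniature versions of the two survey parts (not real data).
part1 = io.StringIO('#,"How old are you?",Network ID\n1,30,abc\n')
part2 = io.StringIO('#,"What\'s your gender?",Network ID\n1,female,abc\n')

shared = common_headers(part1, part2)
print(shared)
```

With real files, pass handles opened with `open(path, newline="")` instead of the `io.StringIO` objects.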
#,"How old are you?"..., which indicates the first column. The first column has strings like
@sudeepnarkar @ozkoc Thanks for your kind words! Gathering the data was just a small part of it. Analyzing it is the real work :)
@krisgesling thanks! Yes - I am amazed at the geographic diversity of the responses.
@erictleung We shouldn't need two separate CSV files - we should be able to get everything into one file. Once we've merged everything from part 1 of the survey into part 2, we can probably just delete part 1 and rename part 2. If you can do this once, you will save everyone a ton of trouble down the line.
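One way the merge into a single file could look, as a rough sketch only. It assumes `Network ID` is a usable join key (not yet verified in this chat), that part 2 values win for columns present in both files, and that respondents who only appear in part 1 should be kept. The function name `merge_parts` and the sample rows are hypothetical.

```python
import csv
import io

KEY = "Network ID"  # assumed join column


def merge_parts(part1_file, part2_file, key=KEY):
    """Combine both survey parts into one list of rows keyed on `key`.

    If part 1 contains duplicate keys, the last row for that key is used.
    """
    part1_reader = csv.DictReader(part1_file)
    part1_rows = {row[key]: row for row in part1_reader}
    part1_fields = part1_reader.fieldnames or []

    part2_reader = csv.DictReader(part2_file)
    part2_fields = part2_reader.fieldnames or []

    # Columns that exist only in part 1 are appended after part 2's columns.
    extra = [f for f in part1_fields if f not in part2_fields]
    fields = part2_fields + extra

    merged = []
    seen = set()
    for row in part2_reader:
        seen.add(row[key])
        match = part1_rows.get(row[key], {})
        out = dict(row)
        for f in extra:
            out[f] = match.get(f, "")
        merged.append(out)

    # Keep respondents who answered part 1 but never submitted part 2.
    for k, row in part1_rows.items():
        if k not in seen:
            out = {f: "" for f in fields}
            out.update(row)
            merged.append(out)

    return fields, merged


# Hypothetical sample data, not taken from the real survey.
part1 = io.StringIO("Network ID,How old are you?\nabc,30\ndef,25\n")
part2 = io.StringIO("Network ID,What's your gender?\nabc,female\nxyz,male\n")

fields, merged = merge_parts(part1, part2)
for row in merged:
    print([row[f] for f in fields])
```

The result can be written back out with `csv.DictWriter(out_file, fieldnames=fields)`.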
quincylarson sends brownie points to @sudeepnarkar and @ozkoc and @krisgesling and @erictleung :sparkles: :thumbsup: :sparkles:
@erictleung @joeybuczek according to @koustuvsinha the networkID is acting as the primary key between the two files. I haven't checked. @joeybuczek it is highly possible that someone skipped the second one. Duplicate data is also possible. The survey will likely have several discrepancies; we have to try to identify them and at least reduce their impact.
EVERYONE BE AWARE: you have to draw conclusions based on the context of how rigorous the fieldwork was. Those who know statistics: you could make observations about the validity of some of the conclusions. I suggest keeping your conclusions within the context in which the data was gathered and being VERY CAREFUL with any generalisation to a larger audience.
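A quick check along these lines could show how reliable the key is before anyone merges on it. This is only a sketch: the column name `Network ID`, the function name `id_report`, and the sample rows are assumptions for illustration.

```python
import csv
import io
from collections import Counter


def id_report(part1_file, part2_file, key="Network ID"):
    """Summarise duplicate and unmatched key values across the two files."""
    ids1 = [row[key] for row in csv.DictReader(part1_file)]
    ids2 = [row[key] for row in csv.DictReader(part2_file)]
    return {
        "duplicates_part1": sorted(k for k, n in Counter(ids1).items() if n > 1),
        "duplicates_part2": sorted(k for k, n in Counter(ids2).items() if n > 1),
        # Respondents who may have skipped the second part.
        "only_in_part1": sorted(set(ids1) - set(ids2)),
        "only_in_part2": sorted(set(ids2) - set(ids1)),
    }


# Hypothetical sample data, not taken from the real survey.
part1 = io.StringIO("Network ID,How old are you?\nabc,30\nabc,30\ndef,25\n")
part2 = io.StringIO("Network ID,What's your gender?\nabc,female\nxyz,male\n")

report = id_report(part1, part2)
for name, values in report.items():
    print(name, values)
```

Non-empty duplicate lists would mean the key is not unique and a plain join could silently drop or repeat rows.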
@twolfe2 thanks for your interest! What kind of visualization skills do you have? You can always head to the GitHub survey repository and take a look at the questions, start a conversation about what kind of visualizations you think would work for them, or even ask some questions yourself!
@jboxman best of luck finishing the rest! Looking forward to seeing what kind of visualizations we can make out of the data.
@zydecat glad to have you interested in using the data for your master's dissertation! I don't think @QuincyLarson should have a problem with it but let's wait on his approval :smile:
erictleung sends brownie points to @twolfe2 and @jboxman and @zydecat and @quincylarson :sparkles: :thumbsup: :sparkles:
@jboxman yeah, I think @QuincyLarson will want to store this in a db with an API for people to more easily query it in the future.
And yes, I'm a part of that cleaning and combining effort :smile: I should be finishing up soon. In the meantime, you can at least explore and familiarize yourself with the raw data.
jboxman sends brownie points to @erictleung :sparkles: :thumbsup: :sparkles: