These are chat archives for FreeCodeCamp/DataScience

Aug 2016
Max Matthews
Aug 08 2016 01:28
Hoping "someone out there will hear (my) prayer and show mercy" and give me an API key. 😊 Would like to build an open source dashboard where I can enter all my students' names who are using FCC and track their progress. May be helpful for other FCC communities to track groups of people at a time
Max Matthews
Aug 08 2016 01:37
Tagging @roelver because it looks like he's the one who can answer my "prayer/show mercy" after a little searching in Gitter
Aug 08 2016 13:13

@maxmatthews I haven't heard your prayer but I can try to help:

  • Indeed @roelver and also @QuincyLarson are possibly the people with more authority who can let you know about the status of the API for the public. As I have said, the API is ready but so far I don't know any case of other people getting access. Apologies for my ignorance.
  • Excellent idea! You are not the only person who has come to this channel asking to do the same - not that you are not original :) , it is simply that it looks like the most natural thing to do for many of you. Can you please discuss this VERY LOUDLY at the City Group Leader channels, look for others who are trying to do the same and pray together? I think that will have a bigger effect.

I have already suggested this to other people in the Core Team, but bear in mind that the current priority is the new curriculum.

Aug 08 2016 14:17


I am consulting with the project, a Big Data project made in JS, about exercises in Big Data for the DataScience Room

The project is a relatively new one and seeks to compete in the same terrain as Spark for ML for Distributed Systems. My proposal to them is to find some exercises that could match the level of experience of this group and test the tool, possibly to analyse FCC data.

If I succeed in getting some proper advice and possibly their involvement in the setup of a few exercises, I think the project would benefit several levels of Big Data experience in this room, including some advanced ones.

With that I hope that some of us with limited experience with Big Data could get a first grasp on it, while those who have more experience will be testing a different tool that is essentially based on JS (nodejs) for data analysis of distributed systems.

The possible advantages of both sides getting involved in this task would be:

  • For us, learning/improving Big Data skills in a language where we are already working (JS) + other ones like python or R
  • For them, start engaging possible users for their tool
  • For both, the possibility to be engaged in a project that could offer a bright future if the idea takes off

I hope you like the idea. If so, wish me luck!!!

Eric Leung
Aug 08 2016 16:06
@evaristoc sounds cool! I haven't even messed around with Spark, but if it is similar, it might be interesting to learn. Update us when you can! :smiley:
Aug 08 2016 16:22
Count on you!
Arijit Layek
Aug 08 2016 18:04
@evaristoc awesome idea!! @SamAI-Software FYI :)
Aug 08 2016 18:59
@alayek would you like to take part? You decide your level of involvement...
Arijit Layek
Aug 08 2016 19:12
@evaristoc sure, but these two weeks up to August 23, I am busy with curriculum expansion
So, if possible, maybe after that
Aug 08 2016 19:38
@alayek absolutely possible for you! :) We would be happy to have you if the project crystallizes as expected! I think there should be a few things to happen before we can actually talk about a project so let's see...
Aug 08 2016 20:03


More about the possible project

People at answered. The answer was not a direct one, so here is my interpretation (to be confirmed):

  • It should test the purpose of the tool they are building (Big Data manipulation, but also Data Science / ML features)
  • Therefore, any exercise should preferably be on HDFS
  • For them:

    We see interesting applications not only on datascience, but also in big-data in general, in the form of classical Extract-Transform-Loads (ETLs) jobs to pre-process or post-process various data sources, at whatever volume or complexity.

I am thinking about the following exercise, what do you think?

  • taking the historical data from several of our Gitter rooms
  • seed each room in a different HDFS node
  • run analyses; some that I can think of:
    • SNA tradition
    • analysis of camperbot performance
    • active engagement (simple measure of how long a person devotes to different rooms)
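
To make the "active engagement" item a bit more concrete, here is a minimal Python sketch of how such a measure could be computed on a single machine before moving anything to HDFS. Everything in it is an assumption for illustration: the `(user, room, sent_at)` tuple shape is hypothetical and not the real Gitter export format, the 30-minute session gap is an arbitrary threshold, and the sample names are made up. It only sums the time between consecutive messages of the same person in the same room when they are close enough to count as one session.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: two messages further apart than this belong to separate sessions.
SESSION_GAP = timedelta(minutes=30)


def engagement_by_room(messages, session_gap=SESSION_GAP):
    """Return {(user, room): timedelta} with the estimated active time.

    `messages` is an iterable of (user, room, sent_at) tuples. This shape is
    hypothetical; a real Gitter export would need to be mapped onto it first.
    """
    # Group the message timestamps per (user, room) pair.
    stamps = defaultdict(list)
    for user, room, sent_at in messages:
        stamps[(user, room)].append(sent_at)

    totals = {}
    for key, times in stamps.items():
        times.sort()
        active = timedelta(0)
        # Only count gaps short enough to be part of the same session.
        for earlier, later in zip(times, times[1:]):
            gap = later - earlier
            if gap <= session_gap:
                active += gap
        totals[key] = active
    return totals


if __name__ == "__main__":
    # Made-up sample data, only to show the call.
    sample = [
        ("alice", "DataScience", datetime(2016, 8, 8, 13, 0)),
        ("alice", "DataScience", datetime(2016, 8, 8, 13, 10)),
        ("alice", "DataScience", datetime(2016, 8, 8, 18, 0)),
        ("alice", "Help", datetime(2016, 8, 8, 14, 0)),
        ("bob", "DataScience", datetime(2016, 8, 8, 13, 5)),
        ("bob", "DataScience", datetime(2016, 8, 8, 13, 25)),
    ]
    for (user, room), active in sorted(engagement_by_room(sample).items()):
        print(user, room, active)
```

A person with a single message in a room gets zero active time with this approach, so it underestimates lurkers; a distributed version would presumably do the same grouping per room on each node and then merge the totals.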

Be aware that if we use chatroom data, these are the types of activities I believe we can carry out. We can still think together about other available data, but it would be important to find a good dataset with interesting applications. It should also serve FCC, if not directly then at least by profiling the FCC project. Remember that it is an exercise prepared by the FCC DataScience room, after all.

If everything goes fine, we should go from setting up the architecture, through the implementation of the tool, to ending with a website that could show certain data upon request. Going from start to end.

It could take the form of a hackathon, if you like.

So... this is my idea. I will keep you informed anyway of the progress!