These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
@maxmatthews I haven't heard your prayer but I can try to help:
I have already suggested this to other people in the Core Team but bear in mind that the current priority is new curriculum.
The skale.me project is relatively new and aims to compete with Spark in machine learning on distributed systems. My proposal to them is to find some exercises that match the level of experience of this group and test the tool, possibly by analysing FCC data.
If I succeed in getting some proper advice, and possibly skale.me's involvement in setting up a few exercises, I think the project would benefit people at several levels of Big Data experience in this room, including some advanced ones.
With that, I hope that those of us with limited Big Data experience could get a first grasp of it, while those with more experience would be testing a different tool, one essentially based on JS (nodejs), for data analysis on distributed systems.
The possible advantages of both groups getting involved in this task would be:
I hope you like the idea. If so, wish me luck!!!
The people at skale.me answered. The answer was not a direct one, so here is my interpretation (to be confirmed):
We see interesting applications not only in data science but also in big data in general, in the form of classical Extract-Transform-Load (ETL) jobs to pre-process or post-process various data sources, at whatever volume or complexity.
I am thinking about the following exercise, what do you think?
Be aware that if we use chatroom data, these are the types of activities I believe we can carry out. We can still think together about other available data, but it would be important to find a good dataset with interesting applications. It should also serve FCC, if not directly then at least by profiling the FCC project. Remember that it is an exercise prepared by the FCC DataScience room, after all.
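For anyone who has not seen an ETL job before, here is a minimal sketch of the idea applied to chat messages. It is plain Python, not skale's API, and the record format (`user`, `text`), the sample messages and the function names are all made up for illustration; the real chatroom export may look quite different.

```python
import json
from collections import Counter

# Hypothetical sample records; the real chatroom export format is an assumption.
RAW_MESSAGES = [
    {"user": "alice", "text": "Has anyone tried Spark?"},
    {"user": "bob", "text": "  "},
    {"user": "alice", "text": "I can share a notebook."},
    {"user": "carol", "text": "Count me in!"},
]


def extract(records):
    """Extract: yield raw records one by one (stand-in for reading a file or API)."""
    yield from records


def transform(records):
    """Transform: skip blank messages and count the remaining ones per user."""
    counts = Counter()
    for record in records:
        if record["text"].strip():
            counts[record["user"]] += 1
    return counts


def load(counts):
    """Load: serialise the counts as JSON (stand-in for writing to a data store)."""
    return json.dumps(dict(sorted(counts.items())))


def run_etl(records):
    """Chain the three stages into one job."""
    return load(transform(extract(records)))


if __name__ == "__main__":
    print(run_etl(RAW_MESSAGES))
```

On a distributed tool the same three stages would run in parallel over partitions of the data, but the shape of the job stays the same.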
If everything goes well, we should go from setting up the architecture, through implementing the tool, to ending with a website that can show certain data on request. That is, going from start to end.
It could take the form of a hackathon, if you like.
So... this is my idea. I will keep you informed anyway of the progress!