These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
quincylarson sends brownie points to @smithbrandon and @berkeleytrue and @roelver :sparkles: :thumbsup: :sparkles:
@alicejiang1 Planning to join you at Berkeley's edX training. It is more PySpark.
I also noticed that skale.me's syntax closely resembles Spark's. I think they did that on purpose, as an invitation for Spark users to transition easily to skale.me. So if you learn one, you will have an easy introduction to the other.
Check DataCamp and kaggle. Remember that in kaggle you can try the FCC Survey 2016.
Check (of course...) kaggle, DataKind, and DrivenData.
@alicejiang1: I am already on the second week of the first course (Introduction). I always had this idea that Spark was an amazing tool. I have been in contact with the Big Data / Data Science community for a while, but my first introduction to Big Data was Hadoop. It was not until a Spark conference about 1.5 years ago that I really came across it. There I met Paco Nathan in person, then a Spark evangelist at Databricks. Really nice guy. He is a real promoter of Open Source and Learning Access for Everyone.
Of everything, what surprises me the most is the Databricks Platform. It is amazing that you can do some Big Data / Data Science exercises without worrying much about setup. These people in Berkeley really did an excellent job.
Immediately after that meeting about Spark, everyone in the Big Data circles here in Amsterdam started talking about it and never went back. I remember one guy, a regular attendee at those meetings, saying "Hadoop? Hadoop techs are dead!". He is a Scala guy who at that time was working for a Big Data start-up here in Holland doing projects in the automobile sector. Back then it was more about electricity use because they were more into electric cars, but I wouldn't be surprised if they are moving into the Internet of Things in that sector...
This will be the first time that I go through it. I am lucky to know SQL (for Hive), Python and Hadoop, and to have some idea of distributed/multiprocessing theory and practice: it makes it easier.
display feature for displaying a dataset? Have you seen that you can also plot data, etc. when using that feature? Well, the plotting seems to be done in d3.js or similar...
If you want to do it, go ahead and let me know. @alicejiang1 said she was doing it too... We can use this channel to discuss it... we can eventually come up with some projects involving node and JS together with Spark/skale.me...
I am obsessed with that project, the skale.me one... I think it could be really interesting stuff... but maybe I am the only one who thinks that :)
alicejiang1 sends brownie points to @darwinrc :sparkles: :thumbsup: :sparkles:
I've been slowly going through Udacity's descriptive statistics course and I'm on the 6th lesson.
The stuff I really need to learn is PDFs, conditional probability, Bayes' rule, etc., especially on the algorithms/implementation side of things.
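On the implementation side, Bayes' rule is small enough to try out in plain Python. This is only a minimal sketch: the function name `bayes_posterior` and the numbers (1% prior, 90% hit rate, 5% false positive rate) are made up for illustration and do not come from any of the courses mentioned here.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H).

    All names and example values are hypothetical, for illustration only.
    """
    # Total probability of seeing the evidence: P(E) = P(E|H)P(H) + P(E|~H)P(~H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0:
        raise ValueError("the evidence has zero probability under both hypotheses")
    # Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)
    return likelihood * prior / evidence


# Example: a rare condition (1%) and a test that is right 90% of the time
# when the condition is present, with a 5% false positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(posterior, 4))  # 0.1538
```

Even with a fairly accurate test, the posterior stays around 15% because the prior is so low, which is the usual point of this kind of exercise.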
@alicejiang1 and for you, second course means this one I am doing, the CS105, doesn't it? I think they are ordered as:
First == CS105 (Introduction)
Second == CS110 (Data Analysis)
Third == CS120 (Distributed)
If that is true, I would expect the third one (which recently closed) to be the most difficult of the three... But you said it was not that difficult, didn't you?
And I might be wrong about the ordering anyway...
@alicejiang1 @Lightwaves @darwinrc I am sparkled enough for today.... Done with 2 sections... I may finish the training tomorrow. Just sitting on my bench, coffee, a few snacks and... done. Happy that I can do it so quickly now; some months ago I think I would have struggled a lot...
See you around!