These are chat archives for FreeCodeCamp/DataScience

Aug 2016
Aug 15 2016 08:56


General info for all of us taking the PySpark trainings:

  • The courses are hosted on edX as an XSeries on Spark offered by UC Berkeley
  • There are 3 courses:
    • An Introduction (CS105x)
    • Data Analysis (CS110x)
    • Distributed (CS120x)
  • CS105 is still open because it was extended until September (@alicejiang1 finished; @darwinrc is still doing it, and @Lightwaves and I just started this weekend); the new edition of CS110 starts today (you can register at any time; we are all registering for this one too); CS120 just closed, but you can still view the course content and do the exercises, although they won't be scored (ask @alicejiang1 if you want to know more about it)
  • They teach PySpark and Spark SQL on the Databricks platform (very nice!)
  • Python is a requirement, and I would say regex is too; the training will be easy if you have basic to intermediate knowledge of:
    • Big Data technologies
    • Python (2.x; the pandas/matplotlib/numpy libraries in particular will come in handy)
    • functional programming (Spark is written in Scala) and map-reduce
    • regex (you will be using it in several exercises)
    • some basic knowledge of data analysis (I saw some regression analysis, but I haven't gone through all the courses yet)
    • some SQL
  • The main focus is on learning to use the tool and its advantages over things like Hadoop for managing Big Data; for ML you should take other courses
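
To give a rough feel for the functional map-reduce plus regex style listed in the prerequisites above, here is a minimal word-count sketch in plain Python. It is not taken from the course material and does not use Spark; the sample lines and the helper names (`tokenize`, `add_pair`) are made up for illustration.

```python
import re
from functools import reduce

# Hypothetical sample input, invented for this sketch (not course data)
lines = [
    "Spark makes big data simple",
    "Big data, big clusters",
]

# Regex that matches runs of lowercase letters, used to split lines into words
WORD_RE = re.compile(r"[a-z]+")


def tokenize(line):
    """Lowercase a line and return the list of words found by the regex."""
    return WORD_RE.findall(line.lower())


# "Map" step: turn every word into a (word, 1) pair
pairs = [(word, 1) for line in lines for word in tokenize(line)]


def add_pair(counts, pair):
    """Fold one (word, n) pair into the running dictionary of counts."""
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts


# "Reduce" step: combine all pairs into a single word -> count dictionary
word_counts = reduce(add_pair, pairs, {})

print(word_counts)
```

In PySpark the same idea is usually written as a chain on an RDD, along the lines of `flatMap(tokenize)`, then `map(lambda w: (w, 1))`, then `reduceByKey(lambda a, b: a + b)`, with Spark distributing the work across the cluster.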
Alice Jiang
Aug 15 2016 23:28
I can answer questions about the Spark xSeries as well as the Microsoft Data Science Curriculum on edX, if anyone wants to know more about them :)