These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
Just for general interest... outdated reference though (so maybe not $2 per day; probably other settings have changed too)
I think after installing Hadoop, something like HBase or Hive (?) is required to manipulate the data.
Then there is also the option of simply loading data into HDFS/Hadoop using the command line:
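A minimal sketch of that command-line load, assuming a running HDFS and a local file; the file name and the `/user/fcc/raw` path are made-up examples:

```shell
# create a target directory in HDFS (path is illustrative)
hdfs dfs -mkdir -p /user/fcc/raw
# copy a local file into HDFS
hdfs dfs -put data.csv /user/fcc/raw/
# verify the file landed
hdfs dfs -ls /user/fcc/raw
```

These commands need a configured Hadoop installation on the PATH, so they are only a sketch of the workflow, not something to paste blindly.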
Now, the data will be stored as files (very much like a simple MongoDB format with no schema at all). HBase is one of the many ways to give a schema or ordering to the unstructured data. All the existing options that run on top of HDFS (Cassandra, for example) seem to provide different advantages and disadvantages.
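For example, giving that raw data a schema in HBase could look something like this from the HBase shell (the table name, column family, and values here are all invented for illustration):

```shell
# pipe a few commands into the HBase shell (requires a running HBase)
hbase shell <<'EOF'
create 'camper_events', 'activity'
put 'camper_events', 'user123', 'activity:challenges_done', '42'
scan 'camper_events'
EOF
```

The column-family model is what imposes structure on top of the otherwise schema-less files sitting in HDFS.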
About how to set up a Spark system - it should be similar for skale, according to what I have read in the skale chatroom:
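A rough sketch of a standalone Spark setup (skale should be analogous per its docs); `$SPARK_HOME`, the master URL, and the example script path are assumptions about a typical install, and older Spark versions name the worker script `start-slave.sh`:

```shell
# start a standalone master (it logs a spark:// URL to connect to)
$SPARK_HOME/sbin/start-master.sh
# attach a worker to that master (URL is an example)
$SPARK_HOME/sbin/start-worker.sh spark://localhost:7077
# quick smoke test using the bundled Pi example
$SPARK_HOME/bin/spark-submit --master spark://localhost:7077 \
  $SPARK_HOME/examples/src/main/python/pi.py 10
```

For a single-machine experiment, `--master local[*]` skips the master/worker setup entirely, which may be all this exercise needs.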
For this exercise, the simplest option should be the way to go.