These are chat archives for FreeCodeCamp/DataScience
discussion on how we can use statistical methods to measure and improve the efficacy of http://freeCodeCamp.com
For those who might try to open the Torrent dataset with Python:
My version of the dataset came as a line-by-line file; the best way to read it in my case was to use either the readlines or the readline method.
You could then use the json library to parse it all, but you need to strip the non-JSON characters at the beginning and end of each line you want to read (e.g. the newline character).
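A minimal sketch of that parsing step, in case it helps. It assumes the only non-JSON characters are surrounding whitespace (such as the trailing newline) and possibly a trailing comma; your copy of the dataset may need different cleanup. `parse_line` is just a name I made up for illustration.

```python
import json


def parse_line(raw_line):
    """Parse one line of a line-by-line JSON file; return None for blank lines."""
    # Assumption: the junk around each record is only whitespace
    # (e.g. the trailing newline) and, possibly, a trailing comma.
    cleaned = raw_line.strip().rstrip(",")
    if not cleaned:
        return None
    return json.loads(cleaned)
```

You would call it on each line you get back from readline or readlines, e.g. `record = parse_line(f.readline())`.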
NOTE: it is huge! So far I have needed 9.0 GiB of memory just to open it. I have enough memory to read it in one go, but if you don't, you would have to read it in chunks... Perhaps this can help: http://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python
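If memory is the problem, here is a rough sketch of reading the file lazily, one record at a time, instead of loading everything at once. It assumes one JSON record per line and UTF-8 encoding, which may not match your copy; `iter_records` and the path you pass in are hypothetical names, not anything from the dataset itself.

```python
import json


def iter_records(path):
    """Yield parsed records one at a time so the whole file never sits in memory."""
    # Assumption: the file is UTF-8 text with one JSON record per line.
    with open(path, "r", encoding="utf-8") as handle:
        # Iterating over a file object reads it lazily, line by line.
        for raw_line in handle:
            cleaned = raw_line.strip()
            if cleaned:  # skip blank lines
                yield json.loads(cleaned)
```

Since it is a generator, you can loop with `for record in iter_records(path):` and only keep whatever fields you actually need from each record.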