These are chat archives for FreeCodeCamp/DataScience

3rd
Nov 2017
evaristoc
@evaristoc
Nov 03 2017 10:00 UTC

Great @timjavins!

That is an important observation. I am evaluating it as an option to host datasets. It might be a good place for those things you mentioned, but I am still not sure it is the best place to host fCC datasets when compared to dedicated dataset-storage platforms like kaggle or datadotworld.

However, BitTorrent has other advantages that we really like.

The following is a list of the main attributes we are expecting to get from the selected data-storage host(s):

https://github.com/freeCodeCamp/open-data/issues/19#issue-270577767

Whatever your level of experience is, I would like to hear your opinion. You might be surprised how valuable that can be.

evaristoc
@evaristoc
Nov 03 2017 10:21 UTC

People

For those still looking for places to find data, here is something that can help you:
https://www.re3data.org/
evaristoc
@evaristoc
Nov 03 2017 10:47 UTC

@timjavins would you do me a favour?

Can you test the current dataset in BitTorrent and try to download it?
http://academictorrents.com/details/030b10dad0846b5aecc3905692890fb02404adbf

I need at least 4 tests apart from mine. I will ask you where you downloaded it from and what pros and cons you found.

If you find problems I won't give you many clues: you have to try to troubleshoot them yourself, simulating someone who has no more help than what is available on the Internet. Only then will we know if there is enough information to solve an issue and where to find it. Not finding a solution is still information worth sharing.

Hope you can help?


People

If anyone in this Room can help, please do. I need at least 4 tests from other users, whether frequent BitTorrent users or not.
evaristoc
@evaristoc
Nov 03 2017 10:53 UTC
The more tests, the better.
Quincy Larson
@QuincyLarson
Nov 03 2017 14:54 UTC

@/all Kaggle just published a survey of 16,000 data scientists and made the full dataset open! I’m planning to publish a summary of their findings on Monday, like I've written about Stack Overflow's and O'Reilly's datasets in the past.

Please take a look at the dataset, and if you find any interesting insights in it, let me know. I’ll credit you in my article. Here’s their announcement and links to their datasets: https://www.kaggle.com/surveys/2017

Josh Goldberg
@GoldbergData
Nov 03 2017 15:27 UTC
I’ll try to take a look for sure @QuincyLarson
I read the announcement a few days back. Really interesting stuff @QuincyLarson. I’ll be looking this weekend. What is the latest I can get you something by?
Timothy Javins
@timjavins
Nov 03 2017 20:39 UTC
@evaristoc I've got my client working on it, but there are only 2 leechers and 1 seeder, all of whom are offline.