These are chat archives for FreeCodeCamp/DataScience

4th
Nov 2017
Arun Kumar
@arunkumar413
Nov 04 2017 01:42
@evaristoc, I was given a task to fit a regression of
the price of diamonds against the weight (carat) and calculate the R-squared value
Alice Jiang
@becausealice2
Nov 04 2017 02:07
I feel personally attacked by Kaggle's visualizations.
Josh Goldberg
@GoldbergData
Nov 04 2017 02:44
@becausealice2 personally attacked? Lol
Matthew Barlowe
@mcbarlowe
Nov 04 2017 02:48
@arunkumar413 If you're using the diamonds data set you're probably using R. I assume the function to get R^2 is lm()
Josh Goldberg
@GoldbergData
Nov 04 2017 02:49
Yes R is very capable of running linear regression @arunkumar413
Arun Kumar
@arunkumar413
Nov 04 2017 03:18
Pandas
@mcbarlowe I'm using python pandas
Matthew Barlowe
@mcbarlowe
Nov 04 2017 03:21
Have you tried calling .r2 class method on your OLS model?
without seeing what code you've attempted it will be hard for me to tell what exactly you need to do
Alice Jiang
@becausealice2
Nov 04 2017 03:46
Kaggle, of all people, have no excuses for the rookie mistakes made in those visualizations
evaristoc
@evaristoc
Nov 04 2017 14:48

@arunkumar413

For your case, I wouldn't advise a from-scratch approach.

You might need to install a more statistics-oriented library to help pandas. I have used the statsmodels library in the past and I definitely like it, although it is not a popular option nowadays.

You can also use:

  • scipy
  • numpy
  • scikit-learn

There are surely many other libraries.

There are substantial resources online about how to do a regression with all of those libraries and pandas, so please try that before asking? Let us know if you are in trouble after trying them. I would suggest not posting full errors here, or at least linking to them externally.

Good luck!

@QuincyLarson thanks! that is a very good post!!!
CamperBot
@camperbot
Nov 04 2017 14:52
evaristoc sends brownie points to @quincylarson :sparkles: :thumbsup: :sparkles:
:star2: 1363 | @quincylarson |http://www.freecodecamp.com/quincylarson
evaristoc
@evaristoc
Nov 04 2017 14:53

@timjavins Thanks! What do you mean with

"2 leechers and 1 seeder, all of which are offline"?

I am not very knowledgeable about BitTorrent terminology, sorry.

CamperBot
@camperbot
Nov 04 2017 14:53
evaristoc sends brownie points to @timjavins :sparkles: :thumbsup: :sparkles:
:cookie: 136 | @timjavins |http://www.freecodecamp.com/timjavins
Matthew Barlowe
@mcbarlowe
Nov 04 2017 14:56
@evaristoc leechers are people downloading the file and a seeder is the person making the file available for download
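
A rough sketch of that seeder/leecher distinction, under a deliberately simplified assumption: a full download can finish only while at least one peer holding the complete file is online. Real swarms are more forgiving, since leechers also exchange the pieces they already have. The `Peer` class and the function names are hypothetical, made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str            # hypothetical label, for illustration only
    has_full_file: bool  # True for a seeder, False for a leecher
    online: bool         # whether the peer's torrent client is running


def online_seeders(swarm):
    """Return the peers that hold the complete file and are reachable right now."""
    return [p for p in swarm if p.has_full_file and p.online]


def can_finish_download(swarm):
    """Simplified rule: a full download needs at least one online seeder."""
    return len(online_seeders(swarm)) > 0


# The situation quoted in the chat: 2 leechers and 1 seeder, all offline.
swarm = [
    Peer("seeder-1", has_full_file=True, online=False),
    Peer("leecher-1", has_full_file=False, online=False),
    Peer("leecher-2", has_full_file=False, online=False),
]
stalled = can_finish_download(swarm)

swarm[0].online = True  # the seeder's client comes back online
available = can_finish_download(swarm)

print(stalled, available)
```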
evaristoc
@evaristoc
Nov 04 2017 15:06
@mcbarlowe I see: because it is distributed. So if a seeder is not available then the file won't be either. Ok, I understand. I guess that 1 leecher was @timjavins himself.

@timjavins can you please let me know from where (country) and at what time you tried to download?

@mcbarlowe - would you be OK to try a test too?

It is a big file. You don't have to decompress it, only to try to complete a full download.
Matthew Barlowe
@mcbarlowe
Nov 04 2017 15:08
What’s the link to the torrent
Someone else? ^^^ - Just 2-3 more tests.
Matthew Barlowe
@mcbarlowe
Nov 04 2017 15:10
yeah no one is seeding it, or rather if they are, their torrent client is offline
evaristoc
@evaristoc
Nov 04 2017 15:10
@mcbarlowe you decide how far you want to go and let me know your observations.
Matthew Barlowe
@mcbarlowe
Nov 04 2017 15:10
Ok someone is seeding it now
evaristoc
@evaristoc
Nov 04 2017 15:11
Ok. That is then a deterrent. The file must be available on demand.
IMO.
Matthew Barlowe
@mcbarlowe
Nov 04 2017 15:11
yeah torrents work off peer to peer connections
evaristoc
@evaristoc
Nov 04 2017 15:11
:+1:
Matthew Barlowe
@mcbarlowe
Nov 04 2017 15:11
you can always host it yourself if you have plenty of bandwidth
evaristoc
@evaristoc
Nov 04 2017 15:13
I have to be a client? What do I have to do to be a host?
Matthew Barlowe
@mcbarlowe
Nov 04 2017 15:14
well I just meant you can always leave your torrent program open so people can constantly connect to the torrent and download it
evaristoc
@evaristoc
Nov 04 2017 15:15

I am reading this:
https://lifehacker.com/5534190/how-to-share-your-own-files-using-bittorrent

We are using academictorrents, I think they are connected to universities.

But I was estimating a permanent connection.

Disclaimer: it was not an option I suggested in the first place. It was decided by other people in the group before I came.

Matthew Barlowe
@mcbarlowe
Nov 04 2017 15:18
Yeah torrents are peer to peer connections and are only as permanent as the people hosting the file and data
evaristoc
@evaristoc
Nov 04 2017 15:18
As far as I remember, BitTorrent can run over UDP, a connectionless protocol (the classic peer connections use TCP). Not sure about security? I read about it a long time ago...
Matthew Barlowe
@mcbarlowe
Nov 04 2017 15:19
I can help seed it after I download it cause I'm on fiber and have a pretty good bandwidth
got no idea on protocols that's beyond me
evaristoc
@evaristoc
Nov 04 2017 15:20

Just a reminder for all of us:

https://www.bleepingcomputer.com/tutorials/tcp-and-udp-ports-explained/

UDP stands for User Datagram Protocol. Using this method, the computer sending the data packages the information into a nice little package and releases it into the network with the hopes that it will get to the right place. What this means is that UDP does not connect directly to the receiving computer like TCP does, but rather sends the data out and relies on the devices in between the sending computer and the receiving computer to get the data where it is supposed to go properly. This method of transmission does not provide any guarantee that the data you send will ever reach its destination. On the other hand, this method of transmission has a very low overhead and is therefore very popular to use for services that are not that important to work on the first try. A comparison you can use for this method is the plain old US Postal Service. You place your mail in the mailbox and hope the Postal Service will get it to the proper location. Most of the time they do, but sometimes it gets lost along the way.
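
To make the quoted description concrete, here is a small sketch that sends a single UDP datagram between two sockets on the local machine. It assumes the loopback interface is available, and the function name `send_datagram_over_loopback` is hypothetical. Note that there is no connection setup: the sender just addresses the datagram and lets it go, and only the receiver's timeout reveals a loss.

```python
import socket


def send_datagram_over_loopback(payload: bytes, timeout: float = 1.0):
    """Send one UDP datagram to a local socket; return the bytes received, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        receiver.settimeout(timeout)
        # No connect() or handshake: the datagram is simply addressed and released.
        sender.sendto(payload, receiver.getsockname())
        try:
            data, _addr = receiver.recvfrom(4096)
        except socket.timeout:
            return None  # UDP itself never tells the sender that a datagram was lost
        return data


received = send_datagram_over_loopback(b"hello from UDP")
print(received)
```

On a real network the `None` branch is where an application would have to decide whether to resend, which is the work TCP does on its own.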

evaristoc
@evaristoc
Nov 04 2017 15:25

@mcbarlowe I am not sure. Could be but let's decide that together?

These are the characteristics we are looking for in the data hosting platforms:

Stick to 1 or 2 data storages that balance:

  • the best storage in size
  • the best visibility to the dataset
  • the best visibility to fCC
  • the best of the additional features that better help the user
  • the simplest procedures
  • the least disruptive to users
  • compliance with the open data philosophy

I don't want to force a solution that might prove impractical when left alone. It has to work without too much monitoring.

I might ask you again for a possible test though.
Matthew Barlowe
@mcbarlowe
Nov 04 2017 15:26
well so far it hasn't been too bad. I was able to get a connection relatively quickly and now someone with a very fast upload is seeding it
evaristoc
@evaristoc
Nov 04 2017 15:27
My current opinion is that we might need not one but a combination of two data storages: one to present a partial amount of data for people to get in contact with it, the other one a big storage facility to keep the full data for those who want to use the whole stuff.
@mcbarlowe :+1:
No experience with BitTorrent. Not sure how you manage to see that.
But good!
Matthew Barlowe
@mcbarlowe
Nov 04 2017 15:30
my client gives me all the info on the download: who the seeds are and what rate I'm downloading from them
evaristoc
@evaristoc
Nov 04 2017 15:30
If you seed it, do you have to use a special program related to academictorrents?
I will also wait for more info from @timjavins. We probably need 2-3 more tests. I am also probably testing another option, https://datproject.org/. I will ask here if someone wants to help with that one too.
Matthew Barlowe
@mcbarlowe
Nov 04 2017 15:50
no, you just leave your torrent client open. academictorrents is just a site that hosts the torrent file
evaristoc
@evaristoc
Nov 04 2017 16:26

Is academictorrents in that sense a seed, @mcbarlowe? Or does academictorrents still need seeds?

Did you succeed?

Matthew Barlowe
@mcbarlowe
Nov 04 2017 16:35
Yes I have the file and no academictorrent is not a seed. A seed is a person who has the file on their computer and makes it available to download to other people
evaristoc
@evaristoc
Nov 04 2017 16:51
Ok, @mcbarlowe. Thanks a lot, doctor! We are on track then.
CamperBot
@camperbot
Nov 04 2017 16:51
evaristoc sends brownie points to @mcbarlowe :sparkles: :thumbsup: :sparkles:
:cookie: 133 | @mcbarlowe |http://www.freecodecamp.com/mcbarlowe
evaristoc
@evaristoc
Nov 04 2017 16:53
Mentioning you in the issue about BitTorrent. Do you have a GitHub account? Maybe it would be nice if you left at least a simple message in this section:
freeCodeCamp/open-data#11
evaristoc
@evaristoc
Nov 04 2017 17:08
@mcbarlowe ^^^
Also @timjavins
evaristoc
@evaristoc
Nov 04 2017 17:49

People,

Just for information:

Looking at the moment into common-crawl and found some of their activities rely on a python library, https://github.com/ikreymer/pywb.

common-crawl produces TERABytes of data PER MONTH - far beyond my current memory capacity :( .

evaristoc
@evaristoc
Nov 04 2017 18:06

Tebibytes. 84 Tebibytes only for one of the formats.

If you ever get interested, you would need to try your hand at map-reduce on AWS.

I need some info but just for a bunch of links.
Hmmm.... I don't know...
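
For scale, a quick conversion of the 84 tebibytes mentioned above into bytes and decimal terabytes. This is plain arithmetic on the binary (2^40) and decimal (10^12) prefixes; the function name is just an illustrative choice.

```python
TEBIBYTE = 2 ** 40   # bytes in one tebibyte (TiB), binary prefix
TERABYTE = 10 ** 12  # bytes in one terabyte (TB), decimal prefix


def tebibytes_to_terabytes(tib):
    """Convert a size in TiB to TB."""
    return tib * TEBIBYTE / TERABYTE


size_bytes = 84 * TEBIBYTE
size_tb = tebibytes_to_terabytes(84)
print(f"{size_bytes:,} bytes, about {size_tb:.2f} TB")
```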
evaristoc
@evaristoc
Nov 04 2017 18:13
There is also an API for those interested in common-crawl