These are chat archives for FreeCodeCamp/DataScience

18th
Jan 2017
Biniam Haddish
@biniamHaddish
Jan 18 2017 08:11
hey, can anyone tell me how I can start a data science course here?
Biniam Haddish
@biniamHaddish
Jan 18 2017 08:27
yep
Eric Leung
@erictleung
Jan 18 2017 08:27
@biniamHaddish hello! Are you asking whether freeCodeCamp has a data science course? There is currently no data science program at freeCodeCamp. There are fantastic data science courses, however, on edX and Coursera. Others might have more recommendations. I don't have specific courses in mind, but generally I would say: choose a programming language (R or Python), pick a data set, and ask questions about the data.
Biniam Haddish
@biniamHaddish
Jan 18 2017 08:29
that's right.
Eric Leung
@erictleung
Jan 18 2017 08:29
An article on a data scientist's experience and toolset. I got the most out of his toolset list. https://jeffersonheard.github.io/2017/01/being-a-data-scientist-my-experience-and-toolset/ I didn't know about the Odo Python package, which converts your data between various common data formats!
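For a feel of the kind of conversion Odo automates, here is a small stdlib-only sketch that turns CSV text into JSON records. This is NOT Odo's API, just a hand-rolled stand-in; the function name `csv_to_json` and the sample data are made up for illustration.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects.

    Hand-written stand-in for the kind of format conversion Odo automates;
    not Odo's own API. All values stay strings, as csv.DictReader returns them.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [dict(row) for row in reader]
    return json.dumps(records)


sample = "name,score\nada,90\ngrace,85\n"
print(csv_to_json(sample))
# [{"name": "ada", "score": "90"}, {"name": "grace", "score": "85"}]
```

Note that types are not inferred (scores come out as strings); handling types, large files, and many source/target formats is what a library like Odo adds on top.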
@biniamHaddish if you have specific questions on doing data science, there are plenty of people here who can lend you their wisdom.
Biniam Haddish
@biniamHaddish
Jan 18 2017 08:39
@erictleung thanks a lot for the comment!
CamperBot
@camperbot
Jan 18 2017 08:39
biniamhaddish sends brownie points to @erictleung :sparkles: :thumbsup: :sparkles:
:cookie: 447 | @erictleung |http://www.freecodecamp.com/erictleung
Hèlen Grives
@mesmoiron
Jan 18 2017 10:27

Well, I got Anaconda to work with NLTK, so it's going to be an exciting day. I need to toy around with the various tools just to see how I will handle my messy data. It isn't even raw data yet. I think making a toolset cheat sheet would probably be a good idea. In R I found Views, which install packages for a given set of related tasks and pipelines. That is certainly useful. If someone comes across image analysis, just keep me in mind. I am not actively searching for it, but images are also powerful at conveying messages. I could do a few by hand as examples just to see what I come up with; you never know what computing could find.

A quick question about scraping: I was scraping the Harvard site and noticed that, using the browser's web console, I wasn't able to find and select the data I wanted to scrape, so I resorted to copy and paste. Could it be that some data is hidden, or buried many levels deep?

@erictleung thank you
CamperBot
@camperbot
Jan 18 2017 10:28
mesmoiron sends brownie points to @erictleung :sparkles: :thumbsup: :sparkles:
:cookie: 448 | @erictleung |http://www.freecodecamp.com/erictleung
Alvin
@alvin-odins
Jan 18 2017 16:26
I need some tips on how to plot interactive maps in R with the idbr package.
evaristoc
@evaristoc
Jan 18 2017 17:13

Hi @alvin-odins, from what I'm finding, you might need to make calls from your frontend or backend to the different datasets and formats that idbr handles, and then work with the data in your app?

It seems that idbr is an API to census data?

Here's an example that does NOT use idbr, but I think the blog might contain some examples of interest to you:
https://walkerke.github.io/2016/07/spatial-neighbors-in-r---an-interactive-illustration/

You might need the shiny package at some point? Not sure...
evaristoc
@evaristoc
Jan 18 2017 17:20
@mesmoiron scraping is becoming more difficult... many organisations now render their pages dynamically with JS, and simple scrapers mostly work only on static pages.
Congrats on installing Anaconda and NLTK! Good luck! Ask questions here if needed.
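To illustrate what a "simple scraper" sees: it parses only the HTML the server sends back, so anything a page fills in later with JavaScript never reaches it. A minimal stdlib sketch that collects link targets and text; the `LinkCollector` class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


html = """
<ul>
  <li><a href="/a">First</a></li>
  <li><a href="/b">Second</a></li>
</ul>
<div id="app"></div>  <!-- filled in by JavaScript in the browser -->
"""
parser = LinkCollector()
parser.feed(html)
print(parser.links)  # [('/a', 'First'), ('/b', 'Second')]
```

Whatever a script would later put inside `<div id="app">` is simply absent from the input, which is one reason data visible in the browser can be missing from a scraper's view.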
Amelia
@apottr
Jan 18 2017 18:43
JS and dynamic pages make scraping easier, if anything
just have to know what to look for
evaristoc
@evaristoc
Jan 18 2017 22:00

@apottr can you suggest a technology, or share your experience? I am also interested, as I got stuck on this in the past.

I was forced to consider Selenium to trigger the JS... In the end I didn't get a scraping solution working and looked for another resource.

Amelia
@apottr
Jan 18 2017 22:02
As people start using Angular and React to render dynamic experiences, they need a way to get the data into the page. Usually, if you sleuth around a bit, you'll be able to find a JSON URL or something along those lines that gives you a more processed, easy-to-use form of the data.
So while it doesn't work great for mass scraping, it certainly helps when trying to scrape a single dynamic page
And, you might even find an undocumented API that you can pull data from.
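Once you've spotted such a JSON URL (for example in the browser dev tools' Network tab, filtered to XHR/fetch requests), pulling it is usually just an HTTP GET plus a JSON decode. A minimal stdlib sketch; `fetch_json` is a hypothetical helper, and the endpoint in the usage comment is a placeholder, not a real API.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 10.0):
    """GET `url` and decode the response body as JSON."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server doesn't declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return json.loads(resp.read().decode(charset))


# Hypothetical usage, with an endpoint found in the Network tab:
# data = fetch_json("https://example.com/api/records.json")
```

The payload shape (list, nested dicts, paging fields) depends entirely on the site, so expect to inspect it by hand before writing anything that walks through it.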
evaristoc
@evaristoc
Jan 18 2017 22:03
@apottr not that I don't believe you, but have you tried it? What you say makes sense, though..
Amelia
@apottr
Jan 18 2017 22:04
I have, yeah
I was trying to find a way to scrape my town's assessor's database and I found a JSON API hiding in there
evaristoc
@evaristoc
Jan 18 2017 22:05
:) :) :) !
Amelia
@apottr
Jan 18 2017 22:05
Wix websites also have an undocumented API that provides the page data in an easy-to-use format
evaristoc
@evaristoc
Jan 18 2017 22:05
Some APIs require authorisation to work, though; you might need an API key...
Amelia
@apottr
Jan 18 2017 22:05
That's true
but usually if you can find a site actively using it, you'll find an API key in there too
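When an API does want a key, it is typically sent either as a request header or as a query parameter, and which one (plus the exact header name) depends on the provider. A sketch assuming an `Authorization: Bearer <key>` header; that scheme, the helper name `build_api_request`, the URL, and the key are all placeholders.

```python
import urllib.parse
import urllib.request


def build_api_request(base_url: str, params: dict, api_key: str) -> urllib.request.Request:
    """Build (but don't send) a GET request that carries an API key.

    Assumption: the provider expects 'Authorization: Bearer <key>'.
    Some APIs use a custom header or an ?api_key=... query parameter
    instead; check the provider's documentation.
    """
    query = urllib.parse.urlencode(params)
    url = f"{base_url}?{query}" if query else base_url
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


# Hypothetical endpoint and key, for illustration only.
req = build_api_request("https://api.example.com/v1/items", {"page": 2}, "YOUR_KEY")
print(req.full_url)                     # https://api.example.com/v1/items?page=2
print(req.get_header("Authorization"))  # Bearer YOUR_KEY
# urllib.request.urlopen(req) would then send it.
```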
evaristoc
@evaristoc
Jan 18 2017 22:08
Well... my recent target was Medium (don't tell anyone... ;) ). The first thing that came to my mind was Selenium, but it was a bit cumbersome to install... I don't think they deliver all of a page's data at once: loading it seems to be triggered by cookie authentication...
They also offer an API but it is terrible...
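If a site only serves the full data to a session that carries certain cookies, a cookie-aware client that stores what the server sets and sends it back on later requests sometimes gets further than a bare request (it won't help when the content is assembled by JavaScript after load). A minimal stdlib sketch; `make_session_opener` and the URLs in the comments are hypothetical. The third-party `requests` library's `Session` object does the same job.

```python
import http.cookiejar
import urllib.request


def make_session_opener():
    """Return (opener, jar). The opener stores cookies set by responses in
    `jar` and sends them back on later requests, much like a browser session."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_session_opener()
# Hypothetical flow (URLs and form fields are placeholders):
# opener.open("https://example.com/login", data=b"user=me&password=secret")
# page = opener.open("https://example.com/some/page").read()
# for cookie in jar:
#     print(cookie.name, cookie.domain)
```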
Amelia
@apottr
Jan 18 2017 22:12
right
evaristoc
@evaristoc
Jan 18 2017 22:13
@apottr I think that, in fact, many sites are trying to deter scraping, not only because it affects their databases but also to protect access to their data for commercial reasons.
Not new: try to scrape Google searches...
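One standard way a site states what it does and doesn't want crawled is its `robots.txt`. Python's stdlib can evaluate those rules; this sketch parses a rules string directly (normally you'd load `https://<site>/robots.txt`), and the helper name `allowed`, the rules, and the URLs are made up for illustration.

```python
from urllib import robotparser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit `user_agent` to fetch `url`."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)


rules = """
User-agent: *
Disallow: /search
"""
print(allowed(rules, "my-scraper", "https://www.example.com/search?q=x"))  # False
print(allowed(rules, "my-scraper", "https://www.example.com/about"))       # True
```

`robots.txt` is advisory; sites that really want to block scrapers back it with rate limits, logins, or CAPTCHAs.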
Amelia
@apottr
Jan 18 2017 22:14
Yeah, that's true
but ultimately, if someone wants to scrape the data, they'll be able to
evaristoc
@evaristoc
Jan 18 2017 22:16
With limitations, but yes, I think so. You have to know how, and work at it.
I hope you didn't find any information in your town's assessor's database that could be sold to an enemy country... otherwise I'll be seeing you on CNN...
Take care!
(and... hide well...)
Amelia
@apottr
Jan 18 2017 22:19
hahaha
cya :)
Alice Jiang
@becausealice2
Jan 18 2017 23:37
Amelia the next Snowden/Manning?
;)
This has been an interesting conversation, though. I'm going to have to poke around a site I've been meaning to scrape.
Amelia
@apottr
Jan 18 2017 23:44
the hardest site I've ever successfully scraped was the NYC Department of Corrections engine
that was a chore
probably would've gone better with Selenium though