These are chat archives for FreeCodeCamp/DataScience

1st
Feb 2017
Tom Lee
@user512
Feb 01 2017 00:29
Hi all, can anyone recommend a Ruby/JS PDF scraper library? :smile:
G Singh
@gsingh1313
Feb 01 2017 01:55
hello world
CamperBot
@camperbot
Feb 01 2017 01:55

welcome to FreeCodeCamp @gsingh1313!

Alice Jiang
@becausealice2
Feb 01 2017 04:06
@erictleung yes! I was telling my mom about them and said basically that same thing... I wonder how much good content I've missed because I've accidentally developed the ability to scroll through dozens of their posts without paying enough attention to realize I'm scrolling through their posts :/
Eric Leung
@erictleung
Feb 01 2017 07:25

@becausealice2 someone should make like a collaborative filter or something on their posts to find the good ones :laughing:

@user512 sorry, I don't know of any...

@gsingh1313 welcome!

Hèlen Grives
@mesmoiron
Feb 01 2017 11:52
@gsingh1313 hi welcome!
About my project: it is coming along very slowly. I decided to make a Python module for some of the things, just to see how I wrestle with that. I do need to process hundreds of files, so I have set up the project structure as well as I can. There's still a small issue. If I OCR the files, correcting them immediately will be much easier. The output won't be 100% accurate. However, I don't know whether that step leaves the data raw enough, or whether I have to process the files first and correct them later. I haven't made up my mind about that yet. Anyway, I keep in mind the article about the fact that raw data is never actually what gets processed or worked with; it is always a derivative.
evaristoc
@evaristoc
Feb 01 2017 13:46
@user512 Sorry, no idea. I haven't tried working with the PDF format yet.
evaristoc
@evaristoc
Feb 01 2017 20:41
@mesmoiron
:+1: