These are chat archives for FreeCodeCamp/DataScience

21st
Feb 2017
Hèlen Grives
@mesmoiron
Feb 21 2017 16:19
Hi, I'm looking for methods (good but not too complicated) to anonymize data. Any experience? My raw input corpus is growing, but the next step is getting rid of private data, and there's actually a lot of it. Maybe I could build a dictionary and parse the files? On the other hand, should I do the analysis on the anonymized data or on the initial full raw data?
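Here is a minimal sketch of the dictionary idea from that message. It assumes you already have a list of private terms (names, places, and so on) to look for. The function names and the `<PII_n>` placeholder format are hypothetical. Each term gets a stable placeholder, so the same person maps to the same token across the corpus, and frequency or co-occurrence analysis still works on the scrubbed text.

```python
import re

def build_pattern(terms):
    # Try longer terms first so "Anna Smith" wins over "Anna" at the same position.
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    # \b keeps "Bob" from matching inside "Bobby"; matching ignores case.
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def pseudonymize(text, terms):
    """Replace dictionary terms with stable placeholders (hypothetical <PII_n> format).

    Returns the scrubbed text and the mapping from lowercased term to placeholder.
    """
    if not terms:
        return text, {}
    pattern = build_pattern(terms)
    mapping = {}

    def repl(match):
        key = match.group(0).lower()
        if key not in mapping:
            mapping[key] = f"<PII_{len(mapping) + 1}>"
        return mapping[key]

    return pattern.sub(repl, text), mapping

scrubbed, mapping = pseudonymize(
    "Anna Smith met Bob. Later anna smith emailed Bob.",
    ["Anna Smith", "Anna", "Bob"],
)
print(scrubbed)  # <PII_1> met <PII_2>. Later <PII_1> emailed <PII_2>.
```

The returned mapping links placeholders back to the original terms. Store it separately and securely, or discard it if the data must not be re-identifiable. Consistent placeholders are pseudonymization rather than full anonymization.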
Amelia
@apottr
Feb 21 2017 18:27
@mesmoiron perhaps this might help? https://github.com/datascopeanalytics/scrubadub
The problem with PII (personally identifiable information) is that sometimes, as with names, it's virtually indistinguishable from regular text to a computer.
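The following minimal pattern-based sketch illustrates that point. It is not scrubadub's API: the regexes and the `<EMAIL>`-style labels are illustrative assumptions. Structured PII such as email addresses, URLs and phone numbers can be caught with regular expressions, but a plain name like "Jan" passes straight through.

```python
import re

# Illustrative patterns only (assumptions); real-world formats vary far more.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://\S+"),
    # Optional "+", then at least 9 digits/spaces/brackets/dots/dashes, ending in a digit.
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def scrub_patterns(text):
    """Replace each pattern match with a <LABEL> placeholder, in dict order."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(scrub_patterns("Mail jan.peters@example.com or call +31 20 123 4567. Jan lives nearby."))
# Mail <EMAIL> or call <PHONE>. Jan lives nearby.
```

Loose patterns also misfire in the other direction. For example, this phone regex would also match a date such as 2017-02-21, so spot-check the scrubbed output by hand.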