These are chat archives for FreeCodeCamp/DataScience

Feb 2017
Hèlen Grives
Feb 21 2017 16:19 UTC
Hi; I'm looking for methods (good but not too complicated) to anonymize data. Any experience? My raw corpus input is growing, but the next step is getting rid of private data, and there's a lot of it actually. Maybe I could build a dictionary and parse the files with it. On the other hand, should I do the analysis on the anonymized data or on the initial full raw data?
Feb 21 2017 18:27 UTC
@mesmoiron perhaps this might help?
The problem with PII is that sometimes (like in the case of names) it's virtually indistinguishable from regular text to a computer.
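A minimal sketch of the dictionary idea from the question, assuming the private terms are already collected in a list. Everything here is hypothetical: `KNOWN_NAMES`, `EMAIL_RE` and `anonymize` are illustrative names, not anything from the thread. Structured PII such as email addresses can be caught with a regex. Names only get redacted if they are in the dictionary, which is exactly the limitation described above.

```python
import re

# Hypothetical dictionary of known private terms (e.g. names from a contact list).
KNOWN_NAMES = {"Alice Smith", "Bob"}

# Simple (not RFC-complete) pattern for email addresses; structured PII is easy to match.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize(text, names=KNOWN_NAMES):
    """Replace email addresses and dictionary names with placeholder tags."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    # Longer names first, so "Alice Smith" is replaced as a whole before any shorter entry.
    for name in sorted(names, key=len, reverse=True):
        # \b keeps "Bob" from matching inside "Bobby".
        text = re.sub(r"\b" + re.escape(name) + r"\b", "[NAME]", text)
    return text


print(anonymize("Mail Alice Smith at alice@example.com"))  # Mail [NAME] at [EMAIL]
print(anonymize("Carol called Bob"))  # Carol called [NAME]  ("Carol" is not in the dictionary)
```

The second call shows the weak spot: "Carol" looks like any other word, so a dictionary only protects the names you already know about.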