These are chat archives for cltk/cltk

31st
Jan 2018
Avinash Kumar
@avkumar19
Jan 31 2018 05:58
@kylepjohnson I will open a new ticket today, but before doing that I wanted to discuss a few things.
  1. I feel that if the raw corpus data contains discrepancies, they will carry forward into later steps such as tokenization.
  2. I edited a few files and ran the tokenizer on both the existing and the edited versions. Many of the meaningless characters in the existing files no longer appeared in the edited ones. I think this will make training and execution more efficient.
  3. I just wanted to know whether I can remove those unnecessary words/characters from the files. Most importantly, before making any changes to the files I will clearly cite the source.
    And I would really love to fix this problem myself.
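A minimal sketch of the kind of pre-tokenization cleanup described in point 2. It assumes that "unnecessary characters" means bracketed editorial markers, digits and stray symbols. `clean_raw_text` and `whitespace_tokenize` are hypothetical helpers, and the whitespace split is only a stand-in for comparing before and after. It is not CLTK's tokenizer.

```python
import re
import unicodedata

# Assumption: these patterns approximate "characters with no meaning" in a raw corpus.
EDITORIAL_MARKS = re.compile(r"\[[^\]]*\]")   # bracketed markers such as "[1]" or "[sic]"
DIGITS = re.compile(r"\d+")                   # line or section numbers fused into the text
NON_TEXT = re.compile(r"[^\w\s.,;:?!'-]")     # anything that is not a letter, space, or basic punctuation
WHITESPACE = re.compile(r"\s+")


def clean_raw_text(text: str) -> str:
    """Hypothetical cleaner: strip characters assumed to carry no linguistic meaning."""
    # Compose accents into single code points where possible so \w keeps accented letters.
    text = unicodedata.normalize("NFC", text)
    text = EDITORIAL_MARKS.sub(" ", text)
    # Digits count as \w, so remove them before the NON_TEXT pass.
    text = DIGITS.sub(" ", text)
    text = NON_TEXT.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


def whitespace_tokenize(text: str) -> list[str]:
    """Stand-in tokenizer (not CLTK's), used only to compare raw vs. cleaned input."""
    return text.split()


raw = "Gallia est [1] omnis divisa in partes tres ### 42 *"
cleaned = clean_raw_text(raw)
print(cleaned)                             # Gallia est omnis divisa in partes tres
print(len(whitespace_tokenize(raw)))       # 11
print(len(whitespace_tokenize(cleaned)))   # 7
```

Keeping the cleaning step as a separate function, rather than editing the corpus files in place, would also make it easy to record which source each cleaned file came from.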