These are chat archives for beniz/deepdetect

15th
May 2018
cchadowitz-pf
@cchadowitz-pf
May 15 2018 13:51
Any trained models coming with #406? :) I'd love a DD-based alternative to Tesseract
Emmanuel Benazera
@beniz
May 15 2018 15:43
You might want to look for the Tesseract datasets; we could train and release one, why not.
cchadowitz-pf
@cchadowitz-pf
May 15 2018 16:56
To be honest, I hadn't looked into how Tesseract trains its models until now. It almost seems like they rely on language data from this repo https://github.com/tesseract-ocr/langdata and font files. I couldn't find any references to labeled image training data, so perhaps they generate it from font files and language data? I'll see what else I can find.
Emmanuel Benazera
@beniz
May 15 2018 17:10
Tesseract 4 is using LSTM
cchadowitz-pf
@cchadowitz-pf
May 15 2018 17:41
Yeah, but it still seems like they use the language data from that repo for Tesseract 4