Newspapers (irrespective of their century) are not part of OCR-D! OCR-D focuses on VD materials, which don't include newspapers. And I can't think of a reason why the VD should change their scope of materials for a VD19, whenever that might be started...
The DFG mentions the 19th century explicitly, too: "Die zu entwickelnden Lösungen sollen eine Volltextdigitalisierung von Druckwerken des 19. Jahrhunderts ebenfalls einbeziehen" ("The solutions to be developed should also cover full-text digitisation of printed works of the 19th century"). OCR-D is not restricted to VD material (although that is the main focus). And isn't a newspaper printed, too? Then it is a "Druckwerk" (printed work).
@/all our next open TechCall takes place next Tuesday, 11:00-12:00. Feel free to join if you are interested in the following topics:
deb packages for OCR-D
for the conference details also see https://hackmd.io/OOMgg3ZeSqK4vfKL1wRbwQ?view
What is the difference between OMAR and the current approach of using multiple models at recognition time?
OMA = One Model to recognize them All (acronym coined ad hoc because Clemens likes acronyms, derived from the "one ring to rule them all"). That can also be a set of models used at recognition time. It just means that the old approach to choose a model based on the script(s) used in a book or other criteria is replaced by the simpler rule to always use the same model (or set of models).
Of course this still has limitations. Clemens' model covers Latin, Greek and Hebrew scripts, while our models currently only work with Latin and some Greek glyphs. So Arabic, Chinese, ... scripts still need other models.
It's only OMA for OCR-D. And maybe there will be different models for OCR workflows with and without binarization.
Is there already a tool that can extract structural information like chapters and sections from METS/MODS and put it into a table of contents of a PDF?
Not that I'm aware of. But with a bit of pre-processing and a toolkit like https://github.com/ocelotconsulting/hummus-toc it wouldn't be too difficult (generating that information in the first place is the hard part IMHO). Can you create a feature request in ocrd_pagetopdf so we can discuss possible solutions? We could either extend ocrd_pagetopdf or create a dedicated processor for it.
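For the "generating that information" part, here is a minimal sketch (not an existing OCR-D processor): pulling a nested outline out of a METS LOGICAL structMap, which a PDF writer could then turn into bookmarks. The file name `mets.xml` and the reliance on LABEL/TYPE attributes on `mets:div` are assumptions about the input.

```python
# Sketch: extract a nested chapter/section outline from a METS LOGICAL structMap.
from lxml import etree

METS_NS = {"mets": "http://www.loc.gov/METS/"}

def outline(div, depth=0):
    """Recursively yield (depth, label) pairs for the logical structure."""
    label = div.get("LABEL") or div.get("TYPE") or "untitled"
    yield depth, label
    for child in div.findall("mets:div", METS_NS):
        yield from outline(child, depth + 1)

tree = etree.parse("mets.xml")  # placeholder file name
logical = tree.find('.//mets:structMap[@TYPE="LOGICAL"]', METS_NS)
if logical is not None:
    for top in logical.findall("mets:div", METS_NS):
        for depth, label in outline(top):
            print("  " * depth + label)
```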
ocrd_pagetopdf gained a multipage parameter a few weeks ago, which allows producing a single PDF file from all the pages in a METS. In case users haven't noticed it yet, see the usage section of ocrd_pagetopdf.
Though de-keystoning is more of a shearing operation IIUC?
yes, I guess it can be approximated as just a shear in 3D. So if we had a descriptive annotation of that mapping in PAGE, then our coordinate-conversion API (based on affine transformations) could compensate.
But I still don't see any robust/usable keystone detector or general dewarper out there:
@/all does anyone know other promising tools?
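To illustrate the compensation idea above: a minimal numpy sketch of undoing a horizontal shear on PAGE-style polygon coordinates via an affine matrix. This is not the actual OCR-D coordinate-conversion API, and the shear factor `s` is a hypothetical value that would have to come from some keystone-detection step.

```python
import numpy as np

s = 0.05  # hypothetical shear factor estimated elsewhere
# affine matrix in homogeneous coordinates: x' = x - s*y, y' = y
inverse_shear = np.array([[1.0, -s, 0.0],
                          [0.0, 1.0, 0.0],
                          [0.0, 0.0, 1.0]])

# example polygon (textline bounding quad) in pixel coordinates
points = np.array([[100, 200], [400, 200], [400, 260], [100, 260]], float)
homogeneous = np.hstack([points, np.ones((len(points), 1))])
compensated = (inverse_shear @ homogeneous.T).T[:, :2]
print(compensated)
```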
If you have some dewarped/straight material, you could easily create deformed material (better: live-transform it and use it as augmentation) and then train a CNN to dewarp it again...
Absolutely. You would think this is an easy problem nowadays. Esp. with augmenters like imgaug or ocrodeg.
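A minimal sketch of that augmentation idea, assuming ocrodeg is installed and `page` is a grayscale page image as a float numpy array; the helper name `warped_pair` is made up for illustration:

```python
import ocrodeg

def warped_pair(page, sigma=5.0, maxdelta=3.0):
    """Return (distorted, original) so a network can learn to undo the warp."""
    # random smooth displacement field, applied to the clean page
    noise = ocrodeg.bounded_gaussian_noise(page.shape, sigma, maxdelta)
    distorted = ocrodeg.distort_with_noise(page, noise)
    return distorted, page
```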
https://github.com/thomasjhuang/deep-learning-for-document-dewarping (although I think that a GAN architecture is somewhat strange on a global level - interesting that this works...)
This is pretty much what ocrd-anybaseocr-dewarp attempted. However, they don't provide an off-the-shelf model, and the training data representation looks strange. Plus:
NVIDIA GPU (11G memory or larger)
ouch. (pix2pixHD with less GPU RAM or CPU only seems to be impossible...)
https://github.com/cvlab-stonybrook/DewarpNet (looks more sophisticated)
Indeed. I'll have a look – thanks a lot!
hm, exchanging might be too time-consuming, but CPU should be doable: you need to load the trained model once with a GPU, save it to CPU, and then you can use it on CPU (just remove .cuda())
wow, that sounds almost doable – thx!! I will try that (on a larger GPU) – if this works, we'll have at least some dewarping in OCR-D (where most users run via Docker which is currently CPU-only)
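For reference, a minimal sketch of that GPU-to-CPU conversion with PyTorch (file names are placeholders); `torch.load` with `map_location="cpu"` remaps CUDA tensors to CPU, so the one-time conversion may not even need a GPU:

```python
import torch

# remap all CUDA tensors in the checkpoint to CPU and save it back out
state = torch.load("model_gpu.pth", map_location="cpu")
torch.save(state, "model_cpu.pth")

# at inference time, build the model without any .cuda() calls and load:
# model.load_state_dict(torch.load("model_cpu.pth", map_location="cpu"))
# model.eval()
```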
ocrd-skimage-normalize first (doing simple contrast stretching) or something more elaborate via
ocrd-skimage-binarize makes this the default choice for window size (looking at DPI meta-data).
olena sauvola, with different blackness, whiteness and noise levels (vertical) and k from 0.025 to 0.475, see https://digi.ub.uni-heidelberg.de/diglitData/v/olena-k-20200702.png . If you open this image in GIMP and set the threshold to 255, you will see that darker images require a higher k and brighter images require a lower k. At least two horizontal stripes per group should have a clean white background. First column = original, second = ground truth, then k = 0.025 increasing in steps of 0.025. Image shrunk to 25% (original resolution 300 dpi)
noise level 0 --> sigma = 0.1 * (linear(white) - linear(black))

             white=64  white=128  white=191  white=255
black=0      0.275     0.275      0.325      0.350
black=64     -         0.200      0.250      0.300
black=128    -         -          0.125      0.200
black=191    -         -          -          0.100

noise level 1 --> sigma = 0.2 * (linear(white) - linear(black))

             white=64  white=128  white=191  white=255
black=0      0.275     0.275      0.325      0.300
black=64     -         0.200      0.250      0.275
black=128    -         -          0.125      0.175
black=191    -         -          -          0.100

noise level 2 --> sigma = 0.4 * (linear(white) - linear(black))

             white=64  white=128  white=191  white=255
black=0      0.275     0.275      0.300      0.225
black=64     -         0.175      0.250      0.200
black=128    -         -          0.125      0.150
black=191    -         -          -          0.075

noise level 3 --> sigma = 0.8 * (linear(white) - linear(black))

             white=64  white=128  white=191  white=255
black=0      0.375     0.350      0.375      0.250
black=64     -         0.175      0.300      0.225
black=128    -         -          0.125      0.025
black=191    -         -          -          0.025
k is not independent of the foreground/background level.
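To reproduce such a k sweep, a minimal sketch using scikit-image's Sauvola implementation (presumably also what ocrd-skimage-binarize wraps); the file name page.png and window_size=51 are placeholder choices:

```python
import numpy as np
from skimage.filters import threshold_sauvola
from skimage.io import imread

page = imread("page.png", as_gray=True)  # placeholder input image
for k in np.arange(0.025, 0.5, 0.025):   # same k range as in the comparison image
    thresh = threshold_sauvola(page, window_size=51, k=k)
    binarized = page > thresh
    # inspect or save `binarized` per k to build a grid like the one linked above
```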