These are chat archives for kba/hocr-spec

Oct 2018
Audun Bjørkøy
Oct 15 2018 10:17
Hello, we are trying to get a grasp of the hOCR standard. Different sources give us slightly different syntaxes. We looked for a schema, but there seems to be none? Our initial question is that some hOCR files seem to have <xml> tags wrapping the HTML content, while others seem to have just <html> tags. Which is correct according to the standard? Also, is there a preferred Python library out there for parsing hOCR files? Thanks.