    Alex Leone
    @alexleone
    These are effective groups!
    Traun Leyden
    @tleyden
    yeah so far I'm super impressed
    Thought Object
    @thoughtobj
    Is this using the go-tesseract wrapper for tesseract?
    Traun Leyden
    @tleyden
    It's disabled by default due to some bugs
    So it just calls tesseract via exec
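For readers following along, the "calls tesseract via exec" approach can be sketched roughly as below. This is a minimal illustration and not open-ocr's actual code: the helper names `tesseractArgs` and `ocrBytes` are hypothetical, and it assumes a `tesseract` binary on the PATH that accepts the usual `tesseract <image> <outputbase>` form and writes `<outputbase>.txt`.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// tesseractArgs (hypothetical helper) builds the arguments for
// `tesseract <image> <outputbase>`; tesseract appends ".txt" to outputbase.
func tesseractArgs(imagePath, outBase string) []string {
	return []string{imagePath, outBase}
}

// ocrBytes (hypothetical helper) saves the image to a temp directory,
// runs the tesseract CLI as a subprocess, and returns the recognized text.
func ocrBytes(img []byte) (string, error) {
	dir, err := os.MkdirTemp("", "ocr")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir) // clean up the image and the output file

	imagePath := filepath.Join(dir, "input")
	if err := os.WriteFile(imagePath, img, 0o600); err != nil {
		return "", err
	}

	outBase := filepath.Join(dir, "out")
	cmd := exec.Command("tesseract", tesseractArgs(imagePath, outBase)...)
	if out, err := cmd.CombinedOutput(); err != nil {
		return "", fmt.Errorf("tesseract failed: %w: %s", err, out)
	}

	text, err := os.ReadFile(outBase + ".txt")
	if err != nil {
		return "", err
	}
	return string(text), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: ocr <image-file>")
		return
	}
	img, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	text, err := ocrBytes(img)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(text)
}
```

The image has to land on disk first because the CLI is handed a file path, which is the trade-off raised in the next message.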
    Omnipresent
    @Omnipresent
    OK. I guess that's the same, since it would require the image to be saved to disk anyway.
    I'm new to Docker. The README says there will be 3 Docker images running on the same server. Could they be run on different servers? A dedicated server for rabbitmq, one for API requests, and one for OCR workers, to be more scalable?
    Traun Leyden
    @tleyden
    yes definitely!
    I've set it up so that rabbitmq is running on https://www.cloudamqp.com/, which is nice because they manage it and provide a nice web UI for administering rabbitmq
    Omnipresent
    @Omnipresent
    Awesome. This is super great. I've been building an OpenCV and tesseract preprocessing pipeline and was looking to scale it. Now I can just add mine as a preprocessing step
    Btw, curious... what did you use for the diagram in the README?
    Omnipresent
    @Omnipresent
    Is the httpd service load balancing between the workers?
    Daemian Mack
    @daemianmack
    hey all. is there a demo running somewhere of this project (or of tesseract) that i can try out?
    Traun Leyden
    @tleyden
    @daemianmack hey! nope, no public api, but it should be easy to deploy it on your own cloud.
    Daemian Mack
    @daemianmack
    @tleyden i was hoping to avoid setup if the sort of text i'm looking to OCR turns out to be impractical. maybe you could opine -- does the text in this image look like it might be possible to OCR with tesseract, given i might need to use character whitelisting and some kind of positioning bounding/transform? http://i.imgur.com/SMbdWzK.jpg
    Traun Leyden
    @tleyden
    @daemianmack it's really hard to know without trying, but my gut tells me that tesseract will struggle with that
    Thought Object
    @thoughtobj
    Is tesseract run from the command line, or does it use the provided C APIs? I want to know whether everything is done in memory or through disk I/O
    Traun Leyden
    @tleyden
    @thoughtobj initially it was using a go binding to the C API
    However, I ran into limitations and switched to a command-line approach (fork/exec of a subprocess)
    Thought Object
    @thoughtobj
    @tleyden do you remember what limitations you ran into, and whether they came from the actual C API or from the go binding? The command-line approach would work fine; however, it requires writing the file to disk, which involves I/O. Doing everything in memory would be better, no?
    Traun Leyden
    @tleyden
    @thoughtobj yeah there were limitations to the go bindings and I filed an issue (that I can dig up), which may have been fixed by now. I believe I made the commandline exec() approach the default but kept the gobinding approach as optional.
    But yeah, the gobinding approach is cleaner and more efficient and was my original approach
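On the disk I/O concern: one possible middle ground, short of a C API binding, is to pipe the image through the subprocess's standard streams. The sketch below assumes a tesseract CLI build that accepts `stdin` as the input name and `stdout` as the output base (documented for recent releases; older builds may not support it). The helpers `stdinCommand` and `ocrInMemory` are hypothetical names for illustration, not part of open-ocr.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// stdinCommand (hypothetical helper) builds `tesseract stdin stdout`,
// which asks tesseract to read the image from standard input and print
// the recognized text to standard output. Assumes a build that supports this.
func stdinCommand() *exec.Cmd {
	return exec.Command("tesseract", "stdin", "stdout")
}

// ocrInMemory (hypothetical helper) feeds image bytes to the subprocess
// without the caller writing a temporary file.
func ocrInMemory(img []byte) (string, error) {
	cmd := stdinCommand()
	cmd.Stdin = bytes.NewReader(img)

	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("tesseract failed: %w: %s", err, stderr.String())
	}
	return stdout.String(), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: ocr <image-file>")
		return
	}
	// The file read here only supplies demo input; ocrInMemory itself
	// works on bytes that could come from an HTTP request body.
	img, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	text, err := ocrInMemory(img)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(text)
}
```

This still pays the cost of forking a process per image, so it avoids the caller's temp file but is not equivalent to calling the C API in-process.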
    simkimsia
    @simkimsia
    I was googling around for OCR as a service and your github came up
    How actively is the github repo maintained?
    simkimsia
    @simkimsia
    I have created an issue for this: tleyden/open-ocr#52
    @tleyden Sorry I had to ping you directly. I was hoping you had an answer to this
    Traun Leyden
    @tleyden
    @simkimsia it's been maintained in the sense that it's been low maintenance, and I have been helping people that get stuck. Haven't added much in the way of new features, and I still need to get back to documenting and cleaning up the stroke width transform stuff.
    simkimsia
    @simkimsia
    @tleyden Thanks for clarifying.
    @tleyden I have somehow resolved my issue with the docker-compose up by turning on my VPN. Not sure why.