    Quan Nguyen
    @nguyenq
    I saw what you did for the tess4j 2.0.1 release. I think I can try to do the lept4j release myself.
    Quan Nguyen
    @nguyenq
    OK, I just published a release on GitHub.
    Quan Nguyen
    @nguyenq
    and to Maven Central also. I'm trying to verify if the files got there.
    4F2E4A2E
    @4F2E4A2E
    Great thx!
    Quan Nguyen
    @nguyenq
    OJ, there are 2 issues with the code currently in master: 1) mergePdf.mergeDocuments() in PdfUtilities.mergePdf appears to be missing a parameter 2) testMergePdf() throws an exception: JBIG2 encoding not implemented, java.lang.UnsupportedOperationException
    bhayer
    @bhayer
    Does anyone here have experience with Tess4J (or JNA) on GlassFish 4.x?
    4F2E4A2E
    @4F2E4A2E
    Nope, but given the technology (an application server), it should be no problem. I guess the more important question is which operating system you are trying and/or willing to get it running on...
    Vitaliy Hayda
    @vitaliyhayda

    Hi there!

    So I'm trying to train Tesseract 3.04 for a specific font, and I've completed a couple of .box files with over 800 characters each. Then I created .tr files, and now I'm ready to train Tesseract based on those:

    sudo ./tesstrain.sh --lang eng --langdata_dir /Users/vitaliy/Desktop/tess-training/langdata --tessdata_dir /Users/vitaliy/Desktop/tess-training/TIFs

    These are the errors I'm getting:

    === Starting training for language 'eng'
    mktemp: illegal option -- -
    usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
    mktemp [-d] [-q] [-u] -t prefix
    ERROR: text2image not found

    Any advice helps! Thank you!

    Vitaliy Hayda
    @vitaliyhayda
    @nguyenq
    Quan Nguyen
    @nguyenq
    Hi Vitaliy, you'll need to build the Tesseract training executable:
    But if you already have the TIFF/Box pairs, you won't need the text2image program, which is used to create a TIFF/Box pair from a UTF-8-encoded input text file. You can edit the tesstrain.sh script to comment out that command.
    Vitaliy Hayda
    @vitaliyhayda
    Thank you for such a quick reply! Will try following your advice
    Vitaliy Hayda
    @vitaliyhayda

    I did quite a few steps after that!

    But I'm blocked once again with this error:

    ./pango/pango-coverage.h:25:10: fatal error: 'glib.h' file not found

    #include "glib.h"

    How do I get that file if I'm on Mac?
    Vitaliy Hayda
    @vitaliyhayda
    Ok, looks like I have figured it out. It was not easy...
    hohenheim23
    @hohenheim23
    Hi there! @vitaliyhayda @nguyenq I tried the new mavenized tess4j: when I run mvn clean install, everything runs perfectly. I saw that it ships eng.traineddata and also osd.traineddata. When I added an ara.traineddata that I downloaded from GitHub, along with a .tif picture containing Arabic text, I got the error "Invalid Memory access". Is it possible to modify the project to make it work for Arabic?
    hohenheim23
    @hohenheim23
    This is the error; I don't understand why I get it:
    Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.759 sec
    Cube ERROR (CubeRecoContext::Load): unable to read cube language model params from src\main\resources/tessdata/ara.cube.lm
    Cube ERROR (CubeRecoContext::Create): unable to init CubeRecoContext object
    init_cube_objects(false, &tessdata_manager):Error:Assert failed:in file ....\ccmain\tessedit.cpp, line 210
    Cube ERROR (CubeRecoContext::Load): unable to read cube language model params from C:\Users\machebbi\AppData\Local\Temp\tess4j\tessdata/ara.cube.lm
    Cube ERROR (CubeRecoContext::Create): unable to init CubeRecoContext object
    init_cube_objects(false, &tessdata_manager):Error:Assert failed:in file ....\ccmain\tessedit.cpp, line 210
    Quan Nguyen
    @nguyenq
    You'll need all the ara.cube.* files in the tessdata folder.
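The advice above can be checked before running OCR. The following is a minimal sketch, not part of Tess4J: the class name `CubeFileCheck` and the list `ASSUMED_ARA_FILES` are hypothetical, and the suffix list is an assumption about what a Tesseract 3.0x Arabic download contains, so compare it against the files you actually downloaded.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CubeFileCheck {
    // Assumed file suffixes for the Arabic data set (not taken from the chat);
    // verify against your own tessdata download.
    static final List<String> ASSUMED_ARA_FILES = Arrays.asList(
            "traineddata", "cube.bigrams", "cube.fold", "cube.lm",
            "cube.nn", "cube.params", "cube.size", "cube.word-freq");

    /** Returns the expected file names (lang + "." + suffix) that are absent from tessdataDir. */
    static List<String> missing(File tessdataDir, String lang, List<String> suffixes) {
        List<String> missing = new ArrayList<>();
        for (String suffix : suffixes) {
            String name = lang + "." + suffix;
            if (!new File(tessdataDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Calling `CubeFileCheck.missing(new File("src/main/resources/tessdata"), "ara", CubeFileCheck.ASSUMED_ARA_FILES)` returns an empty list only when every listed file is present; any name it returns is a file the engine may fail to read.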
    hoangtocdo90
    @hoangtocdo90
    hello quan nguyen
    codelovercc
    @codelovercc
    hi everyone
    hi hoangtocdo90
    wow
    somebody there?
    Victor Arbues
    @Painyjames
    hi there
    Does anyone know how to solve the "unable to load library tesseract" errors?
    I'm on CentOS with Tesseract 3.04
    tess4j 3.4.0
    it works on mac with tesseract 3.05
    Victor Arbues
    @Painyjames
    I needed to add them manually
    System.load("/usr/local/lib/liblept.so.5")
    System.loadLibrary("tesseract")
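The manual loading above can be wrapped so that a missing library is reported instead of crashing at the first call. This is a minimal sketch under assumptions: the class `NativeLoad` and its methods are hypothetical helpers, not Tess4J API, and it uses only `System.load` from the JDK.

```java
import java.util.ArrayList;
import java.util.List;

class NativeLoad {
    /** Tries System.load on an absolute path; returns false instead of throwing on failure. */
    static boolean tryLoad(String absolutePath) {
        try {
            System.load(absolutePath);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Could not load " + absolutePath + ": " + e.getMessage());
            return false;
        }
    }

    /**
     * Loads libraries in the given order (dependencies first, e.g. Leptonica
     * before Tesseract) and returns the paths that could not be loaded.
     */
    static List<String> loadInOrder(List<String> absolutePaths) {
        List<String> failed = new ArrayList<>();
        for (String path : absolutePaths) {
            if (!tryLoad(path)) {
                failed.add(path);
            }
        }
        return failed;
    }
}
```

For example, `NativeLoad.loadInOrder(Arrays.asList("/usr/local/lib/liblept.so.5", "/usr/local/lib/libtesseract.so.3"))` would attempt both libraries in dependency order; the second path is an illustrative guess, so substitute the file name actually installed on your system.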
    hi1027
    @hi1027
    Is there a tutorial for training?
    Mohsiur Rahman
    @mohsiur-rahman
    Hi, we are currently using Tesseract from Java by invoking it through the command line, but our service goes into exception handling numerous times. Will this Java wrapper help improve performance? We are using TIFF files as our input.
    Sergey
    @romankovsv
    Hi guys
    Has anybody faced this issue?
    api.TessBaseAPIRecognize(handle, null);
    throws an exception:
    java.lang.Error: Invalid memory access
    at com.sun.jna.Native.invokeInt(Native Method)
    at com.sun.jna.Function.invoke(Function.java:383)
    at com.sun.jna.Function.invoke(Function.java:315)
    at com.sun.jna.Library$Handler.invoke(Library.java:212)
    I instantiate it the following way:
    TessAPI api = new TessDllAPIImpl().getInstance();
    TessAPI.TessBaseAPI handle = api.TessBaseAPICreate();
    I am trying to find a way to get the coordinates of specific text.
    Sergey
    @romankovsv
    Here is my full code; could you be so kind as to take a look?
    Tesseract tessInst = Tesseract.getInstance();
    File tessDataFolder = LoadLibs.extractTessResources("tessdata");
    tessInst.setDatapath(tessDataFolder.getAbsolutePath());
    captureScreenshot();
    System.out.println("TessBaseAPIGetIterator");
        TessAPI  api = LoadLibs.getTessAPIInstance();
        TessAPI.TessBaseAPI handle = api.TessBaseAPICreate();
    
        String lang = "eng";
        File toast = new File("src/test/ToastScreenShoots/toast.png");
        BufferedImage image = null;
        try {
            image = ImageIO.read(new FileInputStream(toast));
        } catch (IOException e) {
            e.printStackTrace();
        }
        ByteBuffer buf = ImageIOHelper.convertImageData(image);
        File file = new File("src/test/ToastScreenShoots/toast.png");
        api.TessBaseAPIInit3(handle, "tessdata", lang);
        TessAPI.TessResultIterator ri = api.TessBaseAPIGetIterator(handle);
        TessAPI.TessPageIterator pi = api.TessResultIteratorGetPageIterator(ri);
    
        api.TessPageIteratorBegin(pi);
        System.out.println("Bounding boxes:\nchar(s) left top right bottom confidence font-attributes");
    
        int height = image.getHeight();
        do {
            Pointer ptr = api.TessResultIteratorGetUTF8Text(ri, TessAPI.TessPageIteratorLevel.RIL_WORD);
            String word = ptr.getString(0);
            api.TessDeleteText(ptr);
            float confidence = api.TessResultIteratorConfidence(ri, TessAPI.TessPageIteratorLevel.RIL_WORD);
            IntBuffer leftB = IntBuffer.allocate(1);
            IntBuffer topB = IntBuffer.allocate(1);
            IntBuffer rightB = IntBuffer.allocate(1);
            IntBuffer bottomB = IntBuffer.allocate(1);
            api.TessPageIteratorBoundingBox(pi, TessAPI.TessPageIteratorLevel.RIL_WORD, leftB, topB, rightB, bottomB);
            int left = leftB.get();
            int top = topB.get();
            int right = rightB.get();
            int bottom = bottomB.get();
            System.out.print(String.format("%s %d %d %d %d %f", word, left, top, right, bottom, confidence));
    
            IntBuffer boldB = IntBuffer.allocate(1);
            IntBuffer italicB = IntBuffer.allocate(1);
            IntBuffer underlinedB = IntBuffer.allocate(1);
            IntBuffer monospaceB = IntBuffer.allocate(1);
            IntBuffer serifB = IntBuffer.allocate(1);
            IntBuffer smallcapsB = IntBuffer.allocate(1);
            IntBuffer pointSizeB = IntBuffer.allocate(1);
            IntBuffer fontIdB = IntBuffer.allocate(1);
            String fontName = api.TessResultIteratorWordFontAttributes(ri, boldB, italicB, underlinedB,
                    monospaceB, serifB, smallcapsB, pointSizeB, fontIdB);
            boolean bold = boldB.get() == TessAPI.TRUE;
            boolean italic = italicB.get() == TessAPI.TRUE;
            boolean underlined = underlinedB.get() == TessAPI.TRUE;
            boolean monospace = monospaceB.get() == TessAPI.TRUE;
            boolean serif = serifB.get() == TessAPI.TRUE;
            boolean smallcaps = smallcapsB.get() == TessAPI.TRUE;
            int pointSize = pointSizeB.get();
            int fontId = fontIdB.get();
            System.out.println(String.format("  font: %s, size: %d, font id: %d, bold: %b," +
                            " italic: %b, underlined: %b, monospace: %b, serif: %b, smallcap: %b",
                    fontName, pointSize, fontId, bold, italic, underlined, monospace, serif, smallcaps));
        } while (api.TessPageIteratorNext(pi, TessAPI.TessPageIteratorLevel.RIL_WORD) == TessAPI.TRUE);
    At api.TessPageIteratorBegin(pi);
    I get this exception:
    java.lang.Error: Invalid memory access
    at com.sun.jna.Native.invokePointer(Native Method)
    at com.sun.jna.Function.invokePointer(Function.java:470)
    at com.sun.jna.Function.invoke(Function.java:404)
    at com.sun.jna.Function.invoke(Function.java:315)
    at com.sun.jna.Library$Handler.invoke(Library.java:212)
    at com.sun.proxy.$Proxy9.TessResultIteratorGetUTF8Text(Unknown Source)
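A possible explanation, offered as an assumption rather than a confirmed diagnosis: the snippet above calls TessBaseAPIGetIterator right after TessBaseAPIInit3, without first passing the image to the engine (TessBaseAPISetImage) or running TessBaseAPIRecognize. In that situation the native call can return a null iterator, and the next native call that dereferences it fails with "Invalid memory access". The sketch below is a plain-Java guard that turns that case into a readable error; `NativeHandles` and `requireHandle` are hypothetical names, not part of Tess4J.

```java
class NativeHandles {
    /**
     * Returns the handle unchanged, or throws if a native call returned null.
     * Failing here is easier to debug than a crash inside the next native call.
     */
    static <T> T requireHandle(T handle, String producedBy) {
        if (handle == null) {
            throw new IllegalStateException(producedBy
                    + " returned null; was the image set and recognition run first?");
        }
        return handle;
    }
}

// Intended use with the code above (commented out because it needs Tess4J on the classpath):
//   TessAPI.TessResultIterator ri = NativeHandles.requireHandle(
//           api.TessBaseAPIGetIterator(handle), "TessBaseAPIGetIterator");
```

If the guard fires, the likely fix is to set the image and run recognition on the handle before requesting the iterator.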
    rosenjcb
    @rosenjcb
    Is there a way to get a return type other than void with .createDocument()? I want to do an OCR on a document and get a searchable PDF but right now I have to create the document and retrieve it from the filesystem... it's a lot of code.
    rosenjcb
    @rosenjcb
    @asvn oh yeah this is empty lol
    Last message was by me back in September 2018
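On the earlier question about getting the document back without handling files by hand: one workaround is to hide the temporary file behind a small helper. This is only a sketch under assumptions. `PdfBytes` and `DocumentWriter` are hypothetical names; `DocumentWriter` stands in for whatever call writes `outputBase + ".pdf"` to disk (for example a Tess4J document-creation method that takes an output base name, depending on your version).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PdfBytes {
    /** Hypothetical stand-in for a call that writes outputBase + "." + extension to disk. */
    interface DocumentWriter {
        void write(String outputBase) throws Exception;
    }

    /** Runs the writer against a temporary directory and returns the produced file as bytes. */
    static byte[] render(DocumentWriter writer, String extension) {
        Path dir = null;
        try {
            dir = Files.createTempDirectory("ocr-out");
            Path base = dir.resolve("result");
            writer.write(base.toString());
            return Files.readAllBytes(dir.resolve("result." + extension));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new IllegalStateException("document creation failed", e);
        } finally {
            // Best-effort cleanup of the temporary directory and its contents.
            if (dir != null) {
                File[] files = dir.toFile().listFiles();
                if (files != null) {
                    for (File f : files) {
                        f.delete();
                    }
                }
                dir.toFile().delete();
            }
        }
    }
}
```

A caller would pass a lambda that invokes the real document-creation call with the supplied base name, then use the returned byte array directly, for instance to stream the PDF in an HTTP response.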