nvm. I wrote my own implementation for text recognition with Tesseract.
Here is how it works:

I search for a word, for example "Fahrrad", and get a result like this:

  word: 'Fahrradv...',
  score: 0.94,
  bbox: { x: 231, y: 179, w: 71, h: 9 },
  center: { x: 267, y: 184 } 

The score is the similarity between the searched word and the recognized word: 0.0 = no similarity, 1.0 = full match. I'm using the Jaro-Winkler similarity to calculate it.

Now I "just" need to plug my subsystem into SikuliX for further processing.

Anyway, nice to talk to you guys :smile: