Thinking of using Gopa for some simple crawling we have to do.
Mind if I ask a few questions?
As far as I can see, it does not handle PDF files: it creates a doc for them in ES, but no text is extracted. How hard would it be to add that? Maybe the easiest (if not the best) solution would be to send the PDF to an ingest pipeline that uses the ingest attachment plugin?
Is development of Gopa ongoing?
@jmlucjav yes, sending it to an ingest pipeline is an easy way to achieve that. I'm a little too busy to keep up with this project at the moment, but PRs are welcome :)
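For anyone who wants to try the ingest pipeline route, here is a rough sketch of what it could look like from the client side. It is not Gopa code: the ES address, the pipeline name `pdf_attachment`, the index name `crawled_docs`, and all helper names are made up for illustration. It assumes the ingest attachment plugin is installed, and that your ES version accepts the `_doc` endpoint (older versions expect a type name there instead).

```python
import base64
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local ES node
PIPELINE = "pdf_attachment"       # hypothetical pipeline name
INDEX = "crawled_docs"            # hypothetical index name


def pipeline_body(field="data"):
    """Pipeline definition: run the attachment processor on `field`."""
    return {
        "description": "Extract text from crawled PDF files",
        "processors": [{"attachment": {"field": field}}],
    }


def document_body(pdf_bytes, url, field="data"):
    """Document to index: the attachment processor expects base64 content."""
    return {"url": url, field: base64.b64encode(pdf_bytes).decode("ascii")}


def put_json(path, body):
    """PUT a JSON body to ES and return the decoded JSON response."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def index_pdf(doc_id, pdf_bytes, url):
    """Create (or overwrite) the pipeline, then index one PDF through it."""
    put_json(f"/_ingest/pipeline/{PIPELINE}", pipeline_body())
    return put_json(
        f"/{INDEX}/_doc/{doc_id}?pipeline={PIPELINE}",
        document_body(pdf_bytes, url),
    )
```

By default the extracted text should end up under `attachment.content` in the indexed document, next to the original base64 field. In a real crawler you would create the pipeline once at startup rather than on every document.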
@jmlucjav just a quick update: the PDF feature has now been added, and PDF files are processed within GOPA, so no extra ingestion step is needed.