You can change the default OCR engine via the GUI after you have opened an image file: Tools > OCR > OCR Engine. If you cannot get that to work, you can edit the gscan2pdf configuration file instead (look for the line containing "ocr engine"):
$ grep engine ~/.config/gscan2pdfrc
"ocr engine" : "gocr",
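If the GUI route does not work, the edit to the configuration file could be scripted roughly as follows. This is only a sketch: the function name `set_ocr_engine` is hypothetical, it assumes the line has the `"ocr engine" : "..."` form shown in the grep output above, and the value `tesseract` in the usage comment is an assumption about what gscan2pdf accepts. Close gscan2pdf first so it does not overwrite the change.

```shell
# Hypothetical helper: rewrite the "ocr engine" value in a gscan2pdf config file.
# $1 = engine name, $2 = config file (defaults to the path used in this thread).
set_ocr_engine() {
    engine="$1"
    rc="${2:-$HOME/.config/gscan2pdfrc}"
    # Keep a backup copy; stop if the config file cannot be copied.
    cp "$rc" "$rc.bak" || return 1
    # Replace whatever value is currently set, leaving the rest of the line alone.
    sed "s/\"ocr engine\"[[:space:]]*:[[:space:]]*\"[^\"]*\"/\"ocr engine\" : \"$engine\"/" \
        "$rc.bak" > "$rc"
}

# Usage (the engine name "tesseract" is an assumption):
#   set_ocr_engine tesseract
#   grep engine ~/.config/gscan2pdfrc
```

The previous file is kept as `gscan2pdfrc.bak`, so the change can be undone by copying it back.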
I checked this file; "ocr engine" is already set to "gocr".
Anyway, there was another error: the package unpaper was not installed. After I installed unpaper, the error went away.
Do you still get the error message after you have copied the Tesseract V3.05 *.traineddata files for all the languages (including ara.traineddata) and the Tesseract Cube data files listed below to the /usr/share/tessdata/ directory?:
ara.cube.bigrams, ara.cube.fold, ara.cube.lm, ara.cube.nn, ara.cube.params, ara.cube.word-freq, ara.cube.size, ara.tesseract_cube.nn
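To see which of these files are still missing, a small check along these lines might help. It is a sketch only: the function name `check_ara_files` is hypothetical, and the file list is simply the one given above plus ara.traineddata.

```shell
# Hypothetical helper: list which Arabic Tesseract data files are absent.
# $1 = tessdata directory (defaults to the one named in this thread).
check_ara_files() {
    dir="${1:-/usr/share/tessdata}"
    missing=0
    for f in ara.traineddata ara.cube.bigrams ara.cube.fold ara.cube.lm \
             ara.cube.nn ara.cube.params ara.cube.word-freq ara.cube.size \
             ara.tesseract_cube.nn; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    echo "$missing file(s) missing in $dir"
}

check_ara_files /usr/share/tessdata
```

Any file reported as missing would still need to be copied into that directory.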
Sorry, I don't get it. When I go to the SourceForge website, I can only download ara.bin files for version 4 and version 3.04/3.05. And yes, tesseract is in the Sabayon repo at version 3.05.
So where are all the ara.xxx.xxx files? Too complicated.
Right-click on the text in the OCR pane in gscan2pdf and a window titled 'Editing text...' pops up. You can edit the text in this window and/or copy it and paste it into another application.
Finally, with this I had success. I can have the scanned document recognized as text and then copy and paste it.
So then, Fitz, thanks to you, I could get gscan2pdf to run and I can use OCR. I will also read your recommendations on the OCR topic. I already found the hint that 400 ppi is necessary.
I will mark this thread as solved.
-Linuxfluesterer (I love KDE...)
Take away Facebook from me and let there be real people again...