On 12/15/2015 02:00 PM, Tom Horsley wrote:
> If you have pdf files with actual characters, the
> pdftotext tool works well for extracting the text
> (though not necessarily the layout).
> As far as doing OCR from actual image files,
> I always found tesseract to work better than most
> (but it was still pretty feeble).

Thanks.
Downloaded and tested tesseract.
It failed TOTALLY on EVERY image file created from the PDF.
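
For anyone trying the same thing, here is a rough Python sketch of the two-step approach discussed in this thread: try pdftotext first, and fall back to OCR only if the PDF has no text layer. It assumes pdftotext and pdftoppm (from poppler) and tesseract are all on the PATH. The helper names are made up for illustration, and the 300 DPI rendering resolution is my assumption, not something reported here; low-resolution page images are one common reason OCR output is poor, but this may not be what went wrong in the test above.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_cmd(pdf, txt, keep_layout=False):
    """Argument list for pdftotext; -layout tries to preserve the page layout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")
    return cmd + [str(pdf), str(txt)]


def pdftoppm_cmd(pdf, prefix, dpi=300):
    """Argument list for rendering each page to PNG (dpi=300 is an assumption)."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def tesseract_cmd(image, out_base):
    """Argument list for tesseract, which writes its text to out_base + '.txt'."""
    return ["tesseract", str(image), str(out_base)]


def run_tool(cmd):
    """Run one external tool, failing early if it is not installed."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(cmd[0] + " not found on PATH")
    subprocess.run(cmd, check=True)


def pdf_to_text(pdf):
    """Return the text of a PDF, using OCR only when there is no text layer."""
    pdf = Path(pdf)
    txt = pdf.with_suffix(".txt")
    run_tool(pdftotext_cmd(pdf, txt))
    text = txt.read_text(errors="replace")
    if text.strip():
        return text  # the PDF had actual characters, so no OCR is needed

    # No usable text layer: render the pages to images and OCR each one.
    prefix = pdf.with_suffix("")
    run_tool(pdftoppm_cmd(pdf, prefix))
    pages = []
    for image in sorted(prefix.parent.iterdir()):
        if image.name.startswith(prefix.name + "-") and image.suffix == ".png":
            out_base = image.with_suffix("")
            run_tool(tesseract_cmd(image, out_base))
            pages.append(Path(str(out_base) + ".txt").read_text(errors="replace"))
    return "\n".join(pages)
```

Calling pdf_to_text("scan.pdf") (a placeholder file name) would leave scan.txt plus one PNG and one text file per page next to the original, so it is best run in a scratch directory.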