OCR and link search

Please post all questions and comments regarding Help & Manual 7 here.

Moderators: Alexander Halser, Tim Green

WrightStuff
Posts: 25
Joined: Fri Aug 26, 2022 9:45 am

OCR and link search

I need to search for text in screenshots in a 400-page PDF document generated by Help and Manual. What's the best way to do this? Also, how can I search for hyperlinks throughout the document? Is there a tool to validate links and look for broken ones?
Stephen Thompson
Technical Writer
Tim Green
Site Admin
Posts: 23125
Joined: Mon Jun 24, 2002 9:11 am
Location: Bruehl, Germany

Re: OCR and link search

Hi Stephen,

For the text in the screenshots you need an external OCR tool that can read text in graphics. Help+Manual doesn't have a feature like that, but since the graphics are always external files, there is no problem with running an external tool on them. You will probably need to do this on the source graphics rather than on the PDF output file. We are not aware of any OCR program that can scan for text in graphics inside a PDF, which is basically raw printer output saved in a file and displayed on the screen with a special printer driver. Such a tool may exist, but we don't know of one.
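As a rough illustration of running an external OCR tool over the source graphics, here is a minimal Python sketch. It is not a Help+Manual feature. It assumes the third-party pytesseract and Pillow packages plus the Tesseract OCR engine are installed, and the function names and the folder layout are hypothetical.

```python
from pathlib import Path
from typing import Callable, List, Sequence


def ocr_with_tesseract(image_path: Path) -> str:
    """Return the text Tesseract recognises in one image.

    Assumption: pytesseract, Pillow and the Tesseract engine are installed.
    They are imported here so the search logic below works without them.
    """
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def find_text_in_images(
    folder,
    needle: str,
    ocr: Callable[[Path], str] = ocr_with_tesseract,
    extensions: Sequence[str] = (".png", ".jpg", ".jpeg", ".bmp"),
) -> List[Path]:
    """List image files under `folder` whose OCR text contains `needle`.

    The comparison is case-insensitive. OCR output is imperfect, so a miss
    does not prove the text is absent from the screenshot.
    """
    wanted = needle.lower()
    hits = []
    for path in sorted(Path(folder).rglob("*")):
        if path.suffix.lower() not in extensions:
            continue  # skip anything that is not a graphic we can read
        if wanted in ocr(path).lower():
            hits.append(path)
    return hits
```

Calling `find_text_in_images("path/to/project/images", "Save As")` would return the screenshots in which the phrase was recognised. The `ocr` parameter lets you swap in a different OCR engine without changing the search loop.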

Similarly, Help+Manual doesn't have a dedicated tool for validating every link in your project. The Project Report tool will show you all the links in your project and will report broken internal links, but it can't check external links. The best way to check the external links would be to generate WebHelp and use a link checking tool on the HTML files. Again, we are not aware of a program that can do that on a PDF file.
Regards,
Tim (EC Software Documentation & User Support)