Integrated Zoom indexer

Discussions about Help+Manual 9

Moderators: Alexander Halser, Tim Green

Andie Davidson
Posts: 17
Joined: Wed Apr 05, 2017 9:38 am

Integrated Zoom indexer

Unread post by Andie Davidson »

I use the integrated Wrensoft Zoom indexer when creating each user manual. I use the full Zoom product to index the whole site of all user manuals (there are many). This is because I don't want to have to create an offline config in Zoom for every individual manual. It works well.

Except that I found by accident that short strings in PNG files can also match words or terms! For example, 'NAS' (networked storage) occurs 223 times in one manual, exclusively in PNG files, yet it is a valid thing to search for.

How can I exclude png files from the integrated indexer?
Tim Green
Site Admin
Posts: 23181
Joined: Mon Jun 24, 2002 9:11 am
Location: Bruehl, Germany

Re: Integrated Zoom indexer

Unread post by Tim Green »

Hi Andie,

The integrated indexer doesn't (and cannot) index the "contents" of PNG files. It only indexes topic files, so I'm not quite sure what you're referring to here. :?
Regards,
Tim (EC Software Documentation & User Support)

Private support:
Please do not email or PM me with private support requests -- post to the forum directly.
Andie Davidson

Re: Integrated Zoom indexer

Unread post by Andie Davidson »

I realise this sounds odd.

This is one of the projects where this happens: https://portal.7thsense.one/user-guides ... ser-guide/

The search in the project's web pages seems to find strings that are not in the page text: it points to pages that supposedly contain them, but nothing is highlighted, and the string is indeed not there in the page.

So to find out where the string was being found, I searched the whole published project HTML folder, all file types, using Notepad++ 'Find in files'. PNGs are encoded, but their raw data was still being searched. Example search result:

Line 1362: ‰[jĈ#FŒùï–ðt`Û1™’‚¾¹6R”’òҲ’·jYˆò•ÿÔNasCöÃóU ›°aÚÜÐRè~ÚB™üßbCùrAjY°c1X"\𤂅¸ etc.

No instances were found in page text, but many in .png internal code of images in the published project.

If I run 'Find in files' on ONLY .png files I get the same result. So, just as Notepad++ can read that raw data, is the Zoom indexer somehow doing the same? My standalone Zoom specifies which file types to read, so it excludes .png files, and its index of all projects does not find the string in the same project files.
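For anyone who wants to repeat this check without Notepad++, here is a minimal sketch of the same kind of raw, case-insensitive search, with the hits grouped by file extension. It is only an illustration: the function name `count_hits_by_extension` and the folder name in the usage note are made up for this example, and it says nothing about how the Zoom indexer itself reads files.

```python
from collections import Counter
from pathlib import Path


def count_hits_by_extension(root, term):
    """Count case-insensitive occurrences of `term` in the raw bytes of
    every file under `root`, grouped by file extension.

    Hypothetical helper for illustration only; it imitates a plain
    'find in files' search and knows nothing about Zoom.
    """
    needle = term.lower().encode("ascii")
    hits = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # bytes.lower() only folds ASCII letters, which is enough here.
            data = path.read_bytes().lower()
            found = data.count(needle)
            if found:
                hits[path.suffix.lower()] += found
    return hits
```

Calling `count_hits_by_extension("published_webhelp", "nas")` (the folder name is a placeholder) returns a `Counter` keyed by extension, so it is easy to see whether the matches come only from `.png` files or also from the HTML topic files.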

The eWriter file's integrated search does not find this, which I find very odd in comparison, since surely the same function is running, but the web page search does.

BUT
Search nas* gives null results (correct)
Search nast correctly gives null results (no nasties here)
Search na gives correct results (I allow 2 character indexing)
Search nat and the results are correct: national, unattended etc.

So what is special about the string nas ?
Tim Green

Re: Integrated Zoom indexer

Unread post by Tim Green »

I can't think of any reasonable way that this could be happening. Even if you have embedded images in your pages, they get converted to image files on export, so there is no source code in the topic file that could be scanned. To test it, make a copy of your project and go to Config > Common > Project Search Path and delete all the paths there. This will enable you to generate WebHelp without any images. Publish to an empty folder to ensure you have no old leftover files (the indexing is done on the output files after publishing). If it still finds the string, you know it cannot be the images.
Regards,
Tim (EC Software Documentation & User Support)

Tim Green

Re: Integrated Zoom indexer

Unread post by Tim Green »

Note: You can also check directly whether the PNGs are included. Open the Zoom index files generated in the WebHelp output folder in a text editor. They are all plain text.
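If there are many index files, the same check can be scripted. The following is a rough sketch, assuming the index files are plain text as described above and can be picked out by the `*.zdat` pattern (an assumption based on the "zdat files" mentioned later in this thread; adjust it to whatever is actually in your output folder). The function name `find_term_in_index` is made up for this example.

```python
from pathlib import Path


def find_term_in_index(output_dir, term, pattern="*.zdat"):
    """Return (file name, line number, line) for every line in the index
    files under `output_dir` that contains `term`, ignoring case.

    The "*.zdat" default is an assumption, not a documented guarantee.
    """
    term = term.lower()
    matches = []
    for path in sorted(Path(output_dir).rglob(pattern)):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if term in line.lower():
                matches.append((path.name, lineno, line.strip()))
    return matches
```

This is a plain substring match, so searching for `nas` would also report a line containing a longer word such as "dynasty". An empty result means the string is not in the scanned index files at all.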
Regards,
Tim (EC Software Documentation & User Support)

Andie Davidson

Re: Integrated Zoom indexer

Unread post by Andie Davidson »

You are correct, Tim, the png finding is purely coincidental. The string is not in any file, and not in the zdat files.

In fact the issue is in everything that H+M publishes with integral Zoom indexing. It seems to be any 3-char string ending "as".

Search for xas or zas, for example, with or without "quotes" in the H+M online manual: https://www.helpandmanual.com/help/

351 instances of "xas"?

As I found earlier it is not true of the integral eBook version of the manual. Nor is it a function of any browser.

So something appears to be happening with 3-char strings ending "as" in the search mechanism itself.
Tim Green

Re: Integrated Zoom indexer

Unread post by Tim Green »

Andie Davidson wrote (Fri Feb 24, 2023 5:29 pm):
In fact the issue is in everything that H+M publishes with integral Zoom indexing. It seems to be any 3-char string ending "as".

Whoa -- this is truly bizarre, and I can reproduce it in our own documentation. We're going to have a look into it.
Regards,
Tim (EC Software Documentation & User Support)
