Search in PDFs doesn't work (wrong encoding?)

Anna Ryut
Posts: 23
Joined: Thu Feb 02, 2017 10:02 am

Hello!

Recently we discovered that search in our PDFs doesn't work for the Russian language.

For example, if I open the PDF file in Adobe Acrobat Reader and search for the word "Файл", the message "Nothing was found" appears, although this word occurs often in the file (see pic. 1). The search works if we open the same PDF file in a browser (but our customers can't use that). English words that also appear in the Russian documentation are found without any problems.

If I try to copy some words from the PDF file, I get unreadable symbols instead (see pic. 2). That's why I suspect that the Russian encoding is wrong.

I've read your documentation about languages and checked all settings we have. The project settings are the following:
  • Language of the help file: Russian (Russia)
  • Font charset: RUSSIAN_CHARSET
The font settings in the Russian style repository are "Montserrat, Regular, Script - Cyrillic".

I've also checked the region and language settings on my PC: they are set to Russian everywhere as well.

I thought this might be happening because I use one style repository for both the English and Russian projects, with all settings made for English. So I also created a separate Russian style repository, but that doesn't help either.

My Help & Manual version is 7.5.4 Build 4760.
1.png
2.png
Tim Green
Site Admin
Posts: 22668
Joined: Mon Jun 24, 2002 9:11 am
Location: Bruehl, Germany

Hi Anna,

This means that the text in your PDF is being generated as glyphs (one small graphic for each character) instead of text. You will also find that the PDF file is very large. You need to check the Font Embedding section in Configuration > Publishing > PDF.

For the first check, just make sure that "Export all text as glyphs" is NOT on, and that NO fonts are excluded from embedding (you must embed ALL fonts for Russian). If this does not work it means that there are characters in your text that are not included in the fonts you are using. Windows then simply replaces the text with glyphs.

Before trying to change out your fonts in your entire project, first set the font embedding mode to CID, and if that doesn't work to Type3. Type3 will always work, but the text will not be quite so clear at higher zoom settings. If CID works for you then you will get the best quality with that.
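
A quick way to judge each embedding mode is to copy a sentence from the generated PDF and check whether it still contains Cyrillic letters. The following is only a minimal Python sketch of that check and is not part of Help & Manual. The names `cyrillic_ratio` and `looks_like_russian` and the 0.5 threshold are assumptions made for illustration, and getting the text out of the PDF (for example by copy and paste from the reader) is left to you.

```python
def cyrillic_ratio(text: str) -> float:
    """Return the share of letters in `text` that fall in the Cyrillic block U+0400..U+04FF."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        # No letters at all (empty string or only symbols): treat as "no Cyrillic".
        return 0.0
    cyrillic = [ch for ch in letters if "\u0400" <= ch <= "\u04FF"]
    return len(cyrillic) / len(letters)


def looks_like_russian(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical helper: True if at least `threshold` of the letters are Cyrillic."""
    return cyrillic_ratio(text) >= threshold


if __name__ == "__main__":
    # Paste a sample copied from the PDF here.
    sample = "Файл"
    print(looks_like_russian(sample))
```

If the copied sample comes back as symbols or Latin letters with accents, the function returns False, which suggests that the text layer of the PDF is still not usable for search.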

In addition to this you should also check your PDF reference printer driver. HM uses a printer driver to generate PDFs, and the choice of driver can have a big effect on quality. You can access the reference printer driver settings in View > Program Options > PDF. By default, HM will use the screen device driver, but you can often get better results by using a real printer driver. By the same token, some printer drivers can also cause problems, particularly optimized drivers from printer manufacturers. It is generally better to use one of the standard drivers supplied with Windows.

If the driver of your own installed printer doesn't work well you can activate and select one of the known good standard drivers included with Windows. You don't have to actually have the physical printer. Just add the driver with Add Printer in the printer section of the Windows Control Panel and then select it as your reference driver in Help & Manual. The standard Windows Brother HL-2040 and HL-2060 drivers deliver good results, and standard LaserJets and DeskJets are usually also OK. You can also use the Microsoft XPS Document Writer driver, which is always installed in all current Windows versions. However, this driver has the restriction that it can't be set to the higher output resolutions supported by more recent printer drivers.

Windows 10 no longer provides direct access to the list of standard drivers. To access it you must do this:

1: Select Add Printer, then click on "The printer I want isn't listed".
2: Select "Add a local printer or network printer with manual settings".
3: Accept the suggested port (this is irrelevant).
4: Click on "Windows Update" and wait a couple of minutes for the list to display.

The last step can really take a few minutes. Go and make a cup of coffee... ;-)

After this you will be able to select a driver from the list.

Important Notes:

DON'T use "PDF printer drivers" like Adobe Distiller for this! This will not work properly and is actually counter-productive.

The output resolution and default paper size are set in the settings for the printer driver in the Windows Control Panel. The default page size set for the printer driver must be at least as large as the page size you have set in your print manual template. If it is smaller you will get clipping in your PDF pages.
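
The page size rule above amounts to a simple comparison of both dimensions. This is only an illustrative Python sketch, not something Help & Manual provides. The function name `driver_page_fits` is hypothetical, sizes are given in millimetres, and both pages are assumed to have the same orientation.

```python
from typing import Tuple

Size = Tuple[float, float]  # (width, height) in millimetres

# Standard paper sizes used for the example.
A4: Size = (210.0, 297.0)
LETTER: Size = (215.9, 279.4)


def driver_page_fits(driver_page: Size, template_page: Size) -> bool:
    """True if the driver's default page is at least as large as the template page.

    Both width and height must be large enough, otherwise the PDF pages
    would be clipped in the smaller dimension.
    """
    driver_w, driver_h = driver_page
    template_w, template_h = template_page
    return driver_w >= template_w and driver_h >= template_h


if __name__ == "__main__":
    print(driver_page_fits(A4, A4))      # same size
    print(driver_page_fits(LETTER, A4))  # Letter is wider but shorter than A4
```

A Letter default with an A4 template fails the check because Letter is shorter than A4, even though it is slightly wider.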
Regards,
Tim (EC Software Documentation & User Support)

Private support:
Please do not email or PM me with private support requests -- post to the forum directly.
Anna Ryut

Thanks Tim!

The CID mode helped and I also added printer settings.
The search now works.
Anna Ryut

Dear Tim!

Search now works fine with the CID setting, but when I open the files in Google Chrome, the title of the PDF is shown as unreadable symbols (pic. 11).
11.png
As far as I understand, in my case, it takes the title from the project settings
21.png
Does it mean that CID mode settings are not applied to this configuration?
Tim Green

Hi Anna,

I just double-checked this with a Russian project in both Help+Manual 7 and 8 and it's working fine. My guess is that you may have copied and pasted this text from a source with different font encoding. Try deleting it and replacing it with the same word typed directly in the project editor and copied from there. Does that help?
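
To illustrate the general effect of an encoding mismatch, here is a small Python sketch. It does not show what Help & Manual or Chrome do internally, and it is only an assumption that something similar is behind the garbled title. The codec pair (Windows-1251 written, Latin-1 read) and the function names `misdecode` and `repair` are chosen purely for illustration.

```python
def misdecode(text: str, written_as: str = "cp1251", read_as: str = "latin-1") -> str:
    """Encode `text` with one codec and decode the bytes with another.

    This reproduces the typical "unreadable symbols" seen when text is
    stored in one encoding and interpreted in a different one.
    """
    return text.encode(written_as).decode(read_as)


def repair(garbled: str, written_as: str = "cp1251", read_as: str = "latin-1") -> str:
    """Reverse `misdecode`, assuming the two codecs are known."""
    return garbled.encode(read_as).decode(written_as)


if __name__ == "__main__":
    garbled = misdecode("Файл")
    print(garbled)          # Ôàéë
    print(repair(garbled))  # Файл
```

Text typed directly in the editor avoids this kind of mismatch because it never passes through a second encoding on the way in.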

If you continue to have this problem please mail a small demo project that reproduces the issue to support AT ec-software.com (replace the AT with @) and we'll check it for you. You can create this project as follows:

1: Select Save As... in the File menu and save the project in the single-file HMXZ format (first option, single file storage).
2: Delete any topics not needed for the demo.
3: Go to Configuration > Common Properties > Miscellaneous. Turn off "Automatically create history files" and delete the history in the project with the "Purge xx history files" button. Then save again.
4: Perform a test to make sure you can still reproduce the issue with the demo project.
5: Send us the resulting .hmxz file and the .mnl PDF template file you are using.
Regards,
Tim (EC Software Documentation & User Support)
