Tangled encodings in 4.51

Andrew Hain
Posts: 28
Joined: Wed Feb 04, 2004 4:14 pm

Post by Andrew Hain

In Help & Manual 4.1, if Export as UTF-8 is unticked to output HTML in legacy encodings, dashes and directional quotes are output as entities, which browsers display as intended in browser-based help. In 4.51 they are output as raw bytes in the range 128 to 159 (code points that Windows-1252 uses for punctuation), but the encoding (set through the %DOCCHARSET% environment variable) is written to each file as ISO-8859-1, so the characters are rendered as question marks or squares.
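The difference between the two character sets is easy to see in Python. Windows-1252 assigns printable punctuation to bytes 0x80 to 0x9F (0x93 and 0x94 are curly double quotes, 0x96 is an en dash), while ISO-8859-1 maps the same bytes to C1 control characters, which have no visible glyph. A minimal sketch, using a made-up sample string (HTML parsers in many browsers quietly treat an ISO-8859-1 label as Windows-1252, so whether the symptom appears may depend on which parser handles the page):

```python
# Hypothetical sample text with curly quotes and an en dash.
text = "\u201cHelp\u201d \u2013 topic"

# What a Windows-1252 save produces: one byte per character.
raw = text.encode("cp1252")
print(raw)  # b'\x93Help\x94 \x96 topic'

# Reading the bytes with the charset they were written in restores the text.
as_cp1252 = raw.decode("cp1252")
assert as_cp1252 == text

# Reading them with a strict ISO-8859-1 decoder, as the file's label requests,
# turns the punctuation into C1 control characters (U+0080..U+009F).
as_latin1 = raw.decode("iso-8859-1")
print([f"U+{ord(c):04X}" for c in as_latin1 if 0x80 <= ord(c) <= 0x9F])
# ['U+0093', 'U+0094', 'U+0096']
```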

If Export as UTF-8 is set, anyone using IE6 (which a significant proportion of people in our organisation still use) sees this message instead of the topic pane:
The XML page cannot be displayed.
Cannot view XML input using style sheet. Please correct the error and then click the Refresh button, or try again later.


Parameter entity must be defined before it is used. Error processing resource 'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd...

%xhtml-prefw-redecl.mod
-^
Tim Green
Site Admin
Posts: 23156
Joined: Mon Jun 24, 2002 9:11 am
Location: Bruehl, Germany

Post by Tim Green

Hi Andrew,

Thanks for posting this. I've asked the developers to get back to you on it.
Regards,
Tim (EC Software Documentation & User Support)

Alexander Halser
EC-Software Support
Posts: 4098
Joined: Mon Jun 24, 2002 7:24 pm
Location: Salzburg, Austria

Post by Alexander Halser

We cannot reproduce this. Can you elaborate on that?
Andrew Hain wrote:
"but the encoding (set through the %DOCCHARSET% environment variable) is written to each file as ISO-8859-1"
When you output UTF-8 encoded HTML, the %DOCCHARSET% encoding is "utf-8", not "iso-8859-1" (unless you have manually modified this in the HTML templates), so the encoding and the exported characters match. If you export non-UTF-8 HTML, characters are encoded as entities and the encoding is "iso-8859-1" (or similar, depending on the global project character set).
Alexander Halser
Senior Software Architect, EC Software GmbH
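The non-UTF-8 behaviour Alexander describes (characters outside the charset written as entities, file labelled iso-8859-1) can be illustrated with Python's built-in `xmlcharrefreplace` error handler. This is a sketch of the principle only: the function name is hypothetical, and the exact entity form Help & Manual writes may differ from the decimal references produced here.

```python
def encode_for_legacy_charset(text: str, charset: str = "iso-8859-1") -> bytes:
    """Write characters the charset can represent as plain bytes and replace
    every other character with a numeric character reference (&#NNNN;)."""
    return text.encode(charset, errors="xmlcharrefreplace")


# Hypothetical sample: the e-acute exists in ISO-8859-1; the dash and curly quotes do not.
sample = "caf\u00e9 \u2013 \u201cquoted\u201d"
encoded = encode_for_legacy_charset(sample)
print(encoded)  # b'caf\xe9 &#8211; &#8220;quoted&#8221;'

# No byte lands in 0x80-0x9F, so nothing depends on Windows-1252 extensions.
assert not any(0x80 <= b <= 0x9F for b in encoded)
```

Output produced this way reads the same under a strict ISO-8859-1 decoder and under Windows-1252, which is why the 4.1 behaviour was robust.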
Andrew Hain

Post by Andrew Hain

I can bring you up to date on what has happened.

The project was previously edited with H&M 4.1 with HTML output set to legacy encodings (ISO-8859-1 plus entities), and each page began with:

Code:

<?xml version="1.0"?>
...
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
Using H&M 4.51, each page begins with exactly the same lines, but the text is saved as Windows-1252 and is not rendered correctly.
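To find out which exported files are affected, one rough check is to look for raw bytes in the 0x80 to 0x9F range in files labelled ISO-8859-1. The sketch below assumes the topic files use the `.htm` extension and that the export folder is passed on the command line; both are assumptions, and the helper names are hypothetical. It only makes sense for non-UTF-8 output, because UTF-8 multi-byte sequences legitimately contain bytes in that range (the en dash is E2 80 93, for example).

```python
import sys
from pathlib import Path


def c1_offsets(data: bytes) -> list[int]:
    """Offsets of bytes 0x80-0x9F. In Windows-1252 these are punctuation such as
    dashes and curly quotes; in ISO-8859-1 they are C1 control codes."""
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


def scan(folder: str) -> None:
    # Assumption: topic files are *.htm; adjust the pattern to your output.
    for path in sorted(Path(folder).glob("*.htm")):
        hits = c1_offsets(path.read_bytes())
        if hits:
            print(f"{path.name}: {len(hits)} suspect byte(s), first at offset {hits[0]}")


if __name__ == "__main__" and len(sys.argv) > 1:
    scan(sys.argv[1])
```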

If Export UTF-8 encoded HTML is then ticked, each file begins:

Code:

<?xml version="1.0" encoding="UTF-8"?>
...
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
All characters are displayed correctly, but if IE6 users try to view the pages over HTTP behind our corporate firewall, they get the error message above. Viewing local files with IE6, or using IE7 or Firefox, causes no problems. This was also reported with H&M 4.1, and we worked around it by using legacy encodings. This is probably the most serious issue for us, and I am now wondering whether it is really a bug in Help & Manual or some sort of network configuration issue.
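The IE6 message comes from its XML viewer, which suggests the page was being processed as XML rather than as HTML. One way to test the network-configuration theory, assuming IE6 chooses how to process the page partly from the HTTP `Content-Type` header, is to compare the header the web server sends with the one that arrives through the corporate proxy. The helper names are hypothetical, and the set of XML media types is illustrative rather than a complete account of IE6's behaviour.

```python
from urllib.request import Request, urlopen

# Media types that generally make a browser treat the response as XML.
XML_MEDIA_TYPES = {"text/xml", "application/xml", "application/xhtml+xml"}


def media_type(content_type: str) -> str:
    """Drop parameters such as '; charset=UTF-8' and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()


def served_as_xml(content_type: str) -> bool:
    return media_type(content_type) in XML_MEDIA_TYPES


def fetch_content_type(url: str) -> str:
    """Send a HEAD request and return the Content-Type header ('' if absent).
    Run it once inside the firewall and once outside to compare."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
        return resp.headers.get("Content-Type", "")
```

If `served_as_xml(fetch_content_type(url))` is true only from inside the firewall (for whatever topic URL you test), that would point at the proxy rather than at the exported files.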

If I then revert to legacy encodings, the files begin:

Code:

<?xml version="1.0" encoding="Windows-1252"?>
...
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
This renders all characters correctly but also has the IE6 issues described above.
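This last header declares two different encodings: `Windows-1252` in the XML declaration and `ISO-8859-1` in the meta tag. Roughly speaking, an XML parser honours the former and an HTML parser the latter (an HTTP charset parameter, if present, overrides both), so the same file can decode differently depending on which parser handles it. A small consistency check along these lines may help when comparing builds; the helper names are hypothetical, and the regular expressions are a rough sketch that assumes the header layout shown above and no byte-order mark, not a general HTML parser.

```python
import re
from typing import Optional

XML_DECL = re.compile(rb"<\?xml\b[^>]*\?>")
ENCODING_ATTR = re.compile(rb"""encoding\s*=\s*["']([^"']+)["']""")
META_CHARSET = re.compile(rb"<meta\b[^>]*?charset=([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def xml_encoding(data: bytes) -> Optional[str]:
    """Encoding named by the XML declaration; None if there is no declaration.
    A declaration without encoding= means UTF-8 when there is no byte-order mark."""
    decl = XML_DECL.match(data)
    if decl is None:
        return None
    enc = ENCODING_ATTR.search(decl.group(0))
    return enc.group(1).decode("ascii").lower() if enc else "utf-8"


def meta_charset(data: bytes) -> Optional[str]:
    """Charset named in the first <meta ... charset=...> tag, if any."""
    m = META_CHARSET.search(data)
    return m.group(1).decode("ascii").lower() if m else None


def labels_agree(data: bytes) -> bool:
    xml_enc, meta_enc = xml_encoding(data), meta_charset(data)
    return xml_enc is None or meta_enc is None or xml_enc == meta_enc


# Header from the 4.51 legacy-encoding export (the elided middle is omitted).
page = (b'<?xml version="1.0" encoding="Windows-1252"?>\n'
        b'<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />')
print(xml_encoding(page), meta_charset(page), labels_agree(page))
# windows-1252 iso-8859-1 False
```

Applied to the 4.1 header (`<?xml version="1.0"?>` plus an ISO-8859-1 meta tag), the same check also reports a mismatch, because an XML declaration without an encoding defaults to UTF-8; that is harmless only while the file is pure ASCII.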