prevent search engines from indexing my help?

Please post all questions relating to Help & Manual 6 here!

Kevin Killion
Posts: 38
Joined: Fri Jun 15, 2012 9:44 pm

prevent search engines from indexing my help?

I do not wish to make it too easy for non-clients to see my beautiful new HTML Help pages.

How do I configure H&M so that my help pages PREVENT Google and other search engines from indexing them?

Thanks,
Kevin
Hendrik
Posts: 53
Joined: Wed Aug 21, 2002 2:55 pm
Location: Belgium

Re: prevent search engines from indexing my help?

For a start: put a robots.txt file into the root of your site.
http://en.wikipedia.org/wiki/Robots_exclusion_standard
http://www.mcanerin.com/en/search-engine/robots-txt.asp

Others may have better ideas. ;-)
Kevin Killion
Posts: 38
Joined: Fri Jun 15, 2012 9:44 pm

Re: prevent search engines from indexing my help?

Yes, adding robots.txt would work fine if I'm creating a website on my own.

But I use a task in H&M to create six versions of my help, in six folders on the server. So the robots.txt would have to be in all six. The problem is that H&M requires you to empty a destination folder before running Publish, so that no stray files are left over. Thus, every time you run Publish, you would have to manually drop robots.txt into each of the destination folders (locally, if I upload them to the server afterwards, or on the server, if I upload them first).

Ideally, H&M would have some way to include robots.txt into the output as part of the tasks in Publish. And it would be extra nice if H&M took care of the FTP upload as part of the Publish tasks.
Simon Dismore
Posts: 454
Joined: Thu Nov 16, 2006 1:29 pm
Location: London, UK

Re: prevent search engines from indexing my help?

Kevin Killion wrote: Ideally, H&M would have some way to include robots.txt into the output as part of the tasks in Publish.
Have you tried adding robots.txt as a baggage file?
Kevin Killion
Posts: 38
Joined: Fri Jun 15, 2012 9:44 pm

Re: prevent search engines from indexing my help?

The baggage file idea is a good one -- I had never read about baggage files before.

BUT ... in order to use this I'd have to create a separate domain for my help files, since there can be only one robots.txt file, at least according to this, found at the site that Hendrik recommended earlier:
You can ONLY have one robots.txt on your site and ONLY in the root directory (where your home page is):
OK: http://www.yourdomain.com/robots.txt
BAD - Won't work: http://www.yourdomain.com/subdirectory/robots.txt
If I put this at the root, my entire company site would be skipped by Google.

Hmmmm. Is my only option to do just that, set up a separate domain for my help files?

Thanks.
Simon Dismore
Posts: 454
Joined: Thu Nov 16, 2006 1:29 pm
Location: London, UK

Re: prevent search engines from indexing my help?

If you look at the examples in Hendrik's Wikipedia link I think you'll find that the robots.txt file at the root of your domain can ask search bots to exclude specific subdirectories.
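
To illustrate the point, here is a minimal sketch that checks a root-level robots.txt with Python's standard urllib.robotparser module. The /help/en/ and /help/de/ paths and the is_allowed helper are assumptions made up for this example, not anything H&M generates; substitute the folders your help is actually published to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served from the site root. The /help/... paths
# are placeholders for wherever the published help folders live.
ROBOTS_TXT = """\
User-agent: *
Disallow: /help/en/
Disallow: /help/de/
"""


def is_allowed(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The rest of the company site stays crawlable ...
    print(is_allowed("http://www.yourdomain.com/index.html"))
    # ... while pages under the listed help folders are excluded.
    print(is_allowed("http://www.yourdomain.com/help/en/intro.htm"))
```

With rules like these, only the listed subdirectories are excluded, so a single robots.txt at the root does not hide the whole site.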

It's none of my business (and anyway I'm not an expert in SEO) but you might attract more potential users to your site if you allow your help files to be indexed by search engines, especially if other reputable sites can link to your material.
Tim Green
Site Admin
Posts: 23189
Joined: Mon Jun 24, 2002 9:11 am
Location: Bruehl, Germany

Re: prevent search engines from indexing my help?

Hi Kevin,

As Simon pointed out, you can craft your robots.txt to include/exclude specific directories on your site. In addition to this, you can turn the auto-reload function for topic files ON, because this seems to prevent Google from indexing your site as well. In your project (if you are not using a skin), or in your skin (if you are) go to Configuration > Publishing Options > WebHelp > Navigation and switch on the option for reloading the full UI if the topic is loaded without the navigation frame.

You can also add a robots meta tag to your HTML templates, telling search bots not to index your pages. Add this line:

Code:

<meta name="robots" content="noindex, nofollow" />

to all of the following templates:
  • HTML Page Templates > Default
  • Layout
  • Table of Contents
  • Keyword Index
  • Full Text Search
Add it on a line of its own together with the other <meta> tags in the <head> section of the template. Again, in your project if you are not using a skin and in your skin if you are.
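
One way to confirm that the tag actually made it into the published pages is to scan the output HTML after running Publish. The following is only a sketch using Python's standard html.parser module; the class and function names are invented for this example and nothing here is part of H&M.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
    print(has_noindex(page))
```

To check a whole output folder, read each published .htm or .html file and pass its text to has_noindex.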

Warning: Like traffic regulations in Bombay, robots directives are really only vague suggestions. Reputable search engines like Google and Bing will honor them; almost all of the others will just ignore them. 8)
Regards,
Tim (EC Software Documentation & User Support)

Kevin Killion
Posts: 38
Joined: Fri Jun 15, 2012 9:44 pm

Re: prevent search engines from indexing my help?

Great, perfect! Thanks, Tim!