Script to fix broken topic links

This is the place to share H&M templates and utilities with other users. Topic templates can be posted as text attachments (*.txt) or in the posting itself between [code] and [/code] tags. Print manual templates (*.mnl) are digital and can only be posted as attachments. Utilities and multiple files can be posted in ZIP archives. Please include plenty of comments so that users understand what you're doing! Registration is required to access this forum.

Moderators: Alexander Halser, Tim Green

jeremym1234
Posts: 13
Joined: Tue Feb 24, 2009 9:13 pm
Location: Boise, ID

Post by jeremym1234

Attached is a Ruby script I wrote to help fix broken topic links after importing 15 help files into H&M. The files had cross-project links (over 700 of the 34,000 links were cross-project), and I wrote this script to automatically point them at the correct child project by changing each link's href and adding a domain attribute.
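For illustration, here is roughly what that change looks like on a single link. This is a minimal sketch, not code taken from the script: the helper name `add_domain`, the `<link>` markup, and the names `Project2` and `project2.hmxp` are all made up for the example. Like the script's own regex, it assumes `type="topiclink"` is immediately followed by `href="..."`.

```ruby
# Hypothetical helper: point one H&M topic link at another project by
# replacing its href and adding a domain attribute.
def add_domain( link, target_id, project_dir, project_file )
  link.sub( /type="topiclink" href="[^"]*"/ ) {
    %(type="topiclink" href="#{target_id}" domain="../#{project_dir}/#{project_file}")
  }
end

before = '<link displaytype="text" type="topiclink" href="project1_install">Install</link>'
after  = add_domain( before, "project2_install", "Project2", "project2.hmxp" )
puts after
# <link displaytype="text" type="topiclink" href="project2_install" domain="../Project2/project2.hmxp">Install</link>
```

The block form of `sub` is used so that nothing in the replacement text is treated as a backreference.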

Hopefully the script can help others in the future, or serve as a starting point for other utility scripts that manipulate your H&M XML files.
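If you'd rather not match raw text with regular expressions, Ruby's REXML library (part of the standard library before Ruby 3.0 and a bundled gem since) can parse a topic file as real XML. The sketch below lists every topic link's target and domain. It assumes topic links are `<link>` elements carrying `type="topiclink"`, `href` and an optional `domain` attribute, which matches what the script searches for; the helper name `topic_links` and the sample markup are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: return [href, domain] pairs for every topic link in an
# H&M topic XML string (domain is nil when the link stays in the same project).
def topic_links( xml )
  doc = REXML::Document.new( xml )
  links = []
  REXML::XPath.each( doc, '//link[@type="topiclink"]' ) { |el|
    links << [ el.attributes["href"], el.attributes["domain"] ]
  }
  links
end

sample = <<~XML
  <topic>
    <body>
      <para><link type="topiclink" href="project1_intro">Intro</link>
      <link type="topiclink" href="setup" domain="../Project2/project2.hmxp">Setup</link></para>
    </body>
  </topic>
XML

topic_links( sample ).each { |href, domain|
  puts "#{href} -> #{domain || '(same project)'}"
}
# project1_intro -> (same project)
# setup -> ../Project2/project2.hmxp
```

A parser won't be fooled by a `domain` attribute on some other tag, but REXML may not reproduce a file's original formatting exactly when you write it back, which is one reason plain text substitution is attractive for in-place edits.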

J.D.

# Copyright (c) 2009 Jeremy D. Mullin (http://www.jdwashere.com)
# Portions Copyright (c) 2009, iAnywhere Solutions, Inc.
#
# This software is provided "AS IS" without warranty of any kind. Use at your own risk.
#
# This script will find broken cross-project links in Help & Manual projects and fix them.
#
# I wrote this script when importing HTML Help from RoboHelp into H&M, although
# it should help when converting from any help authoring software to H&M.
#
# This script is meant to process a directory full of subfolders that each contain
# individual H&M project files that have a unique topic prefix. For example, if the
# folder "MyProjects" contained:
#
#    <MyProjectBaseDirectory>
#       <Project1>
#          projectfile.hmxp
#          <Topics>
#          <Maps>
#          <Baggage>
#       <Project2>
#          projectfile.hmxp
#          <Topics>
#          <Maps>
#          <Baggage>
#
# then the Project1 topic files, contents map, and topic links are all expected to
# have a "project1_" prefix:
#
#       project1_topic1.xml
#       project1_topic2.xml
#       etc...
#
# and so on for every project in the folder. To get this layout, follow the steps in
# the H&M Online Help. See:
#
#     More Advanced Procedures -> Working with Modular Help Systems ->
#        Managing IDs and context numbers -> Changing IDs and context numbers globally
#
# Once your projects have this layout, check them in or back them up, then run fixlinks.rb
#
# usage: ruby fixlinks.rb <directory> [-updatefiles]
#    If -updatefiles is not passed in, this utility will do a dry run, but
#    will not modify any files on disk.

require 'fileutils'

$warnings = []
$errors = []
$updatefiles = false
$fixedLinks = 0

class TopicInfo
  attr_reader :domain
  attr_reader :filename
  attr_reader :fullpath
  attr_reader :projectfile

  def initialize( projectfile, domain, filename, fullpath )
    @domain = domain
    @filename = filename
    @fullpath = fullpath
    @projectfile = projectfile
  end
end



###############################
# getTopicFiles
###############################
def getTopicFiles( projectfile, project, dirname )

  result = {}

  Dir.foreach( dirname ) {
    |entry|

    fullpath = File.join( dirname, entry )

    # only look at topic files (skips ".", "..", subfolders such as .svn, and non-XML files)
    next unless File.file?( fullpath ) && entry =~ /\.xml\z/i

    # store key without "prefix_" on it (everything up to and including the first
    # underscore), store value as a new TopicInfo instance
    key = entry.sub( /.+?_/, "" ).sub( /\.xml\z/i, "" )
    result[key] = TopicInfo.new( projectfile, project, entry, fullpath )
  }

  return result
end



###############################
# getSubdirs
###############################
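def getSubdirs( dirname )
  result = {}

  Dir.foreach( dirname ) {
    |entry|

    projectdir = File.join( dirname, entry )

    # only look at real project folders (skip ".", "..", ".svn" and plain files)
    next if entry == "." or entry == ".." or entry == ".svn"
    next unless File.directory?( projectdir )

    # Find the .hmxp file for this project. This is looked up fresh for every
    # project, so a missing file is reported instead of silently reusing the
    # previous project's file name.
    projfile = Dir.entries( projectdir ).find { |file| file =~ /\.hmxp\z/i }
    if projfile.nil?
      $warnings << "WARNING: " + entry + " does not contain a .hmxp project file\n"
      next
    end

    # create a new hash entry and put the topic filenames into this new hash
    topicsdir = File.join( projectdir, "Topics" )
    if File.directory?( topicsdir )
      result[entry] = getTopicFiles( projfile, entry, topicsdir )
    else
      $warnings << "WARNING: " + entry + " did not have a Topics subdir\n"
    end
  }

  return result

end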



###############################
# findTopic
###############################
def findTopic( subdirsHash, projectToIgnore, topic )
  result = Array.new

  # return an array of matching topic names in all projects
  # except the one passed in.
  subdirsHash.each_key {
    |project|

    if not ( project == projectToIgnore )
      if ( subdirsHash[project][topic] != nil )
        result << subdirsHash[project][topic]
      end
    end
  }

  return result
end



###############################
# resolveMultipleLinks
###############################
def resolveMultipleLinks( possibilities, project, topicFile, match )

  # make the user choose which topic link to replace with if multiple options exist
  print "\nFound multiple topic link possibilities for " + project + ": " + topicFile.filename + ": " + match + "\n"
  print "Which of the following do you want this link to point to?\n"
  possibilities.each_index {
    |topicIndex|

    menuIndex = topicIndex + 1
    print menuIndex.to_s + ")  " + possibilities[topicIndex].domain + ": " + possibilities[topicIndex].filename + "\n"
  }
  print ">"

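  # keep asking until we get a valid menu number; stop if STDIN is closed
  # (STDIN.gets returns nil at end of input, and nil.to_i is 0)
  choice = 0
  until choice >= 1 and choice <= possibilities.size
    input = STDIN.gets
    abort( "\nNo more input available; aborting." ) if input.nil?
    choice = input.to_i
    if choice < 1 or choice > possibilities.size
      print "Invalid input, please select a valid integer from the menu\n"
      print ">"
    end
  end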

  return possibilities[choice - 1]
end



###############################
# findBrokenLinks
###############################
def findBrokenLinks( subdirsHash )

  # search each project (each subdir)
  subdirsHash.each_key {
    |project|

    # search each topic file in this subdir for broken links
    subdirsHash[project].each_value {
      |topicFile|

      # read in the individual topic file; if that fails, log the error and
      # move on to the next file instead of abandoning the whole run
      begin
        lines = File.read( topicFile.fullpath )
      rescue => e
        $errors << "ERROR reading file " + topicFile.fullpath + ": " + e.message + "\n"
        next
      end

      madeChanges = false

      # Look for topic links. Check each link to see if the topic it references is
      # really in this project. If not, search all the other projects for a matching
      # topic. gsub calls the block once per link (so repeated links are each handled
      # exactly once) and uses the block's value as that link's replacement text.
      # $1 is the href value, $2 is the rest of this tag up to its closing ">".
      newlines = lines.gsub( /type="topiclink" href="([^"]+)"([^>]*)/ ) {
        match = $&
        topicref = $1
        restOfTag = $2

        # If this link already has a domain (already links specifically to another
        # project), skip it. Only the rest of this tag is checked, not the rest of the file.
        next match if restOfTag =~ /\bdomain=/

        # Remove the project prefix. If the target is in this project, this is an
        # internal project link we don't have to modify.
        topicrefNoPrefix = topicref.sub( /.+?_/, "" )
        next match unless subdirsHash[project][topicrefNoPrefix].nil?

        # Nope, look for it in other projects
        possibilities = findTopic( subdirsHash, project, topicrefNoPrefix )
        if possibilities.empty?
          $errors << "ERROR: No topic link possibilities found for " + project + ": " + topicFile.filename + ": " + match + "\n"
          next match
        end

        if possibilities.size == 1
          replacementTopic = possibilities[0]
        else
          replacementTopic = resolveMultipleLinks( possibilities, project, topicFile, match )
        end

        # Replace the topic and add a domain attrib which specifies what project to link to
        newlink = "type=\"topiclink\" href=\"" + replacementTopic.filename.sub( /\.xml\z/i, "" ) + "\"" +
                  " domain=\"../" + replacementTopic.domain + "/" + replacementTopic.projectfile + "\"" + restOfTag

        print "Replacing " + match + " with " + newlink + "\n"
        $fixedLinks += 1
        madeChanges = true
        newlink
      }

      # If we made changes (and this isn't a dry run), write out the new text
      if madeChanges and $updatefiles
        File.open( topicFile.fullpath, "w" ) { |f| f.write( newlines ) }
      end

    }  # for each topic file

  }  # for each project

end



###############################
# main body
###############################
if ARGV[0].nil? or not File.directory?( ARGV[0] )
  print "You need to pass in the directory that contains the project subdirectories.\n"
  print "usage: ruby fixlinks.rb <directory> [-updatefiles]\n"
  print "  If -updatefiles is not passed in, this utility will do a dry run, but\n"
  print "  will not modify any files on disk.\n"
  exit 1
end
$updatefiles = ( ARGV[1] == "-updatefiles" )

print "Starting\n"

# build a hash of project subdir name => hash of that project's topic files
subdirs = getSubdirs( ARGV[0] )

# process each subdir, looking for broken links
findBrokenLinks( subdirs )

# print any warnings and errors; join them first, because on Ruby 1.9 and
# later, printing an Array prints its inspect form (["..."]) instead of the text
print $warnings.join
print $errors.join

# and we're spent...
if $updatefiles
  print "\nFixed " + $fixedLinks.to_s + " links.\n"
else
  print "\nDry run: " + $fixedLinks.to_s + " links would be fixed (no files were changed).\n"
end
print "Finished with " + $warnings.size.to_s + " warnings and " + $errors.size.to_s + " errors.\n"
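Before running the script on real projects, it can help to try a dry run on a throwaway tree. This sketch builds a tiny two-project tree in a temporary folder following the layout described in the script's header comment. All the folder names, file names, and XML content are invented for the example (the `FIXTURE` hash and `build_fixture` are hypothetical); real H&M project and topic files contain far more markup.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical test data: Project1's only topic links to "install",
# which exists only in Project2.
FIXTURE = {
  "Project1" => { "project1_intro.xml"   => '<link type="topiclink" href="project1_install">Install</link>' },
  "Project2" => { "project2_install.xml" => '<para>Install steps</para>' }
}

# Create <base>/<Project>/{Topics,Maps,Baggage} plus one .hmxp file per project.
def build_fixture( base )
  FIXTURE.each { |project, topics|
    projectdir = File.join( base, project )
    %w[Topics Maps Baggage].each { |sub| FileUtils.mkdir_p( File.join( projectdir, sub ) ) }
    File.write( File.join( projectdir, project.downcase + ".hmxp" ), "<project/>" )
    topics.each { |name, body| File.write( File.join( projectdir, "Topics", name ), body ) }
  }
  base
end

dir = build_fixture( Dir.mktmpdir( "fixlinks" ) )
puts "Test projects created in " + dir
puts "Try: ruby fixlinks.rb " + dir
```

With that tree, a dry run should report a single link to fix: the `project1_install` link in Project1, since `install` only exists in Project2. The temporary folder is not deleted automatically, so remove it when you're done.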

Tim Green
Site Admin
Posts: 23156
Joined: Mon Jun 24, 2002 9:11 am
Location: Bruehl, Germany

Post by Tim Green

Hi Jeremy,

Thanks for posting this, I'm sure some users will find it useful! :)
Regards,
Tim (EC Software Documentation & User Support)

Private support:
Please do not email or PM me with private support requests -- post to the forum directly.