TreeDec Program
1. Overview
TreeDec reads an input file that specifies a hierarchical structure
for a website, and decorates the website's HTML files with navigation
aids based on that structure. A set of links is added to each page,
allowing the website user to navigate to other related pages nearby in
the tree. Specifically, TreeDec can generate three sets of related
links:
- An ancestor path, which shows the context of the current page;
- Siblings, which show related pages;
- Immediate children, which show details of the current page.
TreeDec deletes earlier TreeDec decorations, so it can be run against
either an undecorated or a previously decorated website. Note that
TreeDec uses relative URLs for the generated links, so that the website
can be moved as a whole.
1.1 Parent Directory
TreeDec assumes the existence of a parent directory. This directory:
- must contain, directly or in its subtrees, all the files to be
  decorated;
- is where TreeDec should be executed from;
- is the location relative to which the pathnames in td-tree.dat are
  defined;
- is where the table of contents will be written, and the place from
  which its links are valid.
1.2 TreeDec Logic
Here is an outline of the logic of TreeDec:
- First, sets configuration options: opens td-config.dat and reads and
  parses its contents.
- Reads in and parses the td-tree.dat file, setting up an internal tree
  structure so TreeDec knows how to generate the navigation tables.
- While reading td-tree.dat, generates the table-of-contents file.
- Goes through the internal tree structure (every node represents a
  file) and, for each node, reads in the existing file and generates a
  corresponding new file (in a temporary directory) with the tree
  decorations: links to ancestors, siblings, and children.
- Depending on rewrite_option, renames the new files to the original
  names, thus replacing the original files.
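The last two steps (write each decorated page into a temporary tree, then rename the copies over the originals) can be sketched in Python as below. This is not TreeDec's Perl code: the function names and the `root` argument are hypothetical, filenames are assumed to be td-tree.dat names relative to the parent directory, and the temporary tree is assumed to be on the same file system as the website so that a plain rename suffices.

```python
import os
import posixpath


def temp_path(relpath, tempfile_dir="../tmpweb", root="."):
    """Map a td-tree.dat filename ("/"-separated) into the temporary tree."""
    parts = posixpath.normpath(relpath).split("/")
    return os.path.join(root, tempfile_dir, *parts)


def write_decorated(relpath, new_text, tempfile_dir="../tmpweb", root="."):
    """Write the decorated version of one page, creating directories as needed."""
    dest = temp_path(relpath, tempfile_dir, root)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    with open(dest, "w", encoding="utf-8") as out:
        out.write(new_text)
    return dest


def commit(relpaths, tempfile_dir="../tmpweb", root="."):
    """Rename each temporary copy over its original (the original is replaced)."""
    for rel in relpaths:
        original = os.path.join(root, *posixpath.normpath(rel).split("/"))
        os.replace(temp_path(rel, tempfile_dir, root), original)
```

Writing everything first and renaming only afterwards means a run that fails halfway leaves the live website untouched.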
1.3 TreeDec Customization
The configuration file gives the user some control over the
decorations added to the webpages (e.g. you can choose the background
color). Nonetheless, some users may need complete control of the
generated HTML. To get it, you must modify the Perl code. The comment
line:
### ### ### INSERT DECORATIONS HERE ### ### ###
indicates where the new HTML is generated.
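For a rough idea of the kind of HTML generated at that point, here is a Python sketch of an ancestor-path block built from the ancestor_bg and ancestor_sep settings described in section 3.1. The table layout and the function name are assumptions for illustration, not TreeDec's actual output. The separator is inserted as raw HTML (so it may be an image tag), while titles and URLs are escaped.

```python
import html


def ancestor_block(links, bg="#ffaaaa", sep=" -> "):
    """Build one navigation row (hypothetical layout).

    links: list of (relative_url, title) pairs, from the root down.
    bg:    value passed to the BGCOLOR attribute (cf. ancestor_bg).
    sep:   raw HTML placed between links (cf. ancestor_sep); not escaped.
    """
    anchors = [
        f'<a href="{html.escape(url, quote=True)}">{html.escape(title)}</a>'
        for url, title in links
    ]
    row = sep.join(anchors)
    color = html.escape(bg, quote=True)
    return f'<table width="100%"><tr><td bgcolor="{color}">{row}</td></tr></table>'
```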
2. Parameters
TreeDec accepts two optional parameters:
- "undo" tells TreeDec to rewrite the website files simply by removing
  all previous decorations. No new decorations are added. This
  parameter overrides the settings of do_ancestors, do_siblings, and
  do_children.
- "dir=directory" tells TreeDec to read its input (td-config.dat and
  td-tree.dat) from, and write its message file to, the named
  directory. The default directory is "treedec". Note that the table of
  contents is always written to the current (parent) directory, since
  its relative URLs would be invalid from any other location.
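The two parameters can be interpreted roughly as in the Python sketch below. The function names and the returned dictionary are hypothetical, and rejecting unrecognized arguments is an assumption (this page does not say what TreeDec does with them). The second function shows "undo" overriding the three do_* settings, which are assumed to hold the strings "0" or "1" read from td-config.dat.

```python
import os


def parse_args(argv):
    """Interpret TreeDec's optional "undo" and "dir=directory" parameters."""
    undo = False
    input_dir = "treedec"  # documented default
    for arg in argv:
        if arg == "undo":
            undo = True
        elif arg.startswith("dir="):
            input_dir = arg[len("dir="):]
        else:
            raise ValueError(f"unrecognized parameter: {arg!r}")
    # Only the inputs come from input_dir; the TOC stays in the parent directory.
    return {
        "undo": undo,
        "config": os.path.join(input_dir, "td-config.dat"),
        "tree": os.path.join(input_dir, "td-tree.dat"),
    }


def effective_flags(settings, undo):
    """"undo" overrides do_ancestors, do_siblings and do_children."""
    keys = ("do_ancestors", "do_siblings", "do_children")
    if undo:
        return {k: False for k in keys}
    return {k: settings.get(k, "1") == "1" for k in keys}
```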
3. Inputs
There are two input files, a configuration file named td-config.dat
and a tree file named td-tree.dat. The configuration file controls
the operation of TreeDec; the tree file describes the tree structure
to be imposed on the files of the website. Within the two input
files, a leading "#" indicates a comment record.
3.1 Configuration file
The configuration file, named td-config.dat, controls
how TreeDec goes about its
decoration chores. Each record in the file is of the form
keyword=value
and sets an independent option. There must be no space between the
keyword and the "="; any spaces after the "=" become part of the value.
The keywords are composed of letters, numbers, and underscores. A
non-quoted value extends from the "=" to the last non-blank. A quoted
value extends from the first to the last quotation mark (the quotation
marks themselves are not part of the value). Thus, in these lines,
color1=   light gray
color2="   light gray  "
writing "." for each space character, color1 has the value
"...light.gray" and color2 has the value "...light.gray..".
Generally speaking, you need explicit quotes if and only if the
value must have trailing whitespace.
If you specify conflicting parameters,
TreeDec will believe the last thing you tell it.
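The record rules above can be sketched in Python as follows. This is an illustration, not TreeDec's Perl parser. It assumes a value counts as quoted only when the quotation mark immediately follows the "=", that blank lines may be ignored, and that a malformed record is an error; the function name is hypothetical.

```python
import re

# Keyword: letters, digits and underscores, immediately followed by "=".
RECORD_RE = re.compile(r"([A-Za-z0-9_]+)=(.*)")


def parse_config(text):
    """Parse td-config.dat text into a dict; later records override earlier ones."""
    settings = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if raw.startswith("#") or not raw.strip():
            continue  # comment record (blank lines assumed ignorable)
        m = RECORD_RE.fullmatch(raw)
        if m is None:
            raise ValueError(f"line {lineno}: not of the form keyword=value")
        key, rest = m.groups()
        last_quote = rest.rfind('"')
        if rest.startswith('"') and last_quote > 0:
            value = rest[1:last_quote]  # quoted: between first and last quote
        else:
            value = rest.rstrip()  # unquoted: from "=" to the last non-blank
        settings[key] = value  # the last setting wins
    return settings
```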
Here is an explanation of all the legal keywords and values, and
the default settings.
- do_ancestors, do_siblings, do_children
  Control whether the context (ancestor path), related (siblings), and
  details (children) blocks are generated. Each value must be 0
  (suppress) or 1 (generate). Default = 1.
- do_toc
  Controls whether the table of contents (TOC) is used as the implicit
  root of the tree when constructing ancestor paths. If so, the TOC is
  treated as the parent of any zero-level entries (no preceding tabs)
  in the tree file. Value must be 0 (TOC not in ancestor path) or 1
  (use TOC as root). Default = 1.
- do_skip
  Controls whether text is inserted that allows skipping over the
  navigation table. This is useful primarily for people using audible
  screen readers. Value must be 0 (no skipping text inserted) or 1
  (insert text for skipping). Default = 0.
- ancestor_bg, sibling_bg, children_bg
  These control the background colors of the context (ancestor path),
  related (siblings), and details (children) blocks. The value is
  passed on to a BGCOLOR attribute in HTML, so it should be a legal
  HTML color. Defaults are:
  ancestor_bg | "#ffaaaa" (light pinkish-red)
  sibling_bg  | "#ffff66" (light yellow)
  children_bg | "#aaaaff" (light blue)
- ancestor_sep
  This controls the visible mark that separates links within the
  ancestor path. Note that this is inserted into an HTML file, so be
  careful with greater-than and less-than signs. The default is " -> ",
  which is displayed in HTML as a space, a dash, a greater-than sign,
  and a space (so it looks like a crude arrow pointing to the right).
  You can also insert an image; for example:
  "<img src=/home/image/arrow.gif>"
- insert_string, Insert_String
  The character string within the existing HTML files that marks the
  place where the navigation aids will be inserted. "Insert_String"
  tells TreeDec to perform a case-sensitive search for the designated
  string; "insert_string" makes the search case-insensitive. TreeDec
  inserts the new material immediately before or after (see
  insert_where) the first occurrence of this string within a file. If
  the string occurs in the middle of a line, the line is split to allow
  insertion.
  By default, TreeDec uses insert_tag (see below), not insert_string,
  to find the insertion point.
- insert_tag
  The HTML tag within the existing HTML files that marks the place
  where the navigation aids will be inserted. The default is "body".
  TreeDec inserts the new material immediately before or after the
  first occurrence of this tag within a file. If the tag occurs in the
  middle of a line, the line is split to allow insertion.
  Several points to be aware of:
  - The search is always case-insensitive, since case does not matter
    for HTML tags.
  - Enter only the name of the tag, not the angle brackets.
  - A closing tag (e.g. "/head") is OK.
  - The search is more complex than for a simple string, because
    additional parameters in the tag are accounted for. E.g. if you
    specify "insert_tag=body", TreeDec inserts at "<body>" or
    "<BODY bgcolor=red>".
  - The entire tag must be on a single line of the file in order to be
    found.
- insert_where
  This determines whether insertion takes place before (value="b") or
  after (value="a") the insertion string or tag. The default is "a".
- tempfile_dir
  TreeDec generates temporary files before possibly overwriting the
  existing files. This parameter specifies the directory in which these
  files are written. The default is "../tmpweb"; e.g. the new version
  of a/b/xyz.html is written to ../tmpweb/a/b/xyz.html. Likewise, if
  tempfile_dir has a value of "mirror", the file is written to
  mirror/a/b/xyz.html. If a needed temporary directory does not exist,
  TreeDec creates it automatically.
- rewrite_option
  A three-letter code controlling the conditions under which TreeDec
  overwrites existing HTML files, and the disposition of the temporary
  directory tree. The first letter controls what happens if no errors
  are detected during page generation; the second, what happens if at
  least one error is detected. The valid codes for the first two
  letters are:
  r | rewrite
  a | ask the operator whether to rewrite
  n | do nothing
  q | quit as soon as an error is found
  "q" is allowed only as the second letter.
  The third letter is "s" (save) or "d" (delete), and indicates what is
  to be done to the temporary directory tree after successful
  overwriting has occurred (all the temporary copies have been renamed
  into the "real" website tree). If the temporary directory is deleted,
  any other files in it (not created by TreeDec) are also deleted. If
  overwriting has not occurred, the temporary directory tree is left
  intact.
  The default is "ans": if no errors are detected, ask the operator
  whether to rewrite; otherwise do not rewrite; and save the temporary
  directories even if overwriting occurs.
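The insertion-point rules given above for insert_string, Insert_String, insert_tag, and insert_where can be sketched in Python as follows. This is not TreeDec's Perl code; the function names are hypothetical, and a regular expression stands in for whatever matching TreeDec really does. Searching one line at a time reproduces the rule that a tag split across lines is not found.

```python
import re


def tag_pattern(tag):
    """insert_tag: always case-insensitive; attributes after the name are allowed.

    "body" matches <body> and <BODY bgcolor=red>; "/head" matches </head>.
    """
    return re.compile("<" + re.escape(tag) + r"(?:\s[^>]*)?>", re.IGNORECASE)


def string_pattern(s, case_sensitive):
    """Insert_String (case_sensitive=True) or insert_string (False)."""
    return re.compile(re.escape(s), 0 if case_sensitive else re.IGNORECASE)


def insert_at_first_match(lines, pattern, decoration, where="a"):
    """Insert decoration before ("b") or after ("a") the first match.

    A match in mid-line splits that line. Returns None if nothing matches.
    """
    for i, line in enumerate(lines):
        m = pattern.search(line)
        if m is None:
            continue
        cut = m.end() if where == "a" else m.start()
        pieces = [line[:cut], decoration, line[cut:]]
        # Drop empty halves so a match at either end of a line adds no blank line.
        return lines[:i] + [p for p in pieces if p] + lines[i + 1:]
    return None
```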
Example
A td-config.dat might look like this:
# Configuration file example here
ancestor_bg=#ff44ff
sibling_bg=#9999ff
# the following should produce a space, two successive greater-than
# signs, and then another space.
ancestor_sep=" >> "
# insert decorations right after the head segment of the HTML file.
insert_where=a
insert_tag=/head
# if no errors go ahead and rewrite, otherwise ask operator;
# delete temporary files if rewriting takes place
rewrite_option=rad
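How a rewrite_option code such as "rad" or the default "ans" is interpreted can be sketched in Python as below. The function names and the `ask_operator` callback are hypothetical; the sketch only makes the decisions, and the actual quitting, renaming, and deleting are left to the caller.

```python
def parse_rewrite_option(code):
    """Split a rewrite_option code such as "ans" into its three letters."""
    if (
        len(code) != 3
        or code[0] not in "ran"
        or code[1] not in "ranq"
        or code[2] not in "sd"
    ):
        raise ValueError(f"invalid rewrite_option: {code!r}")
    return code[0], code[1], code[2]


def quit_on_error(code):
    """Second letter "q": stop generating pages at the first error."""
    return parse_rewrite_option(code)[1] == "q"


def should_rewrite(code, errors_found, ask_operator):
    """Decide whether to rename the temporary files over the originals."""
    if_ok, if_errors, _ = parse_rewrite_option(code)
    letter = if_errors if errors_found else if_ok
    if letter == "r":
        return True
    if letter == "a":
        return ask_operator()
    return False  # "n", or "q" (generation has already stopped)


def delete_temp_tree(code, rewrote):
    """Third letter "d" deletes the temporary tree, but only after a rewrite."""
    return rewrote and parse_rewrite_option(code)[2] == "d"
```

With the example above ("rad"), a clean run rewrites without asking and then deletes the temporary tree; a run with errors asks the operator first.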
3.2 Tree File
The tree file, conventionally named td-tree.dat,
specifies the logical tree to be implemented.
3.2.1 Record Format
Each record of the tree file consists of some indentation implicitly
showing its position in the tree, the filename of the page to be
altered, some whitespace, and the title to be used in the decorations:
indentation filename whitespace title
3.2.2 Indentation field
The indentation field consists of zero or more tab characters
so that the filename is indented to express hierarchical relationships in
the conventional way. Each tab represents another level down
within the tree. A record must not skip down more than one level
beyond the previous record, since this cannot be interpreted as normal
tree structure. Thus, the first record must be a "zero-level"
record (no tabs).
It is valid to have several zero-level records in a file. You can
think of this as representing several independent trees, or as
one big tree with the table of contents as the root
(see do_toc parameter).
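The indentation rules can be sketched in Python as a stack-based parser: the stack holds the most recent node at each level, so a record at level k becomes a child of the last record at level k-1. This is an illustration, not TreeDec's Perl code; `Node`, `parse_tree`, `ancestors`, and the default TOC label are hypothetical names. The ancestors function shows the effect of do_toc, with the TOC acting as parent of the zero-level records.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    filename: str
    title: str
    level: int
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list, repr=False, compare=False)


def parse_tree(text):
    """Build a forest from td-tree.dat text; returns the zero-level nodes."""
    roots = []
    stack = []  # stack[k] is the most recent node seen at level k
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if raw.startswith("#") or not raw.strip():
            continue  # comment record or blank line
        level = len(raw) - len(raw.lstrip("\t"))
        if level > len(stack):
            raise ValueError(f"line {lineno}: skips down more than one level")
        body = raw[level:]
        if body[:1].isspace():
            raise ValueError(f"line {lineno}: filename must follow the tabs directly")
        parts = body.split(None, 1)
        title = parts[1].strip() if len(parts) > 1 else ""
        del stack[level:]
        parent = stack[-1] if stack else None
        node = Node(parts[0], title, level, parent)
        (parent.children if parent else roots).append(node)
        stack.append(node)
    return roots


def ancestors(node, do_toc=True, toc_title="Table of Contents"):
    """Titles from the root down to (not including) node; do_toc prepends the TOC."""
    path = []
    p = node.parent
    while p is not None:
        path.append(p.title)
        p = p.parent
    path.reverse()
    return ([toc_title] if do_toc else []) + path
```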
3.2.3 Filename field
The filename must immediately follow the indentation (no intervening
spaces) and must not itself contain spaces. Always use "/" as the
separator between directory levels, even on Windows systems, where "\"
is the convention. Perl takes the operating system into account, so
that the same source code can run on various systems without
alteration.
Filenames must be specified relative to the parent directory. Absolute pathnames are not
allowed. For example, "mammal/feline.html" is OK, but not
"/home/zoosite/webfiles/mammal/feline.html". The directory structure
does not have to reflect the logical tree structure in any way; the
only constraint is that all the files must be within the subtree of
the parent directory.
Note that filenames are used for two purposes: first, to tell TreeDec
which files to rewrite, and second, to compute relative URLs between
files.
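Computing such a relative URL can be sketched with Python's posixpath module, which always uses "/" as the separator, matching the td-tree.dat convention on every platform. This is an illustration rather than TreeDec's Perl code, and the function name is hypothetical.

```python
import posixpath


def relative_url(from_file, to_file):
    """Relative URL for a link placed on from_file that points at to_file.

    Both names are td-tree.dat filenames: "/"-separated and relative to
    the parent directory.
    """
    from_dir = posixpath.dirname(posixpath.normpath(from_file)) or "."
    return posixpath.relpath(posixpath.normpath(to_file), start=from_dir)
```

For example, a link on mammal/lions.html to bird/canary.html becomes "../bird/canary.html", which stays valid wherever the whole website is moved.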
3.2.4 Title field
Finally, the title is taken as "everything else" (the remaining words)
in the record. In the special case where the title field of the
record is "*", TreeDec attempts to retrieve the title from the <title>
element of the HTML file instead of taking it explicitly from
td-tree.dat.
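The "*" convention can be sketched in Python as below. A regular expression is a simplification of real HTML parsing, collapsing internal whitespace is an assumption, and returning None when no <title> element is found leaves the fallback to the caller, since this page does not say what TreeDec does in that case. The function name is hypothetical.

```python
import re

TITLE_RE = re.compile(r"<title(?:\s[^>]*)?>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def resolve_title(title_field, html_text):
    """Return the title to use in decorations for one td-tree.dat record.

    A title field of "*" means: take the text of the page's <title> element.
    Returns None if "*" was requested but no <title> element was found.
    """
    if title_field != "*":
        return title_field
    m = TITLE_RE.search(html_text)
    if m is None:
        return None
    return " ".join(m.group(1).split())  # collapse internal whitespace
```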
3.2.5 Record Order
Note that the order of siblings is significant for the display of
sibling and child links. For example:
## probably wrong
root.html
	chaps/c-3.html Chapter 3
	chaps/c-1.html Chapter 1
		subs/sub11.html Subsection 1.1
	chaps/c-2.html Chapter 2
## probably right
root.html
	chaps/c-1.html Chapter 1
		subs/sub11.html Subsection 1.1
	chaps/c-2.html Chapter 2
	chaps/c-3.html Chapter 3
3.2.6 Tree File Example
Here is a plausible td-tree.dat file:
# Tree file example here - think of a zoo application
./zoo-home.html Zoo home
	mammal/mammalia.html Mammals
		mammal/feline.html Felines
			mammal/lions.html See the Lions
			mammal/tigers.html and the Tigers
		mammal/canine.html Canines
			mammal/wolf.html Beware of Wolves
			mammal/jackal.html The Jackals Circle round
	bird/birds.html Birds
		bird/canary.html Hear the Canaries sing
# and so on for birds and other families
	fish/fish.html Our Aquatic Friends
	insect/insects.html Insects
# try to pick up title from mosquito file:
		insect/mosquito.html *
4. Constraints
- TreeDec handles static HTML pages only; there is no ASP support.
- TreeDec assumes that all pages of the website are accessible within
  one file system, and it uses relative URLs for inter-page pointers.
5. Results
See the results page for an explanation
of TreeDec output.