GenTree utility for TreeDec

1. Operation

1.1 Setting up GenTree input

To capture the static link structure of a website, use the free software linklint together with the utility convert-ll as described in the system documentation. As a result, two files will be generated: pages.dat and links.dat.

1.2 Invoking GenTree

GenTree accepts one mandatory parameter and three optional parameters, which are written in keyword=value format. GenTree should be executed from the parent directory of the website to be decorated.

1.2.1 root

The root parameter is mandatory. It specifies the pathname of the file to be used as the root of the tree, and is written as: root=pathname.

1.2.2 indir

The indir parameter (indir=input_directory) tells GenTree in which directory to find the pages.dat and links.dat files. The default is "treedec".

1.2.3 outdir

The outdir parameter (outdir=output_directory) tells GenTree in which directory to write its output files (see below). The default is "treedec".

1.2.4 dir

The dir parameter (dir=inout_directory) is a shortcut for indir and outdir to indicate that input and output files are in the same directory. The default is "treedec".
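The parameter rules in this section can be sketched in a few lines of Python. This is only an illustration, not GenTree's actual code: the function name parse_params is hypothetical, and the rule that an explicit indir or outdir takes precedence over dir is an assumption, since the documentation does not say how conflicts are resolved.

```python
DEFAULT_DIR = "treedec"  # default directory stated in this documentation


def parse_params(argv):
    """Parse keyword=value arguments into a dict of GenTree-style settings."""
    allowed = {"root", "indir", "outdir", "dir"}
    given = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep or key not in allowed or not value:
            raise ValueError(f"expected keyword=value, got {arg!r}")
        given[key] = value
    if "root" not in given:
        raise ValueError("the root parameter is mandatory")
    # Assumption: dir supplies both directories, and an explicit
    # indir or outdir takes precedence over it.
    shared = given.get("dir", DEFAULT_DIR)
    return {
        "root": given["root"],
        "indir": given.get("indir", shared),
        "outdir": given.get("outdir", shared),
    }


if __name__ == "__main__":
    print(parse_params(["root=mysite/home.html", "indir=/home/blip/data"]))
```

With these arguments the sketch reports the given root and indir, and falls back to "treedec" for outdir.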

1.2.5 Example

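Consider this command line:
   gentree.perl root=mysite/home.html  indir=/home/blip/data
This tells GenTree to use mysite/home.html as the root of the tree and to look for pages.dat and links.dat in /home/blip/data. Because neither outdir nor dir is given, the output files are written to the default directory, treedec.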

2. GenTree Output

2.1 Message file

GenTree writes informational and error messages to STDOUT and to a message file, named gen-td-msg.txt.

2.2 Tree file

GenTree writes a tree file named td-tree.dat. Any existing file of that name is overwritten, so rename earlier versions you want to keep. The tree file conforms to the format requirements of TreeDec. The tree is generated by performing a breadth-first expansion from the specified root.

Note that this is not guaranteed to generate a perfectly satisfactory logical tree. Links within pages may skip over logical levels, the order of siblings may not be correct, and so on.

Moreover, the title field within each record is specified as "*", indicating that the title should be taken from the <title> element of each HTML file. This may be acceptable, but you may achieve more usable results by specifying the title text directly.
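The breadth-first expansion described in this section, and the way it can place a page at the wrong logical level, can be illustrated with a short Python sketch. It is not GenTree's implementation: the function name bfs_tree, the in-memory links mapping, and the sample page names are all hypothetical, and the sketch does not read links.dat or write the td-tree.dat format.

```python
from collections import deque


def bfs_tree(root, links):
    """Return (page, depth, parent) rows in breadth-first order from root.

    links maps each page to the ordered list of pages it links to.
    """
    rows = [(root, 0, None)]
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        page, depth = queue.popleft()
        for target in links.get(page, []):
            if target in seen:
                continue  # a page joins the tree only once, where it is first found
            seen.add(target)
            rows.append((target, depth + 1, page))
            queue.append((target, depth + 1))
    return rows


if __name__ == "__main__":
    # Hypothetical site: home links straight to widget, which logically
    # belongs under products.
    links = {
        "home.html": ["products.html", "widget.html", "about.html"],
        "products.html": ["widget.html", "gadget.html"],
    }
    for page, depth, parent in bfs_tree("home.html", links):
        print("  " * depth + page)
```

In this sample, widget.html ends up as a direct child of home.html because the home page links to it, even though it logically belongs under products.html. Cases like this are the ones to correct by hand in the generated tree file.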

3. Next Step: TreeDec Itself

The tree file written by GenTree is intended as input to TreeDec. See the TreeDec documentation for how to use it.




Version 1.1
Page last modified: 15 May 2002
National Institute of Standards and Technology (NIST)