Christine Piatko, Christine.Piatko@jhuapl.edu
Johns Hopkins University / Applied Physics Laboratory (JHU/APL)
Johns Hopkins Road
Laurel, MD 20723-6099
NIRVE allows the user to consolidate related keywords into more meaningful concepts; the document set is then organized into clusters based on these concepts. Users can control the granularity of the clustering, examine the title and full text of documents, assign a relevance status to documents and clusters, and then view subsets based on this evaluation. Finally, the user can generate HTML summaries of an individual cluster or of the entire subset of documents on display.
In our future work we hope to provide even more powerful user mechanisms than those described herein, for example the ability to compare and manipulate result sets from several queries or search engines. Also, we plan to address evaluation issues, with special attention given to measuring which NIRVE features affect the performance of various user tasks.
The NIRVE Control window is a menu from which the user selects among the operations discussed below. Although any entry can be selected at any time, the entries are ordered to approximate a typical search strategy: basic organizational operations at the top, followed by viewing, then style details, and finally summarization and quitting the session.
The Concept Control window is used to specify the mapping of query keywords into more comprehensive concepts, and also to assign a "weight" or importance to each of the resulting concepts.
The Document Space window contains a symbolic view of a set of documents. The primary information represented for each document is its concept profile, but other properties may also be displayed (see "Iconic Representation"). The documents are arranged in groups called clusters, based on similarity of their concept profiles. Each cluster has an icon that displays the average profile for the documents therein. There are a number of operations available in this window, as described under "View modes".
The user is presented with a square matrix in the Concept Control window (see figure), initialized to the identity mapping. By clicking on cells of the matrix, he/she may specify any desired mapping; that is, any subset of keywords may be mapped into each concept. Typically, every keyword is mapped into exactly one concept, but this is not mandatory.
When computing the individual concept coordinates, we have found it a useful heuristic to substitute a negative value (such as -0.5) for zero; this emphasizes the semantic difference between a document with one or a few occurrences of a given keyword and a document with none at all.
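The keyword-to-concept matrix and the negative-for-zero heuristic can be sketched together as follows. This is a minimal illustration under assumptions not stated in the text: each document is represented by a list of per-keyword occurrence counts, a concept's coordinate is taken to be the sum of the counts of the keywords mapped into it, and a zero sum is replaced by -0.5. The names `identity_mapping` and `concept_profile` are hypothetical.

```python
from typing import List, Sequence

# Value substituted for an absent concept; the text suggests "such as -0.5".
ABSENT_VALUE = -0.5


def identity_mapping(num_keywords: int) -> List[List[bool]]:
    """Initial square matrix: keyword k is mapped only into concept k."""
    return [[k == c for c in range(num_keywords)] for k in range(num_keywords)]


def concept_profile(
    keyword_counts: Sequence[int],
    mapping: Sequence[Sequence[bool]],
    absent_value: float = ABSENT_VALUE,
) -> List[float]:
    """Turn per-keyword occurrence counts into per-concept coordinates.

    Assumption (hypothetical): a concept's coordinate is the sum of the
    counts of every keyword mapped into it (mapping[k][c] is True), and a
    sum of zero is replaced by absent_value.
    """
    num_concepts = len(mapping[0]) if mapping else 0
    profile = []
    for c in range(num_concepts):
        total = sum(n for k, n in enumerate(keyword_counts) if mapping[k][c])
        profile.append(float(total) if total > 0 else absent_value)
    return profile
```

For example, merging keywords 0 and 1 into concept 0 and leaving concept 1 empty, `concept_profile([2, 3, 0], [[True, False, False], [True, False, False], [False, False, True]])` gives `[5.0, -0.5, -0.5]`: both the empty concept and the concept whose only keyword never occurs are pushed below zero.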
The user can assign different weights to the concepts in order to express his/her judgments about which are most important. This is done through a set of sliders in the Concept Control window (see figure). This window also associates each named concept with a unique color. Concept weights are used for two purposes: to affect the circular sequence used in the document space window (see below) and to calculate a scalar concept score for each document.
Given the position of each document in concept space, we define the distance between pairs of documents by the usual Euclidean metric. The space may be uniform in every dimension, or each dimension may be scaled according to the weight set on the corresponding concept weight slider. In the scaled metric, a higher weight for a concept amplifies the differences between documents' values for that concept.
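A sketch of the two metrics follows. The text does not say exactly how the weights scale the space; here it is assumed that each coordinate difference is multiplied by its concept's weight before squaring, so a weight of 2 doubles that concept's contribution to the distance. The function name `concept_distance` is hypothetical.

```python
import math
from typing import Optional, Sequence


def concept_distance(
    a: Sequence[float],
    b: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Euclidean distance between two concept profiles.

    weights=None gives the uniform space. Otherwise each coordinate
    difference is multiplied by the concept's weight before squaring,
    which is one plausible reading (an assumption) of scaling by the
    concept weight sliders.
    """
    if weights is None:
        weights = [1.0] * len(a)
    return math.sqrt(sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, weights)))
```

With profiles `[0, 0]` and `[3, 4]`, the uniform distance is 5; weighting the first concept by 2 raises it to the square root of 52, because the disagreement on that concept now counts double.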
We then wish to determine a compact circular order for the documents (in essence a multidimensional scaling problem: reducing N dimensions to one), on the premise that documents with similar concept profiles are likely to be similar in meaning and therefore should be kept geometrically close to each other. We currently use a simple nearest-neighbor heuristic to find a reasonably short cyclic path through the documents.
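Finding the shortest cycle through all documents is an instance of the traveling salesman problem, which is NP-hard, so a greedy heuristic is a natural choice. The sketch below shows the generic nearest-neighbor construction, not NIRVE's actual code: start at one document and repeatedly step to the closest unvisited one. The starting document, the tie-breaking rule, and the names `nearest_neighbor_cycle` and `cycle_length` are assumptions; a weighted distance can be passed in place of the default `math.dist`.

```python
import math
from typing import Callable, List, Sequence

Profile = Sequence[float]
Distance = Callable[[Profile, Profile], float]


def nearest_neighbor_cycle(
    profiles: Sequence[Profile],
    start: int = 0,
    distance: Distance = math.dist,
) -> List[int]:
    """Greedy circular order of document indices.

    From the current document, always move to the closest unvisited one
    (ties broken by lowest index). The cycle closes implicitly from the
    last document back to `start`. The result is usually short but is not
    guaranteed to be the shortest possible cycle.
    """
    n = len(profiles)
    if n == 0:
        return []
    order = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        current = profiles[order[-1]]
        nxt = min(unvisited, key=lambda i: (distance(current, profiles[i]), i))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def cycle_length(
    profiles: Sequence[Profile], order: Sequence[int], distance: Distance = math.dist
) -> float:
    """Total length of the closed path, including the wrap-around step."""
    n = len(order)
    return sum(
        distance(profiles[order[i]], profiles[order[(i + 1) % n]]) for i in range(n)
    )
```

On the one-dimensional profiles `[[0], [10], [1], [11], [2]]`, the heuristic visits indices `[0, 2, 4, 1, 3]`, keeping the low and high groups contiguous in the cycle.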
The ordering allows the user to scan the documents sequentially, making it easier to ensure that no document has been overlooked. It also supports a simple algorithm for clustering documents into semantically related groups.
Within the circular sequence of documents, the gaps between adjacent documents vary in size. We define a cluster as a maximal subsequence of documents within which no gap exceeds a given threshold. Raising or lowering the gap threshold makes NIRVE generate fewer or more clusters, respectively, so the user can dynamically choose coarser or finer cluster granularity.
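The cluster definition above translates directly into code: every gap larger than the threshold is a cut, and the runs between consecutive cuts are the clusters. This sketch assumes the gap is the concept-space distance between successive documents and that the wrap-around gap (last document back to first) counts like any other, since the sequence is circular. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def circular_gaps(profiles: Sequence[Sequence[float]], order: Sequence[int]) -> List[float]:
    """gaps[i] = distance from order[i] to order[(i + 1) % n].

    The last entry is the wrap-around gap back to the first document.
    """
    n = len(order)
    return [math.dist(profiles[order[i]], profiles[order[(i + 1) % n]]) for i in range(n)]


def gap_clusters(gaps: Sequence[float], threshold: float) -> List[List[int]]:
    """Group positions 0..n-1 of the circular sequence into clusters.

    A cluster is a maximal run of consecutive positions in which no gap
    exceeds `threshold`.
    """
    n = len(gaps)
    if n == 0:
        return []
    cuts = [i for i, g in enumerate(gaps) if g > threshold]
    if not cuts:
        return [list(range(n))]  # no gap exceeds the threshold: one cluster
    clusters = []
    for j, cut in enumerate(cuts):
        # A cluster starts just after one cut and ends at the next cut.
        end = cuts[(j + 1) % len(cuts)]
        pos = (cut + 1) % n
        members = [pos]
        while pos != end:
            pos = (pos + 1) % n
            members.append(pos)
        clusters.append(members)
    return clusters
```

The returned positions index into the circular order, so `[[order[p] for p in c] for c in clusters]` recovers document indices. With gaps `[0.1, 0.2, 5.0, 0.1, 3.0]`, a threshold of 1.0 yields two clusters and a threshold of 0.15 yields three, matching the rule that a lower threshold produces finer granularity.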
The clusters of documents are arranged in document space in rays or spoke-like patterns around a circle. The icons sit perpendicular to the XZ plane, with a cluster icon on the outer edge of each ray. The spacing between adjacent documents within a cluster subsequence is proportional to the distance between them in concept space. Similarly, the angular distance between adjacent clusters in document space is proportional to the distance between their concept profiles in concept space.
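The layout rules can be sketched as below, under several assumptions not stated in the text: the circle lies in the XZ plane (so icons standing perpendicular to that plane face outward along the rays), the angular steps between consecutive clusters, including the wrap-around step, are normalized to sum to a full turn, and documents are placed outward from a fixed inner radius. Each cluster's profile is the average of its documents' profiles, as displayed by its cluster icon. All function names and parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Profile = Sequence[float]


def average_profile(profiles: Sequence[Profile]) -> List[float]:
    """Mean concept profile of a cluster's documents."""
    n = len(profiles)
    return [sum(p[c] for p in profiles) / n for c in range(len(profiles[0]))]


def cluster_angles(cluster_profiles: Sequence[Profile]) -> List[float]:
    """Angle (radians) of each cluster's ray around the circle.

    The step from cluster i to cluster i + 1 is proportional to the
    distance between their average profiles; steps are scaled to sum
    to 2*pi (an assumption).
    """
    k = len(cluster_profiles)
    steps = [math.dist(cluster_profiles[i], cluster_profiles[(i + 1) % k]) for i in range(k)]
    total = sum(steps)
    if total == 0:
        return [2 * math.pi * i / k for i in range(k)]  # identical profiles: even spacing
    angles, acc = [], 0.0
    for step in steps:
        angles.append(acc)
        acc += 2 * math.pi * step / total
    return angles


def ray_positions(
    members: Sequence[Profile], angle: float, inner_radius: float = 1.0, scale: float = 1.0
) -> List[Tuple[float, float, float]]:
    """(x, y, z) positions of a cluster's documents along its ray.

    Adjacent documents are spaced `scale` times their concept-space
    distance; the cluster icon would sit just beyond the last one.
    """
    r = inner_radius
    positions = []
    for i, p in enumerate(members):
        if i > 0:
            r += scale * math.dist(members[i - 1], p)
        positions.append((r * math.cos(angle), 0.0, r * math.sin(angle)))
    return positions
```

For three clusters whose average profiles are `[0, 0]`, `[3, 0]` and `[3, 4]`, the steps are 3, 4 and 5, so the rays land at 0, a quarter turn, and seven-sixths of pi: the most dissimilar neighbors get the widest angular separation.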
The result is that clusters and documents which are close geometrically are also close semantically. Furthermore, since similar documents are lined up as parallel squares, they can easily be compared visually. The converse does not always hold: semantically close documents or clusters do not necessarily end up geometrically close. This limitation is inherent in representing higher-dimensional entities in a lower-dimensional setting. Nonetheless, semantically similar documents will usually fall in the same cluster, even if they are separated within it.
Note the distinction between concept space and document space. The former is the abstract n-dimensional space within which a document's concept profile specifies its precise location. The latter is a 3-dimensional space in which visible document icons are heuristically arranged for the user's inspection and manipulation.
Document icons may be decorated, at the user's option, with small glyphs for such attributes as document length, the document score assigned by the search engine, and a document score based on the concept weights. This concept score is essentially a weighted sum: each of the document's concept coordinates is multiplied by the user-specified weight for that concept, and the products are added.
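The concept score is a dot product of the concept profile with the weight vector. The sketch below is a direct transcription of that description; the name `concept_score` and the length check are illustrative choices, not part of NIRVE.

```python
from typing import Sequence


def concept_score(profile: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of a document's concept coordinates (a dot product)."""
    if len(profile) != len(weights):
        raise ValueError("profile and weights must have the same length")
    return sum(x * w for x, w in zip(profile, weights))
```

For a profile `[2.0, -0.5, 1.0]` and weights `[1.0, 2.0, 0.5]` the score is 2 - 1 + 0.5 = 1.5. Combined with the negative-for-zero heuristic, a document that lacks a heavily weighted concept is actively penalized (given a positive weight) rather than merely receiving no credit for it.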
In spin mode, the display rotates around its natural axis, allowing the user to see the general nature and location of the clusters. It is also useful for scanning through all the clusters to see which appear promising. As we shall see, a cluster judged to be irrelevant can be deleted from the display.
In move mode, the mouse can be used to rotate and translate the display to give any desired view.
In pick mode, the mouse is used to operate on documents or clusters. Simple cursor motion causes the indicated icon to be highlighted and associated text to be displayed: the title for document icons, and a profile summary for cluster icons. Mouse-button 1 displays the details of the selected icon: the full text of a selected document or a generated text summary of a selected cluster is displayed via Netscape. Mouse-button 2 is used to mark the icon with a user-specified relevance status, as described in the next section. Mouse-button 3 causes the display to be rotated and shifted so that the selected icon is brought front and center for closer inspection.