Section 1

Introduction

This document outlines a two-part methodology for evaluating collaborative computing systems. In the first part, the researcher uses our framework to analyze and describe a given collaborative computing system in terms that reveal its capabilities and allow preliminary judgments about the kinds of work tasks the system might best support. In the second part, the researcher asks subjects to use the system being evaluated to run scenarios representing the kinds of work tasks highlighted by the initial analysis, the kinds of work tasks a group of potential users needs support for, or both. These scenarios help the researcher evaluate how well the system actually supports the work tasks in question.

This methodology was developed to provide a reliable but inexpensive means of evaluating collaborative software tools. At a relatively low cost, researchers in the collaborative computing research community can evaluate their own or others’ collaborative tools, and user groups can determine their requirements and ascertain how well a given collaborative system supports their work.

1.1 The Defense Advanced Research Projects Agency (DARPA) Intelligent Collaboration and Visualization (IC&V) Program

The DARPA Intelligent Collaboration and Visualization (IC&V) program has the goal of developing the generation-after-next collaboration middleware and tools that will enable military components and joint staff groups to enhance the effectiveness of collaborations.

The IC&V program has funded a number of groups to develop collaborative technologies to address these objectives; it has also devoted funds to establishing evaluation metrics, methodologies and tools. The IC&V program objectives are:

  1. Enable access to collaborative systems via diverse portals, from hand-held through room-sized.
  2. Enable interoperability across systems using diverse encoding formats, coordination and consistency protocols, and real-time services.
  3. Scale collaborations to 10 active contributors, 100 questioners, and 1000 observers.
  4. Reduce by an order of magnitude the time needed to generate collaborative applications.
  5. Enable real-time discovery of relevant collaborators and information within task context.
  6. Reduce by an order of magnitude the time required to establish collaborative sessions across heterogeneous environments.
  7. Reduce by an order of magnitude the time required to review collaborative sessions.
  8. Improve task-based performance of collaborators by two orders of magnitude.
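
Several of the objectives above are stated as order-of-magnitude targets. As an illustration only, the following Python sketch shows one way such a target might be checked against measurements. It assumes that "an order of magnitude" means a factor of ten; the function names and the sample timings are hypothetical and are not part of the IC&V program definition.

```python
import math


def orders_of_magnitude(baseline: float, measured: float) -> float:
    """Return log10 of the improvement factor baseline / measured.

    Both values must be positive and expressed in the same unit
    (for example, seconds needed to establish a session).
    """
    if baseline <= 0 or measured <= 0:
        raise ValueError("baseline and measured must be positive")
    return math.log10(baseline / measured)


def meets_target(baseline: float, measured: float,
                 target_orders: float = 1.0) -> bool:
    """True if the improvement reaches the target number of orders.

    A small tolerance absorbs floating-point rounding at the boundary.
    """
    return orders_of_magnitude(baseline, measured) >= target_orders - 1e-9


# Hypothetical timings: session setup drops from 600 s to 45 s.
print(meets_target(600.0, 45.0))   # a factor of about 13.3
print(meets_target(600.0, 100.0))  # a factor of 6
```

Under these assumptions the first measurement satisfies a one-order target and the second does not; a two-order objective would be checked by passing target_orders=2.0.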

The effectiveness of the overall IC&V program will be evaluated with respect to these high-level objectives. The Evaluation Working Group (EWG) of the IC&V program was established to support implementation of the evaluation of collaborative tools developed under IC&V. The EWG will develop the evaluation metrics and methodology, and will develop, or guide the development of, specific tests and tools for achieving effective and economical evaluation of the collaborative technologies that make up the IC&V program.

1.2 The Evaluation Working Group and Its Aims

The original Evaluation Working Group included researchers with diverse backgrounds and interests from several sites: Carnegie Mellon University (CMU), The MITRE Corporation, National Imagery and Mapping Agency (NIMA), National Institute of Standards and Technology (NIST), and Amerind [see footnote 1]. The EWG’s primary task is to define and validate low-cost methods of evaluating collaborative environments, so that researchers can use these methods to evaluate research products and users can use these methods to choose collaborative systems that will best suit their needs. This objective is further refined into a set of goals as follows:

  1. To develop, evaluate, and validate metrics and methodology for evaluating collaborative tools.
  2. To provide reusable evaluation technology, such that research groups can assess their own progress.
  3. To provide evaluation methods that are cheap relative to the requirements.
  4. To apply DOD-relevant criteria when evaluating collaborative systems relevant to areas such as:
    • Planning-design-analysis domains
    • C2 environments to capture the planning-reaction-replanning cycle
    • Disaster relief exercises
    • Collaborative information analysis activities
  5. To define an application vision that will drive collaborative computing research.

The technologies supported under the IC&V program range from infrastructure technologies at the level of networking and bus protocols, to middleware for providing easy interoperability, to user-oriented collaborative tools. Given this wide range of technologies and the background of the EWG members, the EWG has decided to focus on the user-oriented end of the spectrum. In addition, specific interests of various EWG members (NIST, in particular) may lead to subgroups working in the area of infrastructure technology evaluation, especially as these areas affect the user level (e.g., sensitivity to network load may limit the number of participants in a collaborative session). Currently, there are no plans for the EWG to provide evaluation metrics aimed at the software infrastructure, such as how easy it is to make a new application collaborative, or how a given layer of middleware might enhance interoperability. These are clearly important issues that will affect the long-term success of the program, but they lie outside the scope of the EWG as it is currently constituted.

1.3 The Scope and Structure of this Document

This document was developed to encode agreements of the IC&V Evaluation Working Group as we develop a framework and methodology for evaluation of the IC&V technologies.

The IC&V program is not targeted at a specific collaboration problem. Rather, the challenge for the EWG is to provide an evaluation methodology that can be applied across the diverse IC&V research projects and approaches to collaboration. Researchers need tools to measure the incremental progress towards developing useful collaborative systems, as well as methods to evaluate the impact of specific technologies on the effectiveness of collaboration. Users need ways in which to determine which collaborative software systems could meet their needs.

We present a scenario-based approach to evaluation. The long-term goal of the EWG is to develop a repository of scenarios that are scripted for a collaborative community and enacted using the technologies under evaluation. Since the technologies are diverse, the scenarios must be generic enough to provide meaningful evaluation across multiple research projects. Enacting the scenarios will provide data for the functional evaluation and will also exercise the tools developed for the technology evaluation. Different scenarios will exercise different aspects of collaborative work, such as the number of participants, the kinds of shared objects, and the ways participants need to interact with each other and with the shared objects.
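
To make the idea of a scenario repository more concrete, the following Python sketch records the aspects of collaborative work named above (number of participants, kinds of shared objects, and ways of interacting) and selects the scenarios that a given system could enact. This is an illustration under stated assumptions, not the EWG's repository format: the Scenario fields, the enactable function, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical descriptor for one scripted scenario."""
    name: str
    participants: int          # number of people the script requires
    shared_objects: frozenset  # kinds of objects participants share
    interactions: frozenset    # ways participants interact


def enactable(scenario: Scenario, max_participants: int,
              supported_objects: frozenset) -> bool:
    """True if a system with the given limits could run the scenario."""
    return (scenario.participants <= max_participants
            and scenario.shared_objects <= supported_objects)


# Hypothetical repository entries, for illustration only.
repository = [
    Scenario("map annotation", 4,
             frozenset({"map"}), frozenset({"audio", "pointing"})),
    Scenario("large briefing", 50,
             frozenset({"slides"}), frozenset({"audio", "text chat"})),
]

# A system assumed to support 10 participants sharing maps and slides.
candidates = [s.name for s in repository
              if enactable(s, 10, frozenset({"map", "slides"}))]
print(candidates)
```

With these sample entries, only the four-person scenario fits within the assumed ten-participant limit, so it is the only candidate for enactment on that system.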

The remaining sections of this document are structured as follows. Section 2 situates this methodology in the context of current evaluation approaches from human-computer interface (HCI) and computer-supported cooperative work (CSCW) research, and discusses the rationale for scenario-based evaluation. It also defines critical terminology for use in the remainder of the document.

Section 3 presents a framework that defines the design and implementation space for collaborative systems. It includes a set of generic task types that can be used to construct scenarios.

Section 4 discusses the concept of a scenario as a vehicle for simulating a collaborative activity for purposes of evaluation. Our approach to exercising and evaluating specific collaborative technologies requires the selection of appropriate scenarios. Section 4 describes methods for using scenarios for purposes such as iterative evaluation, assessment of system appropriateness, and comparison of systems.

Section 5 discusses a range of suggested metrics and measures for evaluating collaborative technologies at various levels and illustrates these with several examples.

Section 6 includes a discussion of how the methodology can be used to design an experiment.