The DARPA Intelligent Collaboration and Visualization (IC&V) program aims to develop generation-after-next collaboration middleware and tools that will enable military components and joint staff groups to enhance the effectiveness of collaborating problem solvers through:
The IC&V program has funded a number of groups to develop collaborative technologies that address these problems. It has also devoted a portion of its funds to establishing evaluation metrics, methodologies and tools. Because the funded technologies are likely to be diverse, it is appropriate to revisit the original program objectives and use them to set the direction for investigating evaluation methods. The IC&V program objectives are:
The effectiveness of the overall IC&V program will be evaluated with respect to these high-level objectives. The Evaluation Working Group (EWG) of the IC&V program has been established to support implementation of this evaluation. The role of the EWG is to develop the metrics and evaluation methodology, and to develop (or guide the development of) specific tests and tools for achieving effective and economical evaluation of the collaborative technologies that make up the IC&V program.
The EWG comprises researchers with diverse backgrounds and interests from several sites (CMU, MITRE, NIMA, NIST and Amerind).
The EWG has taken as its primary task the definition and validation of low-cost methods for evaluating collaborative environments, so that any researcher in the collaborative computing community can use these methods to evaluate their own or others' research products. This objective is further refined into the following set of goals:
The technologies supported under the IC&V program range from infrastructure technologies at the level of networking and bus protocols, to middleware that provides easy interoperability, to user-oriented collaborative tools. Given this wide range of technologies and the backgrounds of its members, the EWG has decided to focus on the user-oriented end of the spectrum. In addition, the specific interests of some EWG members (NIST in particular) may lead to subgroups working on the evaluation of infrastructure technology, especially where it affects the user level (e.g., sensitivity to network load may limit the number of participants in a collaborative session). Currently, the EWG has no plans to provide evaluation metrics aimed at the software infrastructure, such as how easy it is to make a new application collaborative, or how much a given middleware layer enhances interoperability. These issues will clearly affect the long-term success of the program, but they lie outside the scope of the EWG as it is currently constituted.
This document serves as the vehicle for recording the agreements of the IC&V Evaluation Working Group as we develop a methodology for evaluating the IC&V technologies.
The set of problems that can be solved through collaboration is independent of any of the efforts funded under the IC&V program, but each technology in the program is presumed to help solve some subset of those problems. The challenge for the EWG is to provide an evaluation methodology that can be applied across the diverse IC&V research projects and their approaches to collaboration. It is critical to give researchers tools for measuring their own incremental progress, as well as methods for evaluating the impact of specific technologies on the overall effectiveness of collaboration.
We outline here a particular approach to evaluation, namely scenario-based evaluation. The goal is to develop a suite of scenarios that are scripted for a problem-solving community and enacted using the technologies under evaluation. Because the technologies are diverse, the scenarios must be crafted to be generic enough to provide meaningful evaluation across multiple research projects. Enacting the scenarios provides data for the functional evaluation, or exercises tools developed for the technology evaluation. Different scenarios will exercise different aspects of collaborative work, such as the number of participants, the kinds of shared objects, and the ways participants need to interact with each other and with the shared objects.
The remaining sections of this document are structured as follows. Section 2 situates this methodology in the context of current evaluation approaches from human-computer interaction (HCI), computer-supported cooperative work (CSCW) and spoken language systems (SLS) research, and discusses the rationale for scenario-based evaluation. It also defines key terminology used in the remainder of the document.
In section 3, we present a framework that defines the design and implementation space for collaborative systems. We define a set of generic task types that can be used to construct scenarios.
Section 4 discusses the concept of a scenario as a vehicle for simulating a collaborative activity. Exercising and evaluating any specific collaborative technology requires selecting appropriate scenarios. The section also describes a scenario repository and methods for using scenarios, such as iterative evaluation, assessment of system appropriateness, and comparison of systems.
Section 5 discusses a range of metrics and measures for evaluating collaborative technologies at various levels and illustrates these with several examples.
Finally, section 6 concludes with some suggestions for future directions, including the possibility of (partially) automating participants in scenarios to limit subject variability and reduce the human effort required to collect data. The reader can view a template for generating scenarios at http://zing.ncsl.nist.gov/~cugini/icv/domain-template and some sample scenarios at http://www.antd.nist.gov/~icv-ewg/pages/scenarios.html.