
Scenarios for Evaluation of Collaboration Tools

Introduction

Having defined a framework for collaborative systems and generic task types, we can re-examine what a scenario is, and how we can use it for evaluating collaborative systems.

A scenario is an instantiation of a generic task type, or a series of generic tasks linked by transitions. It specifies the characteristics of the group that should carry it out, and the social protocols which should be in place. It describes what the users should (try to) do at the requirements level, but not how they should do it at any of the lower levels of the framework.

Scripts dictate how the users will carry out their tasks, at any of the three lower levels of the framework - the capability, service or technology levels.

Scenarios and the three levels of scripts each serve different but valuable purposes.

A scenario allows the most natural interaction, of course, since it is not scripted. It can also allow the experimenters to determine whether or not the best ways of doing things are apparent/intuitive to the users.

A script allows a given interaction to be repeated much more exactly. Thus it allows much more accurate comparisons across systems or even across multiple implementations of the same system.

A capability-level script can be used on any system which supports a given set of capabilities.

A service-level script is specific to a certain set of services, but can be used with any implementations of those services. It might be useful in cases where one wants to compare different implementations of the same basic services.

A technology-level script is the most detailed. It tells the users what buttons to press and exactly how to carry out the scenario on a given set of technologies. It provides a detailed, standardized, repeatable sequence of actions. This script can be carried out by even the most inexperienced users, including those lacking the domain knowledge required to run the unscripted scenario as well as those lacking training on the technology being assessed. It could also serve as training material for new users of the technology. It is conceivable that such a script could even be carried out by participatons, removing the reproducibility problems the human in the loop necessarily introduces. It can be used for some types of user interface analysis, such as GOMS modelling, but of course it cannot be used to assess issues such as intuitiveness, as the users are guided through every step of the interaction.

Since it can be useful to have both scenarios and various levels of scripts for the same collaborative activity, it is instructive to consider how one might generate scripts at various levels of detail starting from a scenario for that activity. One could start with the scenario and have real people (with appropriate expertise, if necessary) run through it on a particular set of technologies, logging the interactions. One could then take the logs, edit out any undesirable actions (such as wrong paths or puzzling over the interface) and create a detailed technology-level script from them. By generalizing that script first to generic services and then to the capabilities supported by those services, the scenario developer could then generate higher-level scripts that allow scripted evaluation across a greater variety of collaborative systems.
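
The derivation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log format, the `keep` flag, the tool name "ToolX" and both mapping tables are hypothetical and are not taken from the EWG material. It shows a logged session being pruned into a technology-level script and then generalized, one level at a time, to the service and capability levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    user: str
    action: str        # technology-level action, e.g. a button press
    keep: bool = True  # False marks wrong paths or hesitation to edit out


# Hypothetical mappings; a real evaluation would define its own.
ACTION_TO_SERVICE = {
    "press 'Call' in ToolX": "audio conference",
    "click 'Share' in ToolX": "shared whiteboard",
}
SERVICE_TO_CAPABILITY = {
    "audio conference": "synchronous communication",
    "shared whiteboard": "shared workspace",
}


def technology_script(log):
    """Keep only the desirable actions from a logged session."""
    return [(entry.user, entry.action) for entry in log if entry.keep]


def generalize(script, mapping):
    """Rewrite each step one level up, merging consecutive duplicates."""
    result = []
    for user, step in script:
        general = (user, mapping[step])
        if not result or result[-1] != general:
            result.append(general)
    return result


log = [
    LogEntry("A", "press 'Call' in ToolX"),
    LogEntry("A", "open wrong menu in ToolX", keep=False),
    LogEntry("A", "click 'Share' in ToolX"),
]
tech = technology_script(log)
service = generalize(tech, ACTION_TO_SERVICE)
capability = generalize(service, SERVICE_TO_CAPABILITY)
```

In practice the mapping from actions to services would probably need human judgment rather than a lookup table; the table only makes the direction of generalization explicit.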

The Beginning of a Scenario Repository

The Evaluation Working Group (EWG) has created some sample scenarios and scripts which may be used for evaluation. They may be found at http://www.antd.nist.gov/~icv-ewg/pages/scenarios.html. We encourage others to contribute scenarios or scripts that they have written as well. A scenario template to complete may be found at http://zing.ncsl.nist.gov/~cugini/icv/domain-template.html.

Each example clearly sets out the types of tasks, transitions, social protocols and group characteristics that it exercises (or aims to exercise). It also lists the capabilities and/or services required or recommended to complete the scenario. This information will be helpful for the user in choosing appropriate scripts and/or scenarios to use for their evaluation, as discussed below.

Using Scenarios for Evaluation

Using Scenarios to Do Iterative Evaluations of a Single System

One major goal of the Evaluation Working Group is to support system developers, who need to evaluate their system frequently in order to validate the theoretical improvements offered by new versions.

To do this, the developer should choose one or more scenarios or scripts that exercise the types of capabilities and services their system provides and/or involve the types of tasks, transitions and social protocols that they hope to support.
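
As a rough illustration of this selection step, the sketch below filters repository entries down to those a system can actually run, and reports which of the system's capabilities no selected entry exercises. The entry names, capability names and the `RepositoryEntry` type are all hypothetical; the EWG repository does not necessarily record its entries in this form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryEntry:
    name: str
    required_capabilities: frozenset


def runnable_entries(entries, system_capabilities):
    """Return entries whose required capabilities the system provides."""
    provided = frozenset(system_capabilities)
    return [e for e in entries if e.required_capabilities <= provided]


def unexercised(entries, system_capabilities):
    """Return provided capabilities that none of the given entries exercise."""
    exercised = set()
    for entry in entries:
        exercised |= entry.required_capabilities
    return set(system_capabilities) - exercised


# Hypothetical repository entries and capability names, for illustration only.
entries = [
    RepositoryEntry("joint document review",
                    frozenset({"shared workspace", "text chat"})),
    RepositoryEntry("remote briefing",
                    frozenset({"audio conference", "shared workspace"})),
]
system = {"shared workspace", "text chat", "file transfer"}

selected = runnable_entries(entries, system)
gaps = unexercised(selected, system)
```

A non-empty `gaps` set suggests that the existing entries do not cover everything the system offers, so an additional scenario or script may be worth writing.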

Developers who devise their own scenarios or scripts to exercise other sets of capabilities are encouraged to contribute them to the scenario collection that the EWG has started.

Using Scenarios to Evaluate a System's Appropriateness for Your Requirements

For potential users looking to select a collaborative system to suit their needs, the scenarios can serve a different purpose. Rather than choosing scenarios based upon a particular set of capabilities or services, they can choose scenarios that highlight the types of requirements (tasks, transitions, social protocols and group characteristics) that they need a system to support.
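
One possible way to make this choice systematic is a greedy cover: repeatedly pick the scenario that exercises the most still-uncovered requirements. The sketch below assumes that requirements can be written as simple labels; the labels and scenario names are hypothetical, and the greedy strategy is a suggestion of this sketch, not a procedure prescribed by the EWG.

```python
def choose_scenarios(needs, scenarios):
    """Greedily pick scenarios until the needs are covered or none helps.

    needs: set of requirement labels the prospective user cares about.
    scenarios: dict mapping scenario name to the set of labels it exercises.
    Returns (chosen names in pick order, needs left uncovered).
    """
    remaining = set(needs)
    chosen = []
    while remaining:
        # sorted() makes ties resolve alphabetically, so results are stable.
        best = max(sorted(scenarios),
                   key=lambda name: len(scenarios[name] & remaining),
                   default=None)
        if best is None or not scenarios[best] & remaining:
            break  # no scenario exercises any of the remaining needs
        chosen.append(best)
        remaining -= scenarios[best]
    return chosen, remaining


# Hypothetical requirement labels and scenarios, for illustration only.
needs = {"brainstorming task", "decision-making task",
         "turn-taking protocol", "distributed group"}
scenarios = {
    "design review": {"decision-making task", "turn-taking protocol"},
    "idea workshop": {"brainstorming task", "distributed group",
                      "turn-taking protocol"},
    "status meeting": {"turn-taking protocol"},
}

chosen, uncovered = choose_scenarios(needs, scenarios)
```

Any labels left in `uncovered` are requirements that no available scenario exercises, which is itself useful to know before drawing conclusions about a system.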

Using Scenarios to Compare Systems

If multiple systems are to be compared using scenarios, several scenarios should be chosen for the evaluation. A complete comparison should examine the performance of each system on each of the scenarios. The set of scenarios and, by extension, the requirements on which each system performs well or poorly is an indicator of that system's strengths and weaknesses. In most comparisons, it will be impossible to say which system is best overall, but it may be possible to determine which system is best for a particular set of requirements, or which has the best breadth of requirement support even if it isn't the best for any given set of requirements. The results will have to be interpreted in terms of the goals of the experimenter.
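
The two kinds of conclusion mentioned here, best for a particular set of requirements and best breadth, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the system and scenario names, the requirement labels, the 0 to 1 scoring scale, the threshold, and the scores themselves, which are invented values and not measured results.

```python
from statistics import mean

# scores[system][scenario]: invented values on a 0-1 scale, higher is better.
scores = {
    "System A": {"design review": 0.9, "idea workshop": 0.4,
                 "status meeting": 0.5},
    "System B": {"design review": 0.7, "idea workshop": 0.7,
                 "status meeting": 0.7},
}

# Hypothetical record of which requirements each scenario exercises.
exercises = {
    "design review": {"decision-making task", "turn-taking protocol"},
    "idea workshop": {"brainstorming task", "distributed group"},
    "status meeting": {"turn-taking protocol"},
}


def best_for(requirements, scores, exercises):
    """System with the highest mean score on the relevant scenarios."""
    relevant = [s for s, reqs in exercises.items() if reqs & requirements]
    if not relevant:
        raise ValueError("no scenario exercises these requirements")
    return max(sorted(scores),
               key=lambda system: mean(scores[system][s] for s in relevant))


def breadth(system_scores, threshold):
    """Number of scenarios on which the system meets the threshold."""
    return sum(1 for value in system_scores.values() if value >= threshold)


best_decision = best_for({"decision-making task"}, scores, exercises)
best_brainstorm = best_for({"brainstorming task"}, scores, exercises)
breadth_by_system = {name: breadth(s, 0.6) for name, s in scores.items()}
```

With these invented numbers, System A comes out ahead for decision-making while System B meets the threshold on more scenarios, so neither is simply "best"; which result matters depends on the experimenter's goals.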



Charles Sheppard
Wed Aug 27 17:05:29 EDT 1997