Section 4

Scenarios for Evaluation of Collaboration Tools

4.1 Introduction

Having defined a framework for describing collaborative systems and generic task types, we can re-examine what a scenario is, how it can be constructed or chosen from a library, and how we can use it for evaluating collaborative systems.

Scripted and unscripted scenarios serve different but valuable purposes.

An unscripted scenario allows the most natural interaction, because it does not constrain the user’s actions. It can also allow the experimenters to determine whether the best ways of doing things are apparent or intuitive to the users.

A scripted scenario allows a scenario to be repeated much more exactly. Thus it allows much more accurate comparisons of measurements across systems or even across multiple implementations of the same system.

A scripted scenario at the capability level gives the detailed instructions in terms of the capabilities that the script uses. This script does not require the use of any particular services, so it can be run on any system that provides some means of instantiating the required capabilities. Scripts defined at the capability level can often be used across platforms that provide radically different sets of services in support of the same basic capabilities.

A scripted scenario at the service level specifies the steps in the scenario in terms of the services to be used, without committing itself to a particular technological implementation of the service. It may, for example, state that a particular discussion is to take place via video conferencing, without specifying exactly how that will be achieved in any particular system. A service-level script might be useful in cases where one wants to compare different implementations of the same basic services.

A scripted scenario at the technology level is the most detailed. It tells the users what buttons to press and exactly how to carry out the scenario on a given set of technologies. It provides a detailed, standardized, repeatable sequence of actions. This script can be carried out by even the most inexperienced users, including those who lack the domain knowledge required to run an unscripted version of the scenario, and those lacking training on the technology being assessed. It could also serve as training material for new users of the technology. Such a script could even be carried out by participatons, removing the reproducibility problems the human in the loop necessarily introduces. It can be used for some types of user interface analysis, such as formal dialog modeling techniques (Card, Moran and Newell 1983), but it cannot be used to assess issues such as intuitiveness, since the users are guided through every step of the interaction.

Since it can be useful to have both scripted and unscripted scenarios at various levels for the same collaborative activity, it is instructive to consider how one might generate scripts at various levels of detail for a particular collaborative activity starting from a scenario for that activity. It would be possible to start with the scenario and have real people (with appropriate expertise, if necessary) run through it on a particular set of technologies, logging the interactions. One could then take the logs, edit out any undesirable actions (such as wrong paths or puzzling over the interface) and create a detailed technology-level script from them. By generalizing the script first to generic services and then to the capabilities supported by those services, the scenario developer could then generate higher-level scripts to allow scripted evaluation across a greater variety of collaborative systems.
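The log-to-script generalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logged actions, the service names, the capability names, and the two mapping tables are all hypothetical and are not taken from the framework. Collapsing consecutive identical steps after generalization is also a design choice made here, not something the methodology prescribes.

```python
from typing import Dict, List, NamedTuple


class LogEntry(NamedTuple):
    action: str  # technology-level action exactly as logged
    keep: bool   # False for wrong paths or puzzling over the interface


# Hypothetical mappings, invented for this sketch only.
ACTION_TO_SERVICE: Dict[str, str] = {
    "press 'Call' button": "video conferencing",
    "type message in chat pane": "text chat",
    "drag map into shared window": "shared whiteboard",
}
SERVICE_TO_CAPABILITY: Dict[str, str] = {
    "video conferencing": "synchronous conversation",
    "text chat": "synchronous conversation",
    "shared whiteboard": "shared workspace",
}


def technology_script(log: List[LogEntry]) -> List[str]:
    """Edit out undesirable actions, leaving a technology-level script."""
    return [entry.action for entry in log if entry.keep]


def generalize(steps: List[str], mapping: Dict[str, str]) -> List[str]:
    """Map each step to the next level up, merging consecutive duplicates."""
    result: List[str] = []
    for step in steps:
        general = mapping[step]
        if not result or result[-1] != general:
            result.append(general)
    return result


log = [
    LogEntry("press 'Call' button", True),
    LogEntry("open wrong menu", False),  # wrong path, edited out
    LogEntry("type message in chat pane", True),
    LogEntry("drag map into shared window", True),
]

tech_level = technology_script(log)
service_level = generalize(tech_level, ACTION_TO_SERVICE)
capability_level = generalize(service_level, SERVICE_TO_CAPABILITY)
```

In this sketch the three technology-level steps become three service-level steps, and the two conversational services merge into a single capability-level step, which shows how the higher-level scripts become applicable to a wider range of systems.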

4.2 Constructing Scenarios

As an example of a means of constructing scenarios, consider an evaluation at the requirements level to compare several systems. The methodology aids in achieving repeatability across trials using the different systems.

Scenarios are constructed by putting together tasks based on the collaborative task types described in section 3. A group selects the various McGrath task types for which its members will use the collaborative system. They also consider the social protocols and group characteristics appropriate for the group. The collaborative task types are generic and thus, to construct scenarios, each group will choose specific tasks that are instantiations of generic task types.

For example, a group might need to plan an activity and, in doing so, solve a problem. These activities can be represented by McGrath task type 1 (planning) and task type 9 (information dissemination). Here is a scenario that instantiates these two task types:

A frantic call comes in. Colleague #1 is late for an important meeting across town. You know about a few obstacles such as road construction blockages between Colleague #1 and her meeting location, but Colleague #2 has the latest report from the local news radio. Together, you and Colleague #2 must devise the quickest route to the meeting. You must agree on the route and inform Colleague #1.

Experimenters may find it useful to code their scenarios using consistent terminology so that they can be more quickly compared to each other. For example, in the following description Ti refers to a type of transition task i.  Task Type # refers to a work task type of that number.  Pj refers to a collection j of attributes for a social protocol.  Using the collaborative framework to describe this scenario, we would say:

The group starts with a transition task to begin (Tstart).  They perform a planning task (Task Type 1) using an appropriate social protocol (Ppeer).  They summarize the decision in the transition task (Tsummary) and move to the information dissemination task (Task Type 9), followed by the ending transition task (Tend).
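One way to make such coded descriptions easy to compare is to store each scenario as a sequence of steps and render it in the notation above. The following is a minimal sketch; the `Step` class, its field names, the `encode` function, and the arrow-separated output format are assumptions made for illustration and are not part of the framework.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    kind: str                       # "transition" or "work"
    label: str                      # e.g. "start", or a task type number such as "1"
    protocol: Optional[str] = None  # social protocol name, e.g. "peer"

    def code(self) -> str:
        """Render the step as Tstart, Task Type 1 [Ppeer], and so on."""
        if self.kind == "transition":
            base = f"T{self.label}"
        else:
            base = f"Task Type {self.label}"
        return f"{base} [P{self.protocol}]" if self.protocol else base


def encode(steps: List[Step]) -> str:
    """Join the coded steps into a single comparable string."""
    return " -> ".join(step.code() for step in steps)


# The route-finding scenario described in the text.
route_scenario = [
    Step("transition", "start"),
    Step("work", "1", protocol="peer"),
    Step("transition", "summary"),
    Step("work", "9"),
    Step("transition", "end"),
]

print(encode(route_scenario))
# Tstart -> Task Type 1 [Ppeer] -> Tsummary -> Task Type 9 -> Tend
```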

4.3 Choosing Scenarios

Ideally, a set of scenarios developed over time could be shared among experimenters so that scenarios need not always be developed from scratch.  The Evaluation Working Group (EWG) has created some sample unscripted and scripted scenarios that may be used for evaluation. They may be found at http://www.nist.gov/nist-icv/pages/scenarios.html. We encourage others to contribute scenarios or scripts that they have written to help develop this repository. A scenario template to complete may be found at http://zing.ncsl.nist.gov/~cugini/icv/domain-template.html.

Each example clearly sets out the types of tasks it involves, according to the extended work task (McGrath-like) categorization in Section 3, as well as the transitions, social protocols and group characteristics that it exercises (or aims to exercise). It also lists the capabilities and/or services required or recommended to complete the scenario. This information will be helpful for users in choosing appropriate scenarios to use for their evaluation, as discussed below.

4.4 Using Scenarios for Evaluation

The utility of any given scenario is obviously much greater if it is chosen to have the property defined as “salience” (Potts 1995). Potts characterizes salient scenarios as those that are pertinent to collaboration goals and model expected obstacles to task completion. The evaluator should begin by identifying the goals addressed by the system under evaluation and the obstacles to meeting those goals. Explicitly identifying goals and obstacles can help the evaluator select the appropriate scenario.

4.4.1 Using Scenarios to Iteratively Evaluate a Single System

One major goal of the Evaluation Working Group is to support system developers in their need to do frequent evaluations of the system to validate the theoretical improvements offered by new versions. This is a bottom-up evaluation, beginning with the technology (the system being evaluated) and moving up to the requirements that the system supports.

Developers should attempt to choose one or more scripted or unscripted scenarios that exercise the types of capabilities and services their system provides and/or involve the types of tasks, transitions and social protocols that they hope to support.

Developers who devise their own scripted or unscripted scenarios (see section 4.3) to exercise other sets of capabilities are encouraged to contribute them to the scenario repository that the EWG has started.

4.4.2 Using Scenarios to Evaluate a System’s Appropriateness for Your Requirements

For potential users looking to select a collaborative system to suit their needs, the scenarios can serve a different purpose. Instead of choosing scenarios based upon a particular set of capabilities or services, they can instead choose scenarios that highlight the types of tasks that they need to perform.
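The two ways of choosing scenarios, by capabilities for developers and by task types for prospective users, can be sketched as two filters over a scenario library. Everything below is hypothetical: the `ScenarioEntry` structure, the library contents, and the capability names are assumptions for illustration and do not describe the actual EWG repository. Only the task types 1 and 9 for the route-planning scenario come from the example in section 4.2.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ScenarioEntry:
    name: str
    task_types: FrozenSet[int]    # work task type numbers the scenario exercises
    capabilities: FrozenSet[str]  # capabilities required to run the scenario


# Hypothetical library entries, invented for this sketch.
LIBRARY = [
    ScenarioEntry(
        "route planning",
        frozenset({1, 9}),
        frozenset({"synchronous conversation", "shared workspace"}),
    ),
    ScenarioEntry(
        "status briefing",
        frozenset({9}),
        frozenset({"synchronous conversation"}),
    ),
]


def by_capabilities(library: List[ScenarioEntry], provided: FrozenSet[str]) -> List[str]:
    """Developer view: scenarios whose required capabilities the system provides."""
    return [s.name for s in library if s.capabilities <= provided]


def by_task_types(library: List[ScenarioEntry], needed: FrozenSet[int]) -> List[str]:
    """Prospective-user view: scenarios exercising at least one needed task type."""
    return [s.name for s in library if s.task_types & needed]
```

For instance, a system offering only conversation would be matched with the briefing scenario, while a user who needs to support planning (task type 1) would be pointed to the route-planning scenario.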

4.4.3 Using Scenarios to Compare Systems

If multiple systems are to be compared using scenarios, several scenarios should be chosen for the evaluation. A complete comparison of the systems should compare the performance of the systems on the different scenarios. The set of scenarios in which each system performs well or poorly highlights the requirements that each system does or does not meet. When it is not possible to say which system is best overall, it should be possible to determine which system is best for a particular set of requirements, or which provides the greatest breadth of requirement support even if it is not the best system for any particular set of requirements. The results must be interpreted in terms of the goals of the experiment.
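The distinction between "best for a particular set of requirements" and "greatest breadth of requirement support" can be illustrated with a small sketch. The system names, scenario names, scores, and the 0.6 threshold are all invented for illustration; the methodology does not prescribe a scoring scale or a threshold.

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]

# Hypothetical results: one score per system per scenario (higher is better).
scores: Scores = {
    "System A": {"planning": 0.9, "dissemination": 0.4, "brainstorming": 0.5},
    "System B": {"planning": 0.7, "dissemination": 0.7, "brainstorming": 0.7},
    "System C": {"planning": 0.3, "dissemination": 0.9, "brainstorming": 0.8},
}


def best_per_scenario(results: Scores) -> Dict[str, str]:
    """For each scenario, name the system with the highest score."""
    scenarios = next(iter(results.values())).keys()
    return {
        scenario: max(results, key=lambda name: results[name][scenario])
        for scenario in scenarios
    }


def breadth(results: Scores, threshold: float) -> Dict[str, int]:
    """Count the scenarios on which each system meets the (assumed) threshold."""
    return {
        name: sum(score >= threshold for score in per_scenario.values())
        for name, per_scenario in results.items()
    }


winners = best_per_scenario(scores)
coverage = breadth(scores, threshold=0.6)
```

With these made-up numbers, System B is not the best on any single scenario, yet it is the only system that meets the threshold on all three, so it offers the greatest breadth. Which of the two views matters depends on the goals of the experiment.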