Section 6

Using the Methodology to Design an Experiment

6.1 Introduction

This section concentrates on the way that the methodology described in this paper influences experimental design. The basics of experimental design are not covered here but can be found in references such as Pfleeger (1994). Nor do we give complete descriptions of experiments, because they are not needed to understand how to apply the methodology. (The interested reader can find a description of an experiment designed with the help of the methodology in Damianos et al. 1999.)

6.2 Designing an Experiment

To use the methodology, the experimenters need to decide upon a goal. For example, if the users have a particular set of requirements and the experimenters are charged with finding which system best meets those requirements, the experimenters could follow the steps in Table 14.

Table 14.  Example Steps for a Top-Down Evaluation

Step | Title | Description
-----|-------|------------
1 | Identify candidate systems. | Given a set of requirements, either consult a checklist of system capabilities similar to Table 5 of this document or, if none is available, create a similar table that includes the capabilities of interest.  The result should be a small set of systems that best meet the basic requirements.
2 | Determine tasks. | Determine what tasks will need to be performed with the systems, and abstract these tasks into the task types described in Section 3.
3 | Select (or create/modify) a scenario. | Examine a scenario library to see which scenarios support evaluation of the desired task types.  If an appropriate scenario cannot be found, create one (see Section 4).  The scenario should be targeted at the desired evaluation level (e.g., capability, service, and/or technology).  For example, if users need to check out and store documents, experimenters might run the scenario at the service level, exercising several different document-handling services.  Make sure the scenario includes the salient characteristics of the users’ specific tasks; if not, tailor the scenario accordingly.
4 | Determine measures. | Choose appropriate measures based on the type of tasks used in the evaluation, the desired evaluation levels, the hypotheses being tested, and the observation resources available (e.g., automated loggers).
5 | Pilot the experiment. | Run the experiment on a representative, balanced group of users to make sure the appropriate data are collected.
6 | Run the experiment. | Repeat the scenario for each system or subsystem under consideration.
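
Step 1 of Table 14 amounts to screening systems against a capability checklist. A minimal sketch, assuming each system's capabilities can be written down as plain strings, might shortlist candidates by how many required capabilities they cover. The function rank_candidates, the system names, and the capability labels are hypothetical illustrations, not entries from Table 5.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """A candidate collaboration system and the capabilities it offers."""
    name: str
    capabilities: set[str] = field(default_factory=set)


def rank_candidates(systems, required, shortlist_size=3):
    """Shortlist systems by how many required capabilities they provide.

    Returns (name, number_covered, missing_capabilities) tuples, best
    coverage first, truncated to shortlist_size entries.
    """
    required = set(required)
    scored = []
    for system in systems:
        covered = required & system.capabilities
        missing = sorted(required - system.capabilities)
        scored.append((system.name, len(covered), missing))
    # Most coverage first; ties broken alphabetically so results are stable.
    scored.sort(key=lambda row: (-row[1], row[0]))
    return scored[:shortlist_size]


if __name__ == "__main__":
    # Hypothetical systems and capability names, for illustration only.
    candidates = [
        System("System A", {"text chat", "audio", "shared whiteboard"}),
        System("System B", {"text chat", "document repository"}),
        System("System C", {"text chat", "audio", "document repository"}),
    ]
    needed = {"text chat", "audio", "document repository"}
    for name, covered, missing in rank_candidates(candidates, needed):
        print(name, covered, missing)
```

Counting covered capabilities treats every requirement as equally important; in practice experimenters may prefer to weight requirements or to discard any system missing a mandatory capability before ranking.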

The approach just described corresponds to a top-down evaluation, beginning at the requirements level and moving to the technology level.  Alternatively, the experimenters might be given an alpha version of a new system and asked to determine the tasks for which it will be useful.  This corresponds to a bottom-up approach, starting with the specific technology and determining the services, capabilities, and requirements it may support.  The experimenters could follow the steps outlined in Table 15.

 

Table 15.  Example Steps for a Bottom-Up Evaluation

Step | Title | Description
-----|-------|------------
1 | Compile a list of services. | Create a list of the basic services provided by the system.
2 | Determine tasks. | Based on the services the system supports, determine what capabilities the system will provide.  From these, devise a list of tasks that the intended users might perform and that these services would support.  Express these tasks in terms of the work task types described earlier.
3 | Select (or create/tailor) a scenario. | Examine a scenario library to see which scenarios support evaluation of the desired work task types.  If an appropriate scenario cannot be found, create one.  Make sure the scenario includes the salient characteristics of the users’ specific tasks; if not, tailor the scenario accordingly.
4 | Determine measures. | Using those listed in Section 5 as a starting point, choose appropriate measures based on the type of tasks used in the evaluation, the desired evaluation levels (e.g., service, capability, and requirements), the hypotheses being tested, and the observation resources available.
5 | Set criteria. | Set minimum acceptable thresholds for those measures, based on common sense or on “typical” standards that are currently generally accepted.
6 | Pilot the experiment. | Run the experiment on a representative, balanced group of users to make sure the appropriate data are collected.
7 | Run the scenario. | Enact the scenario for the system being evaluated to determine whether the system meets the minimum acceptable thresholds for providing those services, capabilities, and/or requirements.
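
Steps 5 and 7 of Table 15 pair each measure with a minimum acceptable threshold and then check the observed results against it. A minimal sketch, assuming each measure reduces to a single number and that some measures (such as times) are better when lower, could look like this. The class Criterion, the function check_criteria, and the measure names and threshold values in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """Minimum acceptable threshold for one measure (Table 15, step 5)."""
    measure: str
    threshold: float
    higher_is_better: bool = True


def check_criteria(criteria, observed):
    """Compare observed values with each threshold (Table 15, step 7).

    Returns a dict mapping each measure to True (meets the threshold),
    False (falls short), or None (no observation was recorded).
    """
    results = {}
    for c in criteria:
        value = observed.get(c.measure)
        if value is None:
            results[c.measure] = None
        elif c.higher_is_better:
            results[c.measure] = value >= c.threshold
        else:
            results[c.measure] = value <= c.threshold
    return results


if __name__ == "__main__":
    # Hypothetical measures and thresholds, for illustration only.
    criteria = [
        Criterion("document check-out success rate", 0.95),
        Criterion("median seconds to store a document", 30.0, higher_is_better=False),
        Criterion("mean satisfaction rating (1-5)", 3.5),
    ]
    observed = {
        "document check-out success rate": 0.97,
        "median seconds to store a document": 42.0,
    }
    labels = {True: "meets", False: "fails", None: "no data"}
    for measure, ok in check_criteria(criteria, observed).items():
        print(f"{measure}: {labels[ok]}")
```

Reporting a missing observation separately from a failure keeps gaps in data collection visible, which is exactly what the pilot run in step 6 is meant to catch.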

 

6.3 The Map Navigation Experiment

As an example of how an experiment could be designed using the methodology, consider a task in which two people need to collaborate to share route information.  They will need to agree on the best driving route given specific map information, and communicate that route to another party.  This task may be typical of a military mission, where ground-based units must collaborate to share information on how to avoid enemy locations and booby traps when moving troops.

Suppose that the military planners use MITRE’s CVW to perform these types of route planning tasks.  Making audio available in CVW under field conditions is difficult, so military planners might be interested in determining whether audio significantly enhances task performance.  This research question can be expressed as the following hypotheses:

1.  People collaboratively plan a route faster when audio communication is available.

2.  People collaboratively plan a better route when audio communication is available.

3.  Participants will be more satisfied with collaborative route planning when audio communication is available.
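
Stated as directional (one-sided) statistical hypotheses, the three research hypotheses might be written as below. The notation is an assumption introduced here, not part of the methodology: \mu is a condition mean, subscript A denotes the audio condition and N the no-audio condition, and T, Q, and S stand for task time, a route-quality rating, and a questionnaire satisfaction score (the measures paired with these hypotheses in Table 16), with higher Q and S taken to be better.

```latex
\begin{aligned}
&\text{(1) Task time } T: && H_0:\ \mu_{T,A} \ge \mu_{T,N} && \text{vs.} && H_a:\ \mu_{T,A} < \mu_{T,N} \\
&\text{(2) Route quality } Q: && H_0:\ \mu_{Q,A} \le \mu_{Q,N} && \text{vs.} && H_a:\ \mu_{Q,A} > \mu_{Q,N} \\
&\text{(3) Satisfaction } S: && H_0:\ \mu_{S,A} \le \mu_{S,N} && \text{vs.} && H_a:\ \mu_{S,A} > \mu_{S,N}
\end{aligned}
```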

These hypotheses can be evaluated as part of a top-down evaluation at the service level.  Top-down evaluations begin with requirements, which in this case are the need to exchange information and make plans (step 1 in Table 14).  These requirements correspond to the generic task types of planning and information dissemination (step 2).

The experimenters could search the EWG scenario repository and, if no appropriate scenario is available, create one (step 3).  The scenario should draw on a readily accessible problem domain, in this case collaborative problem solving, and could be built around a task familiar to a wide range of potential test subjects: finding a route on a street map.  Giving each participant private information would require the participants to share information and plan collaboratively.  The scenario could be expressed this way:

A frantic call comes in.  Colleague #1 is late for an important meeting across town.  You know of a few obstacles such as road construction blockages between Colleague #1 and her meeting location, but Colleague #2 has the latest report from the local news radio.  Together, you and Colleague #2 must devise the quickest route to the meeting.  You must agree on the route and inform Colleague #1.

Thus, two people must work together to determine the quickest and best route between two locations.  Each would have a copy of the same map, but each would also have additional (not shared) information about obstacles in the route (for example, annotations indicating heavy traffic, one-way roads, construction sites, and turning restrictions).  A time constraint would be implied; the participants could be told that their colleague was in a hurry to get to a meeting and was awaiting directions.  The work would be completed once both participants had agreed on a route.  The scenario could be tailored (step 3) to ensure that this exercise reflects a planning task done under time constraints, along with the need to pool information and provide the results to non-collocated colleagues. 
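
Because each participant holds different obstacle information, neither can reliably find the quickest route alone. One hedged way for experimenters to check that a draft map actually forces this pooling, and to prepare a reference route for the experts who judge route quality, is to treat the map as a directed graph of intersections and run Dijkstra's shortest-path algorithm with and without the pooled obstacles. The sketch assumes hypothetical road names and travel times, models a closed road as an infinite delay, and models a one-way street by listing only one direction; the functions quickest_route and pool are illustrative, and the result supports rather than replaces expert judgment.

```python
import heapq
import math


def quickest_route(roads, start, goal, delays=None):
    """Dijkstra's shortest path over a directed road graph.

    roads  -- dict mapping (from, to) -> base travel time in minutes
    delays -- dict mapping (from, to) -> extra minutes (math.inf = closed)
    Returns (total_minutes, [intersections]) or (math.inf, []) if unreachable.
    """
    delays = delays or {}
    graph = {}
    for (u, v), minutes in roads.items():
        cost = minutes + delays.get((u, v), 0)
        if cost != math.inf:  # closed roads are left out of the graph
            graph.setdefault(u, []).append((v, cost))

    dist = {start: 0}
    prev = {}
    queue = [(0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            break
        if d > dist.get(node, math.inf):
            continue  # stale queue entry; a shorter path was already found
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(queue, (nd, nxt))

    if goal not in dist:
        return math.inf, []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


def pool(*private_infos):
    """Combine each participant's private delays, keeping the worst per road."""
    pooled = {}
    for info in private_infos:
        for road, extra in info.items():
            pooled[road] = max(extra, pooled.get(road, 0))
    return pooled


if __name__ == "__main__":
    # Hypothetical map: three alternative routes from the office to the meeting.
    roads = {
        ("Office", "Main & 1st"): 5, ("Main & 1st", "Meeting"): 5,
        ("Office", "Oak & 1st"): 6, ("Oak & 1st", "Meeting"): 7,
        ("Office", "Elm & 2nd"): 8, ("Elm & 2nd", "Meeting"): 8,
    }
    construction = {("Main & 1st", "Meeting"): math.inf}  # participant 1 only
    radio_traffic = {("Oak & 1st", "Meeting"): 10}        # participant 2 only
    print(quickest_route(roads, "Office", "Meeting", construction))
    # → (13, ['Office', 'Oak & 1st', 'Meeting'])
    print(quickest_route(roads, "Office", "Meeting", pool(construction, radio_traffic)))
    # → (16, ['Office', 'Elm & 2nd', 'Meeting'])
```

In this toy map the route chosen from either participant's information alone differs from the route chosen from the pooled information, which is the property the scenario relies on. Taking the maximum when both participants report a delay on the same road is a simplifying choice; summing the delays would be equally defensible.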

Metrics should be based on the hypotheses being tested (step 4).  Material presented in Section 5 of this document provides ideas about how and what to measure.  The metrics presented in Table 16 could be used.

Table 16.  Metrics and Corresponding Hypotheses

Metric | Hypothesis tested
-------|------------------
Overall task time | 1
Expert judgment/quality of route | 2
User questionnaire | 3
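
The methodology does not prescribe a statistical analysis. As one hedged illustration, hypothesis 1 could be checked with a one-sided permutation test on overall task time, sketched below with Python's standard library. The function name, the resample count, and the task times in the example are hypothetical; the sketch assumes each pair contributes one task time and that pairs were assigned independently to the two conditions.

```python
import random
import statistics


def one_sided_permutation_test(audio, no_audio, n_resamples=10_000, seed=0):
    """Estimate a one-sided p-value for 'audio pairs finish faster'.

    The test statistic is mean(no_audio) - mean(audio); larger values
    favour hypothesis 1.  Condition labels are shuffled at random to build
    the distribution expected if audio made no difference.
    """
    rng = random.Random(seed)
    observed = statistics.mean(no_audio) - statistics.mean(audio)
    pooled = list(audio) + list(no_audio)
    n_audio = len(audio)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[n_audio:]) - statistics.mean(pooled[:n_audio])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical task times in minutes, one per pair.
    audio_times = [12.0, 14.5, 11.0, 13.0, 12.5]
    no_audio_times = [16.0, 18.5, 15.0, 17.5, 19.0]
    diff, p = one_sided_permutation_test(audio_times, no_audio_times)
    print(f"mean difference = {diff:.2f} min, one-sided p ~ {p:.4f}")
```

For hypotheses 2 and 3, where higher scores are better, the same function applies with the arguments swapped, so that the statistic becomes mean(audio) - mean(no_audio). A permutation test was chosen here because it makes no normality assumption, which suits the small samples typical of such experiments.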

After running pilot experiments and refining the experimental design (step 5), the experimenters should run the experiment under both audio and non-audio conditions (step 6).
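
The text leaves open how participant pairs are allocated to the two conditions. Assuming a between-subjects design, in which each pair works under only one condition (a within-subjects design would instead counterbalance the order of conditions), a balanced random assignment could be sketched as follows. The function assign_conditions and the pair identifiers are hypothetical.

```python
import random
from collections import Counter


def assign_conditions(pair_ids, conditions=("audio", "no audio"), seed=None):
    """Randomly assign participant pairs to conditions in near-equal groups.

    Pairs are shuffled, then dealt round-robin, so group sizes differ by at
    most one.  A fixed seed makes the assignment reproducible for the record.
    """
    rng = random.Random(seed)
    shuffled = list(pair_ids)
    rng.shuffle(shuffled)
    return {pair: conditions[i % len(conditions)] for i, pair in enumerate(shuffled)}


if __name__ == "__main__":
    pairs = [f"pair-{n:02d}" for n in range(1, 11)]  # ten hypothetical pairs
    assignment = assign_conditions(pairs, seed=2024)
    print(Counter(assignment.values()))
```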

A similar experiment was performed as an initial validation of this methodology.  It is described on the IC&V web site at http://zing.ncsl.nist.gov/nist-icv/experiments/mapnav/mapnav.html