Section 6
Using the Methodology to Design an Experiment
6.1 Introduction
This section concentrates on the way that the methodology described in this paper
influences experimental design. Note that the basics of experimental design will not
be covered, but can be found in such references as Pfleeger (1994). Also, we will not
give complete descriptions of experiments here because they are not needed to
understand how to apply the methodology. (The interested reader can find a
description of an experiment designed with the help of the methodology in Damianos
et al. 1999.)
6.2 Designing an Experiment
To use the methodology,
the experimenters need to decide upon a goal. For example, if the users have a
particular set of requirements and the experimenters are charged with finding which
system best meets those requirements, the experimenters could follow the steps in
Table 14.
Table 14. Example Steps for a Top-Down Evaluation

| Step | Title | Description |
|---|---|---|
| 1 | Identify candidate systems. | Given a set of requirements, either look at a checklist of system capabilities similar to the one in Table 5 of this document or, if none is available, create a similar table that includes the capabilities of interest. The result should be a small set of systems that best meet the basic requirements. |
| 2 | Determine tasks. | Determine what tasks will need to be performed with the systems and abstract these tasks into the task types described in section 3. |
| 3 | Select (or create/modify) a scenario. | Examine a scenario library to see which scenarios support evaluation of the desired task types. If an appropriate scenario cannot be found, create a scenario (see section 4). The scenario should be targeted at the desired evaluation level (e.g., capability, service, and/or technology). For example, if users need to check out and store documents, experimenters might run the scenario at the service level, meaning that several different services for document handling are exercised. Make sure the scenario includes the salient characteristics of the users' specific tasks; if not, tailor the scenario accordingly. |
| 4 | Determine measures. | Choose appropriate measures based on the type of tasks used in the evaluation, the desired evaluation levels, the hypotheses being tested, and the observation resources available (e.g., automated loggers). |
| 5 | Pilot the experiment. | Run the experiment on a representative, balanced group of users to make sure the appropriate data are collected. |
| 6 | Run the experiment. | Repeat the scenario for each system or subsystem under consideration. |

The approach just
described corresponds to a top-down evaluation, beginning with the requirements
level and moving to the technology level.
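
Step 1 of the top-down evaluation amounts to filtering a capability checklist against the users' requirements. The following is a minimal sketch of that filtering, not part of the methodology itself. The system names, the capability names, and the helper functions `candidate_systems` and `missing_capabilities` are all hypothetical and are not taken from Table 5.

```python
# Hypothetical capability checklist: system name -> set of supported capabilities.
# The systems and capabilities are illustrative only, not taken from Table 5.
CHECKLIST = {
    "System A": {"text chat", "audio", "shared whiteboard", "document storage"},
    "System B": {"text chat", "document storage"},
    "System C": {"text chat", "audio", "document storage"},
}


def candidate_systems(checklist, required):
    """Return the systems whose capabilities cover every required capability."""
    required = set(required)
    return sorted(name for name, caps in checklist.items() if required <= caps)


def missing_capabilities(checklist, required):
    """For each system, list the required capabilities it lacks."""
    required = set(required)
    return {name: sorted(required - caps) for name, caps in checklist.items()}


if __name__ == "__main__":
    needs = ["audio", "document storage"]
    print(candidate_systems(CHECKLIST, needs))
    print(missing_capabilities(CHECKLIST, needs))
```

Listing the missing capabilities alongside the candidates may help when no system meets every requirement and the experimenters have to decide which requirements are truly basic.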
Alternatively, the experimenters might be given an alpha version of a new
system and asked to determine the tasks for which it will be useful. This corresponds to a bottom-up approach,
starting with the specific technology and determining the services,
capabilities, and requirements it may support.
The experimenters could follow the steps outlined in Table 15.
Table 15. Example Steps for a Bottom-Up Evaluation

| Step | Title | Description |
|---|---|---|
| 1 | Compile a list of services. | Create a list of basic services provided by the system. |
| 2 | Determine tasks. | Based on the services the system supports, determine what capabilities the system will provide. From these capabilities, devise a list of tasks that the intended users might perform and that these services would support. These tasks should be expressed in terms of the work tasks described earlier. |
| 3 | Select (or create/tailor) a scenario. | Examine a scenario library to see which scenarios support evaluation of the desired work task types. If an appropriate scenario cannot be found, create a scenario. Make sure the scenario includes the salient characteristics of the users' specific tasks; if not, tailor the scenario accordingly. |
| 4 | Determine measures. | Using those listed in section 5 as a starting point, choose appropriate measures based on the type of tasks used in the evaluation, the desired evaluation levels (e.g., service, capabilities, and requirements), the hypotheses being tested, and the observation resources available. |
| 5 | Set criteria. | Set minimum acceptable thresholds for those measures, based on common sense or on “typical” standards that are currently generally accepted. |
| 6 | Pilot the experiment. | Run the experiment on a representative, balanced group of users to make sure the appropriate data are collected. |
| 7 | Run the scenario. | Enact the scenario for the system being evaluated to ensure that the system meets the minimum acceptable thresholds for providing those services, capabilities, and/or requirements. |

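
The criteria-setting and scenario-running steps of the bottom-up evaluation can be read as a comparison of observed measures against minimum acceptable thresholds. The sketch below illustrates one way to record that comparison. The measure names, threshold values, and the functions `evaluate` and `meets_all` are assumptions made for illustration; the methodology does not prescribe them.

```python
# Hypothetical measures and thresholds; names and numbers are illustrative only.
# Each criterion is (threshold, direction): "min" means the observed value must
# be at least the threshold, "max" means it must be at most the threshold.
CRITERIA = {
    "task_completion_rate": (0.80, "min"),
    "mean_task_time_s": (600.0, "max"),
    "satisfaction_score": (3.5, "min"),
}


def evaluate(observed, criteria):
    """Return {measure: True/False/None}; None marks a measure not observed."""
    results = {}
    for measure, (threshold, direction) in criteria.items():
        value = observed.get(measure)
        if value is None:
            results[measure] = None
        elif direction == "min":
            results[measure] = value >= threshold
        else:
            results[measure] = value <= threshold
    return results


def meets_all(results):
    """True only if every measure was observed and passed its threshold."""
    return all(passed is True for passed in results.values())
```

Treating an unobserved measure as a failure rather than a pass is a deliberate choice here: the pilot is meant to confirm that the appropriate data are actually collected.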
6.3 The Map Navigation Experiment
As an example of
how an experiment could be designed using the methodology, consider a task
where two people need to collaborate to share route information. They will need to agree on the best driving
route to use given specific map information, and communicate the information to
another party. This task may be typical
of a military mission, where ground-based units must collaborate to share
information on how to avoid enemy locations and booby-traps when moving troops.
Suppose that the
military planners use MITRE’s CVW to perform these types of route planning
tasks. Making audio available in CVW
under field conditions is difficult, so military planners might be interested
in determining whether audio significantly enhances task performance. This research question can be expressed as
the following hypotheses:
1. People
collaboratively plan a route faster when audio communication is available.
2. People
collaboratively plan a better route when audio communication is available.
3. Participants
will be more satisfied with collaborative route planning when audio
communication is available.
These hypotheses
can be evaluated as part of a top-down evaluation at the service level. Top-down evaluations begin with requirements,
which in this case are the need to exchange information and make plans (step 1
in Table 14). These requirements
correspond to the generic task types of planning and information dissemination
(step 2).
The experimenters
could search the EWG scenario repository, and if no appropriate scenarios are
available, create a scenario (step 3).
The scenario should spring from a readily accessible problem domain, in
this case, collaborative problem solving.
The experimenters could craft the scenario to be readily familiar to a
wide range of potential test subjects: route finding using a street map. The scenario would require information
sharing and collaborative planning by providing each participant with private
information. The scenario could be expressed
this way:
A frantic call comes in.
Colleague #1 is late for an important meeting across town. You know of a few obstacles such as road
construction blockages between Colleague #1 and her meeting location, but
Colleague #2 has the latest report from the local news radio. Together, you and Colleague #2 must devise
the quickest route to the meeting. You
must agree on the route and inform Colleague #1.
Thus, two people must work together to determine the
quickest and best route between two locations.
Each would have a copy of the same map, but each would also have
additional (not shared) information about obstacles in the route (for example,
annotations indicating heavy traffic, one-way roads, construction sites, and
turning restrictions). A time
constraint would be implied; the participants could be told that their
colleague was in a hurry to get to a meeting and was awaiting directions. The work would be completed once both
participants had agreed on a route. The
scenario could be tailored (step 3) to ensure that this exercise reflects a
planning task done under time constraints, along with the need to pool
information and provide the results to non-collocated colleagues.
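
The role of private information in this scenario can be illustrated with a small route-finding sketch. It assumes a made-up road network with travel times in minutes and uses Dijkstra's algorithm, which is a choice made here for illustration; the scenario itself does not specify how participants find a route. The names `ROADS` and `quickest_route` and the blocked roads are hypothetical.

```python
import heapq

# Hypothetical road network: undirected roads with travel times in minutes.
# "A" is the colleague's location and "M" is the meeting location.
ROADS = {
    ("A", "B"): 2, ("B", "M"): 2,
    ("A", "C"): 3, ("C", "M"): 3,
    ("A", "D"): 5, ("D", "M"): 5,
}


def quickest_route(roads, start, goal, blocked=frozenset()):
    """Dijkstra's algorithm over undirected roads, skipping blocked ones.

    Returns (total_minutes, [nodes on the route]) or None if no route exists.
    """
    blocked = {frozenset(edge) for edge in blocked}
    neighbors = {}
    for (u, v), minutes in roads.items():
        if frozenset((u, v)) in blocked:
            continue
        neighbors.setdefault(u, []).append((v, minutes))
        neighbors.setdefault(v, []).append((u, minutes))

    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, minutes in neighbors.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + minutes, nxt, path + [nxt]))
    return None


# Private information held by each participant (illustrative only).
KNOWN_TO_PARTICIPANT_1 = {("B", "M")}  # e.g., a construction site
KNOWN_TO_PARTICIPANT_2 = {("A", "C")}  # e.g., from the traffic report
```

In this made-up network, each participant working alone would choose a route that the other participant knows to be blocked; only the pooled set `KNOWN_TO_PARTICIPANT_1 | KNOWN_TO_PARTICIPANT_2` yields a route that avoids every known obstacle. This is the property that makes the scenario require collaboration.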
Metrics should be
based on the hypotheses being tested (step 4).
Material presented in Section 5 of this document provides ideas about
how and what to measure. The metrics
presented in Table 16 could be used.
Table 16. Metrics and Corresponding Hypotheses

| Metric | Hypothesis tested |
|---|---|
| Overall task time | 1 |
| Expert judgment/quality of route | 2 |
| User questionnaire | 3 |

After performing pilot experiments and refining the experimental design, the
experiment should be run under both audio and non-audio conditions (step 6).
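
For hypothesis 1, the overall task times from the two conditions could be compared with a two-sample statistic. The sketch below computes Welch's t statistic, which is one possible choice and is not prescribed by the methodology. The function names `summarize` and `welch_t` are hypothetical, and the task times in the demonstration are placeholders invented for illustration, not experimental results.

```python
import math
import statistics


def summarize(times):
    """Return (n, mean, sample variance) for a list of task times in seconds."""
    return len(times), statistics.mean(times), statistics.variance(times)


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in means (unequal variances).

    Each sample needs at least two values, and the samples must not both
    have zero variance, or the standard error would be zero.
    """
    n_a, mean_a, var_a = summarize(sample_a)
    n_b, mean_b, var_b = summarize(sample_b)
    standard_error = math.sqrt(var_a / n_a + var_b / n_b)
    return (mean_a - mean_b) / standard_error


if __name__ == "__main__":
    # Placeholder task times in seconds, invented for illustration only.
    audio = [310, 295, 340, 325]
    no_audio = [400, 380, 420, 410]
    print(welch_t(audio, no_audio))
```

A negative statistic indicates that the first sample (here, the audio condition) has the shorter mean task time. Judging significance would additionally require the degrees of freedom and a reference t distribution, which this sketch omits.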
A similar experiment was performed as an initial validation of this
methodology. It is described on the IC&V web site at
http://zing.ncsl.nist.gov/nist-icv/experiments/mapnav/mapnav.html