Interface Design For Information Retrieval:
Evaluation Issues

Edie Rasmussen
School of Library and Information Science
University of Pittsburgh

Information retrieval (IR) is a complex process in which documents (however defined) are obtained from a database in response to some user need represented as a query. The early, naive view that retrieval is a simple matching operation has given way to an appreciation of its complexity, which arises because both the user's information need and the document's content must be described and then matched in some way. Each search of a database is an interactive and iterative process which proceeds in a unique fashion to a conclusion which may or may not fill the user's information need.

The complexity of the IR process makes it difficult to measure the contribution of the interaction between user and system to IR performance. In fact, IR research has traditionally been carried out under rigorously controlled conditions, requiring a static (and usually small) database of documents with previously identified subsets of documents relevant to a series of established queries. Such a laboratory environment (which is easily duplicated by the research community) allows researchers to focus on those aspects of system design which do not involve user interaction, such as the means by which documents are represented or indexed, and the means by which a query is matched against the database and subsequently reformulated to retrieve relevant items. The performance measures traditionally used, recall and precision, are based on predetermined relevance judgments, usually involving subject experts dealing with artificially generated queries.
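As a minimal sketch of how these two measures follow from predetermined relevance judgments, assume the usual set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The function name, the document identifiers and the sample judgments below are hypothetical and serve only as illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant in advance
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical test collection: one query with fixed relevance judgments.
relevant_docs = {"d1", "d3", "d7", "d9"}
retrieved_docs = ["d1", "d2", "d3", "d4", "d5"]

p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=0.50
```

Both values depend entirely on the fixed set of judgments, which is why such measures say nothing about how a user with a real information need would have judged the same output.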

There has been little or no place for users in these laboratory experiments. Paradoxically, the tight control over experimental conditions, which contributed to the rigour of the experimental design, also led to criticism that the results did not reliably reflect those to be found under operational conditions. This concern was one factor leading to the TREC (Text REtrieval Conference) series of experiments. The innovations introduced in TREC experiments have included the use of significantly larger databases, full-text documents, pooled relevance judgments made on the fly, and the creation of an international environment for large-scale testing which facilitates the exchange of programs, techniques and results.
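The idea of pooled relevance judgments can be sketched as follows, assuming the common form of pooling in which only the union of the top-ranked documents from each participating system is passed to assessors, rather than the whole database. The system names, rankings and pool depth are hypothetical; the paper does not specify them.

```python
def build_pool(runs, depth):
    """Merge the top `depth` documents of every run into one judging pool.

    runs:  mapping of system name -> ranked list of document ids
    depth: how many top-ranked documents each system contributes
    """
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


# Hypothetical ranked output from three systems for the same query.
runs = {
    "system_a": ["d4", "d1", "d9", "d2"],
    "system_b": ["d1", "d7", "d4", "d5"],
    "system_c": ["d3", "d1", "d8", "d6"],
}

pool = build_pool(runs, depth=2)
print(sorted(pool))  # ['d1', 'd3', 'd4', 'd7']
```

Only the pooled documents are judged, which keeps the assessment effort manageable for large databases at the cost of leaving unpooled documents unjudged.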

As the TREC experiments have progressed through four annual cycles, there has been an increasing interest in making them more representative of the entire information retrieval process. Although the initial TREC methodology permitted manual or automatic query generation, the experimental design did not really allow for significant human-computer interaction within the search process. Recently, however, there has been interest in introducing an "interactive track" into the TREC experiments, with experimental protocols which will allow interaction to be evaluated. The aim is still to provide laboratory conditions for testing: the user is real, but the "information need" is artificial (Beaulieu et al., 1996).

The research model that ignores the contribution of user interaction to a successful IR process also ignores the role of the interface. In IR the interface is not simply a device for making the system more user-friendly, more efficient and less error-prone; it can also contribute significantly to the retrieval process by facilitating query construction, incorporating feedback, modifying the query, and adapting the outcome to the user's needs. The challenge is to find a means by which the contribution of the interface to IR performance can be measured. Most IR research which examines user-system interaction has focussed on behavioural aspects using a case study approach, or has been limited to an analysis of the interface. An approach to interface evaluation which isolates the interface from the system, while useful in measuring aspects of usability, does not measure the contribution of the interface in facilitating user-system interaction in terms of retrieval performance. The laboratory environment, for its part, does not adequately replicate the impact of a real information need on the process. Moreover, it does not necessarily measure performance in terms which reflect the user's preferences (Su, 1992). Opting for operational tests makes a study more realistic but makes it difficult to relate results to specific causes. The relative merits of using a laboratory experiment or an operational environment are discussed by Tague-Sutcliffe (1992).

Tague-Sutcliffe has pointed out that "[t]he user interface must be evaluated on the basis of how well it meets the user's needs along several dimensions, including informativeness, user friendliness, and response time" (Tague and Schultz, 1989, p. 388). It seems that comprehensive testing of interface design for information retrieval requires several components: usability testing of the interface in isolation; laboratory testing of the potential of the interface to improve performance; and operational testing to examine the pattern of user-system interaction. The development of an integrated approach to the evaluation of interfaces for information retrieval would be a valuable contribution to the field.


  1. M. Beaulieu, S. Robertson, and E. Rasmussen (1996). Evaluating interactive systems in TREC. Journal of the American Society for Information Science 47(1): 85-94.
  2. L. Su (1992). Evaluation measures for interactive information retrieval. Information Processing & Management 28(4): 503-516.
  3. J. Tague and R. Schultz (1989). Evaluation of the user interface in an information retrieval system: a model. Information Processing & Management 25(4): 377-389.
  4. J. Tague-Sutcliffe (1992). The pragmatics of information retrieval experimentation, revisited. Information Processing & Management 28(4): 467-490.
