Much of the focus of standardization for functional access has been on computer-to-computer rather than human-to-resource interactions. For example, the Z39.50 protocol "specifies formats and procedures governing the exchange of messages between a client and server enabling the client to request that the server search a database and identify records which meet specified criteria, and to retrieve some or all of the identified records."[1] During the past six months I have been a member of a working group to develop a Z39.50 profile for digital collections. I have been struck by the insistence of most Z39.50 implementers that such profiles should not include any specifications for the client software, let alone the user interface. Important functional issues related to facilitating user access to large heterogeneous information stores spread across dispersed information systems, such as consulting authority files when formulating searches, appear to be outside the scope of standards such as Z39.50. Z39.50 is about interoperability between dispersed information systems. It is the technical protocol that underlies the capability to search any library catalog from a desktop. It allows access to these remote resources in a familiar mode: the interface to one's local public access system. But how well are these local interfaces designed for end-user searching and filtering?
Library public access systems, by and large, have been designed to support a set of access functions or common commands that are the result of another NISO standard, Z39.58. This standard defines "a basic set of commands to be used by those who communicate with online information retrieval systems,"[2] that is, a procedural language for human-to-computer interactions. While few would argue that library public access systems are not an important improvement over card catalogs, their greatest strength is simply in providing distributed access to some rather large, relatively homogeneous databases.
The USMARC (ISO 2709) structural content standard has provided an enormously successful framework for creating the highly homogeneous set of library catalog databases. But this standard has promulgated a "shallow" or "thin" data structure with very few, loosely controlled access points. Libraries and archives have pursued a strategy of building upon these shallow data structures to create their catalogs, inventories, and finding aids, among other genres of library documentation.
Data content standards such as the Anglo-American Cataloguing Rules (AACR) and Library of Congress Subject Headings (LCSH) have been subject to changes in terminology over time, to policies such as superimposition that lead to inconsistencies, and to inter-indexer inconsistency. By design these standards tend not to produce "interpretive" content, and they often fail to accomplish the goal of grouping "like" (related) materials together. They particularly fail when different types of documentary materials, such as book catalogs, journal indexes, and archival finding aids, created by dispersed agencies and originating from distributed databases, are accessed as a whole, as a "virtual library."
Libraries should take their cues from multimedia and Web developers by adopting the well-established design practices for graphical user interfaces, simplified markup schemes to create a homogeneous yet flexible presentation, simple navigation through hypertext links and form-fill-in screens, and the Internet telecommunications infrastructure.
The recent acceptance of World Wide Web technology by a hugely diverse user base is, I believe, due to the apparent homogenization of appearance -- by means of a simplified HTML markup -- of content as diverse as library catalogs and automobile advertisements. Generalized markup, postulated by Goldfarb[3], can be used to describe the structure of a document (its elements), its content (attributes), and links to internal elements or external entities. Any kind of information object can be marked up, and information objects can be marked up with "shallow" or "deep" knowledge representations. While superficially homogeneous in appearance, the heterogeneity of the resources represented and accessible on the Web is increasing daily. On the Web, I think, the emphasis is more on content flow than architecture, allowing a mix of shallow and deep knowledge representations accessible and presented through a simple, homogeneous interface.
For the general user, the Web interface is extremely simple to use, reducing the task of resource discovery and navigating vast information spaces to clicking on hypertext links. Simple form fill-in screens rendered with a similar appearance to general Web pages replace command-driven search interfaces. Gateways can be created from Web servers to other information systems such as Z39.50 clients and servers (e.g., OCLC's Web-Z). While in many instances this appears to be no more than putting the old online library catalog in a "window," it does facilitate access to these important information resources. The functionality of such gateways may become more sophisticated with the recent introduction of downloadable "applet" technology. However, since the success of the Web is due to its simplicity, it is an open question whether applets implemented using Sun's Java or Microsoft's Blackbird will confound users with more complex functionality or will retain existing users and attract new ones.
Today, Web client software is free (or inexpensive), making it ubiquitously available. But what makes the Web so stupendously successful is the relative ease of getting connected to the underlying TCP/IP-based telecommunications infrastructure of the Internet. These trends hold out the potential to provide seamless access to information located across many separate systems employing many different technical architectures, data structures, and data content. But the means of discovering and navigating among these resources in meaningful ways are still primitive.
Web filtering tools such as intelligent agents make use of lexicons to interactively index and search the Web. "Pruned" thesauri should be adapted for discipline-specific interactive searching agents. High granularity standards should be applied to create discipline-specific indexes and assemble links among public digital resources to form virtual collections.
The Web's success in providing an infrastructure for heterogeneous knowledge representation and navigation has come at the expense of the filtering that agreement on common, controlled access points would provide. Finding information on the Web is not difficult, but searching exhaustively for "qualified" answers is practically impossible. A variety of Web indexes and filters such as Yahoo, Infoseek, and Lycos have been created, but the granularity of these tools is very coarse. It is not well known that Web indexes, and even more so a new generation of Web filtering tools or intelligent agents such as WebCompass, make use of lexicons as knowledge bases for automatically indexing the Web. More robust than the so-called selective dissemination of information (SDI) profiles from the Dialog days, intelligent agents operate on user profiles that are specified as lexicons, micro-thesauri, or knowledge bases. These new tools operate through a succession of filtering steps, interacting with the user to refine the profile and the search results at each step. Beginning with a coarse identification of resources through the generic Web indexes, the agent systematically and continuously searches the specific sites selected by the user, page by page, for matches. The result is a database of matches constructed at the user's site. Intelligent agent technology relies on building lexicons and user profiles to collect useful results, but the process for building such lexicons is generally not well understood (even by librarians) and there are few tools for building them.
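The succession of filtering steps described above can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions, not as a description of how WebCompass or any other agent is actually built: the user profile is assumed to be a weighted lexicon of single words, the pages are plain strings standing in for documents already retrieved from user-selected sites, and the names `score_page`, `filter_pages`, the weights, and the threshold are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a page and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_page(text, lexicon):
    """Sum the lexicon weight of every occurrence of a lexicon term on the page."""
    counts = Counter(tokenize(text))
    return sum(weight * counts[term] for term, weight in lexicon.items())


def filter_pages(pages, lexicon, threshold):
    """Return (url, score) pairs scoring at or above the threshold, best first."""
    scored = [(url, score_page(text, lexicon)) for url, text in pages.items()]
    matches = [(url, score) for url, score in scored if score >= threshold]
    return sorted(matches, key=lambda match: match[1], reverse=True)


# Hypothetical user profile expressed as a weighted lexicon.
profile = {"fresco": 3, "mural": 2, "painting": 1}

# Hypothetical pages from sites the user picked during the coarse first pass.
pages = {
    "http://example.org/a": "Fresco painting and mural conservation",
    "http://example.org/b": "Used automobile listings and prices",
    "http://example.org/c": "Oil painting techniques",
}

# First pass: only page "a" reaches the threshold.
results = filter_pages(pages, profile, threshold=2)

# The user refines the profile by adding a term; page "c" now qualifies too.
refined = dict(profile, oil=2)
second_pass = filter_pages(pages, refined, threshold=2)
```

The list returned by `filter_pages` plays the part of the database of matches built at the user's site. A real agent would also need to fetch pages, handle multi-word terms, and revisit sites over time, none of which is attempted here.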
Nevertheless, many lexicons and thesauri exist as by-products of the abstracting and indexing services in circumscribed disciplines or fields. While these thesauri tend to be broader in scope than those needed for intelligent agents, they are available today in machine-readable form. Through "pruning" (removing, for example, entries needed only in printed indexes), such thesauri could be adapted for query navigation to expand or focus searches on information resources today. At the Getty Art History Information Program, the Art & Architecture Thesaurus (AAT) and Union List of Artist Names (ULAN) are being tested as lexicons for semi-automatic query generation and navigation against a variety of databases accessible on our Web server. This query navigation tool will become available for public access soon. The AAT is also being used in conjunction with the Cultural Heritage Information Online (CHIO) demonstration project to filter searches against museum exhibition catalog and object record databases.[4]
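To make the idea of pruning and query expansion concrete, here is a minimal Python sketch. It rests on several assumptions: the toy thesaurus is invented for illustration and is not taken from the AAT or ULAN, the `print_only` flag is a hypothetical way of marking entries that exist only to serve a printed index (an inverted heading, for instance), and `prune` and `expand` are hypothetical helper names rather than part of any existing tool.

```python
def prune(thesaurus):
    """Drop entries flagged print_only, along with any references to them."""
    kept = {term for term, entry in thesaurus.items() if not entry.get("print_only")}
    return {
        term: {"narrower": [n for n in thesaurus[term].get("narrower", []) if n in kept]}
        for term in thesaurus
        if term in kept
    }


def expand(term, thesaurus):
    """Return the term plus every narrower term reachable from it, breadth-first."""
    if term not in thesaurus:
        return [term]  # unknown terms are searched as entered
    seen, queue = [], [term]
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        queue.extend(thesaurus.get(current, {}).get("narrower", []))
    return seen


# Invented toy thesaurus; "Painting, fresco" stands in for a print-only entry.
toy = {
    "painting": {"narrower": ["fresco", "watercolor", "Painting, fresco"]},
    "fresco": {"narrower": []},
    "watercolor": {"narrower": []},
    "Painting, fresco": {"narrower": [], "print_only": True},
}

pruned = prune(toy)
query_terms = expand("painting", pruned)
```

Expanding a broad term into its narrower terms widens a search; a tool could just as well walk in the other direction, from a narrower term to its parent, to refocus one. Only the widening case is sketched here.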
Applying the same high granularity standards used by abstracting and indexing services in specific disciplines or fields, highly selective and analytical indexes to Web resources could be created to aid users in filtering resources. As more information resources are made accessible on public information servers, virtual collections of digital objects located on remote as opposed to local servers can be created. Assembling pointers among related digital objects and arranging for the appropriate intellectual property rights for such virtual collections could evolve into a role analogous to the one libraries and librarians have played in building physical collections of library materials. The development of virtual collections will require the transformation of the rather informal uniform resource locator (URL) scheme into a formal public identifier (FPI) mechanism such as the proposed uniform resource name (URN), something analogous to the international standard book number (ISBN).
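The reason a virtual collection benefits from persistent identifiers rather than bare locations can be shown with a small Python sketch. Everything in it is an assumption made for illustration: the `Resolver` class, the `resolve_collection` function, and the identifiers and addresses are invented, and no actual naming scheme or resolution service is being described.

```python
class Resolver:
    """Hypothetical registry mapping persistent names to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, name, url):
        """Record, or update, the current location of a named object."""
        self._locations[name] = url

    def resolve(self, name):
        """Return the current location for a name, or None if it is unknown."""
        return self._locations.get(name)


def resolve_collection(names, resolver):
    """Split a collection into resolved name-to-URL pairs and unresolved names."""
    resolved, missing = {}, []
    for name in names:
        url = resolver.resolve(name)
        if url is None:
            missing.append(name)
        else:
            resolved[name] = url
    return resolved, missing


resolver = Resolver()
resolver.register("urn:example:catalog-17", "http://museum-a.example/cat/17")
resolver.register("urn:example:object-204", "http://museum-b.example/obj/204")

# The virtual collection stores names only, never locations.
collection = [
    "urn:example:catalog-17",
    "urn:example:object-204",
    "urn:example:finding-aid-9",  # not yet registered anywhere
]

before, missing = resolve_collection(collection, resolver)

# The object moves to a new server; the collection itself does not change.
resolver.register("urn:example:object-204", "http://archive.example/obj/204")
after, _ = resolve_collection(collection, resolver)
```

Because the collection holds only names, moving an object requires one update in the registry instead of a correction in every collection that points to it, much as an ISBN identifies a book regardless of which shelf holds it.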
Notes