Intent to Develop Federated Metadata Search Capabilities Between PICES Member Country Data Centers
The North Pacific Ecosystem
Metadatabase (NPEM), a project co-sponsored by PICES, wishes to extend the
metadata searched by its users to include metadata from PICES member country
Ocean Data Centers. To implement this feature, called a “federated search”,
staff of the NPEM will work closely with international partners to construct
and coordinate the required data translation dictionaries as well as to install
the necessary technical infrastructure to allow remote computers to
communicate.
The PICES Technical Committee on
Data Exchange (TCODE) has offered to help NPEM find partners. At the Twelfth Annual PICES meeting in
The purpose of this document is
to inform national TCODE representatives of this opportunity and to provide
potential partners with basic knowledge about federating. The technology that NPEM suggests for
federated searches is called Z39.50. It
has been proven in a wide variety of applications, and
the software that implements it is simple to acquire, install and configure.
What is Z39.50?
Z39.50
is a protocol that specifies data structures and interchange rules. The protocol permits a client computer to
search databases on a server computer and retrieve records that the search
identifies. Implementation of the Z39.50 protocol requires installation of
freely available, open-source software on client and server computers. Users who log on to the client computer then
have transparent access to its data and data on any server computers with which
it exchanges information.
What does it do?
Z39.50 enables
communication between databases on computer systems. This communication could
be between Ocean Data Centers (ODCs). A
PC user in China could access the Japan ODC (JODC) through the WWW, submit an on-line
search for ‘chum salmon’, and specify that the search examine not only the
holdings of the JODC, but also any other ODCs that share the protocol. Search results are returned to the user in
Real examples of this
type of distributed search include NOAAServer
(http://www.esdim.noaa.gov/NOAAServer/), and
The protocol
specifies Facilities, Services, Attributes, Syntaxes and Profiles. Simplifying hugely, a session has three steps: initialization, search and retrieval. Initialization might be a
greeting from the client computer ("Hello, do you speak English?")
and a related response from the server ("Hello. Yes, I do. Let's
talk"). Without this positive two-way dialogue, the session cannot
proceed.
A search request is then transmitted from the client ("OK, can I have everything you have on ‘chum salmon’?"), and is responded to by the server ("I've got 25 records matching your request, and here are the first five. As you didn't specify anything else, I've sent them to you in MARC format, so I hope that's OK.").
Finally, the client asks for the data it wants ("25,
eh? Can I have the first ten, please? Oh, and I don't really like MARC, can you
send me unstructured text?"), resulting in the transmission of the records
themselves from the server.
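The three-step dialogue above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ToyServer class, its method names, and the sample holdings are hypothetical, everything runs in memory, and no real Z39.50 software or network connection is involved.

```python
class ToyServer:
    """In-memory stand-in for a Z39.50 server (hypothetical; no networking)."""

    SYNTAXES = ("MARC", "TEXT")  # record formats this toy server can send

    def __init__(self, records):
        self.records = records
        self.session_open = False
        self.result_set = []

    def init(self, version):
        # Initialization: the session opens only if both sides agree.
        self.session_open = version in (2, 3)
        return self.session_open

    def search(self, term):
        # Search: build a result set on the server, report the hit count,
        # and send the first five records in the default format.
        self._require_session()
        self.result_set = [r for r in self.records if term.lower() in r.lower()]
        return len(self.result_set), self.present(1, 5)

    def present(self, start, count, syntax="MARC"):
        # Retrieval: return `count` records from 1-based position `start`.
        self._require_session()
        if syntax not in self.SYNTAXES:
            raise ValueError(f"unsupported record syntax: {syntax}")
        chosen = self.result_set[start - 1:start - 1 + count]
        return [f"[{syntax}] {r}" for r in chosen]

    def _require_session(self):
        if not self.session_open:
            raise RuntimeError("session not initialized")


# Hypothetical holdings: 25 matching records plus one that does not match.
holdings = [f"chum salmon survey {n}" for n in range(1, 26)] + ["pink salmon survey"]
server = ToyServer(holdings)

accepted = server.init(version=3)                        # "Do you speak English?"
hit_count, first_five = server.search("chum salmon")     # "Everything on chum salmon?"
first_ten_text = server.present(1, 10, syntax="TEXT")    # "First ten, as plain text."
```

Note that the search step leaves its result set on the server, so the client can come back for more records, or for a different format, without repeating the search.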
To see how a federated search works,
try the CIIMMS website, which has implemented the Z39.50 protocol. You will search several different databases
for occurrences of the keyword “salmon”.
The spatial domain of your search is all of Alaska.
Start the procedure by connecting
to the CIIMMS website (http://info.dec.state.ak.us/ciimms/). In the left frame, click on “Search” to go to
the search page, or click “Advanced Search” on the main page. In the “Search for” box, type salmon.
Next to the “Search for” box is the “In:” box. Pull down its menu and
select “Subject/Keywords”. Select the geographic limits of your search from
the pull-down menu under the compass on the right side of the page, whose
default entry reads “Select Co-ords using Pre-defined Areas”; choose “Alaska Statewide”. So far, you have built a search that will
look for the keyword salmon occurring
in records from all over Alaska. Now it
is time to declare which databases you will search. This is the federated feature. At the bottom of the page is a table
containing two columns. The column on
the left is labeled “Databases searched”, the column on the right is labeled
“Database select buttons”. There are
nine databases from which to search: two CIIMMS databases, two public
libraries, three geospatial data clearinghouses, and two web-page
collections. Boxes may be checked
manually or via buttons provided in the right column for easy selection. For
this search, you will select the “CIIMMS Databases” button in the right
column. Finally, click the “Search”
button near the middle of the page to initiate the search.
During the search, a status page
provides a table listing databases searched, status of the search (successful
or failed), and result count. When the
search is completed, the database name can be clicked to view any records
matching the search criteria.
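The status page described above suggests how a federated search behaves behind the scenes: one query is sent to every selected database, and each database reports success or failure independently. A minimal Python sketch follows, assuming hypothetical in-memory databases in place of remote servers; the database names, sample records, and function names are invented for illustration and are not taken from CIIMMS.

```python
from concurrent.futures import ThreadPoolExecutor


def make_database(records):
    """Return a search function over an in-memory list (hypothetical source)."""
    def search(term):
        return [r for r in records if term.lower() in r.lower()]
    return search


def unreachable_database(term):
    """Simulates a remote server that cannot be contacted."""
    raise ConnectionError("server did not respond")


def federated_search(term, databases):
    """Send one query to every database; return one status row per database."""
    def query(item):
        name, search = item
        try:
            records = search(term)
            return {"database": name, "status": "successful",
                    "count": len(records), "records": records}
        except Exception:
            # A failure in one database must not abort the others.
            return {"database": name, "status": "failed",
                    "count": 0, "records": []}

    # Query all databases concurrently; map() keeps the rows in input order.
    with ThreadPoolExecutor(max_workers=max(1, len(databases))) as pool:
        return list(pool.map(query, databases.items()))


databases = {
    "Library catalogue": make_database(["Salmon of Alaska", "Tides"]),
    "Web pages": make_database(["salmon run report", "salmon habitat"]),
    "Clearinghouse": unreachable_database,
}
status_rows = federated_search("salmon", databases)
for row in status_rows:
    print(f"{row['database']:<20}{row['status']:<12}{row['count']}")
```

Because each database is queried separately, a server that is down shows up as one failed row in the table rather than stopping the whole search.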
Information and format will vary with database type. For instance, the geospatial data
clearinghouse databases and the in-house databases will provide metadata in the
FGDC data format, while the library databases will provide information
pertaining to publication in the usual library reference format.
In your search for salmon in the CIIMMS databases, you
probably found no matching records. Now
repeat the search, but this time, select the “Clear/Select All” button to
enable searching against all the databases.
The results of this search should show over 3000 matches in the ARLIS
and Web harvest databases.
Regardless of the type of
information and the format given, the CIIMMS website demonstrates the benefits
that the Z39.50 protocol provides. There
is no need to relocate or merge data or databases, and compatibility between database formats is not an issue. The organization that is responsible for a
database continues to manage the database and its data format the way it
was designed. The only cost comes
from the resources needed to implement the Z39.50 protocol. Once that is in place, there should be very
little ongoing cost.
Z39.50 has been used extensively for long enough to demonstrate its robustness. As new technologies such as XML and RDF begin to fulfill aspects of the information discovery and retrieval process, work is underway to capitalize upon them, and to tie such technologies more closely to Z39.50. It appears for the moment that Z39.50 is the one effective means of enabling simultaneous queries upon distributed heterogeneous databases, and this remains something that the broader user community wants to be able to do.
NPEM hopes that you will consider
the obvious benefits of joining a North Pacific marine data federation. Please pass this information to the
appropriate official at your national Ocean Data Center, and let NPEM know who that
official is.