With the development of the World Wide Web, building links between heterogeneous, distributed sources of information has recently become an important trend, and it is also an important topic for international partnership among service providers.
Historically, the CDS services were developed separately, with different contents, functionalities, database management systems and user interfaces. The World Wide Web opened the possibility of increasing the synergy between the services by building links that allow users to navigate in a transparent way. Maintenance becomes a major challenge, however, as soon as one tries to build links between distributed, heterogeneous services: any change in the service address, or in the query syntax, breaks links. This is particularly difficult when "anchors" (links in HTML syntax) are hard-coded in the HTML pages. CDS has solved this problem by developing the Générateur de Liens Uniformes (GLU), a software package which manages a distributed dictionary of resources (Fernique et al. [1998]). Each resource is described by its address, the query syntax, test information, links to description and help files, etc. The GLU Dictionary descriptions are kept up to date by each service provider and shared among all participants. The GLU Resolver allows the service manager to use symbolic names, instead of physical names, for the links; these names are then translated on the fly using the information contained in the GLU Dictionary.
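The principle of resolving symbolic names through a shared dictionary can be illustrated with a minimal sketch. It is not the GLU implementation and does not reproduce the GLU record syntax, which is not described here: the dictionary layout, the resource name `example.object`, the server address and the function `resolve` are all assumptions made for illustration.

```python
from string import Template
from urllib.parse import quote_plus

# Hypothetical dictionary: symbolic resource name -> description record.
# The field names and the URL are invented for this sketch.
RESOURCE_DICTIONARY = {
    "example.object": {
        "url": "http://server.example/query?object=$name",
        "description": "Hypothetical object lookup service",
    },
}


def resolve(symbolic_name, dictionary=RESOURCE_DICTIONARY, **params):
    """Translate a symbolic link name into a physical URL at request time."""
    record = dictionary[symbolic_name]
    # URL-encode each parameter before substituting it into the query syntax.
    quoted = {key: quote_plus(str(value)) for key, value in params.items()}
    return Template(record["url"]).substitute(quoted)


link = resolve("example.object", name="M 31")
```

Because pages refer only to the symbolic name, a change of address or query syntax is handled by updating the dictionary record, and the pages themselves stay unchanged.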
The GLU development has allowed CDS to build reliable links between its own services, to manage mirror copies, and to implement a common presentation of the CDS pages, with homogenized headers.
Moreover, the GLU is being shared with all the partners of the AstroBrowse NASA initiative: information retrieval tools are being developed to provide homogeneous access to a large list of resources maintained in a common GLU Dictionary (Heikkila et al. [1997]). One of these tools, AstroGLU (Egret et al. [1998]), is developed by CDS. It allows users to search on-line services such as observatory archives, databases, etc., by coordinates, astronomical object names, astronomer names, keywords, etc. In fact, AstroGLU is a Web interface to the GLU Dictionary.
GLU is also used by the French Centre de Données de la Physique des Plasmas (CDPP).
Starting with the Bibliographical Star Index (BSI, Ochsenbein [1982]) as early as 1975, CDS has always dealt with bibliographic data: references and object citations in published papers are stored in SIMBAD, and published tables in the catalogue service. The last few years have seen a revolution in this domain, with the extremely rapid development of electronic publication, which has led to major conceptual evolutions in the work of journal editors and publishers, and in the usage of published information by scientists.
The collaboration with the journal Astronomy and Astrophysics, for which CDS implements on-line abstracts and tables in close cooperation with the editors, was established in 1993, very early in the history of electronic publication (Ochsenbein & Lequeux [1995]). As explained in the companion paper by Ochsenbein et al. ([2000]), the standard description of tabular catalogues proposed by CDS in 1994 has since been accepted by other reference journals and by the collaborating data centers. It is now one of the important exchange standards for astronomy, allowing for data exchange, transformation and checks, complementary to FITS, which is widely used for binary and image data. A new standard in XML is presently being implemented to format tables (Ochsenbein et al. [2000]) and to facilitate interoperability between services. In particular, this standard has been implemented in VIZIER, and is already used for data ingestion by ALADIN.
The CDS role in the world-wide astronomy bibliographical network, sometimes called Urania (Boyce [1998]), has several aspects (Lesteven et al. [1998]):
The definition of exchange standards such as the bibcode and the standard description of tables, together with the close collaboration with the journals and the ADS, has permitted an excellent synergy among the on-line bibliographic services. For instance, data exchange, links, and the exchange and installation of mirror copies have been implemented between CDS and ADS, which also uses SIMBAD as a name resolver. The on-line versions of Astronomy and Astrophysics and the Supplement Series contain links to the CDS catalogue service, as part of the publication, and to the list of SIMBAD objects for each paper.
The Data Center has also brought new methods to validate the journal contents, complementary to the referees' work: tools have been developed to check the consistency of data in electronic tables, and detected errors are reported directly to the author by CDS before publication, and corrected.
In addition, the development of semi-automatic methods for recognition of astronomical object names in texts is being studied (Lesteven et al. [1998]). This is rendered difficult by the extreme complexity of astronomical nomenclature, but there are potentially innovative applications, such as building links between object names in journal articles and the information contained in SIMBAD. A prototype implementation is operational at CDS in a simple case (object names in abstract keywords). New Astronomy also provides links from object names in articles to SIMBAD and NED, with manual tagging and verification. But many fundamental questions remain to be solved, e.g. the management of links between object names in journals that remain unchanged, and object names contained in databases which may change.
The objective is to use the CDS as a "hub" to observatory archives: each CDS service, with its own functionalities, allows the user to select the observations he or she would like to check, and to access them through an http link to the archive service.
VIZIER is potentially a major tool to access observatory databases: the archive holdings are normally listed in a "log", i.e. in a table which contains the list of available observations with some additional information, such as the instrument mode, time and duration of observation, target position, target name, PI name, etc. Data in tabular form are very easy to include in VIZIER: one just has to build their description in standard format. A data archive log included in VIZIER can be searched by querying any of its fields, thus allowing the user to select the information of interest. The next step is to build links between the log entries in VIZIER and the data in the archive: this is already operational for several archives, in collaboration with the data providers, and using the GLU to implement the links. One also has to update evolving logs, in order to implement links to on-going space missions or ground-based programs. This has been developed in recent years, and is now fully operational. In November 1999, VIZIER was able to access the FIRST/VLA survey data, and the IUE and HST archives. Discussions are under way with several other projects.
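The two steps described above, selecting log entries by any field and then linking each selected entry to the archive, can be sketched as follows. This is only an illustration under stated assumptions: the log rows, the field names, the archive address and the functions `select` and `archive_link` are invented here and do not reflect the VIZIER or GLU interfaces.

```python
from urllib.parse import quote_plus

# Invented observation log: one row per available observation.
OBSERVATION_LOG = [
    {"obs_id": "A001", "instrument": "CAM1", "target": "Target A", "pi": "PI One"},
    {"obs_id": "A002", "instrument": "SPEC", "target": "Target B", "pi": "PI Two"},
    {"obs_id": "A003", "instrument": "CAM1", "target": "Target B", "pi": "PI One"},
]

# Hypothetical archive query syntax; in practice a link dictionary would supply it.
ARCHIVE_URL = "http://archive.example/retrieve?obs={obs_id}"


def select(log, **criteria):
    """Return the log rows whose fields match every given criterion."""
    return [
        row for row in log
        if all(row.get(field) == value for field, value in criteria.items())
    ]


def archive_link(row, template=ARCHIVE_URL):
    """Build the link from one log entry to the corresponding archive data."""
    return template.format(obs_id=quote_plus(row["obs_id"]))


selected = select(OBSERVATION_LOG, instrument="CAM1", target="Target B")
links = [archive_link(row) for row in selected]
```

Keeping the query syntax in a single template means that an evolving log only requires new rows, not new link code.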
Implementation of links from SIMBAD to data archives is less straightforward, since the logs are usually not easy to cross-identify with the database. This is done on a case-by-case basis. Links to IUE and HEASARC are available at present: the IUE log has been cross-identified with SIMBAD, taking advantage of the fact that CDS had homogenized the mission target nomenclature on behalf of ESA (Jasniewicz et al. [1990]); for the links to HEASARC, the high energy objects are recognized by checking the list of identifiers for names coming from a high-energy mission (e.g., RX or 1RXS, among others, for ROSAT). More will be done in the future through the implementation in SIMBAD of links pointing to VIZIER.
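The identifier check used for the HEASARC links can be sketched as a simple test on the acronym of each identifier. The sketch assumes that the acronym is the first whitespace-separated token of the identifier, which is a simplification; only the ROSAT acronyms RX and 1RXS come from the text, and the function name and sample identifiers are invented placeholders.

```python
# Acronyms named in the text for ROSAT; other missions would be added likewise.
HIGH_ENERGY_ACRONYMS = {
    "ROSAT": ("RX", "1RXS"),
}


def high_energy_missions(identifiers, acronyms=HIGH_ENERGY_ACRONYMS):
    """Return the set of missions whose acronyms appear among the identifiers."""
    found = set()
    for identifier in identifiers:
        tokens = identifier.split()
        if not tokens:
            continue  # skip empty identifiers
        # Assumption: the first token of an identifier is its catalogue acronym.
        for mission, names in acronyms.items():
            if tokens[0] in names:
                found.add(mission)
    return found


# Placeholder identifiers, not real objects.
missions = high_energy_missions(["HD 000000", "1RXS J000000.0+000000"])
```

An object for which the returned set is not empty would then receive a link to the high-energy archive.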
ALADIN gives access to data archives through their logs in VIZIER, and is also able to display archive images. This is a major evolution towards a comprehensive tool permitting comparison of images at different resolutions or wavelengths, with active links to the original data.
The large surveys underway or planned at different wavelengths, such as DENIS and 2MASS in the infrared, SLOAN at optical wavelengths, and the large Schmidt telescope plate catalogues (GSC I and II, USNO, APM, etc.), play an important role, both for multi-wavelength studies and by providing reference objects. Astronomers thus need easy access to the data of each survey, and also tools to use the data from one survey together with information from other origins. These needs have recently been summarized in the concept of "Virtual Observatory" (see e.g. Szalay & Brunner [1998]).
CDS has been involved in active discussions with the major survey projects in the last few years. As explained in Ochsenbein et al. ([2000]), an efficient method to query very large tables by position has been implemented in the CDS catalogue service, with the same user interface as VIZIER, for tables larger than the few million objects manageable in relational systems. The USNO catalogue (520 million objects) and the public data of DENIS and 2MASS have been made rapidly available in this service. The APM catalogue will also be installed soon, as well as GSC II as soon as it is publicly available. ALADIN gives access to the surveys implemented in VIZIER, and is very useful for data validation and for the assessment of criteria for statistical cross-identification.
Moreover, CDS has been contributing to the DENIS project, by developing an on-line service to distribute public and private information (Derriere et al. [2000]), and data comparison with the information in the other CDS services has already served for data validation. CDS also participates in TERAPIX (data pipeline of the CFHT MEGAPRIME project): it will distribute the result catalogue and probably also summary images.
In addition to the present access to very large catalogues by coordinate queries, the use of commercial object-oriented database systems for multicriteria access to very large catalogues is being evaluated (Wenger et al. [2000b]). Moreover, the ESO/CDS Data Mining project aims at accessing and combining information stored at ESO or CDS, and at performing cross-correlations in all the parameter space provided by the data catalogues, without restricting the correlations to positional ones (Ortiz et al. [1999]).
Copyright The European Southern Observatory (ESO)