Up: The VizieR database of


  
4 Standardized description of astronomical catalogues


Figure 1: Example of a documentation ReadMe file

Making use of the data contained in a set of rapidly evolving catalogues, as illustrated by Table 2, raises the problem of accessing and accurately understanding the parameters contained in catalogues which are constantly being improved. Typical questions to be addressed are: does the catalogue contain colours; if so, how reliable are they; are they expressed in a well-known standard system; are they taken from other publications or catalogues; how can the associated data file be processed? All these details which describe the data -- the metadata -- are traditionally presented in the introduction of the printed catalogue, or detailed in one or several published papers presenting and/or analyzing the catalogued data.

Metadata therefore play a fundamental role. First, scientists need information about the environment of the data in order to judge whether the data suit their project: date and/or method of acquisition, related publications, estimates of the internal and external errors, purpose of the data collection, etc. Second, a minimal knowledge of the metadata is required by the data processing system in order to merge or compare data from different origins -- for instance, the comparison of data expressed in different units requires a unit-to-unit conversion which can be performed automatically only if the units are specified unambiguously.
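The unit-conversion point can be illustrated with a minimal Python sketch. The table of factors and the function name `convert` are assumptions made for this illustration; they are not part of any CDS software. The sketch only shows why an unambiguous unit string is needed before a conversion can be automated.

```python
# Hypothetical conversion factors to a common reference unit (arcsec).
# This table is an illustration only, not a CDS-defined list of units.
TO_ARCSEC = {"arcsec": 1.0, "arcmin": 60.0, "deg": 3600.0, "mas": 0.001}


def convert(value, from_unit, to_unit):
    """Convert an angle between two units listed in TO_ARCSEC.

    A unit string that is not in the table is rejected rather than guessed,
    which mirrors the requirement that units be specified unambiguously.
    """
    for unit in (from_unit, to_unit):
        if unit not in TO_ARCSEC:
            raise ValueError(f"unknown or ambiguous unit: {unit!r}")
    return value * TO_ARCSEC[from_unit] / TO_ARCSEC[to_unit]
```

With such a table, two catalogues giving positional errors in `mas` and in `arcsec` can be compared after a call such as `convert(500.0, "mas", "arcsec")`, whereas a free-text unit would raise an error instead of producing a silently wrong value.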

This need for a description which is readable both by a computer and by a scientist led to a standardized way of documenting astronomical catalogues and tables, promoted by CDS from 1993 in the form of a dedicated ReadMe file associated with each catalogue (Ochsenbein [1994]). An example of such a file is presented in Fig. 1: it is a plain ASCII file, quite easy for a scientist to interpret, and at the same time structured enough to be interpreted by dedicated software. The ReadMe description file starts with a header specifying the basic references -- title, authors, references -- and contains a few key sections introduced by standard titles like Description: or Byte-by-byte Description of file:. Such a file is relatively easy to produce for someone who knows the catalogue contents. The example of Fig. 1 represents the documentation of a very simple catalogue, made of just two data tables, each with a small set of parameters. The output catalogue of the Hipparcos mission[*] is an example of a much more complex catalogue: it is composed of two fundamental large tables (HIP with $10^5$ stars and TYC with $10^6$ stars) and includes a dozen annex tables, but can still be described by the same kind of simple standardized documentation.

The most important part of the ReadMe file is the Byte-by-byte Description which details the table structures in terms of formats, units, column naming or labels, existence of data (possibility of unspecified or null values), and brief explanations. Among the conventions, some fundamental parameters are assigned fixed labels like sky coordinates (components of right ascension RA... and declination DE... in Fig. 1); a prefix convention, detailed in Table 5, is also used to specify obvious relations between a value, its mean error, its origin, etc.
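As a rough illustration of how software can use a byte-by-byte description, the following Python sketch reads one fixed-width record from a list of column definitions. The `Column` type, the three columns, their labels and the single-letter format codes are all hypothetical simplifications chosen for this example; they are not the columns of Fig. 1 and not the actual CDS tools.

```python
from typing import NamedTuple


class Column(NamedTuple):
    start: int   # first byte of the field, 1-based and inclusive
    end: int     # last byte of the field, 1-based and inclusive
    fmt: str     # simplified format code: "I" integer, "F" float, "A" text
    label: str   # column label


# Hypothetical description of a three-column table (assumption for this sketch).
COLUMNS = [
    Column(1, 4, "I", "Seq"),
    Column(6, 7, "I", "RAh"),
    Column(9, 13, "F", "Vmag"),
]


def parse_record(line, columns=COLUMNS):
    """Turn one fixed-width line into a dict keyed by column label.

    A blank field is returned as None, standing for an unspecified value.
    """
    record = {}
    for col in columns:
        text = line[col.start - 1:col.end].strip()
        if not text:
            record[col.label] = None
        elif col.fmt == "I":
            record[col.label] = int(text)
        elif col.fmt == "F":
            record[col.label] = float(text)
        else:
            record[col.label] = text
    return record
```

Because the byte ranges and formats come from the description rather than from the program, the same function can read any table once its column list is known; a field that fails to convert raises a `ValueError`, which is one simple way such a description supports data checking.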


 

 
Table 5: Conventions used for label prefixes
Symbol     Explanation
a_label    aperture used for parameter label
E_label    mean error (upper limit) on parameter label
e_label    mean error ($\sigma$) on parameter label
f_label    flag on parameter label
l_label    limit flag on parameter label
m_label    multiplicity index on parameter label to resolve ambiguities
n_label    note (remark) on parameter label
o_label    number of observations on parameter label
q_label    quality on parameter label
r_label    reference (source) for parameter label
u_label    uncertainty flag on parameter label
w_label    weight of parameter label
x_label    unit in which parameter label is expressed
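The prefix convention of Table 5 lends itself to automatic interpretation. The Python sketch below splits a column label into its prefix and the parameter it refers to; the function names `split_label` and `describe` and the example label `Vmag` are assumptions made for this illustration.

```python
# Prefix meanings transcribed from Table 5; "{}" marks where the label goes.
PREFIXES = {
    "a": "aperture used for parameter {}",
    "E": "mean error (upper limit) on parameter {}",
    "e": "mean error (sigma) on parameter {}",
    "f": "flag on parameter {}",
    "l": "limit flag on parameter {}",
    "m": "multiplicity index on parameter {} to resolve ambiguities",
    "n": "note (remark) on parameter {}",
    "o": "number of observations on parameter {}",
    "q": "quality on parameter {}",
    "r": "reference (source) for parameter {}",
    "u": "uncertainty flag on parameter {}",
    "w": "weight of parameter {}",
    "x": "unit in which parameter {} is expressed",
}


def split_label(label):
    """Return (prefix, base_label); prefix is None if no Table 5 prefix applies."""
    if len(label) > 2 and label[1] == "_" and label[0] in PREFIXES:
        return label[0], label[2:]
    return None, label


def describe(label):
    """Give a short textual explanation of a column label."""
    prefix, base = split_label(label)
    if prefix is None:
        return f"parameter {base}"
    return PREFIXES[prefix].format(base)
```

For a hypothetical column `e_Vmag`, `split_label` returns `("e", "Vmag")`, so a program can associate the mean error with the `Vmag` column without any catalogue-specific code. The prefixes are case-sensitive, since `E_` and `e_` have different meanings.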


This standardized way of presenting the metadata proved to be extremely useful, especially for data checking and format conversion: many errors were detected in old catalogues simply because a general checking mechanism became available. Tools have been developed for generating Fortran source code which loads the data into memory, or for converting the data into the FITS format, which is presently the most "universal" data format understood by data processing systems in astronomy -- but unfortunately a data format which is not convenient outside this context (see e.g. Grøsbøl et al. [1988]).

In the six years since this standardized way of describing astronomical catalogues was defined, over 2 600 astronomical catalogues have been described by means of this ReadMe file, and the same conventions have been adopted by the other astronomical data centers and journals for the electronic publication of tables. The present (October 1999) numbers of standardized catalogues are summarized in the rightmost column of Table 2; previous figures were presented in an earlier paper (Ochsenbein [1997]).

It is expected that, in the future, authors will supply the documentation of their data in this simple form; this is already the case for a very significant fraction of the tables mailed to the CDS. To help authors, template files as well as a few tips on how to create the ReadMe file are accessible on the Web[*]. The ReadMe files and the data files are then checked by a specialist, who contacts the authors if errors are detected or when changes are necessary to increase the clarity or homogeneity of the description.



Copyright The European Southern Observatory (ESO)